INTRODUCTORY ECONOMETRICS
Unit 11 Heteroscedasticity
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
Unit 12 Autocorrelation
Block 5 Econometric Model Specification and Diagnostic Testing
Unit 13 Model Selection Criteria
Dr. Sahba Fatima, Independent Researcher, Lucknow
Unit 14 Tests for Specification Errors
Glossary
COURSE INTRODUCTION
Econometrics is an interface between economics, mathematics and statistics. It is mainly concerned with the
empirical estimation of economic theory. The present course provides a comprehensive introduction to basic
econometric concepts and techniques. The course is divided into five blocks comprising 14 Units.
Block 1 titled, Econometric Theory: Fundamentals, comprises three units. Unit 1 is introductory in
nature. It defines econometrics and lists the steps we follow in an econometric study. Unit 2 provides an
overview of the concepts frequently used in econometrics. In Unit 3 we define the concept and procedure of
hypothesis testing.
Block 2 is titled, Regression Models: Two Variables Case. It consists of three Units. Unit 4 begins with
the estimation procedure of simple regression model by ordinary least squares (OLS) method. It also
describes the properties of OLS estimators and goodness of fit of regression models. Unit 5 continues with
the simple regression model and describes the procedure of testing of hypothesis. In this context it explains
the procedure of forecasting with regression models. Unit 6 extends the simple regression models in terms of
log-linear models and changing the measurement units of the variables in a regression model.
Block 3 titled, Multiple Regression Models, considers cases where there are more than one explanatory
variable. There are three Units in this Block. Unit 7 deals with estimation of multiple regression models.
Unit 8 deals with hypothesis testing in the case of multiple regression models. Unit 9 looks into structural
stability of regression models and includes dummy variables as explanatory variables in multiple regression
models.
Block 4 deals with Treatment of Violations of Assumptions. Unit 10 addresses the issue of
multicollinearity. It outlines the consequences, detection and remedial measures of multicollinearity. Unit 11
deals with the issue of heteroscedasticity – its consequences, detection and remedial measures. Unit 12 deals
with another important problem in multiple regression models, i.e., autocorrelation. It discusses the
consequences, detection and remedial measures of autocorrelation.
Block 5 is titled, Econometric Model Specification and Diagnostic Testing. There are two Units in this
Block. Unit 13 deals with model selection criteria. In this context it gives an outline of the Akaike Information Criterion (AIC), the Schwarz Information Criterion (SIC), and Mallows' Criterion. The subject matter of Unit 14 is tests for specification errors. In that Unit we discuss issues such as the exclusion of relevant variables and the inclusion of irrelevant variables.
UNIT 1 INTRODUCTION TO ECONOMETRICS *
Structure
1.0 Objectives
1.1 Introduction
1.2 Meaning of Econometrics
1.3 Economics and Econometrics
1.4 Methodology of Econometrics
1.5 Association and Causation
1.6 Let Us Sum Up
1.7 Answers/ Hints to Check Your Progress Exercises
1.0 OBJECTIVES
After going through this unit, you will be able to
explain the significance of econometrics in the field of economics;
distinguish between econometrics, mathematical economics and economic
statistics;
describe the steps to be followed in an econometric study; and
distinguish between association and causation.
1.1 INTRODUCTION
Econometrics connects the real world to the existing economic theories.
Econometrics is based on the development of statistical methods for testing
economic relationships and various economic theories. Econometrics helps us in
two ways so far as relationship among variables is concerned: (i) explaining the
past relationship among the variables, and (ii) forecasting the value of one
variable on the basis of other variables.
*Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
1.2 MEANING OF ECONOMETRICS
As mentioned earlier, econometrics deals with 'economic measurement'. It can be defined as a branch of social science in which the techniques of mathematics, statistical inference and economic theory are applied to the analysis of economic phenomena. It deals with the application of mathematical statistics to economic data. The objective is to provide empirical support to economic models constructed with the help of mathematical relationships, and thereby obtain numerical results. Thus econometrics makes use of economic theory, mathematical economics, and economic statistics.
For example, economic theory may postulate an exact (deterministic) relationship between two variables:
Y = a + bX …(1.1)
Econometrics, in contrast, allows for deviations from the exact relationship by adding a random error term u:
Y = a + bX + u …(1.2)
We will discuss the stochastic relationship among variables further in Unit 4. In econometrics we generally require special methods due to the unique nature of economic data, since such data are not generated under controlled experiments. The aim of econometrics is to bridge the gap between economic theory and actual measurement using the technique of statistical inference.
The law of demand does not provide any numerical measure of the strength of the relationship between the two variables, namely, the price and the quantity demanded of a commodity. It does not answer the question: by how much will the quantity demanded go up or down as a result of a certain change in the price of the commodity?
Check Your Progress 1
1) Bring out the differences between econometrics, mathematical economics
and statistics.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) Bring out the prominent features of econometrics.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Step 5: Estimation of the Parameters of the Econometric Model
We have discussed sampling procedures, statistical estimation and testing of hypothesis in Block 4 of BECC 107. You need a thorough understanding of those concepts. Remember that in econometric estimation the number of observations (and hence the number of equations) exceeds the number of parameters, so the parameters cannot be solved for exactly; we need certain estimation methods. As you will come to know in subsequent Units of this course, there are quite a few estimation methods. You have been introduced to the least squares method in Unit 5 of the course BECC 107: Statistical Methods for Economics. Several econometric software packages are available for estimation purposes. You will learn about econometric software in the course BECE 142: Applied Econometrics.
Step 6: Testing of Hypothesis
Once you obtain the estimates of the parameters, there is a need to test the hypothesis. As you know, in the sampling distribution of an estimator the estimate varies across samples. The estimate that you have obtained could be a matter of chance, and the parameter may be quite different from the estimate obtained. We need to confirm whether the difference between the parameter and the estimate really exists or is a matter of sampling fluctuation.
For the consumption function (1.4), we should apply a one-sided t-test for the condition that autonomous consumption is greater than zero. For the marginal propensity to consume we should apply a two-sided t-test of the null hypothesis that it is equal to zero. For testing both the parameters together we should apply the F-test.
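As an illustration of Steps 5 and 6, the following is a minimal sketch in Python (using the numpy and statsmodels packages) of how a Keynesian consumption function could be estimated and tested. The data values and variable names are purely illustrative, not from the text.

import numpy as np
import statsmodels.api as sm

# Hypothetical data: income (X) and consumption (C) for ten households
income = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
consumption = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

X = sm.add_constant(income)           # adds the intercept column
model = sm.OLS(consumption, X).fit()  # estimates C = c0 + c1*income + u by OLS

print(model.params)    # c0 (autonomous consumption) and c1 (the MPC)
print(model.tvalues)   # t-ratios for testing each parameter separately
print(model.fvalue)    # F-statistic for testing the parameters together

The one-sided test of the intercept and the two-sided test of the slope can then be carried out by comparing these t-ratios with the appropriate critical values.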
There is a need to check for the correct specification of the model. Two issues are
important here: (i) how many explanatory variables should be there in the
regression model, and (ii) what is the functional form of the model.
The consumption function (see equation (1.4)) is a case of a two-variable regression model. There is one explained variable and one explanatory variable in the
model. If we include more number of explanatory variables (such as education,
type of residential area, etc.) it becomes a multiple linear regression model. The
functional form again could be linear or non-linear.
Step 7: Forecasting or Prediction
The estimated model can be used for forecasting or prediction. We have the
actual value of the dependent variable. On the basis of the estimated regression
model, we obtain the predicted value of the dependent variable. The discrepancy
between the two is the prediction error. The prediction error should be as small as possible.
Step 8: Interpretation of Results
There is a need for correct interpretation of the estimates. In later Units of this
course we will discuss issues such as model specification and interpretation of
the result. The estimated model can be used for policy recommendation also.
1.5 ASSOCIATION AND CAUSATION
As you know from ‘BECC 107: Statistical Methods for Economics’ correlation
implies association between two variables. Technically we can find out the
correlation coefficient between any two variables (say the number of students
visiting IGNOU library and the number of road accidents in Delhi). In some
cases we may find the correlation coefficient to be quite high. Such a relationship between variables, however, is a case of spurious correlation. If we take two such
variables (where correlation coefficient is high) and carry out a regression
analysis we will find the estimates to be statistically significant. Such regression
lines are meaningless. Thus regression analysis deals with the association or
dependence of one variable on the other. It does not imply ‘causation’ however.
The notion of causation has to come from existing theories in economics.
Therefore a statistical relationship, however strong or suggestive, cannot by itself establish causation.
Unless causality is established between the variables the purpose of testing the
economic theory would not make any sense. Most of the economic theories test
the hypothesis whether one variable has a causal effect on the other.
Thus logic or economic theory is very important in regression analysis. We
should not run a regression without establishing the logic for the relationship
between the variables. Let us look into the case of the law of demand. While
analysing consumer demand, we need to understand the effect of changing price
of the good on the quantity demanded holding the other factors such as income,
price of other goods, tastes and preferences of individuals unchanged. However,
if the other factors are not held fixed, then it would be impossible to know the
causal effect of price change on quantity demanded.
Check Your Progress 2
1) Explain the steps you would follow in an econometric study.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) Assume that you have to carry out an econometric study on Keynesian
consumption function. Write down the steps you would follow.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3) What do you understand by cause and effect relationship? How is it different from association?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
UNIT 2 OVERVIEW OF STATISTICAL
CONCEPTS
Structure
2.0 Objectives
2.1 Introduction
2.2 Meaning of Statistical Inference
2.3 Central Limit Theorem
2.4 Normal Distribution
2.5 Chi-Square Distribution
2.6 The t-Distribution
2.7 The F-Distribution
2.8 Estimation of Parameters
2.8.1 Point Estimation
2.8.2 Interval Estimation
2.0 OBJECTIVES
After going through this unit, you will be able to
explain the concept and significance of probability distribution;
identify various types of probability distributions;
describe the properties of various probability distributions such as normal, t,
F and chi-square;
explain the process of estimation of parameters and
describe the properties of a good estimator.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
Econometric Theory:
Fundamentals
2.1 INTRODUCTION
Statistical concepts and estimation methods hold crucial significance in
understanding the tools of econometrics. Therefore, you should be able to define
various concepts and distinguish between them. The essence of econometrics is
based on empirical analysis which deals with data. In fact, the tools of
econometric analysis emerge from statistical methods.
Statistical concepts guide us to make judgement in the presence of uncertainty.
Statistics provides the platform for data collection methods which become the basis for carrying out econometric analysis. Econometricians often need to work with large populations, which becomes a challenge. Therefore, there is a need to select an appropriate sample and draw appropriate inferences based on probability distributions. Econometrics calls for a strong understanding of statistical concepts, which helps economists to choose the right sample and infer correctly from the chosen sample.
The population is a collection of items, events or people. It is difficult to examine
every element in the population. Therefore it makes sense to take a subset of the population and examine it. This subset of the population is called a 'sample',
which is further used to draw inferences. If the sample is random and large
enough, the information collected from the sample can be used for making
inference about the population.
Any experiment which gives random outcomes is referred to as a random experiment. A variable whose values are the outcomes of a random process is called a random variable. Thus, for a random variable each outcome is associated with a certain probability of occurrence.
Random variables are discrete random variables when they take a finite number of values. If the random variable assumes an infinite number of values between any two points, it
is called a continuous random variable. Random variables have a probability
distribution. If the random variable is discrete then the probability function
associated with it is called ‘probability distribution function’. If the random
variable is continuous, then the probability function is referred to as ‘probability
density function’. Random variables can have variety of distribution functions
depending on their probabilities. Some of the commonly used distribution
functions are described in this Unit.
The central limit theorem implies that, for a large sample, the mean and standard deviation (SD) of the sample mean are:
E(X̄) = μ and SD(X̄) = σ/√n ... (2.2)
This further implies that X̄ → N(μ, σ²/n). In other words, the sample mean can be approximated by a normal random variable with mean μ and standard deviation σ/√n. We discuss certain important probability distribution functions below.
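The theorem can be checked by a small simulation (this illustration is ours; the exponential population below, with mean and standard deviation both equal to 2, is an arbitrary non-normal choice):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 10_000

# Means of 10,000 samples of size 100 from an exponential population
sample_means = rng.exponential(scale=2.0, size=(reps, n)).mean(axis=1)

print(sample_means.mean())       # close to mu = 2.0
print(sample_means.std(ddof=1))  # close to sigma/sqrt(n) = 2/10 = 0.2

A histogram of sample_means would look approximately normal even though the parent population is heavily skewed.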
Standard Normal Distribution: z ~ N(0, 1)
The standard normal distribution is a normal distribution with mean zero (μ = 0) and unit variance (σ² = 1). Its probability density function is given by
f(x | 0, 1) = (1/√(2π)) e^(−x²/2) ... (2.5)
All the properties of the normal distribution mentioned above are applicable in
the case of standard normal distribution.
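In practice such probabilities are read from the standard normal table or computed with software. A brief sketch using Python's scipy package (the values of μ and σ below are illustrative):

from scipy import stats

mu, sigma = 30, 4   # illustrative parameters
print(stats.norm.cdf(35, loc=mu, scale=sigma))  # P(X < 35)
# Equivalently, standardise first: z = (35 - 30)/4 = 1.25
print(stats.norm.cdf(1.25))                     # same probability, about 0.894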
Check Your Progress 1
1) Assume that X is normally distributed with mean 𝜇 = 30 and standard
deviation 𝜎 = 4. Find P(X < 40).
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2.5 CHI-SQUARE DISTRIBUTION
Suppose X is a normal variable with mean 𝜇 and standard deviation 𝜎 , then
𝑧= is a standard normal variable, i.e., 𝑧~N(0,1). If we take the square of z,
i.e., 𝑧 = , then 𝑧 is said to be distributed as a variable with one
degree of freedom and expressed as .
𝑧= 𝑧
The chi-square distribution is skewed to the right, but it becomes symmetric as the sample size increases. All the values of the chi-square distribution are positive.
Properties of Chi-square distribution
1. The mean of the chi-square distribution is equal to the number of degrees of freedom (k).
2. The variance of the chi-square distribution is equal to two times the number of degrees of freedom: σ² = 2k.
3. When the degrees of freedom are greater than or equal to 2, the maximum value of the density occurs at χ² = k − 2.
4. As the degrees of freedom increase, the chi-square curve approaches a normal distribution.
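These properties can be used to test a hypothesis about a population variance: if a sample of size n from a normal population has standard deviation s, then under H0: σ = σ0 the statistic (n − 1)s²/σ0² follows the chi-square distribution with (n − 1) degrees of freedom. A sketch in Python with illustrative numbers:

from scipy import stats

n, s, sigma0 = 10, 5.0, 4.0                   # illustrative sample size, sample SD, hypothesized SD
chi2_stat = (n - 1) * s**2 / sigma0**2        # test statistic
p_value = stats.chi2.sf(chi2_stat, df=n - 1)  # upper-tail probability
print(chi2_stat, p_value)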
t = (X̄ − μ)/(s/√n) … (2.6)
where X̄ is the sample mean, μ is the population mean, s is the standard deviation of the sample and n is the sample size.
Properties of t-Distribution
1. The mean of the distribution is equal to 0
2. The variance is equal to [k/(k – 2)] where k is the degrees of freedom and
k >2.
3. The variance is always greater than 1, although it is close to 1 when the
degree of freedom is large. For infinite degrees of freedom the t-
distribution is the same as the standard normal distribution.
The t-distribution can be used under the following conditions:
1. The population distribution is normal
2. The population distribution is symmetric, unimodal without outliers, and
the sample size is at least 30
3. The population distribution is moderately skewed, unimodal without
outliers and the sample size is at least 40
4. The sample size is greater than 40 without outliers.
Look into the above conditions. If the parent population (from which the sample
is drawn) is normal we can apply t-distribution for any sample size. If population
is not normal, the sample size should be large. The t-distribution should not be
used with small samples drawn from a population that is not approximately
normal.
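The convergence of the t-distribution to the standard normal can be verified numerically; the sketch below (degrees of freedom chosen arbitrarily) prints the two-tailed 5 per cent critical values:

from scipy import stats

for df in (5, 10, 30, 120):
    print(df, stats.t.ppf(0.975, df))  # t critical value shrinks towards 1.96
print("normal:", stats.norm.ppf(0.975))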
The F-distribution is used to test hypotheses about population variances. We can test whether two normal populations have the same variance. The null hypothesis is that the variances are the same, while the alternative hypothesis is that one of the variances is larger than the other. That is:
H0: σ1² = σ2²
HA: σ1² > σ2²
The alternative hypothesis states that the first population has the larger variance. The null hypothesis can be tested by drawing a sample from each population and calculating the sample variances s1² and s2². The samples are assumed to be independently drawn with sizes n1 and n2 respectively. We test the ratio
F = s1²/s2² ~ F(n1 − 1, n2 − 1) … (2.9)
If the null hypothesis is not true, the ratio would be statistically different from unity. We should compare the calculated value of F (obtained from equation (2.9)) with the tabulated value of F (given in the appendix table at the end of the book). If the calculated value exceeds the tabulated value, then the null hypothesis is rejected.
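A sketch of this F-test in Python, using for illustration the sample variances and sizes that appear in the schools exercise of Check Your Progress 2 below:

from scipy import stats

s1_sq, n1 = 91.74, 506   # larger sample variance and its sample size
s2_sq, n2 = 67.16, 94
F = s1_sq / s2_sq
p_value = stats.f.sf(F, dfn=n1 - 1, dfd=n2 - 1)  # upper-tail area
print(F, p_value)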
Check Your Progress 2
1) A newly developed battery lasts 60 minutes on a single charge, with a standard deviation of 4 minutes. For a quality control test, the department randomly selects 7 batteries. The standard deviation of the selected batteries is 6 minutes. What is the probability that the standard deviation in the new test would be greater than 6 minutes?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
4) Test whether the students from private high schools are more homogeneous with respect to their science test scores than the students from public high schools. It is given that the sample variances are 91.74 and 67.16 respectively for public and private schools. The sample sizes are 506 for public schools and 94 for private schools.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
The sample mean X̄ = ΣXi/n provides the point estimate of the parameter μX. This formula is called the point estimator. You should note that the point estimator is a random variable as its value varies from sample to sample.
2.8.2 Interval Estimation
In point estimation we estimate the parameter by a single value, usually the
corresponding sample statistic. The point estimate may not be realistic in the
sense that the parameter value may not exactly be equal to it.
An alternative procedure is to give an interval, which would hold the parameter
with certain probability. Here we specify a lower limit and an upper limit within
which the parameter value is likely to remain. Also we specify the probability of
the parameter remaining in the interval. We call the interval as ‘confidence
interval’ and the probability of the parameter remaining within this interval as
‘confidence level’ or ‘confidence coefficient’.
The concept of confidence interval is somewhat complex. We have already
explained it in BECC 107, Unit 13. Let us look at it again. We have drawn a
sample of size n from a normal population. We do not know the population mean
𝜇 and population variance 𝜎 . We know the sample mean 𝑋 and sample
variance 𝑆 . Since 𝑋 varies across samples, we use the properties of the sampling
distribution of 𝑋 to draw inferences about 𝜇 .
If X is normally distributed, i.e., X ~ N(μX, σX²), we know that
X̄ ~ N(μX, σX²/n) … (2.10)
From (2.10) we can say that the sampling distribution of the sample mean X̄ follows the normal distribution with mean μX and standard deviation σX/√n. Let us transform the above into a standard normal variable.
Z = (X̄ − μX)/(σX/√n) ~ N(0, 1) … (2.11)
Now the problem before us is that we do not know the population variance σX². Thus we take its estimator SX² = Σ(Xi − X̄)²/(n − 1). In that case, the appropriate test statistic is
t = (X̄ − μX)/(SX/√n) … (2.12)
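The resulting 95 per cent confidence interval is X̄ ± t(0.025, n−1) × SX/√n. A minimal sketch in Python (the sample values are made up for illustration):

import numpy as np
from scipy import stats

x = np.array([12.0, 15.5, 9.8, 14.2, 13.1, 11.7, 16.0, 10.9])  # illustrative sample
n, x_bar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed 5% critical value
half_width = t_crit * s / np.sqrt(n)
print(x_bar - half_width, x_bar + half_width)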
X̄ = (1/n)(X1 + X2 + ⋯ + Xn) … (2.14)
The sample mean is a linear estimator because it is a linear function of the observations.
2.9.2 Unbiasedness
The value of a statistic varies across samples due to sampling fluctuation.
Although the individual values of a statistic may be different from the unknown
population parameter, on an average, the value of a statistic should be equal to
the population parameter. In other words, the sampling distribution of 𝑋 should
have a central tendency towards 𝜇 . This is known as the property of
unbiasedness of an estimator. It means that although an individual value of a
given estimator may be higher or lower than the unknown value of the population
parameter, there is no bias on the part of the estimator to have values that are
always greater or smaller than the unknown population parameter. If we accept
that mean (here, expectation) is a proper measure for central tendency, then 𝑋 is
an unbiased estimator for 𝜇 if
E(X̄) = μX
2.9.4 Efficiency
The property of unbiasedness is not adequate by itself. It is possible to obtain two or more unbiased estimators of the same parameter. Therefore, we must choose the most efficient estimator. Suppose two estimators of μX are given as follows:
X̄ ~ N(μX, σ²/n) … (2.15)
X_med ~ N(μX, πσ²/2n), where π = 3.142 (approx.) … (2.16)
In the case of large samples, the median computed from a random sample of a normal population also follows the normal distribution with the same mean μX. However, it has a larger variance.
28
2 Overview of
Var ( X med ) n Statistical Concepts
1.571 (approx.) … (2.17)
Var ( X ) 2 2 2
n
Equation (2.17) implies that the variance of the sample median is 57% larger than the variance of the sample mean. Therefore, the sample mean provides a more precise estimate of the population mean than the median (X_med). Thus, X̄ is an efficient estimator of μX.
2.9.6 Consistency
Consistency is a large sample property. If we increase the sample size, the
estimator should have a tendency to approach the value of the parameter. Thus,
an estimator is said to be consistent if it converges to the parameter as n → ∞.
Suppose X ~ N(μX, σX²). We draw a random sample of size n from the population. Two estimators of μX are
X̄ = ΣXi/n … (2.18)
X* = ΣXi/(n + 1) … (2.19)
As you know, the first estimator (2.18) is the sample mean and it is unbiased since E(X̄) = μX.
As the sample size increases we should not find much difference between the two estimators. As n increases, X* will approach μX. Such an estimator is known as a consistent estimator. An estimator is a consistent estimator if it approaches the true value of the parameter as the sample size gets larger and larger.
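A quick simulation (ours; μ and σ are chosen arbitrarily) shows the biased estimator X* approaching μ as n grows:

import numpy as np

rng = np.random.default_rng(1)
mu = 5.0

for n in (10, 100, 10_000):
    x = rng.normal(loc=mu, scale=2.0, size=n)
    print(n, x.sum() / (n + 1))   # X* drifts towards mu as n increases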
Check Your Progress 3
1) Describe the desirable properties of an estimator.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) For a sample of size 30, the sample mean and standard deviation are 15 and 10 respectively. Construct the confidence interval of the population mean (μX) at 5 per cent level of significance.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
This implies 99.6% chance that the sample average will be no greater than
110.
4) The degrees of freedom (n1 − 1) and (n2 − 1) are 505 and 93 respectively. Our null hypothesis H0 is that both types of schools are equally homogeneous with respect to science marks. We are comparing variances. Thus we apply the F-test.
F = s1²/s2² = 91.74/67.16 = 1.366
The tabulated value of F for 505 and 93 degrees of freedom is 1.27. Since the calculated value is more than the tabulated value, we reject H0. We conclude that the students from private schools are more homogeneous with respect to science marks.
Check Your Progress 3
1) Go through Section 2.9 and answer.
2) Since population standard deviation is not known, you should apply t-
distribution. Check the tabulated value of t given at the Appendix for 29
degrees of freedom and 5 per cent level of significance. Construct the
confidence interval as given at equation (2.12).
UNIT 3 OVERVIEW OF HYPOTHESIS TESTING
Structure
3.0 Objectives
3.1 Introduction
3.2 Procedure of Hypothesis Testing
3.3 Estimation Methods
3.4 Rejection Region and Types of Errors
3.4.1 Rejection Region for Large Samples
3.4.2 One-tail and Two-tail Tests
3.4.3 Rejection Region for Small Samples
3.5 Types of Errors
3.6 Power of Test
3.7 Approaches to Hypothesis Testing
3.7.1 Test of Significance Approach
3.7.2 Confidence Interval Approach
3.8 Let Us Sum Up
3.9 Answers/Hints to Check Your Progress Exercises
3.0 OBJECTIVES
After going through this unit, you will be able to
explain the concept and significance of hypothesis testing;
describe the applications of a test statistic;
explain the procedure of testing of hypothesis of population parameters;
distinguish between the Type I and Type II errors; and
apply the tests for comparing parameters from two different samples.
3.1 INTRODUCTION
The purpose behind statistical inference is to use the sample to make judgement
about the population parameters. The concept of hypothesis testing is crucial for
predicting the value of population parameters using the sample. Various test
statistics are used to test hypotheses related to population mean and variance. The
variance of two different samples can also be compared using hypothesis testing.
There are two approaches to testing of hypothesis: (i) test of significance approach, and (ii) confidence interval approach. While testing a hypothesis, there is a likelihood of committing two types of errors: (i) type I error, and (ii) type II error. In this unit we will elaborate on the process of hypothesis testing, and explain the method of rejecting the null hypothesis on the basis of an appropriate test statistic.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
(i) Identification of the test statistic: The null hypothesis is put to test by a
test statistic. There are several test statistics (such as t, F, chi-square,
etc.) available in econometrics. We have to identify the appropriate test
statistic.
(ii) Interpretation of the results based on the value of the test statistic: After carrying out the test, we interpret the results. When we apply the test statistic to the sample data that we have, we obtain a certain value of the test statistic (for example, a t-ratio of 2.535). Interpretation of the results involves comparison of two values: the tabulated value of the test statistic and the computed value. If the computed value exceeds the tabulated value we reject the null hypothesis.
The sampling distribution of a test statistic under the null hypothesis is called the 'null distribution'. When the data depict strong evidence against the null hypothesis, the test statistic takes a large value (in absolute terms). By observing the computed value of the test statistic we draw inferences. Apart from the test statistic, econometric software provides a p-value. The p-value is the probability of obtaining a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis is true. Thus, if we obtain a p-value of 0.04, data as extreme as ours would arise only 4 per cent of the time if the null hypothesis were true. Therefore, if we take 5 per cent level of significance, we reject the null hypothesis.
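For instance, the p-value attached to a computed t-ratio is a tail probability of the t-distribution. A sketch in Python (the t-ratio of 2.535 is the one quoted above; the degrees of freedom are assumed for illustration, as they depend on the model):

from scipy import stats

t_obs, df = 2.535, 20                     # illustrative degrees of freedom
p_value = 2 * stats.t.sf(abs(t_obs), df)  # two-tailed p-value
print(p_value)                            # about 0.02, so reject H0 at the 5% level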
Recall that the area under the standard normal curve gives the probability for different ranges of values assumed by z. These probabilities are presented as the area under the standard normal curve.
Let us explain the concept of critical region or rejection region through the standard normal curve given in Fig. 3.1 below. When we have a confidence coefficient of 95 per cent, the area covered under the standard normal curve is 95 per cent. Thus 95 per cent of the area under the curve is bounded by −1.96 ≤ z ≤ 1.96. The remaining 5 per cent of the area is covered by z < −1.96 and z > 1.96. Thus 2.5 per cent of the area on each side of the standard normal curve constitutes the rejection region. This area is shown in Fig. 3.1. If the sample mean falls in the rejection region we reject the null hypothesis.
The critical values of z depend upon the level of significance. These critical values for certain specified levels of significance (α) are given in the appendix tables at the end of this book.
On the other hand, if the population standard deviation is not known, the test statistic is
t = (x̄ − μ)/(s/√n) …(3.2)
In the case of t-distribution, however, the area under the curve (which implies
probability) changes according to degrees of freedom. Thus while finding the
critical value of t we should take into account the degrees of freedom. You
should remember two things while finding critical value of t. These are: i) level
of significance, and ii) degrees of freedom.
As pointed out above, there are type I and type II errors in hypothesis testing. Thus, there are two types of risks: (i) α represents the probability that the null hypothesis is rejected when it is true and should not be rejected; and (ii) β represents the probability that the null hypothesis is not rejected when in reality it is false. The power of the test is (1 − β), that is, the complement of β. It is basically the probability of not committing a type II error.
A 95% confidence coefficient means that we are prepared to accept at most a 5% probability of committing a type I error. We do not want to reject a true hypothesis more than 5 out of 100 times. This is called the 5% level of significance.
The power of a test depends on the extent of the difference between the actual population mean and the hypothesized mean. If the difference is large then the power of the test will be much greater than if the difference is small. Therefore, selection of the level of significance is very crucial. Selecting a large value of α makes it easier to reject the null hypothesis, thereby increasing the power of the test (1 − β).
At the same time, increasing the sample size increases the precision of the estimates and the ability to detect the difference between the population parameter and the sample estimate, thereby increasing the power of the test.
If the difference between X̄ and μ is small, the |t| value will also be small, where |t| is the absolute value of the t-statistic. You should note that t = 0 if X̄ = μ. In this case we do not reject the null hypothesis. As |t| gets larger, we would be more inclined to reject the null hypothesis.
Example: Suppose for a dataset X̄ = 23.25, S = 9.49, and n = 28. Our null and alternative hypotheses are
H0: μ = 18.5
H1: μ ≠ 18.5 (two-tailed test)
The computed t value is 2.6486. This value lies in the right-tail critical region of the t-distribution. We therefore reject the null hypothesis (H0) that the true population mean is 18.5.
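The computation can be reproduced as follows (a sketch using the summary figures of the example above):

from scipy import stats

x_bar, s, n, mu0 = 23.25, 9.49, 28, 18.5
t_stat = (x_bar - mu0) / (s / n**0.5)            # about 2.6486
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-tailed p-value
print(t_stat, p_value)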
That a test is statistically significant means that we can reject the null hypothesis. It implies that the observed difference between the sample value and the hypothesized value is unlikely to be due to chance.
That a test is statistically not significant means that we do not reject the null hypothesis. The difference between the sample value and the hypothesized value could be due to sampling variation, i.e., due to chance.
The probability that the t value lies between the limits −2.360 ≤ t ≤ 2.360 is 0.95 or 95%. The values −2.360 and +2.360 are the critical t values.
If we substitute the above values in equation (3.6) and rearrange terms, we obtain
Equation (3.7) provides a 95% confidence interval for the parameter. Such a confidence interval is known as the region of acceptance (for H0). The area outside the confidence interval is known as the rejection region (for H0).
If the confidence interval includes the hypothesized value of the parameter, we do not reject the null hypothesis. But if the hypothesized value lies outside the confidence interval, we reject the null hypothesis.
Check Your Progress 2
1) What is meant by power of a test?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) We have given the types of errors in Table 3.1. You should elaborate on that.
3) a) In order to test this we use the z-statistic: z = (X − µ)/σ = (225 − 180)/20 = 2.25.
b) The area beyond the z value of 2.25 is 0.0122, which is the probability of making a type I error. A tail area of 2% corresponds to z = 2.05. Since z = (X − µ)/σ, we have 2.05 = (X − 180)/20, i.e., (X − µ) = 2.05 × 20 = 41, so X = 180 + 41 = 221.
Check Your Progress 2
1) Go through Section 3.6 and answer.
2) Go through Section 3.7.2 and answer.
UNIT 4 SIMPLE LINEAR REGRESSION
MODEL: ESTIMATION
Structure
4.0 Objectives
4.1 Introduction
4.2 Population Regression Function (PRF)
4.2.1 Deterministic Component
4.0 OBJECTIVES
After going through this unit, you should be able to
describe the classical linear regression model;
differentiate between Population Regression Function (PRF) and Sample
Regression Function (SRF);
find out the Ordinary Least Squares (OLS) estimators;
describe the properties of OLS estimators;
explain the concept of goodness of fit of regression equation; and
describe the coefficient of determination and its properties.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
4.1 INTRODUCTION
In Unit 5 of the course BECC 107: Statistical Methods for Economics we discussed the topics of correlation and regression. In that Unit we gave a brief idea about the concept of regression. You already know that there are two types of variables in regression analysis: i) the dependent (or explained) variable, and ii) the independent (or explanatory) variable. As the names (explained and explanatory) suggest, the dependent variable is explained by the independent variable. Usually we denote the dependent variable as Y and the independent variable as X. Suppose we took up a household survey and collected n pairs of observations on X and Y. The relationship between X and Y can take many forms. The general
practice is to express the relationship in terms of some mathematical equation.
The simplest of these equations is the linear equation. It means that the
relationship between X and Y is in the form of a straight line, and therefore, it is
called linear regression. When the equation represents curves (not a straight line)
the regression is called non-linear or curvilinear.
Thus in general terms we can express the relationship between X and Y as
follows in equation (4.1).
𝑌 = 𝑓(𝑋) … (4.1)
In this block (Units 4, 5 and 6) we will consider simple linear regression models
with two variables only. The multiple regression model comprising more than
one explanatory variable will be discussed in the next block.
Regression analysis may have the following objectives:
To estimate the mean or average value of the dependent variable, given the
values of the independent variables.
To test hypotheses regarding the underlying economic theory. For example, one may test the hypothesis that the price elasticity of demand is (–)1, that is, the demand is unitary elastic, assuming other factors affecting the demand are held constant.
To predict the mean value of the dependent variable given the values of the
independent variable.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) Why does the average value of the dependent variable differ from the
actual value?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
[Figure: two sample regression lines, SRL1 and SRL2, relating expenditure to personal disposable income (PDI).]
[Figure: population regression line PRL: E(Y|Xi) = β1 + β2Xi and sample regression line SRL: Ŷi = b1 + b2Xi, with the residual ei and the error term ui marked for an observation.]
[Figure: population regression line E(Y|Xi) = β1 + β2Xi with positive and negative error terms (+u, −u) around it.]
[Figure: panels showing the population regression function PRF: Yi = β1 + β2Xi plotted against X.]
ei = Yi − Ŷi
Σei² = Σ(Yi − Ŷi)² ... (4.11)
The first order condition of minimisation requires that the partial derivatives are equal to zero. Note that we have to decide on the values of b1 and b2 such that Σei² is the minimum. Thus, we have to take partial derivatives with respect to b1 and b2. This implies that
∂(Σei²)/∂b1 = 0 … (4.13)
and
∂(Σei²)/∂b2 = 0 … (4.14)
These conditions give
2Σ(Yi − b1 − b2Xi)(−1) = 0, so that
ΣYi = nb1 + b2ΣXi … (4.15)
2Σ(Yi − b1 − b2Xi)(−Xi) = 0, so that
ΣXiYi = b1ΣXi + b2ΣXi² … (4.16)
Equations (4.15) and (4.16) are called normal equations. We have two equations with two unknowns (b1 and b2). Thus, by solving these two normal equations we can find out unique values of b1 and b2.
By solving the normal equations (4.15) and (4.16) we find that
b1 = Ȳ − b2X̄ … (4.17)
and
b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σxiyi / Σxi² … (4.18)
where the lowercase letters denote deviations from the sample means: xi = Xi − X̄ and yi = Yi − Ȳ.
As you can see from the formula for b2, it is simpler to write the estimator of the slope coefficient in deviation form. Expressing the values of a variable as deviations from the mean does not change the ranking of the values, since we are subtracting the same constant from each value. It is crucial to note that b1 and b2 are expressed in terms of quantities computed from the sample, as given by expressions (4.17) and (4.18).
We mention below the formulae for the variance and standard error of the estimators b1 and b2:
Var(b1) = σ²b1 = [ΣXi²/(nΣxi²)] σ² … (4.19)
SE(b1) = √Var(b1) … (4.20)
Var(b2) = σ²b2 = σ²/Σxi²
SE(b2) = √Var(b2) … (4.21)
The error variance σ² is estimated by σ̂² = Σei²/(n − 2) … (4.22)
b1 ~ N(β1, σ²b1) … (4.24)
b2 ~ N(β2, σ²b2) … (4.25)
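A minimal sketch of these formulae in Python with made-up data; it computes b1 and b2 from (4.17) and (4.18), and the standard error of b2 from (4.21) and (4.22):

import numpy as np

X = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 11.0])   # illustrative data
Y = np.array([3.1, 5.4, 6.2, 8.0, 10.1, 12.3])
n = len(X)

x, y = X - X.mean(), Y - Y.mean()     # deviations from the means
b2 = (x * y).sum() / (x**2).sum()     # slope, eq. (4.18)
b1 = Y.mean() - b2 * X.mean()         # intercept, eq. (4.17)

e = Y - (b1 + b2 * X)                 # residuals
sigma2_hat = (e**2).sum() / (n - 2)   # estimated error variance, eq. (4.22)
se_b2 = np.sqrt(sigma2_hat / (x**2).sum())  # SE(b2), eq. (4.21)
print(b1, b2, se_b2)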
Check Your Progress 2
1) Distinguish between the error term and the residual by using appropriate
diagram.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
2) Prove that the sample regression line passes through the mean values of X
and Y.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
Ȳ = b1 + b2X̄ …(4.26)
a) The mean value of the residuals, ē = Σei/n, is always zero. This implies that, on average, the positive and negative residual terms cancel each other.
b) Σei Xi = 0 …(4.27)
c) Σei Ŷi = 0 …(4.28)
Or, equivalently,
Σyi² = b2² Σxi² + Σei² … (4.34)
R² = ESS/TSS … (4.37)
Since TSS = ESS + RSS, dividing throughout by TSS gives
1 = ESS/TSS + RSS/TSS = R² + RSS/TSS
Therefore,
R² = 1 − RSS/TSS … (4.38)
You should note that R² gives the proportion of TSS explained by the regression (ESS). Thus, if R² = 0.75, we can say that 75 per cent of the variation in the dependent variable is explained by the explanatory variable in the regression model. The value of R², or the coefficient of determination, lies between 0 and 1. This is mainly because it represents the ratio of the explained sum of squares to the total sum of squares.
Now let us look into the algebraic properties of R² and interpret it. When R² = 0 we have ESS = 0. It indicates that no proportion of the variation in the dependent variable is explained by the regression. If R² = 1, the sample regression is a perfect fit: all the observations lie on the estimated regression line. A higher value of R² implies a better fit of the regression model.
4.7.2 F-Statistic for Goodness of Fit
The statistical significance of a regression model is tested by the F-statistic. By using the t-test we can test the statistical significance of a particular parameter of the regression model. For example, the null hypothesis H0: β2 = 0 implies that there is no relationship between Y and X in the population. By using the F-statistic, we can test the null hypothesis that all the slope parameters in the model are zero. Therefore, we use the F-statistic for goodness of fit.
Therefore,
F = Σ(Ŷi − Ȳ)² / [Σei²/(n − 2)] ... (4.42)
Since Ŷi − Ȳ = b2(Xi − X̄), we can write
F = b2² Σ(Xi − X̄)² / σ̂² …(4.44)
We know that
var(b2) = σ̂²/Σxi²
F = b2² Σxi²/σ̂² = b2²/var(b2) = [b2/SE(b2)]² = t² ... (4.45)
Therefore, the F-statistic is equal to the square of the t-statistic (F = t²). The above result, however, is true for the two-variable model only. If the number of explanatory variables increases in a regression model, the above result may not hold.
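The equality is easy to verify numerically. The sketch below simulates a small dataset (all parameter values are arbitrary) and computes both statistics:

import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=30)
Y = 2.0 + 0.5 * X + rng.normal(scale=1.0, size=30)

x, y = X - X.mean(), Y - Y.mean()
b2 = (x * y).sum() / (x**2).sum()
e = y - b2 * x                                # residuals in deviation form
sigma2_hat = (e**2).sum() / (len(X) - 2)

t = b2 / np.sqrt(sigma2_hat / (x**2).sum())   # t-ratio of the slope
F = b2**2 * (x**2).sum() / sigma2_hat         # F from eq. (4.44)
print(t**2, F)                                # the two values coincide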
Check Your Progress 3
1) Is it possible to carry out F-test on the basis of the coefficient of
determination? Explain how.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
2) Can the coefficient of determination be greater than 1? Explain why.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
2) The relationship between Y and X is stochastic in nature. There is an error
term added to the regression equation. The inclusion of the random error
term leads to a difference between the expected value and the actual value
of the dependent variable.
3) There are three reasons for inclusion of the error term in the regression
model. See Sub-Section 4.2.2 for details.
Check Your Progress 2
1) Go through Section 4.3. You should explain the difference between the
error term and the residual by using Fig. 4.3.
2) In the OLS method we minimise Σei² by equating its partial derivatives to zero. The condition ∂(Σei²)/∂b1 = 0 gives us the first normal equation: ΣYi = nb1 + b2ΣXi. If we divide this equation by the sample size, n, we obtain Ȳ = b1 + b2X̄. Thus, the estimated regression line passes through the point (X̄, Ȳ).
Check Your Progress 3
1) Yes, we can carry out F-test on the basis of the 𝑅 value. Go through
equation (4.40).
2) The value of R2 or the coefficient of determination lies between 0 and 1.
This is mainly because it represents the ratio of ESS to TSS. It indicates
the proportion of variation in Y that has been explained by the
explanatory variables. The numerator ESS cannot be more than the TSS.
Therefore, R2 cannot be greater than 1.
UNIT 5 SIMPLE REGRESSION MODEL:
INFERENCE
Structure
5.0 Objectives
5.1 Introduction
5.2 Testing of Hypothesis
5.3 Confidence Interval
5.4 Test of Significance
5.5 Analysis of Variance (ANOVA)
5.6 Gauss Markov Theorem
5.7 Prediction
5.7.1 Individual Prediction
5.7.2 Mean Prediction
5.8 Let Us Sum Up
5.9 Answers/Hints to Check Your Progress Exercises
5.0 OBJECTIVES
After reading this unit, you will be able to:
explain the concept of Testing of Hypothesis;
derive the confidence interval for the slope coefficient in a simple linear
regression model;
explain the approach of ‘test of significance’ for testing the hypothesis on the
estimated slope coefficient;
describe the concept of Analysis of Variance (ANOVA);
state the Gauss Markov Theorem with its properties; and
derive the confidence interval for the predicted value of Y in a simple
regression model.
5.1 INTRODUCTION
In Unit 4 we discussed the procedure of estimation of the values of the
parameters. In this unit, we focus upon how to make inferences based on the
estimates of parameters obtained. We consider a simple linear regression model
with only one independent variable. This means we have one slope coefficient
associated with the independent variable and one intercept term. We begin by
recapitulating the basics of ‘hypothesis testing’.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
5.2 TESTING OF HYPOTHESIS
Testing of hypothesis refers to assessing whether the observation or findings are
compatible with the stated hypothesis or not. The word compatibility implies
“sufficiently close” to the hypothesized value. It further indicates that we do not
reject the stated hypothesis. The stated hypothesis is also referred to as ‘Null
Hypothesis’ and it is denoted by H0. The null hypothesis is usually tested against
the ‘alternative hypothesis’, also known as maintained hypothesis. The
alternative hypothesis is denoted by H1. For instance, suppose the population regression function is given by the equation:
Yi = β1 + β2Xi + ui ... (5.1)
where Xi is personal disposable income (PDI) and Yi is expenditure. Now, the null hypothesis is:
H0: β2 = 0 ... (5.2)
while the alternative hypothesis is:
H1: β2 ≠ 0 ... (5.3)
We deliberately set the null hypothesis to 'zero' in order to find out whether Y is related to X at all. If X really belongs to the model, we would fully expect to reject the zero-null hypothesis H0 in favour of the alternative hypothesis H1. The alternative hypothesis implies that the slope coefficient is different from zero. It could be positive or it could be negative. Similarly, the true population intercept can be tested by setting up the null hypothesis:
H0: β1 = 0 … (5.4)
while the alternative hypothesis is:
H1: β1 ≠ 0 … (5.5)
The null hypothesis states that the true population intercept is equal to zero, while
the alternative hypothesis states that it is not equal to zero. In case of both the
null hypotheses,, i.e., for true population parameter or slope and the intercept, the
null hypothesis as stated is a ‘simple hypothesis’. The alternative hypothesis is
composite. It is also known as a two-sided hypothesis. Such a two-sided
alternative hypothesis reflects the fact that we do not have a strong apriori or
theoretical expectation about the direction in which the alternative hypothesis
must move from the null hypothesis. However, when we have a strong apriori or
theoretical expectations, based on some previous research or empirical work,
then the alternative hypothesis can be one-sided or unidirectional rather than two-
sided. For instance, if we are sure that the true population value of slope
coefficient is positive then the best way to express the two hypotheses is
H0: β2 = 0
H1: β2 > 0
Let us take an example from macroeconomics. The prevailing economic theory suggests that the marginal propensity to consume is positive. This means that the slope coefficient is positive. Now, suppose that the population regression function is estimated from a sample regression by the Ordinary Least Squares (OLS) method. Let us also suppose that the results of the sample regression yield the value of the estimated slope coefficient as b2 = 0.0814. This numerical value will change from sample to sample. We know that b2 follows the normal distribution, i.e., b2 ~ N(β2, σ²b2). There are two methods of testing the null hypothesis that the true population slope coefficient is equal to zero. The next two sections of this unit describe these two methods of testing of hypothesis of regression parameters.
Hence, the probability that the t value lies between the limits (−2.306, +2.306) is 0.95 or 95%. These are the critical t values. Substituting the value of t into equation (5.6) and rearranging the terms in (5.7) we get:
P[b2 − 2.306 SE(b2) ≤ β2 ≤ b2 + 2.306 SE(b2)] = 0.95
The above equation provides a 95% confidence interval for β2. Such a confidence interval is known as the region of acceptance (for H0) and the area outside the confidence interval is known as the region of rejection (for H0). If this interval includes the hypothesized value of β2 we do not reject the hypothesis; but if the hypothesized value lies outside the confidence interval, we reject the null hypothesis.
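A sketch of this interval in Python; b2 = 0.0814 is the estimate quoted above, while the standard error is an assumed value for illustration (a sample size of 10 gives the 8 degrees of freedom behind the critical value 2.306):

from scipy import stats

b2, se_b2, n = 0.0814, 0.0115, 10      # SE(b2) is an assumed value
t_crit = stats.t.ppf(0.975, df=n - 2)  # 2.306 for 8 d.f.
print(b2 - t_crit * se_b2, b2 + t_crit * se_b2)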
Note that β2* is some specific numerical value of β2. Thus, the computed value of the test statistic 't' will be:
t = (b2 − β2*) / SE(b2) ...(5.10)
This can be computed from sample data as all values are available. The t value computed from (5.10) follows the t distribution with (n − k) degrees of freedom (d.f.), where k is the number of parameters estimated (here, k = 2). This testing procedure is called the t-test. Fig. 5.3 depicts the region of
rejection and the region of acceptance. One method of deciding on the result of
the testing is to compare the computed value with the tabulated value (also called
the ‘critical value’). If the computed value of t exceeds the critical value of t in absolute terms, we reject the null hypothesis. This means we are rejecting the hypothesis that the true population parameter, or the slope coefficient, is zero. It implies that the explanatory variable plays a significant role in determining the dependent variable. On the other hand, if the computed t value is less than the critical value of t in absolute terms, then we do not reject the null hypothesis that the true value of the population parameter (or the slope coefficient) is zero. Not rejecting the null hypothesis implies that the slope coefficient is not significantly different from zero, i.e., the explanatory variable does not play any significant role in determining the dependent variable.
Equation (5.11) splits the total variation in the dependent variable Y into two parts: the variation of the fitted values around the mean, and the residual. Squaring each of the terms in equation (5.11) and adding over all the n observations, we get the following equation:
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$
The above equation can be written as: TSS = ESS + RSS, where TSS is the Total
Sum of Squares, ESS is the Explained Sum of Squares, and RSS is the Residual
Sum of Squares. The RSS is also called the ‘Sum of Squares due to Error (SSE)’.
The ratio ESS/TSS is defined as the coefficient of determination R². The R² indicates the proportion of the total sum of squares explained by the regression model. An ANOVA analysis is carried out with the help of a table (Table 5.1). From such a table of analysis of variance, the F-statistic can be computed as the ratio of the mean explained sum of squares to the mean residual sum of squares. This F-statistic is used to test the overall significance of the model. The null hypothesis and the alternative hypothesis for testing the overall significance using ANOVA are given by:
H0: the slope coefficient is zero
H1: the slope coefficient is not zero
$F = \dfrac{ESS/(k-1)}{RSS/(n-k)}$ gives the observed value. The F-critical value at (k – 1) and (n – k)
degrees of freedom can be located from the statistical table. When the computed
F is greater than F-critical, the null hypothesis is rejected. Since the alternative hypothesis is accepted, the inference is that the explanatory variable plays a crucial role in determining the dependent variable. Similarly, when the computed F is less than F-critical, the null hypothesis is not rejected. In this case, the hypothesis that the explanatory variable plays no role in determining the dependent variable is not rejected. Here also, we can base our inference on the p-value: if p < 0.05, we reject the null hypothesis at the 5 per cent level.
Check Your Progress 2 [answer questions in about 50-100 words]
1) What is meant by the ‘test of significance approach’ to hypothesis testing?
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
2) What does the ‘level of significance’ indicate?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
5.7 PREDICTION
So far we have spoken about estimation of population parameters. In the two variable model, we derived the OLS estimators of the intercept (β1) and slope (β2) parameters. Prediction refers to estimation of the value of the dependent variable at a particular value of the independent variable. In other words, we use the estimated regression model to predict the value of Y corresponding to a given value of X.
Prediction is important to us for two reasons: First, it helps us in policy
formulation. On the basis of the econometric model, we can find out the impact
of changes in the explanatory variable on the dependent variable. Second, we can
find out the robustness of our estimated model. If our econometric model is
correct, the error between forecast value and actual value of the dependent
variable should be small. Prediction could be of two types, as mentioned below.
5.7.1 Individual Prediction
If we predict an individual value of the dependent variable corresponding to a
particular value of the explanatory variable, we obtain the individual prediction.
Let us take a particular value of X, say X = X0. The individual prediction of Y at X = X0 is obtained from:
$Y_0 = \beta_1 + \beta_2 X_0 + u_0$ … (5.13)
Therefore,
$\hat{Y}_0 = b_1 + b_2 X_0$ … (5.14)
Since $\hat{Y}_0$ is an estimator, the actual value Y0 will be different from $\hat{Y}_0$, and there will be a certain ‘prediction error’:
$\hat{Y}_0 - Y_0 = (b_1 + b_2 X_0) - (\beta_1 + \beta_2 X_0 + u_0)$ … (5.15)
$\hat{Y}_0 - Y_0 = (b_1 - \beta_1) + (b_2 - \beta_2)X_0 - u_0$
We know that
$V(b_1) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum x_i^2}\right]$ … (5.18)
$V(b_2) = \dfrac{\sigma^2}{\sum x_i^2}$ … (5.19)
$V(\hat{Y}_0 - Y_0) = \sigma^2\left[1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum x_i^2}\right]$ ... (5.21)
$P\left(-t_{\alpha/2} \le t \le t_{\alpha/2}\right) = 1 - \alpha$
$P\left[(b_1 + b_2X_0) - t_{\alpha/2}\,SE(\hat{Y}_0) \le (\beta_1 + \beta_2X_0) \le (b_1 + b_2X_0) + t_{\alpha/2}\,SE(\hat{Y}_0)\right] = 1 - \alpha$ …(5.23)
Let us look into equation (5.21) again. We see that the variance of $\hat{Y}_0$ increases with $(X_0 - \bar{X})^2$. Thus, there is an increase in variance if X0 is farther away from X̄, the mean of the sample on the basis of which b1 and b2 are computed. In Fig. 5.4 we depict the confidence interval for Y0 (see the dotted line).
[Fig. 5.4: The confidence interval (the dotted band) around the estimated regression line, with Y on the vertical axis and X on the horizontal axis]
$\hat{Y}_0 - Y_0 = (b_1 + b_2X_0) - (\beta_1 + \beta_2X_0)$ … (5.24)
We can re-arrange the terms in equation (5.24) to obtain
$\hat{Y}_0 - Y_0 = (b_1 - \beta_1) + (b_2 - \beta_2)X_0$
If we take the expected value of (5.24),
$E(\hat{Y}_0 - Y_0) = E(b_1 - \beta_1) + E(b_2 - \beta_2)X_0$ … (5.25)
Since the OLS estimators are unbiased, $E(b_1) = \beta_1$ and $E(b_2) = \beta_2$. Thus, we find that the expected value of the prediction error is zero.
Now let us find out the variance of the prediction error in the case of mean prediction.
The variance of the prediction error is
$V(\hat{Y}_0 - Y_0) = V(b_1 - \beta_1) + V(b_2 - \beta_2)X_0^2 + 2X_0\,cov(b_1 - \beta_1,\ b_2 - \beta_2)$ … (5.26)
If we compare equations (5.17) and (5.26) we notice an important change – the term V(u0) is not there in (5.26). Thus the variance of the prediction error in the case of mean prediction is less than that of individual prediction. There is a change in the variance of $\hat{Y}_0$ in the case of mean prediction, however. The variance of the prediction error in the case of mean prediction is given by
$V(\hat{Y}_0 - Y_0) = \sigma^2\left[\dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum x_i^2}\right]$ ... (5.27)
Again, there is an increase in the variance of the prediction error if X0 is farther away from X̄, the mean of the sample on the basis of which b1 and b2 are computed. The confidence interval will look somewhat like the one we showed in Fig. 5.4, but its width will be smaller.
An inference we draw from the above is that we can predict or forecast the value of the dependent variable, on the basis of the estimated regression equation, for a particular value of the explanatory variable (X0). The reliability of our forecast, however, will be lower if the particular value of X is far from X̄.
Check Your Progress 3 [answer questions within the given space in about 50-
100 words]
1) State Gauss-Markov Theorem.
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
5.9 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) In the case of both null hypotheses (i.e., for the true population parameters of slope and intercept), the null hypothesis is a simple hypothesis, whereas the alternative hypothesis is composite. The former is usually equated to zero (unless equated to some other known value) and the latter is stated in inequality terms. The alternative is known as a two-sided hypothesis when stated in ‘not equal to’ terms. It is considered one-sided if stated in > or < terms.
2) False. It is the alternative hypothesis that decides whether the test is two-sided or one-sided. If the alternative hypothesis is stated as ‘not equal to zero’, then it is a two-tailed test. Otherwise, i.e., if the alternative hypothesis is stated in positive or negative terms, then it will be a one-sided test.
3) The confidence interval approach is a method of testing of hypothesis. It consists in constructing an interval, from the sample estimate and the critical values drawn from the table, which contains the true population parameter with a given probability.
4) We say that the hypothesised value is contained in the interval because the
value of the interval depends upon the sample or the data used for estimation.
The true population parameter value is fixed but the interval changes
depending on the sample.
Check Your Progress 2
1) The test of significance approach is another method of testing of hypothesis.
The decision to accept or reject H0 is made on the basis of the value of test
statistic obtained from the sample data. This test statistic is given by:
$t = \dfrac{b_2 - \beta_2}{SE(b_2)}$, and it follows the t distribution with (n – 2) d.f. in the two variable model.
2) The level of significance is the probability of rejecting the null hypothesis when it is true (a Type I error). When the null hypothesis is rejected at a given level of significance, we conclude that the effect is statistically significant. Since rejecting a true null hypothesis is a grave error to commit, the level of significance is chosen to be small, such as 1% or 5%.
3) Analysis of Variance (ANOVA) is a technique used to decompose the variation in the given data into two parts. One is attributed to the deterministic factors, also called the explained or systematic part. The other is called the random or unexplained part. This method of analysis of variance was developed by Ronald Fisher in 1918.
4) The t-test is used to test the significance of estimated individual coefficients. It is distributed as t with (n – k) degrees of freedom (d.f.), where k is the number of parameters estimated including the intercept term. Thus, for a simple linear regression, it is (n – 2). The F-distribution is used for testing the significance of the whole model. It has two parameters. The d.f. for an F test, in general, are (k – 1) and (n – k), where k includes the intercept term. Hence, in a simple linear regression, the d.f. for F are (2 – 1) and (n – 2), or 1 and (n – 2). Note that in a simple linear regression the t-test and the F-test are equivalent (F = t²) because there is only one independent variable.
Check Your Progress 3
1) The Gauss-Markov theorem states that the Ordinary Least Squares (OLS)
estimators are also the best linear unbiased estimator (BLUE). The presence
of BLUE property implies that the estimator obtained by the OLS method
retains the following properties: (i) it is linear, i.e., the estimator is a linear
function of a random variable such as the dependent variable Y in the
regression model; (ii) it is unbiased, i.e., its average or expected value is
equal to the true value in the sense that E (b2) = β2; (iii) it has minimum
variance in the class of all such linear unbiased estimators. Such an estimator
with the least variance is also known as an efficient estimator.
2) Prediction implies predicting two types of values. Prediction of the conditional mean, i.e., E(Y│X0) – a point on the population regression line – is called the Mean Prediction. Prediction of the individual Y value corresponding to X = X0 is called the Individual Prediction.
UNIT 6 EXTENSION OF TWO VARIABLE
REGRESSION MODELS
Structure
6.0 Objectives
6.1 Introduction
6.2 Regression through the Origin
6.3 Changes in Measurement Units
6.4 Semi-Log Models
6.5 Log-linear Models
6.6 Choice of Functional Form
6.7 Let Us Sum Up
6.8 Answers/Hints to Check Your Progress Exercises
6.0 OBJECTIVES
After going through this Unit, you should be in a position to
interpret regression models passing through the origin;
explain the impact of changes in the unit of measurement of dependent and
independent variables on the estimates;
interpret parameters in semi-log and log-linear regression models; and
identify the correct functional form of a regression model.
6.1 INTRODUCTION
In the previous two Units we have discussed how a two variable regression
model can be estimated and how inferences can be drawn on the basis of the
estimated regression equation. In this context we discussed the ordinary
least squares (OLS) method of estimation. Recall that the OLS estimators are the
best linear unbiased estimators (BLUE) in the sense that they are the best in the
class of linear regression models.
The two variable regression model has the following functional form:
$Y_i = \beta_1 + \beta_2 X_i + u_i$ … (6.1)
where Y is the dependent variable and X is the independent variable. We added a stochastic error term ($u_i$) to the regression model. We cited three reasons for
inclusion of the error term in the regression model: (i) it takes care of variables excluded from the model, (ii) it incorporates unpredictable human behaviour into the model, and (iii) it absorbs the effects of measurement error, incorrect functional form, etc.
We assumed that the regression model is correctly specified. All relevant
variables are included in the model. No irrelevant variable is included in the
regression model. In this Unit we will continue with the two variables case as in
the previous two units. We also continue with the same assumptions, as
mentioned in Unit 4.
Let us look into the regression model given at equation (6.1). We observe that the regression model is linear in parameters. We do not have complex forms such as $\beta_2^2$ or $\beta_1\beta_2$ as parameters. Further, the regression model is linear in variables. We do not have $X_i^2$ or $\log X_i$ as an explanatory variable. Can we have these sorts of variables in a regression model? How do we interpret the regression model if such variables are there? We will extend the simple regression model given in equation (6.1) and explain how the interpretation of the model changes with the modifications.
This implies
$-2\sum X_i(Y_i - b_2X_i) = 0$
$\sum X_iY_i - b_2\sum X_i^2 = 0$
$b_2 = \dfrac{\sum X_iY_i}{\sum X_i^2}$ … (6.6)
The estimator given at (6.6) is unbiased. The variance of the estimator is given by
$var(b_2) = \dfrac{\sigma^2}{\sum X_i^2}$ … (6.7)
Let us compare the above estimator with the estimator for the regression model $Y_i = \beta_1 + \beta_2X_i + u_i$ (see equation (4.18) in Unit 4):
$b_2 = \dfrac{\sum x_iy_i}{\sum x_i^2}$ … (6.8)
and
$var(b_2) = \dfrac{\sigma^2}{\sum x_i^2}$ … (6.9)
Note that in equation (6.6) the variables are not in deviation form. Thus, when we do not have an intercept in the regression model, the estimator of the slope parameter is different from that of a regression model with an intercept. Both the estimators will be the same if and only if $\bar{X} = 0$.
We present a comparison between the regression model with intercept and
without intercept in Table 6.1.
Table 6.1: Features of Regression Model without Intercept
  Model with intercept:     $b_2 = \dfrac{\sum x_iy_i}{\sum x_i^2}$;  $var(b_2) = \dfrac{\sigma^2}{\sum x_i^2}$;  $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$
  Model without intercept:  $b_2 = \dfrac{\sum X_iY_i}{\sum X_i^2}$;  $var(b_2) = \dfrac{\sigma^2}{\sum X_i^2}$;  $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-1}$
6.3 CHANGES IN MEASUREMENT UNITS
Suppose you are given time series data on GDP and total consumption
expenditure of India for 30 years. You are asked to run a regression model with
consumption expenditure as dependent variable and income as the independent
variable. The objective is to estimate the aggregate consumption function of
India. Suppose you took GDP and Consumption Expenditure in Rs. Crore. The
estimated regression equation you found is
$\hat{Y}_i = 237 + 0.65\,X_i$ … (6.11)
When you presented the results before your seniors, they pointed out that the
measure of GDP and consumption expenditure should have been in Rs. Million,
so that it is comprehensible outside India also. If you re-estimate the results by
converting the variables, will the estimates be the same? Or, do you expect some changes in the estimates? Let us discuss the issue in detail.
Suppose we transform both the dependent and independent variables as follows:
$Y_i^* = w_1Y_i$ and $X_i^* = w_2X_i$ … (6.12)
The regression model (6.1) can be transformed as follows:
$Y_i^* = \beta_1^* + \beta_2^*X_i^* + u_i^*$ … (6.13)
Estimation of equation (6.13) by the OLS method gives us the following estimators:
$b_1^* = \bar{Y}^* - b_2^*\bar{X}^*$ … (6.14)
$b_2^* = \dfrac{\sum x_i^*y_i^*}{\sum x_i^{*2}}$ … (6.15)
In a similar manner you can find out the variance of 𝑏∗ and 𝑏∗ , and the estimator
of the error variance.
From equation (6.15) we can find out that
$b_2^* = \dfrac{w_1}{w_2}\,b_2$ … (6.16)
and
$b_1^* = w_1b_1$ … (6.17)
Now let us look into the implications of the above.
(i) Let us begin with the dependent variable, $Y_i$. Suppose $Y_i$ is doubled ($w_1 = 2$) and $X_i$ is unchanged ($w_2 = 1$). What will happen to $b_1$ and $b_2$? Substitute the values of $w_1$ and $w_2$ in equations (6.16) and (6.17). We find that both the estimates are doubled. Thus, if the dependent variable is multiplied by a constant c, then all OLS coefficients are multiplied by c.
(ii) Now let us take the case of the independent variable. Suppose $X_i$ is doubled ($w_2 = 2$) and $Y_i$ is unchanged ($w_1 = 1$). On substitution of the values of $w_1$ and $w_2$ in equations (6.16) and (6.17), we find that the slope coefficient ($b_2$) is halved, but the intercept ($b_1$) remains unchanged.
(iii) If we double both the variables $X_i$ and $Y_i$ ($w_1 = w_2 = 2$), then the slope coefficient ($b_2$) will remain unchanged, but the intercept will be doubled. Remember that the intercept changes with a change in the scale of measurement of the dependent variable.
Now the question arises: Will there be a change in the t-ratio and F-value of
the model? No, the t and F statistics are not affected by a change in the scale
of measurement of any variable.
Check Your Progress 1
1) Under what condition should we run a regression through the origin?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) What are the implications of a regression model through origin?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3) What are the implications on the estimates if there is a change in the
measurement scale of the explanatory variable?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
4) What are the implications on the estimates if there is a change in the
measurement scale of the dependent variable?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
6.4 SEMI-LOG MODELS
In some cases the regression model is non-linear, but by taking logarithms on both sides of the regression equation, we get a linear model. If a model is non-linear, but becomes linear after transformation of its variables, then the model is said to be intrinsically linear. Thus, semi-log and log-linear models are intrinsically linear models. We discuss the semi-log model in this section and the log-linear model in the next section.
Let us begin with a functional form as follows:
$Y_t = e^{\beta_1 + \beta_2 t + u_t}$ … (6.18)
This regression model, in its present form, is non-linear. Therefore, it cannot be estimated by the OLS method. However, if we take natural logs of both sides, we obtain
$\ln Y_t = \beta_1 + \beta_2 t + u_t$ … (6.19)
It transforms into a semi-log equation. It is called a semi-log model as one of the
variables is in log form.
If we take $\ln Y_t = Y_t^*$, then equation (6.19) can be written as
$Y_t^* = \beta_1 + \beta_2 t + u_t$ … (6.20)
Estimation of equation (6.20) is simple. The equation is linear in parameters and
in variables. Thus, we can apply OLS method to estimate the parameters. The
implication of the regression model (6.20), however, is much different from the
regression model (6.1).
If we differentiate the regression model $Y_i = \beta_1 + \beta_2X_i + u_i$, we obtain
$\dfrac{dY}{dX} = \beta_2$ … (6.21)
Equation (6.21) shows that the slope of the regression equation is constant. An
implication of the above is that the absolute change in the dependent variable for
unit increase in the independent variable is constant throughout the sample. If
there is an increase in X by one unit, Y increases by 𝛽 unit.
Now let us consider the regression model $\ln Y_t = \beta_1 + \beta_2 t + u_t$. If we differentiate equation (6.19) we find that
$\dfrac{d(\ln Y)}{dt} = \beta_2$
which means
$\dfrac{1}{Y}\dfrac{dY}{dt} = \beta_2$ … (6.22)
For equation (6.19), we interpret the slope coefficient ($\beta_2$) as follows: for every unit increase in t, the relative change in Y is $\beta_2$; multiplied by 100, this gives the percentage change in Y. Thus, for a semi-log model, the slope measures the change in the dependent variable in percentage terms. The semi-log model is useful in estimating growth rates.
6.5 LOG-LINEAR MODELS
Let us begin with a functional form as follows:
$Y_i = \beta_1 X_i^{\beta_2} e^{u_i}$ … (6.23)
Taking the logarithm of each of the variables in equation (6.23), we get the following transformed equation:
$Y_i^* = \beta_1^* + \beta_2 X_i^* + u_i$ … (6.24)
where $Y_i^* = \ln Y_i$, $X_i^* = \ln X_i$ and $\beta_1^* = \ln\beta_1$. Differentiating, we find that
$\dfrac{d(\ln Y)}{d(\ln X)} = \beta_2$ ... (6.27)
Or,
$\dfrac{dY/Y}{dX/X} = \beta_2$ ... (6.28)
A closer look at equation (6.28) shows that the slope parameter represents the
elasticity between Y and X.
This attractive feature of the log-linear model has made it popular in applied
work. The slope coefficient 𝛽 measures the elasticity of Y with respect to X ,
that is, the percentage change in Y for one per cent change in X . Thus, if Y
represents the quantity of a commodity demanded and X its unit price, then 𝛽
measures the price elasticity of demand.
6.6 CHOICE OF FUNCTIONAL FORM
By now you would have observed that the two variable regression model could have three functional forms, as given below:
(I) $Y_i = \beta_1 + \beta_2X_i + u_i$
(II) $\ln Y_i = \beta_1 + \beta_2X_i + u_i$
(III) $\ln Y_i = \beta_1 + \beta_2\ln X_i + u_i$
UNIT 7 MULTIPLE LINEAR REGRESSION
MODEL: ESTIMATION
Structure
7.0 Objectives
7.1 Introduction
7.2 Assumptions of Multiple Linear Regression Model
7.2.1 Interpretation of the Model
7.0 OBJECTIVES
After going through this unit, you will be able to:
specify the multiple regression model involving more than one explanatory
variable;
estimate the parameters of the multiple regression model by the OLS method
stating their properties;
interpret the results of an estimated multiple regression model;
indicate the advantage of using matrix notations in multiple regression
models;
explain the maximum likelihood method of estimation showing that the
‘maximum likelihood estimate (MLE)’ and the OLS estimate are
asymptotically similar;
derive the expression for the coefficient of determination (R2) for the case of a multiple regression model with two explanatory variables; and
distinguish between R2 and adjusted R2 specifying why adjusted R2 is
preferred in practice.
7.1 INTRODUCTION
By now you are familiar with the simple regression model where there is one
dependent variable and one independent variable. The dependent variable is
explained by the independent variable. Now let us discuss the multiple regression model. In a multiple regression model, there is one dependent variable and more than one independent variable. The simplest possible multiple
regression model is a three-variable regression model, with one dependent
variable and two explanatory variables. Such a three-variable multiple regression
equation or model is expressed as follows:
$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + u_i$ … (7.1)
Throughout this unit, we shall be mostly dealing with a multiple regression
model as specified in equation (7.1) above. Here, Y is the dependent variable and
X2 and X3 are independent variables. ui is the stochastic error term. The
interpretation of this error term is the same as in the simple regression model.
You may wonder as to why there is no X1 in equation (7.1). The answer is that X1
is implicitly taken as 1 for all observations. In the above equation, the parameter
𝛽 is the intercept term. We can think of Y, X2 and X3 as some variables from
economic theory. We may treat it as a demand function, where Y stands for
quantity demanded of a good, and X2 and X3 are price of that good and the
consumer’s income, respectively. As another example, we can think of a production function with two inputs. Here Y is the quantity produced of a good, X2 is the labour input, and X3 the capital input. You can
think of many similar examples.
(iii) The mean of the error terms is zero. In other words, the expected value of the error term conditional upon the explanatory variables X2i and X3i is zero. This means:
$E(u_i) = 0$ or $E(u_i|X_{2i}, X_{3i}) = 0$ … (7.3)
(iv) No autocorrelation: This assumption means that there is no serial
correlation or autocorrelation between the error terms of the individual
observations. This implies that the covariance between the error term
associated with the ith observation 𝑢 and that with the jth observation 𝑢 is
zero. In notations, this means:
$cov(u_i, u_j) = 0$ for $i \ne j$ … (7.4)
(v) Homoscedasticity: The assumption of homoscedasticity implies that the
error variance is constant for all observations. This means:
$var(u_i) = \sigma^2$ … (7.5)
(vi) No exact collinearity between the X variables. This is the new additional assumption made for multiple regression models. It implies that there is no exact linear relationship between X2 and X3. This is referred to as the assumption of no perfect multicollinearity.
(vii) The number of observations n must be greater than the number of
parameters to be estimated. In other words, the number of observations n
must be greater than the number of explanatory variables k.
(viii) No specification bias: It is assumed that the model is correctly specified.
The assumption of no specification bias implies that there are no errors
involved while specifying the model. This means that both the errors of
including an irrelevant variable and not including a relevant variable are
taken care of while specifying the regression model.
(ix) There is no measurement error, i.e., X’s and Y are correctly measured.
7.2.1 Interpretation of the Model
In the multiple regression model as in equation (7.1), the intercept β1 measures the expected value of the dependent variable Y when the values of the explanatory variables X2 and X3 are zero. The other two parameters, β2 and β3, are the partial regression coefficients, also known as the partial slope coefficients. β2 measures the change in the mean value of Y [i.e., E(Y)] per unit change in X2, holding the value of X3 constant: $\beta_2 = \dfrac{\partial E(Y)}{\partial X_2}$. It gives the ‘direct’ or ‘net’ effect of a unit change in X2 on the mean value of Y, holding the effect of X3 constant. Likewise, β3 measures the change in the mean value of Y per unit change in X3, holding the value of X2 constant: $\beta_3 = \dfrac{\partial E(Y)}{\partial X_3}$. Thus, the slope coefficients of a multiple regression measure the impact of one explanatory variable on the dependent variable keeping the effect of the other variables fixed.
The OLS estimators are obtained by minimising the residual sum of squares:
$\min \sum e_i^2 = \sum(Y_i - \hat{Y}_i)^2 = \sum(Y_i - b_1 - b_2X_{2i} - b_3X_{3i})^2$ [since $\hat{Y}_i = b_1 + b_2X_{2i} + b_3X_{3i}$]
We now consider the three first order conditions, i.e., $\dfrac{\partial\sum e_i^2}{\partial b_1} = 0$, $\dfrac{\partial\sum e_i^2}{\partial b_2} = 0$ and $\dfrac{\partial\sum e_i^2}{\partial b_3} = 0$. From these three partial derivatives, we obtain the estimators as:
(i) $b_1 = \bar{Y} - b_2\bar{X}_2 - b_3\bar{X}_3$
(ii) $b_2 = \dfrac{(\sum x_{2i}y_i)(\sum x_{3i}^2) - (\sum x_{3i}y_i)(\sum x_{2i}x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2}$
(iii) $b_3 = \dfrac{(\sum x_{3i}y_i)(\sum x_{2i}^2) - (\sum x_{2i}y_i)(\sum x_{2i}x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2}$
The corresponding variances and standard errors of the estimators are given by:
$V(b_1) = \left[\dfrac{1}{n} + \dfrac{\bar{X}_2^2\sum x_{3i}^2 + \bar{X}_3^2\sum x_{2i}^2 - 2\bar{X}_2\bar{X}_3\sum x_{2i}x_{3i}}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2}\right]\sigma^2$;  $SE(b_1) = +\sqrt{V(b_1)}$
$V(b_2) = \dfrac{\sum x_{3i}^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2}\,\sigma^2 = \dfrac{\sigma^2}{\sum x_{2i}^2(1 - r_{23}^2)}$;  $SE(b_2) = +\sqrt{V(b_2)}$
$V(b_3) = \dfrac{\sum x_{2i}^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i}x_{3i})^2}\,\sigma^2 = \dfrac{\sigma^2}{\sum x_{3i}^2(1 - r_{23}^2)}$;  $SE(b_3) = +\sqrt{V(b_3)}$
$cov(b_2, b_3) = \dfrac{-r_{23}\,\sigma^2}{(1 - r_{23}^2)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}}$
and the estimates of the error variance and the partial correlation coefficient are given by:
$\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-k} = \dfrac{\sum e_i^2}{n-3}$ … (7.6)
$r_{23}^2 = \dfrac{(\sum x_{2i}x_{3i})^2}{\sum x_{2i}^2\sum x_{3i}^2}$ … (7.7)
Note that in the above expressions, lower case letters represent deviations from
the mean. We know that, since we are considering the ‘classical’ linear multiple
regression model, the OLS estimators of the intercept and the partial slope
coefficients satisfy the following properties:
b) The mean value of the estimated $\hat{Y}_i$ is equal to the mean value of the actual $Y_i$, i.e., $\bar{\hat{Y}} = \bar{Y}$.
c) $\sum e_i = \bar{e} = 0$.
g) In view of f) above, given the values of $r_{23}$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$, the variances of the partial slope estimators follow directly.
$\ln L = -\dfrac{n}{2}\ln\sigma^2 - \dfrac{n}{2}\ln(2\pi) - \dfrac{1}{2}\sum\dfrac{(Y_i - \beta_1 - \beta_2X_{2i} - \ldots - \beta_kX_{ki})^2}{\sigma^2}$
Differentiating partially with respect to each of the parameters, the (k + 1)st equation is
$\dfrac{\partial\ln L}{\partial\sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum(Y_i - \beta_1 - \beta_2X_{2i} - \ldots - \beta_kX_{ki})^2$
Setting these equations to zero (i.e., applying the first-order conditions for optimisation), re-arranging terms, and denoting by $\tilde\beta_1, \tilde\beta_2, \ldots, \tilde\beta_k$ and $\tilde\sigma^2$ the ‘maximum likelihood estimates (MLEs)’, we get:
$\sum Y_i = n\tilde\beta_1 + \tilde\beta_2\sum X_{2i} + \cdots + \tilde\beta_k\sum X_{ki}$
$\sum Y_iX_{2i} = \tilde\beta_1\sum X_{2i} + \tilde\beta_2\sum X_{2i}^2 + \ldots + \tilde\beta_k\sum X_{2i}X_{ki}$
………………………………………………………
$\sum Y_iX_{ki} = \tilde\beta_1\sum X_{ki} + \tilde\beta_2\sum X_{2i}X_{ki} + \ldots + \tilde\beta_k\sum X_{ki}^2$
The above equations are precisely the normal equations of the OLS method of estimation. Therefore, the MLEs of the β′s are the same as the OLS estimates of the β′s. Thus, substituting the MLEs (or the OLS estimators) into the (k + 1)st equation above, and simplifying, we obtain the MLE of σ² as
$\tilde\sigma^2 = \dfrac{1}{n}\sum\left(Y_i - \tilde\beta_1 - \tilde\beta_2X_{2i} - \ldots - \tilde\beta_kX_{ki}\right)^2 = \dfrac{1}{n}\sum\hat{u}_i^2$
You may note that this estimator differs from the OLS estimator $\hat\sigma^2 = \sum\hat{u}_i^2/(n-k)$. Since the latter is an unbiased estimator of σ², the MLE of σ² is a biased estimator. However, you should note that, asymptotically, $\tilde\sigma^2$ is also unbiased. This means that, asymptotically, the MLE and OLS estimates are similar. Further, the MLE estimator of σ², though biased, is consistent.
For multiple regression models, the above algebraic expressions become
unwieldy. Hence, we can take recourse to matrix algebra (which you studied in your earlier course BECC 104) to depict the multiple regression model.
For this, let
$x_0 = \begin{bmatrix} 1 & X_{02} & X_{03} & \ldots & X_{0k} \end{bmatrix}'$ … (7.8)
be the vector of values of the X variables for which we wish to obtain the mean prediction of Y. The estimated multiple regression equation in scalar form is:
$\hat{Y}_i = \hat\beta_1 + \hat\beta_2X_{2i} + \hat\beta_3X_{3i} + \ldots + \hat\beta_kX_{ki}$ … (7.9)
$\hat{Y}_i = x_i'\hat\beta$ … (7.10)
where $x_i' = [1\ X_{2i}\ X_{3i}\ \ldots\ X_{ki}]$ and $\hat\beta = [\hat\beta_1\ \hat\beta_2\ \ldots\ \hat\beta_k]'$.
Equation (7.9) or (7.10) gives the mean prediction of $Y_i$ corresponding to a given $x_i$. Hence, if $x_i$ equals $x_0$ as given in (7.8), (7.10) becomes
$(\hat{Y}|x_0) = x_0'\hat\beta$ … (7.11)
where the values of x0 are specified. Note that (7.11) gives an unbiased prediction of $E(Y|x_0)$, since $E(x_0'\hat\beta) = x_0'\beta$. The estimate of the variance of $(\hat{Y}|x_0)$ is given by:
$Var(\hat{Y}|x_0) = \sigma^2\,x_0'(X'X)^{-1}x_0$ … (7.12)
where σ² is the variance of $u_i$, x0 contains the given values of the X variables for which we wish to predict, and $(X'X)^{-1}$ is the inverse of the cross-product matrix of the regressors. In practice, we replace σ² by its unbiased estimator $\hat\sigma^2$.
Check Your Progress 1 [answer the questions in 50-100 words within the given
space]
1) Specify the simplest form of a multiple regression model with examples.
Why is it the simplest?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) Enumerate the assumptions made for the CLRM in broad terms. What is
the additional assumption made for the multiple regression model?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3) How are the estimated parameters of a multiple regression model
interpreted?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
4) Which property of the OLS estimators makes them satisfy the Gauss-Markov theorem?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
7.5 COEFFICIENT OF DETERMINATION: R2
In multiple regression, a measure of goodness of fit is given by R², also called the ‘coefficient of determination’. It is the ratio of the ‘explained sum of squares’ to the ‘total sum of squares’. In other words, it is the proportion of the total variation in the dependent variable explained by the independent (or explanatory) variables included in the model. To derive R², we consider the sample regression equation:
$Y_i = b_1 + b_2X_{2i} + b_3X_{3i} + e_i$ … (7.13)
Substituting $b_1 = \bar{Y} - b_2\bar{X}_2 - b_3\bar{X}_3$,
$Y_i = \bar{Y} - b_2\bar{X}_2 - b_3\bar{X}_3 + b_2X_{2i} + b_3X_{3i} + e_i$
Therefore, $Y_i - \bar{Y} = b_2(X_{2i} - \bar{X}_2) + b_3(X_{3i} - \bar{X}_3) + e_i$
Rewriting the above in lower case, i.e., in deviations from the mean, we get:
$y_i = b_2x_{2i} + b_3x_{3i} + e_i$ … (7.14)
Similarly, for the fitted values, $\hat{Y}_i = b_1 + b_2X_{2i} + b_3X_{3i}$ and $\bar{Y} = b_1 + b_2\bar{X}_2 + b_3\bar{X}_3$, so that
$\hat{y}_i = \hat{Y}_i - \bar{Y} = b_2(X_{2i} - \bar{X}_2) + b_3(X_{3i} - \bar{X}_3)$
Now, consider:
$y_i = \hat{y}_i + e_i$
$\sum y_i^2 = \sum\hat{y}_i^2 + \sum e_i^2 + 2\sum\hat{y}_ie_i$
$\sum y_i^2 = \sum\hat{y}_i^2 + \sum e_i^2$ [since $\sum\hat{y}_ie_i = 0$] … (7.16)
It means TSS = ESS + RSS. Now, consider $R^2 = \dfrac{ESS}{TSS}$, where $ESS = \sum\hat{y}_i^2$.
Now,
$\sum e_i^2 = \sum e_ie_i = \sum e_i(y_i - b_2x_{2i} - b_3x_{3i}) = \sum e_iy_i - b_2\sum e_ix_{2i} - b_3\sum e_ix_{3i}$
$\sum e_i^2 = \sum e_iy_i$ [since $\sum e_ix_{2i} = \sum e_ix_{3i} = 0$]
$\sum e_i^2 = \sum y_ie_i = \sum y_i(y_i - b_2x_{2i} - b_3x_{3i})$
$\sum e_i^2 = \sum y_i^2 - b_2\sum y_ix_{2i} - b_3\sum y_ix_{3i}$ … (7.17)
$\sum\hat{y}_i^2 = \sum y_i^2 - \sum e_i^2 = b_2\sum y_ix_{2i} + b_3\sum y_ix_{3i} = ESS$
Therefore, $R^2 = \dfrac{ESS}{TSS} = \dfrac{b_2\sum y_ix_{2i} + b_3\sum y_ix_{3i}}{\sum y_i^2}$ … (7.18)
In passing, note that the variance of a partial slope coefficient can also be written as
$V(b_i) = \dfrac{\sigma^2}{\sum x_i^2(1 - R_i^2)}$
where $R_i^2$ is the $R^2$ from the regression of $X_i$ on the remaining explanatory variables.
An equivalent way of writing R² is:
$R^2 = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum y_i^2} = 1 - \dfrac{(n-k)\hat\sigma^2}{(n-1)S_y^2}$
since $\sum e_i^2 = (n-k)\hat\sigma^2$, where $\hat\sigma^2 = \dfrac{\sum e_i^2}{n-k}$, and $\sum y_i^2 = (n-1)S_y^2$, where $S_y^2 = \dfrac{\sum y_i^2}{n-1}$ is the sample variance of Y.
7.6 ADJUSTED-R2
In comparing two regression models with the same dependent variable but differing numbers of X variables, one should be careful about simply choosing the model with the highest R². In order to understand why, consider:
$R^2 = \dfrac{ESS}{TSS} = \dfrac{\sum\hat{y}_i^2}{\sum y_i^2} = 1 - \dfrac{\sum e_i^2}{\sum y_i^2}$
Note that as the number of explanatory variables increases, the numerator ESS keeps on increasing. In other words, R² increases with k, the number of independent variables. The above expression for R² implies that R² does not give any weightage to the number of independent variables in the model. For this reason, when comparing two regressions with differing numbers of explanatory variables, we should not use R². We need an alternative coefficient of determination which takes into account the number of parameters estimated, i.e., k. For this, we consider the following measure, called the adjusted R², defined as follows:
$\bar{R}^2 = 1 - \dfrac{RSS/(n-k)}{TSS/(n-1)} = 1 - \dfrac{\sum e_i^2/(n-k)}{\sum y_i^2/(n-1)}$
where k is the number of parameters in the model including the intercept term. The above is the same as saying:
$\bar{R}^2 = 1 - \dfrac{\hat\sigma^2}{S_y^2}$
or
$\bar{R}^2 = 1 - (1 - R^2)\dfrac{n-1}{n-k}$ … (7.19)
If $R^2 = 1$, then $\bar{R}^2 = 1$; but if $R^2 = 0$, then $\bar{R}^2 = \dfrac{1-k}{n-k}$, which is negative whenever k > 1.
Thus, adjusted R² can be negative. In such cases, it is conventional to take its value as zero. A conclusive opinion on which of the two measures is superior as an indicator of the goodness of fit of a regression model is thus not possible. However, in practice, in multiple regression models, the adjusted R² is used to judge the goodness of fit of the model, for the reason that it takes into account the number of regressors and thereby the number of parameters estimated.
Check Your Progress 2 [answer the questions in 50-100 words within the given
space]
1) Distinguish between the OLS estimate and the MLE.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
..............................................................................................................
2) How is R2 defined? Indicate with suitable expressions.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
..............................................................................................................
3) State the importance of adjusted-R2 as compared to R2.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
4) How are R2 and adjusted-R2 related? What is the difference between the
two?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
5) How is the situation of adjusted-R2 being negative dealt with in practice?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
7.7 LET US SUM UP
This unit has described the multiple regression model and its estimation. Recapitulating the assumptions of the classical linear regression model, the unit indicates why the additional assumption of no perfect multicollinearity is necessary in multiple regression models. The interpretation of the parameters, i.e., the intercept and the partial slope coefficients, is explained. The unit first discussed the estimation of the parameters of the multiple regression model by the OLS (ordinary least squares) method. An alternative method, namely maximum likelihood estimation (MLE), is introduced next. It is shown that asymptotically the OLS and ML estimates coincide. The concept of the ‘coefficient of determination’, or goodness of fit, has been described. Finally, the need for and use of the adjusted R² has been explained.
4) Under the assumptions of the CLRM, the OLS estimators of the partial regression coefficients are not only linear and unbiased but also have minimum variance in the class of all linear unbiased estimators, i.e., they are BLUE (best linear unbiased estimators). It is this property that makes the OLS estimates satisfy the Gauss-Markov theorem.
Check Your Progress 2
1) The OLS estimators are obtained by minimising the residual sum of squares, i.e., $\min \sum e_i^2 = \sum(Y_i - \hat{Y}_i)^2$. The MLEs are obtained by maximising the ‘likelihood function’ of the corresponding pdf. There is thus a basic difference in the approach of the two methods. However, once the first order conditions are applied and simplified, the equations that we obtain in the MLE approach are the same as the normal equations that we get in the OLS method. Hence, the estimates of the parameters obtained by solving those equations are the same. However, there is an essential difference relating to the estimate of σ². The denominator of the expression for the unbiased estimate in the OLS method is (n – k), whereas in the ML method it is n. This difference makes the ML estimate of σ² biased, although it is asymptotically unbiased.
UNIT 8 MULTIPLE LINEAR REGRESSION
MODEL: INFERENCES
Structure
8.0 Objectives
8.1 Introduction
8.2 Assumptions of Multiple Regression Models
8.2.1 Classical Assumptions
8.2.2 Test for Normality of the Error Term
8.3 Testing of Single Parameter
8.3.1 Test of Significance Approach
8.3.2 Confidence Interval Approach
8.4 Testing of Overall Significance
8.5 Test of Equality between Two Parameters
8.6 Test of Linear Restrictions on Parameters
8.6.1 The t-Test Approach
8.6.2 Restricted Least Squares
8.7 Structural Stability of a Model: Chow Test
8.8 Prediction
8.8.1 Mean Prediction
8.8.2 Individual Prediction
8.9 Let Us Sum Up
8.10 Answers/ Hints to Check Your Progress Exercises
8.0 OBJECTIVES
After going through this unit, you should be able to
explain the need for the assumption of normality in the case of multiple
regression;
describe the procedure of testing of hypothesis on individual estimators;
test the overall significance of a regression model;
test for the equality of two regression coefficients;
explain the procedure of applying the Chow test;
make prediction on the basis of multiple regression model;
interpret the results obtained from the testing of hypothesis, both individual
and joint; and
apply various tests such as likelihood ratio (LR), Wald (W) and Lagrange
Multiplier Test (LM).
8.1 INTRODUCTION
In the previous unit we discussed the interpretation and estimation of
multiple regression models. We looked at the assumptions that are required for
the ordinary least squares (OLS) and maximum likelihood (ML) estimation. In
the present Unit we look at the methods of hypothesis testing in multiple
regression models.
Recall that in Unit 3 of this course we mentioned the procedure of hypothesis
testing. Further, in Unit 5 we explained the procedure of hypothesis testing in the
case of two variable regression models. Now let us extend the procedure of
hypothesis testing to multiple regression models. There could be two scenarios in
multiple regression models so far as hypothesis testing is concerned: (i) testing of
individual coefficients, and (ii) joint testing of some of the parameters. We
discuss the method of testing for structural stability of regression model by
applying the Chow test. Further, we discuss three important tests, viz.,
Likelihood Ratio test, Wald test, and Lagrange Multiplier test. Finally, we deal
with the issue of prediction on the basis of multiple regression equation.
One of the assumptions in hypothesis testing is that the error term $u_i$ follows the normal distribution. Is there a method to test for the normality of a variable? We
will discuss this issue also. However, let us begin with an overview of the basic
assumptions of multiple regression models.
The normality of the error term can be examined through the Jarque-Bera (JB) test, whose statistic is
$JB = n\left[\dfrac{S^2}{6} + \dfrac{(K-3)^2}{24}\right]$
where
n = sample size
S = measure of skewness of the residuals
K = measure of kurtosis of the residuals
Under the null hypothesis of normally distributed errors, S = 0 and K = 3, and the JB statistic follows a chi-square distribution with 2 degrees of freedom.
8.3 TESTING OF SINGLE PARAMETER
There are two approaches to hypothesis testing: (i) test of significance approach,
and (ii) confidence interval approach. We discuss both the approaches below.
8.3.1 Test of Significance Approach
In this approach we proceed as follows:
(i) Take the point estimate of the parameter that we want to test, viz., b1, b2 or b3.
(ii) Set the null hypothesis. Suppose we expect that variable X2 has no influence on Y. It implies that β2 should be zero. Thus, the null hypothesis is H0: β2 = 0. In this case, what should be the alternative hypothesis? The alternative hypothesis is H1: β2 ≠ 0.
(iii) If β2 ≠ 0, then β2 could be either positive or negative. Thus we have to apply a two-tail test. Accordingly, the critical value of the t-ratio has to be decided.
(iv) Let us consider another scenario. Suppose we expect that β3 should be positive. In that case the expectation goes into the alternative hypothesis: we set H0: β3 ≤ 0 against H1: β3 > 0.
(v) Since the alternative is one-directional, the critical region or rejection region lies on one side of the t probability curve. Therefore, we have to apply a one-tail test. Accordingly, the critical value of the t-ratio is to be decided.
(vi) Remember that the null hypothesis depends on economic theory or
logic. Therefore, you have to set the null hypothesis according to
some logic. If you expect that the explanatory variable should have no
effect on the dependent variable, then set the parameter as zero in the
null hypothesis.
(vii) Decide on the level of significance. It represents the extent of error you are willing to tolerate. If the level of significance is 5 per cent (α = 0.05), your decision on the null hypothesis will be wrong 5 per cent of the time. If you take a 1 per cent level of significance (α = 0.01), then your decision on the null hypothesis will be wrong 1 per cent of the time (i.e., it will be correct 99 per cent of the time).
(viii) Compute the t-ratio. Here the standard error is the positive square root of the variance of the estimator. The formula for the variance of the OLS estimators in multiple regression models is given in Unit 7.
$t = \dfrac{b_i - \beta_i}{SE(b_i)}$ … (8.5)
(ix) Compare the computed value of the t-ratio with the tabulated value of
the t-ratio. Be careful about the two issues while reading the t-table:
(i) level of significance, and (ii) degree of freedom. Level of
significance we have mentioned above. Degree of freedom is (n–k), as
you know from the previous Unit.
(x) If the computed value of the t-ratio exceeds the tabulated value (in absolute terms), reject the null hypothesis in favour of the alternative hypothesis. If the computed value of the t-ratio is less than the tabulated value, do not reject the null hypothesis.
8.3.2 Confidence Interval Approach
We have discussed about interval estimation in Unit 3 and Unit 5. Thus, here we
bring out the essential points only.
(i) Remember that confidence interval (CI) is created individually for
each parameter. There cannot be a single confidence interval for a
group of parameters.
(ii) The confidence interval is built on the basis of the logic described above in the test of significance approach.
(iii) Suppose we have the null hypothesis H0: β2 = 0 and the alternative hypothesis H1: β2 ≠ 0. The estimator of β2 is b2. We know the standard error of b2.
(iv) Here also we decide on the level of significance (α). We refer to the t-
table and find out the t-ratio for desired level of significance.
(v) The degree of freedom is known to us, i.e., (n–k).
(vi) Since the above is case of two-tailed test, we take 𝛼⁄2 on each side of
the t probability curve. Therefore, we take the t-ratio corresponding to
the probability 𝛼⁄2 and the degrees of freedom applicable.
(vii) Remember that the confidence interval is created with the help of the estimator and its standard error. We then check whether the hypothesised value of the parameter lies within the confidence interval or not.
(viii) Construct the confidence interval as follows:
104
$b_2 - t_{\alpha/2}\,SE(b_2) \le \beta_2 \le b_2 + t_{\alpha/2}\,SE(b_2)$ … (8.6)
(ix) The probability that the interval contains the parameter is (1 – α). If we have taken the level of significance as 5 per cent, then the probability that the confidence interval contains β2 is 95 per cent:
$P\left[b_2 - t_{\alpha/2}\,SE(b_2) \le \beta_2 \le b_2 + t_{\alpha/2}\,SE(b_2)\right] = (1 - \alpha)$ … (8.7)
(x) If the hypothesised value of the parameter (here, β2 = 0) lies within the confidence interval, do not reject the null hypothesis.
(xi) If the hypothesised value lies outside the confidence interval, reject the null hypothesis in favour of the alternative hypothesis (see the sketch below).
Check Your Progress 2
1) Describe the steps you would follow in testing the hypothesis that 𝛽 < 0.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
........................................................................................................................
(iii) Decide on the level of significance. It has the same connotation as in the case of the t-test described above.
(iv) For a multiple regression model, the F-statistic is given by
$F = \dfrac{ESS/(k-1)}{RSS/(n-k)}$ … (8.10)
$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$ … (8.25)
(iv) The corresponding alternative hypothesis will be that the 𝛽s are not
zero.
(v) Estimate the unrestricted regression model given at (8.11). Obtain the
residual sum of squares (RRS) on the basis of the estimated regression
equation. Denote it as RSSUR.
(ix) Find out the computed value of F on the basis of equation (8.10).
Compare it with the tabulated value of F (given at the end of the
book). Read the tabulated F value for desired level of significance and
applicable degrees of freedom.
(x) If the computed value of F is greater than the tabulated value, then
reject the null hypothesis.
(xi) If the computed value is less than the tabulated value, do not reject the
null hypothesis.
As mentioned earlier, the residual sum of squares (RSS) and the coefficient of determination (R²) are related. Therefore, it is possible to carry out the F-test on the basis of R² also. If we have the coefficient of determination for the unrestricted model ($R_{UR}^2$) and the coefficient of determination for the restricted model ($R_R^2$), then we can test the joint hypothesis about the set of parameters.
The F-statistic will be
$F = \dfrac{(R_{UR}^2 - R_R^2)/m}{(1 - R_{UR}^2)/(n-k)}$ … (8.27)
where m is the number of restrictions imposed.
8.7 STRUCTURAL STABILITY OF A MODEL: CHOW TEST
Many times we come across situations where there is a change in the pattern of the data. The relationship between the dependent and independent variables may not remain the same throughout the sample. For example, the saving behaviour of poor and rich
households may be different. The production of an industry may be different after
a policy change. In such situations it may not be appropriate to run a single
regression for the entire dataset. There is a need to check for structural stability of
the econometric model.
There are various procedures to bring in structural breaks in a regression model.
We will discuss the dummy variable cases in Unit 9. In this Unit we discuss
a very simple and specific case.
Suppose we have data on n observations. We suspect that the first n1 observations are different from the remaining n2 observations (we have n1 + n2 = n). In this case, run the following three regression equations:
$Y_t = \lambda_1 + \lambda_2X_t + u_t$ (number of observations: n1) … (8.28)
$Y_t = r_1 + r_2X_t + v_t$ (number of observations: n2) … (8.29)
$Y_t = \alpha_1 + \alpha_2X_t + w_t$ (number of observations: n = n1 + n2) … (8.30)
If both the sub-samples are the same, then we should have λ1 = r1 = α1 and λ2 = r2 = α2. If the two sub-samples are different, then there is a structural break in the sample. It implies that the parameters of equations (8.28) and (8.29) are different. In order to test for the structural stability of the regression model we apply the Chow test.
We proceed as follows:
(i) Run the regression model (8.28). Obtain residual sum of squares RSS1.
(ii) Run regression model (8.29). Obtain residual sum of squares RSS2.
(iii) Run regression model (8.30). Obtain residual sum of squares RSS3.
(iv) In regression model (8.30) we are forcing the model to have the same parameters in both the sub-samples. Therefore, let us call the residual sum of squares obtained from this model RSS_R (the restricted RSS).
(v) Since the regression models given at (8.28) and (8.29) are independent, let us call this the unrestricted model. Therefore, RSS_UR = RSS1 + RSS2.
(vi) Suppose both the sub-samples are the same. In that case there should not be any difference between RSS_R and RSS_UR. Our null hypothesis in that case is H0: there is no structural change (or, there is parameter stability).
(vii) Test the above by the following test statistic:
F = [(RSS_R − RSS_UR)/k] / [RSS_UR/(n₁ + n₂ − 2k)] … (8.31)

where k is the number of parameters estimated in each regression (here k = 2). Under the null hypothesis, this statistic follows the F distribution with (k, n₁ + n₂ − 2k) degrees of freedom.
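A hedged numerical sketch of the Chow test in Python (simulated data; all names are illustrative):

import numpy as np
from scipy import stats

def rss(x, y):
    # residual sum of squares from a simple two-parameter OLS fit
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

rng = np.random.default_rng(1)
n1, n2 = 30, 25
x1, x2 = rng.normal(size=n1), rng.normal(size=n2)
y1 = 1.0 + 0.5 * x1 + rng.normal(size=n1)   # first sub-sample, as in (8.28)
y2 = 2.0 + 1.5 * x2 + rng.normal(size=n2)   # second sub-sample, different parameters, as in (8.29)

rss_ur = rss(x1, y1) + rss(x2, y2)          # RSS_UR = RSS1 + RSS2
rss_r = rss(np.concatenate([x1, x2]), np.concatenate([y1, y2]))   # pooled regression (8.30)
k = 2                                        # parameters per equation
F = ((rss_r - rss_ur) / k) / (rss_ur / (n1 + n2 - 2 * k))   # equation (8.31)
print(F, stats.f.sf(F, k, n1 + n2 - 2 * k))  # a large F (small p) signals a structural break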
Ŷ₀|X′₀ = X′₀β̂ … (8.37)

where the values of X₀ are fixed. You should note that (8.36) gives an unbiased prediction of E(Y₀|X′₀), since E(X′₀β̂) = X′₀β. Here var(Ŷ₀|X₀) stands for E[(Y₀ − Ŷ₀)² | X]. In practice we replace σ² by its unbiased estimator σ̂².
Check Your Progress 4
1) Consider a Cobb-Douglas production function. Write down the steps of testing the hypothesis that it exhibits constant returns to scale.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
3) Point out why individual prediction has higher variance than mean
prediction.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
UNIT 9 EXTENSION OF REGRESSION
MODELS: DUMMY VARIABLE
CASES
Structure
9.0 Objectives
9.1 Introduction
9.2 The Case of Single Dummy: ANOVA Model
9.3 Analysis of Covariance (ANCOVA) Model
9.4 Comparison between Two Regression Models
9.5 Multiple Dummies and Interactive Dummies
9.6 Let Us Sum Up
9.7 Answers/Hints to Check Your Progress Exercises
9.0 OBJECTIVES
After reading this unit, you will be able to:
define a qualitative or dummy variable;
discuss the ANOVA model with a single dummy as exogenous variable;
specify an ANCOVA model with one quantitative and one dummy
variable;
interpret the results of dummy variable regression models;
differentiate between 'differential intercept coefficient' and 'differential slope coefficient';
describe the concepts of ‘concurrent, dissimilar and parallel’ regression
models that you encounter while considering ‘differential slope dummies’;
and
explain how more than two dummies and interactive dummies can be
formulated into a regression model.
9.1 INTRODUCTION
In real life situations, some variables are qualitative. Examples are gender,
choices, nationality, etc. Such variables may be dichotomous or binary, i.e., with
responses limited to two such as in ‘yes’ or ‘no’ situations. Or they may have
more than two categorical responses. We need methods to include such variables
in the regression model. In this unit, we consider some such cases. We limit this
unit to consider regressions in which the dependent variable is quantified. You
may note in passing that when the dependent variable itself is a dummy variable,
we have to deal with them by models such as Probit or Logit. In such models, the
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi and Prof. B S
Prakash, Indira Gandhi National Open University, New Delhi
OLS method of estimation does not apply. In this unit, we will not consider such cases. You will study about them in the course 'BECE 142: Applied Econometrics'.
In this unit, we consider only such cases in which the independent variable is a dummy variable. Qualitative variables cannot be used in a regression straightaway. By treating them as dummy variables we can quantify them (i.e., make them categorical). For instance, consider variables such as male or female, or employed or unemployed. These are quantifiable in the sense that we can code the variable as 1 if 'female' and 0 if 'male'. Similar examples could be 1 if 'yes' and 0 if 'no', or 1 if 'employed' and 0 if 'unemployed'. In each case, we have converted a qualitative response into quantitative form; the qualitative variable is now quantified.
Such regressions could be simple regressions, i.e., there is only one independent variable, which is qualitative and treated as a dummy variable. Or there could be two independent variables, one of which is treated as a dummy while the other is its covariate, i.e., a variable closely related to the variable treated as a dummy. For instance, the pre-tax income of persons can be classified relative to a threshold level and treated as a dummy variable, i.e., above or below the threshold income, with the response taken as 1 or 0. Now, the post-tax income, which is a covariate of pre-tax income, can be considered by its actual quantified value. There could be similar extensions to situations where you have to consider multiple dummies, and cases where you have to consider interactive dummies. The nature of such regressions, particularly their inference and interpretation, is what we consider in the present unit.
Table 9.2: Food Expenditure in Relation to Income and Gender

Observation   Food Expenditure ($)   Income ($)   Gender
1 1983 11557 1
2 2987 29387 1
3 2993 31463 1
4 3156 29554 1
5 2706 25137 1
6 2217 14952 1
7 2230 11589 0
8 3757 33328 0
9 3821 36151 0
10 3291 35448 0
11 3429 32988 0
12 2533 20437 0
Ŷ = 2673.667 + 503.1667 Dᵢ
se = (233.0446) (329.5749)
t = (11.4227) (1.5267)    R² = 0.1890
Thus, we notice that the mean food consumption expenditures of the two genders remain the same. The R² value is also the same. The absolute value of the dummy variable coefficient and its standard error are also the same. The only change is in the numerical value of the intercept term and its t-value.
Another question that we may get is: since we have two categories, male and
female, can we assign two dummies to them? This means we consider the model
as:
Yᵢ = β₁ + β₂D₂ᵢ + β₃D₃ᵢ + uᵢ … (9.4)

where Y is expenditure on food, D₂ = 1 for female and 0 for male, and D₃ = 1 for male and 0 for female. Essentially, we are asking whether we can assign two separate dummies for male and female. The answer is 'no'. To see the reason for this, consider the data for a sample of two females and three males, for which the data matrix is as in Table 9.3. We see that D₂ = 1 − D₃ or D₃ = 1 − D₂. This is a situation of perfect collinearity. Hence, we must always use only one dummy variable if a qualitative variable has two categories, such as gender here.
Table 9.3: Data Matrix for the Equation

Gender   Y    Intercept   D₂   D₃
Male     Y₁   1           0    1
Male     Y₂   1           0    1
Female   Y₃   1           1    0
Male     Y₄   1           0    1
Female   Y₅   1           1    0
A more general rule is: if a model has the common intercept β₁, and the qualitative variable has m categories, then we must introduce only (m − 1) dummy variables. If we do not do this, we get into a problem of estimation called the 'dummy variable trap'. Finally, note that when we have a simple regression model with only one dummy variable, as considered here, the model is also called an ANOVA model. This is because there is no second (quantitative) variable whose impact on the dependent variable we are seeking to measure. When we add such a variable, we get what we call an ANCOVA model. We take up such a case in the next section.
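As a sketch of the ANOVA model and the dummy variable trap, the following Python fragment fits the single-dummy regression to the data of Table 9.2 (with D = 1 for female) and then shows that adding a second dummy produces a rank-deficient data matrix:

import numpy as np

# Food expenditure and gender dummy (D = 1 for female) from Table 9.2
y = np.array([1983, 2987, 2993, 3156, 2706, 2217,
              2230, 3757, 3821, 3291, 3429, 2533], dtype=float)
d = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)

X = np.column_stack([np.ones(len(y)), d])    # one dummy only: (m - 1) = 1 column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)     # beta[0]: mean for males (D = 0); beta[0] + beta[1]: mean for females

# Adding the complementary dummy (1 - d) alongside the intercept reproduces the trap:
X_trap = np.column_stack([np.ones(len(y)), d, 1 - d])
print(np.linalg.matrix_rank(X_trap))         # rank 2 < 3 columns: perfect collinearity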
9.3 ANALYSIS OF COVARIANCE (ANCOVA) MODEL
[Figure: food expenditure plotted against after-tax income, with separate regression lines for male and female]
3) What happens if the base value is reassigned for the dummy variable, say gender, in a simple regression model as in equation (9.1)?
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
7) Specify the general form of an ANCOVA model with one qualitative and one quantitative variable. What does the slope coefficient for the quantitative variable considered indicate in general?
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
9.4 COMPARISON BETWEEN TWO REGRESSION MODELS
In the example considered above, i.e., for both the ANOVA and the ANCOVA models, we saw that the slope coefficients were the same but the intercepts were different. This raises the question of whether the slopes too could be different. How do we formulate the model if our interest is to test for the difference in the slope coefficients too? In order to capture this, we introduce a 'slope drifter'. For the example of consumption expenditure for males and females considered above, let us now proceed to compare the difference in the consumption expenditure by gender by specifying the model with dummies as follows:

Yᵢ = β₁ + β₂Dᵢ + β₃Xᵢ + β₄(DᵢXᵢ) + uᵢ … (9.6)

The mean food expenditure functions of the two categories then are:

E(Yᵢ | Dᵢ = 0, Xᵢ) = β₁ + β₃Xᵢ {since Dᵢ = 0}
E(Yᵢ | Dᵢ = 1, Xᵢ) = (β₁ + β₂) + (β₃ + β₄)Xᵢ {since Dᵢ = 1}
In the latter expression, (β₁ + β₂) gives the mean value of Y for the category that receives the dummy value of 1 when X is zero. And (β₃ + β₄) gives the slope coefficient of the income variable for the category that receives the dummy value of 1. Note that the introduction of the dummy variable in the 'additive form' enables us to distinguish between the intercept terms of the two groups. Likewise, the introduction of the dummy variable in the interactive (or multiplicative) form (i.e., DᵢXᵢ) enables us to differentiate between the slope coefficients of the two groups. Depending on the statistical significance of the differential intercept coefficient, β₂, and the differential slope coefficient, β₄, we can infer whether the female and male food expenditure functions differ in their intercept values, or their slope values, or both. There can be four possibilities, as shown in Fig. 9.2. Fig. 9.2(a) shows that there is no difference in the intercept or the slope coefficient of the two food expenditure regressions. Such regression equations are called 'Coincident Regressions'.
[Fig. 9.2: Four possible relationships between the two regressions: (a) Coincident Regressions, (b) Parallel Regressions, (c) Concurrent Regressions, (d) Dissimilar Regressions]
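A small Python sketch of model (9.6) on simulated data shows how the estimated β₂ and β₄ map onto the four cases of Fig. 9.2 (the data-generating values used here are illustrative):

import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(10, 40, size=n)                 # quantitative covariate, e.g. income
d = (rng.uniform(size=n) < 0.5).astype(float)   # dummy for one of the two groups
y = 5 + 2 * d + 0.3 * x + 0.1 * d * x + rng.normal(size=n)

# Model (9.6): Y = b1 + b2 D + b3 X + b4 (D X) + u
X = np.column_stack([np.ones(n), d, x, d * x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # b[1]: differential intercept; b[3]: differential slope
# b2 ~ 0 and b4 ~ 0 -> coincident; b2 != 0, b4 ~ 0 -> parallel;
# b2 ~ 0, b4 != 0 -> concurrent; both nonzero -> dissimilar regressions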
The additive model with two dummy variables and one quantitative variable is:

Yᵢ = β₁ + β₂D₂ᵢ + β₃D₃ᵢ + β₄Xᵢ + uᵢ … (9.7)

where Y is income, X is education measured in number of years of schooling, D₂ is gender (0 if male, 1 if female), and D₃ indicates whether the person is in a reserved segment or group (e.g. SC/ST/OBC), taking the value 0 if 'not in the reserved segment', i.e., in the general segment, and 1 if 'in the reserved segment'. Here, gender (D₂) and reservation (D₃) are qualitative variables and X is a quantitative variable. In this formulation (equation 9.7) we have made an implicit assumption that the differential effect of gender is constant across the two segments of reservation. We have likewise assumed that the differential effect of reservation is constant across the
two genders. This means if the average income is higher for males than for
females, it is so whether the person is in the general segment or in the reservation
segment. Likewise, it is assumed here that if the average income is different
between the two reservation segments, it is so irrespective of gender. However, in
many cases, such assumptions may not be tenable. This means, there could be
interaction between gender and reservation dummies. In other words, their effect
on average income may not be simply additive as in (9.7) but could be
multiplicative. If we wish to consider for this interactive effect, we must specify
the model as follows:
Yᵢ = β₁ + β₂D₂ᵢ + β₃D₃ᵢ + β₄(D₂ᵢD₃ᵢ) + β₅Xᵢ + uᵢ … (9.8)
In equation (9.8), the variable D₂ᵢD₃ᵢ is called the 'interactive or interaction dummy'. It represents the joint or simultaneous effect of the two qualitative variables. Taking expectations on both sides of equation (9.8), i.e., considering the average effect on income across gender and reservation, we get:

E(Yᵢ | D₂ᵢ = 1, D₃ᵢ = 1, Xᵢ) = β₁ + β₂ + β₃ + β₄ + β₅Xᵢ … (9.9)

Equation (9.9) is the average income function for female reserved-category workers, where β₂ is the differential effect of being female, β₃ is the differential effect of being in the reserved segment, and β₄ is the interactive effect of being both a female and in the reserved segment. Depending on the statistical significance of the various dummies, we draw the relevant inferences. The specification can easily be generalized to more than one quantitative variable and more than two qualitative variables.
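The interactive-dummy model (9.8) can be sketched in Python as follows (simulated data; the coefficient values are illustrative, not estimates from any real survey):

import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 16, size=n)                    # years of schooling
d2 = (rng.uniform(size=n) < 0.5).astype(float)    # 1 if female
d3 = (rng.uniform(size=n) < 0.4).astype(float)    # 1 if in the reserved segment
y = 10 - 1.5 * d2 - 1.0 * d3 - 0.8 * d2 * d3 + 0.9 * x + rng.normal(size=n)

# Model (9.8): Y = b1 + b2 D2 + b3 D3 + b4 (D2 D3) + b5 X + u
X = np.column_stack([np.ones(n), d2, d3, d2 * d3, x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # b[3] estimates the interactive effect of being both female and in the reserved segment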
Check Your Progress 2 [answer questions within the given space in about 90-
100 words]
1) What is meant by a ‘slope drifter’? When is it introduced and for what use?
Specify a general model with such a ‘slope drifter’ and comment on the
additional variable introduced.
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
2) Differentiate between the four types of regressions that we might get when considering a model of the type in equation (9.6), with the differential intercept and slope coefficients β₂ and β₄ as therein.
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
variable, and a case where we might be interested in examining the interactive effect of the two qualitative variables. For this, we considered models such as Yᵢ = β₁ + β₂D₂ᵢ + β₃D₃ᵢ + β₄(D₂ᵢD₃ᵢ) + β₅Xᵢ + uᵢ.
In other words, regression models in which some independent variables are qualitative and some others are quantitative are called ANCOVA models.
6) The advantage is that ANCOVA models provide a method of statistically controlling for the effects of covariates. The consequence of excluding a covariate from the model is that the model suffers from 'specification error'. The consequence of committing specification errors is that the ideal assumptions required for the OLS estimators to be efficient are violated. Consequently, they lose their efficiency properties.
UNIT 10 MULTICOLLINEARITY

10.0 OBJECTIVES
After going through this unit, you should be able to
explain the concept of multicollinearity in a regression model;
comprehend the difference between the near and perfect multicollinearity;
describe the consequences of multicollinearity;
explain how multicollinearity can be detected;
describe the remedial measures for multicollinearity; and
explain the concept of ridge regression.
10.1 INTRODUCTION
The classical linear regression model assumes that there is no perfect
multicollinearity. Multicollinearity means the presence of high correlation
between two or more explanatory variables in a multiple regression model.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
Absence of multicollinearity implies that there is no exact linear relationship among the explanatory variables. The assumption of no perfect multicollinearity
is very crucial to a regression model since the presence of perfect
multicollinearity has serious consequences for the regression model. We will discuss the consequences, detection methods, and remedial measures for multicollinearity in this Unit.
Let us consider the same demand function of good Y. In this case we however
assume that there is imperfect multicollinearity between the explanatory variables
(in order to distinguish it from the earlier case, we have changed the parameter
notations). The following is the population regression function:
Yᵢ = B₁ + B₂X₂ᵢ + B₃X₃ᵢ + uᵢ … (10.5)

Equation (10.5) refers to the case when two or more explanatory variables are not exactly linearly related. For the above regression model, we may obtain an estimated regression equation (numbered (10.6)). Since the explanatory variables are not exactly related, we can find estimates for the parameters: the regression can be estimated, unlike the first case of perfect multicollinearity. It does not mean that there is no problem with our estimators if there is imperfect multicollinearity. We discuss the consequences of multicollinearity in the next section.
Check Your Progress 1
Since the values of the standard errors have increased, the confidence interval reflected in expression (10.7) has widened.
(d) Insignificant t-ratios: As pointed out above, the standard errors of the estimators increase due to multicollinearity. The t-ratio is given as t = b̂ₖ/se(b̂ₖ). With an inflated standard error, the t-ratio is very small. Thus we tend to accept (or not reject) the null hypothesis and tend to conclude that the variable has no effect on the dependent variable.
(e) A high R² and few significant t-ratios: In equation (10.6) we notice that the R² is very high, about 98% or 0.98, yet few of the t-ratios are statistically significant; only the price variable's slope coefficient has a significant t-value. However, using the F-test for overall significance, H₀: R² = 0, we reject the null hypothesis. Thus there is some discrepancy between the results of the F-test and the t-test.
(f) The OLS estimators, which are partial slope coefficients, and their standard errors become very sensitive to small changes in the data. If there is a small change in the data, the regression results change substantially.
(g) Wrong signs of regression coefficients: This is a very prominent impact of the presence of multicollinearity. In the case of the example given at equation (10.6), we find that the coefficient of the income variable is negative. The income variable has a 'wrong' sign, as economic theory suggests that the income effect is positive unless the commodity concerned is an inferior good.
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + β₄X₄ᵢ + uᵢ … (10.8)

Suppose the explanatory variables are perfectly correlated with each other as shown in equation (10.9) below:

X₄ᵢ = λ₂X₂ᵢ + λ₃X₃ᵢ … (10.9)

i.e., X₄ is an exact linear combination of X₂ and X₃. The coefficient of determination of the auxiliary regression of X₄ on X₂ and X₃ is

R²₄.₂₃ = (r²₄₂ + r²₄₃ − 2r₄₂r₄₃r₂₃) / (1 − r²₂₃) … (10.10)

Suppose r₄₂ = 0.5, r₄₃ = 0.5 and r₂₃ = −0.5. If we substitute these values in equation (10.10), we find that R²₄.₂₃ = 1. An implication of the above is that none of the correlation coefficients (among explanatory variables) is very high, but still there is perfect multicollinearity.
The variance of b₂ can be written as

var(b₂) = (σ²/Σx₂ᵢ²) · VIF, where VIF = 1/(1 − R²)

Note that as the auxiliary-regression R² increases, the VIF also increases. This inflates the variances, and hence the standard errors, of b₂ and b₃. If R² = 1, VIF = ∞ ⇒ var(b₂) → ∞ and var(b₃) → ∞.

Note that var(b₂) depends not only on R², but also on σ² and Σx₂ᵢ². It is possible that R²ᵢ is high (say, 0.91) but var(b₂) is nevertheless low due to a low σ² or a high Σx₂ᵢ², resulting in a high t-value. Thus the R² obtained from an auxiliary regression is only a superficial indicator of multicollinearity.
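A minimal Python sketch of the auxiliary-regression R² and the VIF (simulated data; the helper function is ours):

import numpy as np

def vif(X, j):
    # Regress column j on the remaining explanatory variables; return 1 / (1 - R^2)
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    r2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(4)
x2 = rng.normal(size=100)
x3 = 0.95 * x2 + 0.1 * rng.normal(size=100)   # x3 is highly collinear with x2
X = np.column_stack([x2, x3])
print([vif(X, j) for j in range(X.shape[1])]) # large VIFs signal multicollinearity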
Check Your Progress 2
On the other hand, if the objective of the study is not only prediction but also reliable estimation of the individual parameters of the chosen model, then serious collinearity may be bad, since multicollinearity results in large standard errors of the estimators and therefore widens the confidence intervals, resulting in the acceptance of null hypotheses in most cases. If the objective of the study is to estimate a group of coefficients (i.e., the sum or difference of two coefficients), then this is possible even in the presence of multicollinearity. In such a case multicollinearity may not be a problem.
Yᵢ = C₁ + C₂X₂ᵢ + uᵢ … (10.13)

where C₁ = A₁ + 300A₃ and C₂ = A₂ + 2A₃, and

var(b₂) = σ²/Σx₂ᵢ²
If the cost curves are U-shaped average and marginal cost curves, then theory suggests that the coefficients should satisfy the following:

1) β₁, β₂ and β₄ > 0
2) β₃ < 0
3) β₃² < 3β₂β₄
10.7 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) The case of perfect multicollinearity reflects the situation when the explanatory variables are perfectly correlated with each other, implying that the coefficient of correlation between the explanatory variables is 1.
2) Imperfect multicollinearity refers to the case when two or more explanatory variables are not exactly linearly related; collinearity can be high but not perfect. 'High collinearity' refers to the case of 'near' or 'imperfect' or high multicollinearity. The 'presence of multicollinearity' usually means imperfect multicollinearity.
3) In the case of perfect multicollinearity it is not possible to obtain
estimators for the parameters of the regression model. See Section 10.2
for details.
Check Your Progress 2
1) (i) In the case of imperfect multicollinearity, some of the estimators are statistically not significant. But the OLS estimates still retain their BLUE property, that is, they are Best Linear Unbiased Estimators: imperfect multicollinearity does not violate any of the assumptions, so the OLS estimators retain the BLUE property. Being BLUE with minimum variance, however, does not imply that the numerical value of the variance will be small.
(ii) The R² value is very high but very few estimators are significant (t-ratios low). The example mentioned in an earlier section, where the demand function of good Y was computed using the earnings of individuals, reflects this situation: R² is quite high, about 98% or 0.98, but only the price variable's slope coefficient has a significant t-value. However, using the F-test for overall significance, H₀: R² = 0, we reject the hypothesis that both prices and earnings have no effect on the demand for Y.
(iii) The ordinary least squares (OLS) estimators, which are partial slope coefficients, and their standard errors become very sensitive to small changes in the data, i.e., they tend to be unstable. With a small change in the data, the regression results change quite substantially; in the example of near or imperfect multicollinearity mentioned above, the standard errors go down and the t-ratios increase in absolute value.
(iv) Wrong signs of regression coefficients: this is a very prominent impact of the presence of multicollinearity. In the example where earnings of individuals were used in deriving the demand curve for good Y, the earnings variable has the 'wrong' sign according to economic theory, since the income effect is usually positive unless the commodity is an inferior good.
2) Examining partial correlations: In the case of three explanatory variables X₂, X₃ and X₄, there may be very high or perfect multicollinearity between X₄ and the pair X₂, X₃ even when the pairwise correlations are moderate.

Subsidiary or auxiliary regressions: Each explanatory variable X is regressed on the remaining X variables and the corresponding R² is computed. Each of these regressions is referred to as a subsidiary or auxiliary regression. Consider a regression of Y on X₂, X₃, X₄, X₅, X₆ and X₇, with six explanatory variables. If the R² comes out to be very high but there are few significant t-ratios, i.e., very few X coefficients are individually statistically significant, then the purpose of the auxiliary regressions is to identify the source of the multicollinearity, i.e., the existence of a perfect or near-perfect linear combination of the other Xs.
var(b₂) = σ² / [Σx₂ᵢ²(1 − R²₂₃)] = (σ²/Σx₂ᵢ²) · [1/(1 − R²₂₃)] = (σ²/Σx₂ᵢ²) · VIF

Similarly, var(b₃) = (σ²/Σx₃ᵢ²) · VIF

where VIF = 1/(1 − R²₂₃) is the variance inflation factor. As R²₂₃ increases, the VIF increases, thus inflating the variances and hence the standard errors of b₂ and b₃.

If R²₂₃ = 0, VIF = 1, so var(b₂) = σ²/Σx₂ᵢ² and var(b₃) = σ²/Σx₃ᵢ² (no collinearity).
If R²₂₃ = 1, VIF = ∞, so var(b₂) → ∞ and var(b₃) → ∞.
UNIT 11 HETEROSCEDASTICITY
Structure
11.0 Objectives
11.1 Introduction
11.2 Heteroscedasticity: Definition
11.2.1 Homoscedasticity
11.2.2 Heteroscedasticity
11.0 OBJECTIVES
After going through this unit, you should be able to
explain the concept of heteroscedasticity in a regression model;
identify the consequences of heteroscedasticity in the regression model;
explain the methods of detection of heteroscedasticity;
describe the remedial measures for resolving heteroscedasticity;
show how the use of deflators can help in overcoming the consequences of
heteroscedasticity; and
identify the correct functional form of regression model so that
heteroscedasticity is avoided.
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
11.1 INTRODUCTION
A crucial assumption of the Classical Linear Regression Model (CLRM) is that the error term uᵢ in the population regression function (PRF) is homoscedastic. It means that uᵢ has the same variance σ² throughout the population. An alternative scenario arises where the variance of uᵢ is σᵢ²; in other words, the error variance varies from one observation to another. Such cases are referred to as cases of heteroscedasticity.
Fig.11.1: Case of Homoscedasticity
E(uᵢ²) = σ² … (11.1)

We can alternatively express equation (11.1) as:

V(uᵢ) = σ² … (11.2)

In Fig. 11.1, we see a case of homoscedasticity where the variance of the error term is a constant value, σ². This is expressed in the form of an equation as in (11.2). Since the expected value of the error term is zero, the expression V(uᵢ) = σ² can also be written as E(uᵢ²) = σ², as in equation (11.1).
11.2.2 Heteroscedasticity
As PDI increases, the average level of savings increases. However, the variance
of savings does not remain the same at all the levels of PDI. This is the case of
heteroscedasticity or unequal variance. In other words, high-income people, on
average, save more than low-income people, but at the same time, there is more
variability in their savings. This can be graphically represented as in Fig. 11.2. We now therefore have:

E(uᵢ²) = σᵢ² … (11.3)

[Fig. 11.2: Case of Heteroscedasticity, with savings plotted against income and the spread of savings increasing with income]
The following are the characteristics of the OLS model in the presence of
heteroscedasticity.
(i) The OLS estimators are linear functions of the variables. The regression equation is also linear in its parameters.
(ii) The ordinary least squares (OLS) estimators are unbiased. This means
the expected value of estimated parameters is equal to the true
population parameters.
(iii) The OLS estimators, though unbiased, no longer have minimum variance, i.e., they are no longer efficient. In fact, even in large samples, the OLS estimators are not efficient. Therefore, the OLS estimators are not BLUE in small as well as (asymptotically) large samples.
(iv) In light of the above, the usual formulae for estimating the variances of the OLS estimators are biased, i.e., they are either upward biased (positive bias) or downward biased (negative bias). Note that when the OLS formula overestimates the true variances of the estimators, a positive bias is said to occur, and when it underestimates the true variances, we say that a negative bias occurs.
(v) The estimator of the true population variance, σ̂² = Σeᵢ²/(n − k), is biased. That is,

E(σ̂²) ≠ σ² … (11.4)
b) From the regression, obtain the residuals eᵢ and square them. Then take the logs of eᵢ² (as the Park test requires).

|eᵢ| = β₁ + β₂Xᵢ + vᵢ … (11.9)

The above means that the Glejser test suggests various plausible (linear as well as non-linear) relationships between the residual term and the explanatory variable to investigate the presence of heteroscedasticity.
e) For each of the cases given, test the null hypothesis that there is no heteroscedasticity.
Equation (11.13) tells us that the product of the sample size (n) and the coefficient of determination (R²) of the auxiliary regression follows the χ² distribution with (k − 1) degrees of freedom:

nR² ~ χ²(k−1) … (11.13)

Here k is the number of regressors in the auxiliary regression (equation 11.11).
σᵢ² = σ²Xᵢ² … (11.14)

H₀: σ₁² = σ₂² … (11.15)

λ = (RSS₂/df₂) / (RSS₁/df₁) … (11.16)

λ follows the F distribution with (df₂, df₁) degrees of freedom … (11.17)
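As a hedged sketch of the nR² procedure in equation (11.13): the exact auxiliary regression (11.11) is not reproduced above, so here the squared residuals are regressed on X and X² as one plausible choice (simulated data, illustrative names):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, size=n)
u = rng.normal(size=n) * x                 # the error standard deviation grows with x
y = 2 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

Z = np.column_stack([np.ones(n), x, x ** 2])   # auxiliary regression of e^2 on regressors
g, *_ = np.linalg.lstsq(Z, e ** 2, rcond=None)
v = e ** 2 - Z @ g
r2 = 1 - (v @ v) / np.sum((e ** 2 - (e ** 2).mean()) ** 2)
k = Z.shape[1]
stat = n * r2                              # n R^2 ~ chi-square(k - 1), cf. (11.13)
print(stat, stats.chi2.sf(stat, k - 1))    # a small p-value points to heteroscedasticity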
1) State the steps in conducting the Park test for detection of heteroscedasticity.
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
.........................................................................................................................
Note that the error term gets transformed due to the division by σᵢ. Let the new error term be vᵢ. Squaring the new error term we get:

vᵢ² = uᵢ²/σᵢ² … (11.20)

Since the variance of the error term is given by var(vᵢ) = E(vᵢ²), taking the expectation of both sides of equation (11.20) we get:

E(vᵢ²) = E(uᵢ²/σᵢ²) = (1/σᵢ²)·E(uᵢ²) = σᵢ²/σᵢ² = 1
Thus, the transformed error term vᵢ is homoscedastic. Therefore, equation (11.19) can be estimated by the usual OLS method. The OLS estimators of β₁ and β₂ thus obtained are called the Weighted Least Squares (WLS) estimators.
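A minimal WLS sketch in Python for the case where σᵢ is known (here it is known by construction, since we simulate the data):

import numpy as np

rng = np.random.default_rng(6)
n = 150
x = rng.uniform(1, 10, size=n)
sigma_i = 0.5 * x                       # known error standard deviation for each i
y = 2 + 0.5 * x + sigma_i * rng.normal(size=n)

# Divide every term, including the constant, by sigma_i, then run OLS: this is WLS
w = 1.0 / sigma_i
Xw = np.column_stack([w, w * x])        # transformed regressors: 1/sigma_i and x/sigma_i
yw = w * y
b_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(b_wls)                            # WLS estimates of beta1 and beta2, cf. (11.19)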
11.5.2 Case II: σᵢ² is Unknown

When the error variance σᵢ² is not known, we need to make further assumptions in order to use the WLS method. Here, we consider the following two cases.
The first case is where the error variance is proportional to Xᵢ:

E(uᵢ²) = σ²Xᵢ

Now, the square root transformation requires that we divide both sides of equation (11.18) by √Xᵢ to get:

Yᵢ/√Xᵢ = β₁(1/√Xᵢ) + β₂(Xᵢ/√Xᵢ) + uᵢ/√Xᵢ
= β₁(1/√Xᵢ) + β₂√Xᵢ + vᵢ … (11.22)

where vᵢ = uᵢ/√Xᵢ … (11.23)

vᵢ² = uᵢ²/Xᵢ … (11.24)

Now, the variance of the transformed error term, i.e., of equation (11.24), is:

E(vᵢ²) = E(uᵢ²)/Xᵢ = σ²Xᵢ/Xᵢ = σ² ⇒ homoscedasticity

Thus, when we apply the square root transformation (vᵢ = uᵢ/√Xᵢ), we obtain a homoscedastic error term.
The second case is where the error variance is proportional to Xᵢ². Here, we have:

E(uᵢ²) = σ²Xᵢ² … (11.27)

V(uᵢ) = σ²Xᵢ²

Dividing both sides of equation (11.18) by Xᵢ, we get:

Yᵢ/Xᵢ = β₁(1/Xᵢ) + β₂ + uᵢ/Xᵢ
= β₁(1/Xᵢ) + β₂ + vᵢ … (11.28)

Equation (11.28) is the transformed PRF, in which the error term is:

vᵢ = uᵢ/Xᵢ … (11.29)

vᵢ² = uᵢ²/Xᵢ² … (11.30)

E(vᵢ²) = E(uᵢ²)/Xᵢ² = σ²Xᵢ²/Xᵢ² = σ² … (11.31)
2) Explain how the usage of deflators serves to tackle the problem of heteroscedasticity when the error variance is proportional to Xᵢ.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Model 1: Yᵢ = β₁ + β₂Xᵢ + uᵢ … (11.33)
Model 2: ln Yᵢ = β₁ + β₂ ln Xᵢ + uᵢ … (11.34)

In Model 1, the dependent variable is linearly related to one (or more than one) of the Xs. In Model 2, the relationship between the dependent and the independent variable is non-linear (log-linear). The MWD test involves considering a null and an alternative hypothesis as follows:

H0: Linear Model, i.e., Y is a linear function of Xᵢ (equation (11.33))
H1: Log-Linear Model, i.e., ln Y is a linear function of ln Xᵢ (equation (11.34))
Following are the steps for carrying out the MWD test:
(i) Estimate the linear model and obtain the estimated Y values. Let the estimated Y values be denoted as Yf.
(ii) Estimate the log-linear model and obtain the estimated lnY values. Let the estimated values be denoted as lnYf.
(iii) Construct the variable Z₁ = ln(Yf) − lnYf.
(iv) Re-estimate the linear model including Z₁ as an additional regressor, and examine the statistical significance of Z₁.
Suppose the linear model in equation (11.33) is in fact the correct model. In that case, the constructed variable Z₁ should not be statistically significant in Step (iv), for then the estimated Y values from the linear model and those estimated from the log-linear model (after taking their antilog values for comparative purposes) in equation (11.34) should not be different. The same logic applies, with the roles reversed, to the alternative hypothesis H1.
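A hedged Python sketch of the MWD steps; the data are simulated so that the log-linear model is true, and the helper ols_fit is ours:

import numpy as np

def ols_fit(X, y):
    # OLS coefficients with their standard errors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = (e @ e) / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(7)
n = 120
x = rng.uniform(2, 10, size=n)
y = np.exp(0.5 + 0.8 * np.log(x) + 0.1 * rng.normal(size=n))   # true model: log-linear

X_lin = np.column_stack([np.ones(n), x])
X_log = np.column_stack([np.ones(n), np.log(x)])
b_lin, _ = ols_fit(X_lin, y)
b_log, _ = ols_fit(X_log, np.log(y))
yf = X_lin @ b_lin                  # step (i): fitted Y from the linear model
lnyf = X_log @ b_log                # step (ii): fitted lnY from the log-linear model

z1 = np.log(yf) - lnyf              # step (iii): the constructed variable Z1
b, se = ols_fit(np.column_stack([X_lin, z1]), y)
print(b[-1] / se[-1])               # step (iv): a large |t| on Z1 rejects the linear model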
Check Your Progress 5
1) Outline the MWD test for choosing the appropriate functional form of the
regression model between its linear and log-linear forms.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Due to the scale effect, in cross-sectional data there is a greater chance of coming across heteroscedasticity in the error terms.
Check Your Progress 2
1) The OLS estimators are unbiased but they no longer have minimum
variance, i.e., they are no longer efficient. Even in large samples the OLS
estimators are not efficient. Therefore, the OLS estimators are not BLUE
in small as well as large samples (asymptotically).
The usual formulae for estimating the variances of the OLS estimators are biased, i.e., there is either an upward bias (positive bias) or a downward bias (negative bias).
2) The OLS estimator of the error variance is a biased estimator: it will either overestimate or underestimate the true error variance, depending on the nature of the heteroscedasticity. In addition, the OLS estimators are inefficient, so the conventional standard errors cannot be relied upon.
Check Your Progress 3
1) In the presence of heteroscedasticity, the heteroscedastic variance σᵢ² may be systematically related to one or more explanatory variables. Therefore, we can regress σᵢ² on one or more of the X variables, as in:

σᵢ² = f(Xᵢ), e.g., ln σᵢ² = β₁ + β₂ ln Xᵢ + vᵢ

where vᵢ is the new residual term. If the σᵢ² are not known, the squared estimated residuals eᵢ² can be used as proxies for uᵢ². A statistically significant relationship implies that the null hypothesis of no heteroscedasticity is rejected, suggesting the presence of heteroscedasticity, which requires remedial measures. If the null hypothesis is not rejected, then we accept β₂ = 0 and the value of β₁ can be taken as the common, homoscedastic variance σ².
2) Heteroscedasticity means that the OLS estimators are unbiased but no longer efficient, not even in large samples. This lack of efficiency makes the conventional hypothesis testing of OLS estimators unreliable. For remedial measures, it is important to know whether the true error variance σᵢ² is known or not. The use of deflators helps rectify the problem of heteroscedasticity: various deflators can be used to transform the error variance so as to make it homoscedastic.

When σᵢ² is known, the method of Weighted Least Squares (WLS) can be used. In this method, the standard deviation σᵢ is used to divide both sides of the equation. See Section 11.5 for details.
3) The estimated residuals show a pattern similar to the earlier Case I, but the error variance is not linearly related to X; it increases in proportion to the square of X. Hence, E(uᵢ²) = σ²Xᵢ² and V(uᵢ) = σ²Xᵢ². Dividing both sides by Xᵢ, we get:

Yᵢ/Xᵢ = β₁(1/Xᵢ) + β₂ + uᵢ/Xᵢ
= β₁(1/Xᵢ) + β₂ + vᵢ

where vᵢ = uᵢ/Xᵢ, vᵢ² = uᵢ²/Xᵢ²

E(vᵢ²) = E(uᵢ²)/Xᵢ² = σ²Xᵢ²/Xᵢ² = σ²

Thus, the transformed equation is homoscedastic.
Check Your Progress 5
1) The test for the selection of the appropriate functional form of a regression, as proposed by MacKinnon, White and Davidson, is known as the MWD test. The MWD test is used to choose between the linear and the log-linear model. See Section 11.6 for details.
UNIT 12 AUTOCORRELATION
Structure
12.0 Objectives
12.1 Introduction
12.2 Concept of Autocorrelation
12.3 Reasons for Autocorrelation
12.4 Consequences of Autocorrelation
12.5 Detection of Autocorrelation
12.5.1 Graphical Method
12.5.2 Durbin-Watson Test
12.5.3 The Breusch-Godfrey (BG) Test
12.6 Remedial Measures for Autocorrelation
12.6.1 Known Autoregressive Scheme: Cochrane-Orcutt Transformation
12.6.2 Unknown Autoregressive Scheme
12.6.3 Iterative Procedure
12.7 Autocorrelation in Models with Lags
12.8 Let Us Sum Up
12.9 Answers/ Hints to Check Your Progress Exercises
12.0 OBJECTIVES
After going through this unit, you should be able to:
outline the concept of autocorrelation in a regression model;
describe the consequences of presence of autocorrelation in the regression
model;
explain the methods of detection of autocorrelation;
discuss the procedure of carrying out the Durbin-Watson test for detection of
autocorrelation;
elucidate the remedial measures for resolving autocorrelation; and
outline the procedure of dealing with situations where autocorrelation exists
in models with a lagged dependent variable.
12.1 INTRODUCTION
In the previous unit, you studied about heteroscedasticity. You saw that
heteroscedasticity is a violation of one of the assumptions of the Classical Linear
Regression Model (CLRM), viz., homoscedasticity. If the variance of the error
term is not constant across all observations, then we have the problem of
heteroscedasticity. In this unit, we discuss about the violation of another
assumption of the CLRM. Recall that one of the assumptions about the error
Dr. Pooja Sharma, Assistant Professor, Daulat Ram College, University of Delhi
terms is that the error term of one observation is not correlated with the error term of another observation. If they are correlated, then the situation is said to be one of autocorrelation. This is also called the problem of serial correlation.
This can be present in both cross-section as well as time series data. Let us
discuss the concept of autocorrelation in a little more detail.
12.4 CONSEQUENCES OF AUTOCORRELATION
When the assumption of no autocorrelation is violated, the estimators of the regression model based on sample data suffer from certain consequences. More specifically, the OLS estimators will suffer from the following consequences.
a) The least squares estimators are still linear and unbiased. In other words,
the estimated values of parameters continue to be unbiased. However,
they are not efficient because they do not have minimum variance.
Therefore, the usual OLS estimators are not BLUE (best linear unbiased
estimators).
b) The estimated variances of the OLS estimators (b₁ and b₂) are biased. Hence, the usual formulae used to estimate the variances and standard errors underestimate the true variances and standard errors. Consequently, rejecting a null hypothesis on the basis of t-values and concluding that a particular coefficient is statistically different from zero may be an incorrect conclusion. In other words, the usual t and F tests become unreliable.
c) The estimated error variance σ̂² is a biased estimator of the true σ². In particular, it underestimates the true σ². As a consequence, the computed R² becomes an unreliable measure of goodness of fit.
12.5 DETECTION OF AUTOCORRELATION
There are many methods of detecting the presence of autocorrelation. Let us
discuss them now.
12.5.1 Graphical Method
A visual examination of the OLS residuals eₜ quite often conveys the presence of autocorrelation among the error terms uₜ. Such a graphical presentation (Fig. 12.3) is known as a 'time sequence plot'. The first part of this figure does not show any clear pattern in the movement of the error terms; this means there is an absence of autocorrelation. In the lower part of Fig. 12.3, you will notice that the correlation between successive residual terms is first negative and then becomes positive. Therefore, plotting the sample residuals gives us a first indication of the presence or absence of autocorrelation.
one observation is lost in taking the successive differences.

3. Find out the critical table values d_L and d_U for the given sample size and the given number of explanatory variables.
4. Follow the decision rule, as depicted in Fig. 12.4.
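A minimal Python sketch of the d-statistic on simulated AR(1) errors (all names illustrative):

import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()   # AR(1) errors: positive autocorrelation
y = 1 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # the Durbin-Watson d-statistic
print(d)   # d well below 2 suggests positive autocorrelation; compare with dL and dU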
where vₜ is the white noise (stochastic) error term. We wish to test:

H₀: ρ₁ = ρ₂ = … = ρₚ = 0 … (12.9)

The null hypothesis says that there is no autocorrelation of any order (up to order p). Now, the BG test involves the following steps:
(c) Obtain R² from the auxiliary regression (12.8) in step (b) above.
(d) Now, for large samples, the Breusch-Godfrey test statistic is computed as:

(n − p)R² ~ χ²ₚ … (12.10)
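A hedged sketch of the BG statistic in Python, testing up to AR(2) on simulated data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                                    # OLS residuals

p = 2                                            # test autocorrelation up to order p
lags = np.column_stack([np.r_[np.zeros(k), e[:-k]] for k in range(1, p + 1)])
Z = np.column_stack([X, lags])                   # auxiliary regression, cf. (12.8)
g, *_ = np.linalg.lstsq(Z, e, rcond=None)
v = e - Z @ g
r2 = 1 - (v @ v) / np.sum((e - e.mean()) ** 2)
stat = (n - p) * r2                              # cf. (12.10)
print(stat, stats.chi2.sf(stat, p))              # a small p-value signals autocorrelation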
5) In what ways is the BG test for autocorrelation an improvement over the DW test?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
…………………………………………………………………………….
The transformed model will be

Y*ₜ = β₁* + β₂X*ₜ + vₜ … (12.16)

Now, the estimators obtained by applying the OLS method to the transformed variables Y*ₜ and X*ₜ in (12.16) will have the desirable BLUE property; these are called the Generalized Least Squares (GLS) estimators. The transformation suggested above is known as the Cochrane-Orcutt transformation procedure.
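A minimal sketch of the Cochrane-Orcutt transformation in Python, assuming for simplicity that ρ is known (here ρ = 0.6 by construction):

import numpy as np

rng = np.random.default_rng(10)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

rho = 0.6
y_star = y[1:] - rho * y[:-1]               # Y*_t = Y_t - rho Y_{t-1}
x_star = x[1:] - rho * x[:-1]               # X*_t = X_t - rho X_{t-1}
X = np.column_stack([np.ones(n - 1), x_star])
b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
b1 = b[0] / (1 - rho)                       # recover beta1 from beta1* = beta1 (1 - rho)
print(b1, b[1])                             # GLS estimates of beta1 and beta2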
12.6.2 Autoregressive Scheme is not Known

Suppose we do not know ρ; we then need methods for estimating it. We first consider the case where ρ = 1. This amounts to assuming that the error terms are perfectly positively autocorrelated. This case is called the First Difference Method. If this assumption holds, a generalized difference equation can be obtained by taking the difference between (12.11) and its first-order autoregressive scheme:

Yₜ − Yₜ₋₁ = β₂(Xₜ − Xₜ₋₁) + vₜ … (12.17)

i.e., ΔYₜ = β₂ΔXₜ + vₜ … (12.18)

where the symbol Δ (read as delta) is the first difference operator. Note that the difference model (12.17) has no intercept. If ρ is not known, then we can estimate ρ by the following two methods.
(i) Durbin-Watson Method

From equation (12.6) we see that the d-statistic and ρ are related. We can use this relationship to estimate ρ:

ρ̂ ≈ 1 − d/2 … (12.19)

For models with a lagged dependent variable, the Durbin h-statistic is used:

h ≈ ρ̂ √[n / (1 − n·var(â))] … (12.22)

where â is the estimated coefficient of the lagged dependent variable.
12.9 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) Autocorrelation refers to the presence of correlation between the error terms of any two observations. This means that if uᵢ and uⱼ are the error terms, then Corr(uᵢ, uⱼ) ≠ 0 for i ≠ j. In the CLRM, one of our assumptions is that Corr(uᵢ, uⱼ) = 0, i.e., the two error terms are not correlated. Violation of this assumption is a situation of autocorrelation.
2) The problem of autocorrelation is more common in time series data. This is because a phenomenon affecting the error term at one point of time is likely to influence the error term at the next point of time; this is especially identified with the factor of 'inertia or sluggishness'. Across units of a cross section this is less likely, but it cannot be ruled out even in cross-section data. In such cases, due to the spatial effect in cross-section data, which is more like a demonstration effect, it is distinctly termed spatial correlation.
3) Inertia or sluggishness, specification error in the model, cobweb phenomenon
and data smoothening.
4) The consequences are: (i) the least squares estimators are not efficient, (ii) the estimated variances of the OLS estimators are biased, (iii) the true variances and standard errors are underestimated, (iv) we are more likely to commit an error in deciding on the hypothesis of 'no statistical significance' of a particular estimated coefficient, i.e., decisions based on the t and F tests would be unreliable, (v) the estimated error variance would be biased, and (vi) the value of R² would be misleading or unreliable.
Check Your Progress 2
1) Time sequence plotting (graphical method), Durbin-Watson test and Breusch-
Godfrey (BG) Test.
2) d = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ². It is the ratio of the sum of the squared differences in successive residuals to the residual sum of squares.
3) The regression model includes an intercept term; the X variables are non-stochastic; the error term follows the mechanism uₜ = ρuₜ₋₁ + vₜ, with −1 ≤ ρ ≤ 1; and the regression does not contain any lagged values of the dependent variable as explanatory variables.
4) The one drawback of the d-test is that it has two zones of indecision, viz., dL
< d < dU and (4 – dU < d < 4 – dL ).
5) (i) It can handle stochastic regressors, such as the lagged values of Yₜ; (ii) it can deal with higher-order autoregressive schemes such as AR(2), etc.; and (iii) it can also handle simple or higher-order moving-average error processes.
Check Your Progress 3
1) In this method we lag the regression equation by one period, multiply it by ρ, and subtract it from the original regression equation. This gives us a transformed regression model. When this is estimated by the OLS method, the estimators of the transformed model are BLUE.
2) In Sub-Section 12.6.3 we have outlined steps of the Cochrane-Orcutt iterative
procedure. You should go through it and answer.
3) The h-statistic can be used in regression models having lagged dependent
variables as explanatory variables.
UNIT 13 MODEL SELECTION CRITERIA
Structure
13.0 Objectives
13.1 Introduction
13.2 Issues in Specification of Econometric Model
13.2.1 Model Specification
13.2.2 Violation of Basic Assumptions
13.3 Consequences of Specification Errors
13.3.1 Inclusion of Irrelevant Variable
13.3.2 Exclusion of Relevant Variable
13.3.3 Incorrect Functional Form
13.4 Error of Measurement in Variables
13.4.1 Measurement Error in Dependent Variable
13.4.2 Measurement Error in Independent Variable
13.5 Let Us Sum Up
13.6 Answers/Hints to Check Your Progress Exercises
13.0 OBJECTIVES
After going through this unit, you will be able to
appreciate the importance of correct specification of an econometric
model;
identify the important issues in specification of econometric models;
find out the consequences of including an irrelevant variable;
find out the consequences of excluding a relevant variable; and
find out the impact of measurement errors in dependent and independent
variables.
13.1 INTRODUCTION
In the previous Units of the course we have discussed various econometric tools. We began with the classical two-variable regression model. Later on, we extended it to the classical multiple regression model. The steps of carrying out the ordinary least squares (OLS) method were discussed in detail. Recall that the
Dr. Sahba Fatima, Independent Researcher, Lucknow.
classical regression model is based on certain assumptions. When these assumptions are met, the OLS estimators are the best linear unbiased estimators (BLUE). When these assumptions are violated, the OLS estimators are not BLUE; they lose some of their desirable properties. Therefore, when some of the classical assumptions are not fulfilled, we have to adopt some other estimation method.
Thus far our objective has been to explain how various estimation methods are
applied. Now let us look into certain other important issues regarding
specification of econometric models.
b) E(Xᵢuᵢ) = 0 (the regressor is non-stochastic)
c) E(uᵢ) = 0
d) E(uᵢ²) = σ²
e) E(uᵢuⱼ) = 0 for i ≠ j
f) There is no exact linear relationship among the explanatory variables
Assumption (b) says that Xᵢ and uᵢ are independent. If the X values were stochastic, X and u could move together and E(Xᵢuᵢ) would not be zero. In order to avoid this problem we assume that X is non-stochastic: all explanatory variables are fixed in repeated sampling.
Assumption (c) says that the mean of the error term (uᵢ) is zero. There could be errors in individual observations, but on the whole these errors cancel out. If E(uᵢ) ≠ 0, the OLS estimator of the intercept term (β₁) will be biased, while the estimators of the slope parameters β₂ and β₃ will remain unbiased. For example, suppose E(uᵢ) = 3. In that case E(Yᵢ) will be

E(Yᵢ) = E(β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ)

Remember that the βs are parameters of the model; they are constants. We have assumed X to be fixed across samples. Thus

E(Yᵢ) = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + 3
Thus the intercept term will effectively be (β₁ + 3). Remember that if assumption (d) is violated we have the problem of heteroscedasticity, which is discussed in Unit 11. If assumption (e) is violated we have the problem of autocorrelation, which we have discussed in Unit 12. In case assumption (f) is violated we have the problem of multicollinearity (see Unit 10).
Check Your Progress 1
1) List the assumptions of the classical regression model.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3) What are the implications of violations of the basic assumptions of the classical regression model?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
which is unbiased.

For the model (13.4) that we have taken, we obtain

β̂₂ = [(Σx₃ᵢ²)(Σyᵢx₂ᵢ) − (Σx₂ᵢx₃ᵢ)(Σyᵢx₃ᵢ)] / [(Σx₂ᵢ²)(Σx₃ᵢ²) − (Σx₂ᵢx₃ᵢ)²] … (13.6)

In deviation form, the true model gives

yᵢ = β₂x₂ᵢ + (uᵢ − ū) … (13.7)

Substituting for yᵢ from (13.7) into (13.6) and simplifying, we obtain

E(β̂₂) = β₂[(Σx₂ᵢ²)(Σx₃ᵢ²) − (Σx₂ᵢx₃ᵢ)²] / [(Σx₂ᵢ²)(Σx₃ᵢ²) − (Σx₂ᵢx₃ᵢ)²] = β₂ … (13.8)

and, by a similar substitution for the coefficient of the irrelevant variable,

E(β̂₃) = 0 … (13.10)
In order to calculate the bias in the estimated value of β₁ in the incorrect model (equation (13.12)) as compared to the true model (equation (13.11)), we take the following steps. Substituting the expression for y from the true model (13.11), we get

β̂₁* = Σx₁ᵢyᵢ / Σx₁ᵢ² = β₁ + β₂(Σx₁ᵢx₂ᵢ / Σx₁ᵢ²) + (Σx₁ᵢuᵢ / Σx₁ᵢ²) … (13.14)

Since E(Σx₁ᵢuᵢ) = 0, we get

E(β̂₁*) = β₁ + b₂₁β₂ … (13.15)

where b₂₁ = Σx₁ᵢx₂ᵢ / Σx₁ᵢ² is the regression coefficient from a regression of X₂ (the omitted variable) on X₁. Thus β̂₁* is a biased estimator of β₁, and the bias is given by

Bias = (coefficient of the excluded variable) × (regression coefficient in a regression of the excluded variable on the included variable) … (13.16)
In deviation form, the three-variable population regression model can be written as

yᵢ = β₂x₂ᵢ + β₃x₃ᵢ + (uᵢ − ū) … (13.17)

Multiplying through by x₂ᵢ and by x₃ᵢ in turn, and summing, we get

Σyᵢx₂ᵢ = β₂Σx₂ᵢ² + β₃Σx₂ᵢx₃ᵢ + Σx₂ᵢ(uᵢ − ū) … (13.18)
Σyᵢx₃ᵢ = β₂Σx₂ᵢx₃ᵢ + β₃Σx₃ᵢ² + Σx₃ᵢ(uᵢ − ū) … (13.19)

Thus we have

b_y2 = Σyᵢx₂ᵢ / Σx₂ᵢ² … (13.20)

b₃₂ = Σx₃ᵢx₂ᵢ / Σx₂ᵢ²

Hence (13.20) can be written as

b_y2 = β₂ + β₃b₃₂ + Σx₂ᵢ(uᵢ − ū) / Σx₂ᵢ² … (13.21)
Similarly, if x₂ is omitted from the model, the bias in E(b_y3) can be calculated. The variance of β̂₁* (the parameter of the incorrect model) can also be derived by using the formula for variance; as it is a bit complex, we do not present it here. You should note that the variance of β̂₁* is higher than that of β̂₁. An implication of the above is that the usual tests of significance concerning the parameters are invalid if some of the relevant variables are excluded from a model.
Thus we know that:
(i) When an irrelevant variable is included in the model: (a) the estimators of the parameters are unbiased, (b) the efficiency of the estimators declines, and (c) the estimator of the error variance is unbiased. Thus conventional tests of hypothesis remain valid, though the inferences drawn will be less precise.
(ii) When a relevant variable is dropped from the model: (a) the estimators of the parameters are biased, (b) the efficiency of the estimators declines, and (c) the estimator of the error variance is biased. Thus conventional tests of hypothesis are invalid, and the inferences drawn are faulty.
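These two results can be illustrated with a short simulation in Python (the numbers chosen for the data-generating process are ours):

import numpy as np

rng = np.random.default_rng(11)
n, reps = 200, 2000
beta1, beta2 = 0.5, 1.0
est_full, est_omit = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)     # relevant variable, correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    est_full.append(b[0])
    est_omit.append((x1 @ y) / (x1 @ x1))  # regression that omits x2

print(np.mean(est_full))   # close to 0.5: unbiased
print(np.mean(est_omit))   # close to 0.5 + 0.6 * 1.0 = 1.1: bias = b21 * beta2, cf. (13.15)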
13.3.3 Incorrect Functional Form

Apart from the inclusion of only relevant variables in an econometric model, another specification error pertains to the functional form. There is a tendency on the part of researchers to assume a linear relationship between variables. This, however, is not always true. If the true relationship is non-linear and we take a linear regression model for estimation, we will not be able to draw correct inferences. There are test statistics available to choose among functional forms. We will discuss these test statistics in Unit 14.
Yᵢ = Yᵢ* + eᵢ … (13.24)

where eᵢ denotes the measurement error in Yᵢ*. Therefore, instead of estimating

Yᵢ* = α + βXᵢ + uᵢ, we estimate
Yᵢ = α + βXᵢ + uᵢ + eᵢ
= α + βXᵢ + (uᵢ + eᵢ)
Let us re-write the above equation as

Yᵢ = α + βXᵢ + vᵢ … (13.25)

where vᵢ = uᵢ + eᵢ. In equation (13.25) we take vᵢ as a composite error term comprising the population disturbance term (uᵢ) and the measurement error term (eᵢ).
Let us assume that the following classical assumptions hold:

a) E(uᵢ) = E(eᵢ) = 0
b) Cov(Xᵢ, uᵢ) = 0
c) Cov(uᵢ, eᵢ) = 0

An implication of (c) above is that the stochastic error term and the measurement error term are uncorrelated. Thus the expected value of the composite error term is zero: E(vᵢ) = 0. By extending the logic given in Unit 4, we can say that E(β̂) = β. It implies that β̂ is unbiased.
Now let us look into the issue of variance in the case of measurement error in the dependent variable. As you know, the variance of the estimator β̂ in the two-variable regression model (13.23) is given by

Var(β̂) = σᵤ²/Σxᵢ²

With measurement error in Y, the relevant error variance is that of the composite term vᵢ, so that

Var(β̂) = σᵥ²/Σxᵢ² = (σᵤ² + σₑ²)/Σxᵢ² … (13.26)

Thus we see that the variance of the error term is larger if there is measurement error in the dependent variable. This leads to inefficiency of the estimators; they are not best linear unbiased estimators (BLUE).
13.4.2 Measurement Error in Independent Variable

There could be measurement error in the explanatory variables. Let us assume the true regression model to be estimated is

Yᵢ = α + βXᵢ* + uᵢ … (13.27)

Suppose we do not have data on the variable Xᵢ*, but we do have data on Xᵢ. In that case, instead of observing Xᵢ*, we observe

Xᵢ = Xᵢ* + wᵢ … (13.28)

where wᵢ represents the error of measurement in Xᵢ*.
In the permanent income hypothesis model, for example,

Yᵢ = α + βXᵢ* + uᵢ

where Yᵢ is current consumption expenditure, Xᵢ* is permanent income, and uᵢ is the stochastic disturbance term (equation error).
From equations (13.27) and (13.28) we find that

Yᵢ = α + β(Xᵢ − wᵢ) + uᵢ … (13.29)
= α + βXᵢ + (uᵢ − βwᵢ)
= α + βXᵢ + zᵢ … (13.30)

where zᵢ = (uᵢ − βwᵢ). You should notice that zᵢ is made up of two terms: the stochastic error and the measurement error. Now, let us assume that wᵢ has zero mean, is serially independent, and is uncorrelated with uᵢ. Even in that case, the composite error term zᵢ is not independent of the explanatory variable Xᵢ:

Cov(zᵢ, Xᵢ) = E[zᵢ − E(zᵢ)][Xᵢ − E(Xᵢ)]
= E[(uᵢ − βwᵢ)(wᵢ)]
= E(−βwᵢ²)
= −βσ_w² … (13.31)
From (13.31) we find that the independent variable and the error term are
correlated. This violates the basic assumption of the classical regression model
that the explanatory variable is uncorrelated with the stochastic disturbance term.
In such a situation the OLS estimators are not only biased but also inconsistent;
that is, they remain biased even as the sample size n increases indefinitely.
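The inconsistency can also be seen numerically. In the following illustrative Python sketch (all parameter values and variable names are assumptions), the sample size is raised from 100 to one million, yet the OLS slope stays well below the true β; the slope is attenuated roughly by the factor σ²_{X*}/(σ²_{X*} + σ_w²).

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma_w = 0.8, 2.0                 # assumed true slope and error scale

for n in (100, 10_000, 1_000_000):
    X_star = rng.normal(0, 3.0, n)       # true regressor X_i*
    u = rng.normal(0, 1.0, n)            # disturbance u_i
    w = rng.normal(0, sigma_w, n)        # measurement error w_i
    Y = 1.0 + beta * X_star + u
    X = X_star + w                       # observed (mismeasured) regressor
    x, y = X - X.mean(), Y - Y.mean()
    b = np.sum(x * y) / np.sum(x**2)     # OLS slope on mismeasured X
    print(f"n = {n:>9}: slope = {b:.3f}")

# The slope settles near beta * 9/(9 + 4), i.e. about 0.554 instead of 0.8:
# increasing n does not remove the bias, which is what inconsistency means.
```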
Check Your Progress 3
1) Explain the consequences of measurement error in the dependent variable.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3) Comment on the following statement: ‘Measurement error in the dependent
variable is a lesser evil than measurement error in the explanatory variable.’
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
UNIT 14 TESTS FOR SPECIFICATION ERROR
Structure
14.1 Introduction
14.2 Objectives
14.3 Tests for Identifying the Most Efficient Model
14.3.1 The R² Test and the Adjusted R² Test
14.1 INTRODUCTION
In the previous Unit we highlighted the consequences of specification errors.
There could be three types of specification errors: inclusion of an irrelevant
variable, exclusion of a relevant variable, and incorrect functional form. When
the econometric model is not specified correctly, the coefficient estimates, the
confidence intervals, and the hypothesis tests are misleading and inconsistent. In
view of this, econometric models should be correctly specified.
Econometric theory suggests certain criteria and test statistics. On the basis of
these criteria we select the most appropriate econometric model. We describe
some of these criteria below.
14.2 OBJECTIVES
After going through this Unit, you should be in a position to
identify econometric models that are not specified correctly;
take remedial measures for correcting the specification error; and
evaluate the performance of competing models.
14.3 TESTS FOR IDENTIFYING THE MOST EFFICIENT MODEL
14.3.1 The R² Test and the Adjusted R² Test
When two competing models are compared on the basis of the coefficient of
determination, the model with the higher R² is preferred. You should, however, keep
in mind that a very high R² may signal the presence of multicollinearity in the model.
If the R² is high but the t-ratios of the coefficients are not statistically significant,
you should check for multicollinearity. The R² is calculated on the basis of the
sample data.
Thus only the explanatory variables included in the model are considered in the
calculation of R²; variables not included in the model do not account for any of the
variation in the dependent variable.
The R² tends to increase as more explanatory variables are added. Thus, we are
tempted to add more explanatory variables to increase the explanatory power of the
model. If we add irrelevant explanatory variables to a model, the estimators remain
unbiased, but there is an increase in the variance of the estimators. This makes
forecasts and analyses based on such models unreliable.
In order to overcome this difficulty, we use the ‘adjusted R²’. It is denoted by R̄²
and defined as follows:
R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)]
   = 1 − (1 − R²)(n − 1)/(n − k) … (14.4)
where RSS is the residual sum of squares, TSS is the total sum of squares, n is the
number of observations, and k is the number of parameters estimated.
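As a quick illustration of equation (14.4), the following Python sketch (with assumed, purely illustrative data) fits a model with and without an irrelevant regressor: R² rises mechanically, while the adjusted R² typically falls.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                 # irrelevant regressor
y = 3.0 + 2.0 * x1 + rng.normal(size=n)   # true model uses only x1

def r2_and_adj(y, X):
    """Return R^2 and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k)."""
    n, k = X.shape                        # k counts the intercept column too
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    return r2, 1 - (1 - r2) * (n - 1) / (n - k)

ones = np.ones(n)
print(r2_and_adj(y, np.column_stack([ones, x1])))        # baseline model
print(r2_and_adj(y, np.column_stack([ones, x1, junk])))  # R^2 up, adjusted R^2 typically down
```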
14.3.2 Akaike Information Criterion
Like the R̄², the Akaike Information Criterion (AIC) imposes a penalty for adding
regressors to the model. It is based on the residual sum of squares (RSS) and is
defined as follows:
AIC = e^(2k/n) · (RSS/n) … (14.5)
Taking natural logarithms,
ln AIC = (2k/n) + ln(RSS/n) … (14.6)
where ln AIC is the natural log of AIC and (2k/n) is the penalty factor.
Remember that the model with a lower value of lnAIC is considered to be better.
Thus, when we compare two models by using the AIC criterion, the model with
lower value of AIC has a better specification. The logic is simple: an
econometric model that reduces the residual sum of squares, without an
offsetting increase in the penalty factor, is a better specified model.
14.3.3 Schwarz Information Criterion
The Schwarz Information Criterion (SIC) also relies on the RSS, like the AIC
criterion mentioned above. This method also is popular for analysing correct
specification of an econometric model. The SIC is defined as follows:
SIC = n^(k/n) · (RSS/n) = n^(k/n) · (Σû_i²/n) … (14.7)
Taking natural logarithms,
ln SIC = (k/n) ln n + ln(RSS/n) … (14.8)
where [(k/n) ln n] is the penalty factor. Note that the SIC criterion imposes a
harsher penalty for the inclusion of explanatory variables than the AIC
criterion does.
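The following Python sketch applies equations (14.6) and (14.8) to two hypothetical models; the RSS figures and regressor counts are assumed purely for illustration. (Statistical software often reports likelihood-based AIC/SIC values that differ from these by constants, but the model rankings agree.)

```python
import numpy as np

def ln_aic(rss, n, k):
    # ln AIC = 2k/n + ln(RSS/n), as in equation (14.6)
    return 2 * k / n + np.log(rss / n)

def ln_sic(rss, n, k):
    # ln SIC = (k/n) ln n + ln(RSS/n), as in equation (14.8);
    # its penalty (k/n) ln n exceeds AIC's 2k/n once n > e^2 (about 7.4)
    return (k / n) * np.log(n) + np.log(rss / n)

n = 50
# Model A: 3 regressors (incl. intercept); Model B: 6 regressors, slightly lower RSS
for name, rss, k in [("A", 120.0, 3), ("B", 112.0, 6)]:
    print(name, "lnAIC =", round(ln_aic(rss, n, k), 4),
                "lnSIC =", round(ln_sic(rss, n, k), 4))
# The model with the lower lnAIC (or lnSIC) is preferred; here the small
# drop in RSS does not justify the three extra regressors.
```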
14.3.4 Mallows’ C_p Criterion
When we do not include all the relevant variables in a model, the estimators are
biased. Mallows’ C_p criterion evaluates this bias to find out whether there
is a significant deviation from the unbiased estimators. Thus, Mallows’ C_p
criterion helps us in selecting the best among competing econometric models.
If some of the explanatory variables are dropped from a model, there is an
increase in the residual sum of squares (RSS). Let us assume that the true model
has k regressors. For this model, σ̂² is the estimator of the true σ². Now, suppose
we estimate a truncated model containing only p of these regressors (p ≤ k), and
let RSS_p denote the residual sum of squares obtained from this truncated model.
Mallows’ C_p criterion is based on the following formula:
C_p = RSS_p/σ̂² − (n − 2p) ... (14.9)
For a model with little specification bias, C_p is expected to be close to p;
models whose C_p lies far above p are suspect.
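A short illustrative computation of equation (14.9) in Python follows; the data-generating process and coefficient values are assumptions. Here σ̂² is estimated from the full model, and C_p is evaluated for truncated models with p regressors each.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 80
# Full model: intercept plus four regressors; only the first two matter
X_full = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(4)])
y = X_full[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def rss(y, X):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta) ** 2)

k = X_full.shape[1]                    # 5 regressors in the full model
sigma2_hat = rss(y, X_full) / (n - k)  # estimate of sigma^2 from the full model

for p in range(2, k + 1):              # truncated models with p regressors
    cp = rss(y, X_full[:, :p]) / sigma2_hat - (n - 2 * p)
    print(f"p = {p}: Cp = {cp:.2f}")   # Cp close to p signals little bias
```

With p = 2 a relevant regressor is omitted and C_p is far above p; from p = 3 onwards C_p stays near p, consistent with the rule of thumb above.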
In the end, you should go by the theoretical appropriateness of including or
excluding a variable. In order to have a correctly specified model, a thorough
understanding of the theoretical concepts and the related literature is necessary.
Also, the model that we fit will only be as good as the data that we have
collected. If the data collected do not suffer from, say, multicollinearity or
autocorrelation, we are likely to have a more robust model.
As mentioned earlier, the criteria for selecting an appropriate model rest
primarily on the theory behind it and the strength of the collected data. Many a
time, we observe a certain relationship between two variables. Such a relationship,
however, may be superficial or spurious. Let us take an example. At a traffic light,
cars stop when the signal is red. It does not mean that cars cannot move when
there is a red light in front of them, or that the traffic light has some damaging
effect on moving cars. The reason is observance of traffic rules. If we go by
observation alone, without looking into the traffic rules, our reasoning will be
wrong. More generally, the dependent variable and the independent variable may
both be affected by another variable; in such cases the relationship is confounded.
You should note one more issue regarding the selection of econometric models.
Different test criteria may suggest different models. For example, economic logic
may suggest that there could be two possible econometric models (say, model A and
model B) for a particular issue. You may come across a situation where the R̄²
test suggests model A while the AIC criterion suggests model B. In such situations
you should carry out a number of tests and only then choose the best model.
Adjusted R², Mallows’ C_p, p-values, etc. may point to different regression
equations without much clarity for the econometrician. Thus, we conclude that
none of the methods for model selection listed above is adequate by itself.
There is no substitute for a theoretical understanding of the related literature,
accurately collected data, a practical understanding of the problem, and common
sense while specifying an econometric model. We will discuss the model selection
criteria further in the course BECC 142: Applied Econometrics.
Check Your Progress 1
1) Explain why R̄² is a better criterion than R² in model specification.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2) Explain how the AIC and SIC criteria are applied in the selection of
econometric models.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
APPENDIX TABLES
Table A1: Normal Area Table
(Each entry gives the area under the standard normal curve between 0 and Z)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Table A2: Critical Values of Chi-squared Distribution
Table A3: Critical Values of t Distribution
Table A4: Critical Values of F Distribution
(5% level of significance)
df2/df1 1 2 3 4 5 6 7 8 9 10
1 161.448 199.500 215.707 224.583 230.162 233.986 236.768 238.883 240.543 241.882
2 18.513 19.000 19.164 19.247 19.296 19.330 19.353 19.371 19.385 19.396
3 10.128 9.552 9.277 9.117 9.014 8.941 8.887 8.845 8.812 8.786
4 7.709 6.944 6.591 6.388 6.256 6.163 6.094 6.041 5.999 5.964
5 6.608 5.786 5.410 5.192 5.050 4.950 4.876 4.818 4.773 4.735
6 5.987 5.143 4.757 4.534 4.387 4.284 4.207 4.147 4.099 4.060
7 5.591 4.737 4.347 4.120 3.972 3.866 3.787 3.726 3.677 3.637
8 5.318 4.459 4.066 3.838 3.688 3.581 3.501 3.438 3.388 3.347
9 5.117 4.257 3.863 3.633 3.482 3.374 3.293 3.230 3.179 3.137
10 4.965 4.103 3.708 3.478 3.326 3.217 3.136 3.072 3.020 2.978
11 4.844 3.982 3.587 3.357 3.204 3.095 3.012 2.948 2.896 2.854
12 4.747 3.885 3.490 3.259 3.106 2.996 2.913 2.849 2.796 2.753
13 4.667 3.806 3.411 3.179 3.025 2.915 2.832 2.767 2.714 2.671
14 4.600 3.739 3.344 3.112 2.958 2.848 2.764 2.699 2.646 2.602
15 4.543 3.682 3.287 3.056 2.901 2.791 2.707 2.641 2.588 2.544
16 4.494 3.634 3.239 3.007 2.852 2.741 2.657 2.591 2.538 2.494
17 4.451 3.592 3.197 2.965 2.810 2.699 2.614 2.548 2.494 2.450
18 4.414 3.555 3.160 2.928 2.773 2.661 2.577 2.510 2.456 2.412
19 4.381 3.522 3.127 2.895 2.740 2.628 2.544 2.477 2.423 2.378
20 4.351 3.493 3.098 2.866 2.711 2.599 2.514 2.447 2.393 2.348
21 4.325 3.467 3.073 2.840 2.685 2.573 2.488 2.421 2.366 2.321
22 4.301 3.443 3.049 2.817 2.661 2.549 2.464 2.397 2.342 2.297
23 4.279 3.422 3.028 2.796 2.640 2.528 2.442 2.375 2.320 2.275
24 4.260 3.403 3.009 2.776 2.621 2.508 2.423 2.355 2.300 2.255
25 4.242 3.385 2.991 2.759 2.603 2.490 2.405 2.337 2.282 2.237
26 4.225 3.369 2.975 2.743 2.587 2.474 2.388 2.321 2.266 2.220
27 4.210 3.354 2.960 2.728 2.572 2.459 2.373 2.305 2.250 2.204
28 4.196 3.340 2.947 2.714 2.558 2.445 2.359 2.291 2.236 2.190
29 4.183 3.328 2.934 2.701 2.545 2.432 2.346 2.278 2.223 2.177
30 4.171 3.316 2.922 2.690 2.534 2.421 2.334 2.266 2.211 2.165
40 4.085 3.232 2.839 2.606 2.450 2.336 2.249 2.180 2.124 2.077
60 4.001 3.150 2.758 2.525 2.368 2.254 2.167 2.097 2.040 1.993
120 3.920 3.072 2.680 2.447 2.290 2.175 2.087 2.016 1.959 1.911
inf 3.842 2.996 2.605 2.372 2.214 2.099 2.010 1.938 1.880 1.831
Table A4: Critical Values of F Distribution (Contd.)
(5% level of significance)
Table A4: Critical Values of F Distribution (contd.)
(1% level of significance)
df2/df1 1 2 3 4 5 6 7 8 9 10
1 4052.181 4999.500 5403.352 5624.583 5763.650 5858.986 5928.356 5981.070 6022.473 6055.847
2 98.503 99.000 99.166 99.249 99.299 99.333 99.356 99.374 99.388 99.399
3 34.116 30.817 29.457 28.710 28.237 27.911 27.672 27.489 27.345 27.229
4 21.198 18.000 16.694 15.977 15.522 15.207 14.976 14.799 14.659 14.546
5 16.258 13.274 12.060 11.392 10.967 10.672 10.456 10.289 10.158 10.051
6 13.745 10.925 9.780 9.148 8.746 8.466 8.260 8.102 7.976 7.874
7 12.246 9.547 8.451 7.847 7.460 7.191 6.993 6.840 6.719 6.620
8 11.259 8.649 7.591 7.006 6.632 6.371 6.178 6.029 5.911 5.814
9 10.561 8.022 6.992 6.422 6.057 5.802 5.613 5.467 5.351 5.257
10 10.044 7.559 6.552 5.994 5.636 5.386 5.200 5.057 4.942 4.849
11 9.646 7.206 6.217 5.668 5.316 5.069 4.886 4.744 4.632 4.539
12 9.330 6.927 5.953 5.412 5.064 4.821 4.640 4.499 4.388 4.296
13 9.074 6.701 5.739 5.205 4.862 4.620 4.441 4.302 4.191 4.100
14 8.862 6.515 5.564 5.035 4.695 4.456 4.278 4.140 4.030 3.939
15 8.683 6.359 5.417 4.893 4.556 4.318 4.142 4.004 3.895 3.805
16 8.531 6.226 5.292 4.773 4.437 4.202 4.026 3.890 3.780 3.691
17 8.400 6.112 5.185 4.669 4.336 4.102 3.927 3.791 3.682 3.593
18 8.285 6.013 5.092 4.579 4.248 4.015 3.841 3.705 3.597 3.508
19 8.185 5.926 5.010 4.500 4.171 3.939 3.765 3.631 3.523 3.434
20 8.096 5.849 4.938 4.431 4.103 3.871 3.699 3.564 3.457 3.368
21 8.017 5.780 4.874 4.369 4.042 3.812 3.640 3.506 3.398 3.310
22 7.945 5.719 4.817 4.313 3.988 3.758 3.587 3.453 3.346 3.258
23 7.881 5.664 4.765 4.264 3.939 3.710 3.539 3.406 3.299 3.211
24 7.823 5.614 4.718 4.218 3.895 3.667 3.496 3.363 3.256 3.168
25 7.770 5.568 4.675 4.177 3.855 3.627 3.457 3.324 3.217 3.129
26 7.721 5.526 4.637 4.140 3.818 3.591 3.421 3.288 3.182 3.094
27 7.677 5.488 4.601 4.106 3.785 3.558 3.388 3.256 3.149 3.062
28 7.636 5.453 4.568 4.074 3.754 3.528 3.358 3.226 3.120 3.032
29 7.598 5.420 4.538 4.045 3.725 3.499 3.330 3.198 3.092 3.005
30 7.562 5.390 4.510 4.018 3.699 3.473 3.304 3.173 3.067 2.979
40 7.314 5.179 4.313 3.828 3.514 3.291 3.124 2.993 2.888 2.801
60 7.077 4.977 4.126 3.649 3.339 3.119 2.953 2.823 2.718 2.632
120 6.851 4.787 3.949 3.480 3.174 2.956 2.792 2.663 2.559 2.472
inf 6.635 4.605 3.782 3.319 3.017 2.802 2.639 2.511 2.407 2.321
Table A4: Critical Values of F Distribution (contd.)
(1% level of significance)
Table A5: Durbin-Watson d-statistic (level of significance = 0.05; k = number of regressors)
GLOSSARY
Association : It refers to the connection or relationship between
variables
Econometric Model : These are statistical models specifying relationships
between various economic quantities.
MWD test : This is the test for the selection of the appropriate
functional form for a regression, as proposed by
MacKinnon, White and Davidson. The test is hence
known as the MWD test.
Type II Error : The error that occurs when we accept (fail to reject)
a null hypothesis that is actually false. The probability
of committing a Type II error is conventionally denoted by β.