
Arba Minch University

College of Business and Economics


Department of Economics

Econometrics II (Econ 3062)

MODULE

Prepared By: Kasahun Tilahun (MSc)

December, 2022/23

CHAPTER ONE
REGRESSION ANALYSIS WITH QUALITATIVE INFORMATION

Introduction
Both the dependent and explanatory variables that we have encountered in the preceding chapters were essentially quantitative (continuous). Regression models can also handle qualitative variables as dependent and explanatory variables. In this chapter, we shall consider models that may involve not only quantitative variables but also qualitative variables. Such variables are also known as indicator variables, categorical variables, nominal scale variables or dummy variables.

To be specific, in this chapter we shall discuss two main topics: (1) regression with qualitative explanatory
variables and (2) regression with qualitative dependent variables.

1.1. REGRESSION WITH DUMMY EXPLANATORY VARIABLES

1.1.1 The Nature of Dummy Explanatory Variables

In regression analysis the dependent variable, or regressand, is frequently influenced not only by ratio scale variables (e.g. income, output, prices, costs, height, temperature) but also by variables that are essentially qualitative, or nominal scale, in nature. Examples are sex, race, color, religion, nationality, geographical region and political party membership. The importance of such variables is evidenced by several studies. For instance, studies found that female workers earn less than male workers, holding other factors constant. Also, in multicultural contexts, black workers are found to earn less than whites. This clearly shows that (whatever the reasons) qualitative variables such as sex and race can influence the dependent variable and should explicitly be included among the explanatory variables.

How do you think these variables will be included in regression analysis?


Qualitative variables can be used in regression models just as easily as quantitative variables. However, to include qualitative variables in empirical analysis, one has to change them into quantitative variables. One way to quantify nominal scale variables is by creating artificial variables that take on values of 1 or 0, where 1 indicates the presence of a certain attribute/quality and 0 indicates the absence of that attribute. This artificial representation is possible since nominal scale variables usually indicate the presence or absence of a certain quality or attribute. For example, if an investigator encounters "the sex of a household head" as a nominal variable in a regression analysis, he/she can use 1 to indicate that the household head is male and 0 to designate that the household head is female. Variables that artificially assume the values 0 and 1 are called dummy variables. Hence, dummy variables can be considered as a method to classify data on qualitative variables into mutually exclusive categories such as presence or absence of an attribute.

In a given regression model, the qualitative and quantitative variables may also occur together, i.e., some
variables may be qualitative and others are quantitative. When all explanatory variables are
- quantitative, then the model is called a regression model,
- qualitative, then the model is called an analysis of variance model (ANOVA) and
- quantitative and qualitative both, then the model is called analysis of covariance model (ANCOVA).

 Such models can be dealt with within the framework of regression analysis. The usual tools of regression analysis can be used in the case of dummy variables.

1.1.2 Regression with only Qualitative Variables (ANOVA Models)


To exemplify the application of dummy variables in regression analysis, let's suppose that an investigator wants to find out whether average wheat farm productivity (yield) differs between male- and female-headed households. This objective can be accomplished within the framework of regression analysis.

To see this, consider the following regression model:

Y_i = \beta_1 + \beta_2 D_i + u_i    (1.1)

where Y_i = wheat farm productivity of the i-th wheat-producing household, and

D_i = 1 if the household head is male; D_i = 0 if the household head is female.

Note that equation (1.1) has only one dummy variable since the grouping (qualitative) variable in equation (1.1) has two categories. Similarly, if a qualitative variable has three categories, we introduce only two dummy variables. For instance, suppose that the investigator we considered above wants to see whether wheat farm productivity depends on the educational attainment of the household head, where educational attainment is divided into three mutually exclusive categories: illiterate, elementary school, and high school and above. The model can be stated as follows:

Y_i = \beta_1 + \beta_2 D_{1i} + \beta_3 D_{2i} + u_i    (1.2)

where D_{1i} = 1 if the household head has elementary education and 0 otherwise, and D_{2i} = 1 if the household head has high school and above schooling and 0 otherwise, so that illiterate household heads form the benchmark category.

What does the model (1.2) tell us? Assuming that the error term satisfies the usual assumptions of OLS, taking expectations on both sides of (1.2), we obtain:
 Mean wheat farm productivity of households with elementary education: E(Y_i | D_{1i} = 1, D_{2i} = 0) = \beta_1 + \beta_2

 Mean wheat farm productivity of households with high school and above schooling: E(Y_i | D_{1i} = 0, D_{2i} = 1) = \beta_1 + \beta_3

 Mean wheat farm productivity of illiterate households: E(Y_i | D_{1i} = 0, D_{2i} = 0) = \beta_1

In other words, in the multiple regression equation (1.2), the mean wheat farm productivity of illiterate households is given by the intercept, \beta_1, and the slope coefficients \beta_2 and \beta_3 tell by how much the mean wheat farm productivity of households with elementary education and with high school and above schooling differs from the mean wheat farm productivity of illiterate households. But how do we know that these differences are statistically significant? To address this question, let's consider a numerical example.

Numerical Example-1:

Given data of the following form on wheat farm productivity (yield) and educational attainment of household heads based on a sample of 120 farm households, we can estimate yield as a function of educational attainment, which is given in three categories: illiterate, elementary, and high school & above.

Table-1.1: data on wheat farm productivity and educational attainment

Obs Yield D1 D2 Obs Yield D1 D2


1 12 0 0 8 12 0 0
2 9 1 0 9 12.8 1 0
3 10.6 0 0 10 12.8 1 0
4 13.4 0 1 . . . .
5 12 1 0 . . . .
6 12 1 0 119 15.6 0 1
7 10.6 1 0 120 18 0 1

Based on the above data, if we estimate equation (1.2), we obtain a fitted regression of the form \hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 D_{1i} + \hat{\beta}_3 D_{2i}.

Since we are treating illiterate household heads as the benchmark, the coefficients attached to the various dummies are differential intercepts, showing by how much the average value of Y in the category that receives 1 differs from that of the benchmark category. That is, as the regression result shows, the coefficients of D_1 and D_2 tell by how much the average yield of households with elementary education and of households with high school and above schooling exceeds the average yield of households with no schooling. Therefore, the actual mean yields of households with elementary education and with high school and above schooling can be obtained by adding these differential yields to the benchmark mean. Doing so, we obtain that the mean yield of households with elementary education is 12.66 quintals and that of households with high school and above is 14 quintals.

But how do we know that these mean yields are statistically different from the mean yield of illiterate households, the benchmark category? This is straightforward. All we have to do is to find out whether each of the slope coefficients in the above equation is statistically significant. As can be seen from the regression result, the estimated slope coefficients for both households with elementary education and households with high school and above schooling are statistically significant, as implied by the P-values of the t-statistics.
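To make the mechanics concrete, here is a minimal sketch of how such an ANOVA-type dummy regression could be run, assuming Python with pandas and statsmodels rather than the Stata used later in this module; the data frame only reuses the handful of rows visible in Table 1.1, so the estimates will not match the full 120-household results.

```python
import pandas as pd
import statsmodels.api as sm

# A few rows visible in Table 1.1: yield in quintals and education of the head
df = pd.DataFrame({
    "yield_q": [12.0, 9.0, 10.6, 13.4, 12.0, 12.0, 10.6, 12.0, 12.8, 12.8, 15.6, 18.0],
    "educ":    ["illiterate", "elementary", "illiterate", "high_school",
                "elementary", "elementary", "elementary", "illiterate",
                "elementary", "elementary", "high_school", "high_school"],
})

# Create only m - 1 = 2 dummies, treating 'illiterate' as the benchmark category
dummies = pd.get_dummies(df["educ"], dtype=float)[["elementary", "high_school"]]
X = sm.add_constant(dummies)     # the constant picks up the benchmark group's mean

model = sm.OLS(df["yield_q"], X).fit()
print(model.summary())           # slope coefficients are the differential intercepts
print(model.params)              # const + dummy coefficient = that group's mean yield
```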

1.1.3 Regression with a mixture of Qualitative and Quantitative Explanatory Variables: ANCOVA
Sometimes an investigator needs a regression analysis that assesses the statistical significance of the relationship between a quantitative dependent variable and a mixture of quantitative and qualitative explanatory variables. In such cases one often uses dummy variable models with some assumptions about their use. For instance, the implicit assumption in this case is that the regression lines for the different groups differ only in the intercept term but have the same slope coefficients.

To exemplify the application of dummy variables with the above implicit assumption, let us suppose that the investigator we considered above wants to analyze the relationship between wheat farm productivity and chemical fertilizer application for two groups: male- and female-headed households. The relationship can be represented by the following equation:

Y_i = \beta_1 + \beta_2 D_i + \beta_3 X_i + u_i    (1.4)

where Y_i = the i-th household's wheat farm productivity in quintals,
X_i = chemical fertilizer applied in kg, and
D_i = 1 if the household is male-headed and 0 if it is female-headed.

What does the model (1.4) tell us? Assuming that the error term satisfies the usual assumptions of OLS, taking expectations on both sides of (1.4), we obtain:
 Mean wheat farm productivity of male-headed households: E(Y_i | D_i = 1, X_i) = (\beta_1 + \beta_2) + \beta_3 X_i

This is a straight-line relationship with intercept \beta_1 + \beta_2 and slope \beta_3.
 Mean wheat farm productivity of female-headed households: E(Y_i | D_i = 0, X_i) = \beta_1 + \beta_3 X_i

This is a straight-line relationship with intercept \beta_1 and slope \beta_3.
A graphical representation of equation (1.4) is given as:

Figure 1.1: Regression lines with a common slope and different intercepts

It is evident from equation (1.4) that the coefficient of the dummy variable, \beta_2, measures the difference between the two intercept terms. Hence, it is called the differential intercept coefficient because it tells by how much the intercept of the category that receives the value of 1 differs from the intercept of the benchmark category.

Note that, as we did above, if there is a constant term in the regression equation, the number of dummies defined should be one less than the number of categories of the variable. This is because the constant term is the intercept for the base group. As can be seen from equation (1.4), the constant term, \beta_1, measures the intercept for female-headed households. Furthermore, the constant term plus the coefficient of D_i, that is \beta_1 + \beta_2, measures the intercept for male-headed households. This interpretation holds as long as the base group is female-headed households. But this should not connote that the base group is always female-headed households. Any one group may be chosen as the base group, depending on the preference of the investigator.

On the other hand, if we do not introduce a constant term in the regression equation, we can define a dummy variable for each group, and in this case the coefficients of the dummy variables measure the intercepts for the respective groups. However, if we include both the constant term and dummies for all categories of a variable, we will encounter the problem of perfect multicollinearity, and the regression program either will not run or will omit one of the dummies automatically.

In general, when we are introducing dummy variables, we have to follow the following rule. If the qualitative variable has m categories and the regression equation has a constant intercept, introduce only m - 1 dummy variables; otherwise we will fall into what is called the dummy variable trap, that is, the problem of perfect multicollinearity. A small sketch of this rule is given below.
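As a quick illustration of the m - 1 rule, the following sketch (assuming pandas; the `region` variable is made up) shows why keeping all the dummies alongside a constant creates perfect multicollinearity, and how dropping one category avoids it.

```python
import pandas as pd

region = pd.Series(["north", "south", "east", "south", "north", "east"], name="region")

# All m = 3 dummies together with a constant would be perfectly collinear (dummy trap)
all_dummies = pd.get_dummies(region, dtype=float)

# Keeping m - 1 = 2 dummies: the dropped category ("east", first alphabetically)
# becomes the benchmark absorbed by the intercept
safe_dummies = pd.get_dummies(region, drop_first=True, dtype=float)

print(all_dummies.sum(axis=1))   # always 1: identical to the constant column
print(safe_dummies.head())
```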

As you might have noted, so far we have been dealing with regression analysis where there is only one qualitative variable. In practice, however, we may have more than one qualitative variable affecting the dependent variable. Then, how will you introduce more than one qualitative variable in your regression analysis?

The introduction of dummy variables for more than one qualitative variable in a regression analysis is straightforward. That is, for each qualitative variable we introduce one or more dummies following the rule mentioned above. For example, suppose that we want to analyze the determinants of household consumption (C), and hence we have data on:
1. Y: income of households
2. S: the sex of the head of the household.
3. A: age of the head of the household, which is given in three categories: below 25 years, 25 to 50 years, and above 50 years.
4. E: education of the head of the household, also in three categories: below high school, high school but below college degree, and college degree and above.

Following the rule of dummy variables, we can include these qualitative variables in the form of dummy variables as follows:

C_i = \alpha + \beta Y_i + \gamma_1 D_{1i} + \gamma_2 D_{2i} + \gamma_3 D_{3i} + \gamma_4 D_{4i} + \gamma_5 D_{5i} + u_i    (1.5)

where D_1 = 1 if the household head is male and 0 if female; D_2 = 1 if the head is aged 25 to 50 years and 0 otherwise; D_3 = 1 if the head is aged above 50 years and 0 otherwise; D_4 = 1 if the head has high school but below a college degree and 0 otherwise; and D_5 = 1 if the head has a college degree and above and 0 otherwise. The base group is therefore a female-headed household whose head is below 25 years of age with below high school education.

For each qualitative variable the number of dummy variables is one less than the number of classifications. The assumption made in the dummy-variable method is that it is only the intercept that changes for each group but not the slope coefficient (i.e., the coefficient of Y). The intercept term for each individual is obtained by substituting the appropriate values of D_1 through D_5. For instance, for a male head, aged below 25, with a college degree, we have D_1 = 1, D_2 = D_3 = D_4 = 0, D_5 = 1, and hence the intercept is \alpha + \gamma_1 + \gamma_5. For a female head, aged above 50 years, with a college degree, we have D_1 = 0, D_2 = 0, D_3 = 1, D_4 = 0, D_5 = 1, and hence the intercept term is \alpha + \gamma_3 + \gamma_5.

Furthermore, the coefficients of the dummies are interpreted as the differences between the average consumption of the omitted (base) category and that of the category represented by the dummy under consideration, keeping other things constant. For instance, \gamma_1 in equation (1.5) above is interpreted as the amount by which, keeping other things constant, the average consumption expenditure of male-headed households is greater (or less) than that of their female-headed counterparts.

The other most important question, perhaps, is how to check the statistical significance of the differences among groups (if any). For this purpose we simply test whether the coefficients of the dummies are statistically significant or not by using the usual procedures of hypothesis testing, i.e., the standard error test or the Student's t-test.

1.1.4 Dummy Variables in Seasonal Analysis


Many economic time series that are based on monthly or quarterly data exhibit seasonal patterns, i.e., they show
regular oscillatory movements. Examples are sales of department stores at Christmas and other major holiday
times, demand for money by households at holiday times, prices of crops right after harvesting season, demand
for air travel, etc.

Often it is desirable to remove the seasonal factor from a time series, so that one can concentrate on the other components, such as the trend. The process of removing the seasonal component from a time series is known as deseasonalization or seasonal adjustment, and the time series thus obtained is called the deseasonalized, or seasonally adjusted, time series. Usually the important economic series, such as the unemployment rate, the consumer price index (CPI), the producer price index (PPI), and the industrial production index that we see in different reports are published in their seasonally adjusted form.

One of the methods used to deseasonalize a time series is the method of dummy variables. Furthermore, regression on seasonal dummy variables, such as quarterly or monthly dummies, is a simple way to estimate seasonal effects in time series data sets.

Example:
To illustrate the dummy variables technique in seasonal analysis, suppose that we have quarterly data on sales
of refrigerators over the years 1978 through 1985 given in the following table.
Table-1.2: Quarterly Data on Sales of Refrigerators (in thousands), 1978-I to 1985-IV

Year    I       II      III     IV
1978    1317    1615    1662    1295
1979    1271    1555    1639    1238
1980    1277    1258    1417    1185
1981    1196    1410    1417    919
1982    943     1175    1269    973
1983    1102    1344    1641    1225
1984    1429    1699    1749    1117
1985    1242    1684    1764    1328

But first let us look at the data, which are shown in Figure 1.2 below.

Figure 1.2: Quarterly Data on Sales of Refrigerators (in thousands), 1978-I to 1985-IV

This figure suggests that perhaps there is a seasonal pattern in the data associated with the various quarters. How can we measure such seasonality in refrigerator sales? We can use a seasonal dummy regression to estimate the seasonal effect. To do this, let's treat the first quarter as the reference quarter and assign dummies to the second, third, and fourth quarters (i.e., the data setup looks like the table below). That is, we estimate the following model:

FRIG_t = \alpha_1 + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + u_t    (1.6)

where FRIG_t = sales of refrigerators (in thousands) and D_2, D_3 and D_4 are seasonal dummies defined as D_{jt} = 1 if the observation falls in quarter j (j = 2, 3, 4) and 0 otherwise.

Table-1.3: Quarterly Data on Sales of refrigerators (in thousands) with seasonal dummies


Quarter FRIG D2 D3 D4 Quarter FRIG D2 D3 D4
1978-I 1317 0 0 0 1984-I 1277 0 0 0
-II 1615 1 0 0 -II 1258 1 0 0
-III 1662 0 1 0 . . . . .
-IV 1295 0 0 1 . . . . .
1979-I 1271 0 0 0 1985-I 1242 0 0 0
-II 1555 1 0 0 -II 1684 1 0 0
-III 1639 0 1 0 -III 1764 0 1 0
-IV 1238 0 0 1 -IV 1328 0 0 1

From the data on refrigerator sales given in Table 1.3, we obtain the following regression results:

\widehat{FRIG}_t = 1222.125 + 245.375 D_{2t} + 347.625 D_{3t} - 62.125 D_{4t}    (1.7)

where the coefficients on the seasonal dummies are the differences between the mean sales of the corresponding quarter and those of the first (benchmark) quarter.

Interpretation:
Since we are treating the first quarter as the benchmark, the coefficients attached to the various dummies are differential intercepts, showing by how much the average value of FRIG in the quarter that receives a dummy value of 1 differs from that of the benchmark quarter. Put differently, the coefficients on the seasonal dummies give the seasonal increase or decrease in the average value of Y relative to the base season. For instance, the coefficient of the quarter-2 dummy, 245.375, means that the average volume of refrigerator sales in the second quarter is larger than in the first quarter by about 245 thousand units, and the difference is statistically significant at the 1% level of significance. If you add the various differential intercept values to the benchmark average value of 1222.125, you will get the average value for the various quarters.
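The seasonal-dummy regression above can be reproduced with a short script. This is a sketch assuming Python with statsmodels rather than the Stata used in the module; the FRIG series is typed in from Table 1.2.

```python
import pandas as pd
import statsmodels.api as sm

# Quarterly refrigerator sales (thousands), 1978-I to 1985-IV, from Table 1.2
frig = [1317, 1615, 1662, 1295, 1271, 1555, 1639, 1238,
        1277, 1258, 1417, 1185, 1196, 1410, 1417,  919,
         943, 1175, 1269,  973, 1102, 1344, 1641, 1225,
        1429, 1699, 1749, 1117, 1242, 1684, 1764, 1328]

df = pd.DataFrame({"frig": frig})
df["quarter"] = [q for _ in range(8) for q in (1, 2, 3, 4)]   # 8 years x 4 quarters

# Quarter 1 is the benchmark; D_2-D_4 are the seasonal dummies
dummies = pd.get_dummies(df["quarter"], prefix="D", drop_first=True, dtype=float)
X = sm.add_constant(dummies)
res = sm.OLS(df["frig"], X).fit()

print(res.params)   # intercept = Q1 mean (1222.125); dummies = differential intercepts
```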

1.2. Regression with Dummy Dependent Variables

So far in this chapter, we have been considering models where the explanatory variables are qualitative or
dummy variables.

In this section, we shall look at models where the dependent variable, Y, is itself a qualitative variable. Such models are called limited dependent variable models or qualitative (categorical) response models. The dummy dependent variable can take on two or more values; however, we shall concentrate on the binary case where Y can take only two values. For example, suppose we want to study the determinants of house ownership by households in Arba Minch town. Since a household may either own a house or not, house ownership is a Yes or No decision. Hence, the response variable, or regressand, can take only two values, say, 1 if the household owns a house and 0 if it does not own a house. A second example would be a model of women's labor force participation (LFP). The dependent variable in this case is labor force participation, which would take the value of one if the woman participates in the labor force and a value of zero if she does not. A third example would be the poverty status of households in rural Ethiopia. The dependent variable in this case is the poverty status of the household, which would take the value of one if the household is poor and zero if the household is non-poor. In all of the above instances, the dependent variable is a dichotomous (dummy, binary, qualitative, limited) variable.

To explain models of the above nature, various types of explanatory variables could be included: both continuous variables, such as age and income, and dichotomous variables, such as sex and race, or exclusively continuous or exclusively qualitative variables.

If the dependent or response variable (Y) is qualitative (dichotomous, limited, dummy or binary) in nature and Y assumes only a restricted set of values, we call the model a Limited Dependent Variable Model (LDVM).

Binary choice models are commonly used for micro analysis in social sciences and medical research. There are three types of Limited Dependent Variable Models:
A. The Linear Probability Models
B. The Logit Models
C. The Probit Models

1.2.1 The Linear Probability Models


The linear probability model applies the linear model

Y_i = \beta_1 + \beta_2 X_i + u_i    (1.8)

to the case where Y is a dichotomous variable taking the value zero or one, where we require the assumption that E(u_i) = 0.
 The conditional expectation of the i-th observation is given by:

E(Y_i | X_i) = \beta_1 + \beta_2 X_i

The conditional expectation of the dependent variable is equal to the probability of something happening, P_i = P(Y_i = 1 | X_i).

The conditional expectation in this model therefore has to be interpreted as the probability that Y_i = 1 given the particular value of X_i. In practice, however, there is nothing in this model to ensure that these probabilities will lie in the admissible range (0, 1). Since Y_i can only take the two values 0 or 1, it follows that the error term can only take the two values -\beta_1 - \beta_2 X_i (when Y_i = 0) or 1 - \beta_1 - \beta_2 X_i (when Y_i = 1).

 What do you think is the difference between a regression model where the regressand Y is quantitative and a model where it is qualitative?
In a model where Y is quantitative, our objective is to predict the average value of the dependent variable from the given values of the explanatory variables. That is, in such models the very objective of regression analysis is to determine E(Y_i | X_{1i}, X_{2i}, ..., X_{ki}). On the other hand, in models where Y is qualitative, our objective is to predict the probability of something happening, such as owning a smartphone, or owning a house, or being non-poor, etc. In other words, in binary regression models,

E(Y_i | X_i) = P(Y_i = 1 | X_i)

 That is why qualitative response regression models are often called probability models.

From the assumption that E(u_i) = 0, it follows that the probabilities of the two events Y_i = 1 and Y_i = 0 are given by P_i and 1 - P_i, respectively. Thus, the probability distribution of u_i can be given by the following table.

Value of Y_i    Value of u_i                  Probability
1               1 - \beta_1 - \beta_2 X_i     P_i
0               -\beta_1 - \beta_2 X_i        1 - P_i
Total                                         1

Both the dependent variable (Y_i) and the error term (u_i) assume only two values with some probabilities (P_i and 1 - P_i). A variable that assumes two values with given probabilities is called a Bernoulli variable and its distribution is called the Bernoulli distribution. Therefore, the dependent variable does not follow a normal distribution but the Bernoulli distribution, with mean E(Y_i) = P_i = \beta_1 + \beta_2 X_i and variance P_i(1 - P_i), while the disturbance term follows the Bernoulli distribution with a mean value of zero and variance P_i(1 - P_i). As can be seen, the variance of the disturbance term (u_i) is not constant; it is a function of the explanatory variable (X_i). Thus, the disturbance term in the case of the LPM is heteroscedastic. Therefore, the OLS estimators of the LPM are not efficient.

Limitations of Linear Probability Models (LPM)

The problem with the linear probability model (LPM) is that it models the probability that Y = 1 as a linear function of X. That means that if we were to fit an OLS regression line, we would get a straight line such that at higher values of X we would get predicted values of Y above 1 and at low values of X we would get predicted values of Y below 0. But we cannot have probabilities that fall below 0 or above 1.

If the values of the dependent variable (Y) are limited or discrete, we cannot simply use the ordinary least squares method (OLS). This is because, unless we also restrict the values of the explanatory variable (X), we may get negative predicted values of the dependent variable for small values of the explanatory variable and values greater than one for large values of the explanatory variable; that is, the estimated probability lies outside the limits of zero and one.

The following are the main problems with linear probability models:
A. The dependent variable (Y_i) and the disturbance term (u_i) are not normally distributed
One of the assumptions of the linear regression model is that the disturbance term (u_i), the dependent variable (Y_i) and the OLS estimates (\hat{\beta}) are all normally distributed. But, as we have noted above, both the dependent variable and the disturbance term follow the Bernoulli distribution. The error term (u_i) follows the Bernoulli distribution with an average value of zero and variance P_i(1 - P_i). Similarly, the dependent variable follows the Bernoulli distribution with mean P_i and variance P_i(1 - P_i):

var(u_i) = P_i(1 - P_i)
var(Y_i) = P_i(1 - P_i)

The variances of u_i and Y_i are the same and are a function of the explanatory variable (X_i). Thus, the variance of the error term, var(u_i), is not constant; it varies with the values of the explanatory variable (X_i). Since the error term (u_i) is not normally distributed, neither is the distribution of our estimates. If the estimates are not normally distributed, we cannot use them for hypothesis testing and estimation (inference). But this limitation of the LPM is not a serious problem, because as the sample size increases, the distribution of the error term (u_i) approaches the normal distribution.

B. Violation of the assumption of a homoskedastic error term


One of the assumptions of the linear regression model is that the variance of the disturbance term is constant (homoscedastic). Since the variance of the error term and the variance of the dependent variable are the same, the assumption of homoscedastic variance implies that the average deviation of the dependent variable (Y_i) from its estimated value is constant. However, in the LPM, the variance of the disturbance term (u_i) is given by the following expression:

var(u_i) = P_i(1 - P_i) = (\beta_1 + \beta_2 X_i)(1 - \beta_1 - \beta_2 X_i)    (1.9)

The variance of the error term is a function of the explanatory variable (X_i), which shows that the variance of u_i is heteroscedastic. That is, the variance of the error term varies with the explanatory variable, X_i.

C. Violation of one of the axioms of probability (the probability may lie outside the interval [0, 1])
Since E(Y_i | X_i) in the linear probability model measures the conditional probability (P_i) of the event Y = 1 occurring given X, it must necessarily lie between 0 and 1. Although this is true a priori, there is no guarantee that \hat{Y}_i, the estimator of E(Y_i | X_i), will necessarily fulfill this restriction, and this is the real problem with the estimation of the LPM by OLS. There are two ways to get rid of this problem. One is to estimate the LPM by the usual OLS method and find out whether the estimated \hat{Y}_i lie between 0 and 1. If some are less than 0 (that is, negative), we equate \hat{Y}_i with zero; if they are greater than 1, they can be equated with one. The second solution is to devise an estimating technique that will guarantee that the estimated conditional probabilities \hat{Y}_i lie between 0 and 1. In this respect, the Logit and Probit models will guarantee that the estimated probabilities indeed lie between the logical limits 0 and 1.

Figure 1.3: Scatter plot of the actual Y and the predicted probabilities against the explanatory variable (income)

As the above scatter plot shows (which is drawn based on the data on home ownership status given in Table 1.4 below), the predicted probability (\hat{Y}_i) lies outside the limiting values 0 and 1 for some values of X_i. This violates one of the axioms of probability, which states that a probability lies only between 0 and 1.

D. The conventionally computed value of R^2 is not dependable
To see why, consider Figure 1.3 above. Corresponding to a given X, Y is either 0 or 1. Therefore, all the Y values will lie either along the X axis or along the line corresponding to 1. That is, when we regress Y on the given explanatory variable (X) in the case of the linear regression model, we will get the fitted regression line shown above (i.e., Figure 1.3). Therefore, generally no LPM is expected to fit such a scatter well. As a result, the conventionally computed R^2 is likely to be much lower for such models; in most practical applications it ranges between 0.2 and 0.6.

Numerical Example
To illustrate the method of the Linear Probability Model and the points raised above about it, let's assume that we want to study the home ownership status of households in a given town. Assume that we have data for 40 households on home ownership status and income (in thousands of Birr) as given below.
Table-1.4: Hypothetical data on home ownership status (Y = 1 if the household owns a home, 0 otherwise) and income, X (thousands of Birr)
Household Y X Household Y X
1 0 8 21 1 24
2 1 20 22 0 16
3 1 18 23 0 12
4 0 11 24 0 11
5 0 12 25 0 16
6 1 19 26 0 11
7 1 20 27 1 20
8 0 13 28 0 18

9 1 20 29 0 11
10 0 10 30 0 10
11 1 17 31 1 17
12 1 18 32 0 13
13 0 14 33 1 21
14 0 25 34 1 20
15 1 6 35 0 11
16 1 19 36 0 8
17 1 16 37 1 17
18 0 10 38 1 16
19 0 8 39 0 7
20 1 18 40 1 17

Based on this data, the LPM estimated by OLS is as follows:

\hat{Y}_i = -0.500 + 0.0652 X_i    (1.10)

Interpretation of the Regression Result

The intercept of -0.500 gives the "probability" that a household with zero income will own a house. Since this value is negative, and since a probability cannot be negative, we treat this value as zero, which is sensible in the present instance. The slope value of 0.0652 means that for a unit change in income (here, Birr 1,000), on average the probability of owning a house increases by 0.0652, or about 6.52 percent.

Table-1.5 below shows the estimated probabilities, \hat{Y}_i, for the various income levels. The most noticeable feature of this table is that two estimated values are negative and two values are in excess of 1, demonstrating clearly the point made earlier that, although E(Y_i | X_i) is positive and less than 1, its estimators, \hat{Y}_i, need not necessarily be positive or less than 1. This is one reason that the LPM is not the recommended model when the dependent variable is dichotomous.

Table-1.5: Actual Y and predicted \hat{Y}

Y   \hat{Y}     Y   \hat{Y}     Y   \hat{Y}     Y   \hat{Y}
0   0.022       1   0.609       1   1.065       1   0.609
1   0.804       1   0.674       0   0.543       0   0.348
1   0.674       0   0.413       0   0.280       1   0.869
0   0.217       0   1.130       0   0.217       1   0.804
0   0.283       1   -0.109      0   0.543       0   0.217
1   0.739       1   0.739       0   0.217       0   0.022
1   0.804       1   0.543       1   0.804       1   0.609
0   0.348       0   0.152       0   0.674       1   0.543
1   0.804       0   0.022       0   0.217       0   -0.043
0   0.152       1   0.674       0   0.152       1   0.609
Even if the estimated \hat{Y}_i were all positive and less than 1, the LPM still suffers from the problem of heteroscedasticity, which can be seen readily from (1.9). As a consequence, we cannot trust the estimated standard errors reported for (1.10) above.
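A minimal sketch of fitting the LPM of Table 1.4 by OLS and inspecting which fitted "probabilities" escape the [0, 1] interval, assuming Python with pandas and statsmodels (the 40 observations are typed in from Table 1.4):

```python
import pandas as pd
import statsmodels.api as sm

# Home ownership (y) and income in thousands of Birr (x), from Table 1.4
y = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1,
     1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
x = [8, 20, 18, 11, 12, 19, 20, 13, 20, 10, 17, 18, 14, 25, 6, 19, 16, 10, 8, 18,
     24, 16, 12, 11, 16, 11, 20, 18, 11, 10, 17, 13, 21, 20, 11, 8, 17, 16, 7, 17]

X = sm.add_constant(pd.Series(x, name="income"))
lpm = sm.OLS(y, X).fit()
print(lpm.params)                            # roughly -0.50 and 0.0652, as in (1.10)

fitted = lpm.fittedvalues
print(fitted[(fitted < 0) | (fitted > 1)])   # fitted "probabilities" outside [0, 1]

# Heteroscedasticity-robust standard errors are one common patch for the LPM
print(lpm.get_robustcov_results(cov_type="HC1").summary())
```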

1.2.2. Alternatives to the Linear Probability Models (LPM)


In the previous section, we have seen the simplest and least commonly used type of binary regression model, called the linear probability model (LPM). The linear probability model (LPM) is plagued with the following problems:
 Non-normality of the distribution of the disturbance term (u_i)
 Heteroscedastic disturbance term
 Low value of R^2
 Predicted probabilities that lie outside the natural interval [0, 1]
We can remove the problem of non-normality of the error term by increasing the sample size: as the sample size increases, the distribution of the error term, which follows the Bernoulli distribution in the LPM, will approach the normal distribution. Similarly, we can also remedy the problem of heteroscedasticity in the LPM (for example, by using heteroscedasticity-robust standard errors).

But the serious limitation of the LPM is that the predicted probability can lie outside the natural limits 0 and 1, because the LPM assumes a linear relationship between the predicted probability and the level of the explanatory variable (X_i). In other words, the problem with the LPM is that it assumes that the probability of success or of something happening (P_i) is a linear function of the explanatory variable(s) (X_i). That is, in our house ownership example, the probability of owning a house (P_i) changes by the same amount at both low and high levels of income.

Therefore, the fundamental problem with the LPM is that P_i = E(Y_i = 1 | X_i) increases linearly with X. That means the marginal or incremental effect of X remains constant at all levels of income (X). Thus, in our home ownership example we found that as X increases by a unit (Birr 1,000), the probability of owning a house increases by the same constant amount of 0.0652. This is so whether the income level is Birr 8,000, Birr 10,000, Birr 18,000, or Birr 24,000. As can be seen from Figure 1.3 above, the predicted probability (\hat{Y}_i) is a linear function of the explanatory variable (X_i), which seems patently unrealistic.

In reality one would expect that P_i is nonlinearly related to X_i: at a very low income a household will not own a house, but at a sufficiently high level of income, say X*, it most likely will own a house. Any increase in income beyond X* will have little effect on the probability of owning a house. Thus, at both ends of the income distribution, the probability of owning a house will be virtually unaffected by a small increase in X.

Therefore, what we need is a (probability) model that has these two features: (1) as X_i increases, P_i = E(Y = 1 | X) increases but never steps outside the 0-1 interval, and (2) the relationship between P_i and X_i is non-linear, that is, "one which approaches zero at slower and slower rates as X_i gets small and approaches one at slower and slower rates as X_i gets very large."

In this respect, two alternative nonlinear models, (1) the Logit model and (2) the Probit model, have been proposed.

The very objective of the Logit and Probit models is to ensure/guarantee that the predicted probability of the event occurring, given the value of the explanatory variable, remains within the natural [0, 1] bound. That means

0 \le P(Y = 1 | X) \le 1 for all X.

This requires a nonlinear functional form for the probability. This is possible if we assume that the dependent variable or the error term (u_i) follows some sort of cumulative distribution function (CDF). The two important nonlinear functions proposed for this purpose are the logistic CDF and the normal CDF:

P(Y = 1 | X) = G(\beta_1 + \beta_2 X)

where G is a function taking on values strictly between 0 and 1, that is, 0 < G(z) < 1 for all real numbers z. This ensures that the predicted probability (P_i) strictly lies between 0 and 1.

The logistic distribution function and the cumulative normal distribution function can be represented graphically as follows.

Logistic distribution function:

G(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}

Cumulative normal distribution function:

G(z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-v^{2}/2} \, dv

Figure 1.4: Comparison of the two distribution functions which are commonly used in binary regression analysis
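The two CDFs sketched in Figure 1.4 are easy to tabulate. The short sketch below, assuming Python with numpy and scipy, evaluates both at a few points, which makes the fatter tails of the logistic curve visible.

```python
import numpy as np
from scipy.stats import norm, logistic

z = np.array([-3.0, -1.5, 0.0, 1.5, 3.0])

logit_cdf  = logistic.cdf(z)   # = 1 / (1 + e^{-z}); variance pi^2 / 3
probit_cdf = norm.cdf(z)       # standard normal CDF; variance 1

for zi, p_l, p_n in zip(z, logit_cdf, probit_cdf):
    print(f"z = {zi:+.1f}   logistic = {p_l:.4f}   normal = {p_n:.4f}")
```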

1.2.2.1. The Logit Model


For the Logit model, the logistic cumulative distribution function is used, so that the probability of the event occurring is defined as follows:

P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}

where Z_i = \beta_1 + \beta_2 X_i.
As Z_i ranges from -\infty to +\infty, the predicted probability of an event occurring (P_i) ranges from 0 to 1. In other words, as X_i ranges from -\infty to +\infty, the predicted probability of an event occurring (P_i) ranges from 0 to 1. Moreover, the predicted probability of an event occurring (P_i) is non-linearly related to the explanatory variable (X_i). Thus, the Logit model satisfies the two conditions, namely (1) 0 \le P_i \le 1 and (2) P_i is non-linearly related to X_i.

The following scatter plot shows the relationship between the predicted probability of owning a house (\hat{P}_i) and the explanatory variable (income). As can be seen from the figure, the predicted probability of owning a house lies only within the natural limits, 0 and 1. The relationship between the predicted probability of the event occurring (\hat{P}_i) and the explanatory variable is also nonlinear. That means that at lower and higher income levels, the change in the predicted probability of owning a house for a given change in income is small.

Figure 1.5: The scatter plot of the actual Y and predicted probabilities against the explanatory variable

Therefore, when we use the Logit model, we can keep the estimated probabilities inside the 0-1 range, as shown above.

But, while we are ensuring that the predicted probability of an event occurring (P_i) lies in the natural interval 0 \le P_i \le 1, we have created an estimation problem, because P_i is nonlinear in the parameters and in the explanatory variables and so we cannot apply OLS directly. However, we can linearize the Logit model as follows.

If P_i = the probability of an event occurring (here, the probability of owning a house), then 1 - P_i = the probability of the event not occurring (not owning a house).

Take the ratio of the probability of the event happening (P_i) to the probability of the event not happening (1 - P_i); the resulting ratio is called the odds ratio:

\frac{P_i}{1 - P_i} = \frac{1/(1 + e^{-Z_i})}{e^{-Z_i}/(1 + e^{-Z_i})} = e^{Z_i}

Take the natural log of the above odds ratio and the resulting equation is called the logit:

L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_1 + \beta_2 X_i

where L_i is called the logit, which is linearly related to X_i, and X_i is the explanatory variable (this can be extended to a matrix of multiple explanatory variables).
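A tiny numerical sketch of the transformation just described, in plain Python: start from a probability, form the odds, take the log to get the logit, and invert back with the logistic function.

```python
import math

def prob_to_logit(p: float) -> float:
    """Logit transform: log of the odds P / (1 - P)."""
    odds = p / (1.0 - p)
    return math.log(odds)

def logit_to_prob(logit: float) -> float:
    """Inverse transform: logistic function of the logit."""
    return 1.0 / (1.0 + math.exp(-logit))

p = 0.8
L = prob_to_logit(p)          # log-odds: ln(0.8 / 0.2) = ln(4) ≈ 1.386
print(L, logit_to_prob(L))    # recovers 0.8
```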
Characteristics of Logit Model
a. As Z_i goes from -\infty to +\infty, the predicted probability (P_i) goes from 0 to 1. In other words, as X_i goes from -\infty to +\infty, the predicted probability (P_i) goes from 0 to 1. This implies that in the Logit model the predicted probability (P_i) lies in the natural limits, 0 \le P_i \le 1.

b. Even if the logit (L) is linear in X, the probabilities themselves are not. This property is in contrast with the LPM, where the probabilities increase linearly with X.
c. Although we have included only a single regressor, X_i, in the preceding model, one can add as many regressors as may be dictated by the underlying theory.
d. If the logit (L) is positive, it means that when the value of the regressor increases, the odds that the regressand equals 1 (meaning some event of interest happens) increase. If the logit (L) is negative, the odds that the regressand equals 1 decrease as the value of X increases. To put it differently, the logit becomes negative and increasingly large in magnitude as the odds ratio decreases from 1 to 0, and becomes increasingly large and positive as the odds ratio increases from 1 to infinity.

e. More formally, the interpretation of the Logit model given above is as follows: \beta_2, the slope, measures the change in L for a unit change in X; that is, it tells how the log-odds in favor of owning a house change as income changes by a unit (Birr 1,000) in our example. The intercept \beta_1 is the value of the log-odds in favor of owning a house if income is zero.

f. Given a certain level of income, say X*, if we actually want to estimate not the odds in favor of owning a house but the probability of owning a house itself, this can be done directly from P_i = 1/(1 + e^{-(\beta_1 + \beta_2 X_i)}), once the estimates of \beta_1 and \beta_2 are available.
g. Whereas the LPM assumes that P_i is linearly related to X_i, the Logit model assumes that the log of the odds ratio is linearly related to X_i.

Estimation and interpretation of the Logit model


The most common way to estimate binary response models is to use the method of maximum likelihood (ML). ML estimation maximizes the likelihood function with respect to the parameters. A parameter vector at which the likelihood takes on its maximum value is called a maximum likelihood estimate (MLE) of the parameters. Let's first construct the likelihood function, i.e., the joint distribution of the sample.

The likelihood contribution of observation i with Y_i = 1 is given by P_i, viewed as a function of the unknown parameter vector \beta, and similarly it is 1 - P_i for an observation with Y_i = 0. Assuming the observations are independent, the likelihood function for the entire sample is thus given by the joint probability: the joint density of the entire sample is just the product of the densities of the individual observations.

That is, suppose we have a random sample of n observations. Letting P_i denote the probability that Y_i = 1, the joint probability of observing the n values Y_1, Y_2, ..., Y_n is given as:

L(\beta_1, \beta_2) = \prod_{i=1}^{n} P_i^{Y_i} (1 - P_i)^{1 - Y_i}    (1.13)

NB: The likelihood of a Bernoulli variable is the probability of success to the power of Y_i times the probability of failure to the power of 1 - Y_i.

The joint probability given in Eq. (1.13) is known as the likelihood function (LF). If we take the natural logarithm of Eq. (1.13), we obtain what is called the log likelihood function (LLF):

\ln L = \sum_{i=1}^{n} [Y_i \ln P_i + (1 - Y_i) \ln(1 - P_i)]

      = \sum_{i=1}^{n} [Y_i \ln P_i - Y_i \ln(1 - P_i) + \ln(1 - P_i)]

      = \sum_{i=1}^{n} \left[ Y_i \ln\left(\frac{P_i}{1 - P_i}\right) \right] + \sum_{i=1}^{n} \ln(1 - P_i)

But we know that \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_i and 1 - P_i = \frac{1}{1 + e^{\beta_1 + \beta_2 X_i}}.

Therefore, we can rewrite the above LLF as:

\ln L = \sum_{i=1}^{n} Y_i (\beta_1 + \beta_2 X_i) - \sum_{i=1}^{n} \ln\left(1 + e^{\beta_1 + \beta_2 X_i}\right)    (1.14)

As you can see from (1.14), the log likelihood function is a function of the parameters \beta_1 and \beta_2, since the X_i and Y_i are known. In ML our objective is to maximize the LF (or LLF), that is, to obtain the values of the unknown parameters in such a manner that the probability of observing the given Y's is as high (maximum) as possible. For this purpose, we differentiate (1.14) partially with respect to each parameter, set the resulting expressions to zero and solve them. But the resulting expressions are highly nonlinear in the parameters and no explicit (closed-form) solutions can be obtained. However, the estimates of the parameters can easily be computed with the aid of software packages such as EViews, Stata or any other package. Thus, hereunder we shall estimate Logit and Probit models using Stata.
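To show what such software packages are doing under the hood, the sketch below codes the log likelihood in (1.14) for a logit with one regressor and maximizes it numerically. It assumes Python with numpy and scipy; the ten observations are hypothetical, made up purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical binary outcomes and a single regressor
y = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1], dtype=float)
x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0])
X = np.column_stack([np.ones_like(x), x])          # constant + regressor

def neg_log_likelihood(beta):
    """Negative of the Bernoulli/logit log likelihood in (1.14)."""
    z = X @ beta
    # ln L = sum[ y*z - ln(1 + e^z) ]; logaddexp(0, z) is a stable ln(1 + e^z)
    return -np.sum(y * z - np.logaddexp(0.0, z))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)        # maximum likelihood estimates of beta_1 and beta_2
print(-result.fun)     # maximized log likelihood
```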

Equally important is the interpretation of the results. Therefore, below we shall discuss how to interpret the regression results of binary Logit models with the aid of numerical examples (estimated through the principle of maximum likelihood by software packages). There are three ways of interpreting the regression results of a Logit model:

A. Logit interpretation
B. Odds ratio interpretation
C. Probability interpretation (Marginal Effect Interpretation)

Example-1
Suppose that a Mathematical Economics instructor gave routine weekly exercises to her section "A" students in the previous semester. She wants to study the effect of the performance in these routine exercises on the final grade of students in the course, Mathematical Economics. At the end of the semester, she computed the average score in the exercises (ASE) for each student.

Table -1.6: Cross-sectional Data on GPA, ASE, PC and GRADE for 32 Students
GRADE
Students GPA ASE PC (A=1, rest=0)
1 2.66 20 0 0
2 2.89 22 0 0
3 3.28 24 0 0
4 2.92 12 0 0
5 4 21 0 1
6 2.86 17 0 0
7 2.76 17 0 0
8 2.87 21 0 0
9 3.03 25 0 0
10 3.92 29 0 1
11 2.63 20 0 0

12 3.32 23 0 0
13 3.57 23 0 0
14 3.26 25 0 1
15 3.53 26 0 0
16 2.74 19 0 0
17 2.75 25 0 0
18 2.83 19 0 0
19 3.12 23 1 0
20 3.16 25 1 1
21 2.06 22 1 0
22 3.62 28 1 1
23 2.89 14 1 0
24 3.51 26 1 0
25 3.54 24 1 1
26 2.83 27 1 1
27 3.39 17 1 1
28 2.67 24 1 0
29 3.65 21 1 1
30 4 23 1 1
31 3.1 21 1 0
32 2.39 19 1 1
The dependent variable is the final grade of students in Mathematical Economics (GRADE = 1 if the student scores A and 0 otherwise). The other explanatory variables include the Grade Point Average (GPA) of the student and personal computer ownership (PC = 1 if the student has a PC and 0 if the student does not own a PC).

We are going to estimate the following Logit model:

L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 GPA_i + \beta_3 ASE_i + \beta_4 PC_i + u_i

How can we interpret the regression results?

In what follows, we present the estimation and interpretation of the model defined above. We can interpret the regression results using the logit, odds ratio and marginal effect interpretations.

A. Logit Interpretation of the Estimated Logit Model


Table-1.7 below presents the logit regression result obtained using Stata14.
Table-1.7: Logistic Regression Result

Logistic regression Number of obs = 32
LR chi2(3) = 15.40
Prob > chi2 = 0.0015
Log likelihood = -12.889633 Pseudo R2 = 0.3740

grade Coef. Std. Err. z P>|z| [95% Conf. Interval]

gpa 2.826113 1.262941 2.24 0.025 .3507938 5.301432


ase .0951577 .1415542 0.67 0.501 -.1822835 .3725988
pc 2.378688 1.064564 2.23 0.025 .29218 4.465195
_cons -13.02135 4.931325 -2.64 0.008 -22.68657 -3.35613

We have to interpret the coefficients of the regressors in the logit regression results as effects on the log of the odds ratio. The coefficients of GPA and ASE in the table above measure the change in the estimated logit for a unit change in the value of the regressor (holding other regressors constant). Thus,
 The GPA coefficient of 2.826 means, with other variables held constant, that if GPA increases by a unit, on average the estimated logit increases by about 2.826 units, suggesting a positive relationship between the two. In other words, holding other factors constant, as GPA increases by one point, the average logit value goes up by about 2.83, that is, the log of the odds in favor of scoring an A grade increases by about 2.83, and the effect is statistically significant as implied by the P-value of the Z-test.
 Holding other factors constant, as ASE increases by one point, the log of the odds ratio increases by 0.095, but this effect is statistically insignificant.
 For students who have a PC, compared to those who do not, the log of the odds in favor of scoring an "A" grade is higher by about 2.38, and the effect is statistically significant as implied by the P-value of the Z-test.
As you can see, all the regressors have a positive effect on the logit, although statistically the effect of ASE is not significant. However, together all the regressors have a significant effect on the probability of scoring an "A" grade, as the LR statistic (the equivalent of the F-test in linear regression) is 15.40, whose P-value is about 0.0015, which is very small.

Note, however, that a more meaningful interpretation is in terms of odds, which are obtained by taking the antilog of the various coefficients of the logit model. Thus, if you take the antilog of the GPA coefficient of 2.826 you will get about 16.88 (= e^{2.826}). That is, as the GPA of a student rises by one unit, the odds in favor of getting an "A" increase roughly 16.88-fold, other things remaining the same. Using Stata, we can easily estimate the odds ratios for all regressors, as in the following table.

B. Odds Ratio Interpretation


Table-1.8below presents the Odds ratio regression result of the logit model.
Table-1.8: Odds Ratio Regression Result
Logistic regression Number of obs = 32
LR chi2(3) = 15.40
Prob > chi2 = 0.0015
Log likelihood = -12.889633 Pseudo R2 = 0.3740

grade Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

gpa 16.87972 21.31809 2.24 0.025 1.420194 200.6239


ase 1.099832 .1556859 0.67 0.501 .8333651 1.451502
pc 10.79073 11.48743 2.23 0.025 1.339344 86.93802
_cons 2.21e-06 .0000109 -2.64 0.008 1.40e-10 .03487

Therefore, the odds ratio interpretation is as follows:

 Holding other variables constant, as GPA increases by one point, the odds of scoring an A, that is, the ratio of the probability of scoring an A to the probability of getting any other grade (B, C, D, Fx, F), increase about 16.88-fold.
 Holding other variables constant, as ASE increases by one point, the odds of scoring an A relative to getting any other grade (B, C, D, Fx, F) are multiplied by about 1.1, but this effect is statistically insignificant.
 Holding other variables constant, students who have a PC have odds of scoring an A that are about 10.79 times those of non-PC owners, and this effect is statistically significant at the 5% level of significance.
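The logit coefficients of Table 1.7 and the odds ratios of Table 1.8 can be reproduced outside Stata as well; the following sketch assumes Python with pandas and statsmodels and types in the 32 observations of Table 1.6.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# The 32 observations of Table 1.6
gpa = [2.66, 2.89, 3.28, 2.92, 4.00, 2.86, 2.76, 2.87, 3.03, 3.92, 2.63,
       3.32, 3.57, 3.26, 3.53, 2.74, 2.75, 2.83, 3.12, 3.16, 2.06, 3.62,
       2.89, 3.51, 3.54, 2.83, 3.39, 2.67, 3.65, 4.00, 3.10, 2.39]
ase = [20, 22, 24, 12, 21, 17, 17, 21, 25, 29, 20, 23, 23, 25, 26, 19,
       25, 19, 23, 25, 22, 28, 14, 26, 24, 27, 17, 24, 21, 23, 21, 19]
pc = [0] * 18 + [1] * 14
grade = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
         0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1]

df = pd.DataFrame({"grade": grade, "gpa": gpa, "ase": ase, "pc": pc})

X = sm.add_constant(df[["gpa", "ase", "pc"]])
logit_res = sm.Logit(df["grade"], X).fit()

print(logit_res.summary())        # log-odds coefficients, cf. Table 1.7
print(np.exp(logit_res.params))   # odds ratios, cf. Table 1.8 (e.g. e^2.826 ≈ 16.88)
```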

C. Marginal Effect Interpretation


Since the language of logits and odds ratios may be unfamiliar to some, we usually also compute the probability of something happening, that is, the marginal effects. Table-1.9 below presents the marginal effect results of the logit model.

Table-1.9: Marginal effects after Logit Regression Result

Marginal effects after logit


y = Pr(grade) (predict)
= .25282025

variable dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

gpa .5338589 .23704 2.25 0.024 .069273 .998445 3.11719


ase .0179755 .02624 0.69 0.493 -.033448 .069399 21.9375
pc* .4564984 .18105 2.52 0.012 .10164 .811357 .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1

Therefore, the marginal effect (probability) interpretation of the estimated logit model given above is as follows:

 As GPA increases by one point, the probability of scoring grade "A" for an average student increases by about 0.53 (53 percentage points), with the other regressors held at their mean values.

 As ASE increases by one point, the probability of scoring grade "A" for an average student increases by about 0.018 (1.8 percentage points), with the other regressors held at their mean values, although this effect is statistically insignificant.

 For a student who owns a PC, the probability of scoring grade "A" is higher by about 0.456 (45.6 percentage points) compared to non-PC owners, with the other regressors held at their mean values.
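Continuing the statsmodels sketch above (it assumes the fitted `logit_res` object from that sketch), the marginal effects of Table 1.9, evaluated at the means of the regressors, could be obtained as follows.

```python
# Marginal effects at the means of the regressors, cf. Table 1.9;
# dummy=True treats the 0/1 regressor pc as a discrete 0 -> 1 change
mfx = logit_res.get_margeff(at="mean", dummy=True)
print(mfx.summary())   # dy/dx for gpa, ase and pc
```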

1.2.2.2 The Probit Regression Model


As we have noted, to explain the behavior of a dichotomous dependent variable we have to use a suitable CDF. The Logit model uses the cumulative logistic function. But this is not the only CDF that one can use. In some applications, the normal CDF has been found useful. The model that emerges from the normal CDF is popularly known as the Probit model, although sometimes it is also known as the normit model. As can be seen from Figure 1.4 above, the logistic distribution has flatter (fatter) tails than the normal distribution. This is because the variance of the logistic distribution, \pi^2/3, is greater than the variance of the standard normal distribution, which is 1. The difference between the coefficients of the Logit model and those of the Probit model is attributable to the difference in the variances of the two distributions.

In the previous section, we considered the Logit model. In this section, we shall present the Probit model. We will try to develop the Probit model based on utility theory, or the rational choice perspective on behavior, as developed by McFadden.

Assume that in our house ownership example, the decision to own a house or not depends on an unobservable utility index I_i (also known as a latent variable) that is determined by one or more explanatory variables, say income X_i, in such a way that the larger the value of the index, the greater the probability of the household owning a house. We express the index as

I_i = \beta_1 + \beta_2 X_i

where X_i is the income of the household.

How is the (unobservable) index related to the actual decision to own a house? As before, let Y_i = 1 if the household owns a house and Y_i = 0 if it does not. Now, it is reasonable to assume that there is a critical or threshold level of the index, call it I_i^*, such that if I_i exceeds I_i^*, the household will own a house; otherwise it will not. The threshold I_i^*, like I_i, is not observable, but if we assume that it is normally distributed with the same mean and variance, it is possible not only to estimate the parameters of the index given above, but also to get some information about the unobservable index itself.

The probability that Y_i = 1 can then be calculated as follows:

P_i = P(Y_i = 1 | X_i) = P(I_i^* \le I_i) = P(Z_i \le \beta_1 + \beta_2 X_i) = \Phi(\beta_1 + \beta_2 X_i)

where P_i is the probability that a household owns a house, Z_i is the standard normal variable, which is normally distributed with a mean of 0 and variance of 1, and \Phi is the standard normal CDF, which can be explicitly written as follows:

\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^{2}/2} \, du

The area under the standard normal curve from -\infty up to \beta_1 + \beta_2 X_i measures the probability of owning a house.

In general, the Probit model can be specified as follows:

Y_i^* = \beta_1 + \beta_2 X_i + u_i,  with u_i ~ N(0, 1)
Y_i = 1 if Y_i^* > 0, and Y_i = 0 otherwise
P(Y_i = 1 | X_i) = \Phi(\beta_1 + \beta_2 X_i)

This model is called the Probit model and can be estimated using the maximum likelihood method (MLM). The coefficients from the Probit model are difficult to interpret directly because they measure the change in the unobservable index associated with a change in one of the explanatory variables. A more useful measure is what we call the marginal effects.
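A small sketch, assuming Python with numpy and scipy and made-up coefficient values, of how the Probit model turns the latent index \beta_1 + \beta_2 X into a probability through the standard normal CDF, and why the implied marginal effect varies with X:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical Probit coefficients for the house-ownership example
beta1, beta2 = -3.0, 0.2                           # made-up values, illustration only

income = np.array([5.0, 10.0, 15.0, 20.0, 25.0])   # thousands of Birr
index = beta1 + beta2 * income                     # latent index I_i
prob_own = norm.cdf(index)                         # P(Y = 1 | X) = Phi(I_i)

# Marginal effect of income on the probability: phi(index) * beta2,
# which changes with the level of income (unlike the LPM)
marginal_effect = norm.pdf(index) * beta2

print(prob_own)
print(marginal_effect)
```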

Example
Let's revisit our grade example data given in Table 1.6 above, which gives data on 32 students about their final grade in the Mathematical Economics examination in relation to the variables GPA, ASE, and PC. Let us see what the Probit results look like. The regression results estimated by the method of maximum likelihood using Stata 14 are given in the table below.
Table-1.10: Probit Regression Result

Probit regression Number of obs = 32
LR chi2(3) = 15.55
Prob > chi2 = 0.0014
Log likelihood = -12.818803 Pseudo R2 = 0.3775

grade Coef. Std. Err. z P>|z| [95% Conf. Interval]

gpa 1.62581 .6938825 2.34 0.019 .2658255 2.985795


ase .0517289 .0838903 0.62 0.537 -.1126929 .2161508
pc 1.426332 .5950379 2.40 0.017 .2600795 2.592585
_cons -7.45232 2.542472 -2.93 0.003 -12.43547 -2.469166

Interpretation of the estimated coefficients of the Probit model:

The estimated coefficients do not quantify the influence of the regressors on the probability that the regressand takes on the value one, because these coefficients are parameters of the latent model. Thus, there is no standard and convincing way to interpret the raw coefficients from the output of a Probit regression. We need to interpret the marginal effects of the regressors, that is, how much the (conditional) probability of the outcome variable changes when we change the value of a regressor, holding all other regressors constant at some values. As far as the coefficients of a Probit model are concerned, what we can do is comment on their implications. For instance,
 The coefficient of GPA is significantly different from zero (at the 5% level), indicating a relevant relationship between scoring an "A" grade and the student's GPA.
 The coefficient of PC is significantly different from zero (at the 5% level), indicating a relevant relationship between scoring an "A" grade and PC ownership.

In short, students with a higher GPA and a PC are more likely to score an "A" compared to their counterparts.

In general, "qualitatively", the results of the Probit model are comparable with those obtained from the Logit model, in that GPA and PC are individually statistically significant while the variable ASE is statistically insignificant in both models. Collectively, all the coefficients are statistically significant, since the value of the LR statistic is 15.55 with a p-value of 0.0014.

Table-1.11: Marginal Effects of the Probit Model

Marginal effects after probit
y = Pr(grade) (predict)
= .26580809

variable dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

gpa .5333471 .23246 2.29 0.022 .077726 .988968 3.11719


ase .0169697 .02712 0.63 0.531 -.036184 .070123 21.9375
pc* .464426 .17028 2.73 0.006 .130682 .79817 .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1

The marginal effect results and their interpretation in a Probit model are similar to those of the Logit model. Therefore, the marginal effect (probability) interpretation of the estimated Probit model presented above is as follows:
 As GPA increases by one unit, the probability of scoring grade "A" for an average student increases by about 0.53 (53 percentage points), holding the other regressors at their mean values.

 As ASE increases by one point, the probability of scoring grade "A" for an average student increases by about 0.017 (1.7 percentage points), holding the other regressors at their mean values, although this effect is statistically insignificant.

 For a student who owns a PC, the probability of scoring grade "A" is higher by about 0.464 (46.4 percentage points) compared to non-PC owners, holding the other regressors at their mean values.

1.2.2.3. Joint Significance in Qualitative Response Regression Models


As far as joint significance is concerned, as the equivalent of the F-test in the linear regression model there are various ways of testing multiple restrictions in Probit and Logit models (the Wald, Lagrange multiplier and likelihood ratio tests).
The most commonly used, and most easily calculated, of these is the likelihood ratio (LR) test. It is used when we wish to test exclusion restrictions, i.e., whether we should or should not exclude a set of variables. The idea is a simple one: since what we are maximizing is the log likelihood function, as variables are excluded from the regression relationship the maximized value of the objective function falls (or at least cannot rise). The question then is whether the fall in the log likelihood value is statistically significant. Like the F-test in linear regression models, in the LR test the joint hypothesis to be tested is that all the explanatory variables are simultaneously irrelevant, versus the alternative that at least one of the regressors is relevant. The likelihood ratio statistic is just twice the difference in the log likelihood functions of the two models, the unrestricted (the model with all regressors included) and the restricted (the model with only the intercept term):

LR = 2(\ln L_{UR} - \ln L_{R})

where \ln L_{UR} and \ln L_{R} are the log likelihoods for the unrestricted and restricted models.

Given the null hypothesis, the LR statistic asymptotically follows the \chi^2 distribution with degrees of freedom equal to the number of excluded explanatory variables.
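Continuing the earlier statsmodels sketch (the fitted `logit_res` object is assumed), the LR statistic reported at the top of Tables 1.7 and 1.10 can be computed directly from the unrestricted and intercept-only log likelihoods.

```python
from scipy.stats import chi2

# Log likelihoods of the unrestricted model and of the intercept-only model
ll_ur = logit_res.llf
ll_r = logit_res.llnull

lr_stat = 2.0 * (ll_ur - ll_r)    # LR = 2(lnL_UR - lnL_R), cf. 15.40 in Table 1.7
df_test = logit_res.df_model      # number of excluded regressors (3 here)
p_value = chi2.sf(lr_stat, df_test)

print(lr_stat, p_value)                      # statsmodels also reports these
print(logit_res.llr, logit_res.llr_pvalue)   # as llr and llr_pvalue
```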

Measuring Goodness-of-Fit in Logit and Probit Models


A goodness-of-fit measure is a summary statistic indicating the accuracy with which the model approximates the observed data. However, the conventional measure of goodness of fit, $R^2$, is not particularly meaningful in binary regressand models. In the case in which the dependent variable is qualitative, accuracy can be judged either in terms of the fit between the calculated probabilities and observed response frequencies or in terms of the model's ability to forecast observed responses. Measures similar to $R^2$, called pseudo $R^2$, are available. Note, however, that contrary to the linear regression model, there is no single measure of goodness of fit in binary choice models and a variety of measures exists. The most common goodness-of-fit measure is the one proposed by McFadden (1974), which is defined as:

Pseudo $R^2 = 1 - \dfrac{\ln L_{UR}}{\ln L_{R}}$

where, as before, $\ln L_{UR}$ is the log likelihood value obtained from the unrestricted model, and $\ln L_{R}$ is that generated by a regression (either Probit or Logit; note that this goodness-of-fit measure is also reported by statistical packages for any regression estimated by maximum likelihood) with only the intercept.

 The pseudo $R^2$ measures the fit using the likelihood function: it measures the improvement in the value of the log likelihood relative to having no explanatory variables ($\ln L_{R}$). For instance, in our estimated Logit and Probit models above, the pseudo $R^2 \approx 0.37$. This suggests that the log likelihood value improves by about 37% with the introduction of the set of regressors into the models.

 Similar to the $R^2$ of the linear regression model, it holds that $0 \le$ pseudo $R^2 \le 1$. An increasing pseudo $R^2$ may indicate a better fit of the model, but no simple interpretation like that for the $R^2$ of the linear regression model is possible.
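
After estimating a Logit or Probit model in Stata, the McFadden pseudo $R^2$ reported in the output header can also be computed manually from the stored log likelihoods; a small sketch, assuming the grade example:

. probit grade gpa ase pc
. display 1 - e(ll)/e(ll_0)      // e(ll) = unrestricted log likelihood, e(ll_0) = intercept-only log likelihood

This reproduces the "Pseudo R2" figure printed by probit and logit.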

Remarks:
Between Logit and Probit, which model is preferable? In most applications the models are quite similar, the
main difference being that the logistic distribution has slightly fatter tails. That is to say, the conditional
probability approaches zero or one at a slower rate in Logit than in Probit.Therefore, there is no compelling
reason to choose one over the other. In practice many researchers choose the Logit model because of its
comparative mathematical simplicity.

Though the models are similar, one has to be careful in interpreting the coefficients estimated by the two models. For example, for our grade example, the GPA coefficient of 1.6258 from the Probit model and 2.826 from the Logit model are not directly comparable. The reason is that, although the standard logistic distribution (the basis of Logit) and the standard normal distribution (the basis of Probit) both have a mean value of zero, their variances are different: 1 for the standard normal and $\pi^2/3$ for the logistic distribution, where $\pi \approx 3.14$. Therefore, if you multiply a Probit coefficient by about 1.81 (which is approximately $\sqrt{\pi^2/3} = \pi/\sqrt{3}$), you will get approximately the Logit coefficient. For our example, the Probit coefficient of GPA is 1.6258; multiplying this by 1.81 gives about 2.94, which is close to the Logit coefficient of 2.826. Alternatively, if you multiply a Logit coefficient by about 0.55 ($\approx 1/1.81$), you will get the Probit coefficient. Amemiya, however, suggests multiplying a Logit estimate by 0.625 to get a better estimate of the corresponding Probit estimate. Conversely, multiplying a Probit coefficient by 1.6 ($\approx 1/0.625$) gives the corresponding Logit coefficient.

In general, in practical applications we follow Amemiya's suggestion that Logit coefficients are about 1.6 times the corresponding Probit coefficients, and Probit coefficients are about 0.625 times the corresponding Logit coefficients. Similarly, the slope coefficients of the LPM are about 0.25 times the corresponding Logit coefficients, except for the intercept: the intercept of the LPM is about 0.25 times the Logit intercept plus 0.5. Note, however, that the models generally agree on the signs of the coefficients and give marginal effects of similar magnitude for the independent variables.
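
As a quick numerical check of these rules of thumb using the GPA coefficients reported above (rough approximations, not exact identities):

$\hat\beta^{Logit}_{GPA} \times 0.625 = 2.826 \times 0.625 \approx 1.77$, compared with the actual Probit estimate of 1.6258;
$\hat\beta^{Probit}_{GPA} \times 1.6 = 1.6258 \times 1.6 \approx 2.60$, compared with the actual Logit estimate of 2.826.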

Exercise

Assume that we want to study the impact of family size (below 18 years) of a household and the average annual
hours of work by household on the poverty level of a household in a given Village. Suppose that we take a
sample of 20 households from a particular village and obtained the following data on the family size (FS), hours
of work (HRS) and poverty level (Yi) of households(Yi =0, if the household is above the poverty line and Yi
=1, if the house hold is below the poverty line). We use Probit regression to see the effect of family size and
hours of work on the poverty level of a household.

Family Size Hours of Work Poverty


Households (FS) (HRS) Status (Y)

1 4 200 0
2 1 300 0
3 2 600 0
4 1 1000 0
5 8 400 1
6 9 0 1
7 3 900 0
8 2 0 0
9 1 0 1
10 2 0 0
11 1 1000 0
12 4 2000 0
13 3 1000 0
14 6 300 1

15 3 1000 0
16 4 1000 1
17 5 200 0
18 8 100 1
19 9 0 1
20 10 0 1

Regression results from Probit Model

. probit Y FS HRS

Iteration 0: log likelihood = -12.93196


Iteration 1: log likelihood = -6.3706877
Iteration 2: log likelihood = -6.2174937
Iteration 3: log likelihood = -6.215108
Iteration 4: log likelihood = -6.2151078

Probit regression Number of obs = 19


LR chi2(2) = 13.43
Prob > chi2 = 0.0012
Log likelihood = -6.2151078 Pseudo R2 = 0.5194

Y Coef. Std. Err. z P>|z| [95% Conf. Interval]

FS .4252467 .1920025 2.21 0.027 .0489287 .8015647


HRS -.0009706 .0008239 -1.18 0.239 -.0025853 .0006441
_cons -1.441228 .8421944 -1.71 0.087 -3.091899 .2094426

. mfx

Marginal effects after probit


y = Pr(Y) (predict)
= .45756724

variable dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

FS .1686884 .07972 2.12 0.034 .012445 .324932 4.31579


HRS -.000385 .00033 -1.18 0.238 -.001025 .000255 515.789

Since the coefficients of a Probit model are difficult to interpret directly, we will interpret the marginal effects instead.

Regression results from Logit Model for the above Poverty data

. logit Y FS HRS, nolog

Logistic regression Number of obs = 19


LR chi2(2) = 13.20
Prob > chi2 = 0.0014
Log likelihood = -6.3299557 Pseudo R2 = 0.5105

Y Coef. Std. Err. z P>|z| [95% Conf. Interval]

FS .7221587 .3533792 2.04 0.041 .0295482 1.414769


HRS -.0015087 .0014825 -1.02 0.309 -.0044145 .001397
_cons -2.518599 1.611066 -1.56 0.118 -5.676231 .6390321

. xi:logit Y FS HRS, or nolog

Logistic regression Number of obs = 19


LR chi2(2) = 13.20
Prob > chi2 = 0.0014
Log likelihood = -6.3299557 Pseudo R2 = 0.5105

Y Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

FS 2.058873 .7275628 2.04 0.041 1.029989 4.115536


HRS .9984924 .0014803 -1.02 0.309 .9955953 1.001398
_cons .0805724 .1298074 -1.56 0.118 .0034264 1.894646

. mfx

Marginal effects after logit


y = Pr(Y) (predict)
= .45509545

variable dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

FS .1790835 .09312 1.92 0.054 -.003424 .361591 4.31579


HRS -.0003741 .00037 -1.02 0.308 -.001094 .000346 515.789

Assignments:

A. Interpret the Probit Model


B. Interpret the Logit Model
C. Compare the regression results of the two models

CHAPTER TWO
INTRODUCTION TO BASIC TIME SERIES DATA ANALYSIS

Introduction
Time series data is one of the important types of data used in empirical analysis. Time series analysis comprises
methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the
data. It is an attempt to understand the nature of the time series and to develop appropriate models useful for
forecasting. This chapter is devoted to the discussion of introductory concepts in time series analysis for two
major reasons:
1. Time series data is frequently used in practice.
2. Time series data analysis poses several challenges to econometricians and practitioners.

The following are some of the challenges in time series analysis:


A. Serial correlation/nonindependence of observations: Autocorrelation is common in time series
analysis, because the underlying time series is often nonstationary.
B. Spurious or false regression - in regressing a time series variable on (an)other time series variable(s),
one often obtains a very high $R^2$ (in excess of 0.9) even though there is no meaningful relationship
between the two variables. Sometimes we expect no relationship between two variables, yet a regression
of one on the other variable often shows a significant relationship. This situation exemplifies the
problem of spurious, or nonsense, regression. It is therefore very important to find out if the
relationship between economic variables is spurious or false.
C. Some financial time series, such as stock prices, exhibit what is known as the random walk
phenomenon, that is, such series are non-stationary. Therefore,forecasting in such series would be a
futile exercise.
D. Causality tests (such as the Granger and Sims) assume that the time series involved in analysis are
stationary. Therefore, tests of stationarity should precede tests of causality.

Note that almost all of the above problems arise mainly due to non-stationarity of time series data sets.
Therefore, our major emphasis in this chapter is on the discussion of the nature and tests of stationarity, and the
remedial measures for non-stationary data sets.

2.1 The Nature of Time Series Data


An obvious characteristic of time series data which distinguishes it from cross-sectional data is that a time series
data set comes with a temporal ordering. For instance, in this chapter, we will discuss a time series data set on
GDP(Y), trade balance (TB), money supply (MS2), exchange rate (EX) and government expenditure (G) of
Ethiopia for the periods 1963-2003 E.C. In this data set, we must know that the data for 1963 immediately
precede the data for 1964. For analyzing time series data, we must recognize that the past can affect the future,
but not vice versa. Table 2.1.1 below gives a listing of the data on five macroeconomic variables in Ethiopia for
the period 1963-2003.

In Econometrics I, we studied the statistical properties of the OLS estimators based on the notion that samples
were randomly drawn from the appropriate population. Understanding why cross-sectional data should be
viewed as random outcomes is fairly straightforward: a different sample drawn from the population will
generally yield different values of the variables. Therefore, the OLS estimates computed from different random
samples will generally differ, and this is why we consider the OLS estimators to be random variables.

How should we think about randomness in time series data? Certainly, economic time series satisfy the
intuitive requirements for being outcomes of random variables. For example, today we do not know what the
trade balance of Ethiopia will be at the end of this year. We do not know what the annual growth in output will
be in Ethiopia during the coming year. Since the outcomes of these variables are not foreknown, they should
clearly be viewed as random variables.

Table2.1.1
Time series Data on some macroeconomic variables in Ethiopia
Year TB MS2 GDP EXR G
1963 0.69 629.6 9,400 2.4000 303.99
1964 0.69 658.3 9,873 2.3000 316.19
1965 1.04 808 9,892 2.1900 348.14
1966 1.15 1066.6 10,353 2.0700 368.57
1967 0.71 1139.4 11,412 2.0700 442.89
1968 0.79 1421.8 11,145 2.0700 551.99
1969 0.86 1467.9 11,916 2.0700 654.73
1970 0.84 1682.2 13,221 2.0700 752.34
1971 0.61 1848 13,890 2.0700 942.56
1972 0.65 2053.2 15,143 2.0700 993.59
1973 0.62 2377.6 16,135 2.0700 1,017.56
1974 0.47 2643.7 16,530 2.0700 1,090.74
1975 0.29 3040.5 17,498 2.0700 1,244.54
1976 0.45 3383.7 19,655 2.0700 1,506.13
1977 0.42 3849 17,865 2.0700 1,454.12
1978 0.43 4448.2 21,517 2.0700 1,524.52
1979 0.36 4808.7 22,367 2.0700 1,636.04
1980 0.34 5238.7 23,679 2.0700 1,726.70
1981 0.43 5705 24,260 2.0700 2,066.50
1982 0.40 6708.2 25,413 2.0700 2,336.99
1983 0.29 7959.2 27,323 2.0700 2,467.53
1984 0.18 9010.7 31,362 2.0700 2,416.75
1985 0.26 10136.7 34,621 4.2700 2,819.51
1986 0.30 11598.7 43,171 5.7700 3,770.58
1987 0.43 14408.5 43,849 6.2500 4,220.57
1988 0.35 15654.8 55,536 6.3200 5,378.98
1989 0.46 16550.6 62,268 6.5000 5,671.26
1990 0.44 18585.3 64,501 6.8800 5,984.25
1991 0.31 19399.0 62,028 7.5100 7,069.36
1992 0.35 22177.8 66,648 8.1400 11,921.90
1993 0.31 24516.2 68,027 8.3300 9,963.85
1994 0.27 27322.0 66,557 8.5400 9,873.38

1995 0.26 30469.6 73,432 8.5809 9,849.58
1996 0.23 34662.5 86,661 8.6197 11,315.21
1997 0.23 40211.7 106,473 8.6518 13,203.04
1998 0.22 46377.4 131,641 8.6810 16,080.46
1999 0.23 56651.9 171,989 8.7943 18,071.82
2000 0.22 68182.1 248,303 9.2441 24,364.45
2001 0.19 82509.8 335,392 10.4205 27,592.06
2002 0.24 104432.4 382,939 12.8909 37,527.99
2003 0.33 145377.0 511,157 16.1178 50,093.38

Formally, a sequence of random variables indexed by time is called a stochastic process or a time series process (“stochastic” is a synonym for random). When we collect a time series data set, we obtain one possible outcome, or realization, of the stochastic process. We can only see a single realization, because we cannot go back in time and start the process over again (this is analogous to cross-sectional analysis where we can collect only one random sample). However, if certain conditions in history had been different, we would generally obtain a different realization for the stochastic process, and this is why we think of time series data as the outcome of random variables. The set of all possible realizations of a time series process plays the role of the population in cross-sectional analysis.

A random or stochastic process is a collection of random variables ordered in time. We let Y denote a random variable and use the notation Yt to express the value of Y in time period t. In what sense can we regard GDP as a stochastic process? Consider, for instance, the GDP of 9.4 billion Birr of Ethiopia for the period 1963. In theory, the GDP figure for the period 1963 could have been any number, depending on the economic and political climate then prevailing. The figure of 9.4 billion Birr is a particular realization of all such possibilities. Therefore, we can say that GDP is a stochastic process and the actual values we observed for the period 1963-2003 are a particular realization of that process (i.e., the sample). The distinction between the stochastic process and its realization is akin to the distinction between population and sample in cross-sectional data. Just as we use sample data to draw inferences about a population, in time series we use the realization to draw inferences about the underlying stochastic process.

2.2 Stationary and Non-stationary Stochastic Processes


A type of stochastic process that has received a great deal of attention by time series analysts is the so-called
stationary stochastic process. Stationarity means that the probabilistic character of the series must not change over time, i.e., that any section of the time series is “typical” for every other section with the same length. In other words, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between any two time periods depends only on the distance or gap or lag between the two time periods and not on the actual time at which the covariance is computed. In the time series literature, such a stochastic process is known as a weakly stationary, or covariance stationary, or second-order stationary process. This type of stationarity is sufficient for applied time series analysis, and strict stationarity is a concept of little practical use. Weak stationarity is a necessary condition for building a time series model that is useful for future forecasting. To explain weak stationarity, let $Y_t$ be a stochastic time series with these properties:

Mean: $E(Y_t) = \mu$   (1)
Variance: $\mathrm{var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2$   (2)
Covariance: $\gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$   (3)

where $\gamma_k$, the covariance (or autocovariance) at lag $k$, is the covariance between the values of $Y_t$ and $Y_{t+k}$, that is, between two Y values k periods apart. If $k = 0$, we obtain $\gamma_0$, which is simply the variance of $Y$ ($= \sigma^2$); if $k = 1$, $\gamma_1$ is the covariance between two adjacent values of $Y$.

In short, if a time series is stationary, its mean, variance, and autocovariance (at various lags) remain the same no matter at what point we measure them; that is, they are time invariant. Such a time series will tend to return to its mean (called mean reversion) and fluctuations around this mean (measured by its variance) will have broadly constant amplitude. If a time series is not stationary in the sense just defined, it is called a non-stationary time series. In other words, a non-stationary time series will have a time-varying mean or a time-varying variance, or both.

Why are stationary time series so important?There are two major reasons.
1. If a time series is non-stationary, we can study its behavior only for the time period under consideration.
Each set of time series data will therefore be for a particular episode. As a result, it is not possible to
generalize it to other time periods. Therefore, for the purpose of forecasting or policy analysis, such
(non-stationary) time series may be of little practical value.
2. If we have two or more non-stationary time series, regression analysis involving such time series may
lead to the phenomenon of spurious or non-sense regression.

How do we know that a particular time series is stationary? In particular, are the time series shown in Figure 2.2.1 stationary? If we rely on common sense, it would seem that many of the time series depicted in Figure 2.2.1 are non-stationary, at least in their mean values, because some of them are trending upward while others are trending downward.

Exercise: for the graphs in Figure 2.2.1, indicate if the time series are stationary.

Figure 2.2.1
Nine examples of time series data; (a) Google stock price for 200 consecutive days; (b) Daily change in the Google stock
price for 200 consecutive days; (c) Annual number of strikes in the US; (d) Monthly sales of new one-family house (e)
Annual price of a dozen eggs in the US (constant dollars); (f) Monthly total of pigs slaughtered in Victoria, Australia; (g)
Annual total of lynx trapped in the McKenzie River district of north-west Canada; (h) Monthly Australian beer
production; (i) Monthly Australian electricity production. (Hyndman &Athanasopoulos, 2018)

Answer: b and g are the only stationary time series.

Although our interest is in stationary time series, we often encounter non-stationary time series, the classic
example being the random walk model (RWM). It is often said that asset prices, such as stock prices follow a
random walk; that is, they are non-stationary. We distinguish two types of random walks: (1) random walk
without drift (i.e., no constant or intercept term) and (2) random walk with drift (i.e., a constant term is present).

A. Random Walk without Drift


A random walk is defined as a process where the current value of a variable is composed of its past value plus an error term defined as white noise (a random variable with zero mean and constant variance $\sigma^2$). Algebraically, a random walk is represented as follows:

$Y_t = Y_{t-1} + u_t$   (4)

In the random walk model, as (4) shows, the value of $Y$ at time $t$ is equal to its value at time $(t-1)$ plus a random shock. We can think of (4) as a regression of $Y$ at time $t$ on its value lagged one period.

Now from (4) we can write

$Y_1 = Y_0 + u_1$
$Y_2 = Y_1 + u_2 = Y_0 + u_1 + u_2$
$Y_3 = Y_2 + u_3 = Y_0 + u_1 + u_2 + u_3$

In general, if the process started at some time 0 with a value of $Y_0$, we have

$Y_t = Y_0 + \sum_{i=1}^{t} u_i$   (5)

Therefore,

$E(Y_t) = Y_0$   (6)
$\mathrm{var}(Y_t) = t\sigma^2$   (7)

As the preceding expressions show, the mean of $Y_t$ is equal to its initial, or starting, value, which is constant, but as $t$ increases, its variance increases indefinitely, thus violating a condition of stationarity. In short, the RWM without drift is a non-stationary stochastic process. In practice $Y_0$ is often set at zero, in which case $E(Y_t) = 0$. An interesting feature of the RWM is the persistence of random shocks (random errors), which is clear from (5): $Y_t$ is the sum of the initial $Y_0$ plus the sum of random shocks. As a result, the impact of a particular shock does not die away. For example, if $u_1 = 2$ rather than $u_1 = 0$, then all $Y_t$'s from $Y_1$ onward will be 2 units higher, and the effect of this shock never dies out. That is why the random walk is said to have infinite memory.

Interestingly, if we write (4) as

$Y_t - Y_{t-1} = \Delta Y_t = u_t$   (8)

where $\Delta$ is the first-difference operator, it is easy to see that, while $Y_t$ is non-stationary, its first difference is stationary. In other words, the first
differences of a random walk time series are stationary. But we will have more to say about this later.
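
The behaviour of a random walk and of its first difference is easy to simulate; a minimal Stata sketch (the variable names are illustrative only):

. clear
. set obs 200
. set seed 12345
. generate t = _n
. tsset t
. generate u = rnormal()        // white-noise shocks
. generate y = sum(u)           // random walk without drift: y_t = y_(t-1) + u_t
. generate dy = D.y             // first difference, which equals u_t and is stationary
. tsline y dy                   // y wanders without returning to a fixed mean; dy fluctuates around zero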

B. Random Walk with Drift


Let us modify (4) as follows:

$Y_t = \delta + Y_{t-1} + u_t$   (9)

where $\delta$ is known as the drift parameter. The name drift comes from the fact that if we write the preceding equation as

$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$   (10)

it shows that $Y_t$ drifts upward or downward, depending on $\delta$ being positive or negative. Following the procedure discussed for the random walk without drift, it can be shown that for the random walk with drift model (9),

$E(Y_t) = Y_0 + t\delta$   (11)
$\mathrm{var}(Y_t) = t\sigma^2$   (12)

As can be seen from the above, for the RWM with drift the mean as well as the variance increases over time, again violating the conditions of (weak) stationarity. In short, the RWM, with or without drift, is a non-stationary
stochastic process.
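
To see where (11) and (12) come from, substitute repeatedly as in the no-drift case (a short derivation under the same assumptions, with $Y_0$ the starting value and the $u_i$ uncorrelated with common variance $\sigma^2$):

$Y_t = \delta + Y_{t-1} + u_t = 2\delta + Y_{t-2} + u_{t-1} + u_t = \cdots = Y_0 + t\delta + \sum_{i=1}^{t} u_i$

so that $E(Y_t) = Y_0 + t\delta$ and $\mathrm{var}(Y_t) = t\sigma^2$.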

Remark
 The random walk model is an example of what is known in the literature as a unit root process. Since this term has gained tremendous currency in the time series literature, we need to note what a unit root process is.
 Let us rewrite the RWM (4) as:

$Y_t = \rho Y_{t-1} + u_t, \qquad -1 \le \rho \le 1$   (13)

 If $\rho = 1$, (13) becomes an RWM (without drift). If $\rho$ is in fact 1, we face what is known as the unit root problem, that is, a situation of non-stationarity; we already know that in this case the variance of $Y_t$ is not stationary. The name unit root is due to the fact that $\rho = 1$. As noted above, the first differences of a random walk time series (a unit root process) are stationary. Thus, the terms non-stationarity, random walk, and unit root can be treated as synonymous.

2.3 Trend Stationary- and Difference Stationary Stochastic Process


The distinction between stationary and non-stationary stochastic processes (or time series) has a crucial bearing
on whether the trend (the slow long-runevolution of the time series under consideration) in the economic time
series is deterministic or stochastic.If the trend in a time series is completely predictable and not variable, we
call it a deterministic trend, whereas if it is not predictable, we call it a stochastic trend. To make the
definition more formal, consider the following models of the time series $Y_t$.

Pure random walk: Consider the model

$Y_t = Y_{t-1} + u_t$   (14)

which is nothing but an RWM without drift and is therefore non-stationary. But note that, if we write (14) as

$\Delta Y_t = u_t$   (15)

it becomes stationary, as noted before. Hence, an RWM without drift is a difference stationary process (DSP).

Random walk with drift: Consider

$Y_t = \delta + Y_{t-1} + u_t$   (16)

which is a random walk with drift and is, therefore, non-stationary. If we write it as

$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$   (17)

this means $Y_t$ will exhibit a positive (if $\delta > 0$) or negative (if $\delta < 0$) trend. Such a trend is called a stochastic trend. Equation (16) is a difference stationary process (DSP) because the non-stationarity in $Y_t$ can be eliminated by taking first differences of the time series.

Deterministic trend: A trend stationary process (TSP) has a data generating process of:

$Y_t = \beta_1 + \beta_2 t + u_t$

Although the mean of $Y_t$ is $\beta_1 + \beta_2 t$, which is not constant, its variance ($= \sigma^2$) is. Once the values of $\beta_1$ and $\beta_2$ are known, the mean can be forecast perfectly. Therefore, if we subtract the mean of $Y_t$ from $Y_t$, the resulting series will be stationary, hence the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.

To see the difference between stochastic and deterministic trends, consider Figure 2.3.1 (next page). The series named “stochastic” in Figure 2.3.1 is generated by an RWM with drift, $Y_t = \delta + Y_{t-1} + u_t$, where 500 values of $u_t$ were generated from a standard normal distribution and where the initial value of $Y$ was set at 1. The series named “deterministic” is generated as $Y_t = \delta + \beta t + u_t$, where the $u_t$'s were generated as above and where $t$ is time measured chronologically.

As you can see in Figure 2.3.1, in the case of the deterministic trend, the deviations from the trend line (which represents the nonstationary mean) are purely random and they die out quickly; they do not contribute to the long-run development of the time series, which is determined by the trend component $\beta t$. In the case of the stochastic trend, on the other hand, the random component $u_t$ affects the long-run course of the series $Y_t$.
Figure 2.3.1
Deterministic Versus Stochastic Trend

Summarizing, a deterministic trend is a nonrandom function of time whereas a stochastic trend is random and varies over time. The simplest model of a variable with a stochastic trend is the random walk. According to
Stock and Watson (2007), it is more appropriate to model economic time series as having stochastic rather than
deterministic trends. Therefore, our treatment of trends in economic time series focuses mainly on stochastic
rather than deterministic trends, and when we refer to “trends” in time series data, we mean stochastic trends.

2.4 Integrated Stochastic Process


The random walk model is a special case of a more general class of stochastic processes known as integrated processes. Recall that the RWM without drift is non-stationary, but its first difference is stationary. Therefore, we call the RWM without drift integrated of order 1, denoted as $Y_t \sim I(1)$. Similarly, if a time series has to be differenced twice (i.e., take the first difference of the first differences) to make it stationary, we call such a time series integrated of order 2, denoted as $Y_t \sim I(2)$. In general, if a (non-stationary) time series has to be differenced $d$ times to make it stationary, that time series is said to be integrated of order $d$: the order of integration is the number of differencing operations it takes to make the series stationary. For the random walk without drift, one differencing operation makes the series stationary, so it is an I(1) series. A time series integrated of order $d$ is denoted as $Y_t \sim I(d)$. If a time series is stationary to begin with, it is said to be integrated of order zero, denoted by $Y_t \sim I(0)$. Thus, “stationary time series” and “time series integrated of order zero” mean the same thing.

Most economic time series are generally $I(1)$; that is, they generally become stationary only after taking their first differences. In our example above, the trade balance (TB) of Ethiopia is integrated of order one, $I(1)$, but the data on government expenditure (G) and exchange rate (EXR) of Ethiopia are integrated of order two, $I(2)$.
2.5 Test Stationarity of Time Series Data

In time series work, one of the most important preliminary steps in regression analysis is to uncover the characteristics of the data used in the analysis. The main goal of a stationarity test is to check whether a variable has a constant mean, a constant variance and time-invariant covariances, i.e., whether it is second-order (covariance) stationary. If the test shows that the variables are non-stationary, we cannot use the data for forecasting purposes unless the series are transformed to stationary ones.

Thus, now we may have two important practical questions:


(1) How do we find out if a given time series is stationary?
(2) If we find that a given time series is not stationary, is there a way that it can be made stationary? We
now discuss these two questions in turn.
There are basically three ways to examine the stationarity of a time series, namely: (1) graphical analysis, (2)
correlogram, and (3) the unit root test.

2.5.1 Graphic Analysis

As noted earlier, before one pursues formal tests, it is always advisable to plot the time series under study, as we
have done above for the data given in Table-2.1.1. Such a plot gives an initial clue about the likely nature of the
time series. Take, for instance, the trade balance time series shown in Figure-2.5.1. You will see that over the
period of study,trade balance has been declining, that is, showing a downward trend, suggesting perhaps that the
mean of the TB has been changing. This perhaps suggests that the TB series is not stationary. Such an intuitive
feel is the starting point of more formal tests of stationarity.

Figure 2.5.1: The Trade Balance (TB) of Ethiopia through time
Figure 2.5.2: The trend of Money Supply (MS2) of Ethiopia through time
Figure 2.5.3: The Exchange Rate (ETH/$US) of Ethiopia through time
Figure 2.5.4: The Government Expenditure of Ethiopia through time

Some of the above time series graphs (MS2, EX and G) show an upward trend, which may be an indication of the non-stationarity of these data sets. That is, the mean or variance or both may be increasing with the passage of time. The graph for the Trade Balance (TB) of Ethiopia, on the other hand, shows a downward trend, which may also be an indication of the non-stationarity of the TB series.

2.5.2 Autocorrelation Function (ACF) and Correlogram


Autocorrelation is the correlation between a variable lagged one or more periods and itself. The
correlogram or autocorrelation function is a graph of the autocorrelations for various lags of a time series
data.

The autocorrelation function (ACF) at lag $k$, denoted by $\rho_k$, is defined as:

$\rho_k = \dfrac{\gamma_k}{\gamma_0} = \dfrac{\text{covariance at lag } k}{\text{variance}}$

Since both covariance and variance are measured in the same units of measurement, $\rho_k$ is a unitless, or pure, number. It lies between −1 and +1, as any correlation coefficient does. If we plot $\rho_k$ against $k$, the graph we obtain is known as the population correlogram. Since in practice we only have a realization (i.e., sample) of a stochastic process, we can only compute the sample autocorrelation function (SACF), $\hat\rho_k$. To compute this, we must first compute the sample covariance at lag $k$, $\hat\gamma_k$, and the sample variance, $\hat\gamma_0$, which are defined as

$\hat\gamma_k = \dfrac{\sum_{t=1}^{n-k}(Y_t - \bar Y)(Y_{t+k} - \bar Y)}{n}$

$\hat\gamma_0 = \dfrac{\sum_{t=1}^{n}(Y_t - \bar Y)^2}{n}$

Therefore, the sample autocorrelation function at lag $k$, which is simply the ratio of the sample covariance (at lag $k$) to the sample variance, is given by

$\hat\rho_k = \dfrac{\hat\gamma_k}{\hat\gamma_0}$

A plot of $\hat\rho_k$ against $k$ is known as the sample correlogram.
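
In Stata, the sample correlogram of a series can be obtained either as a table or as a graph; a brief sketch (assuming the data have been tsset and using the exchange rate series as an example):

. corrgram ex, lags(20)         // table of sample ACF and PACF with Q statistics
. ac ex, lags(20)               // graph of the sample correlogram with a confidence band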

How does a sample correlogram enable us to find out if a particular timeseries is stationary? For this
purpose, let us first present the sample correlogramof our exchange rate data set given in Table 2.1.1. The
correlogram of the exchange rate data is presented for 20 lags in Figure 2.5.5.

Figure 2.5.5
Correlogram of Ethiopian Exchange Rate (EX), 1963-2003

. corrgram ex, lags(20)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.8542 1.1181 32.157 0.0000


2 0.7439 -0.8482 57.172 0.0000
3 0.6712 0.4299 78.071 0.0000
4 0.6170 0.0438 96.208 0.0000
5 0.5684 0.0759 112.03 0.0000
6 0.5175 0.1102 125.52 0.0000
7 0.4619 0.1555 136.58 0.0000
8 0.4024 0.2919 145.23 0.0000
9 0.3400 0.2532 151.6 0.0000
10 0.2732 0.1159 155.85 0.0000
11 0.2047 0.2835 158.31 0.0000
12 0.1322 0.1529 159.37 0.0000
13 0.0663 0.3603 159.65 0.0000
14 0.0094 0.4066 159.65 0.0000
15 -0.0439 0.4542 159.78 0.0000
16 -0.1012 0.3993 160.5 0.0000
17 -0.1717 0.1370 162.67 0.0000
18 -0.2515 -0.3516 167.52 0.0000
19 -0.3135 . 175.39 0.0000
20 -0.3295 . 184.5 0.0000

Now, look at the column labeled AC, which is the sample autocorrelation function, and the first diagram on
the left, labeled autocorrelation. The solid vertical line in this diagram represents the zero axis; observations
above the line are positive values and those below the line are negative values. Given a correlogram, if the values of the ACF stay close to zero (from above or below), the data are said to be stationary. In other words, for a stationary time series, the autocorrelations (i.e., ACF) between $Y_t$ and $Y_{t-k}$ for any lag $k$ are close to zero (i.e., the autocorrelation coefficients are statistically insignificant); successive values of the time series (such as $Y_t$ and $Y_{t-1}$) are not related to each other. Moreover, the bars in the correlogram tend to be short (close to the vertical zero line) for a stationary time series.

On the other hand, if a series has a (stochastic) trend, i.e., nonstationary, successive observations are highly
correlated, and the autocorrelation coefficients are typically significantly different from zero for the first
several time lags and then gradually drop toward zero as the number of lags increases. The autocorrelation
coefficient for time lag 1 is often very large (close to 1).

Thus, looking at Figure 2.5.5, we can see that the autocorrelation coefficients for the exchange rate series start very high (0.85 at lag 1) and decline only slowly as the lag length increases. Figure 2.5.5 is therefore an example of the correlogram of a nonstationary time series.

For further illustration, the correlograms of the trade balance and money supply time series are presented in Figures 2.5.6 and 2.5.7 below.

In both correlograms, if we look at the column labeled AC, the autocorrelation coefficients start high (above 0.7 at the first lags) and decline only gradually as the lag length increases. Likewise, the bars in the autocorrelation diagrams are long at the first lags and shrink only slowly. Thus, both the money supply and trade balance time series are non-stationary.

Figure 2.5.6
Correlogram of Ethiopian Trade Balance, 1963-2003
. corrgram tb

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.8528 0.8585 32.058 0.0000


2 0.7267 0.0383 55.93 0.0000
3 0.6570 0.3487 75.958 0.0000
4 0.5617 0.1454 90.99 0.0000
5 0.4358 -0.2702 100.29 0.0000
6 0.3297 0.0446 105.77 0.0000
7 0.2662 0.1022 109.44 0.0000
8 0.1860 0.0269 111.29 0.0000
9 0.0940 -0.0957 111.78 0.0000
10 0.0396 0.1462 111.87 0.0000
11 0.0168 0.0725 111.88 0.0000
12 -0.0130 -0.1034 111.89 0.0000
13 -0.0235 0.0541 111.93 0.0000
14 -0.0680 -0.0144 112.23 0.0000
15 -0.0665 0.1478 112.53 0.0000
16 -0.0536 0.2238 112.73 0.0000
17 -0.0655 -0.1049 113.05 0.0000
18 -0.1155 -0.1439 114.07 0.0000

Figure 2.5.7
Correlogram of Ethiopian Money Supply (MS2), 1963-2003
. corrgram ms2

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.7575 1.2930 25.289 0.0000


2 0.6040 -2.1723 41.781 0.0000
3 0.4901 -0.4104 52.926 0.0000
4 0.3974 -0.6611 60.451 0.0000
5 0.3216 -1.4769 65.516 0.0000
6 0.2645 -0.7431 69.038 0.0000
7 0.2149 -0.2352 71.433 0.0000
8 0.1734 -1.1396 73.039 0.0000
9 0.1374 0.6997 74.079 0.0000
10 0.1044 -1.3875 74.699 0.0000
11 0.0742 -0.4461 75.022 0.0000
12 0.0457 -2.1766 75.149 0.0000
13 0.0220 -2.5349 75.18 0.0000
14 -0.0040 -3.2277 75.181 0.0000
15 -0.0272 5.6331 75.231 0.0000
16 -0.0520 7.2695 75.422 0.0000
17 -0.0780 -4.7448 75.869 0.0000
18 -0.0982 -7.3530 76.608 0.0000

 An important practical question in the above analysis is: how do we choose the lag length at which to compute the ACF? A rule of thumb is to compute the ACF up to one-third to one-quarter of the length of the time series.

2.5.3 The Unit Root Test (Augmented Dickey–Fuller Test)

A test of stationarity or non-stationarity that has become widely popular over the past several years is the unit root test. In this section, therefore, we shall address the unit root test of stationarity using the Augmented Dickey-Fuller (ADF) test. Dickey and Fuller (1979, 1981) devised a procedure to formally test for non-stationarity. The ADF test is a modified version of the original Dickey–Fuller (DF) test. The modification is the inclusion of extra lagged terms of the dependent variable to eliminate autocorrelation in the test equation.

The key insight of their test is that testing for non-stationarity is equivalent to testing for the existence of a unit root. Thus, the starting point is the unit root (stochastic) process given by:

$Y_t = \rho Y_{t-1} + u_t, \qquad -1 \le \rho \le 1$   (18)

We know that if $\rho = 1$, that is, in the case of a unit root, (18) becomes a random walk model without drift, which we know is a non-stationary stochastic process. Therefore, why not simply regress $Y_t$ on its one-period lagged value $Y_{t-1}$ and find out if the estimated $\rho$ is statistically equal to 1? If it is, then $Y_t$ is non-stationary. This is the general idea behind the unit root test of stationarity.

In a nutshell, what we need to examine here is whether $\rho = 1$ (unity, and hence “unit root”). Obviously, the null hypothesis is $H_0: \rho = 1$, and the alternative hypothesis is $H_1: \rho < 1$.

We obtain a more convenient version of the test by subtracting $Y_{t-1}$ from both sides of (18):

$Y_t - Y_{t-1} = \rho Y_{t-1} - Y_{t-1} + u_t$
$\Delta Y_t = (\rho - 1) Y_{t-1} + u_t$
$\Delta Y_t = \delta Y_{t-1} + u_t$   (19)

where $\Delta$ is the first-difference operator and $\delta = \rho - 1$. In practice, therefore, instead of estimating (18), we estimate (19) and test the null hypothesis $H_0: \delta = 0$ against the alternative hypothesis $H_1: \delta < 0$. If $\delta = 0$, then $\rho = 1$, i.e., we have a unit root and $Y_t$ follows a pure random walk (and, of course, is non-stationary).

Now let us turn to the estimation of (19). This is simple enough: all we have to do is take the first differences of $Y_t$ and regress them on $Y_{t-1}$, and see whether the estimated slope coefficient in this regression ($\hat\delta$) is zero or not. If it is zero, we conclude that $Y_t$ is non-stationary; but if it is negative, we conclude that $Y_t$ is stationary.

The modified version of the test equation given by (19), i.e., the test equation with extra lagged terms of the dependent variable, is specified as:

$\Delta Y_t = \delta Y_{t-1} + \sum_{i=1}^{p} \alpha_i \Delta Y_{t-i} + \varepsilon_t$   (20)

Dickey and Fuller (1981) also proposed two alternative regression equations that can be used for testing for the presence of a unit root. The first contains a constant in the random walk process, and the second contains both a constant and a non-stochastic time trend, as in the following equations, respectively:

$\Delta Y_t = \beta_1 + \delta Y_{t-1} + \sum_{i=1}^{p} \alpha_i \Delta Y_{t-i} + \varepsilon_t$   (21)

$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{p} \alpha_i \Delta Y_{t-i} + \varepsilon_t$   (22)

The difference between the three regressions concerns the presence of the deterministic elements $\beta_1$ and $\beta_2 t$.

Note that in all three test equations above, $p$ is the lag length, i.e., the number of lagged dependent-variable terms to be included. That is, we need to choose a lag length for the ADF test such that the residuals are not serially correlated. To determine the number of lags, $p$, we can use one of the following procedures.
a. General-to-specific testing: Start with Pmax and drop lags until the last lag is statistically
significant, i.e., delete insignificant lags and include the significant ones.
b. Use information criteria such as the Schwarz information criteria, Akaike‟s information criterion
(AIC), Final Prediction Error (FPE), or Hannan-Quinn criterion (HQIC).

In practice, we just click the „automatic selection‟ on the „lag length‟ dialog box in EViews.
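
In Stata, the three forms of the ADF test equation correspond to different dfuller options; a brief sketch using the trade balance series (the lag length here is chosen for illustration only):

. dfuller tb, noconstant lags(1) regress     // equation (20): no constant, no trend
. dfuller tb, lags(1) regress                // equation (21): constant included (the default)
. dfuller tb, trend lags(1) regress          // equation (22): constant and deterministic trend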

Now, the only question is which test we use to find out whether the estimated coefficient of $Y_{t-1}$ in (20), (21) and (22), i.e., $\hat\delta$, is statistically zero or not. The ADF test for stationarity is simply the usual “t” test on the coefficient of the lagged dependent variable $Y_{t-1}$ from one of the three models (20), (21) and (22). This test does not, however, have a conventional “t” distribution, and so we must use special critical values originally calculated by Dickey and Fuller, known as the Dickey-Fuller tau statistic (see footnote 1). However, most modern statistical packages such as Stata and EViews routinely produce the critical values for Dickey-Fuller tests at the 1%, 5%, and 10% significance levels.

 In all three test equations, the ADF test concerns whether $\delta = 0$. The ADF test statistic is the “t” statistic on the lagged dependent variable $Y_{t-1}$. The ADF statistic is a negative number, and the more negative it is, the stronger the rejection of the hypothesis that there is a unit root.

 Null Hypothesis (H0): If accepted, it suggests the time series has a unit root, meaning it is
non-stationary. It has some time dependent structure.
 Alternative Hypothesis (H1): If accepted, the null hypothesis is rejected; it suggests the
time series does not have a unit root, meaning it is stationary.
Equivalently, if
 p-value > 0.05: Accept H0; the data has a unit root and is non-stationary
 p-value ≤ 0.05: Reject H0; the data does not have a unit root and is stationary

In short, if the ADF test statistic is larger in absolute terms (i.e., more negative) than the critical value(s), or equivalently if the p-value for the ADF test statistic is below the chosen significance level, then we reject the null hypothesis of a unit root and conclude that $Y_t$ is a stationary process.

Note that the choice among the three possible forms of the ADF test equation depends on the knowledge of the econometrician about the nature of his/her data. Plotting the data (see footnote 2) and observing the graph is sometimes very useful because it can clearly indicate the presence or absence of deterministic regressors. However, if the form of the data-generating process is unknown, it is suggested to estimate the most general model, given by (22), and then answer a set of questions regarding the appropriateness of each model, moving to the next model as needed (i.e., a general-to-specific procedure).

Illustration: Unit Root Test of Some Macroeconomic Variables of Ethiopia using EViews.

As noted above, the ADF test is based on the null hypothesis that a unit root exists in the time series. Using
the ADF test, some of the variables (TB, MS2, and EX) from our time series data set are examined for unit
root as visible in Table 2.5.1-Table 2.5.3.

1 The Dickey-Fuller tau statistic can be found in the appendix section of any statistics textbook (e.g., Gujarati).
2 Sometimes, if you have data that are exponentially trending, you might need to take the log of the data first before differencing it. In that case, in your ADF unit root tests you will need to take the differences of the log of the series rather than just the differences of the series.

Table 2.5.1
Augmented Dickey-Fuller Unit Root Test on Trade Balance

Table2.5.2
Augmented Dickey-Fuller Unit Root Test on Exchange Rate

Table2.5.3
Augmented Dickey-Fuller Unit Root Test on Log of Money Supply

All the above test results show that the data sets are non-stationary: the test statistics are smaller in absolute value (i.e., less negative) than the critical values at the 1%, 5% and 10% levels of significance for all three data sets. This implies acceptance of the null hypothesis, which states that there is a unit root in the data sets.

2.5.4. Transforming Non-stationary Time Series


Now that we know the problems associated with non-stationary time series, the practical question is what to
do. To avoid the spurious regression problem that may arise from regressing a non-stationary time series on
one or more non-stationary time series, we have to transform non-stationary time series to make them
stationary. The transformation method depends on whether the time series are difference stationary process
(DSP) or trend stationary process (TSP). We consider each of these methods in turn.

A. Difference-Stationary Processes
If a time series has a unit root, the first differences of such time series(i.e. a series with stochastic trend) are
stationary. Therefore, the solution here is to take the first differences of the time series.

Returning to our Ethiopian Trade Balance (TB) time series, we have already seen that it has a unit root. Let us now see what happens if we take the first differences of the TB series. Will the first difference still have a unit root? Let us perform the Dickey-Fuller (ADF) test on the differenced series.

Let $D_t = \Delta TB_t = TB_t - TB_{t-1}$. Now consider the following regression result:

The 1 percent critical ADF $\tau$ value is about −3.5073, as given in Appendix D, Table 7 of Gujarati. Since the computed $\tau$ (= t) is more negative than the critical value, we conclude that the first-differenced TB series is stationary; that is, it is $I(0)$. In other words, the overall unit root test result for the TB series at first difference is as given in Table 2.5.4 (the test result from EViews).

Table 2.5.4
The output of the unit root test (ADF test) for the variable TB

Figure 2.5.8
First differenced trade balance time series data

If you compare the above Figure 2.5.8 with Figure 2.5.1 (i.e. the Trade Balance figure at level), you will see
the obvious difference between the two.

Table2.5.5
Unit Root test of the Log of MS2 data set at first difference

Note that sometimes a series with a stochastic trend may need to be differenced twice to become stationary. Such a series is said to be integrated of order two, $I(2)$. For instance, the Exchange Rate series became stationary only after second differencing.

Table 2.5.6
Unit Root test of the Exchange Rate data set at second difference

In general, the above test results in Table 2.5.4- Table 2.5.6 revealed that Trade balance (TB) and log of
money supply (LMS2) became stationary at first difference while exchange rate data (EX) became
stationary at the second difference. In all the above test results, the null hypothesis of Unit Root is rejected.
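
These results can be reproduced by differencing the series and re-running the ADF test; a minimal Stata sketch (variable names are illustrative and the data must be tsset on the year variable):

. tsset year
. generate dtb = D.tb            // first difference of the trade balance
. dfuller dtb, lags(1) regress
. generate d2exr = D2.exr        // second difference of the exchange rate
. dfuller d2exr, lags(1) regress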

B. Trend-Stationary Process (TSP)


As we have noted before, a TSP is stationary around the trend line. Hence, the simplest way to make such a time series stationary is to regress it on time; the residuals from this regression will then be stationary. In other words, run the following regression:

$Y_t = \beta_1 + \beta_2 t + u_t$   (23)

where $Y_t$ is the time series under study and $t$ is the trend variable measured chronologically. Now,

$\hat u_t = Y_t - \hat\beta_1 - \hat\beta_2 t$

will be stationary. $\hat u_t$ is known as a (linearly) detrended time series.

It is important to note that the trend may be nonlinear. For example, it could be

$Y_t = \beta_1 + \beta_2 t + \beta_3 t^2 + u_t$   (24)

which is a quadratic trend model. If that is the case, the residuals from this regression will be a (quadratically) detrended time series. Note, however, that most macroeconomic time series are DSP
rather than TSP.
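
For a trend stationary series, detrending amounts to saving the residuals from a regression on time; a minimal Stata sketch (variable names illustrative):

. generate t = _n
. regress y t                    // linear trend as in (23); add a squared term for a quadratic trend as in (24)
. predict y_detrended, residuals // the (linearly) detrended series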

2.5.5. How to run a regression after unit-root testing?


The outcome of unit root testing matters for the empirical model to be estimated. The empirical model may
consist of a short run part and a long run part. The short run part is to predict the behavior of the time
series after a shock, the long run part is to predict the behavior of the time series in the long run. The
following cases explain what is needed, dependent on the outcome of the unit root test.
CASE 1: All the series in the model under examination are stationary.
What if all the time series under consideration are stationary? Technically speaking, we mean they are I(0)
series (integrated of order zero).
 Under this scenario, any shock to the system in the short run quickly adjusts to the long run. Therefore,
only the long run model should be estimated. Thus, the estimation of short run model is not necessary
if series are I(0).
 You can run a normal regression with the stationary time series, but do not forget to check the
assumptions of the classical regression analysis (linear relation between the variables, no
multicollinearity, homoskedasticity, no autocorrelation and a normal distribution of the disturbance
term).Two examples of time series analysis with stationary data are given in 2.7.1 and 2.7.2: the ARDL
and the VAR model.
CASE 2: All series in the model under consideration are I(1).
Under this scenario, the series are non-stationary. One special feature of these series is that they are of the
same order of integration, I(1).
 To verify further the relevance of the model, there is need to test for cointegration.
 Cointegration is: can we assume a long run relationship in the model despite the fact that the series
are trending either upward or downward?
 Cointegration implies that, even if there are shocks in the short run, which may affect movement in
the individual series, they would converge with time (in the long run). However, there is no long
run if series are not cointegrated. This implies that, if there are shocks to the system, the model is
not likely to converge in the long run.
 Note that both long run and short run models must be estimated when there is cointegration. If
there is no cointegration, there is no long run and therefore, only the short run model will be
estimated.
 The tests to be used are the Engle-Granger test or the Johansen test. These will be explained below.

CASE 3: The series are of different orders of integration.
Researchers are more likely to be confronted with this situation. For instance, some of the variables are I(0)
while others are I(1).
 Like case 2, the cointegration test is also required under this scenario.
 Similar to case 2, if series are not cointegrated, we are expected to estimate only the short run.
However, both the long run model and short run model are valid if there is cointegration.
 The appropriate test to use is the Bounds cointegration test. This test, however, is beyond the scope
of this course. It may be covered in post graduate econometrics courses.

2.6 Spurious Regression and Co-integration Test of Time Series Data


Why not just run the regression with non-stationary data? The regression of a nonstationary time series on
another nonstationary time series may produce a spurious regression.

The Phenomenon of Spurious Regression


To see why stationary time series are so important, consider the following two random walk models:

$Y_t = Y_{t-1} + u_t$
$X_t = X_{t-1} + v_t$

Assume that $u_t$ and $v_t$ are serially uncorrelated as well as mutually uncorrelated. As you know by now, both these time series are non-stationary; that is, they are $I(1)$, or exhibit stochastic trends. Suppose we regress $Y_t$ on $X_t$. Since $Y_t$ and $X_t$ are uncorrelated $I(1)$ processes, the $R^2$ from the regression of $Y$ on $X$ should tend to zero; that is, there should not be any relationship between the two variables. But the regression results are as follows:

Table 2.6.1
Output of the linear regression $Y_t = \alpha + \beta X_t + u_t$
Variable    Coefficient   t-Statistic   P-Value
$X_t$       0.54          3.7431        0.0012
Constant    110           2.0832        0.0232
$R^2$ = 0.712;  Durbin-Watson d-statistic = 0.241

As you can see, the coefficient of $X_t$ is highly statistically significant, and the $R^2$ value, besides being high, is greater than the Durbin-Watson d-statistic. From these results, you may be tempted to conclude that there is a significant statistical relationship between $Y$ and $X$, whereas a priori there should be none. This is, in a nutshell, the phenomenon of spurious or nonsense regression, first discovered by Yule. Yule showed that (spurious) correlation could persist in non-stationary time series even if the sample is very large. That there is something wrong in the preceding regression is suggested by the extremely low Durbin–Watson d value, which suggests very strong first-order autocorrelation. According to Granger and Newbold, when $R^2 > d$, it is a good rule of thumb to suspect that the estimated regression is spurious, as in the example above.

That the regression results presented above are meaningless can easily be seen from regressing the first differences of $Y_t$ (= $\Delta Y_t$) on the first differences of $X_t$ (= $\Delta X_t$); remember that although $Y_t$ and $X_t$ are non-stationary, their first differences are stationary. In such a regression you will find that $R^2$ is practically zero, as it should be, and the Durbin–Watson d is about 2. Although dramatic, this example is a strong reminder that one should be extremely wary of conducting regression analysis based on time series that exhibit stochastic trends. And one should therefore be extremely cautious in reading too much into regression results based on $I(1)$ variables.
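
The spurious regression phenomenon is easy to reproduce by simulation; a minimal Stata sketch generating two independent random walks:

. clear
. set obs 500
. set seed 1
. generate t = _n
. tsset t
. generate u = rnormal()
. generate v = rnormal()
. generate y = sum(u)            // two independent I(1) series
. generate x = sum(v)
. regress y x
. estat dwatson                  // typically a high R-squared and a very low d statistic
. regress D.y D.x
. estat dwatson                  // in differences: R-squared near zero and d near 2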

Co-integration Test of Time Series Data

One solution to this spurious regression problem exists if the time series under investigation are
cointegrated. A cointegration test is the appropriate method for detecting the existence of long-run
relationship. Engle and Granger (1987) argue that, even though a set of economic series is not stationary,
there may exist some linear combinations of the variables that are stationary, if the two variables are really
related. If the separate series are stationary only after differencing but a linear combination of their
levels is stationary, the series are cointegrated. Cointegration becomes an overriding requirement for any
economic model using nonstationary time series data. If the variables do not co-integrate, we usually face
the problems of spurious regression and econometric work becomes almost meaningless.

Suppose that there really is a genuine long-run relationship between two variables $Y_t$ and $X_t$. Although both variables rise or decline over time (because they are trended), there will be a common trend that links them together. For an equilibrium, or long-run, relationship to exist, what we require is a linear combination of $Y_t$ and $X_t$ that is a stationary variable (an I(0) variable).

Assume then that the two variables $Y_t$ and $X_t$ are individually nonstationary, $I(1)$, time series. A linear combination of $Y_t$ and $X_t$ can be taken directly from estimating the following regression:

$Y_t = \beta_1 + \beta_2 X_t + u_t$   (25)

and taking the residuals:

$\hat u_t = Y_t - \hat\beta_1 - \hat\beta_2 X_t$   (26)

We now subject $\hat u_t$ to a unit root test. If we find that $\hat u_t \sim I(0)$, then the variables $Y_t$ and $X_t$ are said to be co-integrated. This is an interesting situation, for although $Y_t$ and $X_t$ are individually $I(1)$, that is, they have stochastic trends, their linear combination is $I(0)$: so to speak, the linear combination cancels out the stochastic trends in the two series. If you take consumption and income as the two variables, savings, defined as (income − consumption), could be $I(0)$. As a result, a regression of consumption on income would be meaningful (i.e., not spurious). In this case we say that the two variables are co-integrated. Economically speaking, two variables will be co-integrated if they have a long-term, or equilibrium, relationship between them.

In short, provided we check that the residuals from regressions like (25) are $I(0)$, i.e. stationary, the traditional regression methodology (including the t and F tests) that we have considered extensively is applicable to data involving (nonstationary) time series. The valuable contribution of the concepts of unit root, co-integration, etc., is to force us to find out whether the regression residuals are stationary. As Granger notes, “A test for co-integration can be thought of as a pre-test to avoid ‘spurious regression’ situations”.

In the language of co-integration theory, a regression such as (25), $Y_t = \beta_1 + \beta_2 X_t + u_t$, is known as a co-integrating regression and the slope parameter $\beta_2$ is known as the co-integrating parameter. The concept of co-integration can be extended to a regression model containing $k$ regressors, in which case we will have $k$ co-integrating parameters.

Testing for Cointegration


A number of methods for testing cointegration have been proposed in the literature. These include:
1. The Engle–Granger (EG) test - appropriate only when all variables are I(1)
2. The Johansen and Juselius (1990) co-integration test
3. The Bounds cointegration test - appropriate when the variables are a mixture of I(0) and I(1), or mutually I(0), or mutually I(1)

In this chapter we will only use the Engle-Granger (1987) two stage co-integration test procedures. The
other procedures are discussed in post-graduate econometrics courses.

Engle-Granger Test for Cointegration


For a single equation, the simplest test of cointegration is an ADF unit root test on the residuals estimated from the cointegrating regression. This modified unit root test is known as the Engle-Granger (EG) or Augmented Engle-Granger (AEG) test. Notice the difference between the unit root and cointegration tests: tests for unit roots are performed on single time series, whereas a cointegration test deals with the relationship among a group of variables, each having a unit root.

According to the Engle-Granger two steps method, first we have to estimate the static regression equation
to get the long run multiplier. In the second step an error correction model is formulated and estimated using
residuals from the first step as equilibrium error correction.

In other words, all we have to do is estimate a regression like (25), obtain the residuals, and apply the ADF unit root test procedure to them. There is one precaution to exercise, however. Since the estimated residuals are based on the estimated co-integrating parameter $\hat\beta_2$, the usual ADF critical significance values are not quite appropriate. Engle and Granger have calculated the appropriate critical values, which can be found in the appendices of standard econometrics texts. Therefore, the ADF tests in the present context are known as Engle–Granger (EG) tests. Several software packages now present these critical values along with other outputs.
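
A minimal Stata sketch of this two-step idea, using illustrative (hypothetical) variable names for the logged series:

. regress ltb lms2 lgdp lexr lg           // step 1: static (long run) cointegrating regression
. predict ect, residuals                  // save the equilibrium errors
. dfuller ect, noconstant lags(1) regress // step 2: ADF/EG test on the residuals

Keep in mind that, as noted above, the Engle–Granger critical values (not the standard Dickey–Fuller ones reported by dfuller) are strictly the appropriate ones at this step.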

Let us illustrate this test using a Trade Balance model.


1. Long run/Static Model of the Trade Balance of Ethiopia

Suppose we want to run the following model that predicts the log of the trade balance (LTB) based on the logs of MS2, GDP, EXR and G:

$LTB_t = \beta_0 + \beta_1 LMS2_t + \beta_2 LGDP_t + \beta_3 LEXR_t + \beta_4 LG_t + u_t$
Estimate the above model using least squares and save the residuals for stationarity test.
Table 2.6.2
Dependent Variable: LTB;Method: Least Squares; Included observations: 40 after adjustments
Variables Coefficients Standard Error t-Statistics P-value
LMS2 -0.734042 0.256962 -2.856615 0.0071
LGDP 0.219502 0.174400 1.258610 0.2163
LEXR 0.332259 0.137795 2.411259 0.0211
LG 0.175736 0.298237 0.589248 0.5594
Constant 1.500707 0.951265 1.577591 0.1234
; Durbin Watson d statistics =1.18
a. Interpret the above regression results
b. Which variables significantly affect the trade balance of Ethiopia in the long run?

2. Test of the Stationarity of the residuals obtained from the long run model

Table 2.6.3
Augmented Dickey-Fuller Test Equation; Dependent Variable: D(Residuals); Method: Least Squares;
Included observations: 40 after adjustments
Variables Coefficients Standard Error t-Statistics P-value
Residualst-1 -0.650962 0.150385 -4.328652 0.0001
Constant 0.014651 0.031967 0.458332 0.6493
0.330246 F-statistic 18.73722
Adjusted 0.312621 Prob(F-statistic) 0.000105
Durbin-Watson stat 1.812471

a. Test the above model for stationarity


b. Is the null hypothesis accepted or rejected?

The above regression result shows that the lagged residual term is significantly negative, so we reject the null hypothesis of a unit root in favor of the alternative. That means the residuals from the long run model are stationary, which implies that the regression of LTB on LMS2, LGDP, LEXR and LG is meaningful (i.e., not spurious). Thus, the variables in the model have a long run relationship, or equilibrium.

3. The Short Run Dynamics (Error Correction Model)


We just showed that LTB, LMS2, LGDP, LEXR and LG are cointegrated; that is, there is a long term, or
equilibrium, relationship between these variables. Of course, in the short run there may be disequilibrium.
Therefore, one can treat the error term in

$u_t = LTB_t - \beta_0 - \beta_1 LMS2_t - \beta_2 LGDP_t - \beta_3 LEXR_t - \beta_4 LG_t$   (27)

as the "equilibrium error", and we can use this error term to tie the short-run behavior of TB to its long-run value. This error correction mechanism (ECM), first used by Sargan and later popularized by Engle and Granger, corrects for disequilibrium. An important theorem, known as the Granger representation theorem, states that if two variables Y and X are cointegrated, then the relationship between the two can be expressed as an ECM. To see what this means, we will revert to our example of the trade balance of Ethiopia. But before that, let us consider a typical example. Assume that $Y_t$ and $X_t$ are cointegrated and, further, that $X_t$ is exogenous. The ECM can then be given as:

$\Delta Y_t = \alpha_0 + \alpha_1 \Delta X_t + \pi \hat{u}_{t-1} + \varepsilon_t$   (28)

To account for short-run dynamics, include lagged terms as:

$\Delta Y_t = \alpha_0 + \sum_{i=1}^{p} \gamma_i \Delta Y_{t-i} + \sum_{j=0}^{q} \alpha_j \Delta X_{t-j} + \pi \hat{u}_{t-1} + \varepsilon_t$   (29)
This is called the Error Correction Model (ECM) in time series econometrics. The lagged value of the residuals from the long run model, $\hat{u}_{t-1}$, is used in the short run regression analysis because the last period's disequilibrium, or equilibrium error, affects the direction of the dependent variable in the current period.

The coefficient of the lagged residual, $\pi$, is expected to be negative and less than one in absolute value. It is less than 1 because the adjustment towards equilibrium may not be 100% within one period. If the coefficient is zero, the system is at equilibrium. Note that if $\pi$ is negative and significant, the adjustment is towards the equilibrium, but if $\pi$ is positive and significant, the adjustment is away from the equilibrium.
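As a minimal illustration (continuing the simulated y, x and u_hat from the earlier Engle-Granger sketch, so the names are placeholders rather than the trade balance data), equation (28) can be estimated as follows:

import pandas as pd
import statsmodels.api as sm

ecm_data = pd.DataFrame({
    'dy':      y.diff(),          # delta Y_t
    'dx':      x.diff(),          # delta X_t
    'ect_lag': u_hat.shift(1),    # u_hat_{t-1}: lagged equilibrium error
}).dropna()

ecm_fit = sm.OLS(ecm_data['dy'], sm.add_constant(ecm_data[['dx', 'ect_lag']])).fit()
print(ecm_fit.summary())
# The coefficient on 'ect_lag' is the adjustment parameter pi of equation (28);
# a negative and significant estimate means adjustment back towards equilibrium.
# Lagged differences, as in equation (29), can be added as extra columns.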
Illustration: Error Correction Model for the Trade Balance of Ethiopia

The Error Correction Model (ECM) for the Trade Balance of Ethiopia is specified as:

$\Delta LTB_t = \alpha_0 + \sum_{i=1}^{p}\gamma_i \Delta LTB_{t-i} + \sum_{j=1}^{q}\delta_j \Delta LMS2_{t-j} + \sum_{j=1}^{q}\theta_j \Delta LGDP_{t-j} + \sum_{j=1}^{q}\lambda_j \Delta LEXR_{t-j} + \sum_{j=1}^{q}\phi_j \Delta LG_{t-j} + \pi\, ECT_{t-1} + \varepsilon_t$

where p and q are the optimal lag lengths (which can be determined by minimizing the model selection information criteria) for the dependent and explanatory variables, respectively, while $ECT_{t-1}$ is the one-period lagged value of the residuals obtained from the cointegrating regression, i.e., the long run/static model.

Table 2.6.4
Short Run Determinants of the Trade Balance of Ethiopia (Output from EViews);Dependent Variable:
DLTB
Method: Least Squares; Included observations: 39 after adjustments
Variables Coefficients Standard Error t-Statistics P-value
D(LTB(-1)) 0.238526 0.162666 1.466352 0.1523
D(LEXR(-1)) -0.079162 0.274364 -0.288531 0.7748
D(LMS2(-1)) -0.540010 0.580852 -0.929685 0.3595
D(LGDP(-1)) -0.160748 0.351836 -0.456882 0.6508
D(LG(-1)) 0.401352 0.277733 1.445103 0.1582
ECT(-1) -0.888009 0.189013 -4.698145 0.0000

a. Interpret the above regression results of ECM


b. Which variables significantly affect the trade balance of Ethiopia in the short run?

The coefficient of the error-correction term of about -0.89 suggests that about 89% of the discrepancy between the long-run and short-run trade balance is corrected within a year (the data are yearly), indicating a high rate of adjustment to equilibrium.

In summary, the Engle-Granger Two Stage Procedure:

1. Run the static model in levels: $Y_t = \beta_0 + \beta_1 X_t + u_t$

2. Save the residuals from this long run or static model: $\hat{u}_t = Y_t - \hat{\beta}_0 - \hat{\beta}_1 X_t$

3. Test the residuals for a unit root: $\Delta \hat{u}_t = \rho\, \hat{u}_{t-1} + e_t$, where the null hypothesis is $H_0: \rho = 0$ (no cointegration) against $H_1: \rho < 0$.

4. If the null hypothesis is not rejected, the regression $Y_t = \beta_0 + \beta_1 X_t + u_t$ is spurious. We do not have a long run model and we have to run only the following short run model in first differences, disregarding the long run model:

$\Delta Y_t = \alpha_0 + \alpha_1 \Delta X_t + \varepsilon_t$

(NB: one can also add the lagged values of the dependent and explanatory variables.)

5. If the null hypothesis is rejected, the regression $Y_t = \beta_0 + \beta_1 X_t + u_t$ is meaningful (non-spurious). We have both the long run model and the short run dynamics, and we include the lagged value of the residuals in the short run model as one of the explanatory variables:

$\Delta Y_t = \alpha_0 + \alpha_1 \Delta X_t + \pi \hat{u}_{t-1} + \varepsilon_t$

2.7 Two other popular models to analyze time series data

Thus far, this chapter focused mainly on identifying problems related to time series, and to a lesser extent on solving these problems. What is still lacking are some concrete examples of time series models that can be run for the bachelor thesis next year. Therefore, this section discusses two models that are commonly used in time series analysis:

- The Autoregressive Distributed Lag (ARDL) model


- The Vector Auto Regression (VAR) model

2.7.1 The ARDL model

Autoregressive distributed lag models are models that contain the lagged values of the dependent variable
and the current and lagged values of regressors. The model is used for forecasting and for policy analysis.
For example, the ARDL model can show the effects of changes in government expenditure or taxation on unemployment and inflation (fiscal policy). Usually the effect of a change in a policy variable (e.g. public spending) does not immediately and fully feed through to other economic variables (e.g. GDP); the effect works over a longer time period (various time lags) for psychological, technical and institutional reasons. Mathematically, the ARDL(p,q) model is expressed as follows:

$y_t = \delta + \theta_1 y_{t-1} + \dots + \theta_p y_{t-p} + \delta_0 x_t + \delta_1 x_{t-1} + \dots + \delta_q x_{t-q} + v_t$

 p is the number of lags of the dependent variable, y
 q is the number of lags of the additional predictor x

That means, y depends on its own lagged values, and on the lagged values of the explanatory variable, x. If
the variables in the model are stationary, and the usual least squares assumptions are valid, we can estimate
ARDL models using least squares. It is also possible to use non-stationary data for this model, but only if
you add an error correction mechanism (like we discussed in the section above). To simplify the
introduction of the ARDL, we will work with stationary data only in this section 3. A challenge is to use the
optimal number of lag lengths for p and q.Too many lags could increase the error in the forecasts, too few
could leave out relevant information.

How should we choose the lag length p?


 Sufficient lags should be included to remove serial correlation in the errors; whether enough lags have been included can be checked by making a correlogram of the residuals.

3
This is done in most econometric textbooks, for example in Hill, Griffiths & Lim (2011)
 Including too many lags may lead to insignificant parameter estimates. It is wise to exclude the longest
lag (the final lag) if it is insignificant. Next, re-run the model and perform a hypothesis test on the new
final lag. A drawback of this approach is that it can produce a model that is too large.
 An alternative way to determine the lag lengths is to minimize the BIC (Bayes' information criterion, also called the Schwarz information criterion) or the AIC (Akaike's information criterion). These criteria consider the trade-off between the benefit of a larger model (more lagged values, so more explanatory power) and its disadvantage (fewer degrees of freedom). When running the model, EViews asks which information criterion you want to use; it is best to choose the Schwarz information criterion. The output will reveal the optimal lag length, both for the dependent (p) and the independent (q) variable. (A minimal code sketch of this selection follows below.)
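As a rough sketch of such information-criterion-based selection (assuming a recent statsmodels version, 0.13 or later, which ships the statsmodels.tsa.ardl module; the inflation and unemployment series below are simulated placeholders, not the actual data):

import numpy as np
import pandas as pd
from statsmodels.tsa.ardl import ardl_select_order

rng = np.random.default_rng(1)
n = 120
u = pd.Series(5 + rng.normal(size=n)).rolling(3, min_periods=1).mean()  # "unemployment"
i = 10 - 0.8 * u + rng.normal(size=n)                                   # "inflation"

# Search over up to 4 lags of the dependent variable and of the regressor,
# keeping the specification that minimises the Schwarz/Bayes criterion (BIC).
sel = ardl_select_order(endog=i, maxlag=4, exog=pd.DataFrame({'u': u}), maxorder=4,
                        trend='c', ic='bic')
res = sel.model.fit()
print(res.summary())   # the summary reports the selected ARDL(p, q) specification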

Example: short-run Phillips curve


When predicting future inflation, economic theory suggests that lagged values of the unemployment rate
might be a good predictor. Therefore, we can use an ARDL(p,q) model to estimate the effect of
unemployment (ILO estimate: U) on inflation (GDP deflator: I).

I. First we check if I and U are stationary


Table 2.7.1 and Table 2.7.2 show the outcome of the unit root tests for U and I. I has no unit root (p=0.0066, so H0 is rejected). U is strictly speaking not stationary (p=0.0501, so H0 cannot be rejected at the 95% confidence level), but it is close to stationarity. For the ease of the assignment we continue with the data.
Table 2.7.1: Unit root test for I. Table 2.7.2: Unit root test for U.

II. Next, we check the correlogram of the residuals of the regression of I on U and one lag of I, to see how many (additional) lags of I (p) would be wise to include.
The regression in Table 2.7.3 shows that unemployment has a significant effect on inflation (p=0.0193), but
the lagged value of inflation has no significant effect on inflation (p=0.5463). The correlogram of the
residuals (see Figure 2.7.1) reveals that there is no autocorrelation in the model, because the autocorrelation
coefficients are all insignificant (Prob>0.05). Based on the correlogram it is right to conclude that no lagged
values of I must be included in the ARDL model. That means p=0.

Figure 2.7.1: Correlogram of the residuals. Table 2.7.3: Output of the regression.

III. Next, we ask EViews to run the ARDL model, using the number of lags (both for I and U) that minimizes the BIC.

Table 2.7.4: Output of the ARDL model

Table 2.7.4 shows the output of the ARDL model. The output of EViews reveals that, according to the Bayes information criterion (BIC = Schwarz information criterion), the optimal model is ARDL(1,0). This means that one lag is included for the dependent variable (inflation) and no lag is included for the independent variable (unemployment).
IV. Present and interpret the model
The last step is to interpret the model. We see that, though the lagged value of inflation is included, it is insignificant (prob. > 0.05). Controlling for the previous value of inflation, a 1 unit increase in the unemployment rate leads to a 9.28 unit decrease in the inflation rate. This is a significant effect (p=0.0193).
V. Diagnostic testing of the model

ARDL is a linear regression model and therefore the underlying assumptions of the CLRM have to be verified. These assumptions are, among others, that the model is correctly specified, that the errors are homoscedastic, that there is no serial correlation and that the error term is normally distributed.
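These checks can be run directly on a fitted statsmodels results object. A minimal sketch (assuming res is a fitted OLS results object, such as the ecm_fit object from the ECM sketch earlier; nothing here comes from the original EViews output):

from statsmodels.stats.diagnostic import acorr_breusch_godfrey, het_breuschpagan
from statsmodels.stats.stattools import jarque_bera

# Serial correlation: Breusch-Godfrey LM test (H0: no autocorrelation up to 2 lags)
bg_lm, bg_pval, bg_f, bg_fpval = acorr_breusch_godfrey(res, nlags=2)

# Heteroscedasticity: Breusch-Pagan test (H0: homoscedastic errors)
bp_lm, bp_pval, bp_f, bp_fpval = het_breuschpagan(res.resid, res.model.exog)

# Normality of the residuals: Jarque-Bera test (H0: normally distributed errors)
jb_stat, jb_pval, skew, kurt = jarque_bera(res.resid)

print(bg_pval, bp_pval, jb_pval)   # p-values above 0.05 mean the assumption is not rejected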

2.7.2 The VAR model

In the above discussed models, we have clearly identified a dependent and several independent variables.
According to Sims (1980), this is inappropriate, because economic variables are mostly not influencing each
other in one direction. For example, inflation affects unemployment, but unemployment also affects
inflation. Therefore, Sims developed the Vector Autoregression (VAR) model. The VAR model is a system
of equations, which means that several functions, and thus several dependent variables, are estimated
simultaneously. We will discuss simultaneous equation modeling more in-depth for cross-sectional data in
chapter 3. This section gives a relatively simple example of a bivariate VAR model, because it is a popular model and also frequently used by undergraduate thesis students. A bivariate VAR with k lags of each variable can be written as:

$y_t = \alpha_0 + \sum_{i=1}^{k} \alpha_i y_{t-i} + \sum_{i=1}^{k} \beta_i x_{t-i} + u_{1t}$   (31)

$x_t = \gamma_0 + \sum_{i=1}^{k} \gamma_i x_{t-i} + \sum_{i=1}^{k} \delta_i y_{t-i} + u_{2t}$   (32)

Table 2.7.5: The output of the VAR model
Vector Autoregression Estimates
Date: 03/26/20 Time: 14:55
Sample (adjusted): 2000M03 2020M02
Included observations: 240 after adjustments
Standard errors in ( ) & t-statistics in [ ]
highlighted = significantly different from 0 with α = 0.05

DPI SPI

DPI(-1) 1.475635 0.152899


(0.05555) (0.11178)
[ 26.5624] [ 1.36788]

DPI(-2) -0.523871 -0.147106


(0.05521) (0.11108)
[-9.48907] [-1.32429]

SPI(-1) -0.050337 1.283104


(0.03049) (0.06134)
[-1.65109] [ 20.9171]

SPI(-2) 0.063283 -0.330001


(0.03051) (0.06138)
[ 2.07432] [-5.37598]

C 4.776801 7.101472
(1.84161) (3.70547)
[ 2.59381] [ 1.91648]

R-squared 0.965360 0.946417


Adj. R-squared 0.964770 0.945505
Sum sq. resids 9159.918 37083.63
S.E. equation 6.243266 12.56195
F-statistic 1637.249 1037.683
Log likelihood -777.5797 -945.3803
Akaike AIC 6.521497 7.919836
Schwarz SC 6.594011 7.992349
Mean dependent 139.4091 164.2131
S.D. dependent 33.26257 53.81203

Determinant resid covariance (dof adj.) 6123.484


Determinant resid covariance 5870.996

Log likelihood -1722.424
Akaike information criterion 14.43687
Schwarz criterion 14.58189
We assume that both y and x are stationary, and that $u_{1t}$ and $u_{2t}$ are uncorrelated white noise error terms. Because both equations are linear in parameters, OLS is used for the estimation. For example, it is assumed that prices of different food commodities are interdependent. To this end, the FAO deflated food price index data are used to specify the following VAR model:

$DPI_t = \alpha_0 + \alpha_1 DPI_{t-1} + \alpha_2 DPI_{t-2} + \beta_1 SPI_{t-1} + \beta_2 SPI_{t-2} + u_{1t}$

$SPI_t = \gamma_0 + \gamma_1 SPI_{t-1} + \gamma_2 SPI_{t-2} + \delta_1 DPI_{t-1} + \delta_2 DPI_{t-2} + u_{2t}$

where $DPI_t$ = the deflated dairy price index at year t (base year: 2002-2004)

and $SPI_t$ = the deflated sugar price index at year t (base year: 2002-2004).

The number of lags (2 for each variable) is determined based on a minimization of the AIC and BIC. The
VAR output of EViews is displayed in Table 2.7.5. Based on the R² and the F-statistic of the model, we can judge that the model performs well. Unfortunately, no p-values are given for the t-tests of the individual parameters, but given the t-statistics (the values in the square brackets [ ]) and given that the sample is large (there are 240 observations), we know that the critical t-value for a 95% confidence level is 1.96. So, all parameter estimates with a t-value > 1.96 (or < -1.96) are considered significant. For the DPI equation these are the first and second lags of DPI and the second lag of SPI. For the SPI equation the DPI lags are insignificant, but the SPI lags are significantly different from 0.

Table 2.7.6: Output of the Granger causality test

Though estimation of the VAR model is relatively simple, it is problematic to interpret VAR parameter
estimates, because it seems that „everything causes everything‟. It is not possible to measure the effect of
DPI on SPI, leaving all other factors constant. Luckily, it is possible to test for the direction of the causality.
Causality refers to the ability of one variable to predict the other variable. If there are only 2 variables, x and
y, there are four possibilities:

- y causes x
- x causes y
- x causes y and y causes x (bi-directional feedback)
- x and y are independent

Granger (1969) developed a test that defines causality as follows: if a variable, x, Granger-causes y, y can be
predicted with greater accuracy by using past values of the x variables rather than not using such past values.
To test if x causes y, the Granger causality test uses an F-test on the β parameters of equation (31). Similarly, to see if y causes x, an F-test must be conducted on the δ parameters of equation (32).
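A minimal Python sketch of fitting a bivariate VAR and running these F-type Granger causality tests (the DPI and SPI series below are simulated stand-ins, not the FAO data):

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
n = 240
spi = np.zeros(n); dpi = np.zeros(n)
for t in range(2, n):                       # SPI drives DPI by construction
    spi[t] = 1.2 * spi[t-1] - 0.3 * spi[t-2] + rng.normal()
    dpi[t] = 1.4 * dpi[t-1] - 0.5 * dpi[t-2] + 0.1 * spi[t-2] + rng.normal()
data = pd.DataFrame({'DPI': dpi, 'SPI': spi})

var_res = VAR(data).fit(2)                  # VAR with 2 lags of each variable
print(var_res.summary())

# Does SPI Granger-cause DPI? (joint F-test on the SPI lags in the DPI equation)
print(var_res.test_causality('DPI', ['SPI'], kind='f').summary())
irf = var_res.irf(10)                       # impulse responses (cf. the response graphs)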

See Table 2.7.6 for the output of the Granger causality test for the sugar and dairy prices. Recall that DPI was estimated according to the following formula:

$DPI_t = \alpha_0 + \alpha_1 DPI_{t-1} + \alpha_2 DPI_{t-2} + \beta_1 SPI_{t-1} + \beta_2 SPI_{t-2} + u_{1t}$

The first part of Table 2.7.6, Dependent variable: DPI, tests the H0 that $\beta_1 = \beta_2 = 0$. H0 is rejected (Prob < 0.05), so the lagged values of SPI are able to explain the variation in DPI.

The second part of Table 2.7.6, Dependent variable: SPI, tests the H0 that $\delta_1 = \delta_2 = 0$; in other words, that the lagged values of DPI cannot explain the variation in SPI. This H0 is not rejected (Prob. > 0.05).

Concluding, this output reveals that SPI Granger-causes DPI, but DPI does not Granger-cause SPI.

Figure 2.7.2: Response graphs

Next to knowing the parameter estimates, and knowing which variable Granger-causes the other variable(s),
it may be interesting to create response graphs. In the response graphs you see what a shock to one variable,
for example SPI, does through the fitted VAR model to the other variable, in this case DPI. Figure 2.7.2
shows the response graphs from EViews for the VAR presented above. The right upper graph displays the impact of a change in DPI on SPI. The value is around 0, which means that there is no big impact. This is in line with the Granger causality test, which could not reject the H0 that $\delta_1 = \delta_2 = 0$. The lower left graph shows that a change in SPI does positively affect DPI for some lags. This is in line with the conclusion of the Granger causality test that rejected the H0 that $\beta_1 = \beta_2 = 0$.

Note that we only explored the equilibrium


relationship between DPI and SPI in this
assignment. It is also possible to extend the
VAR model by exploring the co-integration
relationship, as we discussed in the ECM. It
goes beyond the scope of this course to
extend the model in this way, but it may be a good option to do this for your thesis to further improve the
power of your model.

2.7.3 Closing comments

In this chapter, much attention was paid to characteristics of time series. These characteristics may be
troublesome while using time series data in regression. We have also seen some solutions, which allow us to
use time series data in regression. The ones we have discussed are the error correction model (ECM), the
autoregressive distributed lag (ARDL) model and the vector autoregression (VAR) model. There are many more models
that deal with time series data, which go beyond the scope of this course. It may be that you need to use
these models for your thesis next year. If this is the case, we advise you to read the reference materials. The
titles of these works can be found in the course outline.

CHAPTER THREE

INTRODUCTION TO SIMULTANEOUS
EQUATION MODELS

Short notes only (consult your teacher for a softcopy


of longer lecture notes with more explanation)

3.1. The Nature of Simultaneous Equation Models


So far, we were concerned exclusively with single equation models, i.e., models in which there was a single dependent variable Y and one or more explanatory variables, the X's. In such models the emphasis was on estimating and/or predicting the average value of Y conditional upon the fixed values of the X variables. In many situations, such a one-way or unidirectional cause-and-effect relationship is not meaningful. This occurs if Y is determined by the X's, and some of the X's are, in turn, determined by Y. In short, there is a two-way, or simultaneous, relationship between Y and (some of) the X's, which makes the distinction between dependent and explanatory variables doubtful.

Recall that one of the crucial assumptions of the method of OLS is that the explanatory X variables are
either nonstochastic or, if stochastic (random), are distributed independently of the stochastic disturbance
term. If this is not the case, as you can see in the below examples, application of the method of OLS is
inappropriate.

Some Examples of Simultaneous-Equation Models

Example 1: Demand-and-Supply Models


In a perfectly competitive market setting, the price of a commodity and the quantity sold are determined
by the intersection of the demand-and-supply curves for that commodity. Thus, assuming for simplicity that
the demand-and-supply curves are linear and adding the stochastic disturbance terms $u_{1t}$ and $u_{2t}$, we may write the empirical demand-and-supply functions as:

Demand Function: $Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}$   (1)
Supply Function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$   (2)
Equilibrium condition: $Q_t^d = Q_t^s$

where $Q_t^d$ = quantity demanded, $Q_t^s$ = quantity supplied, t = time, and the α's and β's are the parameters.

Figure 3.1: Interdependence between price and quantity

Now it is easy to see that P and Q are jointly dependent (interdependent) variables. If, for example, $u_{1t}$ in (1) changes because of changes in other variables affecting $Q_t^d$ (such as income, consumer confidence and tastes), the demand curve will shift upward if $u_{1t}$ is positive and downward if $u_{1t}$ is negative. These shifts are shown in Figure 3.1.
As the figure shows, a shift in the demand curve changes both P and Q. Similarly, a change in $u_{2t}$ (because of weather conditions, import or export restrictions, etc.) will shift the supply curve, again affecting both P and Q. Because of this simultaneous dependence between Q and P, $P_t$ and $u_{1t}$ in (1), and $P_t$ and $u_{2t}$ in (2), cannot be independent. Therefore, a regression of $Q_t$ on $P_t$ as in (1) would violate an important assumption of the classical linear regression model, namely, the assumption of no correlation between the explanatory variable(s) and the disturbance term.

Example 2: Wage–Price Model


Consider the following model of wage and price determination:

Wage equation: $W_t = \alpha_0 + \alpha_1 UN_t + \alpha_2 P_t + u_{1t}$   (3)
Price equation: $P_t = \beta_0 + \beta_1 W_t + \beta_2 PM_t + u_{2t}$   (4)

Where, $W$ = Nominal Wage Rate (in Birr)
$UN$ = Unemployment rate (in %)
$P$ = Rate of change of average prices of goods and services
$PM$ = Rate of change of price of imported raw material
$t$ = time
$u_{1t}, u_{2t}$ = Stochastic disturbances

Since the price variable, $P_t$, enters into the wage equation and the wage variable, $W_t$, enters into the price equation, the two variables are jointly dependent. Therefore, these stochastic explanatory variables are expected to be correlated with the relevant stochastic disturbances. The classical OLS method is therefore inappropriate to estimate the parameters of the two equations individually.

Example 3: Keynesian Model of Income Determination


Consider the simple Keynesian model of income determination given as:

Consumption function: $C_t = \beta_0 + \beta_1 Y_t + u_t, \quad 0 < \beta_1 < 1$   (5)
Income identity: $Y_t = C_t + I_t$   (6)

Where, C = consumption expenditure
Y = income
I = investment (assumed as exogenous)
U = stochastic disturbance term

Equation (5) is the consumption function; and (6) is the national income identity, signifying that total income is equal to total consumption expenditure plus total investment expenditure.

From the consumption function, it is clear that C and Y are interdependent and that $Y_t$ in (5) is not expected to be independent of the disturbance term, because when $u_t$ shifts (for example, because people consume more around Addis Amed), then the consumption function also shifts, which, in turn, affects $Y_t$. Therefore, once again the classical least-squares method is inapplicable to (5).

In general, when a relationship is a part of a system, then some explanatory variables are stochastic and are
correlated with the disturbances. So, the basic assumption of a linear regression model that the explanatory
variable and disturbance are uncorrelated, or explanatory variables are fixed, is violated.

Endogenous and Exogenous Variables


Variables in simultaneous equation models are classified as endogenous variables and exogenous variables.

Endogenous Variables (Jointly determined variables)

Endogenous variables are variables which are influenced by one or more variables in the model. They are
explained by the functioning of the system. Their values are determined by the simultaneous interaction of
the relations in the model. We call themendogenous variables, interdependent variables or jointly
determined variables.

Exogenous Variables (Predetermined variables)


The variables that influence endogenous variables are called exogenous or predetermined variables. The values of these exogenous variables are determined outside the model. Exogenous variables influence the endogenous variables but are not themselves influenced by them. A variable which is endogenous in one model can be an exogenous variable in another model.
Example: Consider our previous model of wage and price determination (equations (3) and (4)). In this model, $W_t$ and $P_t$ are endogenous variables while $UN_t$ and $PM_t$ are exogenous or predetermined variables.

3.2. Simultaneity Bias

The problem explained above, that there is correlation between the explanatory variable(s) and the error
term, is at the core of Simultaneous Equation Modeling. For example, in the Wage-Price Model, does wage
affect prices or do prices affect the wage? If OLS is used to estimate the equations individually, the problem of simultaneity bias occurs. That means that the least-squares estimators are biased (in small/finite samples) and inconsistent (even in large samples): as the sample size increases indefinitely, the estimators do not converge to their true (population) values. If you want to see the proof for this, it is recommended to read the appendix on page 138-139 in Gujarati (Econometrics by example, 2011).

3.3. Order and Rank conditions of Identification

In estimating simultaneous equation models, it is important to see if the model is identified. A simultaneous equation model can either be over identified, under identified or exactly identified. If a model is exactly identified, unique numerical values of the model parameters can be obtained. If a model is under identified, no model parameters can be obtained. If a model is over identified, more than one numerical value can be obtained for some of the parameters.
Order condition of identification
There are several ways to check if a system of equations is identified. The most common one is the order
condition.
To understand the order condition, we shall make use of the followingnotations:
M = number of endogenous variables in the model
m = number of endogenous variables in a given equation
K = number of predetermined variables in the model
k = number of predetermined variables in a given equation

The order condition:


 In a model of M simultaneous equations, in order for an equation to be identified, it must exclude at least M − 1 variables (endogenous as well as predetermined) appearing in the system of equations.
 If it excludes exactly M − 1 variables, the equation is just identified.
 If it excludes more than M − 1 variables, it is overidentified.

Or, to define the same thing differently:


 In a model of M simultaneous equations, in order for an equation to be identified, the number of predetermined variables excluded from the equation (K − k) must not be less than the number of endogenous variables included in that equation less 1 (m − 1), that is, K − k ≥ m − 1.

 If K − k = m − 1, the equation is just identified.

 If K − k > m − 1, it is overidentified.
(A small checker function illustrating this counting rule is sketched below.)
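A tiny illustrative helper (not from the original text) that applies this counting rule:

def order_condition(K: int, k: int, m: int) -> str:
    """K: predetermined variables in the model, k: predetermined variables in the
    equation, m: endogenous variables included in the equation."""
    excluded, needed = K - k, m - 1
    if excluded == needed:
        return "just identified"
    return "overidentified" if excluded > needed else "not identified"

# e.g. an equation that includes both endogenous variables (m = 2) and one of the
# model's two predetermined variables (K = 2, k = 1):
print(order_condition(K=2, k=1, m=2))       # just identified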

Let‟s consider some examples for illustration of the order condition of identification.

Example 1: Consider our previous example of the demand function and the supply function.

Demand Function: $Q_t = \alpha_0 + \alpha_1 P_t + u_{1t}$
Supply Function: $Q_t = \beta_0 + \beta_1 P_t + u_{2t}$

This model has two endogenous variables, P and Q, and no predetermined variables. To be identified, each of these equations must exclude at least M − 1 = 1 variable. Since this is not the case, both equations are not identified.

Example 2: Consider the following demand and supply equations

Demand Function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1t}$
Supply Function: $Q_t = \beta_0 + \beta_1 P_t + u_{2t}$

In this model, Q and P are endogenous and $I_t$ (consumer's income) is exogenous. Applying the order condition given above, we see that the demand function is unidentified (it excludes no variable). On the other hand, the supply function is just identified because it excludes exactly M − 1 = 1 variable, $I_t$.

Example 3: Consider the following demand and supply equations

Demand Function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \alpha_3 R_t + u_{1t}$
Supply Function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$

In this model, $Q_t$ and $P_t$ are endogenous, and $I_t$ (income), $R_t$ (wealth) and $P_{t-1}$ (lagged price) are predetermined. The demand function excludes exactly one variable, $P_{t-1}$, and hence by the order condition it is exactly identified. But the supply function excludes two variables, $I_t$ and $R_t$, and hence it is overidentified. Hence, there is a possibility of obtaining more than one solution for $\beta_1$, the coefficient of the price variable in the supply model.

Notice at this juncture that as the previous examples show, identification of an equation in a model of
simultaneous equations is possible if that equation excludes one or more variables that are present
elsewhere in the system. This situation is known as the exclusion (of variables) criterion, or zero
restrictions criterion(the coefficients of variables not appearing in an equation are assumed to have zero
values). This criterion is by far the most commonly used method of securing or determining identification of
an equation. In using this method, the researcher should always consider economic theory and judge on
whether it is correct that the variable(s) are excluded from or included in the equation.

The order condition is necessary for identification, but unfortunately there are some special cases in which
the order condition is insufficient. A more complex, but both necessary and sufficient method is the rank
condition, which will be shortly discussed below.

The Rank Condition of Identification


The order condition discussed previously is a necessary but not a sufficient condition for identification; that is, even if it is satisfied, it may happen that an equation is not identified. We need both a necessary and sufficient condition for identification. This is provided by the rank4 condition of identification.

Rank condition of identification: In a model containing M equations in M endogenous variables, an equation is identified if and only if at least one nonzero determinant of order (M − 1) × (M − 1) can be constructed from the coefficients of the variables (both endogenous and predetermined) excluded from that particular equation but included in the other equations of the model. Remember that the rank of a matrix is the order of the largest square submatrix (contained in the given matrix) whose determinant is nonzero.

As this is quite theoretical, and your matrix algebra insight might need refreshment, please check the
example below.

Example of checking the rank condition of identification


As an illustration of the rank condition of identification, consider the following hypothetical system of simultaneous equations, in which the Y variables are endogenous and the X variables are predetermined or exogenous:

$Y_{1t} = \beta_{10} + \beta_{12}Y_{2t} + \beta_{13}Y_{3t} + \gamma_{11}X_{1t} + u_{1t}$   (13)
$Y_{2t} = \beta_{20} + \beta_{23}Y_{3t} + \gamma_{21}X_{1t} + \gamma_{22}X_{2t} + u_{2t}$   (14)
$Y_{3t} = \beta_{30} + \beta_{31}Y_{1t} + \gamma_{31}X_{1t} + \gamma_{32}X_{2t} + u_{3t}$   (15)
$Y_{4t} = \beta_{40} + \beta_{41}Y_{1t} + \beta_{42}Y_{2t} + \gamma_{43}X_{3t} + u_{4t}$   (16)

Steps in the Rank Condition


To apply the rank condition of identifiability, one may follow the following steps:
1. Write down the system of equations in a tabular form.
2. Cross out the coefficients of the row in which the equation under consideration appears.
3. Also cross out the columns corresponding to the coefficients which are nonzero for the equation under
consideration.
4. The entries left in the table will then give only the coefficients of the variables included in the system but not in the equation under consideration. From these entries, form all possible matrices of order (M − 1) × (M − 1) and obtain the corresponding determinants. If at least one nonzero determinant can be found, the equation in question is (just or over) identified. If all the possible matrices of order (M − 1) × (M − 1) have a determinant of zero, the rank of the matrix is less than M − 1 and the equation under investigation is not identified.

Following the above procedure, let‟s find out whether equation (13) is identified.
Step 1: Table 3.1 displays the system of equations in tabular form.
Step 2:The coefficients for row (13) have been crossed out, because (13) is the equation under
consideration.
Step 3: The coefficients for columns 1, Y1, Y2, Y3 and X1 have been crossed out, because they appear in
(13).

Table 3.1 The tabular form of the system of equations (Step 1). In Step 2 the row of equation (13) is crossed out, and in Step 3 the columns of the variables that appear in (13) (i.e., 1, Y1, Y2, Y3 and X1) are crossed out.

Coefficients of the variables
Equation No.    1        Y1       Y2       Y3       Y4     X1       X2       X3

4Theterm rank refers to the rank of a matrix and is given by the largest-order square matrix (contained in the
given matrix) whose determinant is nonzero. Alternatively, the rank of a matrix is the largest number of
linearly independent rows or columns of that matrix.
13              -β10     1        -β12     -β13     0      -γ11     0        0
14              -β20     0        1        -β23     0      -γ21     -γ22     0
15              -β30     -β31     0        1        0      -γ31     -γ32     0
16              -β40     -β41     -β42     0        1      0        0        -γ43

Step 4: Matrix A in (17) is created from the coefficients remaining in Table 3.1, i.e., the coefficients of the variables (Y4, X2 and X3) excluded from equation (13) but included in the other equations. For equation (13) to be identified, we must obtain at least one nonzero determinant of order 3 × 3 from these coefficients.

$A = \begin{bmatrix} 0 & -\gamma_{22} & 0 \\ 0 & -\gamma_{32} & 0 \\ 1 & 0 & -\gamma_{43} \end{bmatrix}$   (17)

It can be seen that the determinant of this matrix is zero:

$|A| = 0$

Since the determinant is zero, the rank of matrix A in (17) is less than 3 (i.e., less than M − 1). Therefore, Eq. (13) does not satisfy the rank condition and hence is not identified.

Therefore, although the order condition shows that Eq. (13) is identified, the rank condition shows that it is not. Apparently, the columns or rows of the matrix A given in (17) are not (linearly) independent, meaning that there is some relationship between the variables $Y_4$, $X_2$ and $X_3$. As a result, we may not have enough information to estimate the parameters of equation (13).
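Numerically, the same conclusion can be checked by plugging arbitrary nonzero values into the γ's of matrix A in (17); the values below are hypothetical, chosen only to illustrate the rank check:

import numpy as np

g22, g32, g43 = 0.5, 1.2, 0.9               # hypothetical coefficient values
A = np.array([[0.0, -g22, 0.0],
              [0.0, -g32, 0.0],
              [1.0,  0.0, -g43]])

print(np.linalg.det(A))                      # 0.0: no nonzero 3x3 determinant exists
print(np.linalg.matrix_rank(A))              # 2, which is less than M - 1 = 3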

Our discussion of the order and rank conditions of identification leads to the following general principles of identifiability of a structural equation in a system of M simultaneous equations:
1. If K − k > m − 1 and the rank of the matrix A is M − 1, the equation is overidentified.
2. If K − k = m − 1 and the rank of the matrix A is M − 1, the equation is exactly identified.
3. If K − k ≥ m − 1 and the rank of the matrix A is less than M − 1, the equation is not identified.
4. If K − k < m − 1, the structural equation is not identified. The rank of the A matrix in this case is bound to be less than M − 1.
Which condition should one use in practice: Order or rank? For large simultaneous-equation models,
applying the rank condition is a formidable task. Therefore, as Harvey notes: “Fortunately, the order
condition is usually sufficient to ensure identifiability, and although it is important to be aware of the rank
condition, a failure to verify it will rarely result in disaster”.

3.4 Indirect Least Squares and 2SLS estimation of structural equations

If an equation is identified, we can estimate it by Indirect Least Squares (ILS) or Two-Stage Least Squares (2SLS).
Indirect Least Squares (ILS)
ILS can be used for an exactly identified structural equation. This section explains the steps involved in ILS.

Step 1: Specify the Structural Model

Consider, for instance, the following demand and supply equations:

Demand Function: $Q_t = \alpha_0 + \alpha_1 P_t + \alpha_2 X_t + u_{1t}$
Supply Function: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$

Where, $Q_t$, $P_t$, $X_t$ and $P_{t-1}$ are quantity, price, consumer's income and lagged price, respectively. Note that the demand function excludes $P_{t-1}$ and the supply function excludes $X_t$, so by the order condition each equation is exactly identified and ILS is applicable.
Step 2:Find the reduced-form equations.
A reduced-form equation is one that expresses an endogenous variable solely in terms of the
predetermined variables and the stochastic disturbances.The reduced form expresses every endogenous
variable as a function of (an) exogenous variable(s).
Based on the equilibrium condition, quantity demanded equals quantity supplied:

$\alpha_0 + \alpha_1 P_t + \alpha_2 X_t + u_{1t} = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2t}$

Solving this equation, we obtain the following equilibrium price:

$P_t = \Pi_0 + \Pi_1 X_t + \Pi_2 P_{t-1} + v_t$   (20)

We simplify this formula by stating that $\Pi_0 = \dfrac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}$, $\Pi_1 = \dfrac{-\alpha_2}{\alpha_1 - \beta_1}$, $\Pi_2 = \dfrac{\beta_2}{\alpha_1 - \beta_1}$ and $v_t = \dfrac{u_{2t} - u_{1t}}{\alpha_1 - \beta_1}$.

Equation (20) is the first reduced form equation in this model. Because the model has another endogenous variable, $Q_t$, another reduced form equation must be derived. Substituting the equilibrium price into either the demand or the supply equation, we obtain the following equilibrium quantity:

$Q_t = \Pi_3 + \Pi_4 X_t + \Pi_5 P_{t-1} + w_t$   (21)

Where, $\Pi_3 = \dfrac{\alpha_1\beta_0 - \alpha_0\beta_1}{\alpha_1 - \beta_1}$, $\Pi_4 = \dfrac{-\alpha_2\beta_1}{\alpha_1 - \beta_1}$, $\Pi_5 = \dfrac{\alpha_1\beta_2}{\alpha_1 - \beta_1}$ and $w_t = \dfrac{\alpha_1 u_{2t} - \beta_1 u_{1t}}{\alpha_1 - \beta_1}$.

Step 3: Estimate each of the reduced form equations (20) and (21) by OLS individually. This operation is permissible since the explanatory variables in these equations are predetermined and hence uncorrelated with the stochastic disturbances. The estimates obtained are thus consistent. Suppose the estimation of the reduced form models yields the estimates $\hat{\Pi}_0, \hat{\Pi}_1, \ldots, \hat{\Pi}_5$.

Step 4: Determine the coefficients of the structural model (i.e., $\alpha_0, \alpha_1, \alpha_2, \beta_0, \beta_1, \beta_2$) from the estimated reduced-form coefficients. As noted before, if an equation is exactly identified, there is a one-to-one correspondence between the structural and reduced-form coefficients; that is, one can derive unique estimates of the former from the latter. From the definitions of the $\Pi$'s above:

$\alpha_1 = \dfrac{\Pi_5}{\Pi_2} \qquad \text{and} \qquad \beta_1 = \dfrac{\Pi_4}{\Pi_1}$

$\alpha_2 = -\Pi_1(\alpha_1 - \beta_1) \qquad \text{and} \qquad \beta_2 = \Pi_2(\alpha_1 - \beta_1)$

$\alpha_0 = \Pi_3 - \alpha_1 \Pi_0 \qquad \text{and} \qquad \beta_0 = \alpha_0 + \Pi_0(\alpha_1 - \beta_1)$

Replacing the $\Pi$'s by their OLS estimates from Step 3 gives the ILS estimates of the structural coefficients. Therefore, the estimated structural equations will be:

Demand Function: $\hat{Q}_t = \hat{\alpha}_0 + \hat{\alpha}_1 P_t + \hat{\alpha}_2 X_t$
Supply Function: $\hat{Q}_t = \hat{\beta}_0 + \hat{\beta}_1 P_t + \hat{\beta}_2 P_{t-1}$

As this four-step procedure indicates, the name Indirect Least Squares (ILS) derives from the fact that the structural coefficients (the object of primary enquiry) are obtained indirectly from the OLS estimates of the reduced-form coefficients.
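The four ILS steps can be mimicked on simulated data. The sketch below (all "true" parameter values are made up for illustration and are not from the original numerical example) generates data from the demand/supply model, estimates the two reduced-form equations by OLS, and recovers the structural coefficients:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
a0, a1, a2 = 50.0, -1.5, 0.8                 # hypothetical demand parameters
b0, b1, b2 = 10.0, 2.5, 0.6                  # hypothetical supply parameters
X = 100 + rng.normal(scale=10, size=n)       # consumer's income (exogenous)
P = np.zeros(n); Q = np.zeros(n)
for t in range(1, n):                        # solve for the equilibrium each period
    u1, u2 = rng.normal(scale=2, size=2)
    P[t] = (b0 - a0 - a2 * X[t] + b2 * P[t-1] + u2 - u1) / (a1 - b1)
    Q[t] = a0 + a1 * P[t] + a2 * X[t] + u1
df = pd.DataFrame({'Q': Q, 'P': P, 'X': X, 'P_lag': pd.Series(P).shift(1)}).dropna()

# Step 3: OLS on the two reduced-form equations
rf_P = sm.OLS(df['P'], sm.add_constant(df[['X', 'P_lag']])).fit()
rf_Q = sm.OLS(df['Q'], sm.add_constant(df[['X', 'P_lag']])).fit()
Pi0, Pi1, Pi2 = rf_P.params                  # intercept, X, P_lag
Pi3, Pi4, Pi5 = rf_Q.params

# Step 4: recover the structural coefficients from the reduced-form estimates
alpha1, beta1 = Pi5 / Pi2, Pi4 / Pi1
alpha2, beta2 = -Pi1 * (alpha1 - beta1), Pi2 * (alpha1 - beta1)
alpha0 = Pi3 - alpha1 * Pi0
beta0 = alpha0 + Pi0 * (alpha1 - beta1)
print(alpha0, alpha1, alpha2, beta0, beta1, beta2)   # close to the "true" values above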

The Method of Two-Stage Least Squares (2SLS)


If a structural equation is overidentified, ILS is consistent but does not give a unique estimate. Therefore, it is better to use the 2SLS estimation method.

Consider the following income and money supply model:


GDP: $Y_{1t} = \beta_{10} + \beta_{11} Y_{2t} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + u_{1t}$   (22)
Money Supply: $Y_{2t} = \beta_{20} + \beta_{21} Y_{1t} + u_{2t}$   (23)
Where, $Y_1$ = GDP
$Y_2$ = Money Supply
$X_1$ = Investment spending
$X_2$ = Government expenditure
The variables $X_1$ and $X_2$ are exogenous.
The GDP equation states that national income is determined by money supply, investment expenditure, and government expenditure. The money supply function states that the stock of money is determined (by the Monetary Authority) on the basis of the level of income.

Applying the order condition of identification, we can see that the income equation is not identified whereas
the money supply equation is overidentified. To estimate the overidentified money supply model, one can
use the method of 2SLS. As its name indicates, the method of 2SLS involves two successive applications of
OLS.

The procedure is as follows:


Stage 1: To get rid of the likely correlation between $Y_{1t}$ and $u_{2t}$, first regress $Y_{1t}$ on all the predetermined variables in the whole system, not just those in that equation. In the present case, this means regressing $Y_{1t}$ on $X_{1t}$ and $X_{2t}$ as follows:

$Y_{1t} = \hat{\pi}_0 + \hat{\pi}_1 X_{1t} + \hat{\pi}_2 X_{2t} + \hat{u}_t$   (24)

where $\hat{u}_t$ are the usual OLS residuals. From Eq. (24) we obtain $\hat{Y}_{1t}$:

$\hat{Y}_{1t} = \hat{\pi}_0 + \hat{\pi}_1 X_{1t} + \hat{\pi}_2 X_{2t}$   (25)

Note that (25) is nothing but a reduced-form regression, because only the exogenous or predetermined variables appear on the right-hand side. Combining (24) and (25), we can write

$Y_{1t} = \hat{Y}_{1t} + \hat{u}_t$   (26)

which shows that the stochastic $Y_{1t}$ consists of two parts: $\hat{Y}_{1t}$, which is a linear combination of the nonstochastic X's, and a random component $\hat{u}_t$. Following OLS theory, $\hat{Y}_{1t}$ and $\hat{u}_t$ are uncorrelated. (Why?)

Stage 2: The overidentified money supply equation can now be written as:

$Y_{2t} = \beta_{20} + \beta_{21}(\hat{Y}_{1t} + \hat{u}_t) + u_{2t} = \beta_{20} + \beta_{21}\hat{Y}_{1t} + (u_{2t} + \beta_{21}\hat{u}_t) = \beta_{20} + \beta_{21}\hat{Y}_{1t} + u_t^*$   (27)

where $u_t^* = u_{2t} + \beta_{21}\hat{u}_t$.

Comparing equation (27) with the original money supply function (23), we see that they are very similar in appearance, the only difference being that $Y_{1t}$ is replaced by $\hat{Y}_{1t}$. What is the advantage of (27)? It can be shown that although $Y_{1t}$ in the original money supply equation is correlated, or likely to be correlated, with the disturbance term $u_{2t}$ (hence making OLS inappropriate), $\hat{Y}_{1t}$ in (27) is uncorrelated with $u_t^*$. Therefore, OLS can be applied to (27), which will give consistent estimates of the parameters of the money supply function.

As this two-stage procedure indicates, the basic idea behind 2SLS is to "purify" the stochastic explanatory variable $Y_{1t}$ of the influence of the stochastic disturbance $u_{2t}$. This goal is accomplished by performing the reduced-form regression of $Y_{1t}$ on all the predetermined variables in the system (Stage 1); these predetermined variables are called the instrumental variables. We obtain the estimates $\hat{Y}_{1t}$ based on the instrumental variables, replace $Y_{1t}$ in the original equation by $\hat{Y}_{1t}$, and then apply OLS to the equation thus transformed (Stage 2). The estimators thus obtained are consistent; that is, they converge to their true values as the sample size increases indefinitely.
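A minimal simulated sketch of the two stages in Python (the structural parameter values below are hypothetical, chosen only so that the system can be solved and simulated):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
X1 = rng.normal(50, 5, n)                    # investment spending (exogenous)
X2 = rng.normal(30, 3, n)                    # government expenditure (exogenous)
u1, u2 = rng.normal(0, 2, n), rng.normal(0, 2, n)
# Structural system: Y1 = 10 + 0.5*Y2 + 0.8*X1 + 0.7*X2 + u1 ;  Y2 = 5 + 0.4*Y1 + u2
Y1 = (12.5 + 0.8 * X1 + 0.7 * X2 + u1 + 0.5 * u2) / (1 - 0.5 * 0.4)
Y2 = 5 + 0.4 * Y1 + u2
df = pd.DataFrame({'Y1': Y1, 'Y2': Y2, 'X1': X1, 'X2': X2})

# Stage 1: reduced-form regression of Y1 on all predetermined variables
stage1 = sm.OLS(df['Y1'], sm.add_constant(df[['X1', 'X2']])).fit()
df['Y1_hat'] = stage1.fittedvalues

# Stage 2: regress Y2 on the fitted values Y1_hat instead of Y1
stage2 = sm.OLS(df['Y2'], sm.add_constant(df['Y1_hat'])).fit()
print(stage2.params)                         # consistent estimates of beta20 and beta21
# Note: the Stage-2 OLS standard errors are not the correct 2SLS standard errors;
# dedicated IV routines (e.g. ivregress 2sls in Stata) adjust them.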

Note the following features of 2SLS.


1. It can be applied to an individual equation in the system without directly taking into account any other equation(s) in the system. Hence, for solving econometric models involving a large number of equations, 2SLS offers an economical method. For this reason, the method has been used extensively in practice.
2. Unlike ILS, which provides multiple estimates of parameters in the overidentified equations, 2SLS provides only one estimate per parameter.
3. Although specially designed to handle overidentified equations, the method can also be applied to exactly identified equations. But then ILS and 2SLS will give identical estimates. (Why?)
4. If the R² values in the reduced-form regressions (that is, the Stage 1 regressions) are very high, say, in excess of 0.8, the classical OLS estimates and the 2SLS estimates will be very close to each other. This is because if the R² value in the first stage is very high, it means that the estimated values of the endogenous variables are very close to their actual values, and hence the latter are less likely to be correlated with the stochastic disturbances in the original structural equations.

3.5. Testing Simultaneity


If there is no simultaneous equation, or simultaneity, problem, the OLS estimators are consistent and efficient. On the other hand, if there is simultaneity, OLS estimators are not even consistent. In the presence of simultaneity, the methods of Indirect Least Squares (ILS) and Two-Stage Least Squares (2SLS) will give estimators that are consistent and efficient. Oddly, if we apply these alternative methods when there is in fact no simultaneity, these methods yield estimators that are consistent but not efficient (i.e., they have larger variances than OLS). Therefore, we should check for the simultaneity problem before we discard OLS in favor of the alternatives.

As we showed earlier, the simultaneity problem arises because some of the regressors are endogenous and
are, therefore, likely to be correlated with the disturbance term. Therefore, a test of simultaneity is
essentially a test of whether (an endogenous) regressor is correlated with the error term. If a simultaneity problem exists, alternatives to OLS must be found; if it does not, we can use OLS. To find out which is the case in a concrete situation, we can use Hausman's specification test.
For illustration, consider the following national income and money supply model:
National Income Function: $Y_{1t} = \beta_{10} + \beta_{11} Y_{2t} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + u_{1t}$   (28)
Money Supply Function: $Y_{2t} = \beta_{20} + \beta_{21} Y_{1t} + u_{2t}$   (29)
Where, $Y_1$ = Real Gross Domestic Product
$Y_2$ = Money Supply
$X_1$ = Investment spending
$X_2$ = Government expenditure

Note that, applying the order condition, the national income function is underidentified while the money supply function is overidentified (and can be estimated by 2SLS).

If there is no simultaneity problem (i.e., $Y_{1t}$ and $u_{2t}$ are mutually independent), $Y_{1t}$ and $u_{2t}$ should be uncorrelated. On the other hand, if there is simultaneity, $Y_{1t}$ and $u_{2t}$ will be correlated. To find out which is the case, the Hausman test can be used.
The Hausman test involves the following steps:
Step-1: Regress $Y_{1t}$ on $X_{1t}$ and $X_{2t}$ to obtain the residuals $\hat{v}_t$ (i.e., we estimate the reduced-form equation):

$Y_{1t} = \hat{\pi}_0 + \hat{\pi}_1 X_{1t} + \hat{\pi}_2 X_{2t} + \hat{v}_t$   (30)

Step-2: Regress $Y_{2t}$ on $\hat{Y}_{1t}$ and $\hat{v}_t$ and perform a t-test on the coefficient of $\hat{v}_t$. That is, estimate:

$Y_{2t} = \beta_{20} + \beta_{21}\hat{Y}_{1t} + \beta_{22}\hat{v}_t + \varepsilon_t$

If the coefficient of $\hat{v}_t$ is significant (p-value ≤ 0.05), there is simultaneity, and ILS or 2SLS must be used. If the coefficient of $\hat{v}_t$ is insignificant (p-value > 0.05), there is no simultaneity, and it is better to use OLS.
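A minimal sketch of this two-step check (reusing the simulated df with Y1, Y2, X1 and X2 from the 2SLS sketch above; the data and names are placeholders):

import statsmodels.api as sm

# Step 1: reduced-form regression of Y1 on the predetermined variables
rf = sm.OLS(df['Y1'], sm.add_constant(df[['X1', 'X2']])).fit()
df['Y1_hat'], df['v_hat'] = rf.fittedvalues, rf.resid

# Step 2: regress Y2 on Y1_hat and the reduced-form residuals v_hat
haus = sm.OLS(df['Y2'], sm.add_constant(df[['Y1_hat', 'v_hat']])).fit()
print(haus.pvalues['v_hat'])   # p <= 0.05 points to simultaneity (use 2SLS/ILS)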

Numerical Example:
Suppose that data is given on GDP, money supply, private investment and government spending. We are
interested to estimate the money supply model by 2SLS (since the equation is overidentified). However, we
need to make sure that there is indeed a simultaneity problem that makes OLS inappropriate; otherwise it makes no sense to use the 2SLS method. To this end, we consider the Hausman specification error test and obtain the following results.
 First we estimate the reduced-form regression given in equation (30). From this regression we obtain the estimated GDP ($\hat{Y}_{1t}$) and the residuals $\hat{v}_t$.
 Second we regress money supply on the estimated GDP and $\hat{v}_t$ (the estimated equation with its coefficients is not reproduced here).

Since the t value of the coefficient of $\hat{v}_t$ is statistically significant at the 5% significance level, we must conclude that there is simultaneity between money supply and GDP, and 2SLS is the appropriate procedure to be used.

Worksheet
Exam training true/false

State whether each of the following statements is true or false:


 In a simultaneous-equation model, there must at least be three dependent variables.

 If two variables in a simultaneous-equation model are mutually dependent, they are endogenous
variables.
 In SEM, a predetermined variable is the same as an exogenous variable.
 An exogenous variable can be mutually dependent with an endogenous variable.
 The method of OLS is not applicable to estimate a structural equation in a simultaneous-equation model.
 The reduced form equation satisfies all the assumptions needed for the application of OLS.
 If a model is under-identified, the ILS procedure will give more than one parameter estimate for the
same parameter.
 Even though a model is not identified, it is possible that one of the model‟s equations is identified.
 In case an equation is not identified, 2SLS is not applicable.
 If an equation is exactly identified, ILS and 2SLS give identical results.

Examples of Simultaneous Equation Models


A. What are the three examples of SEMs given in the lecture notes?
B. Describe which assumption of OLS is violated in all three cases.
C. For the first and the second example, indicate which variables are exogenous and which variables are
endogenous.

Simultaneity bias
Simultaneity bias means that the independent variable influences the dependent variable, but also vice versa.
If an external shock influences the dependent variable, this has effect on the independent variable, and
therefore you can observe a correlation between the error term and the explanatory variable. In this simple
model, the demand for eggs is estimated based on the price of eggs.

A. Knowing economic theory, you can expect simultaneity bias in this model. Why?
B. Observe the following scatterplot, displaying the relation between P and U. Is P an exogenous or an
endogenous variable in this model? Explain your answer.

C. Alternative to running an OLS on this simple equation, it may be better to build a system of equations.
Build a system of equations for this case.

Order condition of identification

Use the order condition of identification to see if the following systems of equations are over-, under- or
exactly identified.

Model 1: Interest rate vs. GDP I (source: Gujarati & Porter)

Where Y= income (GDP), R=interest rate and M=money supply.

Model 2: Interest rate vs. GDP II (source: Gujarati & Porter)

Where Y= income (GDP), R=interest rate, M=money supply and I=gross private domestic investment.

Model 3: Demand and supply for loans (source: Gujarati & Porter)
Demand:
Supply:
Where Q = total commercial bank loans ($ billion), R = average prime rate, RS = 3-month Treasury bill rate,
RD =AAAcorporate bond rate, IPI = Index of Industrial Production and TBD = total bank deposits.

Model 4: Openness and inflation

Where I=inflation rate, IMP=imports as % of GDP (measure of openness), INC=GDP per capita,
LAND=log of land area in square miles.

Rank condition of identification


Verify that, by the rank condition, equations (14) and (15) of the lecture notes are unidentified but equation (16) is identified.

Indirect least squares


There are two major types of coffee beans: the expensive and high-quality Arabica, and the cheaper and
easier-to-produce Robusta. It is stated that the demand and supply for raw coffee beans of type Arabica is
determined by the price of Arabica and Robusta beans:

Demand Function: $QA_t = \alpha_0 + \alpha_1 PA_t + \alpha_2 PR_t + u_{1t}$
Supply Function: $QA_t = \beta_0 + \beta_1 PA_t + \beta_2 PA_{t-1} + u_{2t}$
Equilibrium condition: quantity demanded equals quantity supplied

Where QAt=quantity Arabica beans demanded and supplied (in tonnes), PAt= price of Arabica coffee beans,
PRt=price of Robusta coffee beans, PAt-1=lagged price of Arabica coffee beans.

A. Do you expect a positive or a negative value for , , and ? Base your answer on economic
theory.
B. Why does individual estimation of the equations lead to inconsistent parameters?
C. Is the model identified, according to the order condition? Explain your answer.
D. Rewrite the system of above structural equations in the reduced form. In order to do so, follow these
steps:
I. Equalize QA demanded and QA supplied.
II. Rewrite, so that the equilibrium price, PAt, is the only dependent variable, with PAt-1 and PRt as explanatory variables (solve for PAt).
III. Substitute this PAt in the demand or the supply function, so that the equilibrium quantity (QAt) is the only dependent variable, with PAt-1 and PRt as explanatory variables.
E. Suppose these are the outcomes of the OLS regression on the reduced form equations:
Parameter (reduced-form coefficient)                         Parameter estimate from OLS
Intercept of the reduced-form equation for PAt               8
Coefficient on PRt in the reduced-form equation for PAt      0.2
Coefficient on PAt-1 in the reduced-form equation for PAt    0.4
Intercept of the reduced-form equation for QAt               44
Coefficient on PRt in the reduced-form equation for QAt      0.6
Coefficient on PAt-1 in the reduced-form equation for QAt    -0.8
Use the above parameter estimates of the reduced model, to identify the structural parameters.
I. Solve for α1 & β1 (tip: use the estimates above together with the formulas that express the reduced-form coefficients in terms of α1, α2, β1 and β2, and take ratios)
II. Solve for α2 & β2
III. Solve for α0 & β0
IV. Write the structural equations including the ILS parameter estimates5
F. For each parameter estimate, check if the sign (- or +) makes sense from an economic point of view (law
of demand, law of supply, cross-elasticity of demand).

2SLS
a. Describe the steps of the 2SLS procedure.
b. Explain how the 2SLS procedure solves the simultaneity bias problem.
c. For a system of two equations (one predicting the price (P) and one predicting the wages (W)), the
following structural parameter estimates were obtained from running OLS and 2SLS:

Wt , Pt , Mt , and Xt are percentage changes in earnings, prices, import prices, and labor productivity
(all percentage changes are over the previous year), respectively, and where Vt represents unfilled
job vacancies (percentage of total number of employees).
As can be observed, the differences between the OLS and the 2SLS outcomes are very small.
Indicate for the following statements if they are true or false:
I. Since the OLS and 2SLS results are practically identical, the 2SLS results are meaningless.
II. We will always see similar outcomes for OLS and 2SLS if the equation is exactly identified.
III. Since the OLS and 2SLS results are practically identical, the correlation between W and u and
between P and u was insignificant.
IV. There is little/no simultaneity bias in this system of equations.

5
If you did well, you’ve obtained the following structural parameter estimates:α0=60;α1=-2; α2=1; ß0=20;ß1=3;ß2=-2

Lab class
Open the datafile wages.dta in Stata.

Consider the following system of equations:

$W_t = \alpha_0 + \alpha_1 UN_t + \alpha_2 P_t + u_{1t}$   (1)
$P_t = \beta_0 + \beta_1 W_t + \beta_2 M_t + u_{2t}$   (2)

$W$ = Nominal Wage Rate (in Birr)
$UN$ = Unemployment rate (in %)
$P$ = Consumer price index
$M$ = Import price index
$t$ = time
$u_{1t}, u_{2t}$ = Stochastic disturbances
A person without knowledge of SEM decides to run just two simple OLS models to estimate the parameters.
Please do so (use the command regress)

A. Run the OLS for equation (1)


B. Save the residuals of the regression (use the command predict res1, res)
C. Run the OLS for equation (2)
D. Save the residuals of the regression (use the command predict res2, res)
E. Show that "the assumption of no correlation between the explanatory variable(s) and the disturbance term is violated" by making a scatterplot of the residuals vs. the independent variable(s). Use the commands twoway (scatter P res1) and twoway (scatter W res2).
1) Based on the outcome of f, is there simultaneity bias? Explain your answer

Let‟s investigate the world cotton market. The following variables are available in the datafile cotton.dta:

T=time
QC= quantity of cotton supplied and demanded in million tons
PC= Cotton, CIF Liverpool, US cents per pound
PW=Wool, coarse, 23-micron, Australian Wool Exchange spot quote, US cents per kilogram
PH= Hides, wholesale dealer's price, US, Chicago, fob Shipping Point, US cents per pound

We will investigate the following system of equations:


Demand Function: $QC_t = \alpha_0 + \alpha_1 PC_t + \alpha_2 PW_t + \alpha_3 PH_t + u_{1t}$
Supply Function: $QC_t = \beta_0 + \beta_1 PC_t + \beta_2 T + u_{2t}$
Of course, quantity demanded equals quantity supplied in equilibrium.
2) What are the endogenous and what are the predetermined variables?
3) Are the equations over-, under-, or exactly identified, according to the order condition for
identification?
4) Why may it be useful to include the variable T in the supply function? Refer to what you learned
in chapter 2.
5) Why is 2SLS in this case a better estimation method than ILS?
F. Estimate the first stage of the 2 stages of 2SLS (in other words, run an OLS regression with PC as
dependent variables, and all the predetermined variables as explanatory variables)by using the command
regress.
G. Save the predicted values for PC (use the command: predict PChat, xb)
H. Estimate the second stage of the 2 stages of 2SLS. Estimate the supply and the demand function as
described above, using the predicted values PChat instead of PC. Use the command regress two times (one per equation).
6) List the parameter estimates of the estimated demand function
7) List the parameter estimates of the estimated supply function
8) Which of the listed parameter estimates are significantly different from zero at the 5%
significance level?
I. It is also possible to run 2SLS directly in Stata. Use the command ivregress 2sls QC PW PH (PC = T) for the demand equation, and ivregress 2sls QC T (PC = PW PH) for the supply equation. Give it a try.
9) Compare the outcomes of g and f. Is there any difference? Why (not)?

4. INTRODUCTION TO PANEL DATA REGRESSION MODELS
4.1. Introduction
Up until now, we have covered cross-sectional and time series regression analysis using pure cross-sectional
or pure time series data. As you know, in cross-sectional data, values of one or more variables are collected
for several sample units, or entities, at the same point in time (e.g. Crime rates in Arba Minch for a given
year). In time series data we observe the values of one or more variables over a period of time (e.g. GDP for
several quarters or years). While these two cases arise often in applications, data sets that have both cross-
sectional and time series dimensions, commonly called panel data, are being used more and more often in
empirical research as such data can often shed light on important policy questions. In panel data, the same
cross-sectional unit (for example: individuals, families, firms, cities, states, etc) is surveyed over time. That
is, panel data have space as well as time dimensions.

For example, a panel data set on individual farmer‟s (household heads) farm productivity (quintals per
hectare), fertilizer applied (DAP in Kg), labour days (in hours), whether the farmer takes farm related
training is collected by randomly selecting farm households from a population at a given point in time
(2005). Then, these same people are re-interviewed at several subsequent points in time (say 2010 and
2015). This gives us data on productivity, fertilizer, labor, and training, for the same group of farm
households in different years(see Table-4.1).

There are other names for panel data, such as pooled data (pooling of time series and cross-sectional
observations), combination of time series and cross-section data, micro panel data, longitudinal data (a study
over time of a variable or group of subjects), event history analysis (e.g. studying the migration over time of
subjects through rural/urban/international areas) and cohort analysis (e.g. following the career path of 1965
graduates of AMU). Although there are subtle variations, all these are about a movement over time of cross-
sectional units.

Why Panel Data?


What are the advantages of panel data over cross-section or time series data? The following advantages are described by Baltagi (1995):
1. Since panel data relate to individuals, firms, states, countries, etc., over time, there is bound to be
heterogeneity in these units. The techniques of panel data estimation can take such heterogeneity
explicitly into account by allowing for individual-specific variables, as we shall show shortly. We
use the term individual in a generic sense to include microunits such as individuals, firms, states, and
countries.
2. By combining time series of cross-section observations, panel data give more informative data, more
variability, less collinearity among variables, more degrees of freedom and more efficiency.

3. By studying the repeated cross section of observations, panel data are better suited to study the
dynamics of change. Spells of unemployment, job turnover, and labor mobility are better studied
with panel data.
4. Panel data can better detect and measure effects that simply cannot be observed in pure cross-section
or pure time series data. For example, the effects of minimum wage laws on employment and
earnings can be better studied if we include successive waves of minimum wage increases in the
federal and/or state minimum wages.
5. Panel data enable us to study more complicated behavioral models. For example, phenomena such as economies of scale and technological change can be better handled by panel data than by pure cross-section or pure time series data.
6. By making data available for several thousand units, panel data can minimize the bias that might result if we aggregate individuals or firms into broad aggregates.

If each cross-sectional unit has the same number of time series observations, then such a panel (data set) is called a balanced panel or complete panel (see for instance Table-4.1). If the number of observations differs among panel members, so that some cross-sectional elements have fewer observations, we call such a panel an unbalanced panel. The same estimation techniques are used in both cases; however, we will be concerned with a balanced panel in this chapter. Initially, we assume that the X's are nonstochastic and that the error term follows the classical assumptions, namely $u_{it} \sim N(0, \sigma^2)$.

4.2. Estimation of Panel Data Models: The Fixed Effects Approach


A simple solution to estimating panel data would be to pool all the observations. That means that all observations together, no matter the time (t) or the individual (i), are analyzed as one group using OLS. This is called pooled OLS regression. Doing this, all individual characteristics that are not included in the explanatory variables end up in the error term (i.e. the error term captures differences over time and individuals). In other words, we lose the heterogeneity that may exist between the individuals. Despite its simplicity, the pooled OLS regression may distort the true picture of the relationship between the dependent variable and the regressors across the cross-sectional units (individuals or companies). As this method is unrealistic and undesirable, the Fixed Effects Model (FEM) of panel data analysis is preferred over pooled OLS.

This is because the FEM accounts for the "individuality" of each cross-sectional unit by letting the intercept, $\beta_{1i}$, vary for each cross-sectional unit, while assuming that the slope coefficients, $\beta_2, \ldots, \beta_k$, are constant across cross-sectional units. Assuming that the intercept term varies across cross-sectional units, we can specify the model as:

$Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + \dots + \beta_k X_{kit} + u_{it}$   (4.1)

 Notice that we have put the subscript i on the intercept term to suggest that the intercepts of the cross-sectional units (perhaps firms) may be different; the differences may be due to special features of each firm.
o Note that if we were to write the intercept as $\beta_{1it}$, it would suggest that the intercept of each cross-sectional unit or firm is time variant.
 Note also that the matrix of X's in equation (4.1) is the same for all cross-sectional units and for all time periods. Only $\beta_{1i}$, the intercept, varies per cross-sectional unit (i).

Equation (4.1) is known as the Fixed Effects Model (FEM). The term "Fixed Effects" is due to the fact that each cross-sectional unit's intercept, although different from the intercepts of other cross-sectional units, does not change over time. The intercept for each cross-sectional unit is time-invariant.

How do we actually allow for the (fixed effect) intercept to vary between cross-sectional units? We can
easily do that with the dummy variable technique that we learned in Chapter 1, particularly the differential
intercept dummies. Remember from Chapter 1 that we have to avoid the dummy variable trap. This
means that we must include M-1 dummies, where M is the number of cross-sectional units. If the number of individuals is 50, we need 50-1=49 dummy
variables to represent the 50 different individuals in the data file. This approach to the fixed effects model is
called the least-squares dummy variable (LSDV) model.
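As a sketch, the LSDV model can be estimated in Stata with the factor-variable notation i.id, which creates the unit dummies and automatically drops one category, so the dummy variable trap is avoided (y, x and id are placeholder names):

* LSDV: fixed effects captured by a full set of M-1 unit dummies
regress y x i.id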

A disadvantage of the dummy variable approach is that the R2 is usually very high, because all the dummy
variables are explanatory variables and the R2 always increases if more (dummy) explanatory variables are
introduced in the model. Also it requires a lot of memory on the computer, which is not always available. An
alternative approach to estimating the fixed effects model is the within estimation method. This method is
used by Stata. In the within estimation method the variables (Y and X‟s) are all demeaned, i.e. the time-
averaged values are deducted from each value:
Yit − Ȳi = (β1i − β̄1i) + β2(Xit − X̄i) + (ɛit − ɛ̄i)
Note that β1i is time invariant, so β1i − β̄1i = 0: the individual effect drops out of the demeaned equation. Therefore the within estimation for a model with one
explanatory variable (X) can be specified as:
Yit − Ȳi = β2(Xit − X̄i) + (ɛit − ɛ̄i)        (4.2)
Where:
β̄1i is the time-average of the individual effect (β1i), which equals β1i itself because it does not change over time
Ȳi is the time-average Y value of individual i
X̄i is the time-average X value of individual i
ɛ̄i is the time-average error term for individual i.
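The demeaning can be illustrated in Stata as follows (placeholder names y, x, id and t; this is only a sketch, since xtreg, fe performs the transformation internally and also uses the correct degrees of freedom for the standard errors):

xtset id t
* compute the time averages per individual and subtract them
bysort id: egen double y_bar = mean(y)
bysort id: egen double x_bar = mean(x)
gen double y_dm = y - y_bar
gen double x_dm = x - x_bar
* OLS on the demeaned data gives the within (fixed effects) slope estimate
regress y_dm x_dm, noconstant
* the same slope, directly from the within estimator
xtreg y x, fe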

 Example FEM
In order to estimate the effect of diet on study results, the university collected data from 50 students that eat
university food for 6 semesters. For all 50 students all information is available for the 6 semesters, so we
work with a balancedpanel. The expectation is that eating proteins (present in meat and eggs) has a positive
effect on brain capacity, and thus improves the study results. The variables in the model include:
Yit the average study results of student i in semester t
Pit how many meals with egg and meat student i ate during semester t
Gi a dummy variable equal to 1 if the student is female, and 0 if the student is male
Ait the age in years of student i at the end of semester t
Si the average state exam results of student i (before joining the university)
Hit the average weekly study hours (class attendance) of student i during semester t

Applying the within transformation to this model gives:
Yit − Ȳi = β2(Pit − P̄i) + β3(Gi − Ḡi) + β4(Ait − Āi) + β5(Si − S̄i) + β6(Hit − H̄i) + (ɛit − ɛ̄i)

Note that the time-invariant variables (in this case Gi and Si) will be omitted, because their mean values (Ḡi
and S̄i) are the same as the values in each period t; there is no change over time. Therefore, the FEM does not provide
parameter estimates for time-invariant variables.
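Assuming the example data are stored in Stata under the (hypothetical) variable names used above, with student and semester as the panel identifiers, the model could be estimated as follows; Stata will drop G and S automatically because they do not vary over time:

xtset student semester
* one-way fixed effects (within) estimation of the study-results model
xtreg Y P G A S H, fe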

Let‟s look at sample output of this FEM:

Indeed, the individual differential intercept coefficients are not presented in the output, but they are taken
into account in estimating the model (as this is a FEM). This output shows the following:
- The F-test that tests the joint significance of all the coefficients in the model is highly significant.
This means that the model is able to explain the variance in Yit.
- The parameter coefficients and their t-tests reveal that, when controlled for individual differences, the
study hours (Hit) have a significant effect on study results (Yit), because P>|t| < 0.05. The effects of
protein intake (Pit) and age (Ait) on study results (Yit) are insignificant, because P>|t| > 0.05. No estimates
for the effect of gender (Gi) and state exam result (Si) are given, because the fixed effects model can only
estimate the effect of time-variant variables, and Gi and Si do not change over time.

- Sigma_u (5.14) is the standard deviation of the time-invariant individual error term (ui), which is estimated for
every individual (i). In this example, it expresses the variation in the time-constant error term across the 50
students. If sigma_u is large, it means that students differ a lot with regard to Y when controlled
for the X's. Sigma_e (9.04) is the standard deviation of the residuals (the remaining, idiosyncratic error term ɛit). Rho
(0.244) is the share of the estimated variance of the overall error accounted for by the individual effect,
u_i (see the computation after this list). A high rho indicates that the fixed effects model is much better than the pooled regression, because
much of the error term can be explained by differences between individuals.
- The F-test that all ui‟s are 0 (F=1.68) reveals that there are significant differences between students
(Prob>F=0.0058). This supports the use of the Fixed Effects model.
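As a quick check, rho can be recovered from the reported standard deviations:
rho = σu² / (σu² + σe²) = 5.14² / (5.14² + 9.04²) = 26.42 / 108.14 ≈ 0.244,
which matches the value in the output.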

 Types of FEM
The above-discussed model is known as a one-way fixed effects model, because we have allowed the
intercept to differ per individuals (i.e. cross-sectional units). We can also allow the time periods to have
different intercepts, by adding dummies for the different semesters. If we consider fixed
effects both for individuals and time periods, the model is called a two-way fixed effects model. In that
case, the within estimation method can be used for the individual effect and time dummies must be
introduced to capture the deterministic trend (chapter 2). However, allowing for individual fixed effects and
time fixed effects results in an enormous loss in degrees of freedom. Degrees of freedom are the total
number of observations in the sample less the number of independent (linear) constraints or restrictions put
on them. In the above model the number of observations is 300 (50 students x 6 semesters). The linear constraints consist of 4
estimated coefficients (the common intercept and the slopes of Pit, Ait and Hit) and 49 individual differential intercepts (I-1 = 50-1). That means 300-4-49 = 247 degrees of freedom are left. If
differential intercepts related to time (the T-1 = 5 semester dummies) are added as well, the total degrees of freedom will be 300-4-49-5 = 242. Losing
too many degrees of freedom usually results in poorer statistical analysis. This is a big disadvantage of the
FEM. More disadvantages are discussed below.
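Before turning to the disadvantages, a sketch of the two-way specification in Stata, again with the hypothetical variable names of the student example: the individual effects are handled by the within estimator, and the time effects are added as semester dummies.

* two-way FEM: individual fixed effects plus T-1 time (semester) dummies
xtreg Y P A H i.semester, fe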

 Disadvantages of FEM
The disadvantages of the fixed effects model (FEM) are as follows:
- Per dummy variable added, a degree of freedom is lost. Therefore, if the sample is not very large,
introducing too many dummies will leave few observations to do meaningful statistical analysis.
- It is impossible to obtain parameter estimates for time-invariant variables like gender and ethnicity.
- The model results are valid under the assumption that ɛit ~ N(0, σ2). As ɛit has a cross-sectional
component (i) and a time component (t), this assumption may not be realistic.
o Regarding the cross-sectional component: if there is heteroskedasticity, we can obtain
heteroscedasticity-corrected standard errors.
o Regarding the time component: if there is autocorrelation we can adjust the model and assume
autocorrelation. This is, however, beyond the scope of this course.

An alternative to the Fixed Effects Model (FEM) is the Random Effects Model (REM), and this REM is
able to deal with some of the above listed problems.

4.3. Estimation of Panel Data Model: The Random Effects Approach


The loss in the degrees of freedom and the fact that time-invariant variables cannot be considered as
explanatory variables in the Fixed Effect Approach, made econometricians consider and develop another
approach in analyzing panel data. In the FEM, dummy variables compensated for all the information of the
cross-sectional unit (the individual, the region, the firm or the country) which was not considered in the
explanatory variables. In the Random Effects Model (REM) this individual-specific element is not included
as dummy variable, but included in the error term. A classical Random Effects Model is called the Error
Component Model (ECM), which assumes that the error of the model consists of an individual and a
common component.

Instead of assuming that all the cross-sectional units are purposively selected, the ECM assumes that the
cross-sectional units are randomly selected from a pool of possible cross-sectional units. Let us consider the
example of panel data from 30 local restaurants with monthly data from 2016-2017. For each restaurant, the
monthly profit (Y) and the average number of customers on weekday evenings (X) were described.
Yit = β1i + β2Xit + ɛit        (4.4)
The ECM assumes that the 30 restaurants included are a drawing from a much larger universe of all the local
restaurants in the study area. All these restaurants (the whole population of restaurants) have a common
mean value for the intercept (β1) and individual differences in the intercept values of each local restaurant
are reflected in the error term ui. Therefore,
β1i = β1 + ui,    i = 1, 2, ..., 30        (4.5)
Substituting (4.5) in (4.4), we get

Yit = β1 + β2Xit + ui + ɛit = β1 + β2Xit + wit        (4.6)
Where wit = ui + ɛit.
The error term, wit, consists of two components: ui, which is the individual-specific error (in this case, the
restaurant-specific error), and ɛit, which is the combined time series and cross-section error component (the
error that cannot be explained by the independent variables (X's) and the individual characteristics (ui)).
For the error terms the ECM assumes the following (with ~ meaning "is distributed as"):
ui ~N(0, σu2)
the individual error term is normally distributed with variance σu2
ɛit ~N(0, σɛ2)
The combined time-series and cross-section error component is normally distributed with
variance σɛ2
E(uiɛit)=0

The error components ui and ɛit are not correlated with each other
E(uiuj)=0 (i≠j)
The individual error components are not correlated with each other
E(ɛitɛis) = E(ɛitɛjt) = E(ɛitɛjs) = 0 (i≠j; t≠s)
The time series and cross section error component is not autocorrelated.

Notice the difference between the FEM and this ECM. In the FEM each individual has its own fixed
intercept value. In the ECM, on the other hand, the intercept represents the mean value of all these
individual intercepts and the error component, ui, represents the random deviation from this mean. We can
state that E(wit)=0 and var (wit)= σu2+ σɛ2. If σu2=0, there are no differences in the individual intercepts, so
we can pool all the data and run a pooled OLS regression. But if σu2>0, it is useful to run a panel data model.
In the ECM, the disturbance terms, wit for one specific individual (cross-sectional unit) over time are, of
course, correlated, because wit contains ui, which is constant for every individual. Therefore, using OLS to
estimate the model will lead to inefficient outcomes. The most appropriate method to estimate an ECM is
the method of Generalized Least Squares (GLS). It goes beyond the scope of this course to explain GLS
mathematically, but we can get the outcome from Stata or Eviews also without understanding the
mathematics.

 Example of an ECM
Small local coffee shops in Arba Minch offer tea, coffee, soft drinks and sometimes snacks like bananas,
bombolinos or samosas. Furthermore, some of them offer the opportunity to play checkers (dam). An owner
of a small coffee shop wonders if offering checkers is good for the turnover, or bad. On the one hand,
customers may choose the shop with a checkers playing board, because they like to play while drinking
coffee. On the other hand, customers who play checkers use the seats for a longer time, not leaving space for
others who also want to drink coffee. To find an answer to this question, the shop owner collected data from
15 randomly selected small coffee shops (N=15) over the period of 10 weeks (T=10). For each shop and
each week, she collected the following data:
yit the revenue of the ith shop in week t
dit if the ith shop offered a dam (checkers) board (1) or not (0) during week t
sit the number of seats available for guests in the ith shop in week t
ci if the ith shop is located in a city center (1) or not (0) (city center: Sikela or Secha)
The ECM model that she ran is the following:
yit = β1 + β2dit + β3sit + β4ci + ui + ɛit
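Assuming the data are stored under the (hypothetical) variable names y, d, s and c, with shop and week as the panel identifiers, the random effects (error components) model is estimated by GLS in Stata as:

xtset shop week
* random effects (GLS) estimation of the error components model
xtreg y d s c, re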
The GLS output of the ECM looks as follows:

The output gives parameter estimates both for the time variant (d and s) and the time-invariant (c) variable.
Based on this output, the following conclusions can be drawn:
- The Wald Chi2 test that tests the joint significance of all the coefficients in the model is highly
significant (Chi2(2)=60.10, p=0.000). This means that the model is able to explain the variance in Yit.
- The parameter coefficients and their t-tests reveal that the availability of a dam board (d) and the number
of seats (s) both have a significant effect on the revenue (yit), because P>|t| < 0.05. Presence in the city
center has no significant effect on the revenue, because P>|t| > 0.05. Also, the constant term, the intercept,
is not significantly different from zero.
- Sigma_u (915.78) is the standard deviation of the individual error term (ui) and sigma_e (846.23) is the
standard deviation of the combined time series and cross-section error component(the over-all error term;
ei). Rho (0.539) is the share of the estimated variance of the overall error accounted for by the individual
effect, u_i. It would not be good to use pooled OLS regression in this case, because pooled OLS would
not consider the individual effect and, therefore, not be able to explain much of the variance in Y.
- To answer the question of the lady who runs the small coffee shop: we can conclude from this model that
offering a checkers board has a significant positive effect on the revenue of a local coffee shop. When
controlled for the other independent variables, shops that offer dam make, on average, 417 birr extra per
week.
 ECM vs FEM
Thus far we have seen three possibilities to analyze panel data in a model: pooled OLS regression, the Fixed
Effect Model (FEM) and the Random Effects Model, or Error Components Model (ECM). Which of the three
is best? That depends on how the error term behaves.
- Use pooled OLS regression if the individuals (cross sectional units) are homogenous and if the time
periods are homogenous. We can check this by:
o Individual homogeneity: If, in the output of the FEM and the ECM sigma_u (σu) is close to 0, it
means that individuals are homogenous (there is no individual heterogeneity). If this is the case,
pooled OLS regression can be used.
o Time homogeneity: Check if the time-variant variables are stationary (chapter 2). If they are
stationary, you can use pooled OLS regression. If this is not the case, pooled OLS regression is
not optimal and you could use the FEM model with time trend dummies (if the dependent
variable has a deterministic trend). Alternatively, you can run time series econometrics models,
which are outside the scope of this course.
If there is heterogeneity of the cross-sectional units (or over time), it is better to use the Fixed Effects
Model or the Error Components Model. How to choose between these two? The decision depends on
several factors:
- If T (number of time series data) is large and N (number of cross-sectional units/individuals) is small,
there is likely to be little difference in the parameter estimates of FEM and ECM. Because FEM is
easier, it is preferred to use FEM.
- When N is large and T is small, the estimates obtained by the two methods can differ significantly,
because there are many different ui's. If the assumptions underlying the ECM hold, ECM estimators are
more efficient than FEM estimators. Furthermore, the ECM can also give parameter estimates for time-
invariant variables.
How do we know whether the assumptions underlying the ECM hold?
a. One assumption underlying ECM is that the individuals (cross-sectional components) are randomly
selected from the research population. Because the ECM assumes random selection, statistical inference
can only be done if this is the case. If the selection of i‟s is not random, the FEM is appropriate.
b. Next to the randomness of selected sample units, it is important to see if the individual error component,
ui, and one or more regressors (X‟s) are correlated. If the individual error term is not related to any of the
regressors, it is likely that the sample is randomly selected, and that ß1 is indeed the common intercept
(with a random deviation u for each i). If, however, the X‟s and the ui are correlated, the assumption that
the i‟s are randomly selected is invalid, so ECM is inappropriate and FEM will give less biased
parameter estimates.

A test devised by Hausman, which is incorporated also in Stata and Eviews, can be used to check if the
ECM is better than the FEM.

Null-hypothesis of the Hausman test


The null hypothesis is that there is no big difference between the parameter estimates of FEM and ECM. If
the null hypothesis is correct, the individual errors (ui) are distributed independentlyof the explanatory
variables (Xit), so they do not bias the parameter estimates in the ECM model. If this is correct, both random
effects and fixed effects are consistent, but fixed effects will be inefficient because it involves estimating an
unnecessary set of dummy variable coefficients.
Alternative hypothesis of the Hausman test
If the null hypothesis is rejected, it means that the parameter estimates of the FEM and the ECM differ
significantly. Therefore, the random effects (ECM) estimates will be subject to unobservedheterogeneity
bias, because the individual errors (ui) are related to the explanatory variables (Xit). Therefore, if the null
hypothesis is rejected, the FEM must be used.
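In Stata the test is carried out by estimating both models, storing their estimates and comparing them; a sketch with placeholder names (the same commands return in the lab class at the end of this chapter):

xtreg y x1 x2, fe
estimates store fixed
xtreg y x1 x2, re
estimates store random
* H0: the FE and RE coefficients do not differ systematically (ui uncorrelated with the X's)
hausman fixed random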

 Example Hausman test


Parlasca et al. (2018) collected data to see if a mobile phone can increase household food security in pastoral
communities in Northern Kenya.
This information is from their abstract: Mobile phones could potentially facilitate access to food markets and thus improve
food security and nutrition, but research on such types of effects remains scarce. In this study we analyze whether mobile phones
improve dietary quality of pastoralists in Northern Kenya. We use six rounds of household panel data covering the period between
2009 and 2015. During this period, mobile phone ownership in the sample increased from less than 30% to more than 70%.

The authors measured the following variables:


Dependent variable:
HDDS: household dietary diversity score, which is a common tool to assess food security and dietary
diversity. The minimum score is 1 (all food is from the same group), the maximum score is 12 (as all food
types are classified in 12 categories). This was measured in two ways, representing two different variables:
- Number of food groups where consumption is mainly from self-production (0-12)
- Number of food groups where consumption is mainly from other sources (0-12)(e.g. purchases, food aid
or gifts)
Independent variables:
- Mobile phone ownership and use
- Income
- Herd size (in TLU = either 1 head of cattle, or 0.7 of a camel, or 10 goats, or 10 sheep)
- Land farmed (hectares)
- Radio ownership
- Cooking appliance: 1 if the household uses any form of advanced cooking appliance
- Household size
- Education, gender and age of the household head
- Dummy variables for the years in the sample (from 2009-2013), because of a deterministic trend


The results of their Fixed Effects regression are displayed in Table 4.2.
Table 4.2: Results of the Fixed Effects regression of Parlasca et al. (2018)

Independent variables                Food groups from self-production   Food groups from other sources
Mobile phone ownership & use         -0.011 (0.050)                     0.238** (0.097)
Income                               0.557* (0.261)                     1.812*** (0.378)
Herd size                            0.002* (0.001)                     0.002 (0.001)
Land farmed                          -0.001 (0.010)                     0.012 (0.010)
Radio ownership                      0.016 (0.078)                      -0.221 (0.235)
Cooking source                       -0.068 (0.130)                     0.186 (0.259)
Household size                       0.028*** (0.009)                   0.054** (0.024)
Education household head             0.075** (0.026)                    -0.121* (0.064)
Gender household head (1=female)     0.045 (0.089)                      0.098 (0.179)
Age household head                   0.002 (0.002)                      -0.003 (0.003)
2009                                 -0.755*** (0.222)                  -0.015 (0.453)
2010                                 -0.405** (0.179)                   0.116 (0.476)
2011                                 -0.470** (0.214)                   0.008 (0.398)
2012                                 -0.142 (0.129)                     -0.068 (0.247)
2013                                 0.038 (0.065)                      -0.024 (0.169)
Constant                             0.865** (0.299)                    5.701*** (0.664)
Hausman test                         96.37***                           149.74***
Note: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.

As the results reveal, mobile phone ownership especially increased the diversity in food consumption from
non-self-produced sources. Income contributed positively to the diversity in food consumption, both from
self-production and from other sources. The herd size is only (weakly) significant for the diversity of
consumption of self-produced foods. The household size is significant both for the diversity from
self-production and for food from other sources. The education of the household head contributes positively
to the diversity in self-production, but negatively to the diversity in food obtained from other sources. The
year dummies are significant for the diversity in consumption of self-produced foods. The big question is whether this
fixed effect model is the most efficient way to estimate the impact of the independent variables
on the dependent variable. To this end, the parameter estimates of this FEM must be compared
with the parameter estimates of a REM (e.g. the ECM). This was done, and the following
hypotheses were formed to choose between FEM and REM:
 H0: the parameter estimates of the FEM do not differ significantly from the parameter
estimates of REM (in other words: the individual farmer error is not correlated to any of the
independent variables).
 H1: the parameter estimates of the FEM differ significantly from the parameter estimates of
REM (in other words: the individual farmer error is correlated to (at least one of) the
independent variables, leading to biased parameter estimates).

From Table 4.2 we find that the test result is χ2=96.37 for the first dependent variable (about
food groups from self-production). This is significant at the 0.01 level (visible from ***), so H0
must be rejected. Furthermore, Table 4.2 reveals that the test result is χ2=149.74 for the second
dependent variable (about food groups from other sources). This is significant at the 0.01 level
(visible from ***), so H0 must be rejected.


Concluding, for both dependent variables the FEM estimates differ significantly from the REM
estimates. Therefore, the parameter estimates in the REM approach (the ECM) are biased, due to
correlation between the individual error, ui, and at least one of the independent variables. The
farmers were, apparently, not randomly selected from the total population of farmers.
Therefore, the FEM is the optimal way to analyze this panel data. The authors Parlasca et al.
correctly decided to use the FEM.

Concluding note: we have seen that the ECM can provide parameter estimates for time-invariant
variables, such as gender and ethnicity. The FEM does not provide parameter estimates for these
variables, but it does control for them by approaching each individual as an individual. To this
end, the FEM controls for all possible time-invariant variables, whereas the ECM only considers
the time-invariant variables that are explicitly included in the model.

References
Baltagi, B. (2008). Econometric analysis of panel data. John Wiley & Sons.
Gujarati, D. (2003). Panel Data Regression Models. In: Basic Econometrics, Fourth Edition. Singapore:
McGraw-Hill, Chapter 16.
Parlasca, M. C., Mußhoff, O., & Qaim, M. (2018). How mobile phones can improve nutrition among
pastoral communities: Panel data evidence from Northern Kenya (No. 123). Global Food
Discussion Papers.
Wooldridge, J. M. (2013). Advanced Panel Data Methods in: Introductory Econometrics: A Modern
Approach (South-Western, Mason, OH). Chapter 14.1-14.2.

Worksheet
4.1. Introduction
1. Indicate for the following data if it is cross-sectional data, time series data or panel data:
a. Yearly GDP, unemployment and inflation figures of Ethiopia from 2000-2020.
b. Monthly income and savings figures of 3000 households from 2010-2020.
c. Average academic performance of section A, B, C and D in semester 2 of the academic year
2019-2020.
d. Yearly maize production of- and technologies used by 100 farmers from 1995-2015.
e. Monthly unemployment and economic performance statistics for all Ethiopian provinces.
2. Describe the advantages of panel data over cross-section or time series data in Amharic (do not copy-
paste from Baltagi, but translate to Amharic to get a deep understanding).
3. A balanced panel is preferred over an unbalanced panel. How come that still many analyses involve
unbalanced panel data?
4.2. Estimation of Panel Data Regression Model: The Fixed Effects Approach


Gujarati (….) collected data on the civilian unemployment rate Y (%) and manufacturing wage
index (1992 = 100) for Canada, the United Kingdom, and the United States for the period 1980–
1999. Consider the model:
Yit = β1 + β2Xit + uit,    i = 1, 2, 3 (countries); t = 1980, ..., 1999
1. From an economic point of view, what is the expected relationship between Y and X? Why?
2. Explain: how can this data be analyzed by using pooled regression?
The pooled regression gives the following results:

3. Interpret the outcome of the model: does the manufacturing wage index have a significant effect on
the unemployment rate?
4. What is the disadvantage of using pooled regression?
5. How does the Fixed Effect Model (FEM) solve for this disadvantage?

Because the data is panel data, also a Fixed Effect Model (FEM) was run:


6. Report the results of the F-test to indicate if the model, as a whole, is able to explain the variance in
unemployment (Y).
7. Report the results of the t-test to indicate if the manufacturing wage index is a significant predictor for
the unemployment.
8. Is the standard deviation of the residual within the countries higher than the standard deviation of the
overall error term? Explain your answer by referring to sigma_u and sigma_e.
9. Rho is 0.375. What does that mean?
***
Though the Fixed Effects Model solves the main problem of a pooled OLS regression (loss of
heterogeneity between the groups), there are also some disadvantages. The following questions are related
to the disadvantages of the Fixed Effects Model:
1. An economist wants to predict the effect of corona cases on the import of disinfectant soap. To this
end, he collects panel data from December 2019- June 2020 (7 months) for all 54 African countries.
For each month and each country, he collects the average active corona cases (X) and the monetary
value of imported disinfectant soap (Y). The pooled regression model that he estimates is
Yit = β1 + β2Xit + uit.
a. How many degrees of freedom are lost in this simple pooled regression model?6

6
NB: the degrees of freedom equal the number of observations minus the number of parameters you need to
calculate during an analysis.


b. How many degrees of freedom are lost if he uses a one-way FEM (i.e. adding differential
intercepts for the countries)?
c. How many degrees of freedom are lost if he uses a two-way FEM? Check page 4 of the lecture
notes chapter 4, to see what‟s a two-way FEM.
d. Why is losing degrees of freedom a disadvantage of the FEM?
2. Why can time-invariant variables, such as gender and ethnicity, not be estimated in a FEM? Check
page 3 of the lecture notes.
4.3. Estimation of Panel Data Regression Model: The Random Effects Approach
There are many banana farmers around Arba Minch. Each harvesting season they face a loss of crop,
possibly due to fruit flies, too much or too little water availability, etc. You are asked to design a study in
which panel data will be collected and an ECM will be run. To this end, do the following:
1. Identify possible determinants of crop loss (in the case of banana).
2. Indicate how these determinants will be measured (Which question will be asked in the
questionnaire? Which variables will be in your econometric model?).
3. Indicate how often the data will be collected over time, and why.
4. Mathematically describe the ECM, including the variables selected above.
5. Explain why the ECM is the appropriate method for analyzing the data.
***
In 2019, Thanh Phan et al. published a study titled: Does microcredit increase household food
consumption? A study of rural Vietnam7. Data was collected in 2008, 2010, 2012, 2014 and 2016. Next to
C (food consumption, the dependent variable) and K (microcredit borrowing, the explanatory variable of
interest), several other variables were considered:

In this article, a panel data analysis of an unbalanced panel was presented, by using multiple instrumental
variables (remember: chapter 3) and by including several lagged variables (remember: chapter 2). To
prepare yourself for thesis or post-graduate work, it is very useful to read the full article. For this exercise,
however, the following simplifications are made:

7
Phan, C. T., Sun, S., Zhou, Z. Y., & Beg, R. (2019). Does microcredit increase household food consumption? A
study of rural Vietnam. Journal of Asian Economics, 62, 39-51.


 We assume the panel data is balanced (N=1537, T=6)


 We assume that C is the dependent variable, and logK, logY, EduH, HSize, AgeH, Nfarming and
Shock are the independent variables of interest. The other variables of the above model, and
lagged variables are not considered.
1. What does it mean that we assume that the panel data is balanced?
2. Why would it be wise to include logK and logY rather than K and Y? (remember econometrics I,
from last semester)
3. Specify the Error Components Model for the above-described case, including the variables of interest.
The following output, based on the GLS estimation method, is given:

C Coefficients Robust SE
logK 0.461*** 0.178
logY -0.0532* 0.029
EduH -0.0875** 0.037
HSize -0.302** 0.132
AgeH -0.0106 0.013
Nfarming -0.796*** 0.231
Shock -0.367*** 0.116
Note: ***, **, and * indicate significance at 1, 5, and 10%, respectively.
4. Summarize the outcome of this model.
5. Write a conclusion to the above stated research question Does microcredit increase household food
consumption? based on the above presented model.
Fixed effects or Random effects?
Kumari and Sharma (2017)8 published an article in which they presented their panel data model with the
following (fragment of) abstract:
Purpose
The purpose of this paper is to identify key determinants of foreign direct investment (FDI) inflows in developing
countries by using unbalanced panel data set pertaining to the years 1990-2012. This study considers 20 developing
countries from the whole of South, East and South-East Asia.
Design/methodology/approach
Using seven explanatory variables (market size, trade openness, infrastructure, inflation, interest rate, research and
development and human capital), the authors have tried to find the best fit model from the two models considered
(fixed effect model and random effect model) with the help of Hausman test.
Findings
Fixed effect estimation indicates that market size, trade openness, interest rate and human capital yield significant
coefficients in relation to FDI inflow for the panel of developing countries under study. The findings reveal that
market size is the most significant determinant of FDI inflow.

The dependent variable is FDI inflow, and the explanatory variables are market size, trade openness,
infrastructure, inflation rate, interest rate (IR), research and development (R&D) and human capital (HC).
The cross-sectional units are 18 countries in South, East and South-East Asia with different levels of
development. The authors have estimated the Fixed Effects and the Random Effects Model.

8
Kumari, R., & Sharma, A. K. (2017). Determinants of foreign direct investment in developing countries: a panel
data study. International Journal of Emerging Markets.


1. Which approach, fixed effects or random effects, do you think is most suitable for this case? Base
your answer on the case description and on lecture notes p. 8-11.
2. The results of both approaches are presented in the table below (Table V). Compare the results: what
are the biggest differences?
Independent variables Fixed effects (within) regression Random effects GLS regression
Constant -7.84 (-3.58) 5.05 (5.96)
Market size 1.63 (7.39)** 0.33 (3.85)**
Trade openness 0.63 (1.17) -1.108 (-2.19)
Infrastructure -0.00012 (-2.22)** 0.000072 (1.48)**
Inflation -0.00055(-3.67)** 0.0044 (4.66)
Interest rate -0.0084 (-3.67)** -0.0096 (-3.91)**
Research and development -1.09e-06 (-1.27) 5.47e-07 (0.64)
Human capital 0.0037 (2.82)** 0.0046 (3.33)
No of countries 18 18
No of observations 304 304
(unbalanced sample)
Overall R squared 0.54 0.48
F-statistic: 46.88 (p:0.000) Wald chi-square: 262.88
(p:0.000)
Hausman p-value 0.000
Notes: FDI is measured as net inflows (BOP current USD). Market size is measured as the GDP (current
USD). GDP indicates economic growth and standards of living. Trade openness (measured as imports
plus exports divided by GDP per capita (current USD)) describes the openness of the economy.
Infrastructure is measured by the proxy of electric power consumption (kWh per capita). Interest rate
is measured by the real interest rate (percent). Research and development is measured as the patent
applications residents registered every year. Human capital is measured as secondary school enrollment
(percent gross). Period: 1990-2012. t-statistics are in parentheses. ** significant at the 5% level.
3. Based on the information in the above table, would you still make the same choice between the Fixed
Effects Approach and the Random Effects Approach? Refer, at least, to the Hausman test in your
answer.

Lab class
A classical key to economic growth is investment. Next to attracting foreign investment, household
savings can also be used to fund investment. In order to save, a household needs surplus income. Many
panel data studies have been done into the relation between household income and household savings. In
this lab class you will analyze (fake) income- and savings data of 221 rich households from Hawassa.
Sit= Savings of household i in period t in ETB
Iit= Income of household i in period t in ETB
Fit= Family size of household i in period t
Ci= 1 if the household owns a car or another expensive asset at the start of the study, and 0 otherwise
First consider the following saving function:
Sit = β1 + β2Iit + β3Fit + β4Ci + uit
A. Open Stata, and within Stata the file Householddata.dta.


B. Use the command summarize savings.
1) What is the highest saving? And what is the minimum amount saved?
C. Run the above presented model using pooled OLS regression. reg savings income fam_size
Car_or_other_asset
2) Describe the effect of income, family size and asset ownership (like a car) on savings.


3) Does this model allow for heterogeneity among the households? Explain your answer.
4) Use the command xtset householdID time to explain Stata that you are using panel data.
D. Run the fixed effect model, by using the command xtreg savings income fam_size
Car_or_other_asset,fe
5) Why is the variable Car_or_other_asset omitted?
6) Describe the effect of income and family size on savings.
7) Is the model able to explain the variation in household saving? Refer to the F-test
8) List the values for sigma_u, sigma_e and rho, and explain what they mean.
9) Is it better to use the Fixed Effects Model or Pooled OLS regression for this dataset? Give at
least one reason for your answer.
E. Save the estimates of this fixed effects regression model, so we can use them later in the Hausman
test. Use the command est store fixed.
F. Now run the Random Effects Model, by using the command xtreg savings income fam_size
Car_or_other_asset, re
10) Describe the difference between the fixed effect model and the random effects model
11) Describe the effect of income, family size and car (or other asset) ownership on savings.
12) Is the model able to explain the variation in household saving? Refer to the Wald Chi-squared
test.
13) List the values for sigma_u, sigma_e and rho, and explain what they mean.
G. Save the estimates of this random effects model, by using the command est store random.
H. Run the Hausman test which compares the estimated parameters of the FEM and the REM. Use the
command hausman fixed random.
14) What is the null hypothesis of the Hausman test in this case?
15) Report the chi-square test statistic of the Hausman test, and its p-value.
16) Which model is better to be used? REM or FEM? Base your answer on the outcome of the
Hausman test.

