Introduction To Econometrics 12-09-2019

The following estimated equation was obtained by OLS regression using quarterly data for 1978 to 1996 inclusive. Yt = 2.20 + 0.104Xt1 - 3.48 Xt2 + 0.34Xt3 (3.4) (0.005) (2.2) (0.15) Standard errors are in parentheses, the explained sum of squares was 109.6, and the residual sum of squares 18.48. a. Test at the 5% level for the statistical significance of the parameter estimates. b. Calculate the coefficient of determination


Cover design: Jordi Uriel

INTRODUCTION TO ECONOMETRICS

Ezequiel Uriel

2019
University of Valencia
I would like to thank Professors Luisa Moltó, Amado Peiró, Paz Rico, Pilar
Beneito and Javier Ferri for their suggestions, for the errata they have detected in previous
versions, and for having provided me with data to formulate exercises. Some students
have also collaborated in the detection of errata. In any case, I am solely responsible for
the errata that have not been detected.
Summary
1 Econometrics and economic data
1.1 What is econometrics?
1.2 Steps in developing an econometric model
1.3 Economic data
2 The simple regression model: estimation and properties
2.1 Some definitions in the simple regression model
2.1.1 Population regression model and population regression function
2.1.2 Sample regression function
2.2 Obtaining the Ordinary Least Squares (OLS) Estimates
2.2.1 Different criteria of estimation
2.2.2 Application of least square criterion
2.3 Some characteristics of OLS estimators
2.3.1 Algebraic implications of the estimation
2.3.2 Decomposition of the variance of y
2.3.3 Goodness of fit: Coefficient of determination ($R^2$)
2.3.4 Regression through the origin
2.4 Units of measurement and functional form
2.4.1 Units of Measurement
2.4.2 Functional Form
2.5 Assumptions and statistical properties of OLS
2.5.1 Statistical assumptions of the CLM in simple linear regression
2.5.2 Desirable properties of the estimators
2.5.3 Statistical properties of OLS estimators
Exercises
Annex 2.1 Case study: Engel curve for demand of dairy products
Appendixes
Appendix 2.1. Two alternative forms to express $\hat{\beta}_2$
Appendix 2.2. Proof: $r_{xy}^2 = R^2$
Appendix 2.3. Proportional change versus change in logarithms
Appendix 2.4. Proof: OLS estimators are linear and unbiased
Appendix 2.5. Calculation of variance of $\hat{\beta}_2$
Appendix 2.6. Proof of Gauss-Markov Theorem for the slope in simple regression
Appendix 2.7. Proof: $\hat{\sigma}^2$ is an unbiased estimator of the variance of the disturbance
Appendix 2.8. Consistency of the OLS estimator
Appendix 2.9. Maximum likelihood estimator
3 Multiple linear regression: estimation and properties
3.1 The multiple linear regression model
3.1.1 Population regression model and population regression function
3.1.2 Sample regression function
3.2 Obtaining the OLS estimates, interpretation of the coefficients, and other characteristics
3.2.1 Obtaining the OLS estimates
3.2.2 Interpretation of the coefficients
3.2.3 Algebraic implications of the estimation
3.3 Assumptions and statistical properties of the OLS estimators
3.3.1 Statistical assumptions of the CLM in multiple linear regression
3.3.2 Statistical properties of the OLS estimator
3.4 More on functional forms
3.4.1 Use of logarithms in the econometric models
3.4.2 Polynomial functions
3.5 Goodness-of-fit and selection of regressors
3.5.1 Coefficient of determination
3.5.2 Adjusted R-Squared
3.5.3 Akaike information criterion (AIC) and Schwarz criterion (SC)
Exercises
Appendixes
Appendix 3.1. Proof of the theorem of Gauss-Markov
Appendix 3.2. Proof: $\hat{\sigma}^2$ is an unbiased estimator of the variance of the disturbance
Appendix 3.3. Consistency of the OLS estimator
Appendix 3.4. Maximum likelihood estimator
4 Hypothesis testing in the multiple regression model
4.1 Hypothesis testing: an overview
4.1.1 Formulation of the null hypothesis and the alternative hypothesis
4.1.2 Test statistic
4.1.3 Decision rule
4.2 Testing hypotheses using the t test
4.2.1 Test of a single parameter
4.2.2 Confidence intervals
4.2.3 Testing hypotheses about a single linear combination of the parameters
4.2.4 Economic importance versus statistical significance
4.3 Testing multiple linear restrictions using the F test
4.3.1 Exclusion restrictions
4.3.2 Model significance
4.3.3 Testing other linear restrictions
4.3.4 Relation between F and t statistics
4.4 Testing without normality
4.5 Prediction
4.5.1 Point prediction
4.5.2 Interval prediction
4.5.3 Predicting y in a ln(y) model
4.5.4 Forecast evaluation and dynamic prediction
Exercises
5 Multiple regression analysis with qualitative information
5.1 Introducing qualitative information in econometric models
5.2 A single dummy independent variable
5.3 Multiple categories for an attribute
5.4 Several attributes
5.5 Interactions involving dummy variables
5.5.1 Interactions between two dummy variables
5.5.2 Interactions between a dummy variable and a quantitative variable
5.6 Testing structural changes
5.6.1 Using dummy variables
5.6.2 Using separate regressions: The Chow test
Exercises
6 Relaxing the assumptions in the linear classical model
6.1 Relaxing the assumptions in the linear classical model: an overview
6.2 Misspecification
6.2.1 Consequences of misspecification
6.2.2 Specification tests: the RESET test
6.3 Multicollinearity
6.3.1 Introduction
6.3.2 Detection
6.3.3 Solutions
6.4 Normality test
6.5 Heteroskedasticity
6.5.1 Causes of heteroskedasticity
6.5.2 Consequences of heteroskedasticity
6.5.3 Heteroskedasticity tests
6.5.4 Estimation of heteroskedasticity-consistent covariance
6.5.5 The treatment of the heteroskedasticity
6.6 Autocorrelation
6.6.1 Causes of autocorrelation
6.6.2 Consequences of autocorrelation
6.6.3 Autocorrelation tests
6.6.4 HAC standard errors
6.6.5 Autocorrelation treatment
Exercises
Appendix 6.1
1 ECONOMETRICS AND ECONOMIC DATA

1.1 What is econometrics?


First, let us look at the origin of econometrics as a discipline. The
term econometrics is believed to have been coined by Ragnar Frisch, co-winner of the
first Nobel Prize in Economic Sciences in 1969, along with fellow econometrician Jan
Tinbergen. Both were founders of the Econometric Society, established in 1930. In section I
of the constitution of this society, it is stated that
“The Econometric Society is an international society for the advancement of economic
theory in its relation to statistics and mathematics. Its main object shall be to promote studies
that aim at a unification of the theoretical-quantitative and the empirical-quantitative approach
to economic problems and that are penetrated by constructive and rigorous thinking similar to
that which has come to dominate the natural sciences”
In the first issue of Econometrica (1933), the Econometric Society journal, Ragnar
Frisch gives us an explanation about the meaning of econometrics:
“But there are several aspects of the quantitative approach to economics, and no single
one of these aspects, taken by itself, should be confounded with econometrics. Thus,
econometrics is by no means the same as economic statistics. Nor is it identical with what we
call general economic theory, although a considerable portion of this theory has a definitely
quantitative character. Nor should econometrics be taken as synonymous with the application of
mathematics to economics. Experience has shown that each of these three viewpoints, that of
statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient
condition for a real understanding of the quantitative relations in modern economic life. It is the
unification of all three that is powerful. And it is this unification that constitutes econometrics.”
Today, we would also say that econometrics is the combined study of economic
models, mathematical statistics, and economic data. Within the field of econometrics,
econometric theory can be distinguished from applied econometrics.
Econometric theory concerns the development of tools and methods, and the study
of the properties of econometric methods. Econometric theory belongs to the field of
statistics.
Applied econometrics is a term describing the development of quantitative
economic models and the application of econometric methods to these models using
economic data. Applied econometrics is mainly used in the field of applied economics.
What are the goals of Econometrics? We are going to examine three:
1) Knowledge of the real economy. Econometric methods allow us to estimate
economic magnitudes such as the marginal propensity to consume or the elasticity of
labor with respect to output. These estimates refer to a specific time and place:
for example, Spain in the last quarter of the 20th century. In addition to estimation,
in which numerical values are obtained, econometric methods allow us to perform
hypothesis tests; for example, in a production function, is the hypothesis of constant returns
to scale admissible?
2) Economic policy simulation. Econometric methods can be used to simulate the
effects of alternative policies. For example, with an appropriate econometric
model we could see, in quantitative terms, how different increases in the
tobacco tax affect the consumption of tobacco.
3) Prediction or forecasting. Very often econometric methods are used to predict
values of economic variables in the future. By making predictions we try to
reduce our uncertainty about the future of the economy. This is not an easy task,
since in general the predictions are only satisfactory when there are no drastic
changes in the economy. Although it would be useful to be able to predict these
drastic changes accurately, both econometric and other alternative methods
tend to be imprecise.

1.2 Steps in developing an econometric model


There are three main steps in developing an econometric model: specification,
estimation and validation.
While in a first approximation these stages follow a sequential order, in
econometric analysis it is generally necessary to go back more than once within this
sequence. It is necessary to continuously confront the model with the data and any other
information source, in order to obtain an econometric model compatible with the data.
The model can be used to analyze reality, offer better predictions or constitute a good
basis for making decisions. Now we will describe the steps listed above.

(a) Specification
In this first step, the model or models used must be defined, as well as data to be
used in the estimation stage.
In the specification step, we will refer to four elements: the economic model, the
econometric model, the statistical assumptions of the model and the data. In this section
we will refer to the first three elements; in the following section we will examine different
types of data used in econometric analysis.
The first element we need is an economic model. In some cases, a formal
economic model is constructed entirely using economic theory. In other cases, economic
theory is used less formally in constructing an economic model.
After we have an economic model, we must convert it into an econometric model.
We are going to see that with two examples.
EXAMPLE 1.1 Keynesian consumption function
Keynes formulated his well-known consumption function in three propositions:
Proposition 1: Consumption is a function of income, and both variables are measured in real terms.
Because the variables are measured in real terms, consumers are not affected by money illusion
when they decide the proportion of income devoted to consumption.
Analytically, proposition 1 can be expressed in the following way:
$$ cons = f(inc) \tag{1-1} $$


Proposition 2: Consumption is an increasing function of income, but an increase in income always
causes a smaller increase in consumption.
This proposition implies that the marginal propensity to consume is greater than 0 (consumption is an
increasing function of income), but smaller than 1 (an increase in income always causes a smaller
increase in consumption).
Analytically, proposition 2 can be expressed in the following way:
$$ 0 < \frac{d\,cons}{d\,inc} < 1 \tag{1-2} $$
Proposition 3: The proportion of income consumed is smaller when income increases. That is to
say, the proportion of the last euro earned devoted to consumption is smaller than the proportion of total
income earned devoted to consumption.
Analytically, proposition 3 can be expressed in the following way:
$$ \frac{d\,cons}{d\,inc} < \frac{cons}{inc} \tag{1-3} $$
In other words, the marginal propensity to consume is smaller than the average propensity to
consume.
These three propositions constitute an economic model: the Keynesian consumption function.
To estimate and test this model we must convert it into an econometric model. For this conversion,
two requirements must be met.
According to the first requirement, it is necessary to specify the mathematical form of the function.
The linear function has been used in this case because, in addition to being simple, it is compatible with the
description made by Keynes.
In order to justify the second requirement, it must be taken into account that the model formulated
in proposition 1 is deterministic. That is to say, income is the only factor in the determination of
consumption. But in real life there are many other factors, other than income, which have an influence on
consumption. In an econometric model, all the factors other than the independent variables included
are gathered in a variable called the random disturbance or error (u). The second requirement is the
introduction of this error term in the equation.
In general, all the relevant factors must be introduced explicitly in the econometric model; all the
other factors are taken into account in a unique variable: the error or the random disturbance. In the
Keynesian consumption function the only relevant factor considered is income.
Taking into account these two requirements, the Keynesian consumption function can be expressed in
the following way:
$$ cons = \beta_1 + \beta_2\, inc + u \tag{1-4} $$
This is an econometric model that can be estimated if you have data on consumption and income.
Let us now look at the other two propositions. In this linear model, the marginal propensity to consume is
the following:
$$ \frac{d\,cons}{d\,inc} = \beta_2 \tag{1-5} $$
Consequently, proposition 2 in this model is the following:
$$ 0 < \beta_2 < 1 \tag{1-6} $$
Once the model has been estimated, it is possible to test whether the estimate of $\beta_2$ is between 0
and 1.
The average propensity to consume in the linear model, considering that the error is equal to 0, is
the following:
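$$ \frac{cons}{inc} = \frac{\beta_1 + \beta_2\, inc}{inc} = \frac{\beta_1}{inc} + \beta_2 \tag{1-7} $$
Therefore, proposition 3 implies that
$$ \frac{\beta_1}{inc} + \beta_2 > \beta_2 \quad \text{or} \quad \frac{\beta_1}{inc} > 0 \tag{1-8} $$
That is to say, since income is positive,
$$ \beta_1 > 0 \tag{1-9} $$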
Once the model has been estimated, testing proposition 3 is equivalent to testing whether the
intercept is significantly greater than 0.
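The algebra of this example can be checked numerically. The following Python sketch evaluates the linear consumption function (1-4) with the disturbance set to 0 and compares the marginal and average propensities to consume at several income levels. The parameter values (beta1 = 50, beta2 = 0.8) and the income levels are hypothetical, chosen only for illustration; they are not estimates from any data set.

```python
# Hypothetical parameter values, chosen only for illustration.
beta1 = 50.0   # intercept, assumed positive as in (1-9)
beta2 = 0.8    # slope, assumed to lie between 0 and 1 as in (1-6)


def consumption(inc):
    """Systematic part of the linear consumption function (1-4), with u = 0."""
    return beta1 + beta2 * inc


def marginal_propensity(inc, step=1.0):
    """Change in consumption per unit change in income; equals beta2 in a linear model."""
    return (consumption(inc + step) - consumption(inc)) / step


def average_propensity(inc):
    """Consumption divided by income, as in (1-7)."""
    return consumption(inc) / inc


for inc in (100.0, 500.0, 1000.0):
    mpc = marginal_propensity(inc)
    apc = average_propensity(inc)
    print(f"inc={inc:7.1f}  marginal={mpc:.3f}  average={apc:.3f}")
```

With a positive intercept, the average propensity exceeds the marginal propensity at every income level and falls as income rises, which is what proposition 3 requires.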
EXAMPLE 1.2 Wage determination
Economic model:
Formal economic theory (human capital theory) says that education (educ), experience (exper)
and training (training) are factors that affect productivity and hence the wage. Therefore, an economic model for
wage determination could be the following:
$$ wage = f(educ, exper, training) \tag{1-10} $$
Incidentally, do you think there is any variable missing in this model?
Econometric model:
The corresponding econometric model, using a mathematical linear form, is the following:
$$ wage = \beta_1 + \beta_2\, educ + \beta_3\, exper + \beta_4\, training + u \tag{1-11} $$
To sum up, to convert an economic model into an econometric model:
a) The form of the function f(.) has been specified.
b) A disturbance variable has been included to reflect the effect of other variables
affecting wage, but not appearing in the model.
An important element in the specification of the model is the formulation of a set
of statistical assumptions, which are used in subsequent steps. These statistical
assumptions play a key role in hypothesis testing and, in general, throughout the inference
process carried out with the model.

(b) Estimation
In the estimation process we obtain numerical values of the coefficients of an
econometric model. To complete this stage, data are required on all observable variables
that appear in the specified econometric model, while it is also necessary to select the
appropriate estimation method, taking into account the implications of this choice on the
statistical properties of estimators of the coefficients. The distinction between estimator
and estimate should be made clear. An estimator is the result of applying an estimation
method to an econometric specification: a formula or rule expressed in terms of the sample
data. An estimate, on the other hand, is the numerical value that an estimator takes
for a given sample. For example, applying a
very simple estimation method, called ordinary least squares, to the specification of the
consumption function (1-4) provides expressions which determine the estimators $\hat{\beta}_1$ and
$\hat{\beta}_2$. Substituting the sample data in these expressions, two numbers are obtained: one for
$\hat{\beta}_1$ and one for $\hat{\beta}_2$, which provide estimates of the parameters $\beta_1$ and $\beta_2$.
In general, it is possible to obtain analytical expressions of the estimators,
particularly in the case of estimating linear relationships. But in non-linear procedures of
estimation it is often difficult to establish their analytical expression.
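The distinction between estimator and estimate can be made concrete with a short Python sketch. The function below plays the role of the estimator: it encodes the standard OLS formulas for simple regression, which are not derived in this section and are taken here as given. Calling it on a particular sample returns the estimates. The five income and consumption values are invented for illustration only.

```python
def ols_estimator(x, y):
    """OLS estimator for y = beta1 + beta2 * x + u (standard simple-regression formulas)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squares, both in deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = s_xy / s_xx
    beta1_hat = y_bar - beta2_hat * x_bar
    return beta1_hat, beta2_hat


# Hypothetical sample (not real data).
inc = [100.0, 150.0, 200.0, 250.0, 300.0]
cons = [128.0, 172.0, 209.0, 251.0, 290.0]

# The estimates are the numbers the estimator produces for this particular sample.
beta1_hat, beta2_hat = ols_estimator(inc, cons)
print(f"estimate of beta1: {beta1_hat:.3f}")
print(f"estimate of beta2: {beta2_hat:.3f}")
```

The same function applied to a different sample would generally return different numbers: the estimator stays fixed, while the estimates change with the data.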


(c) Validation
In the validation stage we assess whether the
estimates obtained in the previous stage are acceptable, both from the point of view of
economic theory and from the statistical point of view. On the one hand, we analyze whether
the estimates of the model parameters have the expected signs and magnitudes: that is to say,
whether they satisfy the constraints established by economic theory.
From the statistical point of view, on the other hand, statistical tests are performed
on the significance of the parameters of the model, using the statistical assumptions made
in the specification step. In turn, it is important to test whether the statistical assumptions
of the econometric model are fulfilled, although it should be noted that not all assumptions
are testable. The violation of any of these assumptions implies, in general, the application
of another estimation method that allows us to obtain estimators whose statistical
properties are as good as possible.
One way to establish the ability of a model to make predictions is to use the model
to forecast outside the sample period, and then to compare the predicted values of the
endogenous variable with the values actually observed.
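A minimal Python sketch of this out-of-sample check follows. It assumes a simple linear model estimated by OLS with the standard simple-regression formulas. The ten observations are invented for illustration, and the use of the root mean squared error as the summary measure is an assumption of this sketch, not something prescribed above.

```python
import math


def fit_ols(x, y):
    """Return (intercept, slope) from the standard simple-regression OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data, ordered in time (not real observations).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.6, 2.8, 3.55, 4.0, 4.4, 5.15, 5.45, 6.1, 6.4, 7.05]

# Estimate with the first 7 observations only; keep the last 3 for evaluation.
n_estimation = 7
b1, b2 = fit_ols(x[:n_estimation], y[:n_estimation])

# Forecast outside the sample period and compare with the observed values.
forecasts = [b1 + b2 * xi for xi in x[n_estimation:]]
errors = [yi - fi for yi, fi in zip(y[n_estimation:], forecasts)]
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

for xi, yi, fi in zip(x[n_estimation:], y[n_estimation:], forecasts):
    print(f"x={xi:4.1f}  observed={yi:.2f}  forecast={fi:.2f}")
print(f"root mean squared forecast error: {rmse:.3f}")
```

The observations held back are not used in estimation, so the forecast errors give an indication of how the model behaves on data it has not seen.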

1.3 Economic data


As we have seen, an empirical analysis uses data to test a theory or to estimate a
relationship. It is important to stress that in Econometrics we use non-experimental data.
Non-experimental or observational data are collected by observing the real world in a
passive way. In this case, data are not the outcome of controlled experiments.
Experimental data are often collected in laboratory environments in the same way
as in natural sciences. Now, we are going to see three types of data which can be used in
the estimation of an econometric model: time series, cross sectional data, and panel data.
Time Series
In time series, data are observations on a variable over time. For example:
magnitudes from national accounts such as consumption, imports, income, etc. The
chronological ordering of observations provides potentially important information.
Consequently, ordering matters.
Time series data cannot be assumed to be independent across time. Most economic
series are related to their recent histories. Typical examples include macroeconomic
aggregates such as prices and interest rates. This type of data is characterized by serial
dependence.
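Serial dependence can be illustrated with a small simulation. The Python sketch below compares the first-order sample autocorrelation of an independent series with that of a series in which each value depends on the previous one. The first-order autoregressive process, the coefficient 0.8, the sample size and the seed are all assumptions made for this illustration; they are not taken from the text.

```python
import random


def first_order_autocorrelation(series):
    """Sample correlation between each value and the value one period earlier."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator


random.seed(12345)  # fixed seed so the run is reproducible
n = 2000
shocks = [random.gauss(0.0, 1.0) for _ in range(n)]

# Independent series: each observation is just a fresh shock.
independent = shocks

# Serially dependent series (assumed AR(1) form): today's value carries over
# a fraction rho of yesterday's value plus a fresh shock.
rho = 0.8
dependent = [shocks[0]]
for t in range(1, n):
    dependent.append(rho * dependent[-1] + shocks[t])

print("independent series:", round(first_order_autocorrelation(independent), 3))
print("dependent series:  ", round(first_order_autocorrelation(dependent), 3))
```

In the dependent series the ordering of the observations carries information, whereas shuffling the independent series would not change its statistical properties.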
Given that most aggregated economic data are only available at a low frequency
(annual, quarterly or perhaps monthly), the sample size can be much smaller than in
typical cross sectional studies. The exception is financial data where data are available at
a high frequency (weekly, daily, hourly, etc.) and so sample sizes can be quite large.
Cross Sectional Data
Cross sectional data sets have one observation per individual, and the data refer
to a given point in time. In most studies, the units surveyed are individuals
(for example, in the Labor Force Survey (EPA) more than 100,000 individuals are
interviewed every quarter), households (for example, the Household Budget Survey),
firms (for example, industrial firm survey) or other economic agents. Surveys are a typical
source for cross-sectional data. In many contemporary econometric cross sectional
studies the sample size is quite large.
In cross sectional data, observations must be obtained by random sampling. Thus,
cross sectional observations are mutually independent. The ordering of observations in
cross sectional data does not matter for econometric analysis. If the data are not obtained
with a random sample, we have a sample selection problem.
So far we have referred to micro data, but there may also be cross sectional
data relating to aggregate units such as countries, regions, etc. Of course, data of this type
are not obtained by random sampling.
Panel Data
Panel data (or longitudinal data) are time series for each cross sectional member
in a data set. The key feature is that the same cross sectional units are followed over a
given time period. Panel data combines elements of cross sectional and time series data.
These data sets consist of a set of individuals (typically people, households, or
corporations) surveyed repeatedly over time. The common modeling assumption is that
the individuals are mutually independent of one another, but for a given individual,
observations are mutually dependent. Thus, the ordering in the cross section of a panel
data set does not matter, but the ordering in the time dimension matters a great deal. If we
do not take into account the time in panel data, we say that we are using pooled cross
sectional data.
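The three data types can be contrasted with a small data structure. In the Python sketch below, a panel is stored as a list of (unit, year, value) records; fixing the year yields a cross section, and fixing the unit yields a time series. The unit names, years and values are hypothetical.

```python
# Hypothetical panel: the same three units observed in each of three years.
panel = [
    ("firm_A", 2001, 10.0), ("firm_A", 2002, 11.5), ("firm_A", 2003, 12.1),
    ("firm_B", 2001, 7.2), ("firm_B", 2002, 7.0), ("firm_B", 2003, 8.4),
    ("firm_C", 2001, 15.3), ("firm_C", 2002, 16.0), ("firm_C", 2003, 15.8),
]


def cross_section(records, year):
    """All units at one point in time; the ordering of units does not matter."""
    return {unit: value for unit, record_year, value in records if record_year == year}


def time_series(records, unit):
    """One unit followed over time; the chronological ordering matters."""
    ordered = sorted(records, key=lambda record: record[1])
    return [(record_year, value) for name, record_year, value in ordered if name == unit]


print(cross_section(panel, 2002))
print(time_series(panel, "firm_B"))
```

Storing the unit and the period in every record keeps both dimensions available, so the same data can be read as a panel, as repeated cross sections, or as a set of time series.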

2 THE SIMPLE REGRESSION MODEL: ESTIMATION
AND PROPERTIES

2.1 Some definitions in the simple regression model


2.1.1 Population regression model and population regression function
In the simple regression model, the population regression model or, simply, the
population model is the following:
$$ y = \beta_1 + \beta_2 x + u \tag{2-1} $$
We shall look at the different elements of the model (2-1) and the terminology
used to designate them. We are going to consider that there are three types of variables in
the model: y, x and u. In this model there is only one factor x to explain y. All the other
factors that affect y are jointly captured by u.
We typically refer to y as the endogenous (from the Greek: generated inside)
variable or dependent variable. Other denominations are also used to designate y: left-
hand side variable, explained variable, or regressand. In this model all these
denominations are equivalent, but in other models, as we will see later on, there can be
some differences.
In the simple linear regression of y on x, we typically refer to x as the exogenous
(from the Greek: generated outside) variable or independent variable. Other
denominations are also used to designate x: right-hand side variable, explanatory variable,
regressor, covariate, or control variable. All these denominations are equivalent, but in
other models, as we will see later, there can be some differences.
The variable u represents factors other than x that affect y. It is denominated error
or random disturbance. The disturbance term can also capture measurement error in the
dependent variable. The disturbance is an unobservable variable.
The parameters β1 and β 2 are fixed and unknown.
On the right-hand side of (2-1) we can distinguish two parts: the systematic component
β1 + β2 x and the random disturbance u. Denoting the systematic component by µy, we can
write:
$$\mu_y = \beta_1 + \beta_2 x$$   (2-2)


This equation is known as the population regression function (PRF) or population
line. Therefore, as can be seen in figure 2.1, µy is a linear function of x with intercept β1
and slope β2.
The linearity means that a one-unit increase in x changes the expected value of y,
µy = E(y), by β2 units.
Now, let us suppose we have a random sample of size n, {(yi, xi): i = 1,…,n}, from
the studied population. Figure 2.2 displays the scatter diagram corresponding to these
data.

FIGURE 2.1. The population regression function (PRF), with y plotted against x.
FIGURE 2.2. The scatter diagram, with y plotted against x.

We can express the population model for each observation of the sample:
$$y_i = \beta_1 + \beta_2 x_i + u_i, \quad i = 1, 2, \ldots, n$$   (2-3)
In figure 2.3 the population regression function and the scatter diagram are put
together, but it is important to keep in mind that although β1 and β 2 are fixed, they are
unknown. According to the model, it is possible to make the following decomposition
from a theoretical point of view:
$$y_i = \mu_{y_i} + u_i, \quad i = 1, 2, \ldots, n$$   (2-4)
which is represented in figure 2.3 for the ith observation. However, from an empirical
point of view, this decomposition cannot be computed because β1 and β2 are unknown
parameters and ui is not observable.

2.1.2 Sample regression function


The basic idea of the regression model is to estimate the population parameters,
β 2 and β1 , from a given sample.
The sample regression function (SRF) is the sample counterpart of the population
regression function (PRF). Since the SRF is obtained for a given sample, a new sample
will generate different estimates.
The SRF, which is an estimation of the PRF, given by


$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$$   (2-5)

allows us to calculate the fitted value ( yˆi ) for y when x = xi . In the SRF β̂1 and β̂ 2 are
estimators of the parameters β1 and β 2 . For each x i we have an observed value ( yi ) and
a fitted value ( yˆi ).

The difference between yi and yˆi is called the residual uˆi :

uˆi = yi − yˆi = yi − βˆ1 − βˆ2 xi (2-6)

In other words, the residual ûi is the difference between the sample value yi and
the fitted value ŷi, as can be seen in figure 2.4. In this case, it is possible to calculate
the decomposition
$$y_i = \hat{y}_i + \hat{u}_i$$
for a given sample.

FIGURE 2.3. The population regression function and the scatter diagram (y against x; for observation i the figure marks yi, µyi and the disturbance ui).
FIGURE 2.4. The sample regression function and the scatter diagram (y against x; for observation i the figure marks yi, ŷi and the residual ûi).

To sum up, β̂1, β̂2, ŷi and ûi are the sample counterparts of β1, β2, µyi and ui
respectively. It is possible to calculate β̂1 and β̂2 for a given sample, but the estimates
will change from sample to sample. In contrast, β1 and β2 are fixed, but unknown.
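The distinction between the fixed PRF and the sample-dependent SRF can be illustrated with a small simulation. The sketch below is only illustrative: the population values BETA1 = 2.0 and BETA2 = 0.5, the normal disturbances and the helper names are assumptions made for the example, and the estimates are computed with the OLS formulas derived in section 2.2.2.

```python
import random

# Hypothetical population parameters (known here only because we simulate).
BETA1, BETA2 = 2.0, 0.5


def ols(x, y):
    """Return the OLS intercept and slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def draw_sample(x, rng):
    """Generate y_i = BETA1 + BETA2 * x_i + u_i with u_i drawn from N(0, 1)."""
    return [BETA1 + BETA2 * xi + rng.gauss(0.0, 1.0) for xi in x]


x = [float(i) for i in range(1, 21)]  # regressor values, the same in every sample

# Three different samples from the same population give three different SRFs.
estimates = []
for seed in (1, 2, 3):
    rng = random.Random(seed)
    estimates.append(ols(x, draw_sample(x, rng)))

for b1_hat, b2_hat in estimates:
    print(f"beta1_hat = {b1_hat:.3f}, beta2_hat = {b2_hat:.3f}")
```

Each run of the loop uses the same population parameters, yet the printed estimates differ because the disturbances differ from sample to sample.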

2.2 Obtaining the Ordinary Least Squares (OLS) Estimates


2.2.1 Different criteria of estimation
Before obtaining the least squares estimators, we are going to examine three
alternative methods to illustrate the problem in hand. What these three methods have in
common is that they try to minimize the residuals as a whole.
Criterion 1


The first criterion takes as estimators those values of β̂1 and β̂ 2 that make the
sum of all the residuals as near to zero as possible. According to this criterion, the
expression to minimize would be the following:
$$\min \sum_{i=1}^{n} \hat{u}_i$$   (2-7)

The main problem with this procedure is that residuals of opposite signs can
offset one another. Such a situation can be observed graphically in figure 2.5, in which three
aligned observations are graphed, (x1, y1), (x2, y2) and (x3, y3). In this case the following
happens:
$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{y_3 - y_1}{x_3 - x_1}$$

FIGURE 2.5. The problems of criterion 1.

If a straight line is fitted so that it passes through the three points, each one of the
residuals will take the value zero, and therefore
$$\sum_{i=1}^{3} \hat{u}_i = 0$$
This fit could be considered optimal. But it is also possible to obtain a sum of residuals
equal to zero by rotating the straight line around the point (x2, y2) in any direction, as
figure 2.5 shows, because then û2 = 0 and û3 = −û1 (provided that x2 lies midway between
x1 and x3). In other words, by rotating the line in this way the result
$$\sum_{i=1}^{3} \hat{u}_i = 0$$
is always obtained. This simple example shows that this criterion is not appropriate for the
estimation of the parameters since, for any set of observations, infinitely many
straight lines satisfy this criterion.
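The failure of criterion 1 can be checked numerically. The following sketch uses three hypothetical aligned points (the coordinates are assumptions chosen for the illustration) and shows that every line passing through the point of sample means yields a zero sum of residuals, whatever its slope.

```python
# Three aligned observations (hypothetical values), equally spaced in x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

x_bar = sum(x for x, _ in points) / len(points)
y_bar = sum(y for _, y in points) / len(points)


def residual_sum(b1, b2):
    """Sum of residuals y_i - (b1 + b2 * x_i) over all points."""
    return sum(y - (b1 + b2 * x) for x, y in points)


def line_through_means(slope):
    """Intercept and slope of the line with the given slope through (x_bar, y_bar)."""
    return y_bar - slope * x_bar, slope


# Very different slopes, all rotating around the point of means.
slopes = (-5.0, 0.0, 2.0, 10.0)
sums = [residual_sum(*line_through_means(s)) for s in slopes]

for s, total in zip(slopes, sums):
    print(f"slope = {s:6.1f}  sum of residuals = {total:.1f}")
```

Only the slope 2.0 fits the points exactly, but criterion 1 cannot tell it apart from the other three lines.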
Criterion 2
In order to prevent positive residuals from offsetting negative ones, the
absolute values of the residuals are taken. In this case, the following expression would
be minimized:


$$\min \sum_{i=1}^{n} |\hat{u}_i|$$   (2-8)

Unfortunately, although the estimators thus obtained have some interesting
properties, their calculation is complicated and requires solving a linear
programming problem or applying an iterative procedure.
Criterion 3
A third procedure is to minimize the sum of the square residuals, that is to say,
$$\min S = \min \sum_{i=1}^{n} \hat{u}_i^2$$   (2-9)

The estimators obtained are called least squares (LS) estimators, and they
enjoy certain desirable statistical properties, which will be studied later on. Moreover,
unlike the first criterion, squaring the residuals prevents them from offsetting one
another and, unlike the second criterion, the least squares estimators are simple to
obtain. It is important to note that, by squaring the
residuals, we penalize larger residuals more than proportionally (if
a residual is double the size of another one, its square will be four times greater). This
distinguishes least squares estimation from other possible procedures.

2.2.2 Application of least square criterion


Now, we are going to look at the process of obtaining the LS estimators. The
objective is to minimize the residual sum of squares (S). To do this, we first
express S as a function of the estimators, using (2-6). Therefore, we must solve
$$\min_{\hat{\beta}_1, \hat{\beta}_2} S = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2$$   (2-10)

To minimize S, we differentiate partially with respect to β̂1 and β̂ 2 :

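$$\frac{\partial S}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)$$

$$\frac{\partial S}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) x_i$$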

The LS estimators are obtained by setting the previous derivatives equal to 0:

$$\sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$   (2-11)

$$\sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) x_i = 0$$   (2-12)

The equations (2-11) and (2-12) are called the normal equations or LS first order
conditions.


In operations with summations, the following rules must be taken into account:
$$\sum_{i=1}^{n} a = na$$
$$\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$
Operating with the normal equations, we have
$$\sum_{i=1}^{n} y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum_{i=1}^{n} x_i$$   (2-13)

$$\sum_{i=1}^{n} y_i x_i = \hat{\beta}_1 \sum_{i=1}^{n} x_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2$$   (2-14)

Dividing both sides of (2-13) by n, we have


$$\bar{y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{x}$$   (2-15)
Therefore
$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$$   (2-16)

Substituting this value of β̂1 in the second normal equation (2-14), we have
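$$\sum_{i=1}^{n} y_i x_i = (\bar{y} - \hat{\beta}_2 \bar{x}) \sum_{i=1}^{n} x_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2$$

$$\sum_{i=1}^{n} y_i x_i = \bar{y} \sum_{i=1}^{n} x_i - \hat{\beta}_2 \bar{x} \sum_{i=1}^{n} x_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2$$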

Solving for β̂ 2 we have:


$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i}$$   (2-17)

Or, as can be seen in appendix 2.1,


$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$   (2-18)
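The equivalence between (2-17) and (2-18) is left to appendix 2.1. A brief sketch of the argument, which may differ in presentation from the appendix, uses only the summation rules above and the definitions of the sample means (so that the sum of the xi equals n times x̄, and likewise for y):

```latex
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})
  &= \sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i - \bar{x} \sum_{i=1}^{n} y_i + n \bar{x} \bar{y}
   = \sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i , \\
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2 \bar{x} \sum_{i=1}^{n} x_i + n \bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i .
\end{aligned}
```

The first line gives the numerator of (2-17) and the second line its denominator, so both expressions define the same estimator.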

Dividing the numerator and denominator of (2-18) by n, it can be seen that β̂2 is
equal to the ratio of the sample covariance between x and y to the sample variance of x.
Since the variance is positive, the sign of β̂2 is the same as the sign of the covariance.


Once β̂ 2 is calculated, then we can obtain β̂1 by using (2-16).


These are the LS estimators. Since there are other, more complicated methods that are also
called least squares methods, the method that we have applied is called ordinary
least squares (OLS), due to its simplicity.

In the preceding sections β̂1 and β̂2 have been used to designate generic
estimators. From now on, we will use this notation only for the OLS estimators.
EXAMPLE 2.1 Estimation of the consumption function
Given the Keynesian consumption function,
$$cons_i = \beta_1 + \beta_2\, inc_i + u_i$$
we will estimate it using data from six households that appear in table 2.1.
TABLE 2.1. Data and calculations to estimate the consumption function.
Columns: (a) cons_i; (b) inc_i; (c) cons_i × inc_i; (d) inc_i²; (e) cons_i − mean(cons); (f) inc_i − mean(inc); (g) (cons_i − mean(cons)) × (inc_i − mean(inc)); (h) (inc_i − mean(inc))².
Observ.   (a)   (b)   (c)   (d)   (e)   (f)   (g)   (h)
1           5     6    30    36    -4    -5    20    25
2           7     9    63    81    -2    -2     4     4
3           8    10    80   100    -1    -1     1     1
4          10    12   120   144     1     1     1     1
5          11    13   143   169     2     2     4     4
6          13    16   208   256     4     5    20    25
Sums       54    66   644   786     0     0    50    60

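Calculating the sample means of cons and inc, and applying formula (2-17), or alternatively (2-18), to the data in
table 2.1, we obtain
$$\overline{cons} = \frac{54}{6} = 9; \qquad \overline{inc} = \frac{66}{6} = 11$$
(2-17): $$\hat{\beta}_2 = \frac{644 - 9 \times 66}{786 - 11 \times 66} = 0.83$$
(2-18): $$\hat{\beta}_2 = \frac{50}{60} = 0.83$$
Then, by applying (2-16) with the unrounded slope 50/60, we obtain
$$\hat{\beta}_1 = 9 - \frac{50}{60} \times 11 \approx -0.17$$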
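The calculations of example 2.1 can be reproduced with a few lines of code. The sketch below takes the six observations from table 2.1 and evaluates (2-17), (2-18) and (2-16); the variable names are only illustrative.

```python
# Data from table 2.1 (six households).
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]
n = len(cons)

cons_bar = sum(cons) / n  # sample mean of consumption
inc_bar = sum(inc) / n    # sample mean of income

# Slope with formula (2-17): raw sums of products and squares.
num_17 = sum(c * i for c, i in zip(cons, inc)) - cons_bar * sum(inc)
den_17 = sum(i * i for i in inc) - inc_bar * sum(inc)
b2_17 = num_17 / den_17

# Slope with formula (2-18): deviations from the sample means.
num_18 = sum((c - cons_bar) * (i - inc_bar) for c, i in zip(cons, inc))
den_18 = sum((i - inc_bar) ** 2 for i in inc)
b2_18 = num_18 / den_18

# Intercept with formula (2-16).
b1 = cons_bar - b2_18 * inc_bar

print(f"slope (2-17) = {b2_17:.4f}, slope (2-18) = {b2_18:.4f}, intercept = {b1:.4f}")
```

Both slope formulas give 50/60, and the intercept is computed from the unrounded slope, which avoids accumulating rounding error.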

2.3 Some characteristics of OLS estimators


2.3.1 Algebraic implications of the estimation
The algebraic implications of the estimation are derived exclusively from the
application of the OLS procedure to the simple linear regression model:
1. The sum of the OLS residuals is equal to 0:
$$\sum_{i=1}^{n} \hat{u}_i = 0$$   (2-19)

From the definition of residual


$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i, \quad i = 1, 2, \ldots, n$$   (2-20)
If we sum up the n observations, we get


$$\sum_{i=1}^{n} \hat{u}_i = \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$   (2-21)

which is precisely the first equation (2-11) of the system of normal equations.
Note that, if (2-19) holds, it implies that
$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$$   (2-22)

and, dividing (2-19) and (2-22) by n, we obtain

$$\bar{\hat{u}} = 0, \qquad \bar{y} = \bar{\hat{y}}$$   (2-23)
2. The OLS line always goes through the point of sample means (x̄, ȳ).
Indeed, dividing the equation (2-13) by n, we have:
$$\bar{y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{x}$$   (2-24)
3. The sample cross product between each one of the regressors and the OLS
residuals is zero.
That is to say,
$$\sum_{i=1}^{n} x_i \hat{u}_i = 0$$   (2-25)

We can see that (2-25) is equal to the second normal equation,

$$\sum_{i=1}^{n} x_i \hat{u}_i = \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$

given in (2-12).
4. The sample cross product between the fitted values ( ŷ ) and the OLS residuals
is zero.
That is to say,
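$$\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = 0$$   (2-26)

Proof
Taking into account the algebraic implications 1 -(2-19)- and 3 -(2-25)-, we have
$$\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = \sum_{i=1}^{n} (\hat{\beta}_1 + \hat{\beta}_2 x_i) \hat{u}_i = \hat{\beta}_1 \sum_{i=1}^{n} \hat{u}_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i \hat{u}_i = \hat{\beta}_1 \times 0 + \hat{\beta}_2 \times 0 = 0$$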
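The four algebraic implications hold for any data set once the OLS estimates are used. As an illustrative check, the sketch below recomputes the OLS fit for the data of table 2.1 and evaluates each implication; the variable names are assumptions made for the example.

```python
# Data from table 2.1.
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]
n = len(cons)

inc_bar = sum(inc) / n
cons_bar = sum(cons) / n

# OLS estimates, formulas (2-18) and (2-16).
b2 = (sum((x - inc_bar) * (y - cons_bar) for x, y in zip(inc, cons))
      / sum((x - inc_bar) ** 2 for x in inc))
b1 = cons_bar - b2 * inc_bar

fitted = [b1 + b2 * x for x in inc]
resid = [y - f for y, f in zip(cons, fitted)]

sum_resid = sum(resid)                                        # implication 1, (2-19)
line_at_mean = b1 + b2 * inc_bar                              # implication 2, (2-24)
sum_x_resid = sum(x * u for x, u in zip(inc, resid))          # implication 3, (2-25)
sum_fitted_resid = sum(f * u for f, u in zip(fitted, resid))  # implication 4, (2-26)

print(sum_resid, line_at_mean, sum_x_resid, sum_fitted_resid)
```

Because of floating point arithmetic the three sums are printed as numbers extremely close to zero rather than exactly zero.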

2.3.2 Decomposition of the variance of y


By definition
$$y_i = \hat{y}_i + \hat{u}_i$$   (2-27)


Subtracting ȳ from both sides of the previous expression (remember that the mean of the
fitted values is equal to ȳ), we have

$$y_i - \bar{y} = \hat{y}_i - \bar{\hat{y}} + \hat{u}_i$$
Squaring both sides:

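$$(y_i - \bar{y})^2 = \left[ (\hat{y}_i - \bar{\hat{y}}) + \hat{u}_i \right]^2 = (\hat{y}_i - \bar{\hat{y}})^2 + \hat{u}_i^2 + 2\hat{u}_i (\hat{y}_i - \bar{\hat{y}})$$

Summing for all i:

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{\hat{y}})^2 + \sum \hat{u}_i^2 + 2 \sum \hat{u}_i (\hat{y}_i - \bar{\hat{y}})$$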

Taking into account the algebraic properties 1 and 4, the third term of the right
hand side is equal to 0. Analytically,

$$\sum \hat{u}_i (\hat{y}_i - \bar{\hat{y}}) = \sum \hat{u}_i \hat{y}_i - \bar{\hat{y}} \sum \hat{u}_i = 0$$   (2-28)
Therefore, we have

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{\hat{y}})^2 + \sum \hat{u}_i^2$$   (2-29)
In words,
Total sum of squares (TSS) =
Explained sum of squares (ESS)+Residual sum of squares (RSS)
It must be stressed that the relation (2-19) is needed to ensure that (2-28)
is equal to 0. We must remember that (2-19) is associated with the first normal equation:
that is to say, with the equation corresponding to the intercept. If there is no intercept in the
fitted model, then in general the decomposition (2-29) will not hold.
This decomposition can be made with variances, by dividing both sides of (2-29)
by n:

$$\frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum (\hat{y}_i - \bar{\hat{y}})^2}{n} + \frac{\sum \hat{u}_i^2}{n}$$   (2-30)
In words,
Total variance=explained variance+ residual variance

2.3.3 Goodness of fit: Coefficient of determination (R2)


A priori we have obtained the estimators minimizing the sum of square residuals.
Once the estimation has been done, we can see how well our sample regression
line fits our data.
The measures that indicate how well the sample regression line fits the data are
denominated goodness of fit measures. We are going to look at the most well-known
measure, which is called coefficient of determination or the R-square ( R 2 ). This measure
is defined in the following way:


$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$   (2-31)

Therefore, R 2 is the proportion of the total sum of squares (TSS) which is


explained by the regression (ESS): that is to say, which is explained by the model. We
can also say that 100 R 2 is the percentage of the sample variation in y explained by x.
Alternatively, taking into account (2-29), we have:

$$\sum (\hat{y}_i - \bar{\hat{y}})^2 = \sum (y_i - \bar{y})^2 - \sum \hat{u}_i^2$$

Substituting in (2-31), we have


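$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{RSS}{TSS}$$   (2-32)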

Therefore, R² is equal to 1 minus the proportion of the total sum of squares (TSS)
that is not explained by the regression (RSS).
According to the definition of R², the following must hold:
$$0 \le R^2 \le 1$$
Extreme cases:
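a) If we have a perfect fit, then ûi = 0 ∀i. This implies that

$$\hat{y}_i = y_i \ \forall i \ \Rightarrow \ \sum (\hat{y}_i - \bar{\hat{y}})^2 = \sum (y_i - \bar{y})^2 \ \Rightarrow \ R^2 = 1$$

b) If ŷi = c ∀i, it implies that

$$\bar{\hat{y}} = c \ \Rightarrow \ \hat{y}_i - \bar{\hat{y}} = c - c = 0 \ \forall i \ \Rightarrow \ \sum (\hat{y}_i - \bar{\hat{y}})^2 = 0 \ \Rightarrow \ R^2 = 0$$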

If R 2 is close to 0, it implies that we have a poor fit. In other words, very little
variation in y is explained by x.
In many cases, a high R 2 is obtained when the model is fitted using time series
data, due to the effect of a common trend. On the contrary, when we use cross sectional
data a low value is obtained in many cases, but it does not mean that the fitted model is
bad.
What is the relationship between the coefficient of determination and the
coefficient of correlation studied in descriptive statistics? The coefficient of
determination is equal to the squared coefficient of correlation, as can be seen in appendix
2.2:
$$r_{xy}^2 = R^2$$   (2-33)


(This equality holds only in the simple regression model, not in the multiple
regression model.)
EXAMPLE 2.2 Fulfilling algebraic implications and calculating R2 in the consumption function
In column 2 of table 2.2, the fitted value of cons_i is calculated; in columns 3, 4 and 5, you can see the fulfillment of
algebraic implications 1, 3 and 4 respectively. The remainder of the columns shows the calculations to
obtain
$$TSS = 42; \quad ESS = 41.67; \quad RSS = 42 - 41.67 = 0.33; \quad R^2 = \frac{41.67}{42} = 0.992$$
or, alternatively,
$$R^2 = 1 - \frac{0.33}{42} = 0.992$$
TABLE 2.2. Data and calculations to estimate the consumption function.
Columns: (a) fitted cons_i; (b) û_i; (c) û_i × inc_i; (d) fitted cons_i × û_i; (e) cons_i²; (f) (cons_i − mean(cons))²; (g) (fitted cons_i)²; (h) (fitted cons_i − mean of fitted cons)².
Observ.     (a)     (b)     (c)     (d)    (e)   (f)      (g)     (h)
1          4.83    0.17    1.00    0.81     25    16    23.36   17.36
2          7.33   -0.33   -3.00   -2.44     49     4    53.78    2.78
3          8.17   -0.17   -1.67   -1.36     64     1    66.69    0.69
4          9.83    0.17    2.00    1.64    100     1    96.69    0.69
5         10.67    0.33    4.33    3.56    121     4   113.78    2.78
6         13.17   -0.17   -2.67   -2.19    169    16   173.36   17.36
Sums      54.00    0.00    0.00    0.00    528    42   527.67   41.67
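The figures of example 2.2 can be verified with the sketch below, which computes TSS, ESS and RSS for the data of table 2.1, evaluates R² with both (2-31) and (2-32), and compares it with the squared correlation coefficient of (2-33). Variable names are illustrative only.

```python
import math

# Data from table 2.1.
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]
n = len(cons)

inc_bar = sum(inc) / n
cons_bar = sum(cons) / n
s_xy = sum((x - inc_bar) * (y - cons_bar) for x, y in zip(inc, cons))
s_xx = sum((x - inc_bar) ** 2 for x in inc)

b2 = s_xy / s_xx
b1 = cons_bar - b2 * inc_bar
fitted = [b1 + b2 * x for x in inc]
resid = [y - f for y, f in zip(cons, fitted)]
fitted_bar = sum(fitted) / n

tss = sum((y - cons_bar) ** 2 for y in cons)      # total sum of squares
ess = sum((f - fitted_bar) ** 2 for f in fitted)  # explained sum of squares
rss = sum(u ** 2 for u in resid)                  # residual sum of squares

r2 = ess / tss          # definition (2-31)
r2_alt = 1 - rss / tss  # alternative expression (2-32)

# Sample correlation coefficient between inc and cons, for (2-33).
r_xy = s_xy / math.sqrt(s_xx * tss)

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}, R2 = {r2:.3f}")
print(f"squared correlation = {r_xy ** 2:.3f}")
```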

2.3.4 Regression through the origin


If we force the regression line to pass through the point (0,0), we are constraining
the intercept to be zero, as can be seen in figure 2.6. This is called a regression through
the origin.

FIGURE 2.6. A regression through the origin.
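Now, we are going to estimate a regression line through the origin. The fitted
model is the following:
$$\tilde{y}_i = \tilde{\beta}_2 x_i$$   (2-34)
where the tilde distinguishes the quantities of the model without intercept from those of
the model with intercept. Therefore, we must minimize

$$\min_{\tilde{\beta}_2} S = \min_{\tilde{\beta}_2} \sum_{i=1}^{n} (y_i - \tilde{\beta}_2 x_i)^2$$   (2-35)

To minimize S, we differentiate with respect to β̃2 and set the derivative equal to 0:

$$\frac{dS}{d\tilde{\beta}_2} = -2 \sum_{i=1}^{n} (y_i - \tilde{\beta}_2 x_i) x_i = 0$$   (2-36)

Solving for β̃2:

$$\tilde{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}$$   (2-37)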

Another problem with fitting a regression line through the origin is that the
following generally happens:

$$\sum (y_i - \bar{y})^2 \neq \sum (\hat{y}_i - \bar{\hat{y}})^2 + \sum \hat{u}_i^2$$

If the decomposition of the variance of y in two components (explained and


residual) is not possible, then the R2 is meaningless. This coefficient can take values that
are negative or greater than 1 in the model without intercept.
To sum up, an intercept must always be included in the regressions, unless there
are strong reasons against it supported by the economic theory.
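A small numerical case makes the problem visible. In the sketch below the three observations are hypothetical and were chosen so that the true relationship clearly has a non-zero intercept; the slope is computed with (2-37), and the two usual expressions for R² are then evaluated.

```python
# Hypothetical data whose relationship clearly does not pass through (0, 0).
x = [1.0, 2.0, 3.0]
y = [10.0, 9.0, 8.0]
n = len(x)

# Slope of the regression through the origin, formula (2-37).
b2_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

fitted = [b2_origin * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

y_bar = sum(y) / n
fitted_bar = sum(fitted) / n

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - fitted_bar) ** 2 for fi in fitted)
rss = sum(ui ** 2 for ui in resid)

r2_from_ess = ess / tss      # would be (2-31)
r2_from_rss = 1 - rss / tss  # would be (2-32)

print(f"sum of residuals = {sum(resid):.3f}")
print(f"TSS = {tss:.2f}, ESS + RSS = {ess + rss:.2f}")
print(f"ESS/TSS = {r2_from_ess:.2f}, 1 - RSS/TSS = {r2_from_rss:.2f}")
```

Here the residuals do not sum to zero, TSS differs from ESS + RSS, and the two expressions give one value above 1 and one negative value, so neither can be read as a proportion of explained variation.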

2.4 Units of measurement and functional form


2.4.1 Units of Measurement

Changing the units of measurement (change of scale) in x


If x is multiplied/divided by a constant, c≠0, then the OLS slope is
divided/multiplied by the same constant, c. Thus
$$\hat{y}_i = \hat{\beta}_1 + \left( \frac{\hat{\beta}_2}{c} \right) (x_i \times c)$$   (2-38)
EXAMPLE 2.3
Let us suppose the following estimated consumption function, in which both variables are
measured in thousands of euros:
$$\widehat{cons}_i = 0.2 + 0.85 \times inc_i$$   (2-39)

If we now express income in euros (multiplication by 1000) and call it ince, the fitted model with
the new units of measurement of income would be the following:
$$\widehat{cons}_i = 0.2 + 0.00085 \times ince_i$$
As can be seen, changing the units of measurement of the explanatory variable does not affect the
intercept.


Changing the units of measurement (change of scale) in y


If y is multiplied/divided by a constant, c≠0, then the OLS slope and intercept are
both multiplied/divided by the same constant, c. Thus,
( yˆi × c) = ( βˆ1 × c) + ( βˆ2 × c) xi (2-40)
EXAMPLE 2.4
If we express consumption in euros (multiplication by 1000) in model (2-39), and call it conse,
the fitted model with the new units of measurement of consumption would be the following:
$$\widehat{conse}_i = 200 + 850 \times inc_i$$

Changing the origin


If one adds/subtracts a constant d to/from x and/or y, then the OLS slope is not affected.
However, changing the origin of x and/or y affects the intercept of the regression.
If one subtracts a constant d from x, the intercept will change in the following way:
$$\hat{y}_i = (\hat{\beta}_1 + \hat{\beta}_2 \times d) + \hat{\beta}_2 (x_i - d)$$   (2-41)
If one subtracts a constant d from y, the intercept will change in the following way:
$$\hat{y}_i - d = (\hat{\beta}_1 - d) + \hat{\beta}_2 x_i$$   (2-42)
EXAMPLE 2.5
Let us suppose that the average income is 20 thousand euros. If we define the variable
incd_i = inc_i − mean(inc) and both variables are measured in thousands of euros, the fitted model with this change
in the origin will be the following:
$$\widehat{cons}_i = (0.2 + 0.85 \times 20) + 0.85 \times (inc_i - 20) = 17.2 + 0.85 \times incd_i$$

EXAMPLE 2.6
Let us suppose that the average consumption is 15 thousand euros. If we define the variable
consd_i = cons_i − mean(cons) and both variables are measured in thousands of euros, the fitted model with this change in the
origin will be the following:
$$\widehat{cons}_i - 15 = 0.2 - 15 + 0.85 \times inc_i$$

that is to say,
$$\widehat{consd}_i = -14.8 + 0.85 \times inc_i$$

Note that R2 is invariant to changes in the units of x and/or y, and also is invariant to the origin of
the variables.
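These scaling and origin rules can be checked on actual data. Since the fitted function (2-39) is given without its data set, the sketch below uses the observations of table 2.1 instead; the helper functions ols and r_squared, the factor 1000 and the shift d = 11 are assumptions made for the illustration.

```python
import math

# Data from table 2.1.
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]


def ols(x, y):
    """Return the OLS intercept and slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b2 * x_bar, b2


def r_squared(x, y):
    """Coefficient of determination of the OLS regression of y on x."""
    b1, b2 = ols(x, y)
    y_bar = sum(y) / len(y)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


b1, b2 = ols(inc, cons)

# Change of scale in x: slope divided by 1000, intercept unchanged, as in (2-38).
b1_x, b2_x = ols([1000 * v for v in inc], cons)

# Change of scale in y: intercept and slope multiplied by 1000, as in (2-40).
b1_y, b2_y = ols(inc, [1000 * v for v in cons])

# Change of origin in x: slope unchanged, intercept becomes b1 + b2 * d, as in (2-41).
d = 11
b1_d, b2_d = ols([v - d for v in inc], cons)

r2_original = r_squared(inc, cons)
r2_rescaled = r_squared([1000 * v - d for v in inc], [1000 * v for v in cons])

print(b1, b2, b1_x, b2_x, b1_y, b2_y, b1_d, b2_d)
print(r2_original, r2_rescaled)
```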

2.4.2 Functional Form


In many cases linear relationships are not adequate for economic applications.
However, in the simple regression model we can incorporate nonlinearities (in variables)
by appropriately redefining the dependent and independent variables.
Some definitions
Now we are going to look at some definitions of variation measures that will be
useful in the interpretation of the coefficients corresponding to different functional forms.
Specifically, we will look at the following: proportional change and change in logarithms.


The proportional change (or relative variation rate) between x1 and x0 is given
by:
$$\frac{\Delta x_1}{x_0} = \frac{x_1 - x_0}{x_0}$$   (2-43)
Multiplying a proportional change by 100, we obtain a proportional change in %.
That is to say:
$$100 \frac{\Delta x_1}{x_0} \%$$   (2-44)

The change in logarithms and change in logarithms in % between x1 and x0 are


given by
∆ ln( x) = ln( x1 ) − ln( x0 )
(2-45)
100∆ ln( x)%
The change in logarithms is an approximation to the proportional change, as can
be seen in appendix 2.3. This approximation is good when the proportional change is
small, but the differences can be important when the proportional change is big, as can
be seen in table 2.3.
TABLE 2.3. Examples of proportional change and change in logarithms.
x1 202 210 220 240 300
x0 200 200 200 200 200
Proportional change in % 1% 5.0% 10.0% 20.0% 50.0%
Change in logarithms in % 1% 4.9% 9.5% 18.2% 40.5%
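The entries of table 2.3 follow directly from (2-44) and (2-45). A short sketch that recomputes them, with variable names chosen only for the illustration, is the following:

```python
import math

x0 = 200
x1_values = (202, 210, 220, 240, 300)

rows = []
for x1 in x1_values:
    proportional = 100 * (x1 - x0) / x0               # proportional change in %, (2-44)
    log_change = 100 * (math.log(x1) - math.log(x0))  # change in logarithms in %, (2-45)
    rows.append((round(proportional, 1), round(log_change, 1)))

for x1, (proportional, log_change) in zip(x1_values, rows):
    print(f"x1 = {x1}: proportional change = {proportional}%, change in logs = {log_change}%")
```

The gap between the two measures widens as the change grows, which is why the approximation is recommended only for small changes.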

Elasticity is the ratio of the relative changes of two variables. If we use


proportional changes, the elasticity of the variable y with respect to the variable x is given
by
$$\varepsilon_{y/x} = \frac{\Delta y / y_0}{\Delta x / x_0}$$   (2-46)
If we use changes in logarithms and consider infinitesimal changes, then the
elasticity of the variable y with respect to a variable x is given by
$$\varepsilon_{y/x} = \frac{dy / y}{dx / x} = \frac{d \ln(y)}{d \ln(x)}$$   (2-47)
In econometric models, elasticity is generally defined by using (2-47).
Alternative functional forms
The OLS method can also be applied to models in which the endogenous variable
and/or the exogenous variable have been transformed. In the presentation of the model
(2-1) we said that the exogenous variable and regressor were equivalent terms. But from
now on, a regressor is the specific form in which an exogenous variable appears in the
equation. For example, in the model
$$y = \beta_1 + \beta_2 \ln(x) + u$$


the exogenous variable is x, but the regressor is ln(x).


In the presentation of the model (2-1) we also said that the endogenous variable
and the regressand were equivalent. But from now on, the regressand is the specific form
in which an endogenous variable appears in the equation. For example, in the model
ln( y ) =β1 + β 2 x + u
the endogenous variable is y, but the regressand is ln(y).
Both models are linear in the parameters, although they are not linear in the
variable x (the first one) or in the variable y (the second one). In any case, if a model is
linear in the parameters, it can be estimated by applying the OLS method. On the contrary,
if a model is not linear in the parameters, iterative methods must be used in the estimation.
However, there are certain nonlinear models which, by means of suitable
transformations, can become linear. These models are called linearizable.
Thus, on some occasions potential (power function) models are postulated in economic theory, such
as the well-known Cobb-Douglas production function. A potential model with a single
explanatory variable is given by
$$y = e^{\beta_1} x^{\beta_2}$$
If we introduce the disturbance term in a multiplicative form, we obtain:
$$y = e^{\beta_1} x^{\beta_2} e^{u}$$   (2-48)
Taking natural logarithms on both sides of (2-48), we obtain a model that is linear in the
parameters:
$$\ln(y) = \beta_1 + \beta_2 \ln(x) + u$$   (2-49)
On the contrary, if we introduce the disturbance term in an additive form, we
obtain
$$y = e^{\beta_1} x^{\beta_2} + u$$
In this case, there is no transformation which allows this model to be turned into a linear
model. This is a non-linearizable model.
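The linearization in (2-49) can be illustrated with artificial data. In the sketch below the parameter values 1.5 and 0.7 and the x values are hypothetical, and the disturbance is set to zero so that OLS applied to the logarithms recovers the parameters of the potential model exactly (up to rounding error).

```python
import math

# Hypothetical parameters of the potential model y = exp(beta1) * x**beta2 * exp(u).
BETA1, BETA2 = 1.5, 0.7

x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [math.exp(BETA1) * xi ** BETA2 for xi in x]  # disturbance u set to 0


def ols(x_values, y_values):
    """Return the OLS intercept and slope of y_values on x_values."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    y_bar = sum(y_values) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_values, y_values))
          / sum((xi - x_bar) ** 2 for xi in x_values))
    return y_bar - b2 * x_bar, b2


# Regress ln(y) on ln(x), as in (2-49).
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]
b1_hat, b2_hat = ols(ln_x, ln_y)

print(f"beta1_hat = {b1_hat:.4f}, beta2_hat = {b2_hat:.4f}")
```

With an additive disturbance the logarithm of y would no longer be a linear function of the parameters, so the same device would not apply.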
Now we are going to consider some models with alternative functional forms, all
of which are linear in the parameters. We will look at the interpretation of the coefficient
β̂ 2 in each case.

a) Linear model
The β̂ 2 coefficient measures the effect of the regressor x on y. Let us look at this
in detail. The observation i of the sample regression function is given according to (2-5)
by
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$$   (2-50)
Let us consider the observation h of the fitted model whereupon the value of the
regressor and, consequently, of the regressand has changed with respect to (2-50):


$$\hat{y}_h = \hat{\beta}_1 + \hat{\beta}_2 x_h$$   (2-51)
Subtracting (2-51) from (2-50), we see that x has a linear effect on ŷ:

$$\Delta \hat{y} = \hat{\beta}_2 \Delta x$$   (2-52)

where Δŷ = ŷi − ŷh and Δx = xi − xh.

Therefore, β̂ 2 is the change in y (in the units in which y is measured) by a unit


change of x (in the units in which x is measured).
For example, if income increases by 1 unit, consumption will increase by 0.85
units in the fitted function (2-39).
The linearity of this model implies that a one-unit change in x always has the same
effect on y, regardless of the value of x considered.
EXAMPLE 2.7 Quantity sold of coffee as a function of its price. Linear model
In a marketing experiment 1 the following model has been formulated to explain the quantity sold
of coffee per week (coffqty) as a function of the price of coffee (coffpric).
$$coffqty = \beta_1 + \beta_2\, coffpric + u$$
The variable coffpric takes the value 1 for the usual price, and also 0.95 and 0.85 in two price
actions whose effects are under investigation. This experiment lasted 12 weeks. coffqty is expressed in
thousands of units and coffpric in French francs. Data appear in table 2.4 and in work file coffee1.
The fitted model is the following:
$$\widehat{coffqty} = 774.9 - 693.33\, coffpric \qquad R^2 = 0.95 \quad n = 12$$

TABLE 2.4. Data on quantities and prices of coffee.


week coffpric coffqty

1 1.00 89
2 1.00 86
3 1.00 74
4 1.00 79
5 1.00 68
6 1.00 84
7 0.95 139
8 0.95 122
9 0.95 102
10 0.85 186
11 0.85 179
12 0.85 187
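The fitted line reported above can be reproduced from the twelve observations of table 2.4. The sketch below applies formulas (2-18), (2-16) and (2-32); the variable names are illustrative.

```python
# Data from table 2.4: price (French francs) and weekly quantity (thousands of units).
coffpric = [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.95, 0.95, 0.85, 0.85, 0.85]
coffqty = [89, 86, 74, 79, 68, 84, 139, 122, 102, 186, 179, 187]
n = len(coffqty)

p_bar = sum(coffpric) / n
q_bar = sum(coffqty) / n

s_pq = sum((p - p_bar) * (q - q_bar) for p, q in zip(coffpric, coffqty))
s_pp = sum((p - p_bar) ** 2 for p in coffpric)

b2 = s_pq / s_pp         # slope, formula (2-18)
b1 = q_bar - b2 * p_bar  # intercept, formula (2-16)

rss = sum((q - b1 - b2 * p) ** 2 for p, q in zip(coffpric, coffqty))
tss = sum((q - q_bar) ** 2 for q in coffqty)
r2 = 1 - rss / tss       # formula (2-32)

# Effect of a price increase of one cent (0.01 francs), in thousands of units.
effect_one_cent = b2 / 100

print(f"intercept = {b1:.2f}, slope = {b2:.2f}, R2 = {r2:.2f}")
print(f"effect of a one-cent increase = {effect_one_cent:.2f}")
```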

Interpretation of the coefficient β̂2: if the price of coffee increases by 1 French franc, the
quantity sold of coffee will decrease by 693.33 thousand units. As the price of coffee is a small
magnitude, the following interpretation is preferable: if the price of coffee increases by 1 cent of a French
franc, the quantity sold will decrease by 6.93 thousand units.

1
The data of this exercise were obtained from a controlled marketing experiment in stores in Paris
on coffee expenditure, as reported in A. C. Bemmaor and D. Mouchoux, “Measuring the Short-Term Effect
of In-Store Promotion and Retail Advertising on Brand Sales: A Factorial Experiment”, Journal of
Marketing Research, 28 (1991), 202–14.

EXAMPLE 2.8 Explaining market capitalization of Spanish banks. Linear model


Using data from Bolsa de Madrid (Madrid Stock Exchange) on August 18, 1995 (file bolmad95,
the first 20 observations), the following model has been estimated to explain the market capitalization of
banks and financial institutions:
$$\widehat{marktval} = 29.42 + 1.219\, bookval \qquad R^2 = 0.836 \quad n = 20$$
where
- marktval is the market capitalization (market value) of a company. It is calculated by multiplying the
price of the stock by the number of stocks issued.
- bookval is the book value or the net worth of the company. The book value is calculated as the
difference between a company's assets and its liabilities.
- Data on marktval and bookval are expressed in millions of pesetas.
Interpretation of the coefficient β̂2: if the book value of a bank increases by 1 million pesetas, the
market capitalization of this bank will increase by 1.219 million pesetas.

b) Linear-log model
A linear-log model is given by
$$y = \beta_1 + \beta_2 \ln(x) + u$$   (2-53)
The corresponding fitted function is the following:
$$\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 \ln(x)$$   (2-54)
Taking first order differences in (2-54), and then multiplying and dividing the
right hand side by 100, we have
$$\Delta \hat{y} = \frac{\hat{\beta}_2}{100} \times 100 \times \Delta \ln(x) \%$$
Therefore, if x increases by 1%, then ŷ will increase by ( βˆ2 /100) units.

c) Log-linear model
A log-linear model is given by
ln( y ) =β1 + β 2 x + u (2-55)
The above model can be obtained by taking natural logs on both sides of the
following model:
$$y = \exp(\beta_1 + \beta_2 x + u)$$
For this reason, the model (2-55) is also called exponential.
The corresponding sample regression function to (2-55) is the following:
$$\widehat{\ln(y)} = \hat{\beta}_1 + \hat{\beta}_2 x$$   (2-56)


Taking first order differences in (2-56), and then multiplying both sides by 100,
we have
$$100 \times \Delta \widehat{\ln(y)} \% = 100 \times \hat{\beta}_2 \Delta x$$

Therefore, if x increases by 1 unit, then ŷ will increase by 100 β̂ 2 %.

d) Log-log model
The model given in (2-49) is a log-log model or, before the transformation, a
potential model (2-48). This model is also called a constant elasticity model.
The corresponding fitted model to (2-49) is the following:
$$\widehat{\ln(y)} = \hat{\beta}_1 + \hat{\beta}_2 \ln(x)$$   (2-57)

Taking first order differences in (2-57), we have


$$\Delta \widehat{\ln(y)} = \hat{\beta}_2 \Delta \ln(x)$$

Therefore, if x increases by 1%, then ŷ will increase by β̂ 2 %. It is important to


remark that, in this model, β̂ 2 is the estimated elasticity of y with respect to x, for any
value of x and y. Consequently, in this model the elasticity is constant.
Annex 1 presents a case study on the Engel curve for the demand for dairy products, in which six
alternative functional forms are analyzed.
EXAMPLE 2.9 Quantity sold of coffee as a function of its price. Log- log model (Continuation example
2.7)
As an alternative to the linear model the following log-log model has been fitted:
$$\widehat{\ln(coffqty)} = 4.415 - 5.132 \ln(coffpric) \qquad R^2 = 0.90 \quad n = 12$$

Interpretation of the coefficient β̂ 2 : if the price of coffee increases by 1%, the quantity sold of
coffee will decrease by 5.13%. In this case β̂ 2 is the estimated demand/price elasticity.
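The log-log fit of example 2.9 can also be reproduced from the data of table 2.4 by applying the same OLS formulas to the logarithms of both variables. The sketch below is illustrative and the variable names are assumptions.

```python
import math

# Data from table 2.4.
coffpric = [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.95, 0.95, 0.85, 0.85, 0.85]
coffqty = [89, 86, 74, 79, 68, 84, 139, 122, 102, 186, 179, 187]
n = len(coffqty)

# Regressor and regressand of the log-log model.
ln_p = [math.log(p) for p in coffpric]
ln_q = [math.log(q) for q in coffqty]

ln_p_bar = sum(ln_p) / n
ln_q_bar = sum(ln_q) / n

s_pq = sum((p - ln_p_bar) * (q - ln_q_bar) for p, q in zip(ln_p, ln_q))
s_pp = sum((p - ln_p_bar) ** 2 for p in ln_p)

b2 = s_pq / s_pp               # estimated price elasticity of demand
b1 = ln_q_bar - b2 * ln_p_bar  # intercept

rss = sum((q - b1 - b2 * p) ** 2 for p, q in zip(ln_p, ln_q))
tss = sum((q - ln_q_bar) ** 2 for q in ln_q)
r2 = 1 - rss / tss

print(f"intercept = {b1:.3f}, elasticity = {b2:.3f}, R2 = {r2:.2f}")
```

Note that this R² refers to the variation of ln(coffqty), so it is not directly comparable with the R² of the linear model of example 2.7, whose regressand is coffqty.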

EXAMPLE 2.10 Explaining market capitalization of Spanish banks. Log-log model (Continuation
example 2.8)
Using data from example 2.8, the following log-log model has been estimated:
$$\widehat{\ln(marktval)} = 0.6756 + 0.938 \ln(bookval) \qquad R^2 = 0.928 \quad n = 20$$
Interpretation of the coefficient β̂ 2 : if the book value of a bank increases by 1%, the market
capitalization of this bank will increase by 0.938%. In this case β̂ 2 is the estimated market value/book
value elasticity.
In table 2.5 and for the fitted model, the interpretation of β̂ 2 in these four models is shown. If we
are considering the population model, the interpretation of β 2 is the same but taking into account that ∆u
must be equal to 0.


TABLE 2.5. Interpretation of β̂2 in different models.

Model        If x increases by    then y will increase by
linear       1 unit               β̂2 units
linear-log   1%                   (β̂2 / 100) units
log-linear   1 unit               (100 β̂2)%
log-log      1%                   β̂2 %
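The percentage interpretations in table 2.5 rest on the approximation of proportional changes by changes in logarithms, so they are accurate for small changes only. The sketch below compares the approximate and the exact percentage change of the fitted y in the log-linear and log-log cases. The helper functions and the log-linear slope 0.10 are hypothetical; the log-log slope is taken from example 2.9.

```python
import math


def loglinear_exact_pct(b2, dx=1.0):
    """Exact % change in fitted y when x rises by dx units in a log-linear model."""
    return 100.0 * (math.exp(b2 * dx) - 1.0)


def loglog_exact_pct(b2, pct_x=1.0):
    """Exact % change in fitted y when x rises by pct_x % in a log-log model."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** b2 - 1.0)


# Log-linear model with a hypothetical slope of 0.10.
b2_loglinear = 0.10
approx_loglinear = 100.0 * b2_loglinear  # table 2.5: (100 * b2) %
exact_loglinear = loglinear_exact_pct(b2_loglinear)

# Log-log model with the slope estimated in example 2.9.
b2_loglog = -5.132
approx_loglog = b2_loglog                # table 2.5: b2 %
exact_loglog = loglog_exact_pct(b2_loglog)

print(f"log-linear: approx {approx_loglinear:.2f}%, exact {exact_loglinear:.2f}%")
print(f"log-log:    approx {approx_loglog:.2f}%, exact {exact_loglog:.2f}%")
```

The difference is modest in both cases, but it grows with the size of the coefficient and of the change in x.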

2.5 Assumptions and statistical properties of OLS


We are now going to study the statistical properties of OLS estimators β̂1 and β̂ 2
. But first we need to formulate a set of statistical assumptions. Specifically, the set of
assumptions that we are going to formulate are called classical linear model assumptions
(CLM). It is important to note that CLM assumptions are simple and that the OLS
estimators have, under these assumptions, very good properties.

2.5.1 Statistical assumptions of the CLM in simple linear regression

a) Assumption on the functional form


1) The relationship between the regressand, the regressor and the random
disturbance is linear in the parameters:
y =β1 + β 2 x + u (2-58)
The regressand and the regressors can be any function of the endogenous variable and
the explanatory variables, respectively, provided that among regressors and regressand
there is a linear relationship, i.e. the model is linear in the parameters. The additivity of
the disturbance guarantees the linear relationship with the rest of the elements.

b) Assumptions on the regressor x


2) The values of x are fixed in repeated sampling:
According to this assumption, each observation of the regressor takes the same
value for different samples of the regressand. This is a strong assumption in the case of
the social sciences, where in general it is not possible to experiment. Data are obtained
by observation, not by experimentation. It is important to remark that the results obtained
using this assumption would remain virtually identical if we assume the regressors are
stochastic, provided the additional assumption of independence between the regressors
and the random disturbance is fulfilled. This alternative assumption can be formulated as:
2*) The regressor x is distributed independently of the random disturbance.
In any case, throughout this chapter and the following ones we will adopt
assumption 2.
3) The regressor x does not contain measurement errors
This is an assumption that is not often fulfilled in practice, since the measurement
instruments used in economics are often unreliable. Think, for example, of the multitude of
errors that can be made when collecting information through household surveys.


4) The sample variance of x is different from 0 and has a finite limit as n tends to
infinity
Therefore, this assumption implies that
$$S_X^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n} \neq 0 \tag{2-59}$$

c) Assumptions on the parameters


5) The parameters β 1 and β 2 are fixed
If this assumption is not adopted, the regression model would be very difficult to
handle. In any case, it may be acceptable to postulate that the model parameters are stable
over time (if it is not a very long period) or space (if it is relatively limited).

d) Assumptions on the random disturbances


6) The disturbances have zero mean,
$$E(u_i) = 0, \qquad i = 1, 2, 3, \ldots, n \tag{2-60}$$

This is not a restrictive assumption, since we can always use β1 to normalize E(u)
to 0. Let us suppose, for example, that E (u ) = 4 . We could then redefine the model in the
following way:
y = ( β1 + 4) + β 2 x + v
where v= u − 4 . Therefore, the expectation of the new disturbance, v, is 0 and the
expectation of u has been absorbed by the intercept.
7) The disturbances have a constant variance
$$\operatorname{var}(u_i) = \sigma^2, \qquad i = 1, 2, \ldots, n \tag{2-61}$$
This assumption is called the homoskedasticity assumption. The word comes from
the Greek: homo (equal) and skedasticity (spread). This means that the variation of y
around the regression line is the same across the x values; that is to say, it neither increases
nor decreases as x varies. This can be seen in figure 2.7, part a), where disturbances are
homoskedastic.
[Figure: two panels, a) and b), each showing the density F(u) of y around its mean μy at the values x1, x2, …, xi of x.]
FIGURE 2.7. Random disturbances: a) homoskedastic; b) heteroskedastic.


If this assumption is not satisfied, as happens in part b) of figure 2.7, the OLS
regression coefficients are not efficient. Disturbances in this case are heteroskedastic
(hetero means different).
8) The disturbances with different subscripts are not correlated with each other
(no autocorrelation assumption):
$$E(u_i u_j) = 0, \qquad i \neq j \tag{2-62}$$
That is, the disturbances corresponding to different individuals or different periods
of time are not correlated with each other. This assumption of no autocorrelation or no
serial correlation, like the previous one, is testable a posteriori. It is violated
quite frequently in models using time series data.
9) The disturbance u is normally distributed
Taking into account assumptions 6, 7 and 8, we have
$$u_i \sim NID(0, \sigma^2), \qquad i = 1, 2, \ldots, n \tag{2-63}$$
where NID stands for normally independently distributed.
The reason for this assumption is that if u is normally distributed, so will y and the
estimated regression coefficients, and this will be useful in performing tests of hypotheses
and constructing confidence intervals for β 1 and β 2 . The justification for the assumption
depends on the Central Limit Theorem. In essence, this theorem states that, if a random
variable is the composite result of the effects of an indefinite number of variables, it will
have an approximately normal distribution even if its components do not, provided that
none of them is dominant.
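
The remark made under assumption 6, that a nonzero E(u) is absorbed by the intercept, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population values β1 = 1 and β2 = 2, the grid of x values, the seed and the helper `ols` are all hypothetical choices, and the OLS formulas are the usual ones for simple regression.

```python
import random


def ols(x, y):
    """Return the OLS intercept and slope of a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


random.seed(0)
beta1, beta2 = 1.0, 2.0                   # hypothetical population parameters
x = [i / 10 for i in range(1, 501)]       # hypothetical fixed regressor values
u = [random.gauss(4.0, 1.0) for _ in x]   # disturbances with E(u) = 4, not 0
y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]

b1_hat, b2_hat = ols(x, y)
print(f"intercept estimate: {b1_hat:.3f} (beta1 + 4 = {beta1 + 4.0})")
print(f"slope estimate:     {b2_hat:.3f} (beta2 = {beta2})")
```

The slope estimate stays close to β2, while the intercept estimate is close to β1 + 4, which is the intercept of the redefined model.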

2.5.2 Desirable properties of the estimators


Before examining the properties of OLS estimators under the statistical
assumptions of the CLM, we pose the following question: what are the desirable
properties for an estimator?
Two desirable properties for an estimator are that it is unbiased and its variance is
as small as possible. If this occurs, the inference process will be carried out in optimal
conditions.
We will illustrate these properties graphically. Consider first the property of
unbiasedness. In Figures 2.8 and 2.9 the density functions of two hypothetical estimators
obtained by two different methods are shown.


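[Figures: density f(β̂2), centered at β2 = E(β̂2), with the estimates β̂2(1) and β̂2(2); density f(β̃2), centered at E(β̃2) rather than at β2, with the estimates β̃2(1) and β̃2(2).]
FIGURE 2.8. Unbiased estimator. FIGURE 2.9. Biased estimator.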

The estimator β̂2 is unbiased, i.e., its expected value is equal to the parameter that
is estimated, β2. The estimator β̂2 is a random variable. In each sample of y’s (the x’s
are fixed in repeated sampling according to assumption 2) β̂2 takes a different value, but
on average it is equal to the parameter β2, bearing in mind the infinite number of values β̂2
can take. In each sample of y’s a specific value of β̂2, that is to say, an estimate of β2,
is obtained. In figure 2.8 two estimates of β2 (β̂2(1) and β̂2(2)) are shown. The first
estimate is relatively close to β2, while the second one is much farther away. In any case,
unbiasedness is a desirable property because it ensures that, on average, the estimator is
centered on the parameter value.
The estimator β̃2 is biased, since its expectation is not equal to β2. The bias is
precisely E(β̃2) − β2. In this case two hypothetical estimates, β̃2(1) and β̃2(2), are
represented in figure 2.9. As can be seen, β̃2(1) is closer to β2 than the unbiased estimate
β̂2(1), but this is a matter of chance. In any case, when an estimator is biased, it is not
centered on the parameter value. An unbiased estimator will always be preferable, regardless
of what happens in a specific sample, because it has no systematic deviation from the
parameter value.
Another desirable property is efficiency. This property refers to the variance of
the estimators. In figures 2.10 and 2.11 two hypothetical unbiased estimators, which are
also called β̂2 and β̃2, are represented. The first one has a smaller variance than the
second one.


[Figures: density f(β̂2) around β2, with the estimates β̂2(3) and β̂2(4); density f(β̃2) around β2, with the estimates β̃2(4) and β̃2(3).]
FIGURE 2.10. Estimator with small variance. FIGURE 2.11. Estimator with big variance.

In both figures we have represented two estimates: β̂2(3) and β̂2(4) for the
estimator with the smaller variance; and β̃2(3) and β̃2(4) for the estimator with the
larger variance. To highlight the role played by chance, the estimate that is closer to β2
is precisely β̃2(3). In any case, it is preferable that the variance of the estimator is as small
as possible. For example, when using the estimator β̂2 it is practically impossible for an
estimate to be as far from β2 as can happen with β̃2, because the range of β̂2 is much
smaller than the range of β̃2.

2.5.3 Statistical properties of OLS estimators


Under the above assumptions, the OLS estimators possess some ideal properties.
Thus, we can say that the OLS estimators are the best linear unbiased estimators.

Linearity and unbiasedness of the OLS


The OLS estimator β̂2 is unbiased. In appendix 2.4 we prove that β̂2 is an
unbiased estimator using implicitly assumptions 3, 4 and 5, and explicitly assumptions 1,
2 and 6. In that appendix we can also see that β̂2 is a linear estimator using assumptions
1 and 2.

Similarly, one can show that the OLS estimator β̂1 is also unbiased. Remember that
unbiasedness is a general property of the estimator, but in a given sample we may be
“near” or “far” from the true parameter. In any case, its distribution will be centered at
the population parameter.
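
Unbiasedness can be illustrated with a small Monte Carlo experiment: the regressor values are held fixed, many samples of y are generated, and the slope is estimated in each of them. The sketch below is illustrative only; the population values (β1 = 3, β2 = 2, σ = 1), the x values, the number of replications, the seed and the helper `ols` are hypothetical choices.

```python
import random


def ols(x, y):
    """Return the OLS intercept and slope of a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


random.seed(12345)
beta1, beta2, sigma = 3.0, 2.0, 1.0        # hypothetical population values
x = [float(i) for i in range(1, 11)]       # fixed in repeated sampling (assumption 2)

slopes = []
for _ in range(5000):
    # New disturbances in every replication; the x values never change.
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    slopes.append(ols(x, y)[1])

mean_slope = sum(slopes) / len(slopes)
print(f"smallest estimate: {min(slopes):.3f}")
print(f"largest estimate:  {max(slopes):.3f}")
print(f"mean of estimates: {mean_slope:.3f}")
```

Individual estimates fall on both sides of β2, some of them far away, but their average is very close to it.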

Variances of the OLS estimators


Now we know that the sampling distribution of our estimator is centered around
the true parameter. How spread out is this distribution? The variance (which is a measure
of dispersion) of an estimator is an indicator of the accuracy of the estimator.

In order to obtain the variances of β̂1 and β̂ 2 , assumptions 7 and 8 are needed,
in addition to the first six assumptions. These variances are the following:


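$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2\, n^{-1}\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \qquad \operatorname{Var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \tag{2-64}$$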

Appendix 2.5 shows how the variance for β̂ 2 is obtained.
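
The formulas in (2-64) can be evaluated directly for a given design. The following sketch uses two hypothetical sets of x values and σ² = 1 to show that more dispersed regressor values reduce both variances; the function name `ols_variances` is an illustrative choice.

```python
def ols_variances(x, sigma2):
    """Variances of the OLS intercept and slope according to (2-64)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    var_b1 = sigma2 * (sum(xi ** 2 for xi in x) / n) / sxx
    var_b2 = sigma2 / sxx
    return var_b1, var_b2


# Hypothetical designs with the same number of observations and sigma^2 = 1.
narrow = ols_variances([1.0, 2.0, 3.0], 1.0)
wide = ols_variances([0.0, 2.0, 4.0], 1.0)
print(f"x = 1, 2, 3: Var(b1) = {narrow[0]:.4f}, Var(b2) = {narrow[1]:.4f}")
print(f"x = 0, 2, 4: Var(b1) = {wide[0]:.4f}, Var(b2) = {wide[1]:.4f}")
```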


OLS estimators are BLUE
The OLS estimators have the least variance in the class of all linear and unbiased
estimators. For this reason it is said that OLS estimators are the best linear unbiased
estimators (BLUE), as illustrated in figure 2.12. This property is known as the Gauss–
Markov theorem. For proof of this theorem assumptions 1-8 are used, as can be seen in
appendix 2.6. This set of assumptions is known as the Gauss–Markov assumptions.

[Figure: within the class of linear and unbiased estimators, β̂1 and β̂2 are the best (BLUE).]
FIGURE 2.12. The OLS estimator is BLUE.

Estimating the disturbance variance and the variance of estimators


We do not know what the value of the disturbance variance, σ2, is and thus we
have to estimate it. But we cannot estimate it from the disturbances ui, because they are
not observable. Instead, we have to use the OLS residuals (ûi).

The relation between disturbances and residuals is given by


$$\hat{u}_i = y_i - \hat{y}_i = \beta_1 + \beta_2 x_i + u_i - \hat{\beta}_1 - \hat{\beta}_2 x_i = u_i - (\hat{\beta}_1 - \beta_1) - (\hat{\beta}_2 - \beta_2)x_i \tag{2-65}$$

Hence ûi is not the same as ui, although the difference between them,
ui − ûi = (β̂1 − β1) + (β̂2 − β2)xi, does have an expected value of zero. Therefore, a first
estimator of σ2 could be the residual variance:
$$\tilde{\sigma}^2 = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{n} \tag{2-66}$$
However, this estimator is biased, essentially because it does not account for the
two following restrictions that must be satisfied by the OLS residuals in the simple
regression model:

$$\sum_{i=1}^{n}\hat{u}_i = 0 \qquad \sum_{i=1}^{n} x_i\hat{u}_i = 0 \tag{2-67}$$
One way to view these restrictions is the following: if we know n–2 of the
residuals, we can get the other two residuals by using the restrictions implied by the
normal equations.
Thus, there are only n–2 degrees of freedom in the OLS residuals, as opposed to
n degrees of freedom in the disturbances. In the unbiased estimator of σ2 shown below an
adjustment is made taking into account the degrees of freedom:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{n-2} \tag{2-68}$$
Under assumptions 1-8 (Gauss-Markov assumptions), and as can be seen in
appendix 2.7, we obtain
E (σˆ 2 ) = σ 2 (2-69)
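
The role of the degrees-of-freedom adjustment can be checked by simulation. The sketch below averages the two estimators of σ², the sum of squared residuals divided by n and divided by n − 2, over many samples. The design (n = 5, σ² = 1, β1 = 3, β2 = 2), the seed and the function name are hypothetical choices made only for illustration.

```python
import random


def residual_sum_of_squares(x, y):
    """Sum of squared OLS residuals from a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


random.seed(2024)
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical fixed regressor values
n = len(x)
sigma2 = 1.0                     # true disturbance variance
reps = 20000

sum_divided_by_n = 0.0
sum_divided_by_n_minus_2 = 0.0
for _ in range(reps):
    y = [3.0 + 2.0 * xi + random.gauss(0.0, sigma2 ** 0.5) for xi in x]
    rss = residual_sum_of_squares(x, y)
    sum_divided_by_n += rss / n
    sum_divided_by_n_minus_2 += rss / (n - 2)

mean_tilde = sum_divided_by_n / reps          # average of RSS / n
mean_hat = sum_divided_by_n_minus_2 / reps    # average of RSS / (n - 2)
print(f"average of RSS/n:     {mean_tilde:.3f}")
print(f"average of RSS/(n-2): {mean_hat:.3f}")
```

With n = 5 the first average settles near (n − 2)/n = 0.6 times σ², while the second one settles near σ² itself.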

If σ̂² is plugged into the variance formulas, we then have unbiased estimators of
var(β̂1) and var(β̂2).

The natural estimator of σ is σ̂ = √(σ̂²) and is called the standard error of the
regression. The square root of the variance of β̂2 is called the standard deviation of β̂2,
that is to say,
$$sd(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \tag{2-70}$$

Therefore, its natural estimator is called the standard error of β̂2:

$$se(\hat{\beta}_2) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \tag{2-71}$$

Note that se( βˆ2 ) , due to the presence of the estimator σˆ in (2-71), is a random
variable as is β̂ 2 . The standard error of any estimate gives us an idea of how precise the
estimator is.
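
As a worked illustration of (2-67), (2-68) and (2-71), the following sketch computes the residuals, the standard error of the regression and se(β̂2) for a small data set. The five observations are hypothetical and were made up for this example only.

```python
import math

# Hypothetical sample of five observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b2 = sxy / sxx                # OLS slope
b1 = y_bar - b2 * x_bar       # OLS intercept
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# The two restrictions in (2-67) hold up to rounding error.
sum_resid = sum(resid)
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid))

sigma2_hat = sum(ui ** 2 for ui in resid) / (n - 2)   # equation (2-68)
sigma_hat = math.sqrt(sigma2_hat)                     # standard error of the regression
se_b2 = sigma_hat / math.sqrt(sxx)                    # equation (2-71)

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"sum of residuals = {sum_resid:.2e}, sum of x*residuals = {sum_x_resid:.2e}")
print(f"sigma_hat = {sigma_hat:.4f}, se(b2) = {se_b2:.4f}")
```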

Consistency of OLS and other asymptotic properties


Sometimes it is not possible to obtain an unbiased estimator. In any case,
consistency is a minimum requirement for an estimator. According to an intuitive
approach, consistency means that as n → ∞, the density function of the estimator
collapses to the parameter value. This property can be expressed for the estimator β̂2 as:

$$\underset{n\to\infty}{\operatorname{plim}}\ \hat{\beta}_2 = \beta_2 \tag{2-72}$$

where plim means probability limit. In other words, β̂2 converges in probability to β2.


Note that the properties of unbiasedness and consistency are conceptually
different. The property of unbiasedness can hold for any sample size, whereas consistency
is strictly a large-sample property or an asymptotic property.

Under assumptions 1 through 6, the OLS estimators, β̂1 and β̂ 2 , are consistent.
The proof for β̂ 2 can be seen in appendix 2.8.
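
The intuitive idea of consistency, a sampling distribution that collapses on the parameter as n grows, can be seen in a simulation. In this sketch the x values repeat the cycle 1, 2, …, 10 so that the sample variance of x stays constant as n increases (in line with assumption 4). The population values, the seed, the sample sizes and the number of replications are hypothetical choices.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope of a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


random.seed(7)
beta1, beta2 = 3.0, 2.0      # hypothetical population parameters
spread = {}
for n in (10, 100, 1000):
    x = [float(i % 10 + 1) for i in range(n)]     # cycle 1..10 repeated
    slopes = []
    for _ in range(300):
        y = [beta1 + beta2 * xi + random.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    # Dispersion of the 300 slope estimates obtained with this sample size.
    spread[n] = statistics.pstdev(slopes)
    print(f"n = {n:4d}: standard deviation of the estimates = {spread[n]:.4f}")
```

The dispersion of the estimates around β2 shrinks steadily as the sample size increases.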

Other asymptotic properties of β̂1 and β̂2: Under the Gauss-Markov assumptions
1 through 8, β̂1 and β̂2 are asymptotically normally distributed and also asymptotically
efficient within the class of consistent and asymptotically normal estimators.

OLS estimators are maximum likelihood estimators (ML) and minimum variance
unbiased estimators (MVUE)
Now we are going to introduce the assumption 9 on normality of the disturbance
u. The set of assumptions 1 through 9 is known as the classical linear model (CLM)
assumptions.
Under the CLM assumptions, the OLS estimators are also maximum likelihood
estimators (ML), as can be seen in appendix 2.8.
On the other hand, under CLM assumptions, OLS estimators are not only BLUE,
but are the minimum variance unbiased estimators (MVUE). This means that OLS
estimators have the smallest variance among all unbiased estimators, linear or nonlinear,
as can be seen in figure 2.13. Therefore, we no longer have to restrict our comparison to
estimators that are linear in the yi’s.

In addition, any linear combination of β̂1, β̂2, β̂3, …, β̂k is also
normally distributed, and any subset of the β̂j’s has a joint normal distribution.

[Figure: within the class of unbiased estimators, β̂1 and β̂2 have minimum variance (MVUE).]
FIGURE 2.13. The OLS estimator is the MVUE.

In conclusion, we have seen that the OLS estimator has very desirable properties
when the statistical basic assumptions are met.


Exercises
Exercise 2.1 The following model has been formulated to explain the annual sales (sales)
of manufacturers of household cleaning products as a function of a relative price
index (rpi):
$$\mathit{sales} = \beta_1 + \beta_2\,\mathit{rpi} + u$$
where the variable sales is expressed in thousand million euros and rpi is an
index obtained as the ratio between the prices of each firm and the prices of firm 1 of the
sample. Thus, the value 110 in firm 2 indicates that its price is 10% higher than in firm 1.

Data on ten manufacturers of household cleaning products are the following:


firm sales rpi
1 10 100
2 8 110
3 7 130
4 6 100
5 13 80
6 6 80
7 12 90
8 7 120
9 9 120
10 15 90
a) Estimate β 1 and β 2 by OLS.
b) Calculate the RSS.
c) Calculate the coefficient of determination.
d) Check that the algebraic implications 1, 3 and 4 are fulfilled in the OLS
estimation.
Exercise 2.2 To study the relationship between fuel consumption (y) and flight time (x)
of an airline, the following model is formulated:
y =β1 + β 2 x + u
where y is expressed in thousands of pounds and x in hours, with fractions of an hour
expressed as decimals.
The statistics of "Flight times and fuel consumption" of an airline provides data
on flight times and fuel consumption of 24 different trips made by an aircraft of the
company. From these data the following statistics were drawn:
$$\sum y_i = 219.719; \quad \sum x_i = 31.470; \quad \sum x_i^2 = 51.075; \quad \sum x_i y_i = 349.486; \quad \sum y_i^2 = 2396.504$$
a) Estimate β 1 and β 2 by OLS.
b) Decompose the variance of the variable y into the variance explained by the
regression and the residual variance.
c) Calculate the coefficient of determination.
d) Estimate total consumption, in thousands of pounds, for a flight program
consisting of 100 half-hour flights, 200 one-hour flights and 100 two-hour
flights.


Exercise 2.3 An analyst formulates the following model:


y =β1 + β 2 x + u
Using a given sample, the following results were obtained:
$$\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n} = 20 \qquad \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n} = 10 \qquad \bar{y} = 8 \qquad \bar{x} = 4 \qquad \hat{\beta}_2 = 3$$

Are these results consistent? Explain your answer.


Exercise 2.4 An econometrician has estimated the following model with a sample of five
observations:
$$y_i = \beta_1 + \beta_2 x_i + u_i$$
Once the estimation has been made, the econometrician loses all information
except what appears in the following table:
Obs.   xi   ûi
1      1    2
2      3    -3
3      4    0
4      5    ?
5      6    ?
With the above information the econometrician must calculate the residual
variance. Do it for them.
Exercise 2.5 Given the model
$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad i = 1, 2, \ldots, n$$
the following results with a sample size of 11 were obtained:
$$\sum_{i=1}^{n} x_i = 0 \qquad \sum_{i=1}^{n} y_i = 0 \qquad \sum_{i=1}^{n} x_i^2 = B \qquad \sum_{i=1}^{n} y_i^2 = E \qquad \sum_{i=1}^{n} x_i y_i = F$$

a) Estimate β2 and β1.
b) Calculate the sum of squared residuals.
c) Calculate the coefficient of determination.
d) Calculate the coefficient of determination under the assumption that
2F² = BE.
Exercise 2.6 Company A is dedicated to mounting prefabricated panels for industrial
buildings. So far, the company has completed eight orders, in which the number of square
meters of panels and working hours employed in the assembly are as follows:
Number of square meters (thousands)   Number of hours
4                                     7400
6                                     9800
2                                     4600
8                                     12200
10                                    14000
5                                     8200
3                                     5800
12                                    17000
Company A wishes to participate in a tender to mount 14,000 m² of panels in a
warehouse, for which a budget is required.
In order to prepare the budget, we know the following:
a) The budget must relate exclusively to the assembly costs, since the
material is already provided.
b) The cost of the working hour for Company A is 30 euros.
c) To cover the remaining costs, Company A must charge 20% on the total
cost of labor employed in the assembly.
Company A is interested in participating in the tender with a budget that only
covers the costs. Under these conditions, and under the assumption that the number of
hours worked is a linear function of the number of square meters of panels mounted, what
would be the budget provided by company A?
Exercise 2.7 Consider the following equalities:
1. E[u] = 0.
2. E[û] = 0.
3. u = 0.
4. û = 0.
In the context of the basic linear model, indicate whether each of the above
equalities is true or not. Justify your answer.
Exercise 2.8 The parameters β 1 and β 2 of the following model have been estimated by
OLS:
y =β1 + β 2 x + u
A sample of size 3 was used and the observations for x i were {1,2,3}. It is also
known that the residual for the first observation was 0.5.
From the above information, is it possible to calculate the sum of squared residuals
and obtain an estimate of σ2? If so, carry out the corresponding calculations.
Exercise 2.9 The following data are available to estimate a relationship between y and x:
y x
-2 -2
-1 0
0 1
1 0
2 1
a) Estimate the parameters α and β of the following model by OLS:
y =α + β x + ε
b) Estimate var(ε i ).
c) Estimate the parameters γ and δ of the following model by OLS:
x =γ + δ y + υ
d) Are the two fitted regression lines the same? Explain the result in terms
of the least-square method.


Exercise 2.10 Answer the following questions:


a) One researcher, after performing the estimation of a model by OLS,
calculates ∑ uˆi and verifies that it is not equal to 0. Is this possible? Are
there any conditions in which this may occur?
b) Obtain an unbiased estimator of σ2, indicating the assumption you have
to use. Explain your answer.
Exercise 2.11 In the context of a linear regression model
y =β1 + β 2 x + u
a) Indicate whether the following equalities are true. If so explain why
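$$\frac{\sum_{i=1}^{n} u_i}{n} = \bar{u} = 0; \qquad \frac{\sum_{i=1}^{n} \hat{u}_i}{n} = \bar{\hat{u}} = 0; \qquad E[x_i u_i] = 0; \qquad E[u_i] = 0$$
b) Establish the relationship between the following expressions:
$$E[u_i^2] = \sigma^2; \qquad \hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-k}$$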
Exercise 2.12 Answer the following questions:
a) Define the probabilistic properties of OLS estimator under the basic
assumptions of the linear regression model. Explain your answer.
b) What happens with the estimation of the linear regression model if the
sample variance of the explanatory variable is null? Explain your answer.
Exercise 2.13 A researcher believes that the relationship between consumption (cons)
and disposable income (inc) should be strictly proportional, and, therefore formulates the
following model:
cons=β 2 inc+u
a) Derive the formula for estimating β 2.
b) Derive the formula for estimating σ2 .
c) In this model, is $\sum_{i=1}^{n}\hat{u}_i$ equal to 0?

Exercise 2.14 In the context of the simple linear regression model


y =β1 + β 2 x + u
a) What assumptions must be met for the OLS estimators to be unbiased?
b) What assumptions are required for the estimators with variances which are
the lowest within the set of linear unbiased estimators?
Exercise 2.15 In statistics it is usual to make statements like the following:
"Let x1, x2, …, xn be a random sample of size n drawn from a population N(α,σ)"
a) Express the previous statement with econometric language by introducing
a disturbance term.
b) Derive the formula for estimating α.
c) Derive the formula for estimating σ2.

d) In this model, is $\sum_{i=1}^{n}\hat{u}_i$ equal to 0?

Exercise 2.16 The following model relates expenditure on education (exped) and
disposable income (inc):
exped=β 1 +β 2 inc+u
Using the information obtained from a sample of 10 families, the following results
have been obtained:
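$$\overline{\mathit{exped}} = 7 \qquad \overline{\mathit{inc}} = 50 \qquad \sum_{i=1}^{10} \mathit{inc}_i^2 = 30{,}650 \qquad \sum_{i=1}^{10} \mathit{exped}_i^2 = 622 \qquad \sum_{i=1}^{10} \mathit{inc}_i \times \mathit{exped}_i = 4{,}345$$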

a) Estimate β 1 and β 2 by OLS.


b) Estimate the expenditure on education/ income elasticity for the sample
average family.
c) Decompose the variance of the endogenous variable into the variance explained
by the regression and the residual variance.
d) Calculate the coefficient of determination.
e) Estimate the variance of the disturbances.
Exercise 2.17 Given the population model
$$y_i = 3 + 2x_i + u_i \qquad i = 1, 2, 3$$
where xi = {1,2,3}:
a) Using N(0,1) random numbers, generate 15 samples of u1, u2 and u3, and
obtain the corresponding values of y.
b) Carry out the corresponding estimates of β1 and β2 in the model:
$$y = \beta_1 + \beta_2 x + u$$
c) Compare the sample means and variances of β̂1 and β̂2 with their population
expectations and variances.
Exercise 2.18 Based on the information supplied in exercise 2.17, and the 15 pairs of
estimates of β 1 and β 2 obtained:
a) Calculate the residuals corresponding to each of the estimates.
b) Show why the residuals always take the form
$$\hat{u}_1 = \hat{u}_3 \qquad \hat{u}_2 = -2\hat{u}_1$$

Exercise 2.19 The following model was formulated to explain sleeping time (sleep) as a
function of time devoted to paid work (paidwork):
$$\mathit{sleep} = \beta_1 + \beta_2\,\mathit{paidwork} + u$$
where sleep and paidwork are measured in minutes per day.
Using a random subsample extracted from the file timuse03, the following results
were obtained:
$$\widehat{\mathit{sleep}}_i = 550.17 - 0.1783\,\mathit{paidwork}_i$$
R² = 0.2539   n = 62


a) Interpret the coefficient on paidwork.


b) What is, on average, the predicted increase in sleep if time devoted to
paid work decreases by one hour per day?
c) How much of the variation in sleep is explained by paidwork?
Exercise 2.20 Quantifying happiness is not an easy task. Researchers at the Gallup World
Poll went about it by surveying thousands of respondents in 155 countries, between 2006
and 2009, in order to measure two types of well-being. They asked respondents to report
on the overall satisfaction with their lives, and ranked their answers using a "life
evaluation" score from 1 to 10. To explain the overall satisfaction (stsfglo) the following
model has been formulated, where observations are averages of the variables in each
country:

$$\mathit{stsfglo} = \beta_1 + \beta_2\,\mathit{lifexpec} + u$$
where lifexpec is life expectancy at birth: that is to say, the number of years a newborn infant
is expected to live.
Using the work file HDR2010, the fitted model obtained is the following:
$$\widehat{\mathit{stsfglo}} = -1.499 + 0.1062\,\mathit{lifexpec}$$
R² = 0.6135   n = 144
a) Interpret the coefficient on lifexpec.
b) What would be the average overall satisfaction for a country with 80 years
of life expectancy at birth?
c) What should be the life expectancy at birth to obtain a global satisfaction
equal to six?
Exercise 2.21 In economics, Research and Development intensity (or simply R&D
intensity) is the ratio of a company's investment in Research and Development compared
to its sales.
For the estimation of a model which explains R&D intensity, it is necessary to
have an appropriate database. In Spain it is possible to use the Survey of Entrepreneurial
Strategies (Encuesta sobre Estrategias Empresariales) produced by the Ministry of
Industry. This survey, on an annual basis, provides in-depth knowledge of the industrial
sector's evolution over time by means of multiple data concerning business development
and company decisions. This survey is also designed to generate microeconomic
information that enables econometric models to be specified and tested. As far as its
coverage is concerned, the reference population of this survey is companies with 10 or
more workers from the manufacturing industry. The geographical area of reference is
Spain, and the variables have a timescale of one year. One of the most outstanding
characteristics of this survey is its high degree of representativeness.
Using the work file rdspain, which is a dataset consisting of 1,983 Spanish firms
for 2006, the following equation is estimated to explain expenditures on research and
development (rdintens):
$$\widehat{\mathit{rdintens}} = -2.639 + 0.2123\,\ln(\mathit{sales})$$
R² = 0.0350   n = 1983
where rdintens is expressed as a percentage of sales, and sales are measured in millions
of euros.


a) Interpret the coefficient on ln(sales).


b) If sales increase by 50%, what is the estimated percentage point change in
rdintens?
c) What percentage of the variation of rdintens is explained by sales? Is it
large? Justify your answer.
Exercise 2.22 The following model has been formulated to explain the salary of MBA
graduates (salMBAgr) as a function of tuition fees (tuition):
$$\mathit{salMBAgr} = \beta_1 + \beta_2\,\mathit{tuition} + u$$
where salMBAgr is the median annual salary in dollars for students enrolled in 2010 at
the 50 best American business schools and tuition is tuition fees including all required
fees for the entire program (but excluding living expenses).
Using the data in MBAtui10, this model is estimated:
$$\widehat{\mathit{salMBAgr}}_i = 54242 + 0.4313\,\mathit{tuition}_i$$
R² = 0.4275   n = 50
a) What is the interpretation of the intercept?
b) What is the interpretation of the slope coefficient?
c) What is the predicted value of salMBAgr for a graduate student who paid
$110,000 in tuition fees for a 2-year MBA?
Exercise 2.23 Using a subsample of the Structural Survey of Wages (Encuesta de
estructura salarial) for Spain in 2006 (wage06sp), the following model is estimated to
explain wages:
$$\widehat{\ln(\mathit{wage})} = 1.918 + 0.0527\,\mathit{educ}$$
R² = 0.2445   n = 50
where educ (education) is measured in years and wage in euros per hour.
a) What is the interpretation of the coefficient on educ?
b) How many more years of education are required to have a 10% higher
wage?
c) Knowing that the sample mean of educ is 10.2, calculate the wage/education elasticity.
Do you consider this elasticity to be high or low?
Exercise 2.24 Using data from the Spanish economy for the period 1954-2010 (work file
consump), the Keynesian consumption function is estimated:
$$\widehat{\mathit{conspc}}_t = -288 + 0.9416\,\mathit{incpc}_t$$
R² = 0.994   n = 57
where consumption (conspc) and disposable income (incpc) are expressed in
constant euros per capita, taking 2008 as reference year.
a) What is the interpretation of the intercept? Comment on the sign and
magnitude of the intercept.
b) Interpret the coefficient on incpc. What is the economic meaning of this
coefficient?

c) Compare the marginal propensity to consume with the average propensity
to consume at the sample mean (mean of conspc = 8084, mean of incpc = 8896). Comment
on the result obtained.
d) Calculate the consumption/income elasticity for the sample mean.

Annex 2.1 Case study: Engel curve for demand of dairy products
The Engel curve shows the relationship between the various quantities of a good
that a consumer is willing to purchase at varying income levels.
In a survey with 40 households, data were obtained on expenditure on dairy
products and income. These data appear in table 2.6 and in work file demand. In order to
avoid distortions due to the different size of households, both consumption and income
have been expressed in per capita terms. The data are expressed in euros
per month.
There are several demand models. We will consider the following models: linear,
inverse, semi-logarithmic, potential, exponential and inverse exponential. In the first three
models, the regressand of the equation is the endogenous variable, whereas in the last
three the regressand is the natural logarithm of the endogenous variable.
In all the models we will calculate the marginal propensity to expenditure, as well
as the expenditure/income elasticity.

TABLE 2.6. Expenditure on dairy products (dairy) and disposable income (inc), in per capita
terms. Unit: euros per month.
household   dairy    inc      household   dairy    inc
1           8.87     1250     21          16.20    2100
2           6.59     985      22          10.39    1470
3           11.46    2175     23          13.50    1225
4           15.07    1025     24          8.50     1380
5           15.60    1690     25          19.77    2450
6           6.71     670      26          9.69     910
7           10.02    1600     27          7.90     690
8           7.41     940      28          10.15    1450
9           11.52    1730     29          13.82    2275
10          7.47     640      30          13.74    1620
11          6.73     860      31          4.91     740
12          8.05     960      32          20.99    1125
13          11.03    1575     33          20.06    1335
14          10.11    1230     34          18.93    2875
15          18.65    2190     35          13.19    1680
16          10.30    1580     36          5.86     870
17          15.30    2300     37          7.43     1620
18          13.75    1720     38          7.15     960
19          11.49    850      39          9.10     1125
20          6.69     780      40          15.31    1875

Linear model
The linear model for demand of dairy products will be the following:
$$\mathit{dairy} = \beta_1 + \beta_2\,\mathit{inc} + u \tag{2-73}$$


The marginal propensity indicates the change in expenditure as income varies and
it is obtained by differentiating the expenditure with respect to income in the demand
equation. In the linear model the marginal propensity of the expenditure on dairy is given
by
$$\frac{d\,\mathit{dairy}}{d\,\mathit{inc}} = \beta_2 \tag{2-74}$$
In other words, in the linear model the marginal propensity is constant and,
therefore, it is independent of the level of income. This makes the model poorly suited
to describing consumer behavior, especially when there are important differences in
household income. Thus, it is unreasonable that the marginal
propensity of expenditure on dairy products is the same in a low-income family and a
family with a high income. However, if the variation of income is not very high in the
sample, a linear model can be used to describe the demand for certain goods.
In this model the expenditure/income elasticity is the following:
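$$\varepsilon^{linear}_{dairy/inc} = \frac{d\,\mathit{dairy}}{d\,\mathit{inc}}\,\frac{\mathit{inc}}{\mathit{dairy}} = \beta_2\,\frac{\mathit{inc}}{\mathit{dairy}} \tag{2-75}$$
Estimating the model (2-73) with the data from table 2.6, we obtain
$$\widehat{\mathit{dairy}} = 4.012 + 0.005288 \times \mathit{inc} \qquad R^2 = 0.4584 \tag{2-76}$$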
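
As an illustration of (2-74) and (2-75), the following sketch evaluates the marginal propensity and the elasticity of the fitted linear model (2-76) at two income levels. The income levels of 1000 and 2000 euros per month are hypothetical, the function names are illustrative, and the elasticity is evaluated at the fitted value of dairy, which is an assumption of this example.

```python
# Estimated coefficients of the linear model, taken from equation (2-76).
B1_HAT = 4.012
B2_HAT = 0.005288


def fitted_dairy(inc: float) -> float:
    """Fitted expenditure on dairy products for a given income."""
    return B1_HAT + B2_HAT * inc


def marginal_propensity(inc: float) -> float:
    """Marginal propensity (2-74): constant, whatever the income level."""
    return B2_HAT


def elasticity(inc: float) -> float:
    """Expenditure/income elasticity (2-75), evaluated at the fitted expenditure."""
    return B2_HAT * inc / fitted_dairy(inc)


for inc in (1000.0, 2000.0):   # hypothetical income levels, euros per month
    print(f"inc = {inc:.0f}: marginal propensity = {marginal_propensity(inc):.6f}, "
          f"elasticity = {elasticity(inc):.4f}")
```

The marginal propensity is the same at both income levels, whereas the elasticity rises with income because the fitted intercept is positive.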
Inverse model
In an inverse model there is a linear relationship between the expenditure and the
inverse of income. Therefore, this model is directly linear in the parameters and it is
expressed in the following way:
$$\mathit{dairy} = \beta_1 + \beta_2\,\frac{1}{\mathit{inc}} + u \tag{2-77}$$
The sign of the coefficient β2 will be negative if income is positively correlated
with expenditure. It is easy to see that, when income tends towards
infinity, expenditure tends towards a limit which is equal to β1. In other words, β1
represents the maximum consumption of this good.
In figure 2.14, we can see a double representation of the population function
corresponding to this model. In the first one, the relationship between the dependent
variable and explanatory variable has been represented. In the second one, the relationship
between the regressand and regressor has been represented. The second function is linear
as can be seen in the figure.

[Figure: two panels; dairy against inc, approaching the asymptote β1, and dairy against 1/inc, with the line E(dairy) = β1 + β2 1/inc.]
Figure 2.14. The inverse model.

In the inverse model, the marginal propensity to expenditure is given by


$$\frac{d\,dairy}{d\,inc} = -\beta_2\,\frac{1}{(inc)^2} \tag{2-78}$$
According to (2-78), the marginal propensity is inversely proportional to the
square of the income level.
On the other hand, the elasticity is inversely proportional to the product of
expenditure and income, as can be seen in the following expression:
$$\varepsilon^{inv}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\,\frac{inc}{dairy} = -\beta_2\,\frac{1}{inc \times dairy} \tag{2-79}$$
Estimating the model (2-77) with the data from table 2.6, we obtain
$$\widehat{dairy} = 18.652 - 8702\,\frac{1}{inc} \qquad R^2 = 0.4281 \tag{2-80}$$
In this case the coefficient β̂ 2 does not have an economic meaning.

Linear-log model
This model is denominated linear-log model, because the expenditure is a linear
function of the logarithm of income, that is to say,
$$dairy = \beta_1 + \beta_2 \ln(inc) + u \tag{2-81}$$
In this model the marginal propensity to expenditure is given by
$$\frac{d\,dairy}{d\,inc} = \frac{d\,dairy}{d\,inc}\,\frac{inc}{inc} = \frac{d\,dairy}{d\ln(inc)}\,\frac{1}{inc} = \beta_2\,\frac{1}{inc} \tag{2-82}$$
and the elasticity expenditure/income is given by
$$\varepsilon^{lin\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\,\frac{inc}{dairy} = \frac{d\,dairy}{d\ln(inc)}\,\frac{1}{dairy} = \beta_2\,\frac{1}{dairy} \tag{2-83}$$
The marginal propensity is inversely proportional to the level of income in the
linear-log model, while the elasticity is inversely proportional to the level of expenditure
on dairy products.

In figure 2.15, we can see a double representation of the population function
corresponding to this model.

Figure 2.15. The linear-log model. [Two panels: dairy plotted against inc, and dairy plotted against ln(inc), where E(dairy) = β1 + β2 ln(inc) is a straight line.]


Estimating the model (2-81) with the data from table 2.6, we obtain
$$\widehat{dairy} = -41.623 + 7.399 \times \ln(inc) \qquad R^2 = 0.4567 \tag{2-84}$$
The interpretation of β̂2 is the following: if income increases by 1%, the
demand for dairy products will increase by 0.07399 euros.
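The figure of 0.07399 euros is the usual approximation β̂2/100. As a rough check, the following sketch compares it with the exact change in fitted expenditure, β̂2 ln(1.01), using the estimates in (2-84). The function name and the income level of 1000 are illustrative assumptions, not values taken from table 2.6.

```python
import math

B1_HAT = -41.623  # intercept estimate reported in (2-84)
B2_HAT = 7.399    # slope estimate reported in (2-84)

def fitted_dairy_linlog(inc):
    """Fitted expenditure from the estimated linear-log model."""
    return B1_HAT + B2_HAT * math.log(inc)

inc0 = 1000.0  # hypothetical income level, chosen only for illustration

# Exact change in fitted expenditure when income rises by 1%
exact_change = fitted_dairy_linlog(1.01 * inc0) - fitted_dairy_linlog(inc0)
# Approximation used in the text: beta2_hat / 100
approx_change = B2_HAT / 100

print(round(exact_change, 5), round(approx_change, 5))
```

In the linear-log model the exact change does not depend on the starting income, so any positive value of inc0 gives the same comparison.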
Log-log model or potential model
This potential (power) model is defined in the following way:
$$dairy = e^{\beta_1}\, inc^{\beta_2}\, e^{u} \tag{2-85}$$
This model is not linear in the parameters, but it is linearizable by taking natural
logarithms, and the following is obtained:
$$\ln(dairy) = \beta_1 + \beta_2 \ln(inc) + u \tag{2-86}$$
This model is also called log-log model, because this is the structure of the
corresponding linearized model.
In this model the marginal propensity to expenditure is given by
$$\frac{d\,dairy}{d\,inc} = \beta_2\,\frac{dairy}{inc} \tag{2-87}$$
In the log-log model, the elasticity is constant. Therefore, if the income increases
by 1%, the expenditure will increase by β 2 %, since
$$\varepsilon^{log\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\,\frac{inc}{dairy} = \frac{d\ln(dairy)}{d\ln(inc)} = \beta_2 \tag{2-88}$$
In figure 2.16, we can see a double representation of the population function
corresponding to this model.

Figure 2.16. The log-log model. [Two panels: dairy plotted against inc, and ln(dairy) plotted against ln(inc), where the population function is a straight line.]


Estimating the model (2-86) with the data from table 2.6, we obtain
$$\widehat{\ln(dairy)} = -2.556 + 0.6866 \times \ln(inc) \qquad R^2 = 0.5190 \tag{2-89}$$
In this case β̂2 is the expenditure/income elasticity. Its interpretation is the
following: if income increases by 1%, the demand for dairy products will increase by
0.69%.
Log-linear or exponential model
This exponential model is defined in the following way:
dairy = exp( β1 + β 2inc + u ) (2-90)
By taking natural logarithms on both sides of (2-90), we obtain the following
model that is linear in the parameters:
$$\ln(dairy) = \beta_1 + \beta_2\, inc + u \tag{2-91}$$
In this model the marginal propensity to expenditure is given by
$$\frac{d\,dairy}{d\,inc} = \beta_2\, dairy \tag{2-92}$$
In the exponential model, unlike other models seen previously, the marginal
propensity increases when the level of expenditure does. For this reason, this model is
adequate to describe the demand of luxury products. On the other hand, the elasticity is
proportional to the level of income:
$$\varepsilon^{exp}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\,\frac{inc}{dairy} = \frac{d\ln(dairy)}{d\,inc}\, inc = \beta_2\, inc \tag{2-93}$$
In figure 2.17, we can see a double representation of the population function
corresponding to this model.

Figure 2.17. The log-linear model. [Two panels: dairy plotted against inc, with E(dairy) = e^(β1 + β2 inc), and ln(dairy) plotted against inc, where the population function is a straight line.]


Estimating the model (2-91) with the data from table 2.6, we obtain
$$\widehat{\ln(dairy)} = 1.694 + 0.00048 \times inc \qquad R^2 = 0.4978 \tag{2-94}$$
The interpretation of β̂2 is the following: if income increases by one euro, the
demand for dairy products will increase by 0.048%.
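The reading "100 × β̂2 percent per euro" is an approximation that works well for small changes in income. The sketch below, which uses the slope reported in (2-94), compares it with the exact percentage change 100 × (exp(β̂2 Δinc) − 1). The function names and the income change of 1000 euros are illustrative assumptions.

```python
import math

B2_HAT = 0.00048  # slope estimate reported in (2-94)

def pct_change_exact(b2, delta_inc=1.0):
    """Exact percentage change in fitted dairy when income rises by delta_inc."""
    return 100 * (math.exp(b2 * delta_inc) - 1)

def pct_change_approx(b2, delta_inc=1.0):
    """Approximate percentage change: 100 * b2 * delta_inc."""
    return 100 * b2 * delta_inc

# One extra euro: the two figures are practically identical
print(pct_change_exact(B2_HAT), pct_change_approx(B2_HAT))
# A hypothetical rise of 1000 euros: the approximation understates the change
print(pct_change_exact(B2_HAT, 1000), pct_change_approx(B2_HAT, 1000))
```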
Inverse exponential model
The inverse exponential model, which is a mixture of the exponential model and
the inverse model, has properties that make it suitable for determining the demand for
products in which there is a saturation point. This model is given by
$$dairy = \exp\left(\beta_1 + \beta_2\,\frac{1}{inc} + u\right) \tag{2-95}$$
By taking natural logarithms on both sides of (2-95), we obtain the following
model that is linear in the parameters:
$$\ln(dairy) = \beta_1 + \beta_2\,\frac{1}{inc} + u \tag{2-96}$$
In this model the marginal propensity to expenditure is given by
$$\frac{d\,dairy}{d\,inc} = -\beta_2\,\frac{dairy}{(inc)^2} \tag{2-97}$$
and the elasticity by
$$\varepsilon^{invexp}_{dairy/inc} = \frac{d\,dairy}{d\,inc}\,\frac{inc}{dairy} = \frac{d\ln(dairy)}{d\,inc}\, inc = -\beta_2\,\frac{1}{inc} \tag{2-98}$$
Estimating the model (2-96) with the data from table 2.6, we obtain
$$\widehat{\ln(dairy)} = 3.049 - 822.02\,\frac{1}{inc} \qquad R^2 = 0.5040 \tag{2-99}$$
In this case, as in the inverse model, the coefficient β̂ 2 does not have an economic
meaning.
In table 2.7, the results of the marginal propensity, the expenditure/income
elasticity and R2 in the six fitted models are shown.

Table 2.7. Marginal propensity, expenditure/income elasticity and R2 in the fitted models.

Model        Marginal propensity             Elasticity                         R2
Linear       β̂2 = 0.0053                     β̂2 (inc/dairy) = 0.6505            0.4440
Inverse      −β̂2 (1/inc²) = 0.0044           −β̂2 (1/(dairy × inc)) = 0.5361     0.4279
Linear-log   β̂2 (1/inc) = 0.0052             β̂2 (1/dairy) = 0.6441              0.4566
Log-log      β̂2 (dairy/inc) = 0.0056         β̂2 = 0.6864                        0.5188
Log-linear   β̂2 × dairy = 0.0055             β̂2 × inc = 0.6783                  0.4976
Inverse-log  −β̂2 (dairy/inc²) = 0.0047       −β̂2 (1/inc) = 0.5815               0.5038

The R2 obtained in the first three models are not comparable with the R2 obtained
in the last three because the functional form of the regressand is different: y in the first
three models and ln(y) in the last three.
Comparing the first three models, the best fit is obtained by the linear-log model,
if we use R2 as the goodness-of-fit measure. Comparing the last three models, the best
fit is obtained by the log-log model. If we had used the Akaike Information Criterion
(AIC), which allows the comparison of models with different functional forms for the
regressand, then the log-log model would have been the best among the six models
fitted. The AIC measure will be studied in chapter 3.

Appendixes

Appendix 2.1: Two alternative forms to express β̂ 2


It is easy to see that
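$$\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x) &= \sum_{i=1}^{n} (y_i x_i - \bar x y_i - \bar y x_i + \bar y \bar x) = \sum_{i=1}^{n} y_i x_i - \bar x \sum_{i=1}^{n} y_i - \bar y \sum_{i=1}^{n} x_i + n \bar y \bar x \\
&= \sum_{i=1}^{n} y_i x_i - n \bar x \bar y - \bar y \sum_{i=1}^{n} x_i + n \bar y \bar x = \sum_{i=1}^{n} y_i x_i - \bar y \sum_{i=1}^{n} x_i
\end{aligned}$$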
On the other hand, we have
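$$\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar x)^2 &= \sum_{i=1}^{n} (x_i^2 - 2 \bar x x_i + \bar x^2) = \sum_{i=1}^{n} x_i^2 - 2 \bar x \sum_{i=1}^{n} x_i + n \bar x^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2 n \bar x^2 + n \bar x^2 = \sum_{i=1}^{n} x_i^2 - \bar x \sum_{i=1}^{n} x_i
\end{aligned}$$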

Therefore, (2-17) can be expressed in the following way:
$$\hat\beta_2 = \frac{\sum_{i=1}^{n} y_i x_i - \bar y \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar x \sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$$
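The equivalence of the two forms can be checked numerically. The sketch below computes the slope both ways on a small data set invented for illustration (it is not the data of table 2.6); the function names are likewise assumptions of this example.

```python
def slope_deviation_form(x, y):
    """Slope as sum((y - ybar)(x - xbar)) / sum((x - xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

def slope_raw_form(x, y):
    """Slope as (sum(y x) - ybar sum(x)) / (sum(x^2) - xbar sum(x))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum(yi * xi for xi, yi in zip(x, y)) - y_bar * sum(x)
    den = sum(xi ** 2 for xi in x) - x_bar * sum(x)
    return num / den

# Hypothetical data, used only to illustrate the identity
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(slope_deviation_form(x, y), slope_raw_form(x, y))
```

With large values of x the raw-moment form can lose numerical precision, which is one practical reason to prefer the form in deviations from the mean.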

Appendix 2.2. Proof: $r_{xy}^2 = R^2$


First of all, we are going to see an equivalence that will be used in the proof. By
definition,
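$$\hat y_i = \hat\beta_1 + \hat\beta_2 x_i$$
From the first normal equation, we have
$$\bar y = \hat\beta_1 + \hat\beta_2 \bar x$$
Subtracting the second equation from the first one:
$$\hat y_i - \bar y = \hat\beta_2 (x_i - \bar x)$$
Squaring both sides
$$(\hat y_i - \bar y)^2 = \hat\beta_2^2 (x_i - \bar x)^2$$
and summing for all i, we have
$$\sum_{i=1}^{n} (\hat y_i - \bar y)^2 = \hat\beta_2^2 \sum_{i=1}^{n} (x_i - \bar x)^2$$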
Taking into account the previous equivalence, we have
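$$\begin{aligned}
R^2 &= \frac{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} = \frac{\hat\beta_2^2 \sum_{i=1}^{n} (x_i - \bar x)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} = \frac{\left[\sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x)\right]^2 \sum_{i=1}^{n} (x_i - \bar x)^2}{\left[\sum_{i=1}^{n} (x_i - \bar x)^2\right]^2 \sum_{i=1}^{n} (y_i - \bar y)^2} \\
&= \frac{\left[\sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x)\right]^2}{\sum_{i=1}^{n} (x_i - \bar x)^2 \sum_{i=1}^{n} (y_i - \bar y)^2} = r_{xy}^2
\end{aligned}$$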
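As a numerical illustration of this result, the sketch below fits a simple regression by OLS and computes both the coefficient of determination and the squared sample correlation. The data and the function names are hypothetical and serve only to illustrate the identity.

```python
def ols_simple(x, y):
    """Return (b1, b2) of the simple regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    return b1, b2

def r_squared(x, y):
    """Explained sum of squares divided by total sum of squares."""
    b1, b2 = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    ess = sum((b1 + b2 * xi - y_bar) ** 2 for xi in x)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return ess / tss

def corr_squared(x, y):
    """Squared sample correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data, used only to illustrate the identity
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(r_squared(x, y), corr_squared(x, y))
```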

Appendix 2.3. Proportional change versus change in logarithms


The change in logarithms is a rate of variation that is widely used in economic
research. The relationship between the proportional change and the change in
logarithms can be seen if we expand (2-45) in a Taylor series:

$$\begin{aligned}
\ln(x_1) - \ln(x_0) &= \ln\left(\frac{x_1}{x_0}\right) \\
&= \ln(1) + \left(\frac{x_1}{x_0} - 1\right)\left[\frac{1}{x_1/x_0}\right]_{x_1/x_0 = 1} + \frac{1}{2}\left(\frac{x_1}{x_0} - 1\right)^2\left[-\frac{1}{(x_1/x_0)^2}\right]_{x_1/x_0 = 1} \\
&\quad + \frac{1}{3 \times 2}\left(\frac{x_1}{x_0} - 1\right)^3\left[\frac{2}{(x_1/x_0)^3}\right]_{x_1/x_0 = 1} + \cdots \\
&= \left(\frac{x_1}{x_0} - 1\right) - \frac{1}{2}\left(\frac{x_1}{x_0} - 1\right)^2 + \frac{1}{3}\left(\frac{x_1}{x_0} - 1\right)^3 + \cdots \\
&= \frac{\Delta x_1}{x_0} - \frac{1}{2}\left(\frac{\Delta x_1}{x_0}\right)^2 + \frac{1}{3}\left(\frac{\Delta x_1}{x_0}\right)^3 + \cdots
\end{aligned} \tag{2-100}$$
Therefore, if we take the linear approximation in this expansion, we have
$$\Delta \ln(x) = \ln(x_1) - \ln(x_0) = \ln\left(\frac{x_1}{x_0}\right) \approx \frac{\Delta x_1}{x_0} \tag{2-101}$$

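The quality of the linear approximation in (2-101) depends on the size of the change. The sketch below compares the proportional change with the change in logarithms for a few hypothetical values of x1, starting from x0 = 100; the function names and the numbers are assumptions made for illustration.

```python
import math

def proportional_change(x0, x1):
    """Proportional change (x1 - x0) / x0."""
    return (x1 - x0) / x0

def log_change(x0, x1):
    """Change in logarithms ln(x1) - ln(x0)."""
    return math.log(x1) - math.log(x0)

x0 = 100.0  # hypothetical starting value
for x1 in (101.0, 110.0, 150.0):
    print(x1, round(proportional_change(x0, x1), 4), round(log_change(x0, x1), 4))
```

For a 1% change the two measures nearly coincide, while for a 50% change the change in logarithms is visibly smaller, as the negative quadratic term of (2-100) suggests.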
Appendix 2.4. Proof: OLS estimators are linear and unbiased


We will only prove the unbiasedness of the estimator β̂ 2 , which is the most
important. In order to prove this, we need to rewrite our estimator in terms of the
population parameter. The formula (2-18) can be written as
$$\hat\beta_2 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar x) y_i}{\sum_{i=1}^{n} (x_i - \bar x)^2} \tag{2-102}$$
because
$$\sum_{i=1}^{n} (x_i - \bar x)\bar y = \bar y \sum_{i=1}^{n} (x_i - \bar x) = \bar y \times 0 = 0$$
Now (2-102) will be expressed in the following way:
$$\hat\beta_2 = \sum_{i=1}^{n} c_i y_i \tag{2-103}$$
where
$$c_i = \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2} \tag{2-104}$$

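The c_i's have the following properties:
$$\sum_{i=1}^{n} c_i = 0 \tag{2-105}$$
$$\sum_{i=1}^{n} c_i^2 = \frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{\left[\sum_{i=1}^{n} (x_i - \bar x)^2\right]^2} = \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2} \tag{2-106}$$
$$\sum_{i=1}^{n} c_i x_i = \frac{\sum_{i=1}^{n} (x_i - \bar x) x_i}{\sum_{i=1}^{n} (x_i - \bar x)^2} = 1 \tag{2-107}$$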

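Now, if we substitute y_i = β1 + β2 x_i + u_i (assumption 1) in (2-103), we have
$$\begin{aligned}
\hat\beta_2 &= \sum_{i=1}^{n} c_i y_i = \sum_{i=1}^{n} c_i (\beta_1 + \beta_2 x_i + u_i) \\
&= \beta_1 \sum_{i=1}^{n} c_i + \beta_2 \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} c_i u_i = \beta_2 + \sum_{i=1}^{n} c_i u_i
\end{aligned} \tag{2-108}$$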

Since the regressors are assumed to be nonstochastic (assumption 2), the c_i are
nonstochastic too. Therefore, β̂2 is an estimator that is a linear function of the u's.
Taking expectations in (2-108) and taking into account assumption 6, and
implicitly assumptions 3 through 5, we obtain
$$E(\hat\beta_2) = \beta_2 + \sum_{i=1}^{n} c_i E(u_i) = \beta_2 \tag{2-109}$$
Therefore, β̂2 is an unbiased estimator of β2.
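The properties (2-105) to (2-107) of the weights c_i, on which the proof rests, can be verified numerically. The following sketch does so for a small set of hypothetical regressor values; the function name and the data are assumptions of this example.

```python
def ols_weights(x):
    """Weights c_i = (x_i - xbar) / sum((x_i - xbar)^2) from (2-104)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]

# Hypothetical data, used only to illustrate the properties
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
c = ols_weights(x)

sum_c = sum(c)                                     # property (2-105): equals 0
sum_c2 = sum(ci ** 2 for ci in c)                  # property (2-106): 1 / sum((x - xbar)^2)
sum_cx = sum(ci * xi for ci, xi in zip(c, x))      # property (2-107): equals 1
slope = sum(ci * yi for ci, yi in zip(c, y))       # beta2_hat as in (2-103)

print(sum_c, sum_c2, sum_cx, slope)
```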

Appendix 2.5. Calculation of variance of β̂ 2 :


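$$\begin{aligned}
E\left[\hat\beta_2 - \beta_2\right]^2 &= E\left[\sum_{i=1}^{n} c_i u_i\right]^2 = \sum_{i=1}^{n} c_i^2 E(u_i^2) + \sum\sum_{i \neq j} c_i c_j E(u_i u_j) \\
&= \sigma^2 \sum_{i=1}^{n} c_i^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\sigma^2}{n S_X^2}
\end{aligned} \tag{2-110}$$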

In the above proof, to pass from the second to the third equality, we have taken
into account assumptions 6 and 7.


Appendix 2.6. Proof of Gauss-Markov Theorem for the slope in simple regression
The plan for the proof is the following. First, we are going to define an arbitrary
estimator $\tilde\beta_2$ which is linear in y. Second, we will impose the restrictions implied by
unbiasedness. Third, we will show that the variance of the arbitrary estimator must be
larger than, or at least equal to, the variance of β̂2.

Let us define an arbitrary estimator $\tilde\beta_2$ which is linear in y:
$$\tilde\beta_2 = \sum_{i=1}^{n} h_i y_i \tag{2-111}$$

Now, we substitute y_i by its value in the population model (assumption 1):
$$\tilde\beta_2 = \sum_{i=1}^{n} h_i y_i = \sum_{i=1}^{n} h_i (\beta_1 + \beta_2 x_i + u_i) = \beta_1 \sum_{i=1}^{n} h_i + \beta_2 \sum_{i=1}^{n} h_i x_i + \sum_{i=1}^{n} h_i u_i \tag{2-112}$$

For the estimator $\tilde\beta_2$ to be unbiased, the following restrictions must hold:
$$\sum_{i=1}^{n} h_i = 0 \qquad \sum_{i=1}^{n} h_i x_i = 1 \tag{2-113}$$
Therefore,
$$\tilde\beta_2 = \beta_2 + \sum_{i=1}^{n} h_i u_i \tag{2-114}$$

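The variance of this estimator is the following:
$$\begin{aligned}
E\left[\tilde\beta_2 - \beta_2\right]^2 &= E\left[\sum_{i=1}^{n} h_i u_i\right]^2 = \sigma^2 \sum_{i=1}^{n} h_i^2 \\
&= \sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2} + \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right]^2 \\
&= \sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right]^2 + \sigma^2 \sum_{i=1}^{n} \left[\frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right]^2 \\
&\quad + 2\sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right] \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}
\end{aligned} \tag{2-115}$$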
The third term of the last equality is 0, as shown below:

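$$\begin{aligned}
&2\sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right] \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2} \\
&\quad = 2\sigma^2 \sum_{i=1}^{n} h_i \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2} - 2\sigma^2 \sum_{i=1}^{n} \frac{(x_i - \bar x)^2}{\left[\sum_{i=1}^{n} (x_i - \bar x)^2\right]^2} \\
&\quad = \frac{2\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} - \frac{2\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} = 0
\end{aligned} \tag{2-116}$$
In the first sum we have used the restrictions (2-113), which imply that
$\sum_{i=1}^{n} h_i (x_i - \bar x) = 1$.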

Therefore, taking into account (2-116) and operating, we have
$$E\left[\tilde\beta_2 - \beta_2\right]^2 = \sigma^2 \sum_{i=1}^{n} \left[h_i - c_i\right]^2 + \sigma^2 \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2} \tag{2-117}$$
where
$$c_i = \frac{x_i - \bar x}{\sum_{i=1}^{n} (x_i - \bar x)^2}$$
The second term of the last equality is the variance of β̂2, while the first term is
non-negative because it is a sum of squares; it is equal to 0 only if h_i = c_i for all i, in
which case $\tilde\beta_2 = \hat\beta_2$. So,
$$E\left[\tilde\beta_2 - \beta_2\right]^2 \geq E\left[\hat\beta_2 - \beta_2\right]^2 \tag{2-118}$$
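To illustrate the theorem, the sketch below compares the variance of the OLS slope with that of another linear unbiased estimator, the slope of the line through the first and last observations, (y_n − y_1)/(x_n − x_1). This alternative estimator, the regressor values and the value σ² = 1 are assumptions chosen for illustration; the variance of any linear estimator with weights h_i is computed as σ² Σ h_i², as in (2-115).

```python
def ols_weights(x):
    """OLS weights c_i = (x_i - xbar) / sum((x_i - xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]

def endpoint_weights(x):
    """Weights h_i of the estimator (y_n - y_1) / (x_n - x_1); requires x_n != x_1."""
    span = x[-1] - x[0]
    h = [0.0] * len(x)
    h[0] = -1.0 / span
    h[-1] = 1.0 / span
    return h

def variance_of_linear_estimator(weights, sigma2):
    """Variance sigma^2 * sum(h_i^2) of a linear estimator sum(h_i y_i)."""
    return sigma2 * sum(w ** 2 for w in weights)

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical regressor values
sigma2 = 1.0                   # hypothetical disturbance variance

h = endpoint_weights(x)
# The alternative estimator satisfies the unbiasedness restrictions (2-113)
sum_h = sum(h)
sum_hx = sum(hi * xi for hi, xi in zip(h, x))

var_ols = variance_of_linear_estimator(ols_weights(x), sigma2)
var_alt = variance_of_linear_estimator(h, sigma2)
print(var_ols, var_alt)
```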

Appendix 2.7. Proof: σ̂² is an unbiased estimator of the variance of the disturbance
The population model is by definition:
yi =β1 + β 2 xi + ui (2-119)
If we sum up both sides of (2-119) for all i and divide by n, we have
$$\bar y = \beta_1 + \beta_2 \bar x + \bar u \tag{2-120}$$
Subtracting (2-120) from (2-119), we have
$$y_i - \bar y = \beta_2 (x_i - \bar x) + (u_i - \bar u) \tag{2-121}$$

On the other hand, uˆi is by definition:

uˆi = yi − βˆ1 − βˆ2 xi (2-122)


If we sum up both sides of (2-122) for all i and divide by n, we have
$$\bar{\hat u} = \bar y - \hat\beta_1 - \hat\beta_2 \bar x \tag{2-123}$$
Subtracting (2-123) from (2-122), and taking into account that $\bar{\hat u} = 0$,
$$\hat u_i = (y_i - \bar y) - \hat\beta_2 (x_i - \bar x) \tag{2-124}$$
Substituting (2-121) in (2-124), we have
$$\begin{aligned}
\hat u_i &= \beta_2 (x_i - \bar x) + (u_i - \bar u) - \hat\beta_2 (x_i - \bar x) \\
&= -(\hat\beta_2 - \beta_2)(x_i - \bar x) + (u_i - \bar u)
\end{aligned} \tag{2-125}$$

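Squaring and summing up both sides of (2-125), we have
$$\begin{aligned}
\sum_{i=1}^{n} \hat u_i^2 &= \left[\hat\beta_2 - \beta_2\right]^2 \sum_{i=1}^{n} (x_i - \bar x)^2 + \sum_{i=1}^{n} (u_i - \bar u)^2 \\
&\quad - 2\left[\hat\beta_2 - \beta_2\right] \sum_{i=1}^{n} (x_i - \bar x)(u_i - \bar u)
\end{aligned} \tag{2-126}$$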

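Taking expectations in (2-126), we obtain
$$\begin{aligned}
E\left[\sum_{i=1}^{n} \hat u_i^2\right] &= \sum_{i=1}^{n} (x_i - \bar x)^2 E\left[\hat\beta_2 - \beta_2\right]^2 + E\left[\sum_{i=1}^{n} (u_i - \bar u)^2\right] \\
&\quad - 2E\left[(\hat\beta_2 - \beta_2) \sum_{i=1}^{n} (x_i - \bar x)(u_i - \bar u)\right] \\
&= \sum_{i=1}^{n} (x_i - \bar x)^2 \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} + (n - 1)\sigma^2 - 2\sigma^2 = (n - 2)\sigma^2
\end{aligned} \tag{2-127}$$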

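To obtain the first term of the last equality of (2-127), we have used (2-64). In
(2-128) and (2-129), you can find the developments used to obtain the second and the
third term of the last equality of (2-127) respectively. In both cases, assumptions 7 and 8
have been taken into account.
$$\begin{aligned}
E\left[\sum_{i=1}^{n} (u_i - \bar u)^2\right] &= E\left[\sum_{i=1}^{n} u_i^2 - n\bar u^2\right] = E\left[\sum_{i=1}^{n} u_i^2 - n\left(\frac{\sum_{i=1}^{n} u_i}{n}\right)^2\right] \\
&= E\left[\sum_{i=1}^{n} u_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} u_i^2 + \sum\sum_{i \neq j} u_i u_j\right)\right] = n\sigma^2 - \frac{n}{n}\sigma^2 = (n - 1)\sigma^2
\end{aligned} \tag{2-128}$$
For the third term, note that $\sum_{i=1}^{n} (x_i - \bar x)(u_i - \bar u) = \sum_{i=1}^{n} (x_i - \bar x) u_i$ and that, by (2-108),
$\hat\beta_2 - \beta_2 = \sum_{i=1}^{n} c_i u_i$. Then
$$\begin{aligned}
E\left[(\hat\beta_2 - \beta_2) \sum_{i=1}^{n} (x_i - \bar x)(u_i - \bar u)\right] &= E\left[\frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2} \sum_{i=1}^{n} (x_i - \bar x) u_i \sum_{i=1}^{n} (x_i - \bar x) u_i\right] \\
&= \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2} E\left[\sum_{i=1}^{n} (x_i - \bar x) u_i\right]^2 \\
&= \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2} \left[\sum_{i=1}^{n} (x_i - \bar x)^2 E(u_i^2) + \sum\sum_{i \neq j} (x_i - \bar x)(x_j - \bar x) E(u_i u_j)\right] = \sigma^2
\end{aligned} \tag{2-129}$$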
According to (2-127), we have
$$E\left[\sum_{i=1}^{n} \hat u_i^2\right] = (n - 2)\sigma^2 \tag{2-130}$$
Therefore, an unbiased estimator is given by
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - 2} \tag{2-131}$$
such that
$$E(\hat\sigma^2) = \frac{1}{n - 2} E\left[\sum_{i=1}^{n} \hat u_i^2\right] = \sigma^2 \tag{2-132}$$

Appendix 2.8. Consistency of the OLS estimator


The plim operator has the invariance property (Slutsky property). That is to say,
if θ̂ is a consistent estimator of θ and if g(θ̂) is any continuous function of θ̂, then
$$\underset{n \to \infty}{\mathrm{plim}}\; g(\hat\theta) = g(\theta) \tag{2-133}$$
This means that if θ̂ is a consistent estimator of θ, then 1/θ̂ and ln(θ̂) are also
consistent estimators of 1/θ and ln(θ) respectively. Note that these properties do not hold
for the expectation operator E; for example, if θ̂ is an unbiased estimator of θ [that is
to say, E(θ̂)=θ], it is not true in general that 1/θ̂ is an unbiased estimator of 1/θ; that is,
E(1/θ̂) ≠ 1/E(θ̂) = 1/θ. This is because the expectation operator passes only through
linear functions of random variables, whereas the plim operator passes through any
continuous function.

Under assumptions 1 through 6, the OLS estimators, β̂1 and β̂ 2 , are consistent.

Now we are going to prove that β̂2 is a consistent estimator. First, β̂2 can be
expressed as:
$$\begin{aligned}
\hat\beta_2 &= \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar x) y_i}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar x)(\beta_1 + \beta_2 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar x)^2} \\
&= \beta_1 \frac{\sum_{i=1}^{n} (x_i - \bar x)}{\sum_{i=1}^{n} (x_i - \bar x)^2} + \beta_2 \frac{\sum_{i=1}^{n} (x_i - \bar x) x_i}{\sum_{i=1}^{n} (x_i - \bar x)^2} + \frac{\sum_{i=1}^{n} (x_i - \bar x) u_i}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \beta_2 + \frac{\sum_{i=1}^{n} (x_i - \bar x) u_i}{\sum_{i=1}^{n} (x_i - \bar x)^2}
\end{aligned} \tag{2-134}$$

In order to prove consistency, we need to take plims in (2-134) and apply the Law
of Large Numbers. This law states that, under general conditions, the sample moments
converge to their corresponding population moments. Thus, taking plims in (2-134):
$$\underset{n \to \infty}{\mathrm{plim}}\; \hat\beta_2 = \underset{n \to \infty}{\mathrm{plim}} \left[\beta_2 + \frac{\sum_{i=1}^{n} (x_i - \bar x) u_i}{\sum_{i=1}^{n} (x_i - \bar x)^2}\right] = \beta_2 + \frac{\underset{n \to \infty}{\mathrm{plim}}\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x) u_i}{\underset{n \to \infty}{\mathrm{plim}}\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2} \tag{2-135}$$

In the last equality we have divided the numerator and denominator by n, because
if we do not do so, both summations will go to infinity when n goes to infinity.
If we apply the law of large numbers to the numerator and denominator of (2-135),
they will converge in probability to the population moments cov(x,u) and var(x)
respectively. Provided var(x)≠0 (assumption 4), we can use the properties of the
probability limits to obtain
$$\mathrm{plim}\, \hat\beta_2 = \beta_2 + \frac{\mathrm{cov}(x, u)}{\mathrm{var}(x)} = \beta_2 \tag{2-136}$$
To reach the last equality, using assumptions 2 and 6, we obtain
$$\mathrm{cov}(x, u) = E\left[(x - \bar x) u\right] = (x - \bar x) E[u] = (x - \bar x) \times 0 = 0 \tag{2-137}$$

Therefore, β̂ 2 is a consistent estimator.

Appendix 2.9 Maximum likelihood estimator


Taking into account assumptions 1 through 6, the expectation of y_i is the following:
$$E(y_i) = \beta_1 + \beta_2 x_i \tag{2-138}$$
If we take into account assumption 7, the variance of y_i is equal to
$$\mathrm{var}(y_i) = E\left[y_i - E(y_i)\right]^2 = E\left[y_i - \beta_1 - \beta_2 x_i\right]^2 = E\left[u_i^2\right] = \sigma^2 \quad \forall i \tag{2-139}$$
According to assumption 1, y i is a linear function of u i , and if u i has a normal
distribution (assumption 9), then y i will be normally and independently (assumption 8)
distributed with mean β1 + β 2 xi and variance σ2.

Then, the joint probability density function of y_1, y_2, ..., y_n can be expressed as a
product of n individual density functions:
$$\begin{aligned}
&f(y_1, y_2, \ldots, y_n \mid \beta_1 + \beta_2 x_i, \sigma^2) \\
&\quad = f(y_1 \mid \beta_1 + \beta_2 x_i, \sigma^2)\, f(y_2 \mid \beta_1 + \beta_2 x_i, \sigma^2) \cdots f(y_n \mid \beta_1 + \beta_2 x_i, \sigma^2)
\end{aligned} \tag{2-140}$$
where
$$f(y_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2} \frac{\left[y_i - \beta_1 - \beta_2 x_i\right]^2}{\sigma^2}\right\} \tag{2-141}$$
which is the density function of a normally distributed variable with the given mean and
variance.
Substituting (2-141) into (2-140) for each y_i, we obtain
$$\begin{aligned}
f(y_1, y_2, \ldots, y_n) &= f(y_1) f(y_2) \cdots f(y_n) \\
&= \frac{1}{\sigma^n \left(\sqrt{2\pi}\right)^n} \exp\left\{-\frac{1}{2} \sum_{i=1}^{n} \frac{\left[y_i - \beta_1 - \beta_2 x_i\right]^2}{\sigma^2}\right\}
\end{aligned} \tag{2-142}$$

If y_1, y_2, ..., y_n are known or given, but β1, β2, and σ² are not known, the function
in (2-142) is called a likelihood function, denoted by L(β1, β2, σ²) or simply L. If we take
natural logarithms in (2-142), we obtain
$$\begin{aligned}
\ln L &= -n \ln \sigma - \frac{n}{2} \ln(2\pi) - \frac{1}{2} \sum_{i=1}^{n} \frac{(y_i - \beta_1 - \beta_2 x_i)^2}{\sigma^2} \\
&= -\frac{n}{2} \ln \sigma^2 - \frac{n}{2} \ln(2\pi) - \frac{1}{2} \sum_{i=1}^{n} \frac{(y_i - \beta_1 - \beta_2 x_i)^2}{\sigma^2}
\end{aligned} \tag{2-143}$$
The maximum likelihood (ML) method, as the name suggests, consists in
estimating the unknown parameters in such a manner that the probability of observing the
given y_i's is as high (or maximum) as possible. Therefore, we have to find the maximum
of the function (2-143). To maximize (2-143) we must differentiate with respect to β1, β2,
and σ² and set the derivatives equal to 0. If $\tilde\beta_1$, $\tilde\beta_2$ and $\tilde\sigma^2$ denote the ML estimators, we obtain:
$$\begin{aligned}
\frac{\partial \ln L}{\partial \beta_1} &= -\frac{1}{\tilde\sigma^2} \sum (y_i - \tilde\beta_1 - \tilde\beta_2 x_i)(-1) = 0 \\
\frac{\partial \ln L}{\partial \beta_2} &= -\frac{1}{\tilde\sigma^2} \sum (y_i - \tilde\beta_1 - \tilde\beta_2 x_i)(-x_i) = 0 \\
\frac{\partial \ln L}{\partial \sigma^2} &= -\frac{n}{2\tilde\sigma^2} + \frac{1}{2\tilde\sigma^4} \sum (y_i - \tilde\beta_1 - \tilde\beta_2 x_i)^2 = 0
\end{aligned} \tag{2-144}$$
If we take the first two equations of (2-144) and operate, we have
$$\sum y_i = n\tilde\beta_1 + \tilde\beta_2 \sum x_i \tag{2-145}$$
$$\sum y_i x_i = \tilde\beta_1 \sum x_i + \tilde\beta_2 \sum x_i^2 \tag{2-146}$$
As can be seen, (2-145) and (2-146) are equal to (2-13) and (2-14). That is to say,
the ML estimators, under the CLM assumptions, are equal to the OLS estimators.
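Substituting $\tilde\beta_1$ and $\tilde\beta_2$, obtained by solving (2-145) and (2-146), in the third
equation of (2-144), we have
$$\tilde\sigma^2 = \frac{1}{n} \sum \left(y_i - \tilde\beta_1 - \tilde\beta_2 x_i\right)^2 = \frac{1}{n} \sum \left(y_i - \hat\beta_1 - \hat\beta_2 x_i\right)^2 = \frac{1}{n} \sum \hat u_i^2 \tag{2-147}$$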
The ML estimator for σ² is biased, since, according to (2-127),
$$E(\tilde\sigma^2) = \frac{1}{n} E\left[\sum_{i=1}^{n} \hat u_i^2\right] = \frac{n - 2}{n} \sigma^2 \tag{2-148}$$
In any case, $\tilde\sigma^2$ is a consistent estimator because
$$\lim_{n \to \infty} \frac{n - 2}{n} = 1 \tag{2-149}$$
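The relation between the two estimators of σ² can be seen in a small numerical example. The sketch below computes the unbiased estimator (2-131) and the ML estimator (2-147) from the same OLS residuals; the data and the function names are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals of the simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    return [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

def sigma2_unbiased(x, y):
    """Sum of squared residuals divided by n - 2, as in (2-131)."""
    u_hat = ols_residuals(x, y)
    return sum(u ** 2 for u in u_hat) / (len(x) - 2)

def sigma2_ml(x, y):
    """Sum of squared residuals divided by n, as in (2-147)."""
    u_hat = ols_residuals(x, y)
    return sum(u ** 2 for u in u_hat) / len(x)

# Hypothetical data, used only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
print(sigma2_unbiased(x, y), sigma2_ml(x, y), (n - 2) / n)
```

The ratio of the ML estimate to the unbiased estimate is exactly (n − 2)/n, the factor appearing in (2-148), which approaches 1 as n grows.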

3 MULTIPLE LINEAR REGRESSION: ESTIMATION AND
PROPERTIES

3.1 The multiple linear regression model


The simple linear regression model is not adequate for modeling many economic
phenomena, because in order to explain an economic variable it is necessary to take into
account more than one relevant factor. We will illustrate this with some examples.
In the Keynesian consumption function, disposable income is the only relevant
variable:
$$cons = \beta_1 + \beta_2\, inc + u \tag{3-1}$$
However, there are other factors that may be considered relevant in consumer
behavior. One of these factors could be wealth. By including this factor, we will have a
model with two explanatory variables:
$$cons = \beta_1 + \beta_2\, inc + \beta_3\, wealth + u \tag{3-2}$$
In the analysis of production, a potential function is often used, which can be
transformed into a linear model in the parameters with an adequate specification (taking
natural logs). Using a single input -labor- a model of this type would be specified as
follows:
$$\ln(output) = \beta_1 + \beta_2 \ln(labor) + u \tag{3-3}$$
The previous model is clearly insufficient for economic analysis. It would be
better to use the well-known Cobb-Douglas model that considers two inputs (labor and
capital):
$$\ln(output) = \beta_1 + \beta_2 \ln(labor) + \beta_3 \ln(capital) + u \tag{3-4}$$
According to microeconomic theory, total costs (costot) are expressed as a
function of the quantity produced (quantprod). A first approximation to explain the total
costs could be a model with only one regressor:
$$costot = \beta_1 + \beta_2\, quantprod + u \tag{3-5}$$
However, it is very restrictive to assume, as the previous model does, that the
marginal cost remains constant regardless of the quantity produced.
In economic theory, a cubic function is proposed, which leads to the following
econometric model:
$$costot = \beta_1 + \beta_2\, quantprod + \beta_3\, quantprod^2 + \beta_4\, quantprod^3 + u \tag{3-6}$$
In this case, unlike the previous ones, only one explanatory variable is considered,
but with three regressors.
Wages are determined by several factors. A relatively simple model could explain
wages using years of education and years of experience as explanatory variables:
$$wages = \beta_1 + \beta_2\, educ + \beta_3\, exper + u \tag{3-7}$$
Other important factors to explain wages received can also be quantitative
variables such as training and age, or qualitative variables, such as sex, industry, and so
on.
Finally, in explaining the expenditure on fish, the relevant factors are the price of
fish, the price of a substitute commodity such as meat, and disposable income:
$$fishexp = \beta_1 + \beta_2\, fishprice + \beta_3\, meatprice + \beta_4\, income + u \tag{3-8}$$
Thus, the above examples highlight the need for using multiple regression models.
The econometric treatment of the simple regression model was made with ordinary
algebra. The treatment of an econometric model with two explanatory variables by using
ordinary algebra is tedious and cumbersome. Moreover, a model with three explanatory
variables is virtually intractable with this tool. For this reason, the regression model will
be presented using matrix algebra.

3.1.1 Population regression model and population regression function


In the model of multiple linear regression, the regressand (which can be either the
endogenous variable or a transformation of the endogenous variable) is a linear function
of k regressors corresponding to the explanatory variables (or their transformations) and
of a random disturbance or error. The model also has an intercept. Designating the
regressand by y, the regressors by x2, x3, ..., xk and the random disturbance by u, the
population model of multiple linear regression is given by the following expression:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + u \tag{3-9}$$
The parameters β1, β2, β3, ..., βk are fixed and unknown.

On the right-hand side of (3-9) we can distinguish two parts: the systematic component
β1 + β2 x2 + β3 x3 + ... + βk xk and the random disturbance u. Denoting the systematic
component by µ_y, we can write:
$$\mu_y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k \tag{3-10}$$
This equation is known as the population regression function (PRF) or population
hyperplane. When k=2 the PRF is specifically a straight line; when k=3 the PRF is
specifically a plane; finally, when k>3 the PRF is generically called a hyperplane,
which cannot be represented in a three-dimensional space.

According to (3-10), µ_y is a linear function of the parameters β1, β2, β3, ..., βk.

Now, let us suppose we have a random sample of size n,
{(y_i, x_2i, x_3i, ..., x_ki): i = 1, 2, ..., n}, extracted from the population studied. If we write
the population model for all observations of the sample, the following system is obtained:
$$\begin{aligned}
y_1 &= \beta_1 + \beta_2 x_{21} + \beta_3 x_{31} + \cdots + \beta_k x_{k1} + u_1 \\
y_2 &= \beta_1 + \beta_2 x_{22} + \beta_3 x_{32} + \cdots + \beta_k x_{k2} + u_2 \\
&\;\;\vdots \\
y_n &= \beta_1 + \beta_2 x_{2n} + \beta_3 x_{3n} + \cdots + \beta_k x_{kn} + u_n
\end{aligned} \tag{3-11}$$
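The previous system of equations can be expressed in a compact form by using
matrix notation. Thus, we are going to denote
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{21} & x_{31} & \cdots & x_{k1} \\ 1 & x_{22} & x_{32} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{2n} & x_{3n} & \cdots & x_{kn} \end{bmatrix} \quad
\boldsymbol\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{bmatrix} \quad
\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$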

The matrix X is called the matrix of regressors. Also included among the
regressors is the regressor corresponding to the intercept. This one, which is often called
dummy regressor, takes the value 1 for all the observations.
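The model of multiple linear regression (3-11) expressed in matrix notation is the
following:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} 1 & x_{21} & x_{31} & \cdots & x_{k1} \\ 1 & x_{22} & x_{32} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{2n} & x_{3n} & \cdots & x_{kn} \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \tag{3-12}$$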

If we take into account the denominations given to vectors and matrices, the model
of multiple linear regression can be expressed in the following way:
y = Xβ + u (3-13)
where y is a vector n ×1 , X is a matrix n × k , β is a vector k ×1 and u is a vector n ×1 .

3.1.2 Sample regression function


The basic idea of regression is to estimate the population parameters,
β1, β2, β3, ..., βk, from a given sample.
The sample regression function (SRF) is the sample counterpart of the population
regression function (PRF). Since the SRF is obtained for a given sample, a new sample
will generate different estimates.
The SRF, which is an estimation of the PRF, is given by

$$\hat y_i = \hat\beta_1 + \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \cdots + \hat\beta_k x_{ki} \qquad i = 1, 2, \ldots, n \tag{3-14}$$
The above expression allows us to calculate the fitted value (ŷ_i) for each y_i. In the
SRF, β̂1, β̂2, β̂3, ..., β̂k are the estimators of the parameters β1, β2, β3, ..., βk.

The difference between y_i and ŷ_i is called the residual. That is
$$\hat u_i = y_i - \hat y_i = y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki} \tag{3-15}$$
In other words, the residual û_i is the difference between a sample value and its
corresponding fitted value.
The system of equations (3-14) can be expressed in a compact form by using matrix
notation. Thus, we are going to denote
$$\hat{\mathbf{y}} = \begin{bmatrix} \hat y_1 \\ \hat y_2 \\ \vdots \\ \hat y_n \end{bmatrix} \quad
\hat{\boldsymbol\beta} = \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \\ \vdots \\ \hat\beta_k \end{bmatrix} \quad
\hat{\mathbf{u}} = \begin{bmatrix} \hat u_1 \\ \hat u_2 \\ \vdots \\ \hat u_n \end{bmatrix}$$

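For all observations of the sample, the corresponding fitted model will be the
following:
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta} \tag{3-16}$$
The residual vector is equal to the difference between the vector of observed
values and the vector of fitted values, that is to say,
$$\hat{\mathbf{u}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta} \tag{3-17}$$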

3.2 Obtaining the OLS estimates, interpretation of the coefficients, and other
characteristics

3.2.1 Obtaining the OLS estimates


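Denoting by S the sum of the squared residuals,
$$S = \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right]^2 \tag{3-18}$$
to apply the least squares criterion in the model of multiple linear regression, we calculate
the first derivative of S with respect to each β̂_j in the expression (3-18):
$$\begin{aligned}
\frac{\partial S}{\partial \hat\beta_1} &= 2\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right][-1] \\
\frac{\partial S}{\partial \hat\beta_2} &= 2\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right][-x_{2i}] \\
\frac{\partial S}{\partial \hat\beta_3} &= 2\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right][-x_{3i}] \\
&\;\;\vdots \\
\frac{\partial S}{\partial \hat\beta_k} &= 2\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right][-x_{ki}]
\end{aligned} \tag{3-19}$$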

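The least squares estimators are obtained by setting the previous derivatives equal to 0:
$$\begin{aligned}
\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right] &= 0 \\
\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right] x_{2i} &= 0 \\
\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right] x_{3i} &= 0 \\
&\;\;\vdots \\
\sum_{i=1}^{n} \left[y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \cdots - \hat\beta_k x_{ki}\right] x_{ki} &= 0
\end{aligned} \tag{3-20}$$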

or, in matrix notation,
$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y} \tag{3-21}$$
The previous equations are generically called the hyperplane normal equations.
In expanded matrix notation, the system of normal equations is the following:
$$\begin{bmatrix}
n & \sum_{i=1}^{n} x_{2i} & \cdots & \sum_{i=1}^{n} x_{ki} \\
\sum_{i=1}^{n} x_{2i} & \sum_{i=1}^{n} x_{2i}^2 & \cdots & \sum_{i=1}^{n} x_{2i} x_{ki} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n} x_{ki} & \sum_{i=1}^{n} x_{ki} x_{2i} & \cdots & \sum_{i=1}^{n} x_{ki}^2
\end{bmatrix}
\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_k \end{bmatrix} =
\begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{2i} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ki} y_i \end{bmatrix} \tag{3-22}$$
Note that:
a) X′X / n is the matrix of second order sample moments with respect to the origin, of
the regressors, among which a dummy regressor (x_1i) associated to the intercept is
included. This regressor takes the value x_1i = 1 for all i.
b) X′y / n is the vector of sample moments of second order, with respect to the origin,
between the regressand and the regressors.

In this system there are k equations and k unknowns (β̂1, β̂2, β̂3, ..., β̂k). This
system can easily be solved using matrix algebra. For the system (3-21) to have a unique
solution for β̂, the rank of the matrix X′X must be equal to k. If this holds, both sides
of (3-21) can be premultiplied by [X′X]⁻¹:
$$[\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{y}$$
from which the expression of the vector of least squares estimators, or more precisely, the
vector of ordinary least squares (OLS) estimators, is obtained, because [X′X]⁻¹X′X = I.
Therefore, the solution is the following:
$$\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_k \end{bmatrix} = \hat{\boldsymbol\beta} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{y} \tag{3-23}$$
Since the matrix of second derivatives, 2X′X, is a positive definite
matrix, the conclusion is that S presents a minimum in β̂.
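The steps above can be sketched in a few lines of code. The example below builds X′X and X′y and solves the normal equations (3-21) by Gauss-Jordan elimination, using only the Python standard library; in practice a numerical library would normally be used instead. The helper names and the data are hypothetical: y is generated exactly as 1 + 2 x2 − 3 x3, so the solution should reproduce those coefficients.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(ai * bj for ai, bj in zip(row, col)) for col in zip(*b)]
            for row in a]

def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the rank of X is lower than k")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def ols(X, y):
    """OLS estimates obtained from the normal equations X'X b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Hypothetical data: the first column of X is the regressor of the intercept
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
X = [[1.0, a, b] for a, b in zip(x2, x3)]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x2, x3)]

beta_hat = ols(X, y)
print([round(b, 6) for b in beta_hat])
```

Solving the system directly avoids computing the inverse of X′X explicitly; the singularity check corresponds to the rank condition stated in the text.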

3.2.2 Interpretation of the coefficients


A βˆ j coefficient measures the partial effect of the regressor x j on y holding the
other regressors fixed. We will see next the meaning of this expression.
The fitted model for observation i is given by
$\hat y_i = \hat\beta_1 + \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \dots + \hat\beta_j x_{ji} + \dots + \hat\beta_k x_{ki}$ (3-24)
Now, let us consider the fitted model for observation h, in which the values of the regressors and, consequently, of $\hat y$ have changed with respect to (3-24):
$\hat y_h = \hat\beta_1 + \hat\beta_2 x_{2h} + \hat\beta_3 x_{3h} + \dots + \hat\beta_j x_{jh} + \dots + \hat\beta_k x_{kh}$ (3-25)
Subtracting (3-25) from (3-24), we have
$\Delta\hat y = \hat\beta_2 \Delta x_2 + \hat\beta_3 \Delta x_3 + \dots + \hat\beta_j \Delta x_j + \dots + \hat\beta_k \Delta x_k$ (3-26)

where $\Delta\hat y = \hat y_i - \hat y_h$, $\Delta x_2 = x_{2i} - x_{2h}$, $\Delta x_3 = x_{3i} - x_{3h}$, …, $\Delta x_k = x_{ki} - x_{kh}$.
The previous expression captures the variation of $\hat y$ due to the changes in all the regressors. If only $x_j$ changes, we will have
$\Delta\hat y = \hat\beta_j \Delta x_j$ (3-27)
If $x_j$ increases by one unit, we will have
$\Delta\hat y = \hat\beta_j$ for $\Delta x_j = 1$ (3-28)

Consequently, the coefficient $\hat\beta_j$ measures the change in the fitted value of y when $x_j$ increases by 1 unit, holding the regressors $x_2, x_3, \dots, x_{j-1}, x_{j+1}, \dots, x_k$ fixed. It is very important to take into account this ceteris paribus clause when interpreting the coefficient.
This interpretation is not valid, of course, for the intercept.
EXAMPLE 3.1 Quantifying the influence of age and wage on absenteeism in the firm Buenosaires
Buenosaires is a firm devoted to manufacturing fans that has had relatively acceptable results in recent years. The managers consider that these results would have been better if absenteeism in the company were not so high. To study this, the following model is proposed:
$absent = \beta_1 + \beta_2\,age + \beta_3\,tenure + \beta_4\,wage + u$
where absent is measured in days per year, wage in thousands of euros per year, tenure in years in the firm and age in years.
Using a sample of size 48 (file absent), the following equation has been estimated:
$\widehat{absent} = \underset{(1.603)}{14.413} - \underset{(0.048)}{0.096}\,age - \underset{(0.067)}{0.078}\,tenure - \underset{(0.007)}{0.036}\,wage$

$R^2 = 0.694 \quad n = 48$
The interpretation of $\hat\beta_2$ is the following: holding tenure and wage fixed, if age increases by one year, worker absenteeism will be reduced by 0.096 days per year. The interpretation of $\hat\beta_3$ is as follows: holding age and wage fixed, if tenure increases by one year, worker absenteeism will be reduced by 0.078 days per year. Finally, the interpretation of $\hat\beta_4$ is the following: holding age and tenure fixed, if the wage increases by 1000 euros per year, worker absenteeism will be reduced by 0.036 days per year.
EXAMPLE 3.2 Demand for hotel services
The following model is formulated to explain the demand for hotel services:
$\ln(hostel) = \beta_1 + \beta_2 \ln(inc) + \beta_3\,hhsize + u$ (3-29)
where hostel is spending on hotel services and inc is disposable income, both expressed in euros per month. The variable hhsize is the number of household members.
The estimated equation with a sample of 40 households, using file hostel, is the following:
$\widehat{\ln(hostel_i)} = -27.36 + 4.442\,\ln(inc_i) - 0.523\,hhsize_i$

$R^2 = 0.738 \quad n = 40$
As the results show, hotel services are a luxury good: the income elasticity of demand for this good is very high (4.44). This means that if income increases by 1%, spending on hotel services increases by approximately 4.44%, holding the size of the household fixed. On the other hand, if the household size increases by one member, spending on hotel services will decrease by approximately 52%, holding income fixed.
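The percentage readings of coefficients in a model with ln(y) as regressand rely on the approximation 100·β̂. For a coefficient as large in absolute value as −0.523, the exact change implied by the fitted equation, 100·(exp(β̂) − 1), differs noticeably. The following sketch compares the two; the function names are illustrative and not part of the original text.

```python
import math

def approx_pct_change(beta, delta_x=1.0):
    """Approximate % change in y for a change delta_x in x (log-level model)."""
    return 100.0 * beta * delta_x

def exact_pct_change(beta, delta_x=1.0):
    """Exact % change in the fitted y implied by the log-level equation."""
    return 100.0 * (math.exp(beta * delta_x) - 1.0)

beta_hhsize = -0.523  # coefficient of hhsize in Example 3.2
print(approx_pct_change(beta_hhsize))  # the approximation used in the text
print(exact_pct_change(beta_hhsize))   # the exact change, roughly -41%
```

The approximation is adequate for small coefficients, but it overstates decreases and understates increases as the coefficient grows in absolute value.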
EXAMPLE 3.3 A hedonic regression for cars
The hedonic model of price measurement is based on the assumption that the value of a good is derived from the value of its characteristics. Thus, the price of a car will depend on the value the buyer places on both qualitative attributes (e.g. automatic gear, power, diesel, assisted steering, air conditioning) and quantitative attributes (e.g. fuel consumption, weight, performance displacement, etc.). The data set for this exercise is file hedcarsp (hedonic car price for Spain) and covers the years 2004 and 2005. A first model based only on quantitative attributes is the following:
$\ln(price) = \beta_1 + \beta_2\,volume + \beta_3\,fueleff + u$
where volume is length×width×height in m³ and fueleff is the liters per 100 km/horsepower ratio expressed as a percentage.
The estimated equation with a sample of 214 observations is the following:
$\widehat{\ln(price_i)} = 14.97 + 0.0956\,volume_i - 0.1608\,fueleff_i$

$R^2 = 0.765 \quad n = 214$
The interpretation of $\hat\beta_2$ and $\hat\beta_3$ is the following. Holding fueleff fixed, if volume increases by 1 m³, the price of a car will rise by approximately 9.56%. Holding volume fixed, if the ratio liters per 100 km/horsepower increases by 1 percentage point, the price of a car will fall by approximately 16.08%.
EXAMPLE 3.4 Sales and advertising: the case of Lydia E. Pinkham
A model with time series data is estimated in order to measure the effect of advertising expenses, incurred over different time periods, on current sales. Denoting by $V_t$ and $P_t$ sales and advertising expenditures at time t, the model initially proposed to explain sales as a function of current and past advertising expenses is as follows:
$V_t = \alpha + \beta_1 P_t + \beta_2 P_{t-1} + \beta_3 P_{t-2} + \dots + u_t$ (3-30)

In the above expression the dots indicate that past expenditure on advertising continues to have an
indefinite influence, although it is assumed that with a decreasing impact on sales. The above model is not
operational given that it has an indefinite number of coefficients. Two approaches can be adopted in order
to solve the problem. The first approach is to fix a priori the maximum number of periods during which
advertising effects on sales are maintained. In the second approach, the coefficients behave according to
some law which determines their value based on a small number of parameters, also allowing further
simplification.
In the first approach the problem is that, in general, there are no precise criteria or sufficient information to fix a priori the maximum number of periods. For this reason, we shall look at a special case of the second approach that is interesting because its assumption is plausible and it is easy to apply. Specifically, we will consider the case in which the coefficients decrease geometrically as we move backward in time, so that the coefficient of $P_{t-i}$ in (3-30) is
$\beta_{i+1} = \beta_1 \lambda^i \quad i = 0, 1, 2, \dots \quad 0 < \lambda < 1$ (3-31)

The above transformation is called the Koyck transformation, as it was this author who in 1954 introduced scheme (3-31) for the study of investment.
Substituting (3-31) in (3-30), we obtain
$V_t = \alpha + \beta_1 P_t + \beta_1 \lambda P_{t-1} + \beta_1 \lambda^2 P_{t-2} + \dots + u_t$ (3-32)

The above model still has infinitely many terms, but only three parameters, and it can be simplified further. Indeed, if we express equation (3-32) for period t−1 and multiply both sides by λ, we obtain
$\lambda V_{t-1} = \alpha\lambda + \beta_1 \lambda P_{t-1} + \beta_1 \lambda^2 P_{t-2} + \beta_1 \lambda^3 P_{t-3} + \dots + \lambda u_{t-1}$ (3-33)

Subtracting (3-33) from (3-32), and taking into account that the factors $\lambda^i$ tend to 0 as i tends to infinity, the result is the following:
$V_t = \alpha(1 - \lambda) + \beta_1 P_t + \lambda V_{t-1} + u_t - \lambda u_{t-1}$ (3-34)

The model has been simplified so that it only has three regressors, although, in contrast, it now has a compound disturbance term. Before seeing the application of this model, we will analyze the meaning of the coefficient λ and the duration of the effects of advertising expenditures on sales. The parameter λ is the decay rate of the effects of advertising expenditures on current and future sales. The cumulative effect on sales of one monetary unit of advertising expenditure after m periods (the current period and the following m − 1 periods) is given by
$\beta_1 (1 + \lambda + \lambda^2 + \lambda^3 + \dots + \lambda^{m-1})$ (3-35)


To calculate the cumulative sum of effects given in (3-35), we note that this expression is the sum of the terms of a geometric progression (see footnote 2), which can be expressed as follows:
$\frac{\beta_1 (1 - \lambda^m)}{1 - \lambda}$ (3-36)
When m tends to infinity, the sum of the cumulative effects is given by
$\frac{\beta_1}{1 - \lambda}$ (3-37)
An interesting point is to determine how many periods of time are required to obtain a proportion p (e.g., 50%) of the total effect. Denoting by h the number of periods required to obtain this proportion, we have
$p = \frac{\text{Effect in } h \text{ periods}}{\text{Total effect}} = \frac{\beta_1 (1 - \lambda^h) / (1 - \lambda)}{\beta_1 / (1 - \lambda)} = 1 - \lambda^h$ (3-38)
Given p, h can be calculated from (3-38). Solving for h in this expression, the following is obtained:
$h = \frac{\ln(1 - p)}{\ln \lambda}$ (3-39)
This model was used by Kristian S. Palda in his doctoral thesis published in 1964, entitled The
Measurement of Cumulative Advertising Effects, to analyze the cumulative effects of advertising
expenditures in the case of the company Lydia E. Pinkham. This case has been the basis for research on the
effects of advertising expenditures. We will see below some features of this case:
1) The Lydia E. Pinkham Medicine Company manufactured a herbal extract diluted in an alcohol solution. This product was originally advertised as an analgesic and also as a remedy for a wide variety of diseases.
2) In general, in different types of products there is often competition among different brands, as
in the paradigmatic case of Coca-Cola and Pepsi-Cola. When this occurs, the behavior of the main
competitors is taken into account when analyzing the effects of advertising expenditure. Lydia E. Pinkham
had the advantage of having no competitors, acting as a monopolist in practice in its product line.
3) Another feature of the Lydia E. Pinkham case was that most of the distribution costs were
allocated to advertising because the company had no commercial agents, with the relationship between
advertising expenses and sales being very high.
4) The product went through a number of vicissitudes. Thus, in 1914 the Food and Drug Administration (the United States agency that establishes controls for food and medicines) accused the firm of misleading advertising, and so the firm had to change its advertising messages. Also, the Internal Revenue Service (IRS) threatened to apply a tax on alcohol, since the alcohol content of the product was 18%. For all these reasons there were changes in the presentation and content during the period 1915-1925. In 1925 the Food and Drug Administration banned the product from being advertised as a medicine, so that it had to be distributed as a tonic drink. In the period 1926-1940 spending on advertising was significantly increased and shortly afterwards the sales of the product declined.
The estimation of model (3-34) with data from 1907 to 1960, using file pinkham, is the following:
$\widehat{sales}_t = 138.7 + 0.3288\,advexp_t + 0.7593\,sales_{t-1}$

$R^2 = 0.877 \quad n = 53$

Footnote 2: Denoting by $a_p$, $a_u$ and r the first term, the last term and the common ratio respectively, the sum of the terms of a geometric progression is given by $\frac{a_p - a_u r}{1 - r}$.
The sum of the cumulative effects of advertising expenditures on sales is calculated by formula (3-37):
$\frac{\hat\beta_1}{1 - \hat\lambda} = \frac{0.3288}{1 - 0.7593} = 1.3660$
According to this result, every additional dollar spent on advertising produces accumulated total sales of 1.366 units. Since it is important to determine not only the overall effect but also how long the effect lasts, we will now answer the following question: how many periods of time are required to reach half of the total effect? Applying formula (3-39) for the case of p = 0.5, the following result is obtained:
$\hat h(0.5) = \frac{\ln(1 - 0.5)}{\ln(0.7593)} = 2.5172$
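The calculations in (3-36), (3-37) and (3-39) are easy to reproduce. The sketch below uses the estimated coefficients reported for the Pinkham equation; the function names are illustrative only.

```python
import math

def total_effect(beta1, lam):
    """Long-run cumulative effect beta1 / (1 - lambda), as in (3-37)."""
    return beta1 / (1.0 - lam)

def cumulative_effect(beta1, lam, m):
    """Cumulative effect over m periods, beta1 (1 - lambda^m) / (1 - lambda), as in (3-36)."""
    return beta1 * (1.0 - lam ** m) / (1.0 - lam)

def periods_for_share(p, lam):
    """Number of periods h needed to reach a share p of the total effect, as in (3-39)."""
    return math.log(1.0 - p) / math.log(lam)

beta1_hat, lam_hat = 0.3288, 0.7593

total = total_effect(beta1_hat, lam_hat)
h_half = periods_for_share(0.5, lam_hat)
print(round(total, 3), round(h_half, 2))

# Consistency check: after h_half periods, half of the total effect is reached.
share = cumulative_effect(beta1_hat, lam_hat, h_half) / total
print(round(share, 6))
```

Since h is generally not an integer, it is best read as an approximate duration: about two and a half periods are needed to reach half of the total effect.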

3.2.3 Algebraic implications of the estimation


The algebraic implications of the estimation are derived exclusively from the application of the OLS method to the multiple linear regression model:
1. The sum of the OLS residuals is equal to 0:
$\sum_{i=1}^{n} \hat u_i = 0$ (3-40)

From the definition of the residual
$\hat u_i = y_i - \hat y_i = y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \dots - \hat\beta_k x_{ki} \quad i = 1, 2, \dots, n$ (3-41)
If we sum over the n observations, then
$\sum_{i=1}^{n} \hat u_i = \sum_{i=1}^{n} y_i - n\hat\beta_1 - \hat\beta_2 \sum_{i=1}^{n} x_{2i} - \dots - \hat\beta_k \sum_{i=1}^{n} x_{ki}$ (3-42)

On the other hand, the first equation of the system of normal equations (3-20) is
$\sum_{i=1}^{n} y_i - n\hat\beta_1 - \hat\beta_2 \sum_{i=1}^{n} x_{2i} - \dots - \hat\beta_k \sum_{i=1}^{n} x_{ki} = 0$ (3-43)

If we compare (3-42) and (3-43), we conclude that (3-40) holds.

Note that, if (3-40) holds, it implies that
$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat y_i$ (3-44)

and, dividing (3-40) and (3-44) by n, we obtain
$\bar{\hat u} = 0 \qquad \bar y = \bar{\hat y}$ (3-45)
2. The OLS hyperplane always goes through the point of the sample means $(\bar y, \bar x_2, \dots, \bar x_k)$.
Dividing equation (3-43) by n and rearranging, we have:
$\bar y = \hat\beta_1 + \hat\beta_2 \bar x_2 + \dots + \hat\beta_k \bar x_k$ (3-46)


3. The sample cross product between each one of the regressors and the OLS residuals is zero:
$\sum_{i=1}^{n} x_{ji} \hat u_i = 0 \quad j = 2, 3, \dots, k$ (3-47)

Using the last k − 1 normal equations of (3-20) and taking into account that, by definition, $\hat u_i = y_i - \hat\beta_1 - \hat\beta_2 x_{2i} - \hat\beta_3 x_{3i} - \dots - \hat\beta_k x_{ki}$, we can see that
$\sum_{i=1}^{n} \hat u_i x_{2i} = 0$
$\sum_{i=1}^{n} \hat u_i x_{3i} = 0$
$\vdots$
$\sum_{i=1}^{n} \hat u_i x_{ki} = 0$ (3-48)

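4. The sample cross product between the fitted values ($\hat y$) and the OLS residuals is zero:
$\sum_{i=1}^{n} \hat y_i \hat u_i = 0$ (3-49)

Taking into account (3-40) and (3-48), we obtain
$\sum_{i=1}^{n} \hat y_i \hat u_i = \sum_{i=1}^{n} (\hat\beta_1 + \hat\beta_2 x_{2i} + \dots + \hat\beta_k x_{ki}) \hat u_i = \hat\beta_1 \sum_{i=1}^{n} \hat u_i + \hat\beta_2 \sum_{i=1}^{n} x_{2i} \hat u_i + \dots + \hat\beta_k \sum_{i=1}^{n} x_{ki} \hat u_i$
$= \hat\beta_1 \times 0 + \hat\beta_2 \times 0 + \dots + \hat\beta_k \times 0 = 0$ (3-50)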
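The four algebraic implications hold in any sample, whatever the data generating process, as long as the model includes an intercept. A quick numerical check with simulated data (the coefficient values and the random seed below are arbitrary assumptions) might look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3

# Simulated regressors with an intercept column; all values are hypothetical.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
u_hat = y - y_hat

sum_resid = u_hat.sum()                           # (3-40): should be 0
mean_gap = y.mean() - X.mean(axis=0) @ beta_hat   # (3-46): should be 0
cross_x = X[:, 1:].T @ u_hat                      # (3-47): a vector of zeros
cross_fit = y_hat @ u_hat                         # (3-49): should be 0

print(sum_resid, mean_gap, cross_x, cross_fit)
```

All four quantities are zero up to floating-point rounding error.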

3.3 Assumptions and statistical properties of the OLS estimators


Before studying the statistical properties of the OLS estimators in the multiple
linear regression model, we need to formulate a set of statistical assumptions. Specifically,
the set of assumptions that we will formulate are called classical linear model (CLM)
assumptions. It is important to note that CLM assumptions are simple, and that the OLS
estimators have, under these assumptions, very good properties.

3.3.1 Statistical assumptions of the CLM in multiple linear regression

a) Assumption on the functional form


1) The relationship between the regressand, the regressors and the disturbance is linear in the parameters:
$y = \beta_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$ (3-51)
or, alternatively, for all the observations,
$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{u}$ (3-52)

b) Assumptions on the regressors


2) The values of $x_2, x_3, \dots, x_k$ are fixed in repeated sampling or, equivalently, the matrix X is fixed in repeated sampling.
This is a strong assumption in the case of the social sciences where, in general, it is not possible to experiment. An alternative assumption can be formulated as follows:
2*) The regressors $x_2, x_3, \dots, x_k$ are distributed independently of the random disturbance. Formulated in another way, X is distributed independently of the vector of random disturbances, which implies that $E(\mathbf{X}'\mathbf{u}) = 0$.
As we said in chapter 2, we will adopt assumption 2).
3) The matrix of regressors, X, does not contain measurement errors.
4) The matrix of regressors, X, has rank k:
$\rho(\mathbf{X}) = k$ (3-53)
Recall that the matrix of regressors contains k columns, corresponding to the k
regressors in the model, and n rows, corresponding to the number of observations. This
assumption has two implications:
1. The number of observations, n, must be equal to or greater than the number of
regressors, k. Intuitively, to estimate k parameters, we need at least k observations.
2. The regressors must be linearly independent, which implies that an exact linear relationship among any subgroup of regressors cannot exist. If an independent variable is an exact linear combination of other independent variables, then there is perfect multicollinearity, and the model cannot be estimated.
If an approximate linear relationship exists, then estimations of the parameters can
be obtained, although the reliability of such estimations would be affected. In this case,
there is non-perfect multicollinearity.
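The rank condition in assumption 4 can be checked numerically. In the sketch below, with made-up data, the regressor x4 is built as an exact linear combination of the other columns, so the matrix loses full column rank and X′X cannot be inverted.

```python
import numpy as np

ones = np.ones(5)
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Full column rank: k = 3 linearly independent columns.
X_ok = np.column_stack([ones, x2, x3])

# Perfect multicollinearity: x4 is an exact linear combination of the
# intercept, x2 and x3 (a hypothetical construction for illustration).
x4 = 3.0 * ones + 2.0 * x2 - x3
X_bad = np.column_stack([ones, x2, x3, x4])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_bad = np.linalg.matrix_rank(X_bad)
print(rank_ok, X_ok.shape[1])    # rank equals the number of columns
print(rank_bad, X_bad.shape[1])  # rank is smaller than the number of columns
```

With non-perfect multicollinearity the rank would still equal the number of columns, but X′X would be close to singular and the estimates would be imprecise.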

c) Assumption on the parameters


5) The parameters $\beta_1, \beta_2, \beta_3, \dots, \beta_k$ are constant, or β is a constant vector.

d) Assumptions on the disturbances


6) The disturbances have zero mean:
$E(u_i) = 0, \quad i = 1, 2, 3, \dots, n \quad$ or $\quad E(\mathbf{u}) = \mathbf{0}$ (3-54)
7) The disturbances have a constant variance (homoskedasticity assumption):
$var(u_i) = \sigma^2 \quad i = 1, 2, \dots, n$ (3-55)
8) The disturbances with different subscripts are not correlated with each other (no autocorrelation assumption):
$E(u_i u_j) = 0 \quad i \neq j$ (3-56)


The formulation of the homoskedasticity and no autocorrelation assumptions allows us to specify the covariance matrix of the disturbance vector:

$E\{[\mathbf{u} - E(\mathbf{u})][\mathbf{u} - E(\mathbf{u})]'\} = E\{[\mathbf{u} - \mathbf{0}][\mathbf{u} - \mathbf{0}]'\} = E[\mathbf{u}\mathbf{u}']$

$= E\left\{ \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \right\} = E \begin{bmatrix} u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\ u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n u_1 & u_n u_2 & \cdots & u_n^2 \end{bmatrix}$

$= \begin{bmatrix} E(u_1^2) & E(u_1 u_2) & \cdots & E(u_1 u_n) \\ E(u_2 u_1) & E(u_2^2) & \cdots & E(u_2 u_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(u_n u_1) & E(u_n u_2) & \cdots & E(u_n^2) \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix}$ (3-57)

In order to get to the last equality, it has been taken into account that the variance of each element of the vector is constant and equal to $\sigma^2$, in accordance with (3-55), and that the covariance between each pair of elements is 0, in accordance with (3-56).
The previous result can be expressed in synthetic form:
$E(\mathbf{u}\mathbf{u}') = \sigma^2 \mathbf{I}$ (3-58)
The matrix given in (3-58) is called a scalar matrix, since it is a scalar ($\sigma^2$, in this case) multiplied by the identity matrix.
9) The disturbance u is normally distributed.
Taking into account assumptions 6 to 9, we have
$u_i \sim NID(0, \sigma^2) \quad i = 1, 2, \dots, n \quad$ or $\quad \mathbf{u} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$ (3-59)
where NID stands for normally and independently distributed.

3.3.2 Statistical properties of the OLS estimator


Under the above assumptions of the CLM, the OLS estimators possess good
properties. In the proofs of this section, assumptions 3, 4 and 5 will implicitly be used.

Linearity and unbiasedness of the OLS estimator


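Now, we are going to prove that the OLS estimator is linear and unbiased. First, we express $\hat{\boldsymbol\beta}$ as a function of the vector u, using assumption 1, according to (3-52):

$\hat{\boldsymbol\beta} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{y} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'[\mathbf{X}\boldsymbol\beta + \mathbf{u}] = \boldsymbol\beta + [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{u}$ (3-60)

The OLS estimator can be expressed in the following way so that the property of linearity is clearer:

$\hat{\boldsymbol\beta} = \boldsymbol\beta + [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{u} = \boldsymbol\beta + \mathbf{A}\mathbf{u}$ (3-61)

where $\mathbf{A} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'$ is fixed under assumption 2. Thus $\hat{\boldsymbol\beta}$ is a linear function of u and, consequently, it is a linear estimator.

Taking expectations in (3-60) and using assumption 6, we obtain

$E[\hat{\boldsymbol\beta}] = \boldsymbol\beta + [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'E[\mathbf{u}] = \boldsymbol\beta$ (3-62)

Therefore, $\hat{\boldsymbol\beta}$ is an unbiased estimator.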

Variance of the OLS estimators


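In order to calculate the covariance matrix of $\hat{\boldsymbol\beta}$, assumptions 7 and 8 are needed, in addition to the first six assumptions:

$var(\hat{\boldsymbol\beta}) = E\{[\hat{\boldsymbol\beta} - E(\hat{\boldsymbol\beta})][\hat{\boldsymbol\beta} - E(\hat{\boldsymbol\beta})]'\} = E\{[\hat{\boldsymbol\beta} - \boldsymbol\beta][\hat{\boldsymbol\beta} - \boldsymbol\beta]'\}$
$= E\{[\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{u}\mathbf{u}'\mathbf{X}[\mathbf{X}'\mathbf{X}]^{-1}\} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'E(\mathbf{u}\mathbf{u}')\mathbf{X}[\mathbf{X}'\mathbf{X}]^{-1}$
$= [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'(\sigma^2\mathbf{I})\mathbf{X}[\mathbf{X}'\mathbf{X}]^{-1} = \sigma^2[\mathbf{X}'\mathbf{X}]^{-1}$ (3-63)

In the third step of the above proof it is taken into account that, according to (3-60), $\hat{\boldsymbol\beta} - \boldsymbol\beta = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{u}$. Assumption 2 is taken into account in the fourth step. Finally, assumptions 7 and 8 are used in the last step.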

Therefore, $var(\hat{\boldsymbol\beta}) = \sigma^2[\mathbf{X}'\mathbf{X}]^{-1}$ is the covariance matrix of the vector $\hat{\boldsymbol\beta}$. In this covariance matrix, the variance of each element $\hat\beta_j$ appears on the main diagonal, while the covariances between each pair of elements are off the main diagonal. Specifically, the variance of $\hat\beta_j$ (for j=2,3,…,k) is equal to $\sigma^2$ multiplied by the corresponding element of the main diagonal of $[\mathbf{X}'\mathbf{X}]^{-1}$. After operating, the variance of $\hat\beta_j$ can be expressed as

$var(\hat\beta_j) = \frac{\sigma^2}{n S_j^2 (1 - R_j^2)}$ (3-64)

where $R_j^2$ is the R-squared from regressing $x_j$ on all the other x's, n is the sample size and $S_j^2$ is the sample variance of the regressor $x_j$.
Formula (3-64) is valid for all slope coefficients, but not for the intercept.
The square root of (3-64) is called the standard deviation of $\hat\beta_j$:

$sd(\hat\beta_j) = \frac{\sigma}{\sqrt{n S_j^2 (1 - R_j^2)}}$ (3-65)


OLS estimators are BLUE


Under assumptions 1 through 8 of the CLM, which are called the Gauss-Markov assumptions, the OLS estimators are the Best Linear Unbiased Estimators (BLUE).
The Gauss-Markov theorem states that the OLS estimator is the best estimator within the class of linear unbiased estimators. In this context, best means that it is the estimator with the smallest variance for a given sample size. Let us now compare the variance of an element of $\hat{\boldsymbol\beta}$ ($\hat\beta_j$) with that of any other estimator $\tilde\beta_j$ that is linear (so $\tilde\beta_j = \sum_{i=1}^{n} w_{ij} y_i$) and unbiased (so the weights, $w_{ij}$, must satisfy some restrictions). The property of $\hat\beta_j$ being a BLUE estimator has the following implications when comparing its variance with the variance of $\tilde\beta_j$:

1) The variance of the coefficient $\tilde\beta_j$ is greater than, or equal to, the variance of $\hat\beta_j$ obtained by OLS:

$var(\tilde\beta_j) \geq var(\hat\beta_j) \quad j = 1, 2, 3, \dots, k$ (3-66)

2) The variance of any linear combination of the $\tilde\beta_j$'s is greater than, or equal to, the variance of the corresponding linear combination of the $\hat\beta_j$'s.

The proof of the Gauss-Markov theorem can be seen in appendix 3.1.

Estimator of the disturbance variance


Taking into account the system of normal equations (3-20), if we know n−k of the residuals, we can obtain the other k residuals by using the restrictions that this system imposes on the residuals.
For example, the first normal equation allows us to obtain the value of $\hat u_n$ as a function of the remaining residuals:
$\hat u_n = -\hat u_1 - \hat u_2 - \dots - \hat u_{n-1}$
Thus, there are only n−k degrees of freedom in the OLS residuals, as opposed to n degrees of freedom in the disturbances. Remember that the degrees of freedom are defined as the difference between the number of observations and the number of parameters estimated.
The unbiased estimator of $\sigma^2$ is adjusted to take the degrees of freedom into account:
$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat u_i^2}{n - k}$ (3-67)
Under assumptions 1 to 8, we obtain
$E(\hat\sigma^2) = \sigma^2$ (3-68)
See appendix 3.2 for the proof.

The square root of (3-67), $\hat\sigma$, is called the standard error of the regression and is an estimator of $\sigma$.
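The role of the degrees-of-freedom correction in (3-67) can be illustrated with a small simulation. The sketch below is only an illustration under assumed values (n = 20, k = 3, σ² = 4, normal disturbances and an arbitrary seed): it keeps X fixed in repeated sampling, as in assumption 2, and compares the average of RSS/(n−k) with the average of RSS/n over many samples.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, k, sigma2, reps = 20, 3, 4.0, 5000

# Regressors fixed in repeated sampling; the values are hypothetical.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -2.0])

# M = I - X (X'X)^{-1} X' turns y into the vector of OLS residuals.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rss = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = X @ beta + u
    u_hat = M @ y
    rss[r] = u_hat @ u_hat

mean_unbiased = (rss / (n - k)).mean()  # should be close to sigma2
mean_naive = (rss / n).mean()           # should be close to sigma2 * (n - k) / n
print(mean_unbiased, mean_naive)
```

Dividing by n instead of n−k underestimates σ² on average by the factor (n−k)/n, which is why the correction matters most in small samples.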

Estimators of the variances of β̂ and the slope coefficient βˆ j

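The estimator of the covariance matrix of $\hat{\boldsymbol\beta}$ is given by

$\widehat{Var}(\hat{\boldsymbol\beta}) = \hat\sigma^2 [\mathbf{X}'\mathbf{X}]^{-1} = \begin{bmatrix} \widehat{var}(\hat\beta_1) & \widehat{Cov}(\hat\beta_1, \hat\beta_2) & \cdots & \widehat{Cov}(\hat\beta_1, \hat\beta_j) & \cdots & \widehat{Cov}(\hat\beta_1, \hat\beta_k) \\ \widehat{Cov}(\hat\beta_2, \hat\beta_1) & \widehat{var}(\hat\beta_2) & \cdots & \widehat{Cov}(\hat\beta_2, \hat\beta_j) & \cdots & \widehat{Cov}(\hat\beta_2, \hat\beta_k) \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ \widehat{Cov}(\hat\beta_j, \hat\beta_1) & \widehat{Cov}(\hat\beta_j, \hat\beta_2) & \cdots & \widehat{var}(\hat\beta_j) & \cdots & \widehat{Cov}(\hat\beta_j, \hat\beta_k) \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ \widehat{Cov}(\hat\beta_k, \hat\beta_1) & \widehat{Cov}(\hat\beta_k, \hat\beta_2) & \cdots & \widehat{Cov}(\hat\beta_k, \hat\beta_j) & \cdots & \widehat{var}(\hat\beta_k) \end{bmatrix}$ (3-69)

The variance of the slope coefficient $\hat\beta_j$, given in (3-64), is a function of the unknown parameter $\sigma^2$. When $\sigma^2$ is substituted by its estimator $\hat\sigma^2$, an estimator of the variance of $\hat\beta_j$ is obtained:

$\widehat{var}(\hat\beta_j) = \frac{\hat\sigma^2}{n S_j^2 (1 - R_j^2)}$ (3-70)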

According to the previous expression, the estimator of the variance of $\hat\beta_j$ is affected by the following factors:
a) The greater $\hat\sigma^2$, the greater the variance of the estimator. This is not at all surprising: more “noise” in the equation (a larger $\hat\sigma^2$) makes it more difficult to estimate accurately the partial effect of any of the x's on y. (See figure 3.1).
b) As the sample size increases, the variance of the estimator is reduced.
c) The smaller the sample variance of a regressor, the greater the variance of the corresponding coefficient. Everything else being equal, for estimating $\beta_j$ we prefer to have as much sample variation in $x_j$ as possible, which is illustrated in figure 3.2. As part a) of the figure shows, there are many hypothetical lines that could fit the data when the sample variance of $x_j$ ($S_j^2$) is small. In any case, assumption 4 does not allow $S_j^2$ to be equal to 0.
d) The higher $R_j^2$ (i.e., the higher the correlation of regressor j with the rest of the regressors), the greater the variance of $\hat\beta_j$.

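[Figure 3.1: two panels with y on the vertical axis and $x_j$ on the horizontal axis; a) $\hat\sigma^2$ big, b) $\hat\sigma^2$ small.]
FIGURE 3.1. Influence of $\hat\sigma^2$ on the estimator of the variance.

[Figure 3.2: two panels with y on the vertical axis and $x_j$ on the horizontal axis; a) $S_j^2$ small, b) $S_j^2$ big.]
FIGURE 3.2. Influence of $S_j^2$ on the estimator of the variance.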

The square root of (3-70) is called the standard error of $\hat\beta_j$:

$se(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{n S_j^2 (1 - R_j^2)}}$ (3-71)
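The equivalence between the matrix expression (3-69) and the scalar formula (3-71) can be verified numerically. The sketch below uses simulated data with arbitrary, assumed coefficient values, and it takes $S_j^2$ as the sample variance computed with divisor n, so that $n S_j^2$ is the sum of squared deviations of $x_j$ from its mean.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 3
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)   # correlated with x2 on purpose
X = np.column_stack([np.ones(n), x2, x3])
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)

# OLS estimates, residuals and the estimator of sigma^2 in (3-67).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k)

# Standard errors from the estimated covariance matrix (3-69).
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)
se_matrix = np.sqrt(np.diag(cov_hat))

# Standard error of the coefficient of x2 from formula (3-71):
# regress x2 on the remaining regressors (intercept and x3) to get R_j^2.
Z = np.column_stack([np.ones(n), x3])
g = np.linalg.solve(Z.T @ Z, Z.T @ x2)
e = x2 - Z @ g
S2_j = x2.var()                      # sample variance with divisor n
R2_j = 1.0 - (e @ e) / (n * S2_j)
se_formula = np.sqrt(sigma2_hat / (n * S2_j * (1.0 - R2_j)))

print(se_matrix[1], se_formula)
```

The two numbers coincide up to rounding error; increasing the correlation between x2 and x3 raises $R_j^2$ and therefore the standard error.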

Other properties of the OLS estimators


Under CLM assumptions 1 through 6, the OLS estimator $\hat{\boldsymbol\beta}$ is consistent, as can be seen in appendix 3.3, asymptotically normally distributed and also asymptotically efficient within the class of consistent and asymptotically normal estimators.
Under CLM assumptions 1 through 9, the OLS estimator is also the maximum likelihood (ML) estimator, as can be seen in appendix 3.4, and the minimum variance unbiased estimator (MVUE). This means that the OLS estimator has the smallest variance among all unbiased estimators, linear or nonlinear.

3.4 More on functional forms


In this section we will examine two topics on functional forms: use of natural logs
in models and polynomial functions.

3.4.1 Use of logarithms in the econometric models


Some variables are often used in log form. This is the case of variables in monetary terms, which are generally positive, or variables with high values such as population. Using models with log transformations also has advantages, one of which is that the coefficients have appealing interpretations (elasticity or semi-elasticity). Another advantage is the invariance of slopes to scale changes in the variables. Taking logs is also very useful because it narrows the range of variables, which makes estimates less sensitive to extreme observations on the dependent or the independent variables. The CLM assumptions are satisfied more often in models using ln(y) as a regressand than in models using y without any transformation. Thus, the conditional distribution of y is frequently heteroskedastic, while that of ln(y) can be homoskedastic.
One limitation of the log transformation is that it cannot be used when the original variable takes zero or negative values. On the other hand, variables measured in years and variables that are a proportion or a percentage are often used in level (or original) form.

3.4.2 Polynomial functions


Polynomial functions have been extensively used in econometric research. When the model contains only the regressors corresponding to a polynomial function, we have a polynomial model. The general kth degree polynomial model may be written as
$y = \beta_1 + \beta_2 x + \beta_3 x^2 + \dots + \beta_k x^k + u$ (3-72)

Quadratic functions
An interesting case of polynomial functions is the quadratic function, which is a second-degree polynomial function. When the model contains only the regressors corresponding to the quadratic function, we have a quadratic model:
$y = \beta_1 + \beta_2 x + \beta_3 x^2 + u$ (3-73)
Quadratic functions are used quite often in applied economics to capture decreasing or increasing marginal effects. It is important to remark that, in such a case, $\beta_2$ does not measure the change in y with respect to x, because it makes no sense to hold $x^2$ fixed while changing x. The marginal effect of x on y, which depends linearly on the value of x, is the following:
$me = \frac{dy}{dx} = \beta_2 + 2\beta_3 x$ (3-74)
In a particular application this marginal effect would be evaluated at specific values of x. If $\beta_2$ and $\beta_3$ have opposite signs, the turning point will be at
$x^* = -\frac{\beta_2}{2\beta_3}$ (3-75)
If $\beta_2 > 0$ and $\beta_3 < 0$, then the marginal effect of x on y is positive at first, but it will be negative for values of x greater than $x^*$. If $\beta_2 < 0$ and $\beta_3 > 0$, this marginal effect is negative at first, but it will be positive for values of x greater than $x^*$.
Example 3.5 Salary and tenure
Using the data in ceosal2 to study the type of relation between the salary of Chief Executive Officers (CEOs) in US corporations and the number of years in the company as CEO (ceoten), the following model was estimated:
$\widehat{\ln(salary)} = \underset{(0.086)}{6.246} + \underset{(0.0001)}{0.0006}\,profits + \underset{(0.0156)}{0.0440}\,ceoten - \underset{(0.00052)}{0.0012}\,ceoten^2$

$R^2 = 0.1976 \quad n = 177$
where company profits are in millions of dollars and salary is annual compensation in thousands of dollars.
The marginal effect of ceoten on salary, expressed as a percentage, is the following:
$\widehat{me}_{salary/ceoten}\,\% = 4.40 - 2 \times 0.12\,ceoten$

Thus, if a CEO with 10 years in a company spends one more year in that company, their salary will increase by approximately 2%. Setting the previous expression equal to zero and solving for ceoten, we find that the effect of tenure as CEO on salary reaches its maximum at about 18 years. That is, up to 18 years the marginal effect of CEO tenure on salary is positive. On the contrary, from 18 years onwards this marginal effect is negative.
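The two calculations in this example follow directly from (3-74) and (3-75). A minimal sketch, using the coefficients reported above (the function names are illustrative):

```python
def marginal_effect_pct(b_lin, b_quad, x):
    """Marginal effect of x on ln(y), in percent: 100 * (b_lin + 2 * b_quad * x)."""
    return 100.0 * (b_lin + 2.0 * b_quad * x)

def turning_point(b_lin, b_quad):
    """Value of x at which the marginal effect is zero, as in (3-75)."""
    return -b_lin / (2.0 * b_quad)

b_ceoten, b_ceoten2 = 0.0440, -0.0012

me_10 = marginal_effect_pct(b_ceoten, b_ceoten2, 10)   # effect at 10 years of tenure
x_star = turning_point(b_ceoten, b_ceoten2)            # tenure at the turning point
print(round(me_10, 2), round(x_star, 2))
```

The turning point is about 18.3 years, which the text rounds to 18.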

Cubic functions
Another interesting case is the cubic function, or third-degree polynomial function. If the model contains only the regressors corresponding to the cubic function, we have a cubic model:
$y = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3 + u$ (3-76)
Cubic models are used quite often in applied economics to capture decreasing or increasing marginal effects, particularly in cost functions. The marginal effect (me) of x on y, which depends on x in a quadratic form, is the following:
$me = \frac{dy}{dx} = \beta_2 + 2\beta_3 x + 3\beta_4 x^2$ (3-77)
The minimum of me will occur where
$\frac{d\,me}{dx} = 2\beta_3 + 6\beta_4 x = 0$ (3-78)
Therefore, me reaches its minimum at
$x = -\frac{\beta_3}{3\beta_4}$ (3-79)
In a cubic model of a cost function, the restriction $\beta_3^2 < 3\beta_4\beta_2$ must be met to guarantee that the minimum marginal cost is positive. Other restrictions that a cost function must satisfy are as follows: $\beta_1$, $\beta_2$ and $\beta_4 > 0$, and $\beta_3 < 0$.
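The restriction on the coefficients can be obtained by evaluating the marginal effect (3-77) at the point given in (3-79). The following short derivation is a sketch that assumes $\beta_4 > 0$, so that the second derivative $6\beta_4$ is positive and the point is indeed a minimum:

```latex
\begin{aligned}
me\left(-\frac{\beta_3}{3\beta_4}\right)
  &= \beta_2 + 2\beta_3\left(-\frac{\beta_3}{3\beta_4}\right)
     + 3\beta_4\left(\frac{\beta_3^2}{9\beta_4^2}\right) \\
  &= \beta_2 - \frac{2\beta_3^2}{3\beta_4} + \frac{\beta_3^2}{3\beta_4}
   = \beta_2 - \frac{\beta_3^2}{3\beta_4} \\[4pt]
\beta_2 - \frac{\beta_3^2}{3\beta_4} > 0
  &\iff \beta_3^2 < 3\beta_4\beta_2 \qquad (\beta_4 > 0)
\end{aligned}
```

The minimum marginal cost is therefore positive exactly when the stated restriction holds.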
Example 3.6 The marginal effect in a cost function
Using the data on 11 pulp mill firms (file costfunc) to study the cost function, the following model was estimated:
$\widehat{cost} = \underset{(1.602)}{29.16} + \underset{(0.2167)}{2.316}\,output - \underset{(0.0081)}{0.0914}\,output^2 + \underset{(0.000086)}{0.0013}\,output^3$

$R^2 = 0.9984 \quad n = 11$
where output is the production of pulp in thousands of tons and cost is the total cost in millions of euros.
The marginal cost is the following:
$\widehat{marcost} = 2.316 - 2 \times 0.0914\,output + 3 \times 0.0013\,output^2$
Thus, if a firm with a production of 30 thousand tons of pulp increases its production by one thousand tons, the cost will increase by 0.754 million euros. Calculating the minimum of the above expression and solving for output, we find that the minimum marginal cost is reached at a production of 23.222 thousand tons of pulp.


3.5 Goodness-of-fit and selection of regressors.


Once least squares have been applied, it is very useful to have some measure of
the goodness of fit between the model and the data. In the event that several alternative
models have been estimated, measures of the goodness of fit could be used to select the
most appropriate model.
In the econometric literature there are numerous measures of goodness of fit. The most popular are the coefficient of determination, which is designated by $R^2$ or R-squared, and the adjusted coefficient of determination, which is designated by $\bar R^2$ or adjusted R-squared. Given that these measures have some limitations, the Akaike Information Criterion (AIC) and the Schwarz Criterion (SC) will also be referred to later on.

3.5.1 Coefficient of determination


As we saw in chapter 2, the coefficient of determination is based on the following breakdown:
$TSS = ESS + RSS$ (3-80)
where TSS is the total sum of squares, ESS is the explained sum of squares and RSS is the residual sum of squares.
Based on this breakdown, the coefficient of determination is defined as:
$R^2 = \frac{ESS}{TSS}$ (3-81)
Alternatively, and in an equivalent manner, the coefficient of determination can be defined as
$R^2 = 1 - \frac{RSS}{TSS}$ (3-82)
The extreme values of the coefficient of determination are: 0, when the explained variance is zero, and 1, when the residual variance is zero; that is, when the fit is perfect. Therefore,
$0 \leq R^2 \leq 1$ (3-83)
A small R2 implies that the disturbance variance (σ2) is large relative to the
variance of y, which means that β j is not estimated with precision. But remember that a
large disturbance variance can be offset by a large sample size. Thus, if n is large enough,
we may be able to estimate the coefficients with precision even though we have not
controlled for many unobserved factors.
To interpret the coefficient of determination properly, the following caveats
should be taken into account:
a) As new explanatory variables are added, the coefficient of determination
increases its value or, at least, keeps the same value. This happens even though the
variable (or variables) added have no relation to the endogenous variable. Thus, we can
always verify that
R 2j ³ R 2j- 1 (3-84)

85
INTRODUCTION TO ECONOMETRICS

where $R_{j-1}^2$ is the R-squared in a model with j−1 regressors, and $R_j^2$ is the R-squared in a
model with an additional regressor. That is to say, if we add variables to a given model,
$R^2$ will never decrease, even if these variables do not have a significant influence.
b) If the model has no intercept, the coefficient of determination does not have a
clear interpretation because the decomposition given in (3-80) does not hold. In addition,
the two forms of calculation mentioned, (3-81) and (3-82), generally lead to different
results, which in some cases may fall outside the interval [0, 1].
c) The coefficient of determination cannot be used to compare models in which
the functional form of the endogenous variable is different. For example, $R^2$ cannot be
applied to compare two models in which the regressands are the original variable, y, and
ln(y), respectively.
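
Caveats a) and b) can be checked numerically. The following is a minimal sketch on simulated data (the data-generating process, the seed and the variable names are illustrative assumptions, not taken from the text): it computes $R^2$ with both (3-81) and (3-82), adds a regressor unrelated to y, and refits the model without an intercept.

```python
import numpy as np

def r_squared(y, X):
    """Return (ESS/TSS, 1 - RSS/TSS), i.e. formulas (3-81) and (3-82), for an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    ess = np.sum((fitted - y.mean()) ** 2)
    rss = np.sum((y - fitted) ** 2)
    return ess / tss, 1.0 - rss / tss

# Simulated data (an assumption made only for this illustration)
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
noise = rng.normal(size=n)               # regressor unrelated to y by construction
y = 1.0 + 2.0 * x + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x])   # intercept + x
X_big = np.column_stack([X_small, noise])    # intercept + x + irrelevant regressor
X_noint = x.reshape(-1, 1)                   # x only, no intercept

r2_a, r2_b = r_squared(y, X_small)
r2_big, _ = r_squared(y, X_big)
r2_noint_a, r2_noint_b = r_squared(y, X_noint)

print("with intercept:   ", r2_a, r2_b)              # the two formulas agree
print("extra regressor:  ", r2_big)                  # never below r2_a
print("without intercept:", r2_noint_a, r2_noint_b)  # the two formulas differ
```

With an intercept the two formulas coincide and the irrelevant regressor cannot lower $R^2$; without an intercept the decomposition (3-80) fails and the two formulas give different numbers.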

3.5.2 Adjusted R-Squared


To overcome one of the limitations of the $R^2$, we can “adjust” it in a way that takes
into account the number of variables included in a given model. To see how the usual $R^2$
might be adjusted, it is useful to write it as
$$R^2 = 1 - \frac{RSS/n}{TSS/n} \tag{3-85}$$
where, in the second term of the right-hand side, the residual variance is divided by the
variance of the regressand.
The $R^2$, as it is defined in (3-85), is a sample measure. Now, if we want a
population measure, we can define the population $R^2$ as
$$R_{POP}^2 = 1 - \frac{\sigma_u^2}{\sigma_y^2} \tag{3-86}$$

However, we have better estimates for these variances, $\sigma_u^2$ and $\sigma_y^2$, than the ones
used in (3-85). So, let us use unbiased estimates for these variances:
$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - \frac{n-1}{n-k}\,(1 - R^2) \tag{3-87}$$

This measure is called the adjusted R-squared, or $\bar{R}^2$. The primary attractiveness
of $\bar{R}^2$ is that it imposes a penalty for adding additional regressors to a model. If a
regressor is added to the model, then RSS decreases, or at least stays the same. On the other hand,
the degrees of freedom of the regression, n−k, always decrease. Consequently, $\bar{R}^2$ can go up or down
when a new regressor is added to the model. That is to say:
$$\bar{R}_j^2 \ge \bar{R}_{j-1}^2 \quad \text{or} \quad \bar{R}_j^2 \le \bar{R}_{j-1}^2 \tag{3-88}$$

An interesting algebraic fact is that if we add a new regressor to a model, $\bar{R}^2$
increases if, and only if, the t statistic, which we will examine in chapter 4, on the new
regressor is greater than 1 in absolute value. Thus we see immediately that $\bar{R}^2$ could be
used to decide whether a certain additional regressor must be included in the model. The
$\bar{R}^2$ has an upper bound that is equal to 1, but it does not strictly have a lower bound since
it can take negative values.

The observations b) and c) made about the R-squared remain valid for the adjusted
R-squared.
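
As a check on (3-87), the short sketch below recomputes the adjusted R-squared of some of the models of example 3.7 from the R-squared values reported in table 3.1. The function name and the dictionary layout are illustrative choices; the number of coefficients k assigned to each model follows the model specifications of that example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared as in (3-87): 1 - (n - 1)/(n - k) * (1 - R^2)."""
    return 1.0 - (n - 1) / (n - k) * (1.0 - r2)

n = 40  # households in the sample of example 3.7

# model number -> (R-squared reported in table 3.1, number of coefficients k)
models = {1: (0.4584, 2), 3: (0.5599, 3), 4: (0.5531, 2), 8: (-0.6813, 2)}

adjusted = {m: round(adjusted_r2(r2, n, k), 4) for m, (r2, k) in models.items()}
print(adjusted)
```

The rounded results reproduce the adjusted R-squared row of table 3.1 for these models, including the negative value of model 8.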

3.5.3 Akaike information criterion (AIC) and Schwarz criterion (SC)


These two criteria, the Akaike information criterion (AIC) and the Schwarz criterion (SC),
have a very similar structure. For this reason, they will be reviewed together.
The AIC statistic, proposed by Akaike (1974) and based on information theory,
has the following expression:
$$AIC = -\frac{2l}{n} + \frac{2k}{n} \tag{3-89}$$
where l is the log likelihood function (assuming normally distributed disturbances)
evaluated at the estimated values of the coefficients.
The SC statistic, proposed by Schwarz (1978), has the following expression:
$$SC = -\frac{2l}{n} + \frac{k\ln(n)}{n} \tag{3-90}$$
The AIC and SC statistics, unlike the coefficients of determination ($R^2$ and $\bar{R}^2$),
are better the lower their values are. It is important to remark that the AIC and SC statistics
are not bounded, unlike $R^2$.
a) The AIC and SC statistics penalize the introduction of new regressors. In the
case of the AIC, as can be seen in the second term of the right hand side of (3-89), the
number of regressors k appears in the numerator. Therefore, the growth of k will increase
the value of AIC and consequently worsen the goodness of fit, if that is not offset by a
sufficient growth of the log likelihood. In the case of the SC, as can be seen in the second
term of the right hand side of (3-90), the numerator is k ln(n). For n > 7, it holds that
k ln(n) > 2k. Therefore, SC imposes a larger penalty for additional regressors than
AIC when the sample size is greater than seven.
b) The AIC and SC statistics can be applied to statistical models without intercept.
c) The AIC and SC statistics are not relative measures as are the coefficients of
determination. Therefore, their magnitude, in itself, offers no information.
d) The AIC and SC statistics can be applied to compare models in which the
endogenous variables have different functional forms. In particular, we will compare two
models in which the regressands are y and ln(y). When the regressand is y, formula
(3-89) is applied in the AIC case, or (3-90) in the SC case. When the regressand is ln(y),
and we want to carry out a comparison with another model in which the
regressand is y, we must correct these statistics in the following way:
$$AIC_C = AIC + 2\,\overline{\ln(Y)} \tag{3-91}$$

$$SC_C = SC + 2\,\overline{\ln(Y)} \tag{3-92}$$

where $AIC_C$ and $SC_C$ are the corrected statistics, $\overline{\ln(Y)}$ is the sample mean of ln(Y), and AIC and SC
are the statistics supplied by an econometric package such as EViews.
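
The structure of (3-89) and (3-90) can be sketched in a few lines. The sketch assumes that l is the Gaussian log likelihood evaluated at the OLS coefficients and at the variance estimate RSS/n, which gives $l = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln(RSS/n)\right]$; the function names and the numerical inputs are hypothetical.

```python
import math

def gaussian_loglik(rss, n):
    """Gaussian log likelihood at the OLS coefficients and sigma^2 = RSS/n (assumed convention)."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(rss / n))

def aic(loglik, n, k):
    """Akaike information criterion as in (3-89)."""
    return -2.0 * loglik / n + 2.0 * k / n

def sc(loglik, n, k):
    """Schwarz criterion as in (3-90)."""
    return -2.0 * loglik / n + k * math.log(n) / n

# Hypothetical inputs, chosen only to illustrate the penalty terms
n, k, rss = 40, 3, 120.0
l = gaussian_loglik(rss, n)
print("AIC:", aic(l, n, k), "SC:", sc(l, n, k))

# The SC penalty exceeds the AIC penalty as soon as ln(n) > 2, that is, from n = 8 onwards
for size in (7, 8):
    print(size, k * math.log(size) > 2 * k)
```

For the same log likelihood, SC is above AIC whenever n is greater than seven, in line with point a) above.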

Example 3.7 Selection of the best model


To analyze the determinants of expenditures on dairy, the following alternative models have been
considered:
1) $dairy = \beta_1 + \beta_2 inc + u$
2) $dairy = \beta_1 + \beta_2 \ln(inc) + u$
3) $dairy = \beta_1 + \beta_2 inc + \beta_3 punder5 + u$
4) $dairy = \beta_2 inc + \beta_3 punder5 + u$
5) $dairy = \beta_1 + \beta_2 inc + \beta_3 hhsize + u$
6) $\ln(dairy) = \beta_1 + \beta_2 inc + u$
7) $\ln(dairy) = \beta_1 + \beta_2 inc + \beta_3 punder5 + u$
8) $\ln(dairy) = \beta_2 inc + \beta_3 punder5 + u$
where inc is the disposable income of the household, hhsize is the number of household members and punder5 is
the proportion of children under five in the household.
Using a sample of 40 households (file demand), and taking into account that $\overline{\ln(dairy)} = 2.3719$,
the goodness of fit statistics obtained for the eight models appear in table 3.1. In particular, the corrected AIC
for model 6) has been calculated as follows:
$$AIC_C = AIC + 2\,\overline{\ln(Y)} = 0.2794 + 2 \times 2.3719 = 5.0232$$
Conclusions
a) The R-squared can only be used to compare the following pairs of models: 1) with 2), and 3) with
5).
b) The adjusted R-squared can only be used to compare model 1) with 2), 3) and 5); and 6) with 7).
c) The best model out of the eight is model 7) according to AIC and SC.

TABLE 3.1. Measures of goodness of fit for eight models.

| Model number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Regressand | dairy | dairy | dairy | dairy | dairy | ln(dairy) | ln(dairy) | ln(dairy) |
| Regressors | intercept, inc | intercept, ln(inc) | intercept, inc, punder5 | inc, punder5 | intercept, inc, hhsize | intercept, inc | intercept, inc, punder5 | inc, punder5 |
| R-squared | 0.4584 | 0.4567 | 0.5599 | 0.5531 | 0.4598 | 0.4978 | 0.5986 | -0.6813 |
| Adjusted R-squared | 0.4441 | 0.4424 | 0.5361 | 0.5413 | 0.4306 | 0.4846 | 0.5769 | -0.7255 |
| Akaike information criterion | 5.2374 | 5.2404 | 5.0798 | 5.0452 | 5.2847 | 0.2794 | 0.1052 | 1.4877 |
| Schwarz criterion | 5.3219 | 5.3249 | 5.2065 | 5.1296 | 5.4113 | 0.3638 | 0.2319 | 1.5721 |
| Corrected Akaike information criterion | | | | | | 5.0232 | 4.8490 | 6.2314 |
| Corrected Schwarz criterion | | | | | | 5.1076 | 4.9756 | 6.3159 |
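
Conclusion c) can be reproduced from the figures of table 3.1. The sketch below puts the eight models on a comparable scale by applying the corrections (3-91) and (3-92) to the models whose regressand is ln(dairy), and then selects the minimum; the dictionary layout and the helper name are illustrative choices.

```python
MEAN_LOG_DAIRY = 2.3719            # sample mean of ln(dairy) given in example 3.7
LOG_MODELS = {6, 7, 8}             # models whose regressand is ln(dairy)

# AIC and SC as reported in table 3.1 (model number -> statistic)
aic = {1: 5.2374, 2: 5.2404, 3: 5.0798, 4: 5.0452,
       5: 5.2847, 6: 0.2794, 7: 0.1052, 8: 1.4877}
sc = {1: 5.3219, 2: 5.3249, 3: 5.2065, 4: 5.1296,
      5: 5.4113, 6: 0.3638, 7: 0.2319, 8: 1.5721}

def comparable(stats):
    """Apply the correction stat + 2 * mean(ln y) to the log models only."""
    return {m: v + 2.0 * MEAN_LOG_DAIRY if m in LOG_MODELS else v
            for m, v in stats.items()}

aic_c = comparable(aic)
sc_c = comparable(sc)

best_aic = min(aic_c, key=aic_c.get)
best_sc = min(sc_c, key=sc_c.get)
print("best model by AIC:", best_aic, "best model by SC:", best_sc)
```

Both criteria select model 7), whose corrected values are the lowest of the eight.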

Exercises
Exercise 3.1 Consider the linear regression model $y = X\beta + u$, where X is a 50×5
matrix.
Answer the following questions, justifying your answers:
a) What are the dimensions of the vectors y, β and u?
b) How many equations are there in the system of normal equations
$X'X\hat{\beta} = X'y$?
c) What conditions are needed in order to obtain $\hat{\beta}$?

Exercise 3.2 Given the model
$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i$$
and the following data:
y x2 x3
10 1 0
25 3 -1
32 4 0
43 5 1
58 7 -1
62 8 0
67 10 -1
71 10 2
a) Estimate $\beta_1$, $\beta_2$ and $\beta_3$ by OLS.
b) Calculate the residual sum of squares.
c) Obtain the residual variance.
d) Obtain the variance explained by the regression.
e) Obtain the variance of the endogenous variable.
f) Calculate the coefficient of determination.
g) Obtain an unbiased estimation of $\sigma^2$.
h) Estimate the variance of $\hat{\beta}_2$.
To answer these questions you can use Excel. See exhibit 3.1 as an example.

Exhibit 3.1
1) Calculation of X’X and X’y

Explanation for X’X


a) Enter the matrices X’ and X into Excel, in the ranges B5:K6 and N2:O11.
b) To find the product X’X, first highlight the cells where you want to place the resulting matrix.
c) While those cells are still highlighted, enter the following
formula: =MMULT(B5:K6; N2:O11)
d) When the formula is entered, press the Ctrl key and the Shift key simultaneously. Then, holding these two keys,
press the Enter key too.
2) Calculation of $(X'X)^{-1}$

a) Enter the matrix X’X into Excel, in the range R5:S6.
b) To find the inverse of matrix X’X, first highlight the cells where you want to place the resulting
matrix (R5:S6).
c) While those cells are still highlighted, enter the following
formula: =MINVERSE(R5:S6)
d) When the formula is entered, press the Ctrl key and the Shift key simultaneously. Then, holding these two keys,
press the Enter key too.
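3) Calculation of vector $\hat{\beta}$

4) Calculation of $\hat{u}'\hat{u}$ and $\hat{\sigma}^2$

$$\hat{u}'\hat{u} = y'y - \hat{y}'\hat{y} = y'y - \hat{\beta}'X'X\hat{\beta} = y'y - \hat{\beta}'X'y = R.5 - R.6 \approx 953 - 883 = 70$$
$$\hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{n-2} = \frac{69.5942}{8} = 8.6993$$
where the unrounded value $\hat{u}'\hat{u} = 69.5942$ is used in the numerator.
5) Calculation of covariance matrix of $\hat{\beta}$
$$\widehat{var}(\hat{\beta}) = \hat{\sigma}^2 \left[X'X\right]^{-1} = 8.6993 \begin{pmatrix} 3.8696 & -0.0370 \\ -0.0370 & 0.0004 \end{pmatrix} = \begin{pmatrix} 33.6624 & -0.3215 \\ -0.3215 & 0.0032 \end{pmatrix}$$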
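
The same five steps can be carried out outside a spreadsheet. The following sketch is an illustrative alternative to the Excel procedure, written with NumPy; it uses the sales and rpi columns of the table in exercise 3.3, which are the data behind this exhibit, and the variable names are arbitrary.

```python
import numpy as np

# Data of exercise 3.3 (columns sales and rpi for the ten firms)
sales = np.array([10, 8, 7, 6, 13, 6, 12, 7, 9, 15], dtype=float)
rpi = np.array([100, 110, 130, 100, 80, 80, 90, 120, 120, 90], dtype=float)

X = np.column_stack([np.ones_like(rpi), rpi])  # intercept and rpi
y = sales
n, k = X.shape

XtX = X.T @ X                      # step 1: X'X
Xty = X.T @ y                      # step 1: X'y
XtX_inv = np.linalg.inv(XtX)       # step 2: (X'X)^-1
beta_hat = XtX_inv @ Xty           # step 3: OLS coefficients
rss = y @ y - beta_hat @ Xty       # step 4: u'u = y'y - beta'X'y
sigma2_hat = rss / (n - k)         # step 4: unbiased variance estimate
cov_beta = sigma2_hat * XtX_inv    # step 5: covariance matrix of beta_hat

print("beta_hat:", beta_hat)
print("RSS:", rss, "sigma2_hat:", sigma2_hat)
print("cov(beta_hat):")
print(cov_beta)
```

The output agrees with the figures of the exhibit up to rounding: the residual sum of squares is about 69.59 and the estimated variance about 8.6993.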

Exercise 3.3 The following model was formulated to explain the annual sales (sales) of
the manufacturers of household cleaning products as a function of a relative price index
(rpi) and the advertising expenditures (adv):
$$sales = \beta_1 + \beta_2 rpi + \beta_3 adv + u$$
where the variable sales is expressed in thousands of millions of euros and rpi is a relative price
index obtained as the ratio between the prices of each firm and the prices of firm 1 of the
sample; adv is the annual expenditure on advertising, promotional campaigns and
media diffusion, expressed in millions of euros.
Data on ten manufacturers of household cleaning products appear in the attached
table.
firm sales rpi adv
1 10 100 300
2 8 110 400
3 7 130 600
4 6 100 100
5 13 80 300
6 6 80 100
7 12 90 600
8 7 120 200
9 9 120 400
10 15 90 700
Using an Excel spreadsheet,
a) Estimate the parameters of the proposed model.
b) Estimate the covariance matrix.
c) Calculate the coefficient of determination.
Note: In exhibit 3.1 the model $sales = \beta_1 + \beta_2 rpi + u$ is estimated using Excel.
Instructions are also included.
Exercise 3.4 A researcher, who is developing an econometric model to explain income,
formulates the following specification:
inc=α+βcons+γsave+u [1]
where inc is the household disposable income, cons is the total consumption and save is
the total savings of the household.
The researcher did not take into account that the above three magnitudes are
related by the identity
inc=cons+save [2]
The equivalence between models [1] and [2] requires that, in addition to the
disappearance of the disturbance term, the parameters of model [1] take the following values:
α = 0, β = 1, and γ = 1
If you estimate equation [1] with the data for a given country, can you expect, in
general, that the estimates will take the values $\hat{\alpha} = 0$, $\hat{\beta} = 1$, $\hat{\gamma} = 1$?
Please justify your answer using mathematical notation.
Exercise 3.5 A researcher proposes the following econometric model to explain tourism
revenue (turtot) in a given country:
$$turtot = \beta_1 + \beta_2 turmean + \beta_3 numtur + u$$
where turmean is the average expenditure per tourist and numtur is the total number of
tourists.
a) It is obvious that turtot, numtur and turmean are also linked by the
relationship turtot=turmean×numtur. Will this somehow affect the
estimation of the parameters of the proposed model?
b) Is there a model with another functional form involving tighter restrictions
on the parameters? If so, indicate it.
c) What is your opinion about using the proposed model to explain the
behavior of tourism revenue? Is it reasonable?
Exercise 3.6 Let us suppose you have to estimate the model
$$\ln(y) = \beta_1 + \beta_2 \ln(x_2) + \beta_3 \ln(x_3) + \beta_4 \ln(x_4) + u$$
using the following observations:
x2 x3 x4
3 12 4
2 10 5
4 4 1
3 9 3
2 6 3
5 5 1
What problems can arise in the estimation of this model?
Exercise 3.7 Answer the following questions:
a) Explain the coefficient of determination ($R^2$) and the adjusted coefficient of
determination ($\bar{R}^2$). What can you use them for? Justify your answer.
b) Given the models
$\ln(y) = \beta_1 + \beta_2 \ln(x) + u$ (1)
$\ln(y) = \beta_1 + \beta_2 \ln(x) + \beta_3 \ln(z) + u$ (2)
$\ln(y) = \beta_1 + \beta_2 \ln(z) + u$ (3)
$y = \beta_1 + \beta_2 z + u$ (4)
indicate what measure of goodness of fit is appropriate to compare the
following pairs of models: (1) - (2), (1) - (3), and (1) - (4). Explain your
answer.
Exercise 3.8 Let us suppose that the following model is estimated by OLS:
$$\ln(y) = \beta_1 + \beta_2 \ln(x) + \beta_3 \ln(z) + u$$
a) Can the least squares residuals all be positive? Explain your answer.
b) Under the assumption of no autocorrelation of disturbances, are the OLS
residuals independent? Explain your answer
c) Assuming that the disturbances are not normally distributed, will the OLS
estimators be unbiased? Explain your answer.
Exercise 3.9 Consider the linear regression model
$$y = X\beta + u$$
where y and u are 8×1 vectors, X is an 8×3 matrix and β is a 3×1 vector. The following
information is also available:
$$X'X = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \qquad \hat{u}'\hat{u} = 22$$
Answer the following questions, justifying your answers:
a) Indicate the sample size, the number of regressors, the number of
parameters and the degrees of freedom of the residual sum of squares.
b) Derive the covariance matrix of the vector $\hat{\beta}$, making explicit the
assumptions used. Estimate the variances of the estimators.
c) Does the regression have an intercept? What implications does the answer
to this question have on the meaning of $R^2$ in this model?
Exercise 3.10 Discuss whether the following statements are true or false:
a) In a linear regression model, the sum of the residuals is zero.
b) The coefficient of determination ($R^2$) is always a good measure of the
model’s quality.
c) The least squares estimators are biased.
Exercise 3.11 The following model is formulated to explain time spent sleeping:
$$sleep = \beta_1 + \beta_2 totalwrk + \beta_3 leisure + u$$
where sleep, totalwrk (paid and unpaid work) and leisure (time not devoted to sleep or
work) are measured in minutes per day.
The estimated equation with a sample of 1000 observations, using file timuse03,
is the following:
$$\widehat{sleep} = 1440 - 1 \times totalwrk - 1 \times leisure$$
$R^2 = 1.000$, n = 1000
a) What do you think about these results?
b) What is the meaning of the estimated intercept?
Exercise 3.12 Using a subsample of the Structural Survey of Wages (Encuesta de
estructura salarial) for Spain in 2006 (file wage06sp), the following model is estimated
to explain wage:
$$\widehat{\ln(wage)} = 1.565 + 0.0730\,educ + 0.0177\,tenure + 0.0065\,age$$
$R^2 = 0.337$, n = 800
where educ (education), tenure (experience in the firm) and age are measured in years
and wage in euros per hour.
a) What is the interpretation of the coefficients on educ, tenure and age?
b) How many years does the age have to increase in order to have a similar
effect to an increase of one year in education, holding fixed in each case
the other two regressors?
c) Knowing that educ = 10.2, tenure = 7.2 and age = 42.0, calculate the
elasticities of wage with respect to educ, tenure and age for these values,
holding fixed the other regressors. Do you consider these elasticities to be
high or low?

Exercise 3.13 The following equation describes the price of housing in terms of
bedrooms (number of bedrooms), bathrms (number of full bathrooms) and lotsize (the lot
size of a property in square feet):
$$price = \beta_1 + \beta_2 bedrooms + \beta_3 bathrms + \beta_4 lotsize + u$$
where price is the price of a house measured in dollars.
Using the data for the city of Windsor contained in file housecan, the following
model is estimated:
$$\widehat{price} = -2418 + 5827\,bedrooms + 19750\,bathrms + 5.411\,lotsize$$
$R^2 = 0.486$, n = 546
a) What is the estimated increase in price for a house with one more bedroom
and one more bathroom, holding lotsize constant?
b) What percentage of the variation in price is explained jointly by the
number of bedrooms, the number of full bathrooms and the lot size?
c) Find the predicted selling price for a house of the sample with bedrooms=3,
bathrms=2 and lotsize=3880.
d) The actual selling price of the house in c) was $66,000. Find the residual
for this house. Does the result suggest that the buyer underpaid or overpaid
for the house?
Exercise 3.14 To examine the effects of a firm’s performance on a CEO salary, the
following model was formulated:
$$\ln(salary) = \beta_1 + \beta_2 roa + \beta_3 \ln(sales) + \beta_4 profits + \beta_5 tenure + u$$
where roa is the ratio profits/assets expressed as a percentage and tenure is the number
of years as CEO (=0 if less than 6 months). Salaries are expressed in thousands of dollars,
and sales and profits in millions of dollars.
The file ceoforbes has been used for the estimation. This file contains data on 447
CEOs of America's 500 largest corporations. (52 of the 500 firms were excluded because
of missing data on one or more variables. Apple Computer was also excluded since Steve
Jobs, the acting CEO of Apple in 1999, received no compensation during this period.)
Company data come from Fortune magazine for 1999; CEO data come from Forbes
magazine, also for 1999. The results obtained were the following:
$$\widehat{\ln(salary)} = 4.641 + 0.0054\,roa + 0.2893\ln(sales) + 0.0000564\,profits + 0.0122\,tenure$$
$R^2 = 0.232$, n = 447
a) Interpret the coefficient on the regressor roa.
b) Interpret the coefficient on the regressor ln(sales). What is your opinion
about the magnitude of the elasticity salary/sales?
c) Interpret the coefficient on the regressor profits.
d) What is the salary/profits elasticity at the sample mean ($\overline{salary}$ = 2028 and
$\overline{profits}$ = 700)?

Exercise 3.15 (Continuation of exercise 2.21) Using a dataset consisting of 1,983 firms
surveyed in 2006 (file rdspain), the following equation was estimated:
$$\widehat{rdintens} = -1.8168 + 0.1482\ln(sales) + 0.0110\,exponsal$$
$R^2 = 0.048$, n = 1983
where rdintens is the expenditure on research and development (R&D) as a percentage of
sales, sales are measured in millions of euros, and exponsal is exports as a percentage of
sales.
a) Interpret the coefficient on ln(sales). In particular, if sales increase by
100%, what is the estimated percentage point change in rdintens? Is this
an economically large effect?
b) Interpret the coefficient on exponsal. Is it economically large?
c) What percentage of the variation in rdintens is explained by sales and
exponsal?
d) What is the rdintens/sales elasticity for the sample mean ($\overline{rdintens}$ = 0.732
and $\overline{sales}$ = 63544960)? Comment on the result.
e) What is the rdintens/exponsal elasticity for the sample mean ($\overline{rdintens}$
= 0.732 and $\overline{exponsal}$ = 17.657)? Comment on the result.

Exercise 3.16 The following hedonic regression for cars (see example 3.3) is formulated:
$$\ln(price) = \beta_1 + \beta_2 cid + \beta_3 hpweight + \beta_4 fueleff + u$$
where cid is the cubic inch displacement, hpweight is the ratio horsepower/weight in kg
expressed as a percentage and fueleff is the ratio liters per 100 km/horsepower expressed as
a percentage.
a) What are the probable signs of $\beta_2$, $\beta_3$ and $\beta_4$? Explain them.
b) Estimate the model using the file hedcarsp and write out the results in
equation form.
c) Interpret the coefficient on the regressor cid.
d) Interpret the coefficient on the regressor hpweight.
e) To expand the model, add a regressor relative to car size, such as volume
or weight. What happens if you add both of them? What is the relationship
between weight and volume?
Exercise 3.17 The concept of work covers a broad spectrum of possible activities in the
productive economy. An important part of work is unpaid; it does not pass through the
market and therefore has no price. The most important unpaid work is housework
(houswork) carried out mainly by women. In order to analyze the factors that influence
housework, the following model is formulated:
$$houswork = \beta_1 + \beta_2 educ + \beta_3 hhinc + \beta_4 age + \beta_5 paidwork + u$$
where educ is the years of education attained and hhinc is the household income in euros per
month. The variables houswork and paidwork are measured in minutes per day.
Use the data in the file timuse03 to estimate the model. This file contains 1000
observations corresponding to a random subsample extracted from the time use survey
for Spain carried out in 2002-2003.
a) Which signs do you expect for $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_5$? Explain.
b) Write out the results in equation form.
c) Do you think there are relevant factors omitted in the above equation?
Explain.
d) Interpret the coefficient on the regressors educ, hhinc, age and paidwork.

Exercise 3.18 (Continuation of exercise 2.20) To explain the overall satisfaction of
people (stsfglo), the following model is formulated:
$$stsfglo = \beta_1 + \beta_2 gnipc + \beta_3 lifexpec + u$$
where gnipc is the gross national income per capita expressed in PPP 2008 US dollar
terms and lifexpec is the life expectancy at birth, i.e., the number of years a newborn infant
could expect to live. When a magnitude is expressed in PPP (purchasing power parity)
US dollar terms, a magnitude is converted to international dollars using PPP rates. (An
international dollar has the same purchasing power as a US dollar in the United States.)
Use the file HDR2010 for the estimation of the model.
a) What are the expected signs for $\beta_2$ and $\beta_3$? Explain.
b) What would be the average overall satisfaction for a country with 80 years
of life expectancy at birth and a gross national income per capita of $30,000
expressed in PPP 2008 US dollars?
c) Interpret the coefficients on gnipc and lifexpec.
d) Given a country with a life expectancy at birth equal to 50 years, what
should be the gross national income per capita to obtain a global
satisfaction equal to five?
Exercise 3.19 (Continuation of exercise 2.24) Due to the problems that arose with the Keynesian
consumption function, Brown introduced a new regressor in the function: consumption
lagged one period, to reflect the persistence of consumer habits. The formulation of the
model is as follows:
$$conspc_t = \beta_1 + \beta_2 incpc_t + \beta_3 conspc_{t-1} + u_t$$
As lagged consumption is included in this model, we have to distinguish between
the marginal propensity to consume in the short term and in the long term. The short-run marginal
propensity is calculated in the same way as in the Keynesian consumption function. To
calculate the long-term marginal propensity it is necessary to consider an equilibrium state
with no changes in the variables. Denoting by $conspc^e$ and $incpc^e$ consumption and income
in equilibrium, and disregarding the random disturbance, the previous model in
equilibrium is given by
$$conspc^e = \beta_1 + \beta_2 incpc^e + \beta_3 conspc^e$$
The Brown consumption function was estimated with data of the Spanish
economy for the period 1954-2010 (file consumsp), obtaining the following results:
$$\widehat{conspc}_t = -7.156 + 0.3965\,incpc_t + 0.5771\,conspc_{t-1}$$
$R^2 = 0.997$, n = 56
a) Interpret the coefficient on incpc. In the interpretation, do you have to
include the clause "holding fixed the other regressor"? Justify the answer.
b) Calculate the short-term elasticity for the sample means ($\overline{conspc}$ = 8084,
$\overline{incpc}$ = 8896).
c) Calculate the long-term elasticity for the sample means.
d) Discuss the difference between the values obtained for the two types of
elasticity.

Exercise 3.20 To explain the influence of incentives and expenditures in advertising on
sales, the following alternative models have been formulated:
$sales = \beta_1 + \beta_2 advert + \beta_3 incent + u$ (1)
$\ln(sales) = \beta_1 + \beta_2 \ln(advert) + \beta_3 \ln(incent) + u$ (2)
$\ln(sales) = \beta_1 + \beta_2 advert + \beta_3 incent + u$ (3)
$sales = \beta_2 advert + \beta_3 incent + u$ (4)
$\ln(sales) = \beta_1 + \beta_2 \ln(incent) + u$ (5)
$sales = \beta_1 + \beta_2 incent + u$ (6)
a) Using a sample of 18 sale areas (file advincen), estimate the above models.
b) In each of the following groups select the best model, indicating the criteria
you have used. Justify your answer.
b1) (1) and (6)
b2) (2) and (3)
b3) (1) and (4)
b4) (2), (3) and (5)
b5) (1), (4) and (6)
b6) (1), (2), (3), (4), (5) and (6)

Appendixes
Appendix 3.1 Proof of the theorem of Gauss-Markov
To prove this theorem, the CLM assumptions 1 through 9 are used.
Let us now consider another estimator $\tilde{\beta}$ which is a function of y (remember that
$\hat{\beta}$ is also a function of y), given by
$$\tilde{\beta} = \left[ [X'X]^{-1}X' + A \right] y \tag{3-93}$$
where A is an arbitrary k × n matrix that is a function of X and/or other non-stochastic
variables, but is not a function of y. For $\tilde{\beta}$ to be unbiased, certain conditions must be
satisfied.
Taking (3-52) into account, we have
$$\tilde{\beta} = \left[ [X'X]^{-1}X' + A \right] [X\beta + u] = \beta + AX\beta + \left[ [X'X]^{-1}X' + A \right] u \tag{3-94}$$
Taking expectations on both sides of (3-94), we have
$$E(\tilde{\beta}) = \beta + AX\beta + \left[ [X'X]^{-1}X' + A \right] E(u) = \beta + AX\beta \tag{3-95}$$
For $\tilde{\beta}$ to be unbiased, that is to say, $E(\tilde{\beta}) = \beta$ for any value of β, the term $AX\beta$ must vanish, so the
following must hold:
$$AX = 0 \tag{3-96}$$
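Consequently,
$$\tilde{\beta} = \beta + \left[ [X'X]^{-1}X' + A \right] u \tag{3-97}$$
Taking into account assumptions 7 and 8, and (3-96), $Var(\tilde{\beta})$ is equal to
$$\begin{aligned} Var(\tilde{\beta}) &= E\left[ (\tilde{\beta} - \beta)(\tilde{\beta} - \beta)' \right] = E\left[ \left[ [X'X]^{-1}X' + A \right] uu' \left[ X[X'X]^{-1} + A' \right] \right] \\ &= \sigma^2 \left[ [X'X]^{-1}X' + A \right] \left[ X[X'X]^{-1} + A' \right] = \sigma^2 \left[ [X'X]^{-1} + AA' \right] \end{aligned} \tag{3-98}$$
where the cross products $[X'X]^{-1}X'A'$ and $AX[X'X]^{-1}$ vanish because of (3-96).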
The difference between both variances is the following:
$$Var(\tilde{\beta}) - Var(\hat{\beta}) = \sigma^2 \left[ [X'X]^{-1} + AA' - [X'X]^{-1} \right] = \sigma^2 AA' \tag{3-99}$$
The product of a matrix by its transpose is always a positive semidefinite matrix.
Therefore,
$$Var(\tilde{\beta}) - Var(\hat{\beta}) = \sigma^2 AA' \ge 0 \tag{3-100}$$

The difference between the variance of an estimator $\tilde{\beta}$, arbitrary but linear and
unbiased, and the variance of the estimator $\hat{\beta}$ is a positive semidefinite matrix.
Consequently, $\hat{\beta}$ is the Best Linear Unbiased Estimator; that is to say, it is
BLUE.
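
The argument of this appendix can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the regressor matrix is simulated, and the matrix A is built as A = CM, with C arbitrary and $M = I - X[X'X]^{-1}X'$, which is one convenient way (not the only one) of obtaining a matrix that satisfies AX = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma2 = 12, 3, 2.0

# Simulated regressors with an intercept (an assumption for the illustration)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
ols_weights = XtX_inv @ X.T            # beta_hat = ols_weights @ y
M = np.eye(n) - X @ ols_weights        # M X = 0 by construction

C = rng.normal(size=(k, n))            # arbitrary k x n matrix
A = C @ M                              # hence A X = C M X = 0

alt_weights = ols_weights + A          # alternative linear unbiased estimator

var_ols = sigma2 * XtX_inv
var_alt = sigma2 * (alt_weights @ alt_weights.T)
diff = var_alt - var_ols               # should equal sigma2 * A A'

eigenvalues = np.linalg.eigvalsh((diff + diff.T) / 2)
print("max |AX|:", np.abs(A @ X).max())
print("eigenvalues of Var(alt) - Var(OLS):", eigenvalues)
```

All the eigenvalues of the difference are non-negative, so the alternative estimator is never more precise than OLS in any linear combination of the coefficients.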
Appendix 3.2 Proof: $\hat{\sigma}^2$ is an unbiased estimator of the variance of the disturbance
In order to see which is the most appropriate estimator of $\sigma^2$, we shall first
analyze the properties of the sum of squared residuals, which is precisely the numerator
of the residual variance.
Taking into account (3-17) and (3-23), we are going to express the vector of
residuals as a function of the regressand:
$$\hat{u} = y - X\hat{\beta} = y - X[X'X]^{-1}X'y = \left[ I - X[X'X]^{-1}X' \right] y = My \tag{3-101}$$
where $M = I - X[X'X]^{-1}X'$ is a symmetric and idempotent matrix.
Alternatively, the vector of residuals can be expressed as a function of the
disturbance vector:
$$\begin{aligned} \hat{u} &= \left[ I - X[X'X]^{-1}X' \right] y = \left[ I - X[X'X]^{-1}X' \right] [X\beta + u] \\ &= X\beta - X[X'X]^{-1}X'X\beta + u - X[X'X]^{-1}X'u \\ &= X\beta - X\beta + \left[ I - X[X'X]^{-1}X' \right] u = \left[ I - X[X'X]^{-1}X' \right] u \\ &= Mu \end{aligned} \tag{3-102}$$
Taking into account (3-102), and that M is symmetric and idempotent, the residual sum of squares (RSS) can be expressed
in the following form:
$$\hat{u}'\hat{u} = u'M'Mu = u'Mu \tag{3-103}$$
Now, keeping in mind that we are looking for an unbiased estimator of $\sigma^2$, we
are going to calculate the expectation of the previous expression:
$$\begin{aligned} E[\hat{u}'\hat{u}] &= E[u'Mu] = E[tr(u'Mu)] = E[tr(Muu')] \\ &= tr(M E[uu']) = tr(M\sigma^2 I) = \sigma^2\, tr(M) = \sigma^2 (n-k) \end{aligned} \tag{3-104}$$
In deriving (3-104), we have used the fact that the scalar $u'Mu$ is equal to its own trace and the
property of the trace that tr(AB) = tr(BA).
Taking into account that property of the trace, the value of tr(M) is obtained:
$$tr(M) = tr\left[ I_{n \times n} - X[X'X]^{-1}X' \right] = tr(I_{n \times n}) - tr\left( X[X'X]^{-1}X' \right) = tr(I_{n \times n}) - tr(I_{k \times k}) = n - k$$

According to (3-104), it holds that
$$\sigma^2 = \frac{E[\hat{u}'\hat{u}]}{n-k} \tag{3-105}$$
Keeping (3-105) in mind, an unbiased estimator of the variance will be:
$$\hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{n-k} \tag{3-106}$$
since, according to (3-104),
$$E(\hat{\sigma}^2) = E\left[ \frac{\hat{u}'\hat{u}}{n-k} \right] = \frac{E(\hat{u}'\hat{u})}{n-k} = \frac{\sigma^2 (n-k)}{n-k} = \sigma^2 \tag{3-107}$$
The denominator of (3-106) is the number of degrees of freedom corresponding to the RSS
that appears in the numerator. This result is justified by the fact that the normal equations
of the hyperplane impose k restrictions on the residuals. Therefore, the number of degrees
of freedom of the RSS is equal to the number of observations (n) minus the number of
restrictions k.
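
The two key facts of this appendix, tr(M) = n − k and $E[\hat{u}'\hat{u}] = \sigma^2(n-k)$, can be checked with a small simulation. The sketch below is an illustration only: the regressors, the value of $\sigma^2$, the number of replications and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 20, 4, 3.0

# Simulated regressors with an intercept (an assumption for the illustration)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

trace_M = np.trace(M)                      # should equal n - k
print("tr(M):", trace_M, "n - k:", n - k)

# Monte Carlo: draw many disturbance vectors u and compute u'Mu for each one
draws = 20000
u = rng.normal(scale=np.sqrt(sigma2), size=(draws, n))
rss = np.sum((u @ M) * u, axis=1)          # one residual sum of squares per draw

mean_unbiased = rss.mean() / (n - k)       # average of RSS/(n - k), close to sigma2
mean_divided_by_n = rss.mean() / n         # average of RSS/n, below sigma2
print("mean of RSS/(n-k):", mean_unbiased)
print("mean of RSS/n:    ", mean_divided_by_n)
```

Dividing by n − k gives an average close to the true variance, while dividing by n underestimates it by the factor (n − k)/n.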

Appendix 3.3 Consistency of the OLS estimator


In appendix 2.8 we proved the consistency of the OLS estimator $\hat{\beta}_2$ in the
simple regression model. Now we are going to prove the consistency of the OLS vector
$\hat{\beta}$.
First, the least squares estimator $\hat{\beta}$, given in (3-23), may be written as
$$\hat{\beta} = \beta + \left( \frac{1}{n} X'X \right)^{-1} \left( \frac{1}{n} X'u \right) \tag{3-108}$$
Now, we take the limit of the matrix $\frac{1}{n}X'X$ that appears in (3-108) and call the result Q:
$$\lim_{n \to \infty} \frac{1}{n} X'X = Q \tag{3-109}$$
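If X is taken to be fixed in repeated samples, according to assumption 2, then
(3-109) implies that Q = (1/n)X'X. According to assumption 3, and because the inverse is
a continuous function of the original matrix, $Q^{-1}$ exists. Therefore, we can write
$$plim(\hat{\beta}) = \beta + Q^{-1}\, plim\left[ \frac{1}{n} X'u \right]$$
The last factor of (3-108) can be written as
$$\begin{aligned} \frac{1}{n} X'u &= \frac{1}{n} \begin{bmatrix} 1 & 1 & \cdots & 1 & \cdots & 1 \\ x_{21} & x_{22} & \cdots & x_{2i} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{j1} & x_{j2} & \cdots & x_{ji} & \cdots & x_{jn} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{k1} & x_{k2} & \cdots & x_{ki} & \cdots & x_{kn} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_i \\ \vdots \\ u_n \end{bmatrix} \\ &= \frac{1}{n} \begin{bmatrix} x_1 & x_2 & \cdots & x_i & \cdots & x_n \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_i \\ \vdots \\ u_n \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} x_i u_i = \overline{x_i u_i} \end{aligned} \tag{3-110}$$
where $x_i$ is the column vector corresponding to the ith observation.
Now, we are going to calculate the expectation and the variance of (3-110):
$$E\left[ \overline{x_i u_i} \right] = \frac{1}{n} \sum_{i=1}^{n} E[x_i u_i] = \frac{1}{n} \sum_{i=1}^{n} x_i E[u_i] = \frac{1}{n} X'E[u] = 0 \tag{3-111}$$

$$var\left[ \overline{x_i u_i} \right] = E\left[ \overline{x_i u_i} \left( \overline{x_i u_i} \right)' \right] = \frac{1}{n} X'E[uu']X \frac{1}{n} = \frac{\sigma^2}{n} \frac{X'X}{n} = \frac{\sigma^2}{n} Q \tag{3-112}$$
since $E[uu'] = \sigma^2 I$, according to assumptions 7 and 8.
Taking limits in (3-112), it then follows that

$$\lim_{n \to \infty} var\left[ \overline{x_i u_i} \right] = \lim_{n \to \infty} \frac{\sigma^2}{n} Q = 0(Q) = 0 \tag{3-113}$$

Since the expectation of $\overline{x_i u_i}$ is identically zero and its variance converges to zero,
$\overline{x_i u_i}$ converges in mean square to zero. Convergence in mean square implies
convergence in probability, and so plim($\overline{x_i u_i}$) = 0. Therefore,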
$$plim(\hat{\beta}) = \beta + Q^{-1}\, plim\left[ \frac{1}{n} X'u \right] = \beta + Q^{-1}\, plim\left( \overline{x_i u_i} \right) = \beta + Q^{-1} \times 0 = \beta \tag{3-114}$$
Consequently, $\hat{\beta}$ is a consistent estimator.
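
Consistency can be visualized with a simulation in which the sample size grows. The following sketch is an illustration under assumed values: the true coefficient vector, the distribution of the regressors and disturbances, the sample sizes and the seed are all arbitrary choices, and the regressors are drawn at random rather than held fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, 0.5, -2.0])      # assumed true coefficients

def ols_error(n):
    """Largest absolute estimation error of the OLS vector in one sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    u = rng.normal(size=n)
    y = X @ beta + u
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations
    return np.max(np.abs(beta_hat - beta))

errors = {n: ols_error(n) for n in (100, 10_000, 250_000)}
for n, err in errors.items():
    print(n, err)
```

The estimation error in a single sample is random, but it tends to shrink as n grows, roughly in proportion to $1/\sqrt{n}$, which is what (3-113) suggests.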

Appendix 3.4 Maximum likelihood estimator


The method of maximum likelihood is widely used in econometrics. This method
proposes that the parameter estimators be those values for which the probability of
obtaining the given observations is maximum. In the least squares estimation no prior
distributional assumption was needed. On the contrary, the estimation by maximum likelihood requires
that statistical assumptions about the various elements of the model be established
beforehand. Thus, in the estimation by maximum likelihood we will adopt all the
assumptions of the classic linear model (CLM).
Therefore, in the estimation by maximum likelihood of β and $\sigma^2$ in the model
(3-52), we take as estimators those values that maximize the probability of obtaining the
observations in a given sample.
Let us look at the procedure for obtaining the maximum likelihood estimators of β
and $\sigma^2$. According to the CLM assumptions:
$$u \sim N(0, \sigma^2 I) \tag{3-115}$$
The expectation and variance of the distribution of y are given by
$$E(y) = E[X\beta + u] = X\beta + E(u) = X\beta \tag{3-116}$$

$$var(y) = E\left[ (y - X\beta)(y - X\beta)' \right] = E[uu'] = \sigma^2 I \tag{3-117}$$
Therefore,
$$y \sim N(X\beta, \sigma^2 I) \tag{3-118}$$
The probability density of y (or likelihood function), considering X and y fixed
and β and $\sigma^2$ variable, will be, in accordance with (3-118), equal to
$$L = f(y \mid \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \right) \tag{3-119}$$
The maximum of L is reached at the same point as the maximum of ln(L), given that the
logarithm function is monotonic, and thus, in order to maximize the function, we can
work with ln(L) instead of L. Therefore,
$$\ln(L) = -\frac{n\ln(2\pi)}{2} - \frac{n\ln(\sigma^2)}{2} - \frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \tag{3-120}$$
To maximize ln(L), we differentiate it with respect to β and $\sigma^2$:
$$\frac{\partial \ln(L)}{\partial \beta} = -\frac{1}{2\sigma^2} \left( -2X'y + 2X'X\beta \right) \tag{3-121}$$
δ ln( L) n (y - Xβ)'(y - Xβ)


=
− 2+ (3-122)
δσ 2
2σ 2σ 4
Equating (3-121) to zero, we see that the maximum likelihood estimator of β,
denoted by $\tilde\beta$, satisfies
$$X'X\tilde\beta = X'y \tag{3-123}$$
Because we assume that X'X is invertible,
$$\tilde\beta = [X'X]^{-1}X'y \tag{3-124}$$
Consequently, the maximum likelihood estimator of β, under the assumptions of
the CLM, coincides with the OLS estimator, that is to say,
$$\tilde\beta = \hat\beta \tag{3-125}$$
Therefore,
$$(y - X\tilde\beta)'(y - X\tilde\beta) = (y - X\hat\beta)'(y - X\hat\beta) = \hat u'\hat u \tag{3-126}$$
Equating (3-122) to zero and substituting $\tilde\beta$ for β, we obtain:
$$-\frac{n}{2\tilde\sigma^2} + \frac{\hat u'\hat u}{2\tilde\sigma^4} = 0 \tag{3-127}$$
where we have denoted by $\tilde\sigma^2$ the maximum likelihood estimator of the variance of
the random disturbances. From (3-127), it follows that
$$\tilde\sigma^2 = \frac{\hat u'\hat u}{n} \tag{3-128}$$
As we can see, the maximum likelihood estimator is not equal to the unbiased
estimator that has been obtained in (3-106). In fact, if we take expectations in (3-128),
$$E[\tilde\sigma^2] = \frac{1}{n}E[\hat u'\hat u] = \frac{n-k}{n}\sigma^2 \tag{3-129}$$
That is to say, the maximum likelihood estimator, $\tilde\sigma^2$, is a biased estimator,
although its bias tends to zero as n tends to infinity, since
$$\lim_{n \to \infty}\frac{n-k}{n} = 1 \tag{3-130}$$
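The relation between the two variance estimators can be checked numerically. The following is a minimal sketch, assuming a simple regression with an intercept and one slope (k = 2) and a small made-up data set; the data and variable names are illustrative and do not come from the text.

```python
# Hypothetical data, for illustration only (not taken from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n, k = len(y), 2  # k counts the intercept and one slope

# OLS estimates (equal to the ML estimates under the CLM assumptions)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Residual sum of squares u'u
rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

sigma2_ml = rss / n              # maximum likelihood estimator, as in (3-128)
sigma2_unbiased = rss / (n - k)  # unbiased estimator

print(f"ML estimate of sigma^2:       {sigma2_ml:.5f}")
print(f"Unbiased estimate of sigma^2: {sigma2_unbiased:.5f}")
print(f"Ratio ML/unbiased: {sigma2_ml / sigma2_unbiased:.4f}")
print(f"(n - k)/n:         {(n - k) / n:.4f}")
```

In any sample the ratio of the two estimates equals (n-k)/n, which is the bias factor appearing in (3-129).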

4 HYPOTHESIS TESTING IN THE MULTIPLE
REGRESSION MODEL

4.1 Hypothesis testing: an overview


Before testing hypotheses in the multiple regression model, we are going to offer
a general overview of hypothesis testing.
Hypothesis testing allows us to carry out inferences about population parameters
using data from a sample. In order to test a hypothesis in statistics, we must perform the
following steps:
1) Formulate a null hypothesis and an alternative hypothesis on population
parameters.
2) Build a statistic to test the hypothesis made.
3) Define a decision rule to reject or not to reject the null hypothesis.
Next, we will examine each one of these steps.

4.1.1 Formulation of the null hypothesis and the alternative hypothesis


Before establishing how to formulate the null and alternative hypothesis, let us
make the distinction between simple hypotheses and composite hypotheses. The
hypotheses that are made through one or more equalities are called simple hypotheses.
The hypotheses are called composite when they are formulated using the operators
"inequality", "greater than" and "smaller than".
It is very important to remark that hypothesis testing is always about population
parameters. Hypothesis testing implies making a decision, on the basis of sample data, on
whether to reject that certain restrictions are satisfied by the basic assumed model. The
restrictions we are going to test are known as the null hypothesis, denoted by H0. Thus,
the null hypothesis is a statement about population parameters.
Although it is possible to make composite null hypotheses, in the context of the
regression model the null hypothesis is always a simple hypothesis. That is to say, in order
to formulate a null hypothesis, which shall be called H 0 , we will always use the operator
“equality”. Each equality implies a restriction on the parameters of the model. Let us look
at a few examples of null hypotheses concerning the regression model:
a) H0 : β1=0
b) H0 : β1+ β2 =0


c) H0 : β1=β2 =0
d) H0 : β2+β3 =1
We will also define an alternative hypothesis, denoted by H 1 , which will be our
conclusion if the experimental test indicates that H 0 is false.
Although the alternative hypotheses can be simple or composite, in the regression
model we will always take a composite hypothesis as an alternative hypothesis. This
hypothesis, which shall be called H1, is formulated using the operator “inequality” in most
cases. Thus, for example, given the H 0 :
H0 : β j = 1 (4-1)
we can formulate the following H1 :
H1 : β j ≠ 1 (4-2)
which is a “two-sided alternative” hypothesis.
The following hypotheses are called “one-sided alternative” hypotheses:
H1 : β j < 1 (4-3)

H1 : β j > 1 (4-4)

4.1.2 Test statistic


A test statistic is a function of a random sample, and is therefore a random variable.
When we compute the statistic for a given sample, we obtain an outcome of the test
statistic. In order to perform a statistical test we should know the distribution of the test
statistic under the null hypothesis. This distribution depends largely on the assumptions
made in the model. If the specification of the model includes the assumption of normality,
then the appropriate statistical distribution is the normal distribution or any of the
distributions associated with it, such as the Chi-square, Student’s t, or Snedecor’s F.
Table 4.1 shows some distributions, which are appropriate in different situations,
under the assumption of normality of the disturbances.

TABLE 4.1. Some distributions used in hypothesis testing.


                 1 restriction     1 or more restrictions
Known σ2         N                 Chi-square
Unknown σ2       Student’s t       Snedecor’s F

The statistic used for the test is built taking into account the H0 and the sample
data. In practice, as σ 2 is always unknown, we will use the distributions t and F.

4.1.3 Decision rule


We are going to look at two approaches for hypothesis testing: the classical
approach and an alternative one based on p-values. But before seeing how to apply the


decision rule, we shall examine the types of errors that can be made in hypothesis
testing.

Types of errors in hypothesis testing


In hypothesis testing, we can make two kinds of errors: Type I error and Type II
error.

Type I error

We can reject H 0 when it is in fact true. This is called Type I error. Generally, we
define the significance level (α) of a test as the probability of making a Type I error.
Symbolically,
α = Pr( Reject H 0 | H 0 ) (4-5)
In other words, the significance level is the probability of rejecting H 0 given that
H 0 is true. Hypothesis testing rules are constructed making the probability of a Type I
error fairly small. Common values for α are 0.10, 0.05 and 0.01, although sometimes
0.001 is also used.
After we have made the decision of whether or not to reject H 0 , we have either
decided correctly or we have made an error. We shall never know with certainty whether
an error was made. However, we can compute the probability of making either a Type I
error or a Type II error.

Type II error

We can fail to reject H0 when it is actually false. This is called Type II error.
β = Pr(Do not reject H0 | H1) (4-6)
In other words, β is the probability of not rejecting H0 given that H1 is true.
It is not possible to minimize both types of error simultaneously. In practice, what
we do is select a low significance level.
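The two error probabilities can be approximated by simulation. The sketch below is only an illustration under stated assumptions: it tests H0: μ = 0 for the mean of a normal population with known variance 1, a sample size of 20, and an alternative value μ = 0.5. All of these choices are hypothetical and are not taken from the text.

```python
import random
from statistics import NormalDist

random.seed(12345)  # fixed seed so that the run is reproducible

ALPHA = 0.05   # significance level
N = 20         # sample size (illustrative)
REPS = 5000    # Monte Carlo replications
c = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def rejects(mu):
    """Draw one sample from N(mu, 1) and test H0: mu = 0 with known variance."""
    mean = sum(random.gauss(mu, 1.0) for _ in range(N)) / N
    z = mean * N ** 0.5  # (mean - 0) / (1 / sqrt(N))
    return abs(z) >= c


# Share of rejections when H0 is true: estimate of alpha (Type I error)
type1_rate = sum(rejects(0.0) for _ in range(REPS)) / REPS
# Share of non-rejections when H1 is true (mu = 0.5): estimate of beta (Type II error)
type2_rate = sum(not rejects(0.5) for _ in range(REPS)) / REPS

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The first rate should be close to the chosen α, while the second depends on how far the true parameter is from the value stated in H0 and on the sample size.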

Classical approach: Implementation of the decision rule


The classical approach implies the following steps:
a) Choosing α. Classical hypothesis testing requires that we initially specify a
significance level for the test. When we specify a value for α, we are essentially
quantifying our tolerance for a Type I error. If α=0.05, then the researcher is willing to
falsely reject H 0 5% of the time.
b) Obtaining c, the critical value, using statistical tables. The value c is determined
by α.
The critical value (c) for a hypothesis test is a threshold to which the value of the
test statistic in a sample is compared to determine whether or not the null hypothesis is
rejected.


c) Comparing the outcome of the test statistic, s, with c, H0 is either rejected or
not for a given α.
The rejection region (RR), delimited by the critical value(s), is a set of values of
the test statistic for which the null hypothesis is rejected. (See figure 4.1). That is, the
sample space for the test statistic is partitioned into two regions; one region (the rejection
region) will lead us to reject the null hypothesis H 0 , while the other will lead us not to
reject the null hypothesis. Therefore, if the observed value of the test statistic, s, is in the
rejection region, we conclude by rejecting H0; if it is not in the rejection region, then we
conclude by not rejecting H0 or failing to reject H0.
Symbolically,
$$\text{If } s \ge c: \text{ reject } H_0; \qquad \text{if } s < c: \text{ do not reject } H_0 \tag{4-7}$$
If the null hypothesis is rejected with the evidence of the sample, this is a strong
conclusion. However, the acceptance of the null hypothesis is a weak conclusion because
we do not know what the probability is of not rejecting the null hypothesis when it should
be rejected. That is to say, we do not know the probability of making a type II error.
Therefore, instead of using the expression of accepting the null hypothesis, it is more
correct to say fail to reject the null hypothesis, or not reject, since what really happens is
that we do not have enough empirical evidence to reject the null hypothesis.
In the process of hypothesis testing, the most subjective part is the a priori
determination of the significance level. What criteria can be used to determine it? In
general, this is an arbitrary decision, though, as we have said, the 1%, 5% and 10% levels
for α are the most used in practice. Sometimes the testing is made conditional on several
significance levels.

[Figure: density of the test statistic W, split at the critical value c into the non-rejection region (NRR) and the rejection region (RR).]
FIGURE 4.1. Hypothesis testing: classical approach.

An alternative approach: p-value


With the use of computers, hypothesis testing can be contemplated from a more
rational perspective. Computer programs typically offer, together with the test statistic, a
probability. This probability, which is called p-value (i.e., probability value), is also
known as the critical or exact level of significance or the exact probability of making a


Type I error. More technically, the p-value is defined as the lowest significance level at
which a null hypothesis can be rejected.
Once the p-value has been determined, we know that the null hypothesis is
rejected for any α ≥ p-value, while the null hypothesis is not rejected when α<p-value.
Therefore, the p-value is an indicator of the level of admissibility of the null hypothesis:
the higher the p-value, the more confidence we can have in the null hypothesis. The use
of the p-value turns hypothesis testing around. Thus, instead of fixing a priori the
significance level, the p-value is calculated to allow us to determine the significance
levels at which the null hypothesis is rejected.
In the following sections, we will see the use of the p-value in hypothesis testing put
into practice.
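The equivalence between the classical rule and the p-value rule can be sketched in a few lines. The example assumes a right-tail test whose statistic is standard normal under H0 (the known σ2, one restriction case of table 4.1); the observed value s = 1.80 and the function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1), distribution of the statistic under H0


def classical_reject(s, alpha):
    """Classical approach: reject H0 when s reaches the critical value c."""
    c = std_normal.inv_cdf(1 - alpha)
    return s >= c


def p_value(s):
    """Right-tail p-value: probability under H0 of a statistic at least as large as s."""
    return 1 - std_normal.cdf(s)


def p_value_reject(s, alpha):
    """p-value approach: reject H0 for any alpha >= p-value."""
    return alpha >= p_value(s)


s = 1.80  # hypothetical outcome of the test statistic
alphas = (0.10, 0.05, 0.01)
classical = [classical_reject(s, a) for a in alphas]
by_p_value = [p_value_reject(s, a) for a in alphas]

print(f"p-value = {p_value(s):.4f}")
for a, r1, r2 in zip(alphas, classical, by_p_value):
    print(f"alpha = {a:.2f}: classical reject = {r1}, p-value reject = {r2}")
```

Both rules lead to the same decision for every α; the p-value simply reports the decision for all significance levels at once.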

4.2 Testing hypotheses using the t test

4.2.1 Test of a single parameter

The t test
Under the CLM assumptions 1 through 9,

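$$\hat\beta_j \sim N[\beta_j, \operatorname{var}(\hat\beta_j)] \qquad j = 1, 2, 3, \ldots, k \tag{4-8}$$

If we standardize,
$$\frac{\hat\beta_j - \beta_j}{\sqrt{\operatorname{var}(\hat\beta_j)}} = \frac{\hat\beta_j - \beta_j}{sd(\hat\beta_j)} \sim N[0, 1] \qquad j = 1, 2, 3, \ldots, k \tag{4-9}$$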

The claim for normality is usually made on the basis of the Central Limit
Theorem (CLT), but this is restrictive in some cases. That is to say, normality cannot
always be assumed. In any application, whether normality of u can be assumed is really
an empirical matter. It is often the case that using a transformation, e.g. taking logs, yields
a distribution that is closer to normality, which is easy to handle from a mathematical
point of view. Large samples will allow us to drop normality without affecting the results
too much.
Under the CLM assumptions 1 through 9, we obtain a Student’s t distribution
$$\frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k} \tag{4-10}$$

where k is the number of unknown parameters in the population model (k-1 slope
parameters and the intercept, $\beta_1$). The expression (4-10) is important because it allows
us to test a hypothesis on $\beta_j$.
If we compare (4-10) with (4-9), we see that the Student’s t distribution derives
from the fact that the parameter σ in $sd(\hat\beta_j)$ has been replaced by its estimator $\hat\sigma$,
which is a random variable. Thus, the degrees of freedom of t are n-k, corresponding
to the degrees of freedom used in the estimation of $\hat\sigma^2$.


When the degrees of freedom (df) in the t distribution are large, the t distribution
approaches the standard normal distribution. In figure 4.2, the density function for
normal and t distributions for different df are represented. As can be seen, the t density
functions have a lower peak and heavier tails than the normal density function,
but as df increases, the t density functions get closer to the normal density. In fact, what
happens is that the t distribution takes into account that σ2 is estimated because it is
unknown. Given this uncertainty, the t distribution is more spread out than the normal one.
However, as the df grow, the t distribution gets nearer to the normal distribution because
the uncertainty from not knowing σ2 decreases.
Therefore, the following convergence in distribution should be kept in mind:
$$t_n \xrightarrow[n \to \infty]{} N(0, 1) \tag{4-11}$$
Thus, when the number of degrees of freedom of a Student’s t tends to infinity,
the t distribution converges towards a N(0,1) distribution. In the context of testing a
hypothesis, if the sample size grows, so will the degrees of freedom. This means that for
large samples the normal distribution can be used to test hypotheses with a single
restriction, even when the population variance is unknown. As a practical rule, when
the df are larger than 120, we can take the critical values from the normal distribution.
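The convergence in (4-11) can be illustrated numerically by comparing the upper-tail probability P(T > 1.96) for several degrees of freedom with the normal value. This is a rough sketch using only the Python standard library: the tail probability is obtained by integrating the t density with Simpson's rule, and the helper names t_pdf and t_upper_tail are introduced here for illustration.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the Simpson integral of the density on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


normal_tail = 1 - NormalDist().cdf(1.96)
tails = {df: t_upper_tail(1.96, df) for df in (5, 20, 60, 120, 1000)}

for df, p in tails.items():
    print(f"df = {df:4d}: P(T > 1.96) = {p:.4f}")
print(f"normal   : P(Z > 1.96) = {normal_tail:.4f}")
```

The tail probability shrinks towards the normal value as df grows, which is why normal critical values become acceptable for large df.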

FIGURE 4.2. Density functions: normal and t for different degrees of freedom.
Consider the null hypothesis,
$$H_0: \beta_j = 0$$
Since $\beta_j$ measures the partial effect of $x_j$ on y after controlling for all other
independent variables, $H_0: \beta_j = 0$ means that, once $x_2, x_3, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k$ have been
accounted for, $x_j$ has no effect on y. This is called a significance test. The statistic we use
to test $H_0: \beta_j = 0$, against any alternative, is called the t statistic or the t ratio of $\hat\beta_j$ and
is expressed as
$$t_{\hat\beta_j} = \frac{\hat\beta_j}{se(\hat\beta_j)}$$

In order to test $H_0: \beta_j = 0$, it is natural to look at our unbiased estimator of $\beta_j$,
$\hat\beta_j$. In a given sample $\hat\beta_j$ will never be exactly zero, but a small value will indicate that
the null hypothesis could be true, whereas a large value will indicate a false null
hypothesis. The question is: how far is $\hat\beta_j$ from zero?
We must recognize that there is a sampling error in our estimate $\hat\beta_j$, and thus the
size of $\hat\beta_j$ must be weighed against its sampling error. This is precisely what we do when
we use $t_{\hat\beta_j}$, since this statistic measures how many standard errors $\hat\beta_j$ is away from zero.
In order to determine a rule for rejecting H0, we need to decide on the relevant alternative
hypothesis. There are three possibilities: one-tail alternative hypotheses (right and left
tail), and the two-tail alternative hypothesis.

One-tail alternative hypothesis: right


First, let us consider the null hypothesis
H0 : β j = 0
against the alternative hypothesis
H1 : β j > 0
This is a positive significance test. In this case, the decision rule is the following:
Decision rule

$$\text{If } t_{\hat\beta_j} \ge t^{\alpha}_{n-k}: \text{ reject } H_0; \qquad \text{if } t_{\hat\beta_j} < t^{\alpha}_{n-k}: \text{ do not reject } H_0 \tag{4-12}$$

Therefore, we reject $H_0: \beta_j = 0$ in favor of $H_1: \beta_j > 0$ at α when $t_{\hat\beta_j} \ge t^{\alpha}_{n-k}$, as can
be seen in figure 4.3. It is very clear that to reject H0 against $H_1: \beta_j > 0$, we must get a
positive $t_{\hat\beta_j}$. A negative $t_{\hat\beta_j}$, no matter how large, provides no evidence in favor of
$H_1: \beta_j > 0$. On the other hand, in order to obtain $t^{\alpha}_{n-k}$ in the t statistical table, we only
need the significance level α and the degrees of freedom.
It is important to remark that as α decreases, $t^{\alpha}_{n-k}$ increases.
To a certain extent, the classical approach is somewhat arbitrary, since we need to
choose α in advance, and eventually H 0 is either rejected or not.

In figure 4.4, the alternative approach is represented. As can be seen by observing
the figure, determining the p-value is the inverse operation of looking up the critical value in
the statistical tables for a given significance level. Once the p-value has been determined,
we know that H0 is rejected for any level of significance of α>p-value, while the null
hypothesis is not rejected when α<p-value.

FIGURE 4.3. Rejection region using t: right-tail alternative hypothesis.
FIGURE 4.4. p-value using t: right-tail alternative hypothesis.
EXAMPLE 4.1 Is the marginal propensity to consume smaller than the average propensity to consume?
As seen in example 1.1, testing the 3rd proposition of the Keynesian consumption function in a
linear model is equivalent to testing whether the intercept is significantly greater than 0. That is to say,
in the model
$$cons = \beta_1 + \beta_2 inc + u$$
we must test whether
$$\beta_1 > 0$$
With a random sample of 42 observations, the following results have been obtained
$$\widehat{cons}_i = \underset{(0.350)}{0.41} + \underset{(0.062)}{0.843}\, inc_i$$

The numbers in parentheses, below the estimates, are standard errors (se) of the estimators.
The question we pose is the following: is the third proposition of the Keynesian theory admissible?
Next, we answer this question.
1) In this case, the null and alternative hypotheses are the following:
H 0 : β1 = 0
H1 : β1 > 0
2) The test statistic is:
$$t = \frac{\hat\beta_1 - \beta_1^0}{se(\hat\beta_1)} = \frac{\hat\beta_1 - 0}{se(\hat\beta_1)} = \frac{0.41}{0.35} = 1.171$$
3) Decision rule
It is useful to use several significance levels. Let us begin with a significance level of 0.10 because
the value of t is relatively small (smaller than 1.5). In this case, the degrees of freedom are 40 (42
observations minus 2 estimated parameters). If we look at the t statistical table (row 40 and column 0.10,
or 0.20, in statistical tables with one tail, or two tails, respectively), we find $t_{40}^{0.10} = 1.303$.
As t<1.303, we do not reject H0 for α=0.10, and therefore we cannot reject it for α=0.05
($t_{40}^{0.05} = 1.684$) or α=0.01 ($t_{40}^{0.01} = 2.423$), as can be seen in figure 4.5. In this figure, the rejection region
corresponds to α=0.10. Therefore, we cannot reject H0 in favor of H1. In other words, the sample data do not
provide sufficient evidence in favor of Keynes’s proposition 3.
In the alternative approach, as can be seen in figure 4.6, the p-value corresponding to $t_{\hat\beta_1} = 1.171$
for a t with 40 df is equal to 0.124. For α<0.124 (for example, 0.10, 0.05 and 0.01), H0 is not rejected.
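The arithmetic of example 4.1 can be reproduced with a short script. This is a minimal sketch: the estimate, the standard error and the critical values are those quoted in the example, while the variable names are chosen here for illustration.

```python
beta1_hat, se_beta1 = 0.41, 0.350  # intercept estimate and its standard error
n, k = 42, 2                       # observations and estimated parameters
df = n - k                         # degrees of freedom

t_stat = (beta1_hat - 0.0) / se_beta1  # H0: beta1 = 0

# Right-tail critical values for 40 df, as quoted from the t table in the example
critical = {0.10: 1.303, 0.05: 1.684, 0.01: 2.423}

print(f"t = {t_stat:.3f} with {df} df")
decisions = {}
for alpha, c in critical.items():
    decisions[alpha] = t_stat >= c  # decision rule (4-12)
    verdict = "reject H0" if decisions[alpha] else "do not reject H0"
    print(f"alpha = {alpha:.2f}: critical value = {c:.3f} -> {verdict}")
```

Since the statistic is below the smallest of the three critical values, H0 is not rejected at any of the usual levels.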

FIGURE 4.5. Example 4.1: Rejection region using t with a right-tail alternative hypothesis.
FIGURE 4.6. Example 4.1: p-value using t with right-tail alternative hypothesis.

One-tail alternative hypothesis: left


Consider now the null hypothesis
H0 : β j = 0
against the alternative hypothesis
H1 : β j < 0
This is a negative significance test.
In this case, the decision rule is the following:
Decision rule

$$\text{If } t_{\hat\beta_j} \le -t^{\alpha}_{n-k}: \text{ reject } H_0; \qquad \text{if } t_{\hat\beta_j} > -t^{\alpha}_{n-k}: \text{ do not reject } H_0 \tag{4-13}$$
Therefore, we reject $H_0: \beta_j = 0$ in favor of $H_1: \beta_j < 0$ at a given α when
$t_{\hat\beta_j} \le -t^{\alpha}_{n-k}$, as can be seen in figure 4.7. It is very clear that to reject H0 against $H_1: \beta_j < 0$,
we must get a negative $t_{\hat\beta_j}$. A positive $t_{\hat\beta_j}$, no matter how large it is, provides no evidence
in favor of $H_1: \beta_j < 0$.
In figure 4.8 the alternative approach is represented. Once the p-value has been
determined, we know that H 0 is rejected for any level of significance of α>p-value, while
the null hypothesis is not rejected when α<p-value.

FIGURE 4.7. Rejection region using t: left-tail alternative hypothesis.
FIGURE 4.8. p-value using t: left-tail alternative hypothesis.

EXAMPLE 4.2 Does income have a negative influence on infant mortality?


The following model has been used to explain the deaths of children under 5 years per 1000 live
births (deathun5).
$$deathun5 = \beta_1 + \beta_2 gnipc + \beta_3 ilitrate + u$$
where gnipc is the gross national income per capita and ilitrate is the adult (15 and older) illiteracy rate
as a percentage.
With a sample of 130 countries (workfile hdr2010), the following estimation has been obtained:
$$\widehat{deathun5}_i = \underset{(5.93)}{27.91} - \underset{(0.00028)}{0.000826}\, gnipc_i + \underset{(0.183)}{2.043}\, ilitrate_i$$

The numbers in parentheses, below the estimates, are standard errors (se) of the estimators.
One of the questions posed by researchers is whether income has a negative influence on infant
mortality. To answer this question, the following hypothesis testing is carried out:
The null and alternative hypotheses, and the test statistic, are the following:
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 < 0 \qquad t = \frac{\hat\beta_2}{se(\hat\beta_2)} = \frac{-0.000826}{0.00028} = -2.966$$
Since the t value is relatively high in absolute value, let us start testing with a level of 1%. For α=0.01,
$t^{0.01}_{130-1-2} \approx t^{0.01}_{60} = 2.390$. Given that t<-2.390, as is shown in figure 4.9, we reject H0 in favour of H1.
Therefore, the gross national income per capita has a significantly negative influence on the mortality
of children under 5. That is to say, the higher the gross national income per capita, the lower the
mortality of children under 5. As H0 has been rejected for α=0.01, it will also be rejected for levels of
5% and 10%.
In the alternative approach, as can be seen in figure 4.10, the p-value corresponding to $t_{\hat\beta_2} = -2.966$
for a t with 61 df is equal to 0.0000. For all α>0.0000, such as 0.01, 0.05 and 0.10, H0 is rejected.

FIGURE 4.9. Example 4.2: Rejection region using t with a left-tail alternative hypothesis.
FIGURE 4.10. Example 4.2: p-value using t with a left-tail alternative hypothesis.

Two-tail alternative hypothesis


Consider now the null hypothesis


H0 : β j = 0
against the alternative hypothesis
H1 : β j ≠ 0

This is the relevant alternative when the sign of β j is not well determined by theory
or common sense. When the alternative is two-sided, we are interested in the absolute
value of the t statistic. This is a significance test.
In this case, the decision rule is the following:
Decision rule

$$\text{If } |t_{\hat\beta_j}| \ge t^{\alpha/2}_{n-k}: \text{ reject } H_0; \qquad \text{if } |t_{\hat\beta_j}| < t^{\alpha/2}_{n-k}: \text{ do not reject } H_0 \tag{4-14}$$
Therefore, we reject $H_0: \beta_j = 0$ in favor of $H_1: \beta_j \ne 0$ at α when $|t_{\hat\beta_j}| \ge t^{\alpha/2}_{n-k}$, as
can be seen in figure 4.11. In this case, in order to reject H0 against $H_1: \beta_j \ne 0$, we must
obtain a large enough $t_{\hat\beta_j}$, which is either positive or negative.
It is important to remark that as α decreases, $t^{\alpha/2}_{n-k}$ increases in absolute value.


In the alternative approach, once the p-value has been determined, we know that while
H 0 is rejected for any level of significance of α>p-value, the null hypothesis is not
rejected when α<p-value. In this case, the p-value is distributed between both tails in a
symmetrical way, as is shown in figure 4.12.
FIGURE 4.11. Rejection region using t: two-tail alternative hypothesis.
FIGURE 4.12. p-value using t: two-tail alternative hypothesis.
When a specific alternative hypothesis is not stated, it is usually considered to be
a two-sided hypothesis test. If H0 is rejected in favor of H1 at a given α, we usually say
that “$x_j$ is statistically significant at the level α”.
EXAMPLE 4.3 Does the crime rate play a role in the price of houses in an area?
To explain housing prices in an American town, the following model is estimated:
$$price = \beta_1 + \beta_2 rooms + \beta_3 lowstat + \beta_4 crime + u$$
where rooms is the number of rooms of the house, lowstat is the percentage of people of “lower status” in
the area and crime is crimes committed per capita in the area.


The output for the fitted model, using the file hprice2 (first 55 observations), appears in table 4.2
and has been taken from E-views. The meaning of the first three columns is clear. “t-Statistic” is the
statistic used to perform a significance test, that is to say, it is the ratio between the “Coefficient” and the
“Std. Error”; and “Prob.” is the p-value for a two-tailed test.
In relation to this model, the researcher questions whether the rate of crime in an area plays a role
in the price of houses in that area.
To answer this question, the following procedure has been carried out.
In this case, the null and alternative hypothesis and the test statistic are the following:
$$H_0: \beta_4 = 0 \qquad H_1: \beta_4 \ne 0 \qquad t = \frac{\hat\beta_4}{se(\hat\beta_4)} = \frac{-3854}{960} = -4.016$$

TABLE 4.2. Standard output in the regression explaining house price. n=55.
Variable Coefficient Std. Error t-Statistic Prob.
C -15693.61 8021.989 -1.956324 0.0559
ROOMS 6788.401 1210.720 5.606910 0.0000
LOWSTAT -268.1636 80.70678 -3.322690 0.0017
CRIME -3853.564 959.5618 -4.015962 0.0002
Since the t value is relatively high in absolute value, let us start testing with a level of 1%. For α=0.01,
$t^{0.01/2}_{51} \approx t^{0.01/2}_{50} = 2.69$. (In the usual statistical tables for the t distribution, not every df
above 20 is tabulated.) Given that |t| > 2.69, we reject H0 in favour of H1. Therefore, crime has a significant influence
on housing prices for a significance level of 1% and, thus, of 5% and 10%.
In the alternative approach, we can perform the test with more precision. In table 4.2 we see that
the p-value for the coefficient of crime is 0.0002. That means that the probability of the t statistic being
greater than 4.016 is 0.0001 and the probability of t being smaller than -4.016 is 0.0001. That is to say, the
p-value, as shown in Figure 4.13, is distributed in the two tails. As can be seen in this figure, H 0 is rejected
for all significance levels greater than 0.0002, such as 0.01, 0.05 and 0.10.
FIGURE 4.13. Example 4.3: p-value using t with a two-tail alternative hypothesis.
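The t statistic and the two-tail p-value reported for CRIME in table 4.2 can be approximated without statistical tables. The sketch below uses only the Python standard library and integrates the t density numerically with Simpson's rule; the helper names t_pdf and t_upper_tail are introduced here for illustration and are not part of E-views or of the text.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the Simpson integral of the density on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


coef, se = -3853.564, 959.5618  # CRIME row of table 4.2
n, k = 55, 4                    # observations and estimated parameters

t_stat = coef / se
# The two-tail p-value adds the probability in both tails beyond |t|
p_two_sided = 2 * t_upper_tail(abs(t_stat), n - k)

print(f"t = {t_stat:.3f}, df = {n - k}, two-tail p-value = {p_two_sided:.4f}")
```

The result should be close to the value in the “Prob.” column, with half of the p-value lying in each tail.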
So far we have seen significance tests with one-tail and two-tail alternatives, in which a parameter
takes the value 0 in H0. Now we are going to look at a more general case where the
parameter in H0 takes any value:
$$H_0: \beta_j = \beta_j^0$$
Thus, the appropriate t statistic is
$$t_{\hat\beta_j} = \frac{\hat\beta_j - \beta_j^0}{se(\hat\beta_j)}$$

As before, $t_{\hat\beta_j}$ measures how many estimated standard deviations $\hat\beta_j$ is away
from the hypothesized value $\beta_j^0$.


EXAMPLE 4.4 Is the expenditure in fruit/income elasticity equal to 1? Is fruit a luxury good?
To answer these questions, we are going to use the following model for the expenditure in fruit:
$$\ln(fruit) = \beta_1 + \beta_2 \ln(inc) + \beta_3 househsize + \beta_4 punder5 + u$$
where inc is the disposable income of the household, househsize is the number of household members and punder5
is the proportion of children under five in the household.
As the variables fruit and inc appear expressed in natural logarithms, $\beta_2$ is the expenditure in
fruit/income elasticity. Using a sample of 40 households (workfile demand), the results of table 4.3 have
been obtained.

TABLE 4.3. Standard output in a regression explaining expenditure in fruit. n=40.


Variable Coefficient Std. Error t-Statistic Prob.
C -9.767654 3.701469 -2.638859 0.0122
LN(INC) 2.004539 0.512370 3.912286 0.0004
HOUSEHSIZE -1.205348 0.178646 -6.747147 0.0000
PUNDER5 -0.017946 0.013022 -1.378128 0.1767

Is the expenditure in fruit/income elasticity equal to 1?


To answer this question, the following procedure has been carried out:
In this case, the null and alternative hypothesis and the test statistic are the following:
$$H_0: \beta_2 = 1 \qquad H_1: \beta_2 \ne 1 \qquad t = \frac{\hat\beta_2 - \beta_2^0}{se(\hat\beta_2)} = \frac{\hat\beta_2 - 1}{se(\hat\beta_2)} = \frac{2.005 - 1}{0.512} = 1.961$$
For α=0.10, we find that $t^{0.10/2}_{36} \approx t^{0.10/2}_{35} = 1.69$. As |t| > 1.69, we reject H0. For α=0.05,
$t^{0.05/2}_{36} \approx t^{0.05/2}_{35} = 2.03$. As |t| < 2.03, we do not reject H0 for α=0.05, nor for α=0.01. Therefore, we reject
that the expenditure on fruit/income elasticity is equal to 1 for α=0.10, but we cannot reject it for α=0.05,
nor for α=0.01.
Is fruit a luxury good?
According to economic theory, a commodity is a luxury good when its expenditure elasticity with
respect to income is higher than 1. Therefore, to answer to the second question, and taking into account that
the t statistic is the same, the following procedure has been carried out:
$$H_0: \beta_2 = 1 \qquad H_1: \beta_2 > 1$$
For α=0.10, we find that $t^{0.10}_{36} \approx t^{0.10}_{35} = 1.31$. As t>1.31, we reject H0 in favour of H1. For α=0.05,
$t^{0.05}_{36} \approx t^{0.05}_{35} = 1.69$. As t>1.69, we reject H0 in favour of H1. For α=0.01, $t^{0.01}_{36} \approx t^{0.01}_{35} = 2.44$. As t<2.44, we
do not reject H0. Therefore, fruit is a luxury good for α=0.10 and α=0.05, but we cannot reject H0 in favour
of H1 for α=0.01.
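Both tests of example 4.4 follow the same pattern and can be written as one small function. This is a sketch under stated assumptions: the coefficient and standard error come from table 4.3, the critical values are the ones quoted in the example, and the function t_decision is a hypothetical helper written for this illustration.

```python
def t_decision(t_stat, crit, alternative):
    """Return True when H0 is rejected; crit is the positive table value."""
    if alternative == "right":      # rule (4-12)
        return t_stat >= crit
    if alternative == "left":       # rule (4-13)
        return t_stat <= -crit
    if alternative == "two-sided":  # rule (4-14)
        return abs(t_stat) >= crit
    raise ValueError(f"unknown alternative: {alternative}")


beta2_hat, se_beta2 = 2.004539, 0.512370  # LN(INC) row of table 4.3
beta2_null = 1.0                          # value of beta2 under H0

t_stat = (beta2_hat - beta2_null) / se_beta2

# Critical values for 35 df quoted in the example
two_sided_crit = {0.10: 1.69, 0.05: 2.03}              # t with alpha/2 in each tail
right_tail_crit = {0.10: 1.31, 0.05: 1.69, 0.01: 2.44}  # t with alpha in the right tail

two_sided = {a: t_decision(t_stat, c, "two-sided") for a, c in two_sided_crit.items()}
right_tail = {a: t_decision(t_stat, c, "right") for a, c in right_tail_crit.items()}

print(f"t = {t_stat:.3f}")
print("H1: beta2 != 1 ->", two_sided)
print("H1: beta2 > 1  ->", right_tail)
```

The same statistic leads to different conclusions at the 5% level because the one-tail test places the whole of α in the right tail.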
EXAMPLE 4.5 Is the Madrid stock exchange market efficient?
Before answering this question, we will examine some previous concepts. The rate of return of an
asset over a period of time is defined as the percentage change in the value invested in the asset during that
period of time. Let us now consider a specific asset: a share of an industrial company acquired in a Spanish
stock market at the end of one year and held until the end of the next year. Those two moments of time will
be denoted by t-1 and t respectively. The rate of return of this share within that year can be expressed by
the following relationship:
$$RA_t = \frac{\Delta P_t + D_t + A_t}{P_{t-1}} \tag{4-15}$$


where $P_t$ is the share price at the end of period t, $D_t$ are the dividends received by the share during
period t, and $A_t$ is the value of the rights that may have corresponded to the share during period t.
Thus, the numerator of (4-15) summarizes the three types of capital gains that have been received
for holding a share in year t; that is to say, an increase or decrease in quotation, dividends and
rights on capital increases. Dividing by $P_{t-1}$, we obtain the rate of profit on the share value at the end of the
previous period. Of these three components, the most important one is the increase in quotation.
Considering only that component, the rate of return of the share can be expressed by
$$RA1_t = \frac{\Delta P_t}{P_{t-1}} \tag{4-16}$$
or, alternatively if we use a relative rate of variation, by
$$RA2_t = \Delta \ln P_t \tag{4-17}$$
In the same way as $RA_t$ represents the rate of return of a particular share in either of the two
expressions, we can also calculate the rate of return of all shares listed in the stock exchange. The latter rate
of return, which will be denoted by $RM_t$, is called the market rate of return.
So far we have considered the rate of return in a year, but we can also apply expressions such as
(4-16), or (4-17), to obtain daily rates of return. It is interesting to know whether the rates of return in the
past are useful for predicting rates of return in the future. This question is related to the concept of market
efficiency. A market is efficient if prices incorporate all available information, so there is no possibility of
making abnormal profits by using this information.
In order to test the efficiency of a market, we define the following model, using daily rates of
return defined by (4-16):
$$rmad92_t = \beta_1 + \beta_2 rmad92_{t-1} + u_t \tag{4-18}$$
If a market is efficient, then the parameter $\beta_2$ of the previous model must be 0. Let us now test
whether the Madrid Stock Exchange is efficient as a whole.
The model (4-18) has been estimated with daily data from the Madrid Stock Exchange for 1992,
using file bolmadef. The results obtained are the following:
$$\widehat{rmad92}_t = \underset{(0.0007)}{-0.0004} + \underset{(0.0629)}{0.1267}\, rmad92_{t-1}$$
R2=0.0163 n=247
The results are paradoxical. On the one hand, the coefficient of determination is very low (0.0163),
which means that only 1.63% of the total variance of the rate of return is explained by the previous day’s
rate of return. On the other hand, the coefficient corresponding to the rate of return of the previous
day is statistically significant at a level of 5% but not at a level of 1%, given that the t statistic is equal to
0.1267/0.0629=2.02, which is slightly larger in absolute value than $t_{245}^{0.05/2} \approx t_{60}^{0.05/2} = 2.00$. The reason for this
apparent paradox is that the sample size is very large. Thus, although the impact of the explanatory variable
on the endogenous variable is relatively small (as indicated by the coefficient of determination), this finding
is significant (as evidenced by the t statistic) because the sample is sufficiently large.
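The apparent paradox can be checked with a few lines of arithmetic. In a regression with a single explanatory variable, the squared t statistic of the slope equals $R^2(n-2)/(1-R^2)$, so a very small $R^2$ still gives a t close to 2 when n is in the hundreds. The following Python sketch uses only the figures reported above; the critical value 2.00 is the tabulated value quoted in the text, and the variable names are illustrative.

```python
import math

# Figures reported for model (4-18)
beta2_hat, se_beta2 = 0.1267, 0.0629
r2, n = 0.0163, 247

t_ratio = beta2_hat / se_beta2                     # coefficient / standard error
t_from_r2 = math.sqrt(r2 * (n - 2) / (1.0 - r2))   # simple-regression identity

t_crit_5pct = 2.00   # two-tailed 5% value quoted in the text (60 df table entry)

print(f"t from coefficient and se: {t_ratio:.3f}")
print(f"t implied by R2 and n:     {t_from_r2:.3f}")
print("significant at 5%:", abs(t_ratio) > t_crit_5pct)

# The same R2 with a much smaller sample would give a much smaller t.
print(f"t if n were 25: {math.sqrt(r2 * (25 - 2) / (1.0 - r2)):.3f}")
```

The two values of t differ slightly only because the reported coefficient, standard error and $R^2$ are rounded.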
To answer the question as to whether the Madrid Stock Exchange is an efficient market, we can
say that it is not entirely efficient. However, this response should be qualified. In financial economics there
is a dependency relationship of the rate of return of one day with respect to the rate corresponding to the
previous day. This relationship is not very strong, although it is statistically significant in many world stock
markets due to market frictions. In any case, market players cannot exploit this phenomenon, and thus the
market is not inefficient, according to the above definition of the concept of efficiency.
EXAMPLE 4.6 Is the rate of return of the Madrid Stock Exchange affected by the rate of return of the
Tokyo Stock Exchange?
The study of the relationship between different stock markets (NYSE, Tokyo Stock Exchange,
Madrid Stock Exchange, London Stock Exchange, etc.) has received much attention in recent years due to
the greater freedom in the movement of capital and the use of foreign markets to reduce the risk in portfolio
management. This is because the absence of perfect market integration allows diversification of risk. In any
case, there is a world trend toward a greater global integration of financial markets in general and stock
markets in particular.
If markets are efficient, and we have seen in example 4.5 that they are, the innovations (new
information) will be reflected in the different markets within a period of 24 hours.
It is important to distinguish between two types of innovations: a) global innovations, which are
news generated around the world that influence stock prices in all markets; b) specific innovations,
which are information generated during a 24-hour period that affects only the prices of a particular market.
Thus, information on the evolution of oil prices can be considered as a global innovation, while a new
financial sector regulation in a country would be considered a specific innovation.
According to the above discussion, stock prices quoted at a session of a particular stock market
are affected by the global innovations of a different market which had closed earlier. Thus, global
innovations included in the Tokyo market will influence the market prices of Madrid on the same day. The
following model shows the transmission of effects between the Tokyo Stock Exchange and the Madrid
Stock Exchange in 1992:
$$rmad92_t = \beta_1 + \beta_2\, rtok92_t + u_t \tag{4-19}$$
where $rmad92_t$ is the rate of return of the Madrid Stock Exchange in period t and $rtok92_t$ is the rate of
return of the Tokyo Stock Exchange in period t. The rates of return have been calculated according to (4-16).
In the working file madtok you can find general indices of the Madrid Stock Exchange and the
Tokyo Stock Exchange during the days both exchanges were open simultaneously in 1992. That is, we
eliminated observations for those days when any one of the two stock exchanges was closed. In total, the
number of observations is 234, compared to the 247 and 246 days that the Madrid and Tokyo Stock
Exchanges were open.
The estimation of the model (4-19) is as follows:
$$\widehat{rmad92}_t = \underset{(0.0007)}{-0.0005} + \underset{(0.0375)}{0.1244}\, rtok92_t$$
$$R^2 = 0.0452 \qquad n = 235$$
Note that the coefficient of determination is relatively low. However, for testing $H_0: \beta_2 = 0$, the
statistic t = 0.1244/0.0375 = 3.32, which implies that we reject the hypothesis that the rate of return of the
Tokyo Stock Exchange has no effect on the rate of return of the Madrid Stock Exchange, at a significance
level of 0.01.
Once again we find the same apparent paradox which appeared when we analyzed the efficiency
of the Madrid Stock Exchange in example 4.5, except for one difference. In example 4.5, the rate of return
from the previous day appeared as significant due to problems arising in the construction of the general index
of the Madrid Stock Exchange.
Consequently, the fact that the null hypothesis is rejected implies that there is empirical evidence
supporting the theory that global innovations from the Tokyo Stock Exchange are transmitted to the quotes
of the Madrid Stock Exchange on the same day.

4.2.2 Confidence intervals


Under the CLM, we can easily construct a confidence interval (CI) for the
population parameter, β j . CI are also called interval estimates because they provide a
range of likely values for β j , and not just a point estimate.
The CI is built in such a way that the unknown parameter is contained within the
range of the CI with a previously specified probability.
By using the fact that
$$\frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k}$$
we obtain
$$\Pr\left[-t_{n-k}^{\alpha/2} \le \frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \le t_{n-k}^{\alpha/2}\right] = 1-\alpha$$
Operating to put the unknown $\beta_j$ alone in the middle of the interval, we have
$$\Pr\left[\hat\beta_j - se(\hat\beta_j) \times t_{n-k}^{\alpha/2} \le \beta_j \le \hat\beta_j + se(\hat\beta_j) \times t_{n-k}^{\alpha/2}\right] = 1-\alpha$$
Therefore, the lower and upper bounds of a (1-α) CI respectively are given by
$$\underline{\beta}_j = \hat\beta_j - se(\hat\beta_j) \times t_{n-k}^{\alpha/2}$$
$$\overline{\beta}_j = \hat\beta_j + se(\hat\beta_j) \times t_{n-k}^{\alpha/2}$$
If random samples were obtained over and over again, with $\underline{\beta}_j$ and $\overline{\beta}_j$ computed
each time, then the (unknown) population value would lie in the interval ($\underline{\beta}_j$, $\overline{\beta}_j$) for
100(1 − α)% of the samples. Unfortunately, for the single sample that we use to construct the CI,
we do not know whether $\beta_j$ is actually contained in the interval.
Once a CI is constructed, it is easy to carry out two-tailed hypothesis tests. If the
null hypothesis is H 0 : β j = a j , then H 0 is rejected against H1 : β j ≠ a j at (say) the 5%
significance level if, and only if, a j is not in the 95% CI.
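The link between a CI and a two-tailed test can be sketched in a few lines of Python. The example below assumes that the critical value is read from a t table, as is done throughout this chapter, and uses the slope of model (4-18) from example 4.5; the function names are illustrative.

```python
def confidence_interval(beta_hat, se, t_crit):
    """Bounds beta_hat -/+ se * t_crit, with t_crit read from a t table."""
    half_width = se * t_crit
    return beta_hat - half_width, beta_hat + half_width

def rejects(ci, a_j):
    """Two-tailed test of H0: beta_j = a_j, rejected iff a_j lies outside the CI."""
    lower, upper = ci
    return not (lower <= a_j <= upper)

# Slope of model (4-18) in example 4.5; 2.00 is the tabulated 5% two-tailed
# value for 60 df, used as an approximation for 245 df.
ci95 = confidence_interval(0.1267, 0.0629, 2.00)
print("95% CI:", tuple(round(b, 4) for b in ci95))
print("reject H0: beta_2 = 0 at 5%?", rejects(ci95, 0.0))
```

Since 0 lies just outside the 95% interval, the conclusion agrees with the t test of example 4.5.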
To illustrate this matter, in figure 4.14 we constructed confidence intervals of 90%,
95% and 99% for the marginal propensity to consume, $\beta_2$, corresponding to example
4.1.
[Figure 4.14: nested intervals centered at 0.843: 90% (0.739, 0.947); 95% (0.718, 0.968); 99% (0.675, 1.011).]
FIGURE 4.14. Confidence intervals for marginal propensity to consume in example 4.1.

4.2.3 Testing hypotheses about a single linear combination of the parameters


In many applications we are interested in testing a hypothesis involving more than
one of the population parameters. We can also use the t statistic to test a single linear
combination of the parameters, where two or more parameters are involved.
There are two different procedures to perform the test with a single linear
combination of parameters. In the first, the standard error of the linear combination of
parameters corresponding to the null hypothesis is calculated using information on the
covariance matrix of the estimators. In the second, the model is reparameterized by
introducing a new parameter derived from the null hypothesis and the reparameterized
model is then estimated; testing for the new parameter indicates whether the null
hypothesis is rejected or not. The following example illustrates both procedures.


EXAMPLE 4.7 Are there constant returns to scale in the chemical industry?
To examine whether there are constant returns to scale in the chemical sector, we are going to use
the Cobb-Douglas production function, given by
$$\ln(output) = \beta_1 + \beta_2 \ln(labor) + \beta_3 \ln(capital) + u \tag{4-20}$$
In the above model, parameters $\beta_2$ and $\beta_3$ are elasticities (output/labor and output/capital).
Before making inferences, remember that returns to scale refers to a technical property of the
production function: it describes how output changes after all inputs (labor and capital in this case)
change by the same proportion. If output increases by that same proportion, there are constant returns
to scale; for example, if labor and capital both increase by 10%, output also increases by 10%. If output
increases by more than that proportion, there are increasing returns to scale. If output increases by less
than that proportion, there are decreasing returns to scale. In the above model, the following occurs:
- if $\beta_2+\beta_3=1$, there are constant returns to scale.
- if $\beta_2+\beta_3>1$, there are increasing returns to scale.
- if $\beta_2+\beta_3<1$, there are decreasing returns to scale.
Data used for this example are a sample of 27 companies of the primary metal sector (workfile
prodmet), where output is gross value added, labor is a measure of labor input, and capital is the gross
value of plant and equipment. Further details on construction of the data are given in Aigner, et al. (1977)
and in Hildebrand and Liu (1957); these data were used by Greene in 1991. The results obtained in the
estimation of model (4-20), using any econometric software available, appear in table 4.4.

TABLE 4.4. Standard output of the estimation of the production function: model (4-20).
Variable       Coefficient   Std. Error   t-Statistic   Prob.
constant       1.170644      0.326782     3.582339      0.0015
ln(labor)      0.602999      0.125954     4.787457      0.0001
ln(capital)    0.375710      0.085346     4.402204      0.0002

To answer the question posed in this example, we must test
$$H_0: \beta_2 + \beta_3 = 1$$
against the following alternative hypothesis
$$H_1: \beta_2 + \beta_3 \neq 1$$

Two procedures will be used to test this hypothesis. In the first, the covariance matrix of the
estimators is used. In the second, the model is reparameterized by introducing a new parameter.
Procedure: using covariance matrix of estimators
According to $H_0$, it is stated that $\beta_2 + \beta_3 - 1 = 0$. Therefore, the t statistic must now be based on
whether the estimated sum $\hat\beta_2 + \hat\beta_3 - 1$ is sufficiently different from 0 to reject $H_0$ in favor of $H_1$. To
account for the sampling error in our estimators, we standardize this sum by dividing by its standard error:
$$t_{\hat\beta_2+\hat\beta_3} = \frac{\hat\beta_2 + \hat\beta_3 - 1}{se(\hat\beta_2 + \hat\beta_3)}$$
Therefore, if $t_{\hat\beta_2+\hat\beta_3}$ is large enough in absolute value, we will conclude, in a two-sided test, that there are
not constant returns to scale. On the other hand, if $t_{\hat\beta_2+\hat\beta_3}$ is positive and large enough, we will reject, in a
one-sided (right-tail) test, $H_0$ in favour of $H_1: \beta_2 + \beta_3 > 1$, and conclude that there are increasing returns to
scale.


On the other hand, we have
$$se(\hat\beta_2 + \hat\beta_3) = \sqrt{\widehat{var}(\hat\beta_2 + \hat\beta_3)}$$
where
$$\widehat{var}(\hat\beta_2 + \hat\beta_3) = \widehat{var}(\hat\beta_2) + \widehat{var}(\hat\beta_3) + 2 \times \widehat{covar}(\hat\beta_2, \hat\beta_3)$$
Hence, to compute $se(\hat\beta_2 + \hat\beta_3)$ you need information on the estimated covariance of the estimators.
Many econometric software packages (such as EViews) have an option to display estimates of the
covariance matrix of the estimator vector. In this case, the covariance matrix obtained appears in table 4.5.
Using this information, we have
$$se(\hat\beta_2 + \hat\beta_3) = \sqrt{0.015864 + 0.007284 - 2 \times 0.009616} = 0.0626$$
$$t_{\hat\beta_2+\hat\beta_3} = \frac{\hat\beta_2 + \hat\beta_3 - 1}{se(\hat\beta_2 + \hat\beta_3)} = \frac{-0.02129}{0.0626} = -0.3402$$


TABLE 4.5. Covariance matrix in the production function.
               constant     ln(labor)    ln(capital)
constant       0.106786     -0.019835    0.001189
ln(labor)      -0.019835    0.015864     -0.009616
ln(capital)    0.001189     -0.009616    0.007284
Given that $t=-0.3402$, it is clear that we cannot reject the existence of constant returns to scale at
the usual significance levels. Given that the t statistic is negative, it makes no sense to test whether there
are increasing returns to scale.
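The arithmetic of this first procedure can be reproduced with a short Python sketch. It uses only the estimates of table 4.4 and the variances and covariance of table 4.5; the variable names are illustrative.

```python
import math

# Estimates from table 4.4 and (co)variances from table 4.5
b_labor, b_capital = 0.602999, 0.375710
var_labor, var_capital = 0.015864, 0.007284
cov_labor_capital = -0.009616

# var(b2 + b3) = var(b2) + var(b3) + 2 cov(b2, b3)
var_sum = var_labor + var_capital + 2.0 * cov_labor_capital
se_sum = math.sqrt(var_sum)

# t statistic for H0: beta2 + beta3 = 1
t_stat = (b_labor + b_capital - 1.0) / se_sum

print(f"se(b2 + b3) = {se_sum:.4f}")
print(f"t           = {t_stat:.4f}")
```

Note that the covariance is negative, so it reduces the variance of the sum with respect to the sum of the two variances.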
Procedure: reparameterizing the model by introducing a new parameter
It is easier to perform the test if we apply the second procedure. A different model is estimated in
this procedure, which directly provides the standard error of interest. Thus, let us define
$$\theta = \beta_2 + \beta_3 - 1$$
so that the null hypothesis that there are constant returns to scale is equivalent to $H_0: \theta = 0$.
From the definition of θ, we have $\beta_2 = \theta - \beta_3 + 1$. Substituting $\beta_2$ in the original equation:
$$\ln(output) = \beta_1 + (\theta - \beta_3 + 1) \ln(labor) + \beta_3 \ln(capital) + u$$
Hence,
$$\ln(output/labor) = \beta_1 + \theta \ln(labor) + \beta_3 \ln(capital/labor) + u$$
Therefore, to test whether there are constant returns to scale is equivalent to carrying out a
significance test on the coefficient of ln(labor) in the previous model. The strategy of rewriting the model
so that it contains the parameter of interest works in all cases and is usually easy to implement. If we apply
this transformation to this example, we obtain the results of table 4.6.
As can be seen, we obtain the same result:
$$t_{\hat\theta} = \frac{\hat\theta}{se(\hat\theta)} = -0.3402$$

TABLE 4.6. Estimation output for the production function: reparameterized model.
Variable             Coefficient   Std. Error   t-Statistic   Prob.
constant             1.170644      0.326782     3.582339      0.0015
ln(labor)            -0.021290     0.062577     -0.340227     0.7366
ln(capital/labor)    0.375710      0.085346     4.402204      0.0002
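The step from the substituted equation to the reparameterized model is only a regrouping of terms. Written out, under the same notation as (4-20), it can be sketched as follows:

```latex
\begin{aligned}
\ln(output) &= \beta_1 + (\theta - \beta_3 + 1)\ln(labor) + \beta_3 \ln(capital) + u \\
\ln(output) - \ln(labor) &= \beta_1 + \theta \ln(labor) + \beta_3 \left[\ln(capital) - \ln(labor)\right] + u \\
\ln(output/labor) &= \beta_1 + \theta \ln(labor) + \beta_3 \ln(capital/labor) + u
\end{aligned}
```

This is consistent with the two tables: the coefficient of ln(labor) in table 4.6, -0.021290, equals 0.602999 + 0.375710 - 1 from table 4.4 up to rounding, while the constant and the coefficient of the capital term are unchanged.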


EXAMPLE 4.8 Advertising or incentives?


The Bush Company is engaged in the sale and distribution of gifts imported from the Near East.
The most popular item in the catalog is the Guantanamo bracelet, which has some relaxing properties. The
sales agents receive a commission of 30% of total sales amount. In order to increase sales without expanding
the sales network, the company established special incentives for those agents who exceeded a sales target
during the last year.
Radio advertising spots were broadcast in different areas to strengthen the promotion of sales.
In those spots special emphasis was placed on the well-being derived from wearing a Guantanamo
bracelet.
The manager of the Bush Company wonders whether a dollar spent on special incentives has a
greater effect on sales than a dollar spent on advertising. To answer that question, the company's
econometrician suggests the following model to explain sales:
$$sales = \beta_1 + \beta_2\, advert + \beta_3\, incent + u$$
where incent are incentives to the salesmen and advert are expenditures on advertising. The variables sales,
incent and advert are expressed in thousands of dollars.
Using a sample of 18 sales areas (workfile advincen), we have obtained the output and the
covariance matrix of the coefficients that appear in table 4.7 and in table 4.8 respectively.

TABLE 4.7. Standard output of the regression for example 4.8.
Variable    Coefficient   Std. Error   t-Statistic   Prob.
constant    396.5945      3548.111     0.111776      0.9125
advert      18.63673      8.924339     2.088304      0.0542
incent      30.69686      3.604420     8.516448      0.0000

TABLE 4.8. Covariance matrix for example 4.8.
            constant     advert     incent
constant    12589095     -26674     -7101
advert      -26674       79.644     2.941
incent      -7101        2.941      12.992
In this model, the coefficient β 2 indicates the increase in sales produced by a dollar increase in
spending on advertising, while β 3 indicates the increase caused by a dollar increase in the special incentives,
holding fixed in both cases the other regressor.
To answer the question posed in this example, the null and the alternative hypotheses are the
following:
$$H_0: \beta_3 - \beta_2 = 0$$
$$H_1: \beta_3 - \beta_2 > 0$$
The t statistic is built using information about the covariance matrix of the estimators:
$$t_{\hat\beta_3-\hat\beta_2} = \frac{\hat\beta_3 - \hat\beta_2}{se(\hat\beta_3 - \hat\beta_2)}$$
$$se(\hat\beta_3 - \hat\beta_2) = \sqrt{79.644 + 12.992 - 2 \times 2.941} = 9.3142$$
$$t_{\hat\beta_3-\hat\beta_2} = \frac{\hat\beta_3 - \hat\beta_2}{se(\hat\beta_3 - \hat\beta_2)} = \frac{30.697 - 18.637}{9.3142} = 1.295$$

For α=0.10, we find that $t_{15}^{0.10} = 1.341$. As t is smaller than 1.341, we do not reject $H_0$ for α=0.10, nor for α=0.05
or α=0.01. Therefore, there is no empirical evidence that a dollar spent on special incentives has a greater
effect on sales than a dollar spent on advertising.
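The calculations of examples 4.7 and 4.8 are two cases of the same rule: the variance of any linear combination of the estimators is a quadratic form in the estimated covariance matrix. The following Python sketch illustrates this with the figures of tables 4.7 and 4.8; the function name and the ordering of the parameters are assumptions made for the illustration.

```python
import math

def t_linear_combination(weights, estimates, cov, value=0.0):
    """t statistic for H0: sum_j weights[j] * beta_j = value.

    The variance of the combination is the quadratic form w' V w,
    where V is the estimated covariance matrix of the estimators.
    """
    k = len(weights)
    combo = sum(w * b for w, b in zip(weights, estimates))
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    se = math.sqrt(variance)
    return (combo - value) / se, se

# Order of parameters: constant, advert, incent (tables 4.7 and 4.8)
estimates = [396.5945, 18.63673, 30.69686]
cov = [
    [12589095.0, -26674.0, -7101.0],
    [-26674.0,     79.644,    2.941],
    [-7101.0,       2.941,   12.992],
]

# H0: beta3 - beta2 = 0 corresponds to weights (0, -1, 1)
t_stat, se = t_linear_combination([0.0, -1.0, 1.0], estimates, cov)
print(f"se = {se:.4f}, t = {t_stat:.3f}")
print("reject at 10% (one-sided, critical value 1.341)?", t_stat > 1.341)
```

With weights (0, 1, 1) and value 1, the same function would reproduce the test of constant returns to scale once the estimates and covariance matrix of example 4.7 are supplied.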
EXAMPLE 4.9 Testing the hypothesis of homogeneity in the demand for fish
In the case study in chapter 2, models for demand for dairy products have been estimated from
cross-sectional data, using disposable income as an explanatory variable. However, the price of the product
itself and, to a greater or lesser extent, the prices of other goods are determinants of the demand. The
demand analysis based on cross sectional data has precisely the limitation that it is not possible to examine
the effect of prices on demand because prices remain constant, since the data refer to the same point in time.
To analyze the effect of prices it is necessary to use time series data or, alternatively, panel data. We will
briefly examine some aspects of the theory of demand for a good and then move to the estimation of a
demand function with time series data. As a postscript to this case, we will test one of the hypotheses which,
under certain circumstances, a theoretical model must satisfy.
The demand for a commodity (say good j) can be expressed, according to an optimization process
carried out by the consumer, in terms of disposable income, the price of the good and the prices of the other
goods. Analytically:
$$q_j = f_j(p_1, p_2, \ldots, p_j, \ldots, p_m, di) \tag{4-21}$$
where
- di is the disposable income of the consumer.
- $p_1, p_2, \ldots, p_j, \ldots, p_m$ are the prices of the goods which are taken into account by
consumers when they acquire the good j.
Logarithmic models are attractive in studies on demand, because the coefficients are directly
elasticities. The log model is given by
$$\ln(q_j) = \beta_1 + \beta_2 \ln(p_1) + \beta_3 \ln(p_2) + \cdots + \beta_{j+1} \ln(p_j) + \cdots + \beta_{m+1} \ln(p_m) + \beta_{m+2} \ln(di) + u \tag{4-22}$$


It is clear that all β coefficients, excluding the constant term, are elasticities of different
types and therefore are independent of the units of measurement of the variables. When there is no money
illusion, if all prices and income grow at the same rate, the demand for a good is not affected by these
changes. Thus, assuming that prices and income are multiplied by λ, if the consumer has no money illusion,
the following should be satisfied
$$f_j(\lambda p_1, \lambda p_2, \ldots, \lambda p_j, \ldots, \lambda p_m, \lambda\, di) = f_j(p_1, p_2, \ldots, p_j, \ldots, p_m, di) \tag{4-23}$$
From a mathematical point of view, the above condition implies that the demand function must be
homogeneous of degree 0. This condition is called the restriction of homogeneity. Applying Euler's theorem,
the restriction of homogeneity in turn implies that the sum of the demand/income elasticity and of all
demand/price elasticities is zero, i.e.:
$$\sum_{h=1}^{m} \varepsilon_{q_j/p_h} + \varepsilon_{q_j/di} = 0 \tag{4-24}$$
This restriction applied to the logarithmic model (4-22) implies that
$$\beta_2 + \beta_3 + \cdots + \beta_{j+1} + \cdots + \beta_{m+1} + \beta_{m+2} = 0 \tag{4-25}$$
In practice, when estimating a demand function, the prices of many goods are not included, but
only those that are closely related, either because they are complementary or substitute goods. It is also
well known that the budgetary allocation of spending is carried out in several stages.
Next, the demand for fish in Spain will be studied by using a model similar to (4-22). Let us
consider that in a first stage, the consumer distributes income between total consumption and
savings. In a second stage, consumption expenditure is allocated by function, taking into account the
total consumption and the relevant prices in each function. Specifically, we assume that the only relevant
prices in the demand for fish are the price of the good (fish) and the price of the most important substitute
(meat).
Given the above considerations, the following model is formulated:
$$\ln(fish) = \beta_1 + \beta_2 \ln(fishpr) + \beta_3 \ln(meatpr) + \beta_4 \ln(cons) + u \tag{4-26}$$
where fish is fish expenditure at constant prices, fishpr is the price of fish, meatpr is the price of meat and
cons is total consumption at constant prices.
The workfile fishdem contains information about these series for the period 1964-1991. Prices are
index numbers with 1986 as a base, and fish and cons are magnitudes at constant prices, also with 1986 as
a base. The results of estimating model (4-26) are as follows:

$$\widehat{\ln(fish)} = \underset{(2.30)}{7.788} - \underset{(0.133)}{0.460} \ln(fishpr) + \underset{(0.112)}{0.554} \ln(meatpr) + \underset{(0.137)}{0.322} \ln(cons)$$

As can be seen, the signs of the elasticities are correct: the elasticity of demand is negative with
respect to the price of the good, while the elasticities with respect to the price of the substitute good and
total consumption are positive.
In model (4-26) the homogeneity restriction implies the following null hypothesis:
$$\beta_2 + \beta_3 + \beta_4 = 0 \tag{4-27}$$
To carry out this test, we will use a procedure similar to the second one used in example 4.7. Now, the
parameter θ is defined as follows
$$\theta = \beta_2 + \beta_3 + \beta_4 \tag{4-28}$$
Setting $\beta_2 = \theta - \beta_3 - \beta_4$, the following model has been estimated:
$$\ln(fish) = \beta_1 + \theta \ln(fishpr) + \beta_3 \ln(meatpr/fishpr) + \beta_4 \ln(cons/fishpr) + u \tag{4-29}$$
The results obtained were the following:
$$\widehat{\ln(fish_i)} = \underset{(2.30)}{7.788} - \underset{(0.1334)}{0.4596} \ln(fishpr_i) + \underset{(0.112)}{0.554} \ln(meatpr_i) + \underset{(0.137)}{0.322} \ln(cons_i)$$

Using (4-28), testing the null hypothesis (4-27) is equivalent to testing that the coefficient of
ln(fishpr) in (4-29) is equal to 0. Since the t statistic for this coefficient is equal to -3.44, which in absolute
value exceeds $t_{24}^{0.01/2} = 2.80$, we reject the hypothesis of homogeneity regarding the demand for fish.

4.2.4 Economic importance versus statistical significance


Up until now we have emphasized statistical significance. However, it is
important to remember that we should pay attention to the magnitude and the sign of the
estimated coefficient in addition to t statistics.
Statistical significance of a variable $x_j$ is determined entirely by the size of $t_{\hat\beta_j}$,
whereas the economic significance of a variable is related to the size (and sign) of $\hat\beta_j$.
Too much focus on statistical significance can lead to the false conclusion that a variable
is “important” for explaining y, even though its estimated effect is modest.
Therefore, even if a variable is statistically significant, you need to discuss the
magnitude of the estimated coefficient to get an idea of its practical or economic
importance.

4.3 Testing multiple linear restrictions using the F test.


So far, we have only considered hypotheses involving a single restriction. But
frequently, we wish to test multiple hypotheses about the underlying parameters
$\beta_1, \beta_2, \beta_3, \ldots, \beta_k$.
In multiple linear restrictions, we will distinguish three types: exclusion
restrictions, model significance and other linear restrictions.


4.3.1 Exclusion restrictions

Null and alternative hypotheses; unrestricted and restricted model


We begin with the leading case of testing whether a set of independent variables
has no partial effect on the dependent variable, y. These are called exclusion restrictions.
Thus, considering the model
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + u \tag{4-30}$$
the null hypothesis in a typical example of exclusion restrictions could be the following:
$$H_0: \beta_4 = \beta_5 = 0$$
This is an example of a set of multiple restrictions, because we are putting more
than one restriction on the parameters in the above equation. A test of multiple restrictions
is called a joint hypothesis test.
The alternative hypothesis can be expressed in the following way
$$H_1: H_0 \text{ is not true}$$
It is important to remark that we test the above $H_0$ jointly, not individually. Now,
we are going to distinguish between unrestricted (UR) and restricted (R) models. The
unrestricted model is the reference model or initial model. In this example the unrestricted
model is the model given in (4-30). The restricted model is obtained by imposing $H_0$ on
the original model. In the above example, the restricted model is
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
By definition, the restricted model always has fewer parameters than the
unrestricted one. Moreover, it is always true that
$$RSS_R \ge RSS_{UR}$$
where $RSS_R$ is the RSS of the restricted model, and $RSS_{UR}$ is the RSS of the unrestricted
model. Remember that, because OLS estimates are chosen to minimize the sum of squared
residuals, the RSS never decreases (and generally increases) when certain restrictions
(such as dropping variables) are introduced into the model.
The increase in the RSS when the restrictions are imposed can tell us something
about the likely truth of H0. If we obtain a large increase, this is evidence against H0, and
this hypothesis will be rejected. If the increase is small, this is not evidence against H0,
and this hypothesis will not be rejected. The question is therefore whether the observed
increase in the RSS when the restrictions are imposed is large enough, relative to the RSS
in the unrestricted model, to warrant rejecting H0.
The answer depends on α, but we cannot carry out the test at a chosen α until we
have a statistic whose distribution is known, and is tabulated, under H0. Thus, we need a
way to combine the information in RSSR and RSSUR to obtain a test statistic with a known
distribution under H0.
Now, let us look at the general case, where the unrestricted model is
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + u \tag{4-31}$$
Let us suppose that there are q exclusion restrictions to test. $H_0$ states that q of the
variables have zero coefficients. Assuming that they are the last q variables, $H_0$ is stated
as
$$H_0: \beta_{k-q+1} = \beta_{k-q+2} = \cdots = \beta_k = 0 \tag{4-32}$$
The restricted model is obtained by imposing the q restrictions of $H_0$ on the
unrestricted model:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_{k-q} x_{k-q} + u \tag{4-33}$$
$H_1$ is stated as
$$H_1: H_0 \text{ is not true} \tag{4-34}$$

Test statistic: F ratio


The F statistic, or F ratio, is defined by
$$F = \frac{(RSS_R - RSS_{UR})/q}{RSS_{UR}/(n-k)} \tag{4-35}$$
where $RSS_R$ is the RSS of the restricted model, $RSS_{UR}$ is the RSS of the unrestricted
model and q is the number of restrictions; that is to say, the number of equalities in the
null hypothesis.
In order to use the F statistic for hypothesis testing, we have to know its sampling
distribution under $H_0$ in order to choose the value c for a given α, and determine the
rejection rule. It can be shown that, under $H_0$, and assuming the CLM assumptions hold,
the F statistic is distributed as a Snedecor’s F random variable with (q, n-k) df. We write
this result as
$$F \mid H_0 \sim F_{q,n-k} \tag{4-36}$$
A Snedecor’s F with q degrees of freedom in the numerator and n-k degrees of
freedom in the denominator is equal to
$$F_{q,n-k} = \frac{\chi_q^2/q}{\chi_{n-k}^2/(n-k)} \tag{4-37}$$
where $\chi_q^2$ and $\chi_{n-k}^2$ are Chi-square distributions that are independent of each other.
In (4-35) we see that the degrees of freedom corresponding to $RSS_{UR}$ ($df_{UR}$) are n-k.
Remember that
$$\hat\sigma_{UR}^2 = \frac{RSS_{UR}}{n-k} \tag{4-38}$$
On the other hand, the degrees of freedom corresponding to $RSS_R$ ($df_R$) are n-k+q,
because in the restricted model k-q parameters are estimated. The degrees of freedom
corresponding to $RSS_R - RSS_{UR}$ are
$$(n-k+q) - (n-k) = q = df_R - df_{UR}$$
which are the numerator degrees of freedom.


Thus, in the numerator of F, the difference in the RSS's is divided by q, which is the
number of restrictions imposed when moving from the unrestricted to the restricted model.
In the denominator of F, $RSS_{UR}$ is divided by $df_{UR}$. In fact, the denominator of F is simply
the unbiased estimator of $\sigma^2$ in the unrestricted model.
The F ratio must be greater than or equal to 0, since $RSS_R - RSS_{UR} \ge 0$.
It is often useful to have a form of the F statistic that can be computed from the
$R^2$ of the restricted and unrestricted models.
Using the fact that $RSS_R = TSS(1 - R_R^2)$ and $RSS_{UR} = TSS(1 - R_{UR}^2)$, we can write
(4-35) as the following
$$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} \tag{4-39}$$
since the TSS term cancels.
This is called the R-squared form of the F statistic.
Whereas the R-squared form of the F statistic is very useful for testing exclusion
restrictions, it cannot be applied for testing all kinds of linear restrictions. For example,
the F ratio (4-39) cannot be used when the model does not have an intercept or when the
functional form of the endogenous variable in the unrestricted model is not the same as
in the restricted model.
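The equivalence between (4-35) and (4-39) can be verified numerically. The following Python sketch computes the F ratio in both ways; all the numerical values are hypothetical and have been chosen only for illustration, and the function names are not taken from any software package.

```python
def f_from_rss(rss_r, rss_ur, q, df_ur):
    """F ratio (4-35): [(RSS_R - RSS_UR) / q] / [RSS_UR / (n - k)]."""
    return ((rss_r - rss_ur) / q) / (rss_ur / df_ur)

def f_from_r2(r2_r, r2_ur, q, df_ur):
    """R-squared form (4-39): [(R2_UR - R2_R) / q] / [(1 - R2_UR) / (n - k)]."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)

# Hypothetical values chosen only for illustration
tss = 20.0                 # total sum of squares, common to both models
rss_ur, rss_r = 8.0, 9.5   # unrestricted and restricted residual sums of squares
q, df_ur = 2, 45           # number of restrictions and n - k

r2_ur = 1.0 - rss_ur / tss
r2_r = 1.0 - rss_r / tss

f1 = f_from_rss(rss_r, rss_ur, q, df_ur)
f2 = f_from_r2(r2_r, r2_ur, q, df_ur)
print(f"F from RSS: {f1:.6f}")
print(f"F from R2:  {f2:.6f}")
```

The two forms coincide only because both models share the same TSS, which is why (4-39) requires the same endogenous variable in the restricted and unrestricted models.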

Decision rule
The $F_{q,n-k}$ distribution is tabulated and available in statistical tables, where we look
for the critical value ($F_{q,n-k}^{\alpha}$), which depends on α (the significance level), q (the df of the
numerator), and n-k (the df of the denominator). Taking into account the above, the
decision rule is quite simple:
$$\begin{aligned} &\text{If } F \ge F_{q,n-k}^{\alpha}, \text{ reject } H_0 \\ &\text{If } F < F_{q,n-k}^{\alpha}, \text{ do not reject } H_0 \end{aligned} \tag{4-40}$$
Therefore, we reject $H_0$ in favor of $H_1$ at α when $F \ge F_{q,n-k}^{\alpha}$, as can be seen in
figure 4.15. It is important to remark that as α decreases, $F_{q,n-k}^{\alpha}$ increases. If $H_0$ is rejected,
then we say that $x_{k-q+1}, x_{k-q+2}, \ldots, x_k$ are jointly statistically significant, or just jointly
significant, at the selected significance level.
This test alone does not allow us to say which of the variables has a partial effect
on y; they may all affect y or only one may affect y. If $H_0$ is not rejected, then we say that
$x_{k-q+1}, x_{k-q+2}, \ldots, x_k$ are jointly not statistically significant, or simply jointly not significant,
which often justifies dropping them from the model. The F statistic is often useful for
testing the exclusion of a group of variables when the variables in the group are highly
correlated.

FIGURE 4.15. Rejection region and non rejection region using F distribution.
FIGURE 4.16. p-value using F distribution.

In the F testing context, the p-value is defined as
$$p\text{-value} = \Pr(F' > F \mid H_0)$$
where F is the actual value of the test statistic and F' denotes a Snedecor’s F random
variable with (q, n-k) df.
The p-value still has the same interpretation as for t statistics. A small p-value is
evidence against H0, while a large p-value is not evidence against H0. Once the p-value
has been computed, the F test can be carried out at any significance level. In figure 4.16
this alternative approach is represented. As can be seen by observing the figure, the
determination of the p-value is the inverse operation to find the value in the statistical
tables for a given significance level. Once the p-value has been determined, we know that
H0 is rejected for any level of significance of α>p-value, whereas the null hypothesis is
not rejected when α<p-value.
EXAMPLE 4.10 Wage, experience, tenure and age
The following model has been built to analyze the determinant factors of wage:
$$\ln(wage) = \beta_1 + \beta_2\, educ + \beta_3\, exper + \beta_4\, tenure + \beta_5\, age + u$$
where wage is monthly earnings, educ is years of education, exper is years of work experience, tenure is
years with current employer, and age is age in years.
The researcher is planning to exclude tenure from the model, since in many cases it is equal to
experience, and also age, because it is highly correlated with experience. Is the exclusion of both variables
acceptable?
The null and alternative hypotheses are the following:
$$H_0: \beta_4 = \beta_5 = 0$$
$$H_1: H_0 \text{ is not true}$$
The restricted model corresponding to this $H_0$ is
$$\ln(wage) = \beta_1 + \beta_2\, educ + \beta_3\, exper + u$$
Using a sample consisting of 53 observations from workfile wage2, we have the following
estimates for the unrestricted and for the restricted models:
$$\widehat{\ln(wage_i)} = 6.476 + 0.0658\, educ_i + 0.0267\, exper_i - 0.0094\, tenure_i - 0.0209\, age_i \qquad RSS = 5.954$$
$$\widehat{\ln(wage_i)} = 6.157 + 0.0457\, educ_i + 0.0121\, exper_i \qquad RSS = 6.250$$


The F ratio obtained is the following:


( RSS R − RSSUR ) / q (6.250 − 5.954) / 2
=F = = 1.193
RSSUR / (n − k ) 5.954 / 48
Given that the F statistic is low, let us see what happens with a significance level of 0.10. In this
case the degrees of freedom for the denominator are 48 (53 observations minus 5 estimated parameters). If
we look in the F statistical table for 2 df in the numerator and 45 df in the denominator, which is used as an
approximation to 48 df, we find $F_{2,48}^{0.10} \approx F_{2,45}^{0.10} = 2.42$. As F<2.42, we do not reject H0.
If we do not reject H0 for 0.10, we will not reject H0 for 0.05 or 0.01, as can be seen in figure 4.17.
Therefore, we cannot reject H0 in favor of H1. In other words, tenure and age are not jointly significant.
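
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the function name `f_statistic` is illustrative, and the critical value 2.42 is the tabulated figure quoted in the text rather than one computed by the code.

```python
def f_statistic(rss_r, rss_ur, q, n, k):
    """F ratio for q linear restrictions:
    ((RSS_R - RSS_UR) / q) / (RSS_UR / (n - k))."""
    return ((rss_r - rss_ur) / q) / (rss_ur / (n - k))

# Figures reported in example 4.10
F = f_statistic(rss_r=6.250, rss_ur=5.954, q=2, n=53, k=5)

# Tabulated 10% critical value quoted in the text (taken as given, not computed)
critical_10 = 2.42
reject = F > critical_10

print(f"F = {F:.3f}, reject H0 at 10%: {reject}")
```

Since the statistic does not exceed the 10% critical value, it cannot exceed the larger critical values for 5% or 1% either.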
FIGURE 4.17. Example 4.10: Rejection region using F distribution (α values are from an F2,40).

4.3.2 Model significance


Testing model significance, or overall significance, is a particular case of testing
exclusion restrictions. Model significance means global significance of the model. One
could think that the H 0 in this test is the following:

$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0 \qquad \text{(4-41)}$$

However, this is not the adequate H0 to test for the global significance of the
model. If $\beta_2 = \beta_3 = \cdots = \beta_k = 0$, then the restricted model would be the following:
$$y = \beta_1 + u \qquad \text{(4-42)}$$
If we take expectations in (4-42), then we have
E ( y ) = β1 (4-43)

Thus, H0 in (4-41) states not only that the explanatory variables have no
influence on the endogenous variable, but also that the mean of the endogenous variable
(for example, mean consumption) is equal to 0.
Therefore, if we want to know whether the model is globally significant, the H 0
must be the following:
$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0 \qquad \text{(4-44)}$$
The corresponding restricted model given in (4-42) does not explain anything and,
therefore, $R_R^2$ is equal to 0. Testing the H0 given in (4-44) is very easy by using the
R-squared form of the F statistic:
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \qquad \text{(4-45)}$$
where $R^2$ is $R_{UR}^2$ and $k-1$ is the number of restrictions in (4-44). Only the unrestricted
model needs to be estimated, because the $R^2$ of the restricted model (4-42) is 0.
EXAMPLE 4.11 Salaries of CEOs
Consider the following equation to explain salaries of Chief Executive Officers (CEOs) as a
function of annual firm sales, return on equity (roe, in percent form), and return on the firm's stock (ros, in
percent form):
ln(salary) = β1+β2ln(sales)+β3roe+β4ros+ u.
The question posed is whether the performance of the company (sales, roe and ros) is crucial to
set the salaries of CEOs. To answer this question, we will carry out an overall significance test. The null
and alternative hypotheses are the following:
$$H_0: \beta_2 = \beta_3 = \beta_4 = 0$$
$$H_1: H_0 \text{ is not true}$$
Table 4.9 shows the complete E-views output for least squares (ls) using the workfile ceosal1. At
the bottom, the “F-statistic” for the overall significance test can be seen, as well as “Prob”, which is the
p-value corresponding to this statistic. In this case the p-value is equal to 0 to four decimal places, that is
to say, H0 is rejected at all the usual significance levels (see figure 4.18). Therefore, we can reject the
hypothesis that the performance of a company has no influence on the salary of its CEO.
FIGURE 4.18. Example 4.11: p-value using F distribution (α values are for an F3,140).


TABLE 4.9. Complete output from E-views in example 4.11.

Dependent Variable: LOG(SALARY)
Method: Least Squares
Date: 04/12/12 Time: 19:39
Sample: 1 209
Included observations: 209

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             4.311712      0.315433     13.66919      0.0000
LOG(SALES)    0.280315      0.03532      7.936426      0.0000
ROE           0.017417      0.004092     4.255977      0.0000
ROS           0.000242      0.000542     0.446022      0.6561

R-squared             0.282685     Mean dependent var      6.950386
Adjusted R-squared    0.272188     S.D. dependent var      0.566374
S.E. of regression    0.483185     Akaike info criterion   1.402118
Sum squared resid     47.86082     Schwarz criterion       1.466086
Log likelihood        -142.5213    F-statistic             26.9293
Durbin-Watson stat    2.033496     Prob(F-statistic)       0.0000
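
The F-statistic at the bottom of Table 4.9 can be cross-checked from the reported R-squared. The sketch below assumes the R-squared form of the overall significance test, with k - 1 restrictions in the numerator and n - k degrees of freedom in the denominator; the function name `overall_f` is illustrative and is not an E-views command.

```python
def overall_f(r2, n, k):
    """Overall significance F statistic from the R-squared of the
    unrestricted model; k counts all parameters, intercept included."""
    q = k - 1  # number of slope coefficients set to zero under H0
    return (r2 / q) / ((1.0 - r2) / (n - k))

# Values read from Table 4.9: R-squared, observations, parameters
F = overall_f(r2=0.282685, n=209, k=4)
print(f"F = {F:.4f}")
```

The result agrees with the “F-statistic” line of the output up to rounding.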

4.3.3 Testing other linear restrictions


So far, we have tested hypotheses with exclusion restrictions using the F statistic.
But we can also test hypotheses with linear restrictions of any kind. Thus, in the same test
we can combine exclusion restrictions, restrictions that impose determined values to the
parameters and restrictions on linear combination of parameters.
Therefore, let us consider the following model
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + u$$
and the null hypothesis:
$$H_0: \begin{cases} \beta_2 + \beta_3 = 1 \\ \beta_4 = 3 \\ \beta_5 = 0 \end{cases}$$
The restricted model corresponding to this null hypothesis is
$$(y - x_2 - 3x_4) = \beta_1 + \beta_3 (x_3 - x_2) + u$$
In the example 4.12, the null hypothesis consists of two restrictions: a linear
combination of parameters and an exclusion restriction.
EXAMPLE 4.12 An additional restriction in the production function. (Continuation of example 4.7)
In the Cobb-Douglas production function, we are going to test the following H0, which has two
restrictions:
$$H_0: \begin{cases} \beta_2 + \beta_3 = 1 \\ \beta_1 = 0 \end{cases}$$
$$H_1: H_0 \text{ is not true}$$


The first restriction imposes constant returns to scale. The second restriction imposes that β1,
the parameter linked to total factor productivity, is equal to 0.
Substituting the restrictions of H0 in the original model (unrestricted model), we have
$$\ln(output) = (1 - \beta_3)\ln(labor) + \beta_3 \ln(capital) + u$$
Operating, we obtain the restricted model:
$$\ln(output/labor) = \beta_3 \ln(capital/labor) + u$$
In estimating the unrestricted and restricted models, we get RSSR=3.1101 and RSSUR=0.8516.
Therefore, the F ratio is
$$F = \frac{(RSS_R - RSS_{UR})/q}{RSS_{UR}/(n-k)} = \frac{(3.1101 - 0.8516)/2}{0.8516/(27-3)} = 13.551$$
There are two reasons for not using R2 in this case. First, the restricted model has no intercept.
Second, the regressand of the restricted model is different from the regressand of the unrestricted model.
Since the F value is relatively high, let us start by testing with a level of 1%. For α=0.01,
$F_{2,24}^{0.01} = 5.61$. Given that F>5.61, we reject H0 in favour of H1. Therefore, we reject the joint
hypothesis that there are constant returns to scale and that the parameter β1 is equal to 0. If H0 is
rejected for α=0.01, it will also be rejected for levels of 5% and 10%.

4.3.4 Relation between F and t statistics


So far, we have seen how to use the F statistic to test several restrictions in the
model, but it can also be used to test a single restriction. In this case, we can choose between
using the F statistic or the t statistic to carry out a two-tail test. The conclusions would,
nevertheless, be exactly the same.
But what is the relationship between an F with one degree of freedom in the
numerator (to test a single restriction) and a t? It can be shown that
$$t_{n-k}^2 = F_{1,n-k} \qquad \text{(4-46)}$$
This fact is illustrated in figure 4.19. We observe that the tail of the F splits into
the two tails of the t. Hence, the two approaches lead to exactly the same outcome,
provided that the alternative hypothesis is two-sided. However, the t statistic is more
flexible for testing a single hypothesis, because it can be used to test H 0 against one-tail
alternatives.

FIGURE 4.19. Relationship between an F1,n-k and a tn-k.


Moreover, since the t statistics are also easier to obtain than the F statistics, there
is no good reason for using an F statistic to test a hypothesis with a single restriction.
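
The equality between the squared t statistic and the F statistic for a single exclusion restriction can be verified numerically. The following sketch uses simulated data; the sample size, coefficients and seed are arbitrary assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 0.2 * x3 + rng.normal(size=n)  # simulated data

def ols(X, y):
    """Return the OLS coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

X_ur = np.column_stack([np.ones(n), x2, x3])  # unrestricted model
X_r = np.column_stack([np.ones(n), x2])       # restricted model: beta_3 = 0
k = X_ur.shape[1]

beta, rss_ur = ols(X_ur, y)
_, rss_r = ols(X_r, y)

# t statistic for H0: beta_3 = 0 in the unrestricted model
sigma2 = rss_ur / (n - k)
cov = sigma2 * np.linalg.inv(X_ur.T @ X_ur)
t_stat = beta[2] / np.sqrt(cov[2, 2])

# F statistic for the same single restriction (q = 1)
F = ((rss_r - rss_ur) / 1) / (rss_ur / (n - k))

print(t_stat**2, F)  # the two values coincide up to rounding error
```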

4.4 Testing without normality


The normality of the OLS estimators depends crucially on the normality
assumption of the disturbances. What happens if the disturbances do not have a normal
distribution? We have seen that, under the Gauss-Markov assumptions, the OLS estimators
are asymptotically normally distributed, i.e. approximately normally distributed in
large samples.
If the disturbances are not normal, the t statistic will only have an approximate t
distribution rather than an exact one. As can be seen in the Student's t table, for a sample
size of 60 observations the critical points are practically equal to those of the standard
normal distribution.
Similarly, if the disturbances are not normal, the F statistic will only have an
approximate F distribution rather than an exact one, when the sample size is large enough
and the Gauss-Markov assumptions are fulfilled. Therefore, we can use the F statistic to
test linear restrictions in linear models as an approximate test.
There are other asymptotic tests (the likelihood ratio, Lagrange multiplier and
Wald tests) based on the likelihood functions that can be used in testing linear restrictions
if the disturbances are non-normally distributed. These three can also be applied when a)
the restrictions are nonlinear; and b) the model is nonlinear in the parameters. For nonlinear
restrictions, in linear and nonlinear models, the most widely used test is the Wald
test.
For testing the assumptions of the model (for example, homoskedasticity and no
autocorrelation) the Lagrange multiplier (LM) test is usually applied. In the application
of the LM test, an auxiliary regression is often run. The name auxiliary regression
indicates that the coefficients are not of direct interest: only the R2 is retained. In an auxiliary
regression the regressand is usually the residuals (or functions of the residuals) obtained
in the OLS estimation of the original model, while the regressors are often the regressors
(and/or functions of them) of the original model.
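
As an illustration of an auxiliary regression, the sketch below regresses the squared OLS residuals on the original regressors and keeps only the R2, forming the statistic n·R2. This particular choice (a heteroskedasticity check of the Breusch-Pagan type in its n·R2 form) is an assumption made for the example and is not a procedure taken from the text; the data are simulated.

```python
import numpy as np

def fit_r2(X, y):
    """OLS fit returning the R-squared and the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss, resid

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 5, size=n)
u = rng.normal(size=n) * x          # disturbance variance grows with x
y = 2.0 + 0.5 * x + u               # simulated, heteroskedastic by design
X = np.column_stack([np.ones(n), x])

# Step 1: estimate the original model and keep the residuals
_, resid = fit_r2(X, y)

# Step 2: auxiliary regression of the squared residuals on the regressors;
# the coefficients are discarded, only the R-squared is retained
r2_aux, _ = fit_r2(X, resid**2)

# Step 3: LM statistic, compared with a chi-square with 1 df here
# (one regressor besides the intercept in the auxiliary regression)
lm = n * r2_aux
print(f"auxiliary R2 = {r2_aux:.4f}, LM = {lm:.2f}")
```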

4.5 Prediction
In this section two types of prediction will be examined: point and interval
prediction.

4.5.1 Point prediction


Obtaining a point prediction does not pose any special problems, since it is a
simple extrapolation operation in the context of descriptive methods.
Let $x_2^0, x_3^0, \ldots, x_k^0$ denote the particular values of each of the regressors for
prediction; these may or may not correspond to an actual data point in our sample. If we
substitute these values in the multiple regression model, we have
$$y^0 = \beta_1 + \beta_2 x_2^0 + \beta_3 x_3^0 + \cdots + \beta_k x_k^0 + u^0 = \theta^0 + u^0 \qquad \text{(4-47)}$$
Therefore, the expected, or mean, value of y is given by


$$E(y^0) = \beta_1 + \beta_2 x_2^0 + \beta_3 x_3^0 + \cdots + \beta_k x_k^0 = \theta^0 \qquad \text{(4-48)}$$


The point prediction is obtained straightaway by replacing the parameters of
(4-48) by the corresponding OLS estimators:
$$\hat{\theta}^0 = \hat{\beta}_1 + \hat{\beta}_2 x_2^0 + \hat{\beta}_3 x_3^0 + \cdots + \hat{\beta}_k x_k^0 \qquad \text{(4-49)}$$
To obtain (4-49) we did not need any assumption. But, if we adopt
assumptions 1 to 6, we immediately find that $\hat{\theta}^0$ is an unbiased predictor of $\theta^0$:
$$E[\hat{\theta}^0] = E[\hat{\beta}_1 + \hat{\beta}_2 x_2^0 + \hat{\beta}_3 x_3^0 + \cdots + \hat{\beta}_k x_k^0] = \beta_1 + \beta_2 x_2^0 + \beta_3 x_3^0 + \cdots + \beta_k x_k^0 = \theta^0 \qquad \text{(4-50)}$$
On the other hand, adopting the Gauss Markov assumptions (1 to 8), it can be
proved that this point predictor is the best linear unbiased estimator (BLUE).
We have a point prediction for θ0, but what is the point prediction for y0? To
answer this question, we have to predict u0. As the error is not observable, the best
predictor for u0 is its expected value, which is 0. Therefore,
$$\hat{y}^0 = \hat{\theta}^0 \qquad \text{(4-51)}$$

4.5.2 Interval prediction


Point predictions made with an econometric model will in general not coincide
with the observed values due to the uncertainty surrounding economic phenomena.
The first source of uncertainty is that we cannot use the population regression
function because we do not know the parameters β. Instead we have to use the sample
regression function. The confidence interval for the expected value (i.e. for θ0), which
we will examine next, includes only this type of uncertainty.
The second source of uncertainty is that in an econometric model, in addition to
the systematic part, there is a disturbance which is not observable. The prediction interval
for an individual value (i.e. for y0), which will be discussed later on, includes both the
uncertainty arising from the estimation and that arising from the disturbance term.
A third source of uncertainty may come from the fact of not knowing exactly what
values the explanatory variables will take for the prediction we want to make. This third
source of uncertainty, which is not addressed here, complicates calculations for the
construction of intervals.

Confidence interval for the expected value


If we are predicting the expected value of y, which is $\theta^0$, then the prediction error
$\hat{e}_1^0$ will be $\hat{e}_1^0 = \theta^0 - \hat{\theta}^0$. According to (4-50), the expected prediction
error is zero. Under the assumptions of the CLM,
$$\frac{\hat{e}_1^0}{se(\hat{\theta}^0)} = \frac{\theta^0 - \hat{\theta}^0}{se(\hat{\theta}^0)} \sim t_{n-k}$$
Therefore, we can write that
$$\Pr\left[-t_{n-k}^{\alpha/2} \le \frac{\theta^0 - \hat{\theta}^0}{se(\hat{\theta}^0)} \le t_{n-k}^{\alpha/2}\right] = 1 - \alpha$$
Operating, we can construct a 100(1-α)% confidence interval (CI) for $\theta^0$ with the
following structure:
$$\Pr\left[\hat{\theta}^0 - se(\hat{\theta}^0) \times t_{n-k}^{\alpha/2} \le \theta^0 \le \hat{\theta}^0 + se(\hat{\theta}^0) \times t_{n-k}^{\alpha/2}\right] = 1 - \alpha \qquad \text{(4-52)}$$

To obtain a CI for $\theta^0$, we need to know the standard error $se(\hat{\theta}^0)$ of $\hat{\theta}^0$.
There is an easy way to calculate it. Solving (4-48) for β1, we find that
$\beta_1 = \theta^0 - \beta_2 x_2^0 - \beta_3 x_3^0 - \cdots - \beta_k x_k^0$. Plugging this into the multiple regression model, we obtain
$$y = \theta^0 + \beta_2 (x_2 - x_2^0) + \beta_3 (x_3 - x_3^0) + \cdots + \beta_k (x_k - x_k^0) + u \qquad \text{(4-53)}$$
Applying OLS to (4-53), in addition to the point prediction, we obtain $se(\hat{\theta}^0)$,
which is the standard error corresponding to the intercept in this regression. This
method allows us to put a CI around the OLS estimate of E(y), for any values of the x's.
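
The recentering device can be illustrated with simulated data. In the sketch below the regressor values, the prediction point `x0` and the helper `ols_with_se` are all hypothetical. As a cross-check that is not part of the text, the standard error of the intercept in the recentered regression is compared with the direct matrix expression, the square root of c'Vc, where V is the estimated covariance matrix of the original coefficients and c = (1, x2^0, x3^0).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x2 = rng.normal(10, 2, size=n)
x3 = rng.normal(5, 1, size=n)
y = 1.0 + 0.8 * x2 - 0.3 * x3 + rng.normal(size=n)  # simulated data
x0 = np.array([12.0, 4.0])  # hypothetical values x_2^0 and x_3^0

def ols_with_se(X, y):
    """OLS coefficients, their standard errors and covariance matrix."""
    n_obs, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n_obs - k)  # unbiased estimator of sigma^2
    cov = sigma2 * xtx_inv
    return beta, np.sqrt(np.diag(cov)), cov

# Original regression: point prediction and its se via c' V c
X = np.column_stack([np.ones(n), x2, x3])
beta, _, cov = ols_with_se(X, y)
c = np.array([1.0, x0[0], x0[1]])
theta_direct = c @ beta
se_direct = np.sqrt(c @ cov @ c)

# Recentered regression as in (4-53): the intercept is theta^0
Xc = np.column_stack([np.ones(n), x2 - x0[0], x3 - x0[1]])
beta_c, se_c, _ = ols_with_se(Xc, y)
theta_hat, se_theta = beta_c[0], se_c[0]

print(theta_hat, se_theta)
```

Both routes give the same point prediction and the same standard error; the recentered regression simply lets standard regression output deliver them directly.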

Prediction interval for an individual value


We are now going to construct an interval for y0, usually called prediction interval
for an individual value, or for short, prediction interval. According to (4-47), y0 has two
components:
$$y^0 = \theta^0 + u^0 \qquad \text{(4-54)}$$
The interval for the expected value built before is a confidence interval around $\theta^0$,
which is a combination of the parameters. In contrast, y0 is itself random,
because one of its components, u0, is random. Therefore, the interval for y0 is a
probabilistic interval and not a confidence interval. The mechanics for obtaining it are the
same, but bear in mind that now we are going to consider that the set $x_2^0, x_3^0, \ldots, x_k^0$ is
outside the sample used to estimate the regression.
The prediction error ($\hat{e}_2^0$) in using $\hat{y}^0$ to predict y0 is
$$\hat{e}_2^0 = y^0 - \hat{y}^0 = \theta^0 + u^0 - \hat{y}^0 \qquad \text{(4-55)}$$
Taking into account (4-51) and (4-50), and that E(u0)=0, the expected
prediction error is zero. In finding the variance of $\hat{e}_2^0$, it must be taken into account that
u0 is uncorrelated with $\hat{y}^0$ because $x_2^0, x_3^0, \ldots, x_k^0$ is not in the sample.
Therefore, the variance of the prediction error (conditional on the x's) is the sum
of the variances:
$$Var(\hat{e}_2^0) = Var(\hat{y}^0) + Var(u^0) = Var(\hat{y}^0) + \sigma^2 \qquad \text{(4-56)}$$

There are two sources of variation in $\hat{e}_2^0$:

1. The sampling error in $\hat{y}^0$, which arises because we have estimated the βj’s.

2. The ignorance of the unobserved factors that affect y, which is reflected in σ2.
Under the CLM assumptions, $\hat{e}_2^0$ is also normally distributed. Using the unbiased
estimator of σ2 and taking into account that $var(\hat{y}^0) = var(\hat{\theta}^0)$, we can define the standard
error (se) of $\hat{e}_2^0$ as
$$se(\hat{e}_2^0) = \left\{ \left[ se(\hat{\theta}^0) \right]^2 + \hat{\sigma}^2 \right\}^{1/2} \qquad \text{(4-57)}$$
Usually $\hat{\sigma}^2$ is larger than $[se(\hat{\theta}^0)]^2$. Under the assumptions of the CLM,
$$\frac{\hat{e}_2^0}{se(\hat{e}_2^0)} \sim t_{n-k} \qquad \text{(4-58)}$$
Therefore, we can write that
$$\Pr\left[-t_{n-k}^{\alpha/2} \le \frac{\hat{e}_2^0}{se(\hat{e}_2^0)} \le t_{n-k}^{\alpha/2}\right] = 1 - \alpha \qquad \text{(4-59)}$$
Plugging $\hat{e}_2^0 = y^0 - \hat{y}^0$ into (4-59) and rearranging gives a 100(1-α)% prediction
interval for y0:
$$\Pr\left[\hat{y}^0 - se(\hat{e}_2^0) \times t_{n-k}^{\alpha/2} \le y^0 \le \hat{y}^0 + se(\hat{e}_2^0) \times t_{n-k}^{\alpha/2}\right] = 1 - \alpha \qquad \text{(4-60)}$$
EXAMPLE 4.13 What is the expected score in the final exam with 7 marks in the first short exam?
The following model has been estimated to compare the marks in the final exam (finalmrk) and in
the first short exam (shortex1) of Econometrics:
$$\widehat{finalmrk}_i = \underset{(0.715)}{4.155} + \underset{(0.123)}{0.491}\, shortex1_i$$
$$\hat{\sigma} = 1.649 \qquad R^2 = 0.533 \qquad n = 16$$
To estimate the expected final mark for a student with $shortex1^0 = 7$ marks in the first short exam,
the following model, according to (4-53), was estimated:
$$\widehat{finalmrk}_i = \underset{(0.497)}{7.593} + \underset{(0.123)}{0.491}\, (shortex1_i - 7)$$
$$\hat{\sigma} = 1.649 \qquad R^2 = 0.533 \qquad n = 16$$
The point prediction for $shortex1^0 = 7$ is $\hat{\theta}^0 = 7.593$, and the lower and upper bounds of a 95% CI
respectively are given by
$$\underline{\theta}^0 = \hat{\theta}^0 - se(\hat{\theta}^0) \times t_{14}^{0.05/2} = 7.593 - 0.497 \times 2.14 = 6.5$$
$$\overline{\theta}^0 = \hat{\theta}^0 + se(\hat{\theta}^0) \times t_{14}^{0.05/2} = 7.593 + 0.497 \times 2.14 = 8.7$$
Therefore, with 95% confidence, the expected final mark of students who obtained 7 marks in the first
short exam lies between 6.5 and 8.7.
The point prediction could also be obtained from the first estimated equation:
$$\widehat{finalmrk} = 4.155 + 0.491 \times 7 \approx 7.593$$
Now, we are going to estimate a 95% probability interval for the individual value. The se of $\hat{e}_2^0$ is
equal to
$$se(\hat{e}_2^0) = \left\{ \left[ se(\hat{y}^0) \right]^2 + \hat{\sigma}^2 \right\}^{1/2} = \sqrt{0.497^2 + 1.649^2} = 1.722$$
where 1.649 is the “S. E. of regression” obtained from the E-views output directly.
The lower and upper bounds of a 95% probability interval respectively are given by
$$\underline{y}^0 = \hat{y}^0 - se(\hat{e}_2^0) \times t_{14}^{0.05/2} = 7.593 - 1.722 \times 2.14 = 3.9$$
$$\overline{y}^0 = \hat{y}^0 + se(\hat{e}_2^0) \times t_{14}^{0.05/2} = 7.593 + 1.722 \times 2.14 = 11.3$$


You must take into account that this probability interval is quite large because the size of the
sample is very small.
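
The two intervals of this example can be recomputed from the reported figures with the standard library alone. This is only a sketch: the critical value 2.14 is taken from the text as given, and the variable names are illustrative.

```python
import math

theta_hat = 7.593   # point prediction (intercept of the recentered regression)
se_theta = 0.497    # its standard error
sigma_hat = 1.649   # S.E. of regression
t_crit = 2.14       # critical value of the t with 14 df quoted in the text

# 95% confidence interval for the expected value
ci = (theta_hat - t_crit * se_theta, theta_hat + t_crit * se_theta)

# Standard error of the prediction error and 95% interval for an individual value
se_pred = math.sqrt(se_theta**2 + sigma_hat**2)
pi = (theta_hat - t_crit * se_pred, theta_hat + t_crit * se_pred)

print(f"CI: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"se of prediction error: {se_pred:.3f}")
print(f"PI: ({pi[0]:.1f}, {pi[1]:.1f})")
```

The interval for the individual value is much wider than the one for the expected value because it adds the disturbance variance to the estimation uncertainty.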
EXAMPLE 4.14 Predicting the salary of CEOs
Using data on the most important US companies taken from Forbes (workfile ceoforbes), the
following equation has been estimated to explain salaries (including bonuses) earned yearly (thousands of
dollars) in 1999 by the CEOs of these companies:
$$\widehat{salary}_i = \underset{(104)}{1381} + \underset{(0.0013)}{0.008377}\, assets_i + \underset{(8.671)}{32.508}\, tenure_i + \underset{(0.0538)}{0.2352}\, profits_i$$
$$\hat{\sigma} = 1506 \qquad R^2 = 0.2404 \qquad n = 447$$


where assets are total assets of firm in millions of dollars, tenure is number of years as CEO in the company,
and profits are in millions of dollars.
Table 4.10 shows descriptive measures of the explanatory variables of the model on CEOs' salaries.
TABLE 4.10. Descriptive measures of variables of the model on CEOs salary.
                assets    tenure    profits
Mean             27054       7.8        700
Median            7811       5.0        333
Maximum         668641      60.0      22071
Minimum            718       0.0      -2669
Observations       447       447        447

The predicted salaries and the corresponding $se(\hat{\theta}^0)$ for selected values (maximum, mean, median
and minimum), using a model such as (4-53), appear in table 4.11.
TABLE 4.11. Predictions for selected values.
                  Prediction $\hat{\theta}^0$    Std. Error $se(\hat{\theta}^0)$
Mean values             2026                71
Median values           1688                78
Maximum values         14124              1110
Minimum values           760               195

4.5.3 Predicting y in a ln(y) model


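Consider the model in logs:
$$\ln(y) = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + u \qquad \text{(4-61)}$$
Obtaining OLS estimates, we predict ln(y) as
$$\widehat{\ln(y)} = \hat{\beta}_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k \qquad \text{(4-62)}$$
Applying exponentiation to (4-62), we obtain the prediction value
$$\tilde{y} = \exp(\widehat{\ln(y)}) = \exp(\hat{\beta}_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k) \qquad \text{(4-63)}$$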

However, this prediction is biased and inconsistent because it will systematically
underestimate the expected value of y. Let us see why. If we apply exponentiation in
(4-61), we have
$$y = \exp(\beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k) \times \exp(u) \qquad \text{(4-64)}$$
Before taking expectations in (4-64), we must take into account that if $u \sim N(0, \sigma^2)$,
then $E(\exp(u)) = \exp(\sigma^2/2)$. Therefore, under the CLM assumptions 1 through 9, we
have
$$E(y) = \exp(\beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k) \times \exp(\sigma^2/2) \qquad \text{(4-65)}$$
Taking (4-65) as a reference, the adequate predictor of y is
$$\hat{y} = \exp(\hat{\beta}_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k) \times \exp(\hat{\sigma}^2/2) = \tilde{y} \times \exp(\hat{\sigma}^2/2) \qquad \text{(4-66)}$$
where $\hat{\sigma}^2$ is the unbiased estimator of σ2.
It is important to remark that although $\hat{y}$ is a biased predictor, it is consistent,
while $\tilde{y}$ is biased and inconsistent.
EXAMPLE 4.15 Predicting the salary of CEOs with a log model (continuation 4.14)
Using the same data as in example 4.14, the following model was estimated:
$$\widehat{\ln(salary_i)} = \underset{(0.210)}{5.5168} + \underset{(0.0232)}{0.1885}\ln(assets_i) + \underset{(0.0032)}{0.0125}\, tenure_i + \underset{(0.0000195)}{0.00007}\, profits_i$$
$$\hat{\sigma} = 0.5499 \qquad R^2 = 0.2608 \qquad n = 447$$
salary and assets are taken in natural logs, while profits are in levels because some observations
are negative, so logs cannot be taken.
First, we are going to calculate the inconsistent prediction, according to (4-63), for a CEO working
in a corporation with assets=10000, tenure=10 years and profits=1000:
$$\widetilde{salary}_i = \exp(\widehat{\ln(salary_i)}) = \exp(5.5168 + 0.1885\ln(10000) + 0.0125 \times 10 + 0.00007 \times 1000) = 1716$$
Using (4-66), we obtain a consistent prediction:
$$\widehat{salary}_i = \exp(0.5499^2/2) \times 1716 = 1996$$
4.5.4 Forecast evaluation and dynamic prediction


In this section we will compare predictions made using an econometric model
with the actual values in order to evaluate the predictive ability of the model. We will also
examine the dynamic prediction in models in which lagged endogenous variables are
included as regressors.

Forecast evaluation statistics


Suppose that the forecast sample is i=n+1, n+2,…, n+h, and denote the actual and
forecasted values in period i as $y_i$ and $\hat{y}_i$, respectively. We now present some of the most
common statistics used for forecast evaluation.
Mean absolute error (MAE)

The MAE is defined as the average of the absolute values of the errors:
$$MAE = \frac{\sum_{i=n+1}^{n+h} \left| \hat{y}_i - y_i \right|}{h} \qquad \text{(4-67)}$$
Absolute values are taken so that positive errors are not compensated by the negative
ones.
Mean absolute percentage error (MAPE)
$$MAPE = \frac{\sum_{i=n+1}^{n+h} \left| \dfrac{\hat{y}_i - y_i}{y_i} \right|}{h} \times 100 \qquad \text{(4-68)}$$
Root of the mean squared error (RMSE)
This statistic is defined as the square root of the mean of the squared errors:
$$RMSE = \sqrt{\frac{\sum_{i=n+1}^{n+h} (\hat{y}_i - y_i)^2}{h}} \qquad \text{(4-69)}$$
As the errors are squared, the compensation between positive and negative errors
is avoided. It is important to remark that the RMSE places a greater penalty on large
forecast errors than the MAE.
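Theil Inequality Coefficient (U)
This coefficient is defined as follows:
$$U = \frac{\sqrt{\dfrac{\sum_{i=n+1}^{n+h} (\hat{y}_i - y_i)^2}{h}}}{\sqrt{\dfrac{\sum_{i=n+1}^{n+h} \hat{y}_i^2}{h}} + \sqrt{\dfrac{\sum_{i=n+1}^{n+h} y_i^2}{h}}} \qquad \text{(4-70)}$$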
The smaller U is, the more accurate are the predictions. The scaling of U is such
that it will always lie between 0 and 1. If U=0, then yi= yˆi , for all forecasts; if U=1 the
predictive performance is as bad as it can be. Theil’s U statistic can be rescaled and
decomposed into three proportions: bias, variance and covariance. Of course the sum of
these three proportions is 1. The interpretation of these three proportions is as follows:
1) The bias reflects systematic errors. Whatever the value of U, we would hope
that the bias is close to 0. A large bias suggests a systematic over or under
prediction.
2) The variance also reflects systematic errors. The size of this proportion is an
indication of the inability of the forecasts to replicate the variability of the
variable to be forecasted.


3) The covariance measures unsystematic errors. Ideally, this should have the
highest proportion of Theil inequality.
In addition to the coefficient defined in (4-70), Theil proposed other coefficients
for forecast evaluation.
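
The four statistics defined in (4-67) to (4-70) are easy to compute directly. The sketch below uses plain Python lists; the function names and the two-period example are hypothetical and serve only to illustrate the formulas.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100 * sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root of the mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def theil_u(actual, forecast):
    """Theil inequality coefficient, scaled to lie between 0 and 1."""
    h = len(actual)
    scale = (math.sqrt(sum(f ** 2 for f in forecast) / h)
             + math.sqrt(sum(a ** 2 for a in actual) / h))
    return rmse(actual, forecast) / scale

# Hypothetical actual and forecasted values over h = 2 periods
actual = [2.0, 4.0]
forecast = [1.0, 5.0]

print(mae(actual, forecast), mape(actual, forecast),
      rmse(actual, forecast), theil_u(actual, forecast))
```

With a perfect forecast all four functions return 0, which corresponds to the case U=0 described above.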

Dynamic prediction
Let the following model be given:
$$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + u_t \qquad \text{(4-71)}$$
Suppose that the forecast sample is i=n+1,…,n+h, and denote the actual and
forecasted values in period i as $y_i$ and $\hat{y}_i$, respectively. The forecast for the period n+1 is
$$\hat{y}_{n+1} = \hat{\beta}_1 + \hat{\beta}_2 x_{n+1} + \hat{\beta}_3 y_n \qquad \text{(4-72)}$$
As we can see, for this prediction we use the observed value of y ($y_n$) because it is
inside the sample used in the estimation. For the remainder of the forecast periods we use
the recursively computed forecast of the lagged value of the dependent variable (dynamic
prediction), that is to say,
$$\hat{y}_{n+i} = \hat{\beta}_1 + \hat{\beta}_2 x_{n+i} + \hat{\beta}_3 \hat{y}_{n+i-1} \qquad i = 2, 3, \ldots, h \qquad \text{(4-73)}$$
Thus, from period n+2 to n+h the forecast carried out in a period is used to forecast
the endogenous variable in the following period.
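
The recursion can be sketched in a few lines. The coefficient values, the last observed value and the future values of x below are hypothetical, and the function name `dynamic_forecast` is illustrative.

```python
def dynamic_forecast(b1, b2, b3, y_last, x_future):
    """Dynamic prediction for y_t = b1 + b2*x_t + b3*y_{t-1}.

    The first forecast uses the last observed value y_last; every later
    forecast uses the forecast of the previous period as the lagged value.
    """
    forecasts = []
    lagged = y_last
    for x in x_future:
        y_hat = b1 + b2 * x + b3 * lagged
        forecasts.append(y_hat)
        lagged = y_hat  # feed the forecast into the next period
    return forecasts

# Hypothetical estimates and data: y_n = 2 and x for periods n+1, n+2, n+3
path = dynamic_forecast(b1=1.0, b2=0.5, b3=0.5, y_last=2.0, x_future=[2.0, 4.0, 6.0])
print(path)
```

Because each forecast feeds into the next one, forecast errors can accumulate as the horizon h grows.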

Exercises
Exercise 4.1 To explain the housing price in an American town, the following model is
formulated:
$$price = \beta_1 + \beta_2\, rooms + \beta_3\, lowstat + \beta_4\, crime + u$$
where rooms is the number of rooms in the house, lowstat is the percentage of people of
“lower status” in the area and crime is crimes committed per capita in the area. Prices of
houses are measured in dollars.
Using the data in hprice2, the following model has been estimated:
$$\widehat{price} = \underset{(8022)}{-15694} + \underset{(1211)}{6788}\, rooms - \underset{(81)}{268}\, lowstat - \underset{(960)}{3854}\, crime$$
$$R^2 = 0.771 \qquad n = 55$$
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the meaning of the coefficients β̂ 2 , β̂3 and β̂ 4 .
b) Does the percentage of people of “lower status” have a negative influence
on the price of houses in that area?
c) Does the number of rooms have a positive influence on the price of
houses?
Exercise 4.2 Consider the following model:
$$\ln(fruit) = \beta_1 + \beta_2 \ln(inc) + \beta_3\, hhsize + \beta_4\, punder5 + u$$


where fruit is expenditure in fruit, inc is disposable income of a household, hhsize is the
number of household members and punder5 is the proportion of children under five in
the household.
Using the data in workfile demand, the following model has been estimated:
$$\widehat{\ln(fruit)} = \underset{(3.701)}{-9.768} + \underset{(0.512)}{2.005}\ln(inc) - \underset{(0.179)}{1.205}\, hhsize - \underset{(0.013)}{0.0179}\, punder5$$
$$R^2 = 0.728 \qquad n = 40$$
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the meaning of the coefficients β̂ 2 , β̂3 and β̂ 4 .
b) Does the number of household members have a statistically significant
effect on the expenditure in fruit?
c) Is the proportion of children under five in the household a factor that has
a negative influence on the expenditure of fruit?
d) Is fruit a luxury good?
Exercise 4.3 (Continuation of exercise 2.5). Given the model
$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad i = 1, 2, \ldots, n$$
the following results have been obtained with a sample size of 11 observations:
$$\sum_{i=1}^{n} x_i = 0 \qquad \sum_{i=1}^{n} y_i = 0 \qquad \sum_{i=1}^{n} x_i^2 = B \qquad \sum_{i=1}^{n} y_i^2 = E \qquad \sum_{i=1}^{n} x_i y_i = F$$
(Remember that $\hat{\beta}_2 = \dfrac{\sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i}$)

a) Build a statistic to test $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$.
b) Test the hypothesis of question a) when $EB = 2F^2$.
c) Test the hypothesis of question a) when $EB = F^2$.
Exercise 4.4 The following model has been formulated to explain the spending on food
(food):
$$food = \beta_1 + \beta_2\, inc + \beta_3\, rpfood + u$$
where inc is disposable income and rpfood is the relative price index of food compared
to other consumer products.
Taking a sample of observations for 20 successive years, the following results are
obtained:
$$\widehat{food}_i = \underset{(4.92)}{1.40} + \underset{(0.01)}{0.126}\, inc_i - \underset{(0.07)}{0.036}\, rpfood_i$$
$$R^2 = 0.996 \qquad \sum \hat{u}_t^2 = 0.196$$
(The numbers in parentheses are standard errors of the estimators.)
a) Test the null hypothesis that the coefficient of rpfood is less than 0.
b) Obtain a confidence interval of 95% for the marginal propensity to
consume food in relation to income.


c) Test the joint significance of the model.


Exercise 4.5 The following demand function for rental housing is formulated:
ln(srenhousi)=β1+β2ln(prenhousi)+ β3ln(inci)+εi
where srenhous is spending on rental housing, prenhous is the rental price, and inc is
disposable income.
Using a sample of 403 observations, we obtain the following results:
$$\widehat{\ln(srenhous_i)} = 10 - 0.7\ln(prenhous_i) + 0.9\ln(inc_i)$$
$$R^2 = 0.39 \qquad \widehat{cov}(\hat{\beta}) = \begin{bmatrix} 1.0 & 0 & 0 \\ 0 & 0.09 & 0.085 \\ 0 & 0.085 & 0.09 \end{bmatrix}$$
a) Interpret the coefficients on ln(prenhous) and ln(inc).
b) Using a 0.01 significance level, test the null hypothesis that β2=β3=0.
c) Test the null hypothesis that β2=0, against the alternative that β2<0.
d) Test the null hypothesis that β3=1 against the alternative that β3 ≠ 1.
e) Test the null hypothesis that a simultaneous increase in housing prices and
income has no proportional effect on housing demand.
Exercise 4.6 The following estimated models corresponding to average cost (ac)
functions have been obtained, using a sample of 30 firms:
$$\widehat{ac}_i = \underset{(11.97)}{172.46} + \underset{(3.70)}{35.72}\, qty_i \qquad R^2 = 0.838 \quad RSS = 8090 \qquad (1)$$
$$\widehat{ac}_i = \underset{(29.44)}{310.07} - \underset{(33.81)}{85.39}\, qty_i + \underset{(11.61)}{26.73}\, qty_i^2 - \underset{(1.22)}{1.40}\, qty_i^3 \qquad R^2 = 0.978 \quad RSS = 1097 \qquad (2)$$
where ac is the average cost and qty is the quantity produced.
(The numbers in parentheses are standard errors of estimators.)
a) Test whether the quadratic and cubic terms of the quantity produced are
significant in determining the average cost.
b) Test the overall significance in the model 2.
Exercise 4.7 Using a sample of 35 observations, the following models have been
estimated to explain expenditures on coffee:
$$\widehat{\ln(coffee)} = 21.32 + \underset{(0.01)}{0.11}\ln(inc) - \underset{(0.23)}{1.33}\ln(cprice) + 1.35\ln(tprice) \qquad R^2 = 0.905 \quad RSS = 254 \qquad (1)$$
$$\widehat{\ln(coffee)} = 19.9 + \underset{(0.02)}{0.14}\ln(inc) - \underset{(0.21)}{1.42}\ln(cprice) \qquad RSS = 529 \qquad (2)$$
where inc is disposable income, cprice is coffee price and tprice is tea price.
(The numbers in parentheses are standard errors of estimators.)


a) Test the overall significance of model (1)


b) The standard error of ln(tprice) is missing in model (1), can you calculate
it?
c) Test whether the price of tea is statistically significant.
d) How would you test the assumption that the price elasticity of coffee is
equal but opposite to the price elasticity of tea? Detail the procedure.
Exercise 4.8 The following model has been formulated to analyse the determinants of air
quality (airqual) in 30 Standard Metropolitan Statistical Areas (SMSA) of California:
$$airqual = \beta_1 + \beta_2\, popln + \beta_3\, medincm + \beta_4\, poverty + \beta_5\, fueloil + \beta_6\, valadd + u$$
where airqual is weight in μg/m3 of suspended particulate matter, popln is population in
thousands, medincm is medium per capita income in dollars, poverty is the percentage of
families with income less than poverty levels, fueloil is thousands of barrels of fuel oil
consumed in industrial manufacturing, and valadd is value added by industrial
manufactures in 1972 in thousands of dollars.
Using the data in workfile airqualy, the above model has been estimated:
$$\widehat{airqual}_i = \underset{(10.19)}{97.35} + \underset{(0.0311)}{0.0956}\, popln_i - \underset{(0.0055)}{0.0170}\, medincm_i - \underset{(0.0089)}{0.0254}\, poverty_i - \underset{(0.0017)}{0.0031}\, fueloil_i - \underset{(0.0025)}{0.0011}\, valadd_i$$
$$R^2 = 0.415 \qquad n = 30$$
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the coefficients on medincm, poverty and valadd
b) Are the slope coefficients individually significant at 10%?
c) Test the joint significance of fueloil and valadd, knowing that
$$\widehat{airqual}_i = \underset{(10.41)}{97.67} + \underset{(0.020)}{0.0566}\, popln_i - \underset{(0.0039)}{0.0102}\, medincm_i - \underset{(0.0078)}{0.0174}\, poverty_i$$
$$R^2 = 0.339 \qquad n = 30$$
d) If you omit the variable poverty in the first model, the following results are
obtained:
$$\widehat{airqual}_i = \underset{(10.02)}{82.98} + \underset{(0.031)}{0.0523}\, popln_i - \underset{(0.0055)}{0.0097}\, medincm_i - \underset{(0.0017)}{0.00063}\, fueloil_i - \underset{(0.0028)}{0.00037}\, valadd_i$$
$$R^2 = 0.218 \qquad n = 30$$

Are the slope coefficients individually significant at 10% in the new model?
Do you consider these results to be reasonable in comparison with those
obtained in part b).
Comparing the R2 of the two estimated models, what is the role played by
poverty in determining air quality?
e) If you regress airqual using as regressors only the intercept and poverty,
you will obtain that R2=0.037. Do you consider this value to be reasonable
taking into account the results obtained in part d)?


Exercise 4.9 With a sample of 39 observations, the following production functions
were estimated by OLS:
$$\widehat{output}_t = \hat{\alpha}\, labor_t^{1.30}\, capital_t^{0.32} \exp(0.0055\, trend_t) \qquad R^2 = 0.9945$$
$$\widehat{output}_t = \hat{\beta}\, labor_t^{1.41}\, capital_t^{0.47} \qquad R^2 = 0.9937$$
$$\widehat{output}_t = \hat{\gamma} \exp(0.0055\, trend_t) \qquad R^2 = 0.9549$$
a) Test the joint significance of labor and capital.
b) Test the significance of the coefficient of the variable trend.
c) Identify the statistical assumptions under which the test carried out in the
two previous sections are correct. A further question: Specify the
population model of the first of the three previous specifications.
Exercise 4.10 A researcher has developed the following model:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
Using a sample of 43 observations, the following results were obtained:
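$$\hat{y}_i = -0.06 + 1.44\,x_{2i} - 0.48\,x_{3i}$$
$$(X'X)^{-1} = \begin{bmatrix} 0.1011 & -0.0007 & -0.0005 \\ -0.0007 & 0.0231 & -0.0162 \\ -0.0005 & -0.0162 & 0.0122 \end{bmatrix}$$
$$\sum y_i^2 = 444 \qquad \sum \hat{y}_i^2 = 424.92$$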


a) Test that the intercept is less than 0.
b) Test that β2=2.
c) Test the null hypothesis that β2+3β3=0.
Exercise 4.11 Given the production function
$$q = a\,k^{\alpha}\, l^{\beta} \exp(u)$$
and using data from the Spanish economy over the past 20 years, the following results
were obtained:
$$\widehat{\ln(q)}_i = 0.15 + 0.73\ln(k_i) + 0.47\ln(l_i)$$
$$[X'X]^{-1} = \begin{bmatrix} 4129 & -95 & -266 \\ -95 & 3 & 5 \\ -266 & 5 & 19 \end{bmatrix} \qquad RSS = 0.017$$
a) Test the individual significance of the coefficients on k and l.
b) Test whether the parameter α is significantly different from 1.
c) Test whether there are increasing returns to scale.
Exercise 4.12 Let the following multiple regression model be:
$$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + u$$
With a sample of 33 observations, this model is estimated by OLS, obtaining the
following results:


$$\hat{y}_i = 12.7 + 14.2\,x_{1i} + 2.1\,x_{2i}$$
$$\hat{\sigma}^2 [X'X]^{-1} = \begin{bmatrix} 4.1 & -0.95 & -0.266 \\ -0.95 & 3.8 & 0.5 \\ -0.266 & 0.5 & 1.9 \end{bmatrix}$$
a) Test the null hypothesis $\alpha_0 = \alpha_1$.
b) Test whether $\alpha_1 / \alpha_2 = 7$.
c) Are the coefficients $\alpha_0$, $\alpha_1$ and $\alpha_2$ individually significant?
Exercise 4.13 Using a sample of 30 companies, the following cost functions have been
estimated:
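a) $$\widehat{cost}_i = \underset{(11.97)}{172.46} + \underset{(3.70)}{35.72}\,x_i \qquad R^2 = 0.838 \quad \bar{R}^2 = 0.829 \quad RSS = 8090$$
b) $$\widehat{cost}_i = \underset{(29.44)}{310.07} - \underset{(33.81)}{85.39}\,x_i + \underset{(11.61)}{26.73}\,x_i^2 - \underset{(1.22)}{1.40}\,x_i^3 \qquad R^2 = 0.978 \quad \bar{R}^2 = 0.974 \quad RSS = 1097$$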
where cost is the average cost and x is the quantity produced.
(The numbers in parentheses are standard errors of estimators.)
a) Which of the two models would you choose? What would be the criteria?
b) Test whether the quadratic and cubic terms of the quantity produced are
significant in determining the average cost.
c) Test the overall significance of the model b).
Exercise 4.14 A researcher formulates the following model:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
Using a sample of 13 observations the following results are obtained:
$$\hat{y}_i = 1.00 - 1.82\,x_{2i} + 0.36\,x_{3i} \tag{1}$$
$$R^2 = 0.50 \qquad n = 13$$
$$\widehat{\mathrm{var}}(\hat{\beta}) = \begin{bmatrix} 0.25 & -0.01 & 0.04 \\ -0.01 & 0.16 & -0.15 \\ 0.04 & -0.15 & 0.81 \end{bmatrix}$$
a) Test the null hypothesis that $\beta_2 = 0$ against the alternative hypothesis that
$\beta_2 < 0$.
b) Test the null hypothesis that $\beta_2 + \beta_3 = -1$ against the alternative
hypothesis that $\beta_2 + \beta_3 \neq -1$, with a significance level of 5%.
c) Is the whole model significant?
d) Assuming that the variables in the estimated model are measured in natural
logarithms, what is the interpretation of the coefficient for x3?
Exercise 4.15 With a sample of 50 automotive companies the following production
functions were estimated taking the gross value added of the automobile production (gva)
as the endogenous variable and labor input (labor) and capital input (capital) as
explanatory variables.


1) $$\widehat{\ln(gva)}_i = 3.87 + \underset{(0.11)}{0.80}\ln(labor_i) + \underset{(0.24)}{1.24}\ln(capital_i)$$
$$RSS = 254 \qquad R^2 = 0.75 \qquad \bar{R}^2 = 0.72$$
2) $$\widehat{\ln(gva)}_i = 19.9 + 1.04\ln(capital_i)$$
$$RSS = 529 \qquad R^2 = 0.84 \qquad \bar{R}^2 = 0.81$$
3) $$\widehat{\ln(gva_i / labor_i)} = 15.2 + 0.87\ln(capital_i / labor_i)$$
$$RSS = 380$$
(The numbers in parentheses are standard errors of estimators.)
a) Test the joint significance of both factors in the production function.
b) Test whether labor has a significant positive influence on the gross value
added of automobile production.
c) Test the hypothesis of constant returns to scale. Explain your answer.
Exercise 4.16 With a sample of 35 annual observations two demand functions of Rioja
wine have been estimated. The endogenous variable is spending on Rioja reserve wine
(wine) and the explanatory variables are disposable income (inc), the average price of a
bottle of Rioja reserve wine (pwinrioj) and the average price of a bottle of Ribera del Duero
reserve wine (pwinduer). The results are as follows:
$$\widehat{\ln(wine)}_i = 21.32 + \underset{(0.01)}{0.11}\ln(inc_i) - \underset{(0.23)}{1.33}\ln(pwinrioj_i) + \underset{(0.233)}{1.35}\ln(pwinduer_i)$$
$$R^2 = 0.905 \qquad RSS = 254$$
$$\widehat{\ln(wine)}_i = 19.9 + \underset{(0.02)}{0.14}\ln(inc_i) - \underset{(0.21)}{1.42}\ln(pwinrioj_i)$$
$$RSS = 529$$
(The numbers in parentheses are standard errors of the estimators.)
a) Test the joint significance of the first model.
b) Test whether the price of wine from Ribera del Duero has a significant
influence, using two statistics that do not use the same information. Show
that both procedures are equivalent.
c) How would you test the hypothesis that the price elasticity of Rioja wine
is the same but with an opposite sign to the price elasticity of Ribera del
Duero wine? Detail the procedure to follow.
Exercise 4.17 To analyze the demand for Ceylon tea (teceil) the following econometric
model is formulated:
$$\ln(teceil) = \beta_1 + \beta_2\ln(inc) + \beta_3\ln(pteceil) + \beta_4\ln(pteind) + \beta_5\ln(pcofbras) + u$$
where inc is the disposable income, pteceil is the price of Ceylon tea, pteind is the price
of Indian tea and pcofbras is the price of Brazilian coffee.
With a sample of 22 observations the following estimates were made:
$$\widehat{\ln(teceil)}_i = 2.83 + \underset{(0.17)}{0.25}\ln(inc_i) - \underset{(0.98)}{1.48}\ln(pteceil_i) + \underset{(0.69)}{1.18}\ln(pteind_i) + \underset{(0.16)}{0.19}\ln(pcofbras_i)$$


$$RSS = 0.4277$$
$$\widehat{\ln(teceil_i \times pteceil_i)} = 0.74 + \underset{(0.16)}{0.26}\ln(inc_i) + \underset{(0.15)}{0.20}\ln(pcofbras_i)$$
$$RSS = 0.6788$$
(The numbers in parentheses are standard errors of the estimators.)
a) Test the significance of disposable income.
b) Test the hypothesis that $\beta_3 = -1$ and $\beta_4 = 0$, and explain the procedure
applied.
c) If instead of having information on RSS, only $R^2$ were known for each model,
how would you proceed to test the hypothesis of part b)?
Exercise 4.18 The following fitted models are obtained to explain the deaths of children
under 5 years per 1000 live births (deathun5) using a sample of 64 countries.
1) $$\widehat{deathun5}_i = 263.64 - \underset{(0.0019)}{0.0056}\,inc_i + \underset{(0.21)}{2.23}\,femilrat_i \qquad R^2 = 0.7077$$
2) $$\widehat{deathun5}_i = 168.31 - \underset{(0.0018)}{0.0055}\,inc_i + \underset{(0.25)}{1.76}\,femilrat_i + 12.87\,fertrate_i \qquad R^2 = 0.7474$$

where inc is income per capita, femilrat is the female illiteracy rate, and fertrate is the
fertility rate.
(The numbers in parentheses are standard errors of the estimators.)
a) Test the joint significance of income, illiteracy and fertility rates.
b) Test the significance of the fertility rate.
c) Which of the two models would you choose? Explain your answer.
Exercise 4.19 Using a sample of 32 annual observations, the following estimations were
obtained to explain the car sales (car) of a particular brand:
$$\widehat{car}_i = \underset{(6.48)}{104.8} - \underset{(3.19)}{6.64}\,pcar_i + \underset{(0.16)}{2.98}\,adv_i$$
$$\sum \hat{u}_i^2 = 1805.2 \qquad \sum (car_i - \overline{car})^2 = 13581.4$$
where pcar is the price of cars and adv is spending on advertising.
(The numbers in parentheses are standard errors of the estimators.)
a) Are price and advertising expenditures significant together? Explain your
answer.
b) Can you accept that prices have a negative influence on sales? Explain your
answer.
c) Describe in detail how you would test the hypothesis that the impact of
advertising expenditures on sales is greater than minus 0.4 times the
impact of the price.
Exercise 4.20 In a study of the production costs (cost) of 62 coal mines, the following
results are obtained:
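$$\widehat{cost}_i = \underset{(3.4)}{2.20} - \underset{(0.005)}{0.104}\,dmec_i + \underset{(2.2)}{3.48}\,geodif_i + \underset{(0.15)}{0.104}\,absent_i$$
$$\sum (cost_i - \overline{cost})^2 = 109.6 \qquad \sum \hat{u}_i^2 = 18.48$$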

where dmec is the degree of mechanization, geodif is a measurement of geological
difficulties and absent is the percentage of absenteeism.
a) Test the significance of each of the model coefficients.
b) Test the overall significance of the model.
Exercise 4.21 With fifteen observations, the following estimation was obtained:
$$\hat{y}_i = 8.04 - \underset{(1.00)}{2.46}\,x_{i2} + \underset{(0.60)}{0.23}\,x_{i3}$$
$$\bar{R}^2 = 0.30$$

where the values between parentheses are standard deviations and the coefficient of
determination is the adjusted one.
a) Is the coefficient of the variable x2 significant?
b) Is the coefficient of the variable x3 significant?
c) Discuss the joint significance of the model.
Exercise 4.22 Consider the following econometric specification:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$$
With a sample of 26 observations, the following estimations were obtained:
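1) $$\hat{y}_i = 2 + \underset{(1.9)}{3.5}\,x_{2i} - \underset{(2.2)}{0.7}\,x_{3i} - \underset{(1.5)}{2}\,x_{4i} \qquad R^2 = 0.982$$
2) $$\hat{y}_i = 1.5 + \underset{(2.7)}{3}\,(x_{2i} + x_{3i}) - \underset{(2.4)}{0.6}\,x_{4i} \qquad R^2 = 0.876$$
(The t statistics are in parentheses.)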


a) Show that the following expressions for the F-statistic are equivalent:
$$F = \frac{(RSS_R - RSS_{UR}) / q}{RSS_{UR} / (n - k)} \qquad F = \frac{(R_{UR}^2 - R_R^2) / q}{(1 - R_{UR}^2) / (n - k)}$$
b) Test the null hypothesis β2= β3.
Exercise 4.23 In the estimation of the Brown model in exercise 3.19, using the workfile
consumsp, we obtained the following results:
$$\widehat{conspc}_t = \underset{(84.88)}{-7.156} + \underset{(0.0857)}{0.3965}\,incpc_t + \underset{(0.0903)}{0.5771}\,conspc_{t-1}$$
$$R^2 = 0.997 \qquad RSS = 1891320 \qquad n = 56$$


Two additional estimations are now obtained:
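$$\widehat{conspc_t - conspc_{t-1}} = \underset{(84.43)}{-98.13} + \underset{(0.0803)}{0.2757}\,(incpc_t - conspc_{t-1})$$
$$R^2 = 0.1792 \qquad RSS = 2199474 \qquad n = 56$$
$$\widehat{conspc_t - incpc_t} = \underset{(84.88)}{-7.156} - \underset{(0.0090)}{0.0264}\,incpc_t + \underset{(0.0903)}{0.5771}\,(conspc_{t-1} - incpc_t)$$
$$R^2 = 0.6570 \qquad RSS = 1891320 \qquad n = 56$$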
(The numbers in parentheses are standard errors of the estimators.)
a) Test the significance of each of the coefficients for the first model.
b) Test that the coefficient on incpc in the first model is smaller than 0.5.
c) Test the overall significance of the first model.


d) Is it admissible that $\beta_2 + \beta_3 = 1$?


e) Show that by operating in the third model you can reach the same
coefficients as in the first model.
Exercise 4.24 The following model was formulated to analyze the determinants of the
median base salary in $ for graduating classes of 2010 from the best American business
schools (salMBAgr):
$$salMBAgr = \beta_1 + \beta_2\,tuition + \beta_3\,salMBApr + u$$
where tuition is tuition fees including all required fees for the entire program (but
excluding living expenses) and salMBApr is the median annual salary in $ for incoming
classes in 2010.
Using the data in MBAtui10, the previous model has been estimated:
$$\widehat{salMBAgr}_i = \underset{(5415)}{42489} + \underset{(0.0628)}{0.1881}\,tuition_i + \underset{(0.1015)}{0.5992}\,salMBApr_i$$
$$R^2 = 0.703 \qquad n = 39$$
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the above model are individually
significant at 1% and at 5%?
b) Test the overall significance of the model.
c) What is the predicted value of salMBAgr for a graduate student who paid
$100,000 in tuition fees for a two-year MBA program and previously had a
salMBApr equal to $70,000? How many years of work does the student
require to offset tuition expenses? To answer this question, suppose that
the discount rate equals the expected rate of salary increase and that the
student received no wage income during the two years of the program.
d) If we added the regressor rank2010 (the rank of each business school in
2010), the following results were obtained:
$$\widehat{salMBAgr}_i = \underset{(8520)}{61320} + \underset{(0.0626)}{0.1229}\,tuition_i + \underset{(0.1055)}{0.4662}\,salMBApr_i - \underset{(85.13)}{232.06}\,rank2010_i$$
$$R^2 = 0.755 \qquad n = 39$$
Which of the regressors included in this model are individually significant
at 5%?
What is the interpretation of the coefficient on rank2010?
e) The variable rank2010 is based on three components: gradpoll is a rank
based on surveys of MBA grads and contributes 45 percent to final
ranking; corppoll is a rank based on surveys of MBA recruiters and
contributes 45 percent to final ranking; and intellec is a rank based on a
review of faculty research published over a five-year period in 20 top
academic journals and faculty books reviewed in The New York Times, The
Wall Street Journal, and Bloomberg Businessweek over the same period;
this last rank contributes 10 percent to the final ranking. In the following
estimated model, rank2010 has been replaced by its three components:


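$$\widehat{salMBAgr}_i = \underset{(10700)}{79904} + \underset{(0.0696)}{0.0305}\,tuition_i + \underset{(0.107)}{0.3751}\,salMBApr_i - \underset{(94.54)}{303.82}\,gradpoll_i - \underset{(61.26)}{33.829}\,corppoll_i - \underset{(64.09)}{113.36}\,intellec_i$$
$$R^2 = 0.797 \qquad n = 39$$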
What is the weight in percentage of each one of these three components in
determining the salMBAgr? Compare the results with the contribution of
each in defining rank2010.
f) Are gradpoll, corppoll and intellec jointly significant at 5%? Are they
individually significant at 5%?
Exercise 4.25 (Continuation of exercise 3.12). The population model corresponding to
this exercise is:
$$\ln(wage) = \beta_1 + \beta_2\,educ + \beta_3\,tenure + \beta_4\,age + u$$
Using workfile wage06sp, the previous model was estimated:
$$\widehat{\ln(wage)}_i = \underset{(0.073)}{1.565} + \underset{(0.0035)}{0.0448}\,educ_i + \underset{(0.0019)}{0.0177}\,tenure_i + \underset{(0.0016)}{0.0065}\,age_i$$
$$R^2 = 0.337 \qquad n = 800$$
(The numbers in parentheses are standard errors of the estimators.)
a) Test the overall significance of the model.
b) Is tenure statistically significant at 10%? Is age positively significant at
10%?
c) Is it admissible that the coefficient of educ is equal to that of tenure? Is it
admissible that the coefficient of educ is three times that of tenure? To answer
these questions you have the following additional information:
$$\widehat{\ln(wage)}_i = \underset{(0.073)}{1.565} + \underset{(0.0042)}{0.0271}\,educ_i + \underset{(0.0019)}{0.0177}\,(educ_i + tenure_i) + \underset{(0.0016)}{0.0065}\,age_i$$
$$\widehat{\ln(wage)}_i = \underset{(0.073)}{1.565} - \underset{(0.0071)}{0.0082}\,educ_i + \underset{(0.0019)}{0.0177}\,(3 \times educ_i + tenure_i) + \underset{(0.0016)}{0.0065}\,age_i$$
Can you calculate the $R^2$ in the two equations in part c)? Please do it.
Exercise 4.26 (Continuation of exercise 3.13). Let us take the population model of this
exercise as the reference model. In the estimated model, using workfile housecan, the
standard errors of the coefficients appear between brackets:
$$\widehat{price}_i = \underset{(3379)}{-2418} + \underset{(1207)}{5827}\,bedrooms_i + \underset{(1785)}{19750}\,bathrms_i + \underset{(0.388)}{5.411}\,lotsize_i$$
$$R^2 = 0.486 \qquad n = 546$$
a) Test the overall significance of this model.
b) Test the null hypothesis that an additional bathroom has the same influence
on housing prices as four additional bedrooms. Alternatively, test that
an additional bathroom has more influence on housing prices than four
additional bedrooms. (Additional information: $\widehat{\mathrm{var}}(\hat{\beta}_2) = 1455813$;
$\widehat{\mathrm{var}}(\hat{\beta}_3) = 3186523$; and $\widehat{\mathrm{cov}}(\hat{\beta}_2, \hat{\beta}_3) = -764846$).
c) If we add the regressor stories (number of stories excluding the basement)
to the model, the following results have been obtained:


$$\widehat{price}_i = \underset{(3603)}{-4010} + \underset{(1215)}{2825}\,bedrooms_i + \underset{(1734)}{17105}\,bathrms_i + \underset{(0.369)}{5.429}\,lotsize_i + \underset{(1008)}{7635}\,stories_i$$
$$R^2 = 0.536 \qquad n = 546$$
What do you think about the sign and magnitude of the coefficient on
stories? Do you find it surprising? What is the interpretation of this
coefficient? Test whether the number of stories has a significant influence
on housing prices.
d) Repeat the tests in part b) with the model estimated in part c). (Additional
information: $\widehat{\mathrm{var}}(\hat{\beta}_2) = 1475758$; $\widehat{\mathrm{var}}(\hat{\beta}_3) = 3008262$; and
$\widehat{\mathrm{cov}}(\hat{\beta}_2, \hat{\beta}_3) = -554381$).
Exercise 4.27 (Continuation of exercise 3.14). Let us take the population model of this
exercise as the reference model. Using workfile ceoforbes, the estimated model was the
following:
$$\widehat{\ln(salary)}_i = \underset{(0.377)}{4.641} + \underset{(0.0033)}{0.0054}\,roa_i + \underset{(0.0425)}{0.2893}\ln(sales_i) + \underset{(0.0000220)}{0.0000564}\,profits_i + \underset{(0.0032)}{0.0122}\,tenure_i$$
$$R^2 = 0.232 \qquad n = 447$$
(The numbers in parentheses are standard errors of the estimators.)
a) Does roa have a significant effect on salary? Does roa have a significant
positive effect on salary? Carry out both tests at the 10% and 5%
significance level.
b) If roa increases by 20 points, by what percentage is salary predicted to
increase?
c) Test the null hypothesis that the elasticity salary/sales is equal to 0.4.
d) If we add the regressor age, the following results are obtained:
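$$\widehat{\ln(salary)}_i = \underset{(0.442)}{4.159} + \underset{(0.0033)}{0.0055}\,roa_i + \underset{(0.0423)}{0.2903}\ln(sales_i) + \underset{(0.0000220)}{0.0000539}\,profits_i + \underset{(0.0035)}{0.00924}\,tenure_i + \underset{(0.0043)}{0.00880}\,age_i$$
$$R^2 = 0.240 \qquad n = 447$$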
Are the estimated coefficients very different from the estimates in the
reference model? What about the coefficient on tenure? Explain it.
e) Does age have a significant effect on the salary of a CEO?
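f) Is it admissible that the coefficient of age is equal to the coefficient of
tenure? (Additional information: $\widehat{\mathrm{var}}(\hat{\beta}_5) = 1.24 \times 10^{-5}$; $\widehat{\mathrm{var}}(\hat{\beta}_6) = 1.82 \times 10^{-5}$;
and $\widehat{\mathrm{cov}}(\hat{\beta}_5, \hat{\beta}_6) = -6.09 \times 10^{-6}$).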

Exercise 4.28 (Continuation of exercise 3.15). Let us take the population model of this
exercise as the reference model. Using workfile rdspain, the estimated model was the
following:
$$\widehat{rdintens}_i = \underset{(0.428)}{-1.8168} + \underset{(0.0278)}{0.1482}\ln(sales_i) + \underset{(0.0021)}{0.0110}\,exponsal_i$$
$$R^2 = 0.048 \qquad n = 1983$$
(The numbers in parentheses are standard errors of the estimators.)
a) Is the sales variable individually significant at 1%?


b) Test the null hypothesis that the coefficient on sales is equal to 0.2.
c) Test the overall significance of the reference model.
d) If we add the regressor ln(workers), the following results are obtained:
$$\widehat{rdintens}_i = \underset{(0.750)}{0.480} - \underset{(0.0687)}{0.08585}\ln(sales_i) + \underset{(0.0021)}{0.01049}\,exponsal_i + \underset{(0.09198)}{0.3422}\ln(workers_i)$$
$$R^2 = 0.055 \qquad n = 1983$$
Is sales individually significant at 1% in the new estimated model?
e) Test the null hypothesis that the coefficient on ln(workers) is greater than
0.5.
Exercise 4.29 (Continuation of exercise 3.16). Let us take the population model of this
exercise as the reference model. Using workfile hedcarsp, the corresponding fitted model
is the following:
$$\widehat{\ln(price)}_i = \underset{(0.154)}{14.42} + \underset{(0.0000438)}{0.000581}\,cid_i + \underset{(0.0079)}{0.003823}\,hpweight_i - \underset{(0.0122)}{0.07854}\,fueleff_i$$
$$R^2 = 0.830 \qquad n = 214$$
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the reference model are individually
significant at 1%?
b) Add the variable volume to the reference model. Does volume have a
statistically significant effect on ln(price)? Does volume have a statistically
significant positive effect on ln(price)?
c) Is it admissible that the coefficient of volume estimated in part b) is equal
in magnitude but opposite in sign to the coefficient of fueleff?
d) Add the variables length, width and height to the model estimated in part
b). Taking into account that volume=length×width×height, is there perfect
multicollinearity in the new model? Why? Why not? Estimate the new
model if it is possible.
e) Add the variable ln(volume) to the reference model. Test the null
hypothesis that the price/volume elasticity is equal to 1.
f) What happens if you add the regressors ln(length), ln(width) and ln(height)
to the model estimated in part e)?
Exercise 4.30 (Continuation of exercise 3.17). Let us take the population model of this
exercise as the reference model. Using workfile timuse03, the corresponding fitted model
is the following:
$$\widehat{houswork}_i = \underset{(23.27)}{141.9} + \underset{(1.621)}{3.850}\,educ_i - \underset{(0.00539)}{0.00917}\,hhinc_i + \underset{(0.311)}{1.767}\,age_i - \underset{(0.0229)}{0.2289}\,paidwork_i$$
$$R^2 = 0.1440 \qquad n = 1000$$
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the reference model are individually
significant at 5% and at 1%?
b) Estimate a model in which you could test directly whether one additional
year of education has the same effect on time devoted to house work as
two additional years of age. What is your conclusion?
c) Test the joint significance of educ and hhinc.


d) Run a regression in which you add the variable childup3 (number of
children up to three years) to the reference model. In the new model, which
of the regressors are individually significant at 5% and at 1%?
e) In the model formulated in d), what is the most influential variable? Why?
Exercise 4.31 (Continuation of exercise 3.18). Let us take the population model of this
exercise as the reference model. Using workfile hdr2010, the corresponding fitted model
is the following:
$$\widehat{stsfglo}_i = \underset{(0.584)}{-0.375} + \underset{(0.00000617)}{0.0000207}\,gnipc_i + \underset{(0.009)}{0.0858}\,lifexpec_i$$
$$R^2 = 0.642 \qquad n = 144$$
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the reference model are individually
significant at 1%?
b) Run a regression by adding the variables popnosan (population in
percentage without access to improved sanitation services) and gnirank
(rank in gni) to the reference model. Which of the regressors included in
the new model are individually significant at 1%? Interpret the coefficients
on popnosan and gnirank.
c) Are popnosan and gnirank jointly significant?
d) Test the overall significance of the model formulated in b).
Exercise 4.32 Using a sample of 42 observations, the following model has been estimated:
$$\hat{y}_t = -670.591 + 1.008\,x_t$$
For observation 43, it is known that the value of x is 1571.9.
a) Calculate the point predictor for observation 43.
b) Knowing that the variance of the prediction error $\hat{e}_{43} = y_{43} - \hat{y}_{43}$ is equal to
$(24.9048)^2$, calculate a 90% probability interval for the individual value.
Exercise 4.33 Besides the estimation presented in exercise 4.23, the following estimation
on the Brown consumption function is also available:
$$\widehat{conspc}_t = \underset{(64.35)}{12729} + \underset{(0.0857)}{0.3965}\,(incpc_t - 13500) + \underset{(0.0903)}{0.5771}\,(conspc_{t-1} - 12793.6)$$
$$R^2 = 0.997 \qquad RSS = 1891320 \qquad n = 56$$
(The numbers in parentheses are standard errors of the estimators.)
a) Obtain the point predictor for consumption per capita in 2011, knowing
that incpc2011=13500 and conspc2010=12793.6.
b) Obtain a 95% confidence interval for the expected value of consumption
per capita in 2011.
c) Obtain a 95% prediction interval for the individual value of consumption
per capita in 2011.
Exercise 4.34 (Continuation of exercise 4.30) Answer the following questions:
a) Using the first estimation in exercise 4.30, obtain a prediction for
houswork (minutes devoted to housework per day) when you plug into the
equation educ=10 (years), hhinc=1200 (euros per month), age=50 (years)
and paidwork=400 (minutes per day).
b) Run a regression, using workfile timuse03, which allows you to calculate
a 95% CI with the characteristics used in part a).
c) Obtain a 95% prediction interval for the individual value of houswork with
the characteristics used in part a).
Exercise 4.35 (Continuation of exercise 4.29) Answer the following questions:
a) Plug cid=2000 (cubic inch displacement), hpweight=10 (ratio
horsepower/weight in kg expressed as percentage) and fueleff=6 into the
first equation of exercise 4.29 to obtain the point predictor of ln(price).
b) Obtain a consistent estimate of price with the characteristics used in part
a).
c) Run a regression that allows you to calculate a 95% CI with the
characteristics used in part a).
d) Obtain a 95% prediction interval for the individual value of price with
the characteristics used in part a).

5 MULTIPLE REGRESSION ANALYSIS WITH
QUALITATIVE INFORMATION

5.1 Introducing qualitative information in econometric models.


Up until now, the variables that we have used in explaining the endogenous
variable have a quantitative nature. However, there are other variables of a qualitative
nature that can be important when explaining the behavior of the endogenous variable,
such as sex, race, religion, nationality, geographical region etc. For example, holding all
other factors constant, female workers are found to earn less than their male counterparts.
This pattern may result from gender discrimination, but whatever the reason, qualitative
variables such as gender seem to influence the regressand and clearly should be included
in many cases among the explanatory variables, or the regressors. Qualitative factors
often (although not always) come in the form of binary information, i.e. a person is male
or female, is either married or not, etc. When qualitative factors come in the form of
dichotomous information, the relevant information can be captured by defining a binary
variable or a zero-one variable. In econometrics, binary variables used as regressors are
commonly called dummy variables. In defining a dummy variable, we must decide which
event is assigned the value one and which is assigned the value zero.
In the case of gender, we can define
$$female = \begin{cases} 1 & \text{if the person is a female} \\ 0 & \text{if the person is a male} \end{cases}$$
But of course we can also define
$$male = \begin{cases} 1 & \text{if the person is a male} \\ 0 & \text{if the person is a female} \end{cases}$$
Nevertheless, it is important to remark that both variables, male and female,
contain the same information. Coding qualitative information with the values
zero and one is an arbitrary decision, but with this choice the parameters have a natural
interpretation.
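As an illustration of the two definitions above, the following minimal Python sketch builds both dummies from a list of labels. The sample and the variable names are hypothetical and serve only to show that female and male carry the same information.

```python
# Hypothetical sample of gender labels; the data are illustrative only.
gender = ["female", "male", "male", "female", "female"]

# Dummy for the category "female": 1 if the person is a female, 0 otherwise.
female = [1 if g == "female" else 0 for g in gender]

# The complementary dummy for the category "male".
male = [1 - f for f in female]

# Both variables carry the same information: they always add up to one.
same_information = all(f + m == 1 for f, m in zip(female, male))

print(female)            # [1, 0, 0, 1, 1]
print(male)              # [0, 1, 1, 0, 0]
print(same_information)  # True
```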

5.2 A single dummy independent variable


Let us see how we incorporate dichotomous information into regression models.
Consider the simple model of hourly wage determination as a function of the years of
education (educ):


$$wage = \beta_1 + \beta_2\,educ + u \tag{5-1}$$
To measure gender wage discrimination, we introduce a dummy variable for
gender as an independent variable in the model defined above,
$$wage = \beta_1 + \delta_1\,female + \beta_2\,educ + u \tag{5-2}$$
The attribute gender has two categories: male and female. The female category
has been included in the model, while the male category, which was omitted, is the
reference category. Model (5-2) is shown in Figure 5.1, taking δ1<0. The interpretation of δ1
is the following: δ1 is the difference in hourly wage between females and males, given
the same amount of education (and the same error disturbance u). Thus, the coefficient δ1
determines whether there is discrimination against women or not. If δ1<0 then, for the
same level of other factors (education, in this case), women earn less than men on average.
Assuming that the disturbance mean is zero, if we take expectation for both categories we
obtain:
$$\begin{aligned} \mu_{wage|female} &= E(wage \mid female = 1, educ) = \beta_1 + \delta_1 + \beta_2\,educ \\ \mu_{wage|male} &= E(wage \mid female = 0, educ) = \beta_1 + \beta_2\,educ \end{aligned} \tag{5-3}$$
As can be seen in (5-3), the intercept is β1 for males, and β1+δ1 for females.
Graphically, as can be seen in Figure 5.1, there is a shift of the intercept, but the lines for
men and women are parallel.
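[Figure: wage on the vertical axis against educ on the horizontal axis. Two parallel lines with slope β2: the line for men has intercept β1 and the line for women has intercept β1 + δ1, so the vertical distance between them is δ1.]

FIGURE 5.1. Same slope, different intercept.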
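The parallel lines in Figure 5.1 can be checked numerically with a short Python sketch. The parameter values below are hypothetical (they are not estimates from any data set); the function simply evaluates the conditional means in (5-3).

```python
# Hypothetical parameter values for model (5-2); they are not estimates.
beta1, delta1, beta2 = 5.0, -1.5, 0.8


def expected_wage(female, educ):
    """Conditional mean of wage in (5-3), assuming E(u) = 0."""
    return beta1 + delta1 * female + beta2 * educ


for educ in (0, 8, 16):
    men = expected_wage(0, educ)
    women = expected_wage(1, educ)
    # The gap women - men equals delta1 at every level of education,
    # which is why the two lines are parallel.
    print(educ, men, women, round(women - men, 10))
```

Whatever value of educ is chosen, the difference between the two conditional means stays at δ1; only the intercept shifts.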


In (5-2) we have included a dummy variable for female but not for male, because
if we had included both dummies this would have been redundant. In fact, all we need is
two intercepts, one for females and another one for males. As we have seen, if we
introduce the female dummy variable, we have an intercept for each gender. Introducing
two dummy variables would cause perfect multicollinearity given that female+male=1,
which means that male is an exact linear function of female and of the intercept. Including
dummy variables for both genders plus the intercept is the simplest example of the so-
called dummy variable trap, as we shall show later on.
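The perfect multicollinearity described above can be seen directly in the rank of the regressor matrix. The sketch below uses NumPy and a small hypothetical sample; it is only an illustration of the argument, not part of the original text.

```python
import numpy as np

# Hypothetical sample: six people, three of them women.
female = np.array([1, 0, 0, 1, 1, 0])
male = 1 - female
educ = np.array([12, 16, 10, 14, 9, 11])
intercept = np.ones_like(female)

# Intercept plus ONE dummy: the columns are linearly independent.
X_ok = np.column_stack([intercept, female, educ])

# Intercept plus BOTH dummies: intercept = female + male exactly.
X_trap = np.column_stack([intercept, female, male, educ])

rank_ok = np.linalg.matrix_rank(X_ok)      # 3 columns, rank 3
rank_trap = np.linalg.matrix_rank(X_trap)  # 4 columns, rank only 3

print(X_ok.shape[1], rank_ok)
print(X_trap.shape[1], rank_trap)
```

Because the matrix with both dummies has fewer independent columns than regressors, X′X is singular and the OLS estimator cannot be computed.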
If we use male instead of female, the wage equation would be the following:
$$wage = \alpha_1 + \gamma_1\,male + \beta_2\,educ + u \tag{5-4}$$


Nothing has changed with the new equation, except the interpretation of α1 and
γ1: α1 is the intercept for women, which is now the reference category, and α1+γ1 is the
intercept for men. This implies the following relationship between the coefficients:
α1=β1+δ1 and α1+γ1=β1⇒ γ1=−δ1
In any application, it does not matter how we choose the reference category, since
this only affects the interpretation of the coefficients associated to the dummy variables,
but it is important to keep track of which category is the reference category. Choosing a
reference category is usually a matter of convenience. It would also be possible to drop
the intercept and to include a dummy variable for each category. The equation would then
be
$$wage = \mu_1\,male + \nu_1\,female + \beta_2\,educ + u \tag{5-5}$$
where the intercept is µ1 for men and ν1 for women.
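The relationships between the coefficients of (5-2), (5-4) and (5-5) also hold for the OLS estimates. The following sketch checks this on simulated data; the data-generating values, the seed and the helper function `ols` are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
female = rng.integers(0, 2, size=n)
male = 1 - female
educ = rng.integers(6, 18, size=n).astype(float)
# Simulated wages from (5-2) with hypothetical parameter values.
wage = 5.0 - 1.5 * female + 0.8 * educ + rng.normal(0.0, 1.0, size=n)


def ols(*columns):
    """OLS coefficients of wage on the given regressor columns."""
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
    return coef


ones = np.ones(n)
b1, d1, b2 = ols(ones, female, educ)         # model (5-2)
a1, g1, b2_alt = ols(ones, male, educ)       # model (5-4)
m1, v1, b2_noint = ols(male, female, educ)   # model (5-5), no intercept

# alpha1 = beta1 + delta1 and gamma1 = -delta1
print(np.isclose(a1, b1 + d1), np.isclose(g1, -d1))
# mu1 = beta1 (men) and nu1 = beta1 + delta1 (women)
print(np.isclose(m1, b1), np.isclose(v1, b1 + d1))
# The slope on educ is the same in the three parameterizations.
print(np.isclose(b2, b2_alt), np.isclose(b2, b2_noint))
```

The three regressions span the same column space, so they produce the same fitted values; only the labeling of the intercepts changes.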
Hypothesis testing is performed as usual. In model (5-2), the null hypothesis of no
difference between men and women is H 0 : δ1 = 0 , while the alternative hypothesis that
there is discrimination against women is H1 : δ1 < 0 . Therefore, in this case, we must
apply a one sided (left) t test.
A common specification in applied work has the dependent variable as the
logarithm transformation ln(y) in models of this type. For example:
$$\ln(wage) = \beta_1 + \delta_1\,female + \beta_2\,educ + u \tag{5-6}$$
Let us see the interpretation of the coefficient of the dummy variable in a log
model. In model (5-6), taking u=0, the wage for a female and for a male is as follows:
$$\ln(wage_F) = \beta_1 + \delta_1 + \beta_2\,educ \tag{5-7}$$
$$\ln(wage_M) = \beta_1 + \beta_2\,educ \tag{5-8}$$
Given the same amount of education, if we subtract (5-8) from (5-7), we have
$$\ln(wage_F) - \ln(wage_M) = \delta_1 \tag{5-9}$$
Taking antilogs in (5-9) and subtracting 1 from both sides, we get
$$\frac{wage_F}{wage_M} - 1 = e^{\delta_1} - 1 \tag{5-10}$$
That is to say
$$\frac{wage_F - wage_M}{wage_M} = e^{\delta_1} - 1 \tag{5-11}$$
According to (5-11), the proportional difference between the female wage and the
male wage, for the same amount of education, is equal to $e^{\delta_1} - 1$. Therefore, the exact
percentage difference in hourly wage between women and men is $100 \times (e^{\delta_1} - 1)$. As an
approximation to this difference, 100×δ1 can be used. However, if the magnitude of the
percentage is large, this approximation is not very accurate.
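To see how the approximation 100×δ1 deteriorates as the coefficient grows, the following Python sketch compares it with the exact expression from (5-11). The grid of values for δ1 is arbitrary and chosen only for illustration.

```python
import math


def exact_pct(delta):
    """Exact percentage difference implied by (5-11)."""
    return 100.0 * (math.exp(delta) - 1.0)


def approx_pct(delta):
    """Approximation 100 * delta."""
    return 100.0 * delta


for delta in (-0.50, -0.10, -0.02, 0.02, 0.10, 0.50):
    gap = exact_pct(delta) - approx_pct(delta)
    print(f"{delta:+.2f}  exact={exact_pct(delta):7.2f}%  "
          f"approx={approx_pct(delta):7.2f}%  gap={gap:5.2f}")
```

For coefficients close to zero the two figures nearly coincide, while for δ1 = 0.5 the gap is already close to 15 percentage points.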


EXAMPLE 5.1 Is there wage discrimination against women in Spain?


Using data from the wage structure survey of Spain for 2002 (file wage02sp), model (5-6) has
been estimated and the following results were obtained:
$$\widehat{\ln(wage)} = \underset{(0.026)}{1.731} - \underset{(0.022)}{0.307}\,female + \underset{(0.0025)}{0.0548}\,educ$$
$$RSS = 393 \qquad R^2 = 0.243 \qquad n = 2000$$


where wage is hourly wage in euros, female is a dummy variable that takes the value 1 if it is a woman, and
educ are the years of education. (The numbers in parentheses are standard errors of the estimators.)
To answer the question posed above, we need to test $H_0: \delta_1 = 0$ against $H_1: \delta_1 < 0$. Given that
the t statistic is equal to -14.27, we reject the null hypothesis for α=0.01. That is to say, there is evidence of
wage discrimination against women in Spain in 2002. In fact, given the same years of education, men earn
$100 \times (e^{0.307} - 1) = 35.9\%$ more per hour than women; equivalently, according to (5-11), the hourly wage
of women is $100 \times (e^{-0.307} - 1) = -26.4\%$ relative to men, that is, 26.4% lower.
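The calculations of Example 5.1 can be reproduced from the rounded figures reported above. In this sketch the critical value 2.326 is an assumption (the one-sided 1% quantile of the standard normal, used as a large-sample approximation). The t statistic computed from the rounded coefficient and standard error differs slightly from the reported -14.27, presumably because the reported value was obtained with unrounded estimates.

```python
import math

# Rounded figures reported in Example 5.1.
delta1_hat, se_delta1 = -0.307, 0.022

# t statistic for H0: delta1 = 0 against H1: delta1 < 0.
t_stat = delta1_hat / se_delta1

# Assumed one-sided 1% critical value for large degrees of freedom
# (standard normal quantile); reject H0 if t_stat < -critical.
critical = 2.326
reject = t_stat < -critical

# Exact percentage differences implied by (5-11).
women_vs_men = 100.0 * (math.exp(delta1_hat) - 1.0)
men_vs_women = 100.0 * (math.exp(-delta1_hat) - 1.0)

print(round(t_stat, 2), reject)                       # -13.95 True
print(round(women_vs_men, 1), round(men_vs_women, 1)) # -26.4 35.9
```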
EXAMPLE 5.2 Analysis of the relation between market capitalization and book value: the role of ibex35
A researcher wants to study the relationship between market capitalization and book value in
shares quoted on the continuous market of the Madrid stock exchange. In this market some stocks quoted
are included in the ibex35, a selective index. The researcher also wants to know whether the stocks included
in the ibex35 have, on average, a higher capitalization. With this purpose in mind, the researcher formulates
the following model:
$$\ln(marketcap) = \beta_1 + \delta_1\,ibex35 + \beta_2\ln(bookvalue) + u \tag{5-12}$$
where
- marketcap is the capitalization value of a company, which is calculated by multiplying the price
of the stock by the number of stocks issued.
- bookvalue is the book value of a company, also referred to as the net worth of the company. The
book value is calculated as the difference between a company's assets and its liabilities.
- ibex35 is a dummy variable that takes the value 1 if the corporation is included in the selective
Ibex 35.
Using the 92 stocks quoted on 15th November 2011 which supply information on book value (file
bolmad11), the following results were obtained:
$$\widehat{\ln(marketcap)} = \underset{(0.243)}{1.784} + \underset{(0.179)}{0.690}\,ibex35 + \underset{(0.037)}{0.675}\ln(bookvalue)$$
$$RSS = 35.672 \qquad R^2 = 0.893 \qquad n = 92$$
The marketcap/bookvalue elasticity is equal to 0.675; that is to say, if the book value increases by
1%, then the market capitalization of the quoted stocks will increase by 0.675%.
To test whether the stocks included in ibex35 have on average a higher capitalization implies
testing H 0 : δ1 = 0 against H1 : δ1 > 0 . Given that the t statistic is (0.690/0.179)=3.85, we reject the null
hypothesis for the usual levels of significance. On the other hand, we see that the stocks included in ibex35
are quoted 99.4% higher than the stocks not included. The percentage is obtained as follows:
$100 \times (e^{0.690} - 1) = 99.4\%$.
In the case of β2, we can test H 0 : β 2 = 0 against H1 : β 2 ≠ 0 . Given that the t statistic is
(0.675/0.037)=18, we reject the null hypothesis for the usual levels of significance.
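The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch: the function names `t_ratio` and `dummy_percent_effect` are illustrative choices, not part of the text, and the inputs are the rounded coefficients and standard errors reported above.

```python
import math

def t_ratio(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se

def dummy_percent_effect(coef):
    """Exact percentage change in y when a dummy goes from 0 to 1
    in a model whose dependent variable is ln(y)."""
    return 100 * (math.exp(coef) - 1)

# Rounded estimates reported in example 5.2
ibex_t = t_ratio(0.690, 0.179)
ibex_pct = dummy_percent_effect(0.690)
book_t = t_ratio(0.675, 0.037)

print(round(ibex_t, 2), round(ibex_pct, 1), round(book_t, 1))
```

Because the inputs are rounded, the t ratios may differ slightly from those obtained with the full-precision regression output.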
EXAMPLE 5.3 Do people living in urban areas spend more on fish than people living in rural areas?
To see whether people living in urban areas spend more on fish than people living in rural areas,
the following model is proposed:
$$\ln(\mathit{fish}) = \beta_1 + \delta_1 \mathit{urban} + \beta_2 \ln(\mathit{inc}) + u \qquad (5\text{-}13)$$
where fish is expenditure on fish, urban is a dummy variable which takes the value 1 if the person lives in
an urban area and inc is disposable income.
Using a sample of size 40 (file demand), model (5-13) was estimated:
$$\widehat{\ln(\mathit{fish})} = \underset{(0.511)}{-6.375} + \underset{(0.055)}{0.140}\,\mathit{urban} + \underset{(0.070)}{1.313}\,\ln(\mathit{inc})$$
$$RSS = 1.131 \qquad R^2 = 0.904 \qquad n = 40$$


According to these results, people living in urban areas spend approximately 14% more on fish than people living
in rural areas. If we test H 0 : δ1 = 0 against H1 : δ1 > 0 , we find that the t statistic is (0.140/0.055)=2.55.
Given that $t_{37}^{0.01} \approx t_{35}^{0.01} = 2.44$, we reject the null hypothesis in favor of the alternative for the usual levels of
significance. That is to say, there is empirical evidence that people living in urban areas spend more on fish
than people living in rural areas.

5.3 Multiple categories for an attribute


In the previous section we have seen an attribute (gender) that has two categories
(male and female). Now we are going to consider attributes with more than two categories.
In particular, we will examine an attribute with three categories.
To measure the impact of firm size on wage, we can use dummy variables. Let
us suppose that firms are classified in three groups according to their size: small (up to
49 workers), medium (from 50 to 199 workers) and large (more than 199 workers). With
this information, we can construct three dummy variables:
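$$\mathit{small} = \begin{cases} 1 & \text{up to 49 workers} \\ 0 & \text{otherwise} \end{cases}$$
$$\mathit{medium} = \begin{cases} 1 & \text{from 50 to 199 workers} \\ 0 & \text{otherwise} \end{cases}$$
$$\mathit{large} = \begin{cases} 1 & \text{more than 199 workers} \\ 0 & \text{otherwise} \end{cases}$$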
If we want to explain hourly wages by introducing the firm size in the model, we
must omit one of the categories. In the following model, the omitted category is small
firms:
$$\mathit{wage} = \beta_1 + \theta_1 \mathit{medium} + \theta_2 \mathit{large} + \beta_2 \mathit{educ} + u \qquad (5\text{-}14)$$
The interpretation of the θj coefficients is the following: θ1 (θ2) is the difference
in hourly wage between medium (large) firms and small firms, given the same amount of
education (and the same error term u).
Let us see what happens if we also include the category small in (5-14). We would
have the model:
$$\mathit{wage} = \beta_1 + \theta_0 \mathit{small} + \theta_1 \mathit{medium} + \theta_2 \mathit{large} + \beta_2 \mathit{educ} + u \qquad (5\text{-}15)$$
Now, let us consider that we have a sample of six observations: the observations
1 and 2 correspond to small firms; 3 and 4 to medium ones; and 5 and 6 to large ones. In
this case the matrix of regressors X would have the following configuration:

$$X = \begin{bmatrix}
1 & 1 & 0 & 0 & \mathit{educ}_1 \\
1 & 1 & 0 & 0 & \mathit{educ}_2 \\
1 & 0 & 1 & 0 & \mathit{educ}_3 \\
1 & 0 & 1 & 0 & \mathit{educ}_4 \\
1 & 0 & 0 & 1 & \mathit{educ}_5 \\
1 & 0 & 0 & 1 & \mathit{educ}_6
\end{bmatrix}$$
As can be seen in matrix X, column 1 of this matrix is equal to the sum of columns
2, 3 and 4. Therefore, there is perfect multicollinearity due to the so-called dummy
variable trap. Generalizing, if an attribute has g categories, we need to include only g−1
dummy variables in the model along with the intercept. The intercept for the reference
category is the overall intercept in the model, and the dummy variable coefficient for a
particular group represents the estimated difference in intercepts between that category
and the reference category. If we include g dummy variables along with an intercept, we
will fall into the dummy variable trap. An alternative is to include g dummy variables and
to exclude an overall intercept. In the case we are examining, the model would be the
following:
wage = θ 0 small + θ1medium + θ 2large + β 2 educ + u (5-16)
This solution is not advisable for two reasons. First, with this configuration of the model
it is more difficult to test differences with respect to a reference category. Second, this
solution only works in the case of a model with a single attribute.
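The dummy variable trap can be checked numerically by computing the rank of the matrix of regressors. The following is a minimal sketch in plain Python; the educ values are hypothetical, and the helpers `matrix_rank` and `design` are illustrative names introduced here, not part of the text.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a non-zero pivot in this column, at or below the current rank row
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Six observations: two small, two medium and two large firms (educ is hypothetical)
size = ["small", "small", "medium", "medium", "large", "large"]
educ = [8, 12, 10, 16, 12, 18]

def design(include_small):
    """Regressor matrix with an intercept; optionally include the small dummy."""
    rows = []
    for s, e in zip(size, educ):
        row = [1]
        if include_small:
            row.append(int(s == "small"))
        row += [int(s == "medium"), int(s == "large"), e]
        rows.append(row)
    return rows

X_trap = design(include_small=True)   # model (5-15): 5 columns
X_ok = design(include_small=False)    # model (5-14): 4 columns

print(matrix_rank(X_trap), "independent columns out of", len(X_trap[0]))
print(matrix_rank(X_ok), "independent columns out of", len(X_ok[0]))
```

When the rank is smaller than the number of columns, X'X is singular and the OLS estimator cannot be computed, which is what happens in model (5-15).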
EXAMPLE 5.4 Does firm size influence wage determination?
Using the sample of example 5.1 (file wage02sp), model (5-14), taking log for wage, was
estimated:
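$$\widehat{\ln(\mathit{wage})} = \underset{(0.027)}{1.566} + \underset{(0.025)}{0.281}\,\mathit{medium} + \underset{(0.024)}{0.162}\,\mathit{large} + \underset{(0.0025)}{0.0480}\,\mathit{educ}$$
$$RSS = 406 \qquad R^2 = 0.218 \qquad n = 2000$$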
To answer the question above, we will not perform an individual test on θ1 or θ2. Instead we must
jointly test whether the size of firms has a significant influence on wage. That is to say, we must test whether
medium and large firms together have a significant influence on the determination of wage. In this case,
the null and the alternative hypothesis, taking (5-14) as the unrestricted model, will be the following:
$$\begin{aligned} H_0 &: \theta_1 = \theta_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned}$$
The restricted model in this case is the following:
$$\ln(\mathit{wage}) = \beta_1 + \beta_2 \mathit{educ} + u \qquad (5\text{-}17)$$
The estimation of this model is the following:
$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{1.657} + \underset{(0.0026)}{0.0525}\,\mathit{educ}$$
$$RSS = 433 \qquad R^2 = 0.166 \qquad n = 2000$$
Therefore, the F statistic is

$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[433 - 406]/2}{406/(2000 - 4)} = 66.4$$
So, according to the value of the F statistic, we can conclude that the size of the firm has a
significant influence on wage determination for the usual levels of significance.
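The calculation of the F statistic from the two residual sums of squares can be written as a small function. This is a sketch; the function name `f_stat_rss` and its argument names are illustrative, with q the number of restrictions and k the number of parameters of the unrestricted model.

```python
def f_stat_rss(rss_r, rss_ur, q, n, k):
    """F = [(RSS_R - RSS_UR) / q] / [RSS_UR / (n - k)]."""
    return ((rss_r - rss_ur) / q) / (rss_ur / (n - k))

# Example 5.4: restricted model (5-17) against unrestricted model (5-14) in logs
f_size = f_stat_rss(rss_r=433, rss_ur=406, q=2, n=2000, k=4)
print(round(f_size, 1))
```

The value obtained is then compared with the critical value of the F distribution with q and n−k degrees of freedom.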


EXAMPLE 5.5 In the case of Lydia E. Pinkham, are the time dummy variables introduced significant
individually or jointly?
In example 3.4, we considered the case of Lydia E. Pinkham in which sales of a herbal extract
from this company (expressed in thousands of dollars) were explained in terms of advertising expenditures
in thousands of dollars (advexp) and last year's sales ($\mathit{sales}_{t-1}$). However, in addition to these two variables,
the author included three time dummy variables: d1, d2 and d3. These dummy variables encompass the
various situations which took place in the company. Thus, d1 takes 1 in the period 1907-1914 and 0 in the
remaining periods, d2 takes 1 in the period 1915-1925 and 0 in other periods, and finally, d3 takes 1 in the
period 1926 - 1940 and 0 in the remaining periods. Thus, the reference category is the period 1941-1960.
The final formulation of the model was therefore the following:
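$$\mathit{sales}_t = \beta_1 + \beta_2 \mathit{advexp}_t + \beta_3 \mathit{sales}_{t-1} + \beta_4 d1_t + \beta_5 d2_t + \beta_6 d3_t + u_t \qquad (5\text{-}18)$$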
The results obtained in the regression, using file pinkham, were the following:
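$$\widehat{\mathit{sales}}_t = \underset{(96.3)}{254.6} + \underset{(0.136)}{0.5345}\,\mathit{advexp}_t + \underset{(0.0814)}{0.6073}\,\mathit{sales}_{t-1} - \underset{(89)}{133.35}\,d1_t + \underset{(67)}{216.84}\,d2_t - \underset{(67)}{202.50}\,d3_t$$
$$R^2 = 0.929 \qquad n = 53$$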
To test whether the dummy variables individually have a significant effect on sales, the null and
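alternative hypotheses are as follows, where θ1, θ2 and θ3 denote the coefficients of d1, d2 and d3 (β4, β5 and β6 in (5-18)):
$$\begin{cases} H_0: \theta_i = 0 \\ H_1: \theta_i \neq 0 \end{cases} \qquad i = 1, 2, 3$$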
The corresponding t statistics are the following:
$$t_{\hat{\theta}_1} = \frac{-133.35}{89} = -1.50 \qquad t_{\hat{\theta}_2} = \frac{216.84}{67} = 3.22 \qquad t_{\hat{\theta}_3} = \frac{-202.50}{67} = -3.02$$
As can be seen, the regressor d1 is not significant for any of the usual levels of significance,
whereas on the contrary the regressors d2 and d3 are significant for any of the usual levels.
The interpretation of the coefficient of the regressor d2, for example, is as follows: holding fixed
the advertising spending and given the previous year's sales, sales for one year of the period 1915-1925 are
216.84 thousand dollars higher than for a year of the period 1941-1960.
To test jointly the effect of the time dummy variables, the null and alternative hypotheses are
$$\begin{cases} H_0: \theta_1 = \theta_2 = \theta_3 = 0 \\ H_1: H_0 \text{ is not true} \end{cases}$$
and the corresponding test statistic is
$$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} = \frac{(0.9290 - 0.8770)/3}{(1 - 0.9290)/(53 - 6)} = 11.47$$

For any of the usual significance levels the null hypothesis is rejected. Therefore, the time dummy
variables have a significant effect on sales.
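The R² form of the F statistic used in this example gives the same value as the RSS form whenever both models have the same dependent variable, because RSS = (1 − R²)·TSS and the total sum of squares cancels out. The sketch below illustrates this; the function names are illustrative, and the value of TSS is hypothetical and chosen only to show that it does not affect the result.

```python
def f_stat_r2(r2_ur, r2_r, q, n, k):
    """F = [(R2_UR - R2_R) / q] / [(1 - R2_UR) / (n - k)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k))

def f_stat_rss(rss_r, rss_ur, q, n, k):
    """F = [(RSS_R - RSS_UR) / q] / [RSS_UR / (n - k)]."""
    return ((rss_r - rss_ur) / q) / (rss_ur / (n - k))

# Example 5.5: joint test of the three time dummies
f_from_r2 = f_stat_r2(r2_ur=0.9290, r2_r=0.8770, q=3, n=53, k=6)

# Same test through the RSS form, using a hypothetical total sum of squares
tss = 1000.0
f_from_rss = f_stat_rss(rss_r=(1 - 0.8770) * tss, rss_ur=(1 - 0.9290) * tss,
                        q=3, n=53, k=6)

print(round(f_from_r2, 2), round(f_from_rss, 2))
```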

5.4 Several attributes


Now we will consider the possibility of taking into account two attributes to
explain the determination of wage: gender and length of workday (part-time and full-
time). Let partime be a dummy variable that takes value 1 when the type of contract is
part-time and 0 if it is full-time. In the following model, we introduce two dummy
variables: female and partime:
$$\mathit{wage} = \beta_1 + \delta_1 \mathit{female} + \varphi_1 \mathit{partime} + \beta_2 \mathit{educ} + u \qquad (5\text{-}19)$$
In this model, φ1 is the difference in hourly wage between those who work part-time
and those who work full-time, given gender and the same amount of education (and also
the same disturbance term u).


Each of these two attributes has a reference category, which is the omitted
category. In this case, male is the reference category for gender and full-time for type of
contract. If we take expectations for the four categories involved, we obtain:
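$$\begin{aligned}
\mu_{\mathit{wage}|\mathit{female},\mathit{partime}} &= E[\mathit{wage} \mid \mathit{female}, \mathit{partime}, \mathit{educ}] = \beta_1 + \delta_1 + \varphi_1 + \beta_2 \mathit{educ} \\
\mu_{\mathit{wage}|\mathit{female},\mathit{fulltime}} &= E[\mathit{wage} \mid \mathit{female}, \mathit{fulltime}, \mathit{educ}] = \beta_1 + \delta_1 + \beta_2 \mathit{educ} \\
\mu_{\mathit{wage}|\mathit{male},\mathit{partime}} &= E[\mathit{wage} \mid \mathit{male}, \mathit{partime}, \mathit{educ}] = \beta_1 + \varphi_1 + \beta_2 \mathit{educ} \\
\mu_{\mathit{wage}|\mathit{male},\mathit{fulltime}} &= E[\mathit{wage} \mid \mathit{male}, \mathit{fulltime}, \mathit{educ}] = \beta_1 + \beta_2 \mathit{educ}
\end{aligned} \qquad (5\text{-}20)$$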
The overall intercept in the equation reflects the effect of both reference categories,
male and full-time, and so full-time male is the reference category. From (5-20), you can
see the intercept for each combination of categories.
EXAMPLE 5.6 The influence of gender and length of the workday on wage determination
Model (5-19), taking log for wage, was estimated by using data from the wage structure survey
of Spain for 2006 (file wage06sp):
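$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{2.006} - \underset{(0.021)}{0.233}\,\mathit{female} - \underset{(0.027)}{0.087}\,\mathit{partime} + \underset{(0.0023)}{0.0531}\,\mathit{educ}$$
$$RSS = 365 \qquad R^2 = 0.235 \qquad n = 2000$$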


According to the values of the coefficients and corresponding standard errors, it is clear that each
one of the two dummy variables, female and partime, is statistically significant for the usual levels of
significance.
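The four intercepts in (5-20) can be evaluated with the estimates of this example. The sketch below uses the rounded coefficients reported above; the variable names are illustrative. Since the dependent variable is in logs, the difference with respect to the reference category (full-time male) is also converted into an exact percentage.

```python
import math

# Rounded estimates from example 5.6 (dependent variable: ln(wage))
b1, d_female, phi_partime = 2.006, -0.233, -0.087

# Intercept for each combination of categories, following (5-20)
intercepts = {
    (female, partime): b1 + d_female * female + phi_partime * partime
    for female in (0, 1)
    for partime in (0, 1)
}

for (female, partime), a in intercepts.items():
    # Exact percentage difference in wage with respect to full-time males
    pct = 100 * (math.exp(a - b1) - 1)
    label = ("female" if female else "male") + ", " + ("part-time" if partime else "full-time")
    print(f"{label}: intercept {a:.3f}, difference {pct:.1f}%")
```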
EXAMPLE 5.7 Trying to explain the absence from work in the company Buenosaires
Buenosaires is a firm devoted to the manufacturing of fans, having had relatively acceptable results
in recent years. The managers consider that these would have been better if absenteeism in the company
were not so high. In order to analyze the factors determining absenteeism, the following model is proposed:
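$$\mathit{absent} = \beta_1 + \delta_1 \mathit{bluecoll} + \varphi_1 \mathit{male} + \beta_2 \mathit{age} + \beta_3 \mathit{tenure} + \beta_4 \mathit{wage} + u \qquad (5\text{-}21)$$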
where bluecoll is a dummy indicating that the person is a manual worker (the reference category is white
collar) and tenure is a continuous variable reflecting the years worked in the company.
Using a sample of size 48 (file absent), the following equation has been estimated:
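$$\widehat{\mathit{absent}} = \underset{(1.640)}{12.444} + \underset{(0.669)}{0.968}\,\mathit{bluecoll} + \underset{(0.712)}{2.049}\,\mathit{male} - \underset{(0.047)}{0.037}\,\mathit{age} - \underset{(0.065)}{0.151}\,\mathit{tenure} - \underset{(0.007)}{0.044}\,\mathit{wage}$$
$$RSS = 161.95 \qquad R^2 = 0.760 \qquad n = 48$$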
Next, we will look at whether bluecoll is significant. Testing H0: δ1 = 0 against H1: δ1 ≠ 0, the
t statistic is (0.968/0.669)=1.45. As $t_{40}^{0.10/2} = 1.68$, we fail to reject the null hypothesis for α=0.10. And so
there is no empirical evidence to state that absenteeism amongst blue collar workers is different from white
collar workers. But if we test H0: δ1 = 0 against H1: δ1 > 0, as $t_{40}^{0.10} = 1.30$ for α=0.10, then we reject
the null hypothesis in favor of the alternative that absenteeism amongst blue collar workers is greater than
amongst white collar workers.
On the contrary, in the case of the male dummy, testing H0: φ1 = 0 against H1: φ1 ≠ 0, given that
the t statistic is (2.049/0.712)=2.88 and $t_{40}^{0.01/2} = 2.70$, we reject that absenteeism is equal in men and women
for the usual levels of significance.
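The t ratios of all the regressors in this example can be computed in one pass and compared with the critical values quoted above. The sketch below uses the rounded estimates and the tabulated values for 40 degrees of freedom given in the text; the dictionary and variable names are illustrative.

```python
# (coefficient, standard error) pairs from example 5.7
coefs = {
    "bluecoll": (0.968, 0.669),
    "male": (2.049, 0.712),
    "age": (-0.037, 0.047),
    "tenure": (-0.151, 0.065),
    "wage": (-0.044, 0.007),
}

# Two-sided critical values of the t distribution with 40 df, as quoted in the text
CRIT_TWO_SIDED = {0.10: 1.68, 0.01: 2.70}
CRIT_ONE_SIDED_10 = 1.30

t_ratios = {name: b / se for name, (b, se) in coefs.items()}

for name, t in t_ratios.items():
    rejected = [alpha for alpha, c in CRIT_TWO_SIDED.items() if abs(t) > c]
    print(f"{name}: t = {t:.2f}, two-sided rejection at alpha in {rejected}")

# One-sided test H1: delta_1 > 0 for bluecoll at the 10% level
bluecoll_one_sided = t_ratios["bluecoll"] > CRIT_ONE_SIDED_10
print("bluecoll, one-sided 10% test rejects H0:", bluecoll_one_sided)
```

The same coefficient can therefore be non-significant in a two-sided test and significant in a one-sided test at the same level, as happens with bluecoll.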
EXAMPLE 5.8 Size of firm and gender in determining wage
In order to know whether the size of the firm and gender jointly are two relevant factors in
determining wage, the following model is formulated:
$$\ln(\mathit{wage}) = \beta_1 + \delta_1 \mathit{female} + \theta_1 \mathit{medium} + \theta_2 \mathit{large} + \beta_2 \mathit{educ} + u \qquad (5\text{-}22)$$
In this case, we must perform a joint test where the null and the alternative hypotheses are


$$\begin{aligned} H_0 &: \delta_1 = \theta_1 = \theta_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned}$$
In this case, the restricted model is model (5-17) which was estimated in example 5.4 (file
wage02sp). The estimation of the unrestricted model is the following:
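$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{1.639} - \underset{(0.021)}{0.327}\,\mathit{female} + \underset{(0.023)}{0.308}\,\mathit{medium} + \underset{(0.023)}{0.168}\,\mathit{large} + \underset{(0.0024)}{0.0499}\,\mathit{educ}$$
$$RSS = 361 \qquad R^2 = 0.305 \qquad n = 2000$$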
The F statistic is

$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[433 - 361]/3}{361/(2000 - 5)} = 133$$
Therefore, according to the value of F, we can conclude that the size of the firm and gender jointly
have a significant influence in wage determination.

5.5 Interactions involving dummy variables


5.5.1 Interactions between two dummy variables
To allow for the possibility of an interaction between gender and length of the
workday on wage determination, we can add an interaction term between female and
partime in model (5-19), with the model to estimate being the following:
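$$\mathit{wage} = \beta_1 + \delta_1 \mathit{female} + \varphi_1 \mathit{partime} + \phi_1 \mathit{female} \times \mathit{partime} + \beta_2 \mathit{educ} + u \qquad (5\text{-}23)$$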
This allows the effect of part-time work on wage to depend on gender, and vice versa.
EXAMPLE 5.9 Is the interaction between females and part-time work significant?
Model (5-23), taking log for wage, was estimated by using data from the wage structure survey of
Spain for 2006 (file wage06sp):
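$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{2.007} - \underset{(0.022)}{0.259}\,\mathit{female} - \underset{(0.047)}{0.198}\,\mathit{partime} + \underset{(0.058)}{0.167}\,\mathit{female} \times \mathit{partime} + \underset{(0.0024)}{0.0538}\,\mathit{educ}$$
$$RSS = 363 \qquad R^2 = 0.238 \qquad n = 2000$$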
To answer the question posed, we have to test H0: ϕ1 = 0 against H1: ϕ1 ≠ 0. Given that the t
statistic is (0.167/0.058)=2.89 and taking into account that $t_{60}^{0.01/2} = 2.66$, we reject the null hypothesis in
favor of the alternative hypothesis. Therefore, there is empirical evidence that the interaction between
females and part-time work is statistically significant.
EXAMPLE 5.10 Do small firms discriminate against women more or less than larger firms?
To answer this question, we formulate the following model:
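$$\begin{aligned} \ln(\mathit{wage}) = {} & \beta_1 + \delta_1 \mathit{female} + \theta_1 \mathit{medium} + \theta_2 \mathit{large} \\ & + \phi_1 \mathit{female} \times \mathit{medium} + \phi_2 \mathit{female} \times \mathit{large} + \beta_2 \mathit{educ} + u \end{aligned} \qquad (5\text{-}24)$$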
Using the sample of example 5.1 (file wage02sp), model (5-24) was estimated:
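$$\begin{aligned} \widehat{\ln(\mathit{wage})} = {} & \underset{(0.027)}{1.624} - \underset{(0.034)}{0.262}\,\mathit{female} + \underset{(0.028)}{0.361}\,\mathit{medium} + \underset{(0.027)}{0.179}\,\mathit{large} \\ & - \underset{(0.050)}{0.159}\,\mathit{female} \times \mathit{medium} - \underset{(0.051)}{0.043}\,\mathit{female} \times \mathit{large} + \underset{(0.0024)}{0.0497}\,\mathit{educ} \end{aligned}$$
$$RSS = 359 \qquad R^2 = 0.308 \qquad n = 2000$$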
If in (5-24) the parameters ϕ1 and ϕ2 are equal to 0, this will imply that in the equation for wage
determination, there will be no interaction between gender and firm size. Thus to answer the above
question, we take (5-24) as the unrestricted model. The null and the alternative hypothesis will be the
following:


$$\begin{aligned} H_0 &: \phi_1 = \phi_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned}$$
In this case, the restricted model is therefore model (5-22) estimated in example 5.8. The F statistic
takes the value

$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[361 - 359]/2}{359/(2000 - 7)} = 5.55$$
For α=0.01, we find that $F_{2,1993}^{0.01} \approx F_{2,60}^{0.01} = 4.98$. As F=5.55>4.98, we reject H0 in favor of H1. As H0 has
been rejected for α=0.01, it will also be rejected for levels of 5% and 10%. Therefore, for the usual levels of
significance, the interaction between gender and firm size is relevant for wage determination.
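To see what the interaction terms imply for the question posed in the title of this example, the gender gap can be evaluated for each firm size: it is δ1 for small firms, δ1 + ϕ1 for medium firms and δ1 + ϕ2 for large firms. The sketch below uses the rounded estimates of (5-24); the variable names are illustrative.

```python
import math

# Rounded estimates from example 5.10 (dependent variable: ln(wage))
d_female = -0.262
interaction = {"small": 0.0, "medium": -0.159, "large": -0.043}

# Difference in ln(wage) between women and men within each firm size
gap_log = {size: d_female + phi for size, phi in interaction.items()}

# Exact percentage difference implied by each log difference
gap_pct = {size: 100 * (math.exp(g) - 1) for size, g in gap_log.items()}

for size in interaction:
    print(f"{size}: log gap {gap_log[size]:.3f}, percentage gap {gap_pct[size]:.1f}%")
```

With these estimates, the gap against women is smallest in small firms and largest in medium firms.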

5.5.2 Interactions between a dummy variable and a quantitative variable


So far, in the examples for wage determination a dummy variable has been used
to shift the intercept or to study its interaction with another dummy variable, while
keeping the slope of educ constant. However, one can also use dummy variables to shift
the slopes by letting them interact with any continuous explanatory variables. For
example, in the following model the female dummy variable interacts with the continuous
variable educ:
wage =β1 + β 2 educ + δ1 female × educ + u (5-25)
As can be seen in figure 5.2, the intercept is the same for men and women in this
model, but the slope is greater in men than in women because δ1 is negative.
In model (5-25), the returns to an extra year of education depend upon the gender
of the individual. In fact,
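$$\frac{\partial \mathit{wage}}{\partial \mathit{educ}} = \begin{cases} \beta_2 + \delta_1 & \text{for women} \\ \beta_2 & \text{for men} \end{cases} \qquad (5\text{-}26)$$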
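[Figure: wage (vertical axis) against educ (horizontal axis). Both lines start at the common intercept β1; the slope is β2 for men and β2 + δ1 for women.]

FIGURE 5.2. Different slope, same intercept.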


EXAMPLE 5.11 Is the return to education for males greater than for females?
Using the sample of example 5.1 (file wage02sp), model (5-25) was estimated by taking log for
wage:


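$$\widehat{\ln(\mathit{wage})} = \underset{(0.025)}{1.640} + \underset{(0.0026)}{0.0632}\,\mathit{educ} - \underset{(0.0021)}{0.0274}\,\mathit{educ} \times \mathit{female}$$
$$RSS = 400 \qquad R^2 = 0.229 \qquad n = 2000$$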
In this case, we need to test H0: δ1 = 0 against H1: δ1 < 0. Given that the t statistic is
(-0.0274/0.0021)=-12.81, we reject the null hypothesis in favor of the alternative hypothesis for any level of
significance. That is to say, there is empirical evidence that the return for an additional year of education is
greater for men than for women.
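The returns to education implied by these estimates follow from (5-26) applied to the model in logs. A short sketch with the rounded coefficients is given below; the names `return_men`, `return_women` and `log_gap` are illustrative.

```python
# Rounded estimates from example 5.11 (dependent variable: ln(wage))
b_educ = 0.0632        # coefficient of educ
d_interaction = -0.0274  # coefficient of educ x female

return_men = b_educ                    # beta_2
return_women = b_educ + d_interaction  # beta_2 + delta_1

def log_gap(educ):
    """Difference in predicted ln(wage), women minus men, at a given level of educ."""
    return d_interaction * educ

print(round(return_men, 4), round(return_women, 4))
print(round(log_gap(10), 3))
```

Since the intercept is common to both groups in model (5-25), the predicted gap is zero at educ=0 and widens proportionally with the years of education.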

5.6 Testing structural changes


So far we have tested hypotheses in which one parameter, or a subset of
parameters of the model, is different for two groups (women and men, for example). But
sometimes we wish to test the null hypothesis that two groups have the same population
regression function, against the alternative that it is not the same. In other words, we want
to test whether the same equation is valid for the two groups. There are two procedures
for this: using dummy variables and running separate regressions through the Chow test.

5.6.1 Using dummy variables


In this procedure, testing for differences across groups consists in performing a
joint significance test of the dummy variable that distinguishes between the two groups
and of its interactions with all other independent variables. We therefore estimate the model
with (unrestricted model) and without (restricted model) the dummy variable and all the
interactions.
From the estimation of both equations we form the F statistic, either through the
RSS or from the R2. In the following model for the determination of wages, the intercept
and the slope are different for males and females:
wage =β1 + δ1 female + β 2 educ + δ 2 female × educ + u (5-27)
The population regression function corresponding to this model is represented in
figure 5.3. As can be seen, if female=1, we obtain
wage = ( β1 + δ1 ) + ( β 2 + δ 2 )educ + u (5-28)

For women the intercept is β1 + δ1 , and the slope β 2 + δ 2 . For female=0, we obtain
equation (5-1). In this case, for men the intercept is β1 , and the slope β 2 . Therefore, δ1
measures the difference in intercepts between men and women, and δ2 measures the
difference in the return to education between males and females. Figure 5.3 shows a lower
intercept and a lower slope for women than for men. This means that women earn less
than men at all levels of education, and the gap increases as educ gets larger; that is to
say, an additional year of education shows a lower return for women than for men.
Estimating (5-27) is equivalent to estimating two wage equations separately, one
for men and another for women. The only difference is that (5-27) imposes the same
variance across the two groups, whereas separate regressions do not. This set-up is ideal,
as we will see later on, for testing the equality of slopes, equality of intercepts, and
equality of both intercepts and slopes across groups.


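[Figure: wage (vertical axis) against educ (horizontal axis). The line for men has intercept β1 and slope β2; the line for women has intercept β1 + δ1 and slope β2 + δ2.]

FIGURE 5.3. Different slope, different intercept.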
EXAMPLE 5.12 Is the wage equation valid for both men and women?
If parameters δ1 and δ2 are equal to 0 in model (5-27), this will imply that the equation for wage
determination is the same for men and women. In order to answer the question posed, we take (5-27) as
the unrestricted model but express wage in logs. The null and the alternative hypothesis will be the
following:
$$\begin{aligned} H_0 &: \delta_1 = \delta_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned}$$
Therefore, the restricted model is model (5-17). Using the same sample as in example 5.1 (file
wage02sp), we have obtained the following estimation of models (5-27) and (5-17):
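$$\widehat{\ln(\mathit{wage})} = \underset{(0.030)}{1.739} - \underset{(0.0546)}{0.3319}\,\mathit{female} + \underset{(0.0030)}{0.0539}\,\mathit{educ} + \underset{(0.0054)}{0.0027}\,\mathit{educ} \times \mathit{female}$$
$$RSS = 393 \qquad R^2 = 0.243 \qquad n = 2000$$
$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{1.657} + \underset{(0.0026)}{0.0525}\,\mathit{educ}$$
$$RSS = 433 \qquad R^2 = 0.166 \qquad n = 2000$$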
The F statistic takes the value

$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[433 - 393]/2}{393/(2000 - 4)} = 102$$
It is clear that for any level of significance, the equations for men and women are different.
When we tested in example 5.1 whether there was discrimination in Spain against women
( H 0 : δ1 = 0 against H1 : δ1 < 0 ), it was assumed that the slope of educ (model (5-6)) is the same for men
and women. Now it is also possible to use model (5-27) to test the same null hypothesis, but assuming a
different slope. Given that the t statistic is (-0.3319/0.0546)=-6.06, we reject the null hypothesis by using
this more general model than the one in example 5.1.
In example 5.11 it was tested whether the coefficient of educ × female (δ1 in model (5-25), taking log for wage) was
0, assuming that the intercept is the same for males and females. Now, if we take (5-27), taking log for
wage, as the unrestricted model, we can test the same null hypothesis (now H0: δ2 = 0), but assuming that the intercept is
different for males and females. Given that the t statistic is (0.0027/0.0054)=0.49, we cannot reject the null
hypothesis which states that there is no interaction between gender and education.
EXAMPLE 5.13 Would urban consumers have the same pattern of behavior as rural consumers regarding
expenditure on fish?
To answer this question, we formulate the following model which is taken as the unrestricted
model:


$$\ln(\mathit{fish}) = \beta_1 + \delta_1 \mathit{urban} + \beta_2 \ln(\mathit{inc}) + \delta_2 \ln(\mathit{inc}) \times \mathit{urban} + u \qquad (5\text{-}29)$$
The null and the alternative hypothesis will be the following:
$$\begin{aligned} H_0 &: \delta_1 = \delta_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned}$$
The restricted model corresponding to this H0 is
$$\ln(\mathit{fish}) = \beta_1 + \beta_2 \ln(\mathit{inc}) + u \qquad (5\text{-}30)$$
Using the sample of example 5.3 (file demand), models (5-29) and (5-30) were estimated:
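$$\widehat{\ln(\mathit{fish})} = \underset{(0.627)}{-6.551} + \underset{(1.095)}{0.678}\,\mathit{urban} + \underset{(0.087)}{1.337}\,\ln(\mathit{inc}) - \underset{(0.152)}{0.075}\,\ln(\mathit{inc}) \times \mathit{urban}$$
$$RSS = 1.123 \qquad R^2 = 0.904 \qquad n = 40$$
$$\widehat{\ln(\mathit{fish})} = \underset{(0.542)}{-6.224} + \underset{(0.075)}{1.302}\,\ln(\mathit{inc})$$
$$RSS = 1.325 \qquad R^2 = 0.887 \qquad n = 40$$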
The F statistic takes the value

$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[1.325 - 1.123]/2}{1.123/(40 - 4)} = 3.24$$
If we look up in the F table for 2 df in the numerator and 35 df in the denominator for α=0.10, we
find $F_{2,36}^{0.10} \approx F_{2,35}^{0.10} = 2.46$. As F>2.46, we reject H0. However, as $F_{2,36}^{0.05} \approx F_{2,35}^{0.05} = 3.27$, we fail to reject H0
in favour of H1 for α=0.05 and, therefore, for α=0.01. Conclusion: there is no strong evidence that families
living in urban areas have a different pattern of fish consumption than families living in rural areas.
EXAMPLE 5.14 Has the productive structure of Spanish regions changed?
The question to be answered is specifically the following: Did the productive structure of Spanish
regions change between 1995 and 2008? The problem posed is a problem of structural stability. To specify
the model to be taken as a reference in the test, let us define the dummy y2008, which takes the value 1 if
the year is 2008 and 0 if the year is 1995.
The reference model is a Cobb-Douglas model, which introduces additional parameters to collect
the structural changes that may have occurred. Its expression is:
ln(q ) =γ 1 + α1 ln(k ) + β1 ln(l ) + γ 2 y 2008 + α 2 y 2008 × ln(k ) + β 2 y 2008 × ln(l ) + u
(5-31)
It is easily seen, according to the definition of the dummy y2008, that the production/capital
elasticities may differ between 1995 and 2008. Specifically, they take the following values:
$$\varepsilon_{q/k}(1995) = \frac{\partial \ln(q)}{\partial \ln(k)} = \alpha_1 \qquad \varepsilon_{q/k}(2008) = \frac{\partial \ln(q)}{\partial \ln(k)} = \alpha_1 + \alpha_2$$
In the case that α2 is equal to 0, then the elasticity of production/capital is the same in both periods.
Similarly, the production/labor elasticities for the two periods are given by
$$\varepsilon_{q/l}(1995) = \frac{\partial \ln(q)}{\partial \ln(l)} = \beta_1 \qquad \varepsilon_{q/l}(2008) = \frac{\partial \ln(q)}{\partial \ln(l)} = \beta_1 + \beta_2$$
The intercept in the Cobb-Douglas is a parameter that measures efficiency. In model (5-31), the
possibility that the efficiency parameter (PEF) is different in the two periods is considered. Thus
$$PEF(1995) = \gamma_1 \qquad PEF(2008) = \gamma_1 + \gamma_2$$
If the parameters α2, β2 and γ2 are zero in model (5-31), the production function is the same in both
periods. Therefore, in testing structural stability of the production function, the null and alternative
hypotheses are:
$$\begin{aligned} H_0 &: \gamma_2 = \alpha_2 = \beta_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned} \qquad (5\text{-}32)$$
Under the null hypothesis, the restrictions given in (5-32) lead to the following restricted model:


$$\ln(q) = \gamma_1 + \alpha_1 \ln(k) + \beta_1 \ln(l) + u \qquad (5\text{-}33)$$
The file prodsp contains information for each of the Spanish regions in 1995 and 2008 on gross
value added in millions of euros (gva), occupation in thousands of jobs (labor), and productive capital in
millions of euros (captot). You can also find the dummy y2008 in that file.
The results of the unrestricted regression model (5-31) are shown below. It is evident that we
cannot reject the null hypothesis that each of the coefficients α2, β2 and γ2, taken individually, is 0, since
none of the corresponding t statistics reaches 0.1 in absolute value.
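$$\begin{aligned} \widehat{\ln(\mathit{gva})} = {} & \underset{(0.916)}{0.0559} + \underset{(0.185)}{0.6743}\,\ln(\mathit{captot}) + \underset{(0.185)}{0.3291}\,\ln(\mathit{labor}) \\ & - \underset{(2.32)}{0.1088}\,\mathit{y2008} + \underset{(0.419)}{0.0154}\,\mathit{y2008} \times \ln(\mathit{captot}) - \underset{(0.418)}{0.0094}\,\mathit{y2008} \times \ln(\mathit{labor}) \end{aligned}$$
$$R^2 = 0.99394 \qquad n = 34$$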
The results of the restricted model (5-33) are the following:
$$\widehat{\ln(\mathit{gva})} = \underset{(0.200)}{-0.0690} + \underset{(0.036)}{0.6959}\,\ln(\mathit{captot}) + \underset{(0.042)}{0.311}\,\ln(\mathit{labor})$$
$$R^2 = 0.99392 \qquad n = 34$$
As can be seen, the R2 of the two models are virtually identical because they differ only in the
fifth decimal place. It is not surprising, therefore, that the F statistic for testing the null hypothesis (5-32) takes a
value close to 0:
$$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} = \frac{(0.99394 - 0.99392)/3}{(1 - 0.99394)/(34 - 6)} = 0.0308$$

Thus, the null hypothesis that there is no structural change in the productive economy of the
Spanish regions between 1995 and 2008 cannot be rejected for any of the usual significance levels.
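The elasticities implied by the unrestricted estimates can be evaluated for each year from the expressions given above. The following sketch uses the rounded coefficients of the unrestricted regression; the dictionary keys and the function name `elasticities` are illustrative.

```python
# Rounded estimates of the unrestricted model (5-31) from example 5.14
coef = {
    "ln_captot": 0.6743,
    "ln_labor": 0.3291,
    "y2008_x_ln_captot": 0.0154,
    "y2008_x_ln_labor": -0.0094,
}

def elasticities(y2008):
    """Return (capital, labor) elasticities for y2008 = 0 (1995) or y2008 = 1 (2008)."""
    capital = coef["ln_captot"] + y2008 * coef["y2008_x_ln_captot"]
    labor = coef["ln_labor"] + y2008 * coef["y2008_x_ln_labor"]
    return capital, labor

for year, dummy in ((1995, 0), (2008, 1)):
    capital, labor = elasticities(dummy)
    print(year, round(capital, 4), round(labor, 4), round(capital + labor, 4))
```

The estimated elasticities barely change between the two years, which is in line with the very low value of the F statistic.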

5.6.2 Using separate regressions: The Chow test


This test was introduced by the econometrician Chow (1960). He considered the
problem of testing the equality of two sets of regression coefficients. In the Chow test,
the restricted model is the same as in the case of using dummy variables to distinguish
between groups. The unrestricted model, instead of distinguishing the behaviour of the
two groups by using dummy variables, consists simply of separate regressions. Thus, in
the wage determination example, the unrestricted model consists of two equations:
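$$\begin{aligned} \text{female:} \quad & \mathit{wage} = \beta_{11} + \beta_{21} \mathit{educ} + u \\ \text{male:} \quad & \mathit{wage} = \beta_{12} + \beta_{22} \mathit{educ} + u \end{aligned} \qquad (5\text{-}34)$$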
If we estimate both equations by OLS, we can show that the RSS of the unrestricted
model, RSSUR, is equal to the sum of the RSS obtained from the estimates for women,
RSS1, and for men, RSS2. That is to say,
RSSUR=RSS1+RSS2
The null hypothesis states that the parameters of the two equations in (5-34) are
equal. Therefore
$$\begin{aligned} H_0 &: \begin{cases} \beta_{11} = \beta_{12} \\ \beta_{21} = \beta_{22} \end{cases} \\ H_1 &: \text{No } H_0 \end{aligned}$$
By applying the null hypothesis to model (5-34), you get model (5-17), which is
the restricted model. The estimation of this model for the whole sample is usually called


the pooled (P) regression. Thus, we will consider that the RSSR and RSSP are equivalent
expressions.
Therefore, the F statistic will be the following:
$$F = \frac{[RSS_P - (RSS_1 + RSS_2)]/k}{[RSS_1 + RSS_2]/[n - 2k]} \qquad (5\text{-}35)$$
It is important to remark that, under the null hypothesis, the error variances for the
groups must be equal. Note that we have k restrictions: the slope coefficients (interactions)
plus the intercept. Note also that in the unrestricted model we estimate two different
intercepts and two different slope coefficients, and so the df of the model are n−2k.
One important limitation of the Chow test is that under the null hypothesis there
are no differences at all between the groups. In most cases, it is more interesting to allow
partial differences between both groups as we have done using dummy variables.
The Chow test can be generalized to more than two groups in a natural way. From
a practical point of view, to run separate regressions for each group to perform the test is
probably easier than using dummy variables.
In the case of three groups, the F statistic in the Chow test will be the following:

$$F = \frac{[RSS_P - (RSS_1 + RSS_2 + RSS_3)]/(2 \times k)}{(RSS_1 + RSS_2 + RSS_3)/(n - 3k)} \qquad (5\text{-}36)$$
Note that, as a general rule, the number of the df of the numerator is equal to the
(number of groups-1) × k, while the number of the df of the denominator is equal to n
minus (number of groups) × k.
EXAMPLE 5.15 Another way to approach the question of wage determination by gender
Using the same sample as in example 5.1 (file wage02sp), we have obtained the estimation of the
equations in (5-34), taking log for wage, for men and women, which taken together gives the estimation of
the unrestricted model:
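$$\text{Female equation:} \quad \widehat{\ln(\mathit{wage})} = \underset{(0.042)}{1.407} + \underset{(0.0041)}{0.0566}\,\mathit{educ}$$
$$RSS = 104 \qquad R^2 = 0.236 \qquad n = 617$$
$$\text{Male equation:} \quad \widehat{\ln(\mathit{wage})} = \underset{(0.031)}{1.739} + \underset{(0.0032)}{0.0539}\,\mathit{educ}$$
$$RSS = 289 \qquad R^2 = 0.175 \qquad n = 1383$$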
The restricted model, estimated in example 5.4, has the same configuration as the equations in
(5-34) but in this case refers to the whole sample. Therefore, it is the pooled regression corresponding to
the restricted model. The F statistic takes the value

$$F = \frac{[RSS_P - (RSS_F + RSS_M)]/k}{(RSS_F + RSS_M)/(n - 2k)} = \frac{[433 - (104 + 289)]/2}{(104 + 289)/(2000 - 2 \times 2)} = 102$$
The F statistic must be, and is, the same as in example 5.12. The conclusions are therefore the
same.
EXAMPLE 5.16 Is the model of wage determination the same for different firm sizes?
In other examples the intercept, or the slope on education, was different for three different firm
sizes (small, medium and large). Now we shall consider a completely different equation for each firm size.
Therefore, the unrestricted model will be composed of three equations:


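$$\begin{aligned} \text{small:} \quad & \ln(\mathit{wage}) = \beta_{11} + \delta_{11} \mathit{female} + \beta_{21} \mathit{educ} + u \\ \text{medium:} \quad & \ln(\mathit{wage}) = \beta_{12} + \delta_{12} \mathit{female} + \beta_{22} \mathit{educ} + u \\ \text{large:} \quad & \ln(\mathit{wage}) = \beta_{13} + \delta_{13} \mathit{female} + \beta_{23} \mathit{educ} + u \end{aligned} \qquad (5\text{-}37)$$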
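The null and the alternative hypothesis will be the following:
$$\begin{aligned} H_0 &: \begin{cases} \beta_{11} = \beta_{12} = \beta_{13} \\ \delta_{11} = \delta_{12} = \delta_{13} \\ \beta_{21} = \beta_{22} = \beta_{23} \end{cases} \\ H_1 &: \text{No } H_0 \end{aligned}$$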
Given this null hypothesis, the restricted model is model (5-2).
The estimations of the three equations of (5-37), by using file wage02sp, are the following:
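$$\text{small:} \quad \widehat{\ln(\mathit{wage})} = \underset{(0.034)}{1.706} - \underset{(0.031)}{0.249}\,\mathit{female} + \underset{(0.0038)}{0.0396}\,\mathit{educ}$$
$$RSS = 121 \qquad R^2 = 0.160 \qquad n = 801$$
$$\text{medium:} \quad \widehat{\ln(\mathit{wage})} = \underset{(0.051)}{1.934} - \underset{(0.039)}{0.422}\,\mathit{female} + \underset{(0.0046)}{0.0548}\,\mathit{educ}$$
$$RSS = 123 \qquad R^2 = 0.302 \qquad n = 590$$
$$\text{large:} \quad \widehat{\ln(\mathit{wage})} = \underset{(0.046)}{1.749} - \underset{(0.039)}{0.303}\,\mathit{female} + \underset{(0.0044)}{0.0554}\,\mathit{educ}$$
$$RSS = 114 \qquad R^2 = 0.273 \qquad n = 609$$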


The pooled regression has been estimated in example 5.1. The F statistic takes the value

$$F = \frac{[RSS_P - (RSS_S + RSS_M + RSS_L)]/(2 \times k)}{(RSS_S + RSS_M + RSS_L)/(n - 3k)} = \frac{[393 - (121 + 123 + 114)]/6}{(121 + 123 + 114)/(2000 - 3 \times 3)} = 32.4$$
For any level of significance, we reject that the equations for wage determination are the same for
different firm sizes.
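The general rule for the degrees of freedom of the Chow test can be collected in a single function that works for any number of groups. The following is a sketch; the function name `chow_f` and its arguments are illustrative, with k the number of parameters in each group equation.

```python
def chow_f(rss_pooled, rss_groups, n, k):
    """Chow F statistic for g groups with k parameters per group equation.

    Returns the statistic together with its numerator and denominator df:
    (g - 1) * k and n - g * k respectively.
    """
    g = len(rss_groups)
    rss_ur = sum(rss_groups)   # RSS of the unrestricted model
    df_num = (g - 1) * k
    df_den = n - g * k
    f = ((rss_pooled - rss_ur) / df_num) / (rss_ur / df_den)
    return f, df_num, df_den

# Example 5.15: two groups (women and men), k = 2
f_gender, df1_gender, df2_gender = chow_f(433, [104, 289], n=2000, k=2)

# Example 5.16: three groups (small, medium and large firms), k = 3
f_size, df1_size, df2_size = chow_f(393, [121, 123, 114], n=2000, k=3)

print(round(f_gender), df1_gender, df2_gender)
print(round(f_size, 1), df1_size, df2_size)
```

The only inputs needed are the RSS of the pooled regression and the RSS of each separate regression, which is why the test is easy to carry out with separate regressions.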
EXAMPLE 5.17 Is the Pinkham model valid for the four periods?
In example 5.5, we introduced time dummy variables and we tested whether the intercept was
different for each period. Now, we are going to test whether the whole model is valid for the four periods
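considered. Therefore, the unrestricted model will be composed of four equations:
$$
\begin{aligned}
\text{1907-1914:}\quad & sales_t = \beta_{11} + \beta_{21}\,advexp_t + \beta_{31}\,sales_{t-1} + u_t \\
\text{1915-1925:}\quad & sales_t = \beta_{12} + \beta_{22}\,advexp_t + \beta_{32}\,sales_{t-1} + u_t \\
\text{1926-1940:}\quad & sales_t = \beta_{13} + \beta_{23}\,advexp_t + \beta_{33}\,sales_{t-1} + u_t \\
\text{1941-1960:}\quad & sales_t = \beta_{14} + \beta_{24}\,advexp_t + \beta_{34}\,sales_{t-1} + u_t
\end{aligned}
\tag{5-38}
$$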
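The null and the alternative hypotheses will be the following:
$$
H_0:\ \begin{cases}
\beta_{11} = \beta_{12} = \beta_{13} = \beta_{14} \\
\beta_{21} = \beta_{22} = \beta_{23} = \beta_{24} \\
\beta_{31} = \beta_{32} = \beta_{33} = \beta_{34}
\end{cases}
\qquad
H_1:\ \text{not } H_0
$$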
Given this null hypothesis, the restricted model is the following model:
$$
sales_t = \beta_1 + \beta_2\,advexp_t + \beta_3\,sales_{t-1} + u_t \tag{5-39}
$$
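The estimations of the four equations of (5-38) are the following:
$$
\begin{aligned}
\text{1907-1914:}\quad & \widehat{sales}_t = \underset{(603)}{64.84} + \underset{(1.025)}{0.9149}\,advexp_t + \underset{(0.425)}{0.4630}\,sales_{t-1} && SSR = 36017 \quad n = 7 \\
\text{1915-1925:}\quad & \widehat{sales}_t = \underset{(190)}{221.5} + \underset{(0.557)}{0.1279}\,advexp_t + \underset{(0.300)}{0.9319}\,sales_{t-1} && SSR = 400605 \quad n = 11 \\
\text{1926-1940:}\quad & \widehat{sales}_t = \underset{(112)}{446.8} + \underset{(0.115)}{0.4638}\,advexp_t + \underset{(0.0827)}{0.4445}\,sales_{t-1} && SSR = 201614 \quad n = 15 \\
\text{1941-1960:}\quad & \widehat{sales}_t = \underset{(134)}{-182.4} + \underset{(0.241)}{1.6753}\,advexp_t + \underset{(0.111)}{0.3042}\,sales_{t-1} && SSR = 187332 \quad n = 20
\end{aligned}
$$
The pooled regression, estimated in example 3.4, is the following:
$$
\widehat{sales}_t = \underset{(95.7)}{138.7} + \underset{(0.156)}{0.3288}\,advexp_t + \underset{(0.0915)}{0.7593}\,sales_{t-1} \qquad SSR = 2527215 \quad n = 53
$$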

The F statistic takes the value

$$
F = \frac{[SSR_P - (SSR_1 + SSR_2 + SSR_3 + SSR_4)] / (3 \times k)}{(SSR_1 + SSR_2 + SSR_3 + SSR_4) / (n - 4k)}
  = \frac{[2527215 - (36017 + 400605 + 201614 + 187332)] / 9}{(36017 + 400605 + 201614 + 187332) / (53 - 4 \times 3)} = 9.16
$$
For any level of significance, we reject that the model (5-39) is the same for the four periods
considered.

Exercises
Exercise 5.1 Answer the following questions for a model with explanatory dummy
variables:
a) What is the interpretation of the dummy coefficients?
b) Why does the model not include as many dummy variables as there are
categories?
Exercise 5.2 Using a sample of 560 families, the following estimations of demand for
rental are obtained:
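$$
\hat{q}_i = \underset{(0.11)}{4.17} - \underset{(0.017)}{0.247}\,p_i + \underset{(0.026)}{0.960}\,y_i \qquad R^2 = 0.371 \quad n = 560
$$
$$
\hat{q}_i = \underset{(0.13)}{5.27} - \underset{(0.030)}{0.221}\,p_i + \underset{(0.031)}{0.920}\,y_i + \underset{(0.120)}{0.341}\,d_i y_i \qquad R^2 = 0.380
$$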
where qi is the log of expenditure on rental housing of the ith family, pi is the logarithm of
rent per m2 in the living area of the ith family, yi is the log of household disposable income
of the ith family and di is a dummy variable that takes value one if the family lives in an
urban area and zero in a rural area.
(The numbers in parentheses are standard errors of the estimators.)
a) Test the hypothesis that the elasticity of expenditure on rental housing with
respect to income is 1, in the first fitted model.
b) Test whether the interaction between the dummy variable and income is
significant. Is there a significant difference in the housing expenditure
elasticity between urban and rural areas? Justify your answer.
Exercise 5.3 In a linear regression model with dummy variables, answer the following
questions:
a) Explain the meaning and interpretation of the coefficients of dummy
variables in models whose endogenous variable is in logs.
b) Explain how a model is affected when a dummy variable is introduced in
a multiplicative way with respect to a quantitative variable.
Exercise 5.4 In the context of a multiple linear regression model,
a) What is a dummy variable? Give an example of an econometric model
with dummy variables. Interpret the coefficients.
b) When is there perfect multicollinearity in a model with dummy variables?
Exercise 5.5 The following estimation is obtained using data for workers of a company:
$$
\widehat{wage}_i = 500 + 50\,tenure_i + 200\,college_i + 100\,male_i
$$
where wage is the wage in euros per month, tenure is the number of years in the company,
college is a dummy variable that takes value 1 if the worker has graduated from college
and 0 otherwise, and male is a dummy variable which takes value 1 if the worker is male
and 0 otherwise.
a) What is the predicted wage for a male worker with six years of tenure and
college education?
b) Assuming that all working women have college education and none of the
male workers do, write a hypothetical matrix of regressors (X) for six
observations. In this case, would you have any problem in the estimation
of this model? Explain it.
c) Formulate a new model that makes it possible to establish whether there
are wage differentials between workers with primary, secondary and college
education.
Exercise 5.6 Consider the following linear regression model:
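$$
y_i = \alpha + \beta x_i + \gamma_1 d_{1i} + \gamma_2 d_{2i} + u_i \tag{1}
$$
where y is the monthly salary of a teacher, x is the number of years of teaching experience,
and d1 and d2 are two dummy variables taking the following values:
$$
d_{1i} = \begin{cases} 1 & \text{if the teacher is male} \\ 0 & \text{otherwise} \end{cases}
\qquad
d_{2i} = \begin{cases} 1 & \text{if the teacher is white} \\ 0 & \text{otherwise} \end{cases}
$$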
a) What is the reference category in the model?
b) Interpret γ1 and γ2. What is the expected salary for each of the possible
categories?
c) To improve the explanatory power of the model, the following alternative
specification was considered:
$$
y_i = \alpha + \beta x_i + \gamma_1 d_{1i} + \gamma_2 d_{2i} + \gamma_3 (d_{1i} d_{2i}) + u_i \tag{2}
$$
d) What is the meaning of the term $(d_{1i} d_{2i})$? Interpret γ3.
e) What is the expected salary for each of the possible categories in model
(2)?
Exercise 5.7 Using a sample of 36 observations, the following results are obtained:
$$
\hat{y}_t = \underset{(0.12)}{1.10} - \underset{(0.34)}{0.96}\,x_{t1} - \underset{(3.35)}{4.56}\,x_{t2} + \underset{(0.07)}{0.34}\,x_{t3}
$$
$$
\sum_{t=1}^{n} (\hat{y}_t - \bar{y})^2 = 109.24 \qquad \sum_{t=1}^{n} \hat{u}_t^2 = 20.22
$$
(The numbers in parentheses are standard errors of the estimators.)
a) Test the individual significance of the coefficient associated with x2.
b) Calculate the coefficient of determination, R2, and explain its meaning.
c) Test the joint significance of the model.
d) Two additional regressions, with the same specification, were made for the
two categories A and B included in the sample (n1=21 and n2=15). In these
estimates the following RSS were obtained: 11.09 and 2.17, respectively.
Test whether the behavior of the endogenous variable is the same in the two
categories.
Exercise 5.8 To explain the time devoted to sport (sport), the following model was
formulated
$$
sport = \beta_1 + \delta_1\,female + \varphi_1\,smoker + \beta_2\,age + u \tag{1}
$$
where sport is the minutes spent on sports a day, on average; female and smoker are
dummy variables taking the value 1 if the person is a woman or smoker of at least five
cigarettes per day, respectively.
a) Interpret the meaning of δ1, φ1 and β2.
b) What is the expected time spent on sports activities for all possible
categories?
c) To improve the explanatory power of the model, the following alternative
specification was considered:
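$$
\begin{aligned}
sport = {} & \beta_1 + \delta_1\,female + \varphi_1\,smoker + \gamma_1\,female \times smoker \\
 & + \delta_2\,female \times age + \varphi_2\,smoker \times age + \beta_2\,age + u
\end{aligned}
\tag{2}
$$
In model (2), what is the meaning of γ1? What is the meaning of δ2 and
φ2?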
d) What are the possible marginal effects of sport with respect to age in the
model (2)? Describe them.
Exercise 5.9 Using information for Spanish regions in 1995 and 2000, several production
functions were estimated.
For the whole of the two periods, the following results were obtained:
$$
\widehat{\ln(q)} = 5.72 + 0.26 \ln(k) + 0.75 \ln(l) - 1.14 f + 0.11 f \times \ln(k) - 0.05 f \times \ln(l) \tag{1}
$$
$$
R^2 = 0.9594 \quad \bar{R}^2 = 0.9510 \quad RSS = 0.9380 \quad n = 34
$$
$$
\widehat{\ln(q)} = 3.91 + 0.45 \ln(k) + 0.60 \ln(l) \tag{2}
$$
$$
R^2 = 0.9567 \quad \bar{R}^2 = 0.9525 \quad RSS = 1.0007
$$
Moreover, the following models were estimated separately for each of the years:
$$
\text{1995:}\quad \widehat{\ln(q)} = 5.72 + 0.26 \ln(k) + 0.75 \ln(l) \tag{3}
$$
$$
R^2 = 0.9527 \quad \bar{R}^2 = 0.9459 \quad RSS = 0.6052
$$
$$
\text{2000:}\quad \widehat{\ln(q)} = 4.58 + 0.37 \ln(k) + 0.70 \ln(l) \tag{4}
$$
$$
R^2 = 0.9629 \quad \bar{R}^2 = 0.9555 \quad RSS = 0.3331
$$
where q is output, k is capital, l is labor and f is a dummy variable that takes the value 1
for 1995 data and 0 for 2000.
a) Test whether there is structural change between 1995 and 2000.
b) Compare the results of estimations (3) and (4) with estimation (1).
c) Test the overall significance of model (1).
Exercise 5.10 With a sample of 300 service sector firms, the following cost function was
estimated:
$$
\widehat{cost}_i = 0.847 + \underset{(0.025)}{0.899}\,qty_i \qquad RSS = 901.074 \quad n = 300
$$
where qtyi is the quantity produced.
The 300 firms are distributed in three big areas (100 in each one). The following
results were obtained:
$$
\begin{aligned}
\text{Area 1:}\quad & \widehat{cost}_i = 1.053 + \underset{(0.038)}{0.876}\,qty_i && \hat{\sigma}^2 = 0.457 \\
\text{Area 2:}\quad & \widehat{cost}_i = 3.279 + \underset{(0.096)}{0.835}\,qty_i && \hat{\sigma}^2 = 3.154 \\
\text{Area 3:}\quad & \widehat{cost}_i = 5.279 + \underset{(0.10)}{0.984}\,qty_i && \hat{\sigma}^2 = 4.255
\end{aligned}
$$
a) Calculate an unbiased estimation of $\sigma^2$ in the cost function for the
sample of 300 firms.
b) Is the same cost function valid for the three areas?
Exercise 5.11 To study spending on magazines (mag), the following models have been
formulated:
$$
\ln(mag) = \beta_1 + \beta_2 \ln(inc) + \beta_3\,age + \beta_4\,male + u \tag{1}
$$
$$
\ln(mag) = \beta_1 + \beta_2 \ln(inc) + \beta_3\,age + \beta_4\,male + \beta_5\,prim + \beta_6\,sec + u \tag{2}
$$
where inc is disposable income, age is age in years, male is a dummy variable that takes
the value 1 if the individual is male, and prim and sec are dummy variables that take the
value 1 when the individual has reached, at most, primary and secondary level respectively.
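With a sample of 100 observations, the following results have been obtained:
$$
\widehat{\ln(mag_i)} = \underset{(0.124)}{1.27} + \underset{(0.040)}{0.756} \ln(inc_i) + \underset{(0.001)}{0.031}\,age_i - \underset{(0.022)}{0.017}\,male_i
$$
$$
RSS = 1.1575 \quad R^2 = 0.9286
$$
$$
\widehat{\ln(mag_i)} = \underset{(0.020)}{1.26} + \underset{(0.007)}{0.811} \ln(inc_i) + \underset{(0.0002)}{0.030}\,age_i + \underset{(0.003)}{0.003}\,male_i - \underset{(0.004)}{0.250}\,prim_i + \underset{(0.005)}{0.108}\,sec_i
$$
$$
RSS = 0.0306 \quad R^2 = 0.9981
$$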
a) Is education a relevant factor to explain spending on magazines? What is
the reference category for education?
b) In the first model, is spending on magazines higher for men than for
women? Justify your answer.
c) Interpret the coefficient on the male variable in the second model. Is
spending on magazines higher for men than for women? Compare with the
result obtained in section a).


Exercise 5.12 Let fruit be a household's annual expenditure on fruit, expressed in euros,
and let r1, r2, r3, and r4 be dichotomous variables which reflect the
four regions of a country.
a) If you regress fruit only on r1, r2, r3, and r4 without an intercept, what is
the interpretation of the coefficients?
b) If you regress fruit only on r1, r2, r3, and r4 with an intercept, what would
happen? Why?
c) If you regress fruit only on r2, r3, and r4 without an intercept, what is the
interpretation of the coefficients?
d) If you regress fruit only on r1- r2, r2, r4-r3, and r4 without an intercept, what
is the interpretation of the coefficients?
Exercise 5.13 Consider the following model
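$$
wage = \beta_1 + \delta_1\,female + \beta_2\,educ + u
$$
Now, we are going to consider three possibilities of defining the female dummy
variable.
$$
1)\ female = \begin{cases} 1 & \text{for female} \\ 0 & \text{for male} \end{cases}
\qquad
2)\ female = \begin{cases} 2 & \text{for female} \\ 1 & \text{for male} \end{cases}
\qquad
3)\ female = \begin{cases} 2 & \text{for female} \\ 0 & \text{for male} \end{cases}
$$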
a) Interpret the dummy variable coefficient for each definition.
b) Is one dummy variable definition preferable to another? Justify the
answer.
Exercise 5.14 In the following regression model:
$$
wage = \beta_1 + \delta_1\,female + u
$$
where female is a dummy variable taking value 1 for a female and value 0 for a male.
Prove that applying the OLS formulas for simple regression you obtain that
$$
\hat{\beta}_1 = \overline{wage}_M
\qquad
\hat{\delta}_1 = \overline{wage}_F - \overline{wage}_M
$$
where F indicates female and M male.
In order to obtain a solution, consider that in the sample there are n1 females and
n2 males: the total sample is n= n1+n2.
Exercise 5.15 The data of this exercise were obtained from a controlled marketing
experiment in stores in Paris on coffee expenditure, as reported in A. C. Bemmaor and D.
Mouchoux, “Measuring the Short-Term Effect of In-Store Promotion and Retail
Advertising on Brand Sales: A Factorial Experiment”, Journal of Marketing Research, 28
(1991), 202–14. In this experiment, the following model has been formulated to explain
the quantity sold of coffee per week:
$$
\ln(coffqty) = \beta_1 + \delta_1\,advert + \beta_2 \ln(coffpric) + \delta_2\,advert \times \ln(coffpric) + u
$$
where coffpric takes three values: 1, for the usual price, 0.95 and 0.85; advert is a dummy
variable that takes value 1 if there is advertising in this week and 0 if there is not. The
experiment lasted for 18 weeks. The original model and three other models were
estimated, using file coffee2:

$$
1)\quad \widehat{\ln(coffqty_i)} = \underset{(0.04)}{5.85} + \underset{(0.099)}{0.2565}\,advert_i - \underset{(0.450)}{3.9760} \ln(coffpric_i) - \underset{(0.883)}{1.069}\,advert_i \times \ln(coffpric_i) \qquad R^2 = 0.9468 \quad n = 18
$$
$$
2)\quad \widehat{\ln(coffqty_i)} = \underset{(0.04)}{5.83} + \underset{(0.057)}{0.3559}\,advert_i - \underset{(0.393)}{4.2539} \ln(coffpric_i) \qquad R^2 = 0.9412 \quad n = 18
$$
$$
3)\quad \widehat{\ln(coffqty_i)} = \underset{(0.04)}{5.88} - \underset{(0.513)}{3.6939} \ln(coffpric_i) - \underset{(0.582)}{2.9575}\,advert_i \times \ln(coffpric_i) \qquad R^2 = 0.9214 \quad n = 18
$$
$$
4)\quad \widehat{\ln(coffqty_i)} = \underset{(0.07)}{5.89} - \underset{(0.674)}{5.1727} \ln(coffpric_i) \qquad R^2 = 0.7863 \quad n = 18
$$
a) In model (2), what is the interpretation of the coefficient on advert?
b) In model (3), what is the interpretation of the coefficient on
advert×ln(coffpric)?
c) In model (2), does the coefficient on advert have a significant positive
effect at 5% and at 1%?
d) Is model (4) valid for weeks with advertising and for weeks without
advertising?
e) In model (1), is the intercept the same for weeks with advertising and for
weeks without advertising?
f) In model (3), is the coffee demand/price elasticity different for weeks with
advertising and for weeks without advertising?
g) In model (4), is the coffee demand/price elasticity smaller than -4?
Exercise 5.16 (Continuation of exercise 4.39). Using file timuse03, the following models
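have been estimated:
$$
\widehat{houswork}_i = \underset{(23)}{132} + \underset{(1.497)}{2.787}\,educ_i + \underset{(0.308)}{1.847}\,age_i - \underset{(0.023)}{0.2337}\,paidwork_i \tag{1}
$$
$$
R^2 = 0.142 \quad n = 1000
$$
$$
\widehat{houswork}_i = \underset{(22.29)}{-3.02} + \underset{(1.356)}{3.641}\,educ_i + \underset{(0.279)}{1.775}\,age_i - \underset{(0.021)}{0.1568}\,paidwork_i + \underset{(2.16)}{32.11}\,female_i \tag{2}
$$
$$
R^2 = 0.298 \quad n = 1000
$$
$$
\begin{aligned}
\widehat{houswork}_i = {} & \underset{(35.18)}{-8.04} + \underset{(2.352)}{4.847}\,educ_i + \underset{(0.502)}{1.333}\,age_i - \underset{(0.032)}{0.0871}\,paidwork_i + \underset{(8.15)}{32.75}\,female_i \\
 & - \underset{(0.546)}{0.1650}\,educ_i \times female_i + \underset{(0.112)}{0.1019}\,age_i \times female_i - \underset{(0.009)}{0.02625}\,paidwork_i \times female_i
\end{aligned}
\tag{3}
$$
$$
R^2 = 0.306 \quad n = 1000
$$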
a) In model (1), is there a statistically significant tradeoff between time
devoted to paid work and time devoted to housework?
b) All other factors being equal and taking as a reference model (2), is there
evidence that women devote more time to housework than men?
c) Compare the R2 of models (1) and (2). What is your conclusion?

d) In model (3), what is the marginal effect of time devoted to housework
with respect to time devoted to paid work?
e) Is interaction between paidwork and gender significant?
f) Are the interactions between gender and the quantitative variables of the
model jointly significant?
Exercise 5.17 Using data from Bolsa de Madrid (Madrid Stock Exchange) on
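November 19, 2011 (file bolmad11), the following models have been estimated:
$$
\widehat{\ln(marktval_i)} = \underset{(0.243)}{1.784} + \underset{(0.179)}{0.6998}\,ibex35_i + \underset{(0.0369)}{0.6749} \ln(bookval_i) \tag{1}
$$
$$
RSS = 35.69 \quad R^2 = 0.8931 \quad n = 92
$$
$$
\widehat{\ln(marktval_i)} = \underset{(0.275)}{1.828} + \underset{(0.778)}{0.4236}\,ibex35_i + \underset{(0.0423)}{0.6678} \ln(bookval_i) + \underset{(0.088)}{0.0310}\,ibex35_i \times \ln(bookval_i) \tag{2}
$$
$$
RSS = 35.622 \quad R^2 = 0.8933 \quad n = 92
$$
$$
\begin{aligned}
\widehat{\ln(marktval_i)} = {} & \underset{(0.310)}{2.323} + \underset{(0.785)}{0.1987}\,ibex35_i + \underset{(0.0405)}{0.6688} \ln(bookval_i) \\
 & + \underset{(0.089)}{0.0369}\,ibex35_i \times \ln(bookval_i) - \underset{(0.236)}{0.6613}\,services_i - \underset{(0.221)}{0.6698}\,consump_i \\
 & - \underset{(0.263)}{0.1931}\,energy_i - \underset{(0.207)}{0.3895}\,industry_i - \underset{(0.324)}{0.7020}\,itt_i
\end{aligned}
\tag{3}
$$
$$
RSS = 30.781 \quad R^2 = 0.9078 \quad n = 92
$$
$$
\widehat{\ln(marktval_i)} = \underset{(0.234)}{1.366} + \underset{(0.0305)}{0.7658} \ln(bookval_i) \tag{4}
$$
$$
RSS = 41.625 \quad R^2 = 0.8753 \quad n = 92
$$
For finance=1:
$$
\widehat{\ln(marktval_i)} = \underset{(0.560)}{0.558} + \underset{(0.0702)}{0.9346} \ln(bookval_i) \tag{5}
$$
$$
RSS = 2.7241 \quad R^2 = 0.9415 \quad n = 13
$$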


where
- marktval is the capitalization value of a company.
- bookval is the book value of a company.
- ibex35 is a dummy variable that takes the value 1 if the corporation is included
in the selective Ibex 35.
- services, consumption, energy, industry and itc (information technology and
communication) are dummy variables. Each of them takes the value 1 if the
corporation is classified in this sector in Bolsa de Madrid. The category of
reference is finance.
a) In model (1), what is the interpretation of the coefficient on ibex35?
b) In model (1), is the marktval/bookval elasticity equal to 1?
c) In model (2), is the elasticity marktval/bookval the same for all
corporations included in the sample?
d) Is model (4) valid both for corporations included in ibex 35 and for
corporations excluded?
e) In model (3), what is the interpretation of the coefficient on consump?
f) Is the coefficient on consump significantly negative?
g) Is the introduction of dummy variables for different sectors statistically
justifiable?
h) Is the marktval/bookval elasticity for the financial sector equal to 1?
Exercise 5.18 (Continuation of exercise 4.37). Using file rdspain, the equations which
appear in the attached table have been estimated.
The following variables appear in the table:
- rdintens is expenditure on research and development (R&D) as a percentage of
sales,
- sales are measured in millions of euros,
- exponsal is exports as a percentage of sales;
- medtech and hightech are two dummy variables which reflect whether the firm
belongs to a medium or a high technology sector. The reference category
corresponds to the firms with low technology,
- workers is the number of workers.
                  (1)            (2)            (3)            (4)            (5)            (6)
                  rdintens       rdintens       rdintens       rdintens       rdintens       rdintens
                                                               for hightech=1 for medtech=1  for lowtech=1
exponsal          0.0136         0.0101         0.00968        0.00584        0.0116         0.00977
                  (0.00195)      (0.00193)      (0.00189)      (0.00792)      (0.00300)      (0.00169)
workers           0.000433       0.000392       0.000394       0.00196        0.0000563      0.000393
                  (0.0000740)    (0.0000725)    (0.000208)     (0.000338)     (0.0000815)    (0.000121)
hightech                         1.448          0.976
                                 (0.141)        (0.151)
medtech                          0.361          0.472
                                 (0.109)        (0.112)
hightech×workers                                0.00153
                                                (0.000271)
medtech×workers                                 -0.000326
                                                (0.000222)
intercept         0.394          0.137          0.143          1.211          0.577          0.142
                  (0.0598)       (0.0691)       (0.0722)       (0.313)        (0.103)        (0.0443)
n                 1983           1983           1983           296            616            1071
R2                0.0507         0.0986         0.138          0.113          0.0278         0.0459
RSS               9282.7         8815.0         8425.3         4409.0         2483.6         1527.5
F                 52.90          54.06          52.90          18.71          8.776          25.72
df_n              2              4              6              2              2              2
df_d              1980           1978           1976           293            613            1068
Standard errors in parentheses
a) In model (2), all other factors being equal, is there evidence that
expenditure on research and development (expressed as a percentage of
sales) in high technology firms is greater than in low technology firms?
How strong is the evidence?


b) In model (2), all other factors being equal, is there evidence that rdintens
in medium technology firms is equal to low technology firms? How strong
is the evidence?
c) Taking as reference model (2), if you had to test the hypothesis that
rdintens in high technology firms is equal to medium technology firms,
formulate a model that allows you to test this hypothesis without using
information on the covariance matrix of the estimators.
d) Is the influence of workers on rdintens associated with the level of
technology in the firms?
e) Is the model (1) valid for all firms regardless of their technological level?
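Exercise 5.19 To explain the overall satisfaction of people (stsfglo), the following models
were estimated using data from the file hdr2010:
$$
\widehat{stsfglo}_i = \underset{(0.584)}{-0.375} + \underset{(0.00000617)}{0.0000207}\,gnipc_i + \underset{(0.009)}{0.0858}\,lifexpec_i \tag{1}
$$
$$
R^2 = 0.642 \quad n = 144
$$
$$
\begin{aligned}
\widehat{stsfglo}_i = {} & \underset{(0.897)}{2.911} + \underset{(0.00000572)}{0.0000381}\,gnipc_i + \underset{(0.18)}{1.215}\,lifexpec_i \\
 & + \underset{(0.179)}{1.215}\,dlatam_i - \underset{(0.259)}{0.7901}\,dafrica_i
\end{aligned}
\tag{2}
$$
$$
R^2 = 0.748 \quad n = 144
$$
$$
\begin{aligned}
\widehat{stsfglo}_i = {} & \underset{(1.014)}{1.701} + \underset{(0.000006)}{0.0000327}\,gnipc_i + \underset{(0.0147)}{0.0527}\,lifexpec_i + \underset{(0.177)}{1.166}\,dlatam_i \\
 & - \underset{(1.712)}{3.096}\,dafrica_i + \underset{(0.0000456)}{0.0000673}\,gnipc_i \times dafrica_i - \underset{(0.0295)}{0.0699}\,lifexpec_i \times dafrica_i
\end{aligned}
\tag{3}
$$
$$
R^2 = 0.760 \quad n = 144
$$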
where
- gnipc is gross national income per capita expressed in PPP 2008 US dollar
terms,
- lifexpec is life expectancy at birth, i.e., number of years a newborn infant
could be expected to live,
- dafrica is a dummy variable that takes value 1 if the country is in Africa,
- dlatam is a dummy variable that takes value 1 if the country is in Latin
America.
a) In model (2), what is the interpretation of the coefficients on dlatam and
dafrica?
b) In model (2), do dlatam and dafrica individually have a significant positive
influence on global satisfaction?
c) In model (2), do dlatam and dafrica have a joint influence on global
satisfaction?
d) Is the influence of life expectancy on global satisfaction smaller in Africa
than in other regions of the world?
e) Is the influence of the variable gnipc greater in Africa than in other regions
of the world at 10%?
f) Are the interactions of people living in Africa and the variables gnipc and
lifexpec jointly significant?


Exercise 5.20 The equations which appear in the attached table have been estimated using
data from the file timuse03. This file contains 1000 observations corresponding to a
random subsample extracted from the time use survey for Spain carried out in 2002-2003.
The following variables appear in the table:
- educ is years of education attained,
- sleep, paidwork and unpaidwk are measured in minutes per day,
- female, workday (Monday to Friday), spaniard and houswife are dummy
variables.
a) In model (1), is there a statistically significant tradeoff between time
devoted to paid work and time devoted to sleep?
b) In model (1), is the coefficient on unpaidwk statistically significant?
c) In model (1), is there evidence that women sleep more than men?
d) In model (2), are workday and spaniard individually significant? Are they
jointly significant?
e) Is the coefficient on houswife statistically significant?
f) Are the interactions between female and educ, paidwork and unpaidwk
jointly significant?

                  (1)          (2)          (3)          (4)          (5)          (6)
                  sleep        sleep        sleep        sleep        sleep        sleep
educ              -4.669       -4.787       -4.805       -4.754       -4.782       -4.792
                  (0.916)      (0.912)      (0.912)      (0.913)      (0.917)      (0.917)
persinc           0.0238       0.0207       0.0195       0.0210       0.0208       0.0208
                  (0.00587)    (0.00600)    (0.00607)    (0.00601)    (0.00601)    (0.00601)
age               0.854        0.879        0.895        0.884        0.879        0.891
                  (0.174)      (0.174)      (0.174)      (0.174)      (0.174)      (0.302)
paidwork          -0.258       -0.247       -0.246       -0.248       -0.246       -0.247
                  (0.0150)     (0.0159)     (0.0159)     (0.0160)     (0.0210)     (0.0159)
unpaidwk          -0.205       -0.198       -0.188       -0.224       -0.198       -0.198
                  (0.0184)     (0.0184)     (0.0196)     (0.0365)     (0.0185)     (0.0184)
female            4.161        3.588        3.981        2.485        3.638        3.727
                  (1.465)      (1.467)      (1.493)      (1.975)      (1.691)      (3.287)
workday                        -19.31       -19.46       -19.47       -19.30       -19.30
                               (7.168)      (7.165)      (7.171)      (7.173)      (7.172)
spaniard                       -47.50       -46.88       -47.90       -47.63       -47.51
                               (19.99)      (19.98)      (20.00)      (20.10)      (20.00)
houswife                                    -14.71
                                            (10.42)
unpaidwk×female                                          0.00607
                                                         (0.00726)
paidwork×female                                                       -0.000324
                                                                      (0.00540)
age×female                                                                         -0.00308
                                                                                   (0.0652)
intercept         588.9        648.3        646.6        651.9        648.2        647.8
                  (13.62)      (24.34)      (24.36)      (24.73)      (24.39)      (26.40)
N                 1000         1000         1000         1000         1000         1000
R2                0.316        0.325        0.326        0.325        0.325        0.325
RSS               9913901.3    9789312.3    9769648.2    9782424.0    9789276.9    9789290.3
F                 76.58        59.62        53.27        53.06        52.95        52.95
df_n              6            8            9            9            9            9
df_d              993          991          990          990          990          990
Standard errors in parentheses

Exercise 5.21 To study infant mortality in the world, the following models have been
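estimated using data from the file hdr2010:
$$
\widehat{deathinf}_i = \underset{(4.58)}{93.02} - \underset{(0.0002)}{0.00037}\,gnipc_i - \underset{(0.1866)}{0.6046}\,physicn_i - \underset{(0.003)}{0.003}\,contrcep_i \tag{1}
$$
$$
RSS = 40285 \quad R^2 = 0.6598 \quad n = 108
$$
$$
\widehat{deathinf}_i = \underset{(5.96)}{78.55} - \underset{(0.0002)}{0.00042}\,gnipc_i - \underset{(0.1879)}{0.3809}\,physicn_i - \underset{(0.1042)}{0.6989}\,contrcep_i + \underset{(5.05)}{17.92}\,dafrica_i \tag{2}
$$
$$
RSS = 35893 \quad R^2 = 0.6851 \quad n = 108
$$
$$
\begin{aligned}
\widehat{deathinf}_i = {} & \underset{(6.76)}{72.58} - \underset{(0.0002)}{0.00044}\,gnipc_i - \underset{(0.1879)}{0.3994}\,physicn_i - \underset{(0.1234)}{0.5857}\,contrcep_i \\
 & + \underset{(5.05)}{17.92}\,dafrica_i - \underset{(0.000826)}{0.0000914}\,gnipc_i \times dafrica_i - \underset{(2.2351)}{2.0013}\,physicn_i \times dafrica_i \\
 & - \underset{(0.2716)}{0.2172}\,contrcep_i \times dafrica_i
\end{aligned}
\tag{3}
$$
$$
RSS = 34309 \quad R^2 = 0.7109 \quad n = 108
$$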


where
- deathinf is the number of infant deaths (one year or younger) per 1000 live births
in 2008,
- gnipc is gross national income per capita expressed in PPP 2008 US dollar
terms,
- physicn are physicians per 10,000 people in the period 2000-2009,
- contrcep is the contraceptive prevalence rate using any method, expressed as %
of married women aged 15–49 for the period 1990-2008,
- dafrica is a dummy variable that takes value 1 if the country is in Africa.
a) In model (1), what is the interpretation of the coefficients on gnipc, physicn
and contrcep?
b) In model (2), what is the interpretation of the coefficient on dafrica?
c) In model (2), all other factors being equal, do the countries of Africa have
a greater infant mortality than the countries of other regions of the world?
d) What is the marginal effect of variable gnipc on infant mortality in model
(3)?
e) Is the slope corresponding to the regressor contrcep significantly greater
for the countries of Africa?
f) Are the slopes corresponding to the regressors gnipc, physicn and contrcep
jointly different for the countries of Africa?
g) Is the model (1) valid for all countries of the world?
Exercise 5.22 Using a random subsample of 2000 observations extracted from the time
use surveys for Spain carried out in the periods 2002-2003 and 2009-2010 (file timus309),
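the following models have been estimated to explain time spent watching television:
$$
\widehat{watchtv} = \underset{(9.46)}{114} - \underset{(0.620)}{3.523}\,educ + \underset{(0.130)}{1.330}\,age - \underset{(0.0102)}{0.1111}\,paidwork \tag{1}
$$
$$
R^2 = 0.169 \quad n = 2000
$$
$$
\widehat{watchtv} = \underset{(9.915)}{127} - \underset{(0.615)}{3.653}\,educ + \underset{(0.129)}{1.291}\,age - \underset{(0.010)}{0.120}\,paidwork - \underset{(4.903)}{25.146}\,female + \underset{(5.247)}{17.137}\,y2009 \tag{2}
$$
$$
R^2 = 0.184 \quad n = 2000
$$
$$
\begin{aligned}
\widehat{watchtv} = {} & \underset{(10.01)}{123} - \underset{(0.615)}{3.583}\,educ + \underset{(0.129)}{1.302}\,age - \underset{(0.012)}{0.105}\,paidwork - \underset{(4.899)}{24.869}\,female \\
 & + \underset{(6.115)}{24.536}\,y2009 - \underset{(0.021)}{0.050}\,y2009 \times paidwork
\end{aligned}
\tag{3}
$$
$$
R^2 = 0.186 \quad n = 2000
$$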

where
- educ is years of education attained,
- watchtv and paidwork are measured in minutes per day.
- female is a dummy variable that takes value 1 if the interviewee is a female
- y2009 is a dummy variable that takes value 1 if the survey was carried out in
2008-2009
a) In model (1), what is the interpretation of the coefficient on educ?
b) In model (1), is there a statistically significant tradeoff between time
devoted to work and time devoted to watching television?
c) All other factors being equal and taking as reference model (2), is there
evidence that men watch television more than women? How strong is the
evidence?
d) In model (2), what is the estimated difference in watching television
between females surveyed in 2008-2009 and males surveyed in 2002-2003?
Is this difference statistically significant?
e) In model (3), what is the marginal effect of time devoted to paid work on
time devoted to watching television?
f) Is there a significant interaction between the year of the survey and time
devoted to paid work?
Exercise 5.23 Using the file consumsp, the following models were estimated to analyze
if the entry of Spain into the European community in 1986 had any impact on the behavior
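of Spanish consumers:
$$
\widehat{conspc}_t = \underset{(84.88)}{-7.156} + \underset{(0.0857)}{0.3965}\,incpc_t + \underset{(0.0903)}{0.5771}\,conspc_{t-1} \tag{1}
$$
$$
R^2 = 0.9967 \quad RSS = 1891320 \quad n = 56
$$
$$
\widehat{conspc}_t = \underset{(108)}{-102.4} + \underset{(0.0879)}{0.3573}\,incpc_t + \underset{(0.0901)}{0.5992}\,conspc_{t-1} + \underset{(92.56)}{148.60}\,y1986_t \tag{2}
$$
$$
R^2 = 0.9968 \quad RSS = 1802007 \quad n = 56
$$
$$
\begin{aligned}
\widehat{conspc}_t = {} & \underset{(114)}{79.17} + \underset{(0.1100)}{0.5181}\,incpc_t + \underset{(0.1199)}{0.4186}\,conspc_{t-1} + \underset{(456.3)}{819.82}\,y1986_t \\
 & - \underset{(0.2338)}{0.5403}\,incpc_t \times y1986_t + \underset{(0.2182)}{0.5424}\,conspc_{t-1} \times y1986_t
\end{aligned}
\tag{3}
$$
$$
R^2 = 0.9972 \quad RSS = 1600714 \quad n = 56
$$
$$
\begin{aligned}
\widehat{conspc}_t = {} & \underset{(118)}{117.03} + \underset{(0.0968)}{0.3697}\,incpc_t + \underset{(0.1051)}{0.5823}\,conspc_{t-1} + \underset{(348)}{41.62}\,y1986_t \\
 & + \underset{(0.0326)}{0.0104}\,incpc_t \times y1986_t
\end{aligned}
\tag{4}
$$
$$
R^2 = 0.9968 \quad RSS = 1798423 \quad n = 56
$$
$$
\widehat{conspc}_t = \underset{(114)}{120.1} + \underset{(0.0854)}{0.3750}\,incpc_t + \underset{(0.0890)}{0.5758}\,conspc_{t-1} + \underset{(0.0087)}{0.0141}\,incpc_t \times y1986_t \tag{5}
$$
$$
R^2 = 0.9968 \quad RSS = 1798927 \quad n = 56
$$
(The numbers in parentheses are standard errors of the estimators.)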
a) Test in model (5) whether the marginal propensity to consume in the short
term was reduced in 1986 and beyond.
b) Are the interactions between y1986 and the quantitative variables of the
model jointly significant?
c) Test whether there was a structural change in the consumption function in
1986.
d) Test whether the coefficient on $conspc_{t-1}$ changed in 1986 and beyond.
e) Was there a gap between consumption before 1986, with respect to 1986
and beyond?

6 RELAXING THE ASSUMPTIONS IN THE LINEAR
CLASSICAL MODEL

6.1 Relaxing the assumptions in the linear classical model: an overview


In chapters 2 and 3, single and multiple linear regression models were formulated,
including the set of statistical assumptions called the classical linear model (CLM)
assumptions. Now, let us examine the problems posed by the failure of each one of the
CLM assumptions and alternative methods for estimating the linear model.

Assumption on the functional form


Assumption 1 postulates the following population model:
$$
y = \beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u \tag{6-1}
$$
This assumption specifies what the endogenous variable is and its functional form,
as well as what the explanatory variables are and their functional forms. It also states that
the model is linear in the parameters.
If we estimate a different population model, a misspecification error is made. The
consequences of such errors will be discussed in section 6.2.

Assumptions on the regressors


Assumptions 2, 3 and 4 were made on the regressors. In the multiple linear
regression, assumption 2 postulated that the values $x_2, x_3, \ldots, x_k$ are fixed in repeated
samples, that is to say, the regressors are non-stochastic. This is a reasonable assumption
when the regressors are obtained from experiments. However, it is less admissible for
variables obtained by observation in a passive way, as in the case of income in the
consumption function.
When the regressors are stochastic, the statistical relationship between the
regressors and the random disturbance is crucial in building an econometric model. For
this reason, an alternative assumption was formulated as 2*: the regressors $x_2, x_3, \ldots, x_k$
are distributed independently of the random disturbance. Under this alternative
assumption, the inference, conditional on the matrix of regressors, leads to results that are
virtually coincident with the case where the matrix X is fixed. In other words, in the case
of independence between the regressors and the random disturbance, the ordinary least
squares method is still the optimal method for estimating the vector of coefficients.

In assumption 3 it was postulated that the matrix of regressors X contains no
measurement errors. If there are measurement errors, a very serious econometric problem
arises, and its solution is complex.
Assumption 4 states that there is no exact linear relationship between the
regressors, or, in other words, it establishes that there is no perfect multicollinearity in the
model. This assumption is necessary to calculate the OLS estimators. Perfect
multicollinearity rarely occurs in practice. Instead, there is often an approximately linear
relationship between the regressors. In this case the estimators obtained will be
imprecise, although they still retain the property of being BLUE. In other words,
the relationship between the regressors makes it difficult to quantify the effect that each
one has on the regressand. This is due to the fact that the variances of the estimators are
high. When an approximately linear relationship between the regressors exists,
multicollinearity is not perfect. Section 6.3 will be devoted to examining the detection of
non-perfect multicollinearity, along with some possible solutions.
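The effect of an approximately linear relationship on the variances of the estimators can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the data are artificial, the sample size and the correlation of 0.99 are arbitrary choices, and the function name variance_factors is hypothetical. It compares the diagonal of (X'X)^{-1}, which is proportional to the variances of the OLS estimators, when two regressors are nearly unrelated and when they are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.standard_normal(n)       # first regressor (simulated)
noise = rng.standard_normal(n)    # independent component of the second regressor


def variance_factors(rho):
    """Diagonal of (X'X)^{-1} when x3 has correlation of about rho with x2.

    Under the CLM assumptions, Var(beta_hat) = sigma^2 (X'X)^{-1}, so these
    diagonal elements are the estimator variances up to the factor sigma^2.
    """
    x3 = rho * x2 + np.sqrt(1.0 - rho**2) * noise
    X = np.column_stack([np.ones(n), x2, x3])
    return np.diag(np.linalg.inv(X.T @ X))


low = variance_factors(0.0)     # regressors approximately unrelated
high = variance_factors(0.99)   # regressors approximately collinear
ratio = high[1] / low[1]        # inflation of the variance of the coefficient on x2
print(ratio)
```

The ratio is much larger than one: the estimators remain unbiased, but each coefficient is measured far less precisely when the regressors are nearly collinear.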

Assumptions on the parameters


In assumption 5 it was assumed that the parameters are not random. The real world
suggests that this coefficient constancy is not reasonable. In models using time series data,
there are often changes in patterns of behavior over time, which would naturally involve
changes in the regression coefficients. In any case, section 5.6 examines the test of
structural change which determines whether there has been any change in the parameters
over time.

Assumptions on the random disturbance term


In assumption 6 it is assumed that E(u)=0. This assumption is not empirically
testable in the general case of models with intercept.
Before moving on to other assumptions on the random disturbance ui, it should be
noted that this is an unobservable variable. Information on ui is obtained indirectly
through the residuals, which will be used for testing the behavior of the disturbances.
However, the use of residuals to perform tests on disturbances poses some problems.
When the CLM assumptions are fulfilled, the random disturbances are neither
autocorrelated nor heteroskedastic, whereas the residuals are heteroskedastic and
autocorrelated under these assumptions. These circumstances are important in the design
of statistical tests on heteroskedasticity and autocorrelation.
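The contrast between disturbances and residuals can be illustrated numerically. The following is a minimal sketch, assuming a small made-up design matrix (intercept plus one regressor) and a scalar disturbance covariance matrix σ²I. It uses the standard result that the residuals equal M u with M = I − X(X′X)⁻¹X′, so that their covariance matrix is σ²M.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical design matrix: intercept plus one regressor.
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Residual-maker matrix M = I - X (X'X)^{-1} X'; the residuals satisfy u_hat = M u.
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

sigma2 = 1.0
cov_resid = sigma2 * M  # covariance matrix of the residuals when E(uu') = sigma2 * I

resid_variances = np.diag(cov_resid)                 # differ across observations
off_diagonal = cov_resid - np.diag(resid_variances)  # non-zero covariances
print(resid_variances)
print(np.abs(off_diagonal).max())
```

The diagonal of σ²M is not constant and its off-diagonal elements are not zero, which is why tests built on residuals have to take this structure into account.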
If assumptions 7 of homoskedasticity and/or 8 of no autocorrelation are not
fulfilled, the least squares estimators are still linear and unbiased but they are not the best.
The assumptions of homoskedasticity and no autocorrelation, formulated in
chapter 3, can be stated jointly by indicating that the covariance matrix
of random disturbances is a scalar matrix, i.e.:

$$E(\mathbf{u}\mathbf{u}') = \sigma^2 \mathbf{I} \tag{6-2}$$
When one or both assumptions indicated are not fulfilled, then the covariance
matrix will be less restrictive. Thus, we will consider the following covariance matrix of
the disturbances:


$$E(\mathbf{u}\mathbf{u}') = \sigma^2 \mathbf{\Omega} \tag{6-3}$$
where the only restriction imposed on Ω is that it is a positive definite matrix.
When the covariance matrix is a non-scalar matrix such as (6-3), then one can
obtain linear, unbiased and best estimators by applying the method of generalized least
squares (GLS). The expression of these estimators is as follows:
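$$\hat{\boldsymbol{\beta}} = \left[ \mathbf{X}'\mathbf{\Omega}^{-1}\mathbf{X} \right]^{-1} \mathbf{X}'\mathbf{\Omega}^{-1}\mathbf{y} \tag{6-4}$$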

In practice, formula (6-4) is not directly applied. Instead a two-step process that
leads to exactly the same results is applied.
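As an illustration of why the two routes coincide, here is a minimal sketch. It assumes that the two-step process consists of transforming the model with a matrix P such that P′P = Ω⁻¹ and then applying OLS to the transformed data. The data and the diagonal form of Ω are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(1.0, 5.0, size=n)])

# Hypothetical heteroskedastic case: Omega is diagonal with entries x_i^2.
omega_diag = X[:, 1] ** 2
Omega = np.diag(omega_diag)
u = rng.normal(size=n) * np.sqrt(omega_diag)
y = X @ np.array([1.0, 2.0]) + u

# Direct application of (6-4).
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Two-step route: transform with P such that P'P = Omega^{-1}, then run OLS.
P = np.diag(1.0 / np.sqrt(omega_diag))
X_star, y_star = P @ X, P @ y
beta_two_step = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

print(beta_gls, beta_two_step)
```

Both vectors agree up to rounding error, because OLS on the transformed data minimizes the same weighted sum of squares that (6-4) solves.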
In section 6.5, we will examine the tests to determine whether there is
heteroskedasticity, as well as the particularization of the GLS method in this case. Section
6.6 will present testing methods and the appropriate treatment of autocorrelation.
Assumption 9 of normality postulated in the CLM allows us to make statistical
inferences with known distributions. If the normality assumption is not adequate, then the
tests will only be approximately valid. In section 6.4, a normality test of the disturbances
is used to determine whether this assumption is acceptable.

6.2 Misspecification
Misspecification occurs when we estimate a different model from the population
model. The problem in social sciences, and in particular in economics, is that we do not
usually know the population model.
Bearing in mind this observation, we shall consider three types of misspecification:
- Inclusion of irrelevant variables.
- Exclusion of relevant variables.
- Incorrect functional form.

6.2.1 Consequences of misspecification


We will examine the consequences of each type of misspecification on the OLS
estimators.

Inclusion of an irrelevant variable


Let us consider, for example, that the population model is the following:
$$y = \beta_1 + \beta_2 x_2 + u \tag{6-5}$$
Consequently, the population regression function (PRF) is given by
$$\mu_y = \beta_1 + \beta_2 x_2 \tag{6-6}$$
Now let us suppose that the sample regression function (SRF) estimated is the
following:
$$\tilde{y}_i = \tilde{\beta}_1 + \tilde{\beta}_2 x_{2i} + \tilde{\beta}_3 x_{3i} \tag{6-7}$$


This is the case of inclusion of an irrelevant variable: specifically, in (6-7) we
have introduced the irrelevant variable x3. What are the effects of including an irrelevant
variable on the OLS estimators?
It can be shown that the estimators corresponding to (6-7) are unbiased, that is to
say,
$$E(\tilde{\beta}_1) = \beta_1 \qquad E(\tilde{\beta}_2) = \beta_2 \qquad E(\tilde{\beta}_3) = 0$$
However, the variances of these estimators will be greater than those obtained by
estimating (6-5) in which x3 is (correctly) omitted.
This result can be extended to the case of including one or more irrelevant
variables. In this case OLS estimators are unbiased, but with variances greater than when
the irrelevant variables are not included in the estimated model.
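A small Monte Carlo experiment can make this result concrete. The sketch below is only an illustration under made-up values (sample size, coefficients and the correlation between x2 and x3 are all assumptions). It estimates the slope of x2 with and without the irrelevant regressor x3 and compares the mean and the variance of the two estimators across replications.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2000
beta1, beta2 = 1.0, 2.0

# Regressors are held fixed across replications; x3 is correlated with x2
# but does not enter the population model.
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + 0.6 * rng.normal(size=n)
X_correct = np.column_stack([np.ones(n), x2])
X_overfit = np.column_stack([np.ones(n), x2, x3])

b2_correct = np.empty(reps)
b2_overfit = np.empty(reps)
for r in range(reps):
    y = beta1 + beta2 * x2 + rng.normal(size=n)
    b2_correct[r] = np.linalg.lstsq(X_correct, y, rcond=None)[0][1]
    b2_overfit[r] = np.linalg.lstsq(X_overfit, y, rcond=None)[0][1]

print(b2_correct.mean(), b2_overfit.mean())  # both close to beta2
print(b2_correct.var(), b2_overfit.var())    # larger when x3 is included
```

Both averages stay close to the true slope, while the dispersion of the estimator is clearly larger in the model that includes x3.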

Exclusion of a relevant variable


Let us consider, for example, that the population model is the following:
$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i \tag{6-8}$$
The PRF is therefore given by:
$$\mu_y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 \tag{6-9}$$
Now let us suppose that the SRF we estimate, due to ignorance or data
unavailability, is the following:
$$\tilde{y}_i = \tilde{\beta}_1 + \tilde{\beta}_2 x_{2i} \tag{6-10}$$
This is a case of exclusion of a relevant variable: in (6-10) we have omitted the
relevant variable x3. Is $\tilde{\beta}_2$, obtained by applying OLS in (6-10), an unbiased estimator of
$\beta_2$?

As appendix 6.1 shows, the estimator $\tilde{\beta}_2$ is biased. The bias is
$$\text{Bias}(\tilde{\beta}_2) = \beta_3 \frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)\, x_{3i}}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2} \tag{6-11}$$

The bias is null if, according to (6-11), the covariance between x2 and x3 is 0. It is
important to remark that the ratio
$$\frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)\, x_{3i}}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}$$
is just the OLS slope coefficient ($\hat{\delta}_2$) from the regression of x3 on x2. That is to say,


$$\hat{x}_3 = \hat{\delta}_1 + \hat{\delta}_2 x_2 = \hat{\delta}_1 + \frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)\, x_{3i}}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}\, x_2 \tag{6-12}$$

Thus, according to (6-72) in appendix 6.1 and (6-12), we can write that
$$E(\tilde{\beta}_2) = \beta_2 + \beta_3 \hat{\delta}_2 \tag{6-13}$$
Therefore, the bias is equal to $\beta_3 \hat{\delta}_2$. Table 6.1 summarizes the sign of
the bias in $\tilde{\beta}_2$ when x3 is omitted in the estimating equation. It must be taken into account that the
sign of $\hat{\delta}_2$ is the same as the sign of the sample correlation between x2 and x3.

TABLE 6.1. Summary of bias in $\tilde{\beta}_2$ when x3 is omitted in the estimating equation.
          Corr(x2,x3)>0    Corr(x2,x3)<0
β3>0      Positive bias    Negative bias
β3<0      Negative bias    Positive bias
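The structure of the bias can be checked on simulated data. The following sketch uses made-up data with β3 > 0 and a positive correlation between x2 and x3 (the upper-left cell of the table on the sign of the bias). It verifies the sample counterpart of (6-13): the slope of the short regression equals the slope of the long regression plus the estimated coefficient on x3 times the slope from regressing x3 on x2. This in-sample identity is a standard algebraic property of OLS and is used here only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)                   # x3 positively correlated with x2
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(size=n)   # beta3 > 0

def ols(X, y):
    """Return OLS coefficients for design matrix X (intercept included in X)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
b_long = ols(np.column_stack([const, x2, x3]), y)   # includes x3
b_short = ols(np.column_stack([const, x2]), y)      # omits x3
delta = ols(np.column_stack([const, x2]), x3)       # regression of x3 on x2

# In-sample identity: short slope = long slope + (coefficient on x3) * delta_2.
implied_short_slope = b_long[1] + b_long[2] * delta[1]
print(b_short[1], implied_short_slope)
```

With these signs the short-regression slope lies above the long-regression slope, in line with a positive bias.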

Incorrect functional form


If we use a functional form different from the true population model, then the OLS
estimators will be biased.
In conclusion, if relevant variables are excluded and/or an incorrect
functional form is used, then the OLS estimators will be biased and also
inconsistent. Therefore, the conventional inference procedures will be invalidated in these
two cases.

6.2.2 Specification tests: the RESET test


To test whether irrelevant variables are included in the model we can apply the
exclusion restriction tests, which we have examined in chapter 4.
To test the exclusion of relevant variables or the use of an incorrect functional
form, we can apply the RESET (Regression Equation Specification Error Test) test. This
test is a general test for specification errors proposed by Ramsey (1969). In order to
explain it, consider that the initial model is the following:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u \tag{6-14}$$
Now, we introduce an augmented model in which two new variables (z1 and z2)
appear:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \alpha_1 z_1 + \alpha_2 z_2 + u \tag{6-15}$$
Taking into account the specification of the two models, the null and alternative
hypotheses will be the following:
$$\begin{aligned} H_0 &: \alpha_1 = \alpha_2 = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned} \tag{6-16}$$


The crucial question in building the test is to determine the z variables or
regressors to be introduced. In the case of exclusion of relevant variables, the z variables
will be the omitted regressors which may be new variables or also squares and powers of
previous variables. The test to be applied would be similar to the exclusion tests, but with
the roles reversed: the restricted model is now the initial model, while the unrestricted
model corresponds to the augmented model.
In testing for incorrect functional form, consider, for example, that (6-14) is
specified instead of the true relationship:
$$\ln(y) = \beta_1 + \beta_2 \ln(x_2) + \beta_3 \ln(x_3) + u \tag{6-17}$$
In model (6-17), there is a multiplicative relationship between the regressors.
Ramsey took into account that a Taylor series approximation of the multiplicative
relationship would yield an expression involving powers and cross-products of the
explanatory variables. For this reason, he suggests including, in the augmented model,
powers of the predicted values of the dependent variable (which are, of course, linear
combinations of power and cross-product terms of the explanatory variables):
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \alpha_1 \hat{y}^2 + \alpha_2 \hat{y}^3 + u \tag{6-18}$$
where the $\hat{y}$'s are the OLS fitted values corresponding to the model (6-14). The
superscripts indicate the powers to which these predictions are raised. The first power is
not included since it is perfectly collinear with the rest of the regressors of the initial
model.
The steps involved in the RESET test are as follows:

Step 1. The initial model is estimated and the fitted values, $\hat{y}_i$, are calculated.

Step 2. The augmented model, which can include one or more powers of $\hat{y}_i$, is
estimated.

Step 3. Taking the $R^2_{init}$ corresponding to the initial model and the $R^2_{augm}$
corresponding to the augmented model, the F statistic is calculated:
$$F = \frac{(R^2_{augm} - R^2_{init}) / r}{(1 - R^2_{augm}) / (n - h)} \tag{6-19}$$
where r is the number of new parameters added to the initial model, and h
is the number of parameters of the augmented model, including the
intercept.
Under the null hypothesis, this statistic is distributed as follows:
$$F \mid H_0 \sim F_{r,\,n-h} \tag{6-20}$$

Step 4. For a significance level α, and designating by $F^{\alpha}_{r,\,n-h}$ the corresponding value
in the F table, the decision to make is the following:
If $F \ge F^{\alpha}_{r,\,n-h}$, reject H0
If $F < F^{\alpha}_{r,\,n-h}$, do not reject H0
Therefore, high values of the statistic lead to the rejection of the initial model.
In the RESET test we test the null hypothesis against an alternative hypothesis that
does not indicate what the correct specification should be. This test is therefore a
misspecification test which may indicate that there is some form of misspecification but
does not give any indication of what the correct specification should be.
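The steps of the RESET test can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the procedure of any particular package: the function names `r_squared` and `reset_f` are hypothetical, the data are simulated with a quadratic term that the initial linear model omits, and the comparison with the tabulated F value (step 4) is left to the reader.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS regression of y on X (X includes the intercept)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def reset_f(X, y, powers=(2, 3)):
    """F statistic of (6-19) for the RESET test with the given powers of y_hat."""
    n = len(y)
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]              # step 1
    X_augm = np.column_stack([X] + [y_hat ** p for p in powers])  # step 2
    r = len(powers)       # number of parameters added to the initial model
    h = X_augm.shape[1]   # number of parameters in the augmented model
    r2_init, r2_augm = r_squared(X, y), r_squared(X_augm, y)
    return ((r2_augm - r2_init) / r) / ((1.0 - r2_augm) / (n - h))  # step 3

# Hypothetical data in which the true relationship is quadratic in x2.
rng = np.random.default_rng(4)
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x2 + x3 + 2.0 * x2 ** 2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x2, x3])
print(reset_f(X, y))
```

Because the linear model leaves out the squared term, the statistic is large here, so the initial specification would be rejected at the usual significance levels.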
EXAMPLE 6.1 Misspecification in a model for determination of wages
Using a subsample of data from the wage structure survey of Spain for 2006 (file wage06sp), the
following model is estimated:
$$\widehat{wage}_i = \underset{(1.55)}{4.679} + \underset{(0.146)}{0.681}\, educ_i + \underset{(0.071)}{0.293}\, tenure_i$$
R2=0.249 n=150
where educ (education) and tenure (experience in the firm) are measured in years and wage in euros per
hour.
Considering that we may have a problem of incorrect functional form, an augmented model is
estimated. In this augmented model - besides educ, tenure, and the intercept - $\widehat{wage}_i^{\,2}$ and $\widehat{wage}_i^{\,3}$ from the
initial model are included as regressors. The F statistic calculated using the $R^2_{init}$ and $R^2_{augm}$, according
to (6-19), is equal to 4.18. Given that $F^{0.05}_{2,145} \simeq F^{0.05}_{2,60} = 3.15$, we reject that, for the levels α=0.05 and α=0.10,
the linear form is adequate to explain wage determination. On the contrary, given that $F^{0.01}_{2,145} \simeq F^{0.01}_{2,60} = 4.98$,
H0 is not rejected for α=0.01.

6.3 Multicollinearity

6.3.1 Introduction
Perfect multicollinearity is not usually seen in practice, unless the model is
wrongly designed as we saw in chapter 5. Instead, an approximately linear relationship
between the regressors often exists. In this case, the estimators obtained will generally
not be very accurate, despite still being BLUE. In other words, the relationship between
regressors makes it difficult to quantify accurately the effect each one has on the
regressand. This is due to the fact that the variances of the estimators are high. When there
is an approximately linear relationship between the regressors, it is said that there is
non-perfect multicollinearity. The multicollinearity problem arises because there is
insufficient information to get an accurate estimation of model parameters.
To analyze the problem of multicollinearity, we will examine the variance of an
estimator. In the multiple linear regression model, the estimator of the variance of any
slope coefficient - for example, $\hat{\beta}_j$ - is equal, as we saw in (3-68), to
$$\widehat{var}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{n S_j^2 (1 - R_j^2)} \tag{6-21}$$
where $\hat{\sigma}^2$ is the unbiased estimator of σ2, n is the sample size, $S_j^2$ is the sample variance
of the regressor xj, and $R_j^2$ is the R-squared obtained from regressing xj on all other x’s.


The last of these four factors which determines the value of the variance of $\hat{\beta}_j$,
$(1 - R_j^2)$, is precisely an indicator of multicollinearity. Multicollinearity arises in
estimating βj when $R_j^2$ is “close” to one, but there is no absolute number that we can
quote to conclude that multicollinearity is really a problem for the precision of the
estimators. Although the problem of multicollinearity cannot be clearly defined, it is true
that, for estimating βj, the lower the correlation between xj and the other independent
variables the better. If $R_j^2$ is equal to 1, then we would have perfect multicollinearity and
it is not possible to obtain the estimators of the coefficients. In any case, when one or
more $R_j^2$ are close to 1, multicollinearity is a serious problem. In this case, when making
inferences with the model, the following problems arise:
a) The variances of the estimators are very large.
b) The estimated coefficients will be very sensitive to small changes in the
data.

6.3.2 Detection
Multicollinearity is a problem of the sample, because it is associated with the
specific configuration of the sample of the x’s. For this reason, there are no statistical
tests. (Remember that statistical tests only work with population parameters). Instead,
many practical rules were developed attempting to determine to what extent
multicollinearity seriously affects the inference made with a model. These rules are not
always reliable, and in some cases are questionable. In any case, we are going to look at
some measures that are very useful to detect the degree of multicollinearity: the variance
inflation factor (VIF) and the tolerance, and the condition number and the coefficient
variance decomposition.

Variance inflation factor (VIF) and tolerance


In order to explain the meaning of these measures, let us suppose there is no linear
relationship between xj and the other explanatory variables in the model, that is to say,
the regressor xj is orthogonal to the remaining regressors. In this case, $R_j^2$ will be zero
and the variance of $\hat{\beta}_j$ will be
$$\widehat{var}(\hat{\beta}_j^{*}) = \frac{\hat{\sigma}^2}{n S_j^2} \tag{6-22}$$
Dividing (6-21) by (6-22), we obtain the variance inflation factor (VIF) as
$$VIF(\hat{\beta}_j) = \frac{1}{1 - R_j^2} \tag{6-23}$$
The VIF statistic calculated according to (6-23) is sometimes called “centered
VIF” to be distinguished from the “uncentered VIF” which is interesting in models
without intercept. The E-views programme supplies both statistics.
Tolerance, which is the inverse of VIF, is defined as


$$Tolerance(\hat{\beta}_j) = \frac{1}{VIF(\hat{\beta}_j)} = 1 - R_j^2 \tag{6-24}$$
Thus, $VIF(\hat{\beta}_j)$ is the ratio between the estimated variance and the one that there
would have been if xj was uncorrelated with the other regressors in the model. In other
words, the VIF shows the extent to which the variance of the estimator is "inflated" as a
result of non-orthogonality of the regressors. It is readily seen that the higher the VIF (or
the lower the tolerance index), the higher the variance of $\hat{\beta}_j$.
The procedure is to choose each one of the regressors at a time as the dependent
variable and to regress them against a constant and the remaining explanatory variables.
We would then get k values for the VIF’s. If any of them is high, then multicollinearity is
detected. Unfortunately, however, there is no theoretical indicator to determine whether
the VIF is “high.” Also, there is no theory that tells us what to do if multicollinearity is
found.
The variance inflation factor (VIF) and the tolerance are both widely used
measures of the degree of multicollinearity. Unfortunately, many practitioners regard a
VIF above some rule-of-thumb threshold (most commonly 10) as a sign of severe or
serious multicollinearity (this rule appears in both scholarly articles
and advanced statistical textbooks), but the rule has no scientific justification.
The problem with the VIF (or the tolerance) is that it does not provide any
information that could be used to treat the problem.
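The procedure described above (regress each regressor on a constant and the remaining regressors, then apply (6-23) and (6-24)) can be sketched as follows. The function name `vif` and the simulated regressors are assumptions made only for this illustration.

```python
import numpy as np

def vif(X):
    """Centered VIF of each column of X (regressors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Auxiliary regression of x_j on a constant and the other regressors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        xj = X[:, j]
        resid = xj - others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)   # (6-23)
    return out

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # unrelated to the others
vifs = vif(np.column_stack([x1, x2, x3]))
tolerance = 1.0 / vifs               # (6-24)
print(vifs, tolerance)
```

In this made-up sample the first two regressors show very high VIFs, while the third one stays close to 1.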
EXAMPLE 6.2 Analyzing multicollinearity in the case of labor absenteeism
In example 3.1 a model was formulated and estimated, using file absent, to explain absenteeism
from work as a function of the variables age, tenure and wage.
Table 6.2 provides information on the tolerance and the VIF of each regressor. According to these
statistics, multicollinearity does not appear to affect the wage but there is a certain degree of
multicollinearity in the variables age and tenure. In any case, the problem of multicollinearity in this model
does not appear to be serious because all VIF are below 5.
TABLE 6.2. Tolerance and VIF (collinearity statistics).
Variable    Tolerance    VIF
age         0.2346       4.2634
tenure      0.2104       4.7532
wage        0.7891       1.2673

Condition number and coefficient variance decomposition


This method, developed by Belsley et al. (1982), is based on the variance
decomposition of each regression coefficient as a function of the eigenvalues λh of the
matrix X’X and the corresponding elements of the associated eigenvectors. We will not
discuss eigenvalues and eigenvectors here, because they are beyond the scope of this book,
but in any case we will see their application.
The condition number is a standard measure of ill-conditioning in a matrix. It
indicates the potential sensitivity of the computed inverse matrix to small changes in the
original matrix (X’X in the case of the regression). Multicollinearity reveals its presence
by one or more eigenvalues of X’X being “small”. The closer a matrix is to singularity
the smaller the eigenvalues. The condition number (κ) is defined as the square root of the
largest eigenvalue (λmax) divided by the smallest eigenvalue (λmin):
$$\kappa = \sqrt{\frac{\lambda_{max}}{\lambda_{min}}} \tag{6-25}$$
When there is no multicollinearity at all, then all the eigenvalues and the condition
number will be equal to one. As multicollinearity increases, eigenvalues will be both
greater and smaller than 1 (eigenvalues close to zero indicate a multicollinearity problem),
and the condition number will increase. An informal rule of thumb is that if the condition
number is greater than 15, multicollinearity is a concern; if it is greater than 30
multicollinearity is a very serious concern.
The variance of $\hat{\beta}_j$ can be decomposed into the contributions from each one of
the eigenvalues and can be expressed in the following way:
$$var(\hat{\beta}_j) = \sigma^2 \sum_{h} \frac{u_{jh}^2}{\lambda_h} \tag{6-26}$$
Thus, the proportion of the contribution of eigenvalue λh in the variance of $\hat{\beta}_j$
is equal to
$$\phi_{jh} = \frac{u_{jh}^2 / \lambda_h}{\sum_{h=1}^{k} u_{jh}^2 / \lambda_h} \tag{6-27}$$

High values of $\phi_{jh}$ indicate that, as a consequence of multicollinearity, there is an
inflation of the variance. Given that eigenvalues close to zero indicate a
multicollinearity problem, it is important to pay special attention to the contribution of
the smallest eigenvalues. The contributions corresponding to the smallest eigenvalue may
give a clue of the regressors which are involved in the multicollinearity problem.
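The condition number (6-25) and the proportions (6-27) can be computed directly from the eigen-decomposition of X’X. The sketch below is an illustration on made-up data. It works with the unscaled matrix X’X, which is an assumption, since some packages rescale the columns first and will therefore report different numbers. The function name `collinearity_diagnostics` is hypothetical.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition number (6-25) and variance proportions (6-27) from X'X."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
    kappa = np.sqrt(eigvals.max() / eigvals.min())
    # contrib[j, h] = u_jh^2 / lambda_h, the h-th term of (6-26) without sigma^2.
    contrib = eigvecs ** 2 / eigvals
    phi = contrib / contrib.sum(axis=1, keepdims=True)
    return kappa, eigvals, phi

rng = np.random.default_rng(6)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)   # nearly collinear pair (made-up data)
X = np.column_stack([np.ones(n), x2, x3])
kappa, eigvals, phi = collinearity_diagnostics(X)
print(kappa)
print(phi.round(3))   # rows: coefficients; columns: eigenvalues, smallest first
```

Each row of `phi` adds up to one, and the rows of the two nearly collinear regressors load almost entirely on the smallest eigenvalue, which is the pattern to look for when reading a variance decomposition table.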
EXAMPLE 6.3 Analyzing the multicollinearity of factors determining time devoted to housework
In order to analyze the factors that influence time devoted to housework, the following model was
formulated in exercise 3.17, using file timuse03:
$$houswork = \beta_1 + \beta_2\, educ + \beta_3\, hhinc + \beta_4\, age + \beta_5\, paidwork + u$$
where educ is the years of education attained, and hhinc is the household income in euros per month. The
variables houswork and paidwork are measured in minutes per day.
Table 6.3 provides information on eigenvalues, sorted from the smallest to the largest, and the
variance decomposition proportions for each eigenvalue are calculated according to (6-27). The condition
number is equal to
$$\kappa = \sqrt{\frac{\lambda_{max}}{\lambda_{min}}} = \sqrt{\frac{542.14}{7.03 \times 10^{-6}}} = 8782$$
The condition number is very big, which would indicate a large amount of multicollinearity.


As can be seen in table 6.3 (see footnote 3), the greatest proportions associated with the smallest eigenvalue, which
is the main cause of multicollinearity in this model, correspond to the regressors educ and age. These two
regressors are inversely correlated. The greatest proportions associated with the second smallest eigenvalue
correspond to the regressors educ and the household income, which are positively correlated.
TABLE 6.3. Eigenvalues and variance decomposition proportions.
Eigenvalues   7.03E-06   0.000498   0.025701   1.861396   542.1400

Variance decomposition proportions
              Associated eigenvalue
Variable      1          2          3          4          5
C             0.999995   4.72E-06   8.36E-09   1.23E-13   1.90E-15
EDUC          0.295742   0.704216   4.22E-05   2.32E-09   3.72E-11
HHINC         0.064857   0.385022   0.209016   0.100193   0.240913
AGE           0.651909   0.084285   0.263805   5.85E-07   1.86E-08
PAIDWORK      0.015405   0.031823   0.007178   0.945516   7.80E-05

6.3.3 Solutions
In principle, the problem of multicollinearity is related to deficiencies in the
sample. The non-experimental design of the sample is often responsible for these
deficiencies. Let us look at some possible solutions to the problem of multicollinearity.

Elimination of variables
Multicollinearity can be mitigated if the regressors most affected by
multicollinearity are removed. The problem with this solution is that the estimators of the
new model would be biased if the original model was correct. On this issue the following
reflection should be made. In any case, the researcher is interested in obtaining an
unbiased estimator (or at least with very small bias) with a reduced variance. The mean
square error (MSE) includes both factors. Thus, for the estimator $\hat{\beta}_j$, the MSE is defined
as follows:
$$MSE(\hat{\beta}_j) = \left[ bias(\hat{\beta}_j) \right]^2 + var(\hat{\beta}_j) \tag{6-28}$$
If a regressor is eliminated from the model, the estimator of the coefficient of a regressor that is
maintained (for example, $\hat{\beta}_j$) will be biased. Nevertheless, its MSE can be lower than
that of the original model, because the omission of a variable can sufficiently reduce the
variance of the estimator. In sum, although the elimination of a variable is not a desirable
practice in principle, under certain circumstances it can be justified when it contributes to
decreasing the MSE.

3 In table 6.3, the eigenvalues are ordered from the lowest to the highest, as are the associated
eigenvalues in the variance decomposition proportions. It is important to remark that in E-views
eigenvalues are ordered from the highest to the lowest. In addition, in this package the condition number is
defined differently from the usual definition in the econometrics manuals, which we have followed.


Increasing the sample size


Given that some degree of multicollinearity is a problem particularly when the
variances of the estimators increase significantly, the solutions should aim to reduce these
variances. One way of increasing the variability of the regressors across the sample
is to introduce additional observations. However, this is not always feasible, since
the data used in empirical analysis generally come from external data sources and
the researcher only rarely collects the information directly.
Furthermore, when dealing with experimental designs, the variability of the
regressors can be directly increased without increasing the size of the sample.

Using outside sample information


Another possibility is the use of outside sample information, either by setting
constraints on the parameters of the model, or by using estimates from other studies.
Establishing restrictions on the parameters of the model reduces the number of
parameters to be estimated and therefore alleviates the possible shortcomings of the
sample information. In any case, these restrictions must be inspired by the theoretical
model itself, or at least have an economic meaning.
In general, a disadvantage of this approach is that the meaning attributed to the
estimator obtained in cross sectional data is very different from that obtained with time
series data, in the case when both types of data are jointly used. Sometimes these
estimators can be truly "foreign" or outside the object of study.

Using ratios
If instead of the regressand and the regressors of the original model, we use ratios
with respect to the regressor most affected by collinearity, the correlations among the
regressors of the model may decrease. Such a solution is very attractive for its
simplicity of implementation. However, the transformations of the original variables of
the model using ratios can cause other problems. Assuming the original model fulfills the
CLM assumptions, this transformation implicitly modifies the properties of the model,
and therefore the disturbances of the transformed model will no longer be homoskedastic
but heteroskedastic.

6.4 Normality test


The F and t significance tests built in chapter 4 are based on the normality
assumption of the disturbances. But it is not usual to perform a normality test, given that
a sufficiently large sample -e.g. 50 or more observations - is not often available. However,
normality tests have recently been receiving a growing interest in both theoretical and
applied studies.
Let us examine one test for verifying the assumptions of normality of disturbances
in an econometric model. This test was proposed by Bera and Jarque, and is based on the
statistics of skewness and kurtosis of the residuals.
The skewness statistic is the standardized third-order moment, applied to the
residuals, and its expression is the following:


$$\gamma_1(\hat{u}) = \frac{\sum \hat{u}_i^3 / n}{\left[ \sum \hat{u}_i^2 / n \right]^{3/2}} \tag{6-29}$$
In a symmetric distribution, as is the case of the normal distribution, the
coefficient of skewness is 0.
The kurtosis statistic is the standardized fourth-order moment, applied to residuals,
and its expression is the following:
$$\gamma_2(\hat{u}) = \frac{\sum \hat{u}_i^4 / n}{\left[ \sum \hat{u}_i^2 / n \right]^{2}} \tag{6-30}$$
In a standard normal distribution, i.e. in an N(0,1), the coefficient of kurtosis is
equal to 3.
The Bera and Jarque statistic (BJ) is given by:
$$BJ = \frac{n}{6} \left( \gamma_1(\hat{u}) \right)^2 + \frac{n}{24} \left( \gamma_2(\hat{u}) - 3 \right)^2 \tag{6-31}$$
In a theoretical normal distribution, the above expression will be equal to 0, as the
coefficients of skewness and kurtosis take the values 0 and 3 respectively. The statistic BJ
will take higher values the further the coefficient of asymmetry is from 0 and the coefficient
of kurtosis is from 3. Under the null hypothesis of normality, the statistic BJ has the
following distribution:
$$BJ \xrightarrow[n \to \infty]{} \chi^2_2 \tag{6-32}$$

The indication n → ∞ means that BJ is an asymptotic test, i.e. valid when the
sample is sufficiently large.
EXAMPLE 6.4 Is the hypothesis of normality acceptable in the model to analyze the efficiency of the
Madrid Stock Exchange?
In example 4.5, using file bolmadef, we analyzed the market efficiency of the Madrid Stock
Exchange in 1992, using a model that relates the daily rate of return to the rate of the previous day. Now
we will test the normality assumption on the disturbances of this model. Given the low proportion of the
variance explained with this model (see example 4.5), the test of normality of the disturbances is roughly
equivalent to testing the normality of the endogenous variable.
Table 6.4 shows the coefficients of skewness, kurtosis and the Bera and Jarque statistic, applied to
the residuals. The asymmetry coefficient (-0.04) is not far from the value 0 corresponding to a distribution
N(0,1). On the other hand, the coefficient of kurtosis (4.43) is slightly different from 3, which is the value
in the normal distribution. In this case, we reject the assumption of normality for the usual levels of
significance, as the Bera and Jarque statistic takes the value of 21.02, which is larger than $\chi^{2(0.01)}_{2} = 9.21$.
TABLE 6.4. Normality test in the model on the Madrid Stock Exchange.
skewness coefficient kurtosis coefficient Bera and Jarque statistic
-0.0421 4.4268 21.0232

The fact that the normality assumption is rejected may seem paradoxical, since the values of
kurtosis and especially of skewness do not differ substantially from the values taken by these coefficients
in a normal distribution. However, the discrepancies are significant enough because they are supported by
a large sample size (247 observations). If n (the size of the sample) had been 60 rather than 247, the BJ
statistic, calculated according to (6-31) and using the same coefficients of skewness and kurtosis, would take the
value of 5.11, which is smaller than $\chi^{2(0.01)}_{2} = 9.21$. To put it another way, with the same coefficients, but
with a smaller sample, there is not enough empirical evidence to reject the null hypothesis of normality.
Note that this is due to the fact that the BJ statistic increases proportionally to the size of the sample, but
the degrees of freedom (2) remain unchanged.
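The figures quoted in this example can be reproduced from (6-31). The sketch below uses the rounded coefficients of table 6.4, so small rounding differences with respect to the reported statistic are to be expected. The function names are hypothetical, and `skewness_kurtosis` simply implements (6-29) and (6-30) for any residual vector.

```python
import numpy as np

def skewness_kurtosis(resid):
    """Skewness (6-29) and kurtosis (6-30) coefficients of a residual vector."""
    resid = np.asarray(resid, dtype=float)
    m2 = np.mean(resid ** 2)
    return np.mean(resid ** 3) / m2 ** 1.5, np.mean(resid ** 4) / m2 ** 2

def bera_jarque(n, skew, kurt):
    """BJ statistic (6-31) from the sample size and the two coefficients."""
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)

# Rounded coefficients reported in table 6.4.
bj_247 = bera_jarque(247, -0.0421, 4.4268)
bj_60 = bera_jarque(60, -0.0421, 4.4268)
print(round(bj_247, 2), round(bj_60, 2))
```

The first value is compared with the critical value 9.21 and leads to rejection, whereas the second, computed with the same coefficients but n = 60, does not.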

6.5 Heteroskedasticity
The homoskedasticity assumption (assumption 7 of the CLM) states that the
disturbances have a constant variance, that is to say:
$$var(u_i) = \sigma^2 \qquad i = 1, 2, \dots, n \tag{6-33}$$
Assuming that there is only one independent variable, the homoskedasticity
assumption means that the variability around the regression line is the same for any
value of x. In other words, variability does not increase or decrease when x varies, as
shown in figure 2.7, part a) of chapter 2. In figure 6.1, a scatter plot is shown
corresponding to a model in which disturbances are homoskedastic.
If the homoskedasticity assumption is not satisfied, then there is
heteroskedasticity, or disturbances are heteroskedastic. In figure 2.7, part b) a model with
heteroskedastic disturbances was represented: the dispersion increases with increasing
values of x. Figure 6.2 shows the scatter diagram corresponding to a model in which the
dispersion grows when x grows.


[Figures 6.1 and 6.2: scatter diagrams of y against x.]
FIGURE 6.1. Scatter diagram corresponding to a model with homoskedastic disturbances.
FIGURE 6.2. Scatter diagram corresponding to a model with heteroskedastic disturbances.

6.5.1 Causes of heteroskedasticity


In models estimated with cross sectional data (for example, demand studies based
on surveys of household budgets) there are often problems of heteroskedasticity.
However, heteroskedasticity can also occur in models estimated with time series.
Let us now consider some factors that can cause disturbances to be heteroskedastic:
a) Influence of the size of an explanatory variable in the size of the disturbance.
Let us examine this factor using an example. Consider a model in which spending on
hotels is a linear function of disposable income. If you have a representative sample of
the population of a country, the great variability of the income received by families can
be seen. Logically, low income families are unlikely to spend large amounts on hotels,
and in this case we can expect the variation in expenditure from one family to
another to be small. In contrast, in high-income families a greater variability in
this type of expenditure can be expected. Indeed, high-income families may choose
between spending a substantial part of their income on hotels or spending virtually
nothing. The scatter diagram in figure 6.2 may be adequate to represent what happens in
a model to explain the demand for a luxury good such as spending on hotels.
b) The presence of outliers can cause heteroskedasticity. An outlier is an
observation generated apparently by a different population to that generating the
remaining sample observations. When the sample size is small, the inclusion or exclusion
of such an observation can substantially alter the results of regression analysis and cause
heteroskedasticity.
c) Data transformation. As we saw in a previous section, one way of dealing with
the problem of multicollinearity is to transform the model by taking ratios
with respect to a variable (say xji), i.e. dividing both sides of the model by xji. The
disturbance will then be ui/xji, instead of ui. Assuming that ui fulfills the
homoskedasticity assumption, the disturbances of the transformed model (ui/xji) will no
longer be homoskedastic but heteroskedastic.
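The last statement can be checked with a one-line derivation. It is only a sketch, under the added assumptions that $x_{ji}$ is treated as fixed (non-random) and different from zero, and that $Var(u_i) = \sigma^2$:

$$
Var\!\left(\frac{u_i}{x_{ji}}\right) = \frac{1}{x_{ji}^2}\,Var(u_i) = \frac{\sigma^2}{x_{ji}^2}
$$

The variance of the transformed disturbance therefore changes from one observation to another whenever $x_{ji}$ does.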

6.5.2 Consequences of heteroskedasticity


When there is heteroskedasticity, the OLS method is not the most appropriate
because the estimators obtained are not the best, i.e. the estimators are not BLUE.
Moreover, the OLS estimators obtained when there is heteroskedasticity, in
addition to not being BLUE, have the following problem. The covariance matrix of the
estimators obtained by applying the usual formula is not valid when there is
heteroskedasticity (and/or autocorrelation). Consequently, the t and F statistics based on
the estimated covariance matrix can lead to erroneous inferences.

6.5.3 Heteroskedasticity tests


We are going to examine two heteroskedasticity tests: Breusch-Pagan-Godfrey
and White. Both of them are asymptotic and have the form of a Lagrange multiplier (LM)
test.

Breusch-Pagan-Godfrey (BPG) test


Breusch and Pagan (1979) developed a test for heteroskedasticity and Godfrey
(1978) developed another one. Because they are similar, they are usually known as
Breusch–Pagan–Godfrey (BPG) heteroskedasticity tests.
The BPG test is an asymptotic test, that is to say, it is only valid for large samples.
The null and alternative hypotheses of this test can be formulated as follows:
$$
\begin{aligned}
H_0 &: E(u_i^2) = \sigma^2 \quad \forall i \\
H_1 &: \sigma_i^2 = \alpha_1 + \alpha_2 z_{2i} + \alpha_3 z_{3i} + \cdots + \alpha_m z_{mi}
\end{aligned}
\tag{6-34}
$$
where the zi’s can be some or all of the xi’s of the model.
Taking into account the above H1, H0 can be expressed as
$$
H_0 : \alpha_2 = \alpha_3 = \cdots = \alpha_m = 0 \tag{6-35}
$$
The steps involved in this test are as follows:
Step 1. The original model is estimated and the OLS residuals are calculated.
Step 2. The following auxiliary regression is estimated, taking as the regressand the
square of the residuals ($\hat{u}_i^2$) obtained in estimating the original model,
since we know neither $\sigma_i^2$ nor $u_i^2$:
$$
\hat{u}_i^2 = \alpha_1 + \alpha_2 z_{2i} + \alpha_3 z_{3i} + \cdots + \alpha_m z_{mi} + \varepsilon_i \tag{6-36}
$$


The auxiliary regression should have an intercept, even if the original
model is estimated without it. In accordance with expression (6-36), in the
auxiliary regression there are m-1 regressors in addition to the intercept,
which is the number of restrictions in (6-35).
Step 3. Designating by $R_{ar}^2$ the coefficient of determination of the auxiliary
regression, the statistic $nR_{ar}^2$ is calculated.
Under the null hypothesis, this statistic (BPG) is distributed as follows:
$$
BPG = nR_{ar}^2 \xrightarrow[n \to \infty]{} \chi^2_{m-1} \tag{6-37}
$$
Step 4. For a significance level α, and designating by $\chi^{2(\alpha)}_{m-1}$ the corresponding value
in the χ2 table, the decision rule is the following:
If $BPG > \chi^{2(\alpha)}_{m-1}$, H0 is rejected.
If $BPG \leq \chi^{2(\alpha)}_{m-1}$, H0 is not rejected.
In this test, high values of the statistic correspond to a situation of
heteroskedasticity, that is to say, to the rejection of the null hypothesis.
EXAMPLE 6.5 Application of the Breusch-Pagan-Godfrey test
This test will be applied to a sub-sample of 10 observations, which have been used for estimating
hotel expenditures (hostel) as a function of disposable income (inc). The data appear in table 6.5.
TABLE 6.5. Hostel and inc data.
i hostel inc
1 17 500
2 24 700
3 7 250
4 17 430
5 31 810
6 3 200
7 8 300
8 42 760
9 30 650
10 9 320
Step 1. Applying OLS to the model
$$
hostel = \beta_1 + \beta_2\, inc + u
$$
using data from table 6.5, the following estimated model is obtained:
$$
\widehat{hostel}_i = \underset{(3.48)}{-7.427} + \underset{(0.0065)}{0.0533}\; inc_i
$$

The residuals corresponding to this fitted model appear in table 6.6.


TABLE 6.6. Residuals of the regression of hostel on inc.
i            1       2       3      4      5       6       7       8      9      10
$\hat{u}_i$  -2.226  -5.888  1.100  1.505  -4.751  -0.234  -0.565  8.913  2.777  -0.631

Step 2. The auxiliary regression which must be estimated is the following:
$$
\hat{u}_i^2 = \alpha_1 + \alpha_2\, inc_i + \eta_i
$$
Applying OLS, the following results are obtained:
$$
\hat{u}_i^2 = -23.93 + 0.0799\, inc_i \qquad R^2 = 0.5045
$$
Step 3. Using the value of R2, the BPG statistic is:
$$
BPG = nR_{ar}^2 = 10(0.5045) = 5.05
$$
Step 4. Given that $\chi_1^{2(0.05)} = 3.84$, the null hypothesis of homoskedasticity is rejected for a
significance level of 5%, because BPG>3.84, but not for the significance level of 1%.
Note that the validity of this test is asymptotic. However, the sample used in this example is very
small.
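The four steps of example 6.5 can be reproduced numerically. The following is only a minimal sketch in Python with NumPy, not the software used in the text; the helper `ols` and all variable names are illustrative assumptions, and the 5% critical value 3.84 is typed in by hand rather than computed.

```python
import numpy as np

# Data from table 6.5 (hostel expenditure and disposable income).
hostel = np.array([17, 24, 7, 17, 31, 3, 8, 42, 30, 9], dtype=float)
inc = np.array([500, 700, 250, 430, 810, 200, 300, 760, 650, 320], dtype=float)


def ols(y, X):
    """Hypothetical helper: OLS coefficients, residuals and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2


n = len(hostel)
X = np.column_stack([np.ones(n), inc])

# Step 1: estimate the original model and keep the OLS residuals.
beta, u_hat, _ = ols(hostel, X)

# Step 2: auxiliary regression of the squared residuals on an intercept and inc.
alpha, _, r2_aux = ols(u_hat ** 2, X)

# Step 3: BPG statistic, n times the R-squared of the auxiliary regression.
bpg = n * r2_aux

# Step 4: compare with the 5% chi-square critical value (1 degree of freedom).
CHI2_1_05 = 3.84
reject_5pct = bpg > CHI2_1_05

print("beta:", beta.round(4), "alpha:", alpha.round(4))
print(f"R2_aux = {r2_aux:.4f}, BPG = {bpg:.2f}, reject at 5%: {reject_5pct}")
```

With only one z variable the auxiliary regression reuses the regressor matrix of the original model; with several z variables, only the matrix passed in step 2 would change.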

White test
In the White test the hypothetical variables determining the heteroskedasticity are
not specified. This test is a non-constructive test because it gives no indication of the
heteroskedasticity scheme when the null hypothesis is rejected.
The White test is based on the fact that the standard errors are asymptotically valid
if we replace the homoskedasticity assumption with the weaker assumption that the
squared disturbance $u^2$ is uncorrelated with all the regressors, their squares, and their cross
products. Taking this into account, White proposed to carry out the auxiliary regression
of $\hat{u}_i^2$, since $u_i^2$ is unknown, on the factors mentioned above. If the coefficients of the
auxiliary regression are jointly non-significant, then we can admit that the disturbances
are homoskedastic. Given the assumption adopted, the White test is an asymptotic
test.
The application of the White test can pose problems in models with many
regressors. For example, if the original model has five independent variables, the White
auxiliary regression would involve 20 regressors in addition to the intercept (5 variables,
5 squares and 10 cross products, unless some are redundant), which
implies that the estimation is done with a substantial loss of degrees of freedom. For this reason,
when the model has many regressors a simplified version of the White test is often
applied. In the simplified version, the cross products are omitted from the auxiliary
regression.
The steps involved in the complete version of the White test are as follows:
Step 1. The original model is estimated and the OLS residuals are calculated.
Step 2. The following auxiliary regression is estimated, taking as the regressand the
square of the residuals obtained in the previous step:
$$
\hat{u}_i^2 = \alpha_1 + \alpha_2 \psi_{2i} + \alpha_3 \psi_{3i} + \cdots + \alpha_m \psi_{mi} + \varepsilon_i \tag{6-38}
$$
In the above auxiliary regression, the regressors $\psi_{ji}$ are the regressors of
the original model, their squared values, and the cross product(s) of the
regressors.
In any case, it is necessary to eliminate any redundancies that occur (i.e.
regressors that appear repeatedly). For example, the intercept (which is 1
for all observations) and the square of the intercept cannot appear
simultaneously as regressors, since they are identical. The simultaneous
introduction of these two regressors will lead to perfect multicollinearity.
The auxiliary regression should have an intercept, even if the original
model is estimated without it. In accordance with expression (6-38), in the
auxiliary regression there are m-1 regressors in addition to the intercept.
Step 3. Designating by $R_{ar}^2$ the coefficient of determination of the auxiliary
regression, the statistic $nR_{ar}^2$ is calculated.
Under the null hypothesis, this statistic (W) is distributed as follows:
$$
W = nR_{ar}^2 \xrightarrow[n \to \infty]{} \chi^2_{m-1} \tag{6-39}
$$
This statistic is used to test the overall significance of model (6-38).
Step 4. It is similar to step 4 in the Breusch-Pagan-Godfrey test.
EXAMPLE 6.6 Application of the White test
This test is going to be applied to data from table 6.5.
Step 1. This step is the same as in the Breusch-Pagan-Godfrey test.
Step 2. Since there are two regressors in the original model (the intercept and inc), the regressors
of the auxiliary regression will be
$$
\begin{aligned}
\psi_{1i} &= 1 \quad \forall i \\
\psi_{2i} &= 1 \times inc_i \\
\psi_{3i} &= inc_i^2
\end{aligned}
$$
Consequently, the model to be estimated is
$$
\hat{u}_i^2 = \alpha_1 + \alpha_2\, inc_i + \alpha_3\, inc_i^2 + \eta_i
$$
By applying OLS to the data from table 6.5, we obtain the following:
$$
\hat{u}_i^2 = 14.29 - 0.10\, inc_i + 0.00018\, inc_i^2 \qquad R^2 = 0.56
$$
Step 3. By using the R2, we obtain the W statistic:
$$
W = nR^2 = 10(0.56) = 5.60
$$
The number of degrees of freedom is two.
Step 4. Given that $\chi_2^{2(0.10)} = 4.61$, the null hypothesis of homoskedasticity is rejected for a 10%
significance level because W=nR2>4.61, but not for significance levels of 5% and 1%.
Note that the validity of this test is asymptotic too.
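The White statistic of example 6.6 can be computed in the same way. This is only a sketch in Python with NumPy under stated assumptions: the variable names are illustrative, and the critical values 4.61 and 5.99 are the tabulated values quoted in this section, entered by hand.

```python
import numpy as np

# Data from table 6.5.
hostel = np.array([17, 24, 7, 17, 31, 3, 8, 42, 30, 9], dtype=float)
inc = np.array([500, 700, 250, 430, 810, 200, 300, 760, 650, 320], dtype=float)
n = len(hostel)

# Step 1: OLS on the original model and squared residuals.
X = np.column_stack([np.ones(n), inc])
beta, *_ = np.linalg.lstsq(X, hostel, rcond=None)
u2 = (hostel - X @ beta) ** 2

# Step 2: auxiliary regression on the intercept, inc and inc squared.
# With a single explanatory variable there are no cross products.
Z = np.column_stack([np.ones(n), inc, inc ** 2])
alpha, *_ = np.linalg.lstsq(Z, u2, rcond=None)
fitted = Z @ alpha
r2_aux = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

# Step 3: White statistic and its degrees of freedom
# (regressors of the auxiliary regression, excluding the intercept).
w_stat = n * r2_aux
df = Z.shape[1] - 1

# Step 4: compare with the chi-square(2) critical values quoted in the text.
CHI2_2_10, CHI2_2_05 = 4.61, 5.99
print(f"W = {w_stat:.2f}, df = {df}")
print("reject at 10%:", w_stat > CHI2_2_10, "| reject at 5%:", w_stat > CHI2_2_05)
```

The simplified version of the test mentioned above would be obtained by leaving the cross-product columns out of `Z` when the original model has several regressors.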
EXAMPLE 6.7 Heteroskedasticity tests in models explaining the market value of the Spanish banks
To explain the market value (marktval) of Spanish banks as a function of their book value (bookval)
two models were formulated: one linear (example 2.8) and another one doubly logarithmic (example 2.10).


Heteroskedasticity in the linear model


The linear model is given by
$$
marktval = \beta_1 + \beta_2\, bookval + u
$$
Using data from 20 banks and insurance companies (workfile bolmad95), the following results
were obtained:
$$
\widehat{marktval} = \underset{(30.85)}{29.42} + \underset{(0.127)}{1.219}\; bookval
$$

In graphic 6.1, the scatter plot between the residuals in absolute value (ordinate) and the variable
bookval (in abscissa) is represented. This graphic shows that the absolute values of the residuals, which are
indicative of the spread of this series, grow with increasing values of the variable bookval. In other words,
this graph provides an indication but not a formal proof of the existence of heteroskedasticity of the
disturbances associated with the variable bookval.
GRAPHIC 6.1. Scatter plot between the residuals in absolute value and the variable bookval in the
linear model. (Vertical axis: residuals in absolute value; horizontal axis: bookval.)
The BPG statistic takes the following value:
$$
BPG = nR_{ar}^2 = 20 \times 0.5220 = 10.44
$$
As $\chi_1^{2(0.01)} = 6.64 < 10.44$, the null hypothesis of homoskedasticity is rejected for a significance
level of 1%, and therefore also for α=0.05 and for α=0.10.
Now we will apply the White test. In this case, the auxiliary regression includes as regressors the
intercept, the variable bookval, and the square of this variable. The White statistic takes the following value:
$$
W = nR_{ar}^2 = 20 \times 0.6017 = 12.03
$$
As $\chi_2^{2(0.01)} = 9.21 < 12.03$, the null hypothesis of homoskedasticity is rejected for a significance
level of 1%.
Therefore, both tests are conclusive in rejecting the null hypothesis for the usual levels of
significance.
Heteroskedasticity in the log-log model
The estimated log-log model with the same sample was as follows:
$$
\widehat{\ln(marktval)} = \underset{(0.265)}{0.676} + \underset{(0.062)}{0.9384}\; \ln(bookval)
$$

In graphic 6.2 the scatter plot between the residuals in absolute value (ordinate), corresponding to
this estimated model, and the variable ln(bookval) (in abscissa) is represented. As shown, the two largest
residuals correspond to two banks with small market value. Even disregarding these two cases, apparently
there is no relationship between the residuals and the explanatory variable of the model.

GRAPHIC 6.2. Scatter plot between the residuals in absolute value and the variable bookval in the
log-log model. (Vertical axis: residuals in absolute value; horizontal axis: ln(bookval).)
The results of the two tests of heteroskedasticity applied are shown in table 6.7.
TABLE 6.7. Tests of heteroskedasticity on the log-log model to explain the market
value of Spanish banks.
Test             Statistic                  Table values
Breusch-Pagan    $BP = nR_{ar}^2 = 1.05$    $\chi_2^{2(0.10)} = 4.61$
White            $W = nR_{ar}^2 = 2.64$     $\chi_2^{2(0.10)} = 4.61$
Both tests carried out indicate that the null hypothesis of homoskedasticity cannot be rejected
against the alternative hypothesis that the variance of the disturbances is associated with the explanatory
variable of the model.
An important conclusion is that, if an econometric model is estimated with cross sectional data, it
is easy to find observations of very different sizes. These problems of scale can cause heteroskedasticity
in the disturbances but can often be solved by using log-log models.
EXAMPLE 6.8 Is there heteroskedasticity in demand of hostel services?
In general, heteroskedasticity in the disturbances does not usually appear in demand for food
commodities. By contrast, heteroskedasticity is usually much more frequent in demand for luxury goods,
because in the demand for these goods there is a large disparity in the behavior of high income households,
while in households with low incomes such disparity is very unlikely.
In view of these considerations, the specification for analyzing the demand for hostel services is
the following:
$$
\ln(hostel) = \beta_1 + \beta_2 \ln(inc) + \beta_3\, secstud + \beta_4\, terstud + \beta_5\, hhsize + u \tag{6-40}
$$

where inc is disposable income of a household, hhsize is the number of household members, and secstud
and terstud are two dummies that take the value one if individuals have completed secondary and tertiary
studies respectively.
The results obtained, using file hostel, are the following:
$$
\widehat{\ln(hostel_i)} = \underset{(2.26)}{-16.37} + \underset{(0.324)}{2.732}\, \ln(inc_i) + \underset{(0.258)}{1.398}\, secstud_i + \underset{(0.333)}{2.972}\, terstud_i - \underset{(0.088)}{0.444}\, hhsize_i
$$
R2=0.921 n=40
Note that hostel services are a luxury good, as the income elasticity of demand for this good is
very high (2.73). This means that if income increases by 1%, spending on hostel services will increase, on
average, by 2.73%. As can be seen, families where the main breadwinner has secondary studies (secstud)
or, especially, higher education (terstud), spend more on hostel services than if the main breadwinner only
has primary education. However, spending on hostel services decreases as household size (hhsize)
increases.
Graphic 6.3 shows the scatter plot between the residuals in absolute value and the variable ln(inc).
Income (or a transformation of it) is the main candidate, if not the only one, to explain the hypothetical
heteroskedasticity in the disturbances. As shown in the graphic, the dispersion of residuals is smaller for
low incomes than for middle or upper incomes.
We will now apply the two tests of heteroskedasticity that have been discussed in this section.
GRAPHIC 6.3. Scatter plot between the residuals in absolute value and the variable ln(inc) in the
hostel model. (Vertical axis: residuals in absolute value; horizontal axis: ln(inc).)
The results of the two tests of heteroskedasticity applied are shown in table 6.8.
TABLE 6.8. Tests of heteroskedasticity in the model of demand for hostel services.
Test                     Statistic                   Table values
Breusch-Pagan-Godfrey    $BPG = nR_{ar}^2 = 7.83$    $\chi_2^{2(0.05)} = 5.99$
White                    $W = nR_{ar}^2 = 12.24$     $\chi_2^{2(0.01)} = 9.21$

In the BPG test we reject the null hypothesis of homoskedasticity for a significance level of α=0.05,
but not for α=0.01.
Since there are several dummy variables in the model, including cross products in the auxiliary
regression could lead to serious problems of multicollinearity. For this reason, cross products are not
included in the auxiliary regression. In addition, the squares of secstud and terstud are not included among
the regressors of the auxiliary regression because, being dummies, their squares coincide with the variables
themselves. Given the value obtained in the White
statistic, we reject the null hypothesis of homoskedasticity for a significance level of α=0.01. Therefore,
the White test is more conclusive in rejecting the homoskedasticity assumption.

6.5.4 Estimation of heteroskedasticity-consistent covariance


When there is heteroskedasticity and we apply OLS, we cannot make correct
inferences by using the covariance matrix associated to the OLS estimates, because this
matrix is not a consistent estimator of the covariance matrix of the coefficients.
Consequently, the t and F statistics based on that estimated covariance matrix can lead to
erroneous inferences.
Therefore, when there is heteroskedasticity and OLS has been applied,
a consistent estimate of the covariance matrix is needed to make inferences.
White derived a consistent estimator of the covariance matrix under heteroskedasticity.
However, it is important to note that this estimator does not work well if the sample is
small, given that it is an asymptotic approximation.
Most econometric packages allow standard errors to be calculated by the White
procedure. By using these consistent standard errors, valid tests can be carried out
in the presence of heteroskedasticity.
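To illustrate what such packages compute, the following sketch contrasts the usual OLS covariance matrix with the basic form of the White estimator (often labelled HC0). It is written in Python with NumPy and uses the data of table 6.5 purely as an illustration; the variable names are assumptions, and packages may additionally apply small-sample corrections that are not shown here.

```python
import numpy as np

# Data from table 6.5, used here only for illustration (n = 10 is far too
# small for the asymptotic approximation to be reliable).
hostel = np.array([17, 24, 7, 17, 31, 3, 8, 42, 30, 9], dtype=float)
inc = np.array([500, 700, 250, 430, 810, 200, 300, 760, 650, 320], dtype=float)

n = len(hostel)
X = np.column_stack([np.ones(n), inc])
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ hostel
u_hat = hostel - X @ beta

# Usual OLS covariance matrix: s^2 (X'X)^-1, valid under homoskedasticity.
s2 = u_hat @ u_hat / (n - k)
cov_ols = s2 * XtX_inv

# White (HC0) covariance matrix: (X'X)^-1 [sum of u_i^2 x_i x_i'] (X'X)^-1.
meat = (X * u_hat[:, None] ** 2).T @ X
cov_white = XtX_inv @ meat @ XtX_inv

se_ols = np.sqrt(np.diag(cov_ols))
se_white = np.sqrt(np.diag(cov_white))
print("usual standard errors:", se_ols.round(4))
print("White standard errors:", se_white.round(4))
```

The coefficient estimates are the same in both cases; only the standard errors, and hence the t and F statistics, change.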
EXAMPLE 6.9 Heteroskedasticity consistent standard errors in the models explaining the market value of
Spanish banks (Continuation of example 6.7)
In the following estimated equation of the linear model, using file bolmad95, standard deviations
of the estimates are calculated by the White procedure and therefore they are consistent under
heteroskedasticity:
$$
\widehat{marktval} = \underset{(18.67)}{29.42} + \underset{(0.249)}{1.219}\; bookval
$$

As can be seen, the standard error of the bookval coefficient goes from 0.127 in the usual procedure
to 0.249 in the White procedure. However, the p-value remains very low (0.0001). Accordingly, the
significance of the variable bookval for all usual levels is still maintained. By contrast, the intercept, which
has no special meaning in the model, now has a standard error (18.67), which is lower than that obtained
with the usual procedure (30.85).
If we apply the White procedure to the log-log model, the following results are obtained:
$$
\widehat{\ln(marktval)} = \underset{(0.3218)}{0.676} + \underset{(0.0698)}{0.9384}\; \ln(bookval)
$$

In this case, the standard error of ln(bookval) coefficient is practically the same in the two
procedures.
From the above results, the following conclusions can be obtained. In determining the market
value of Spanish banks, disturbances of the linear model are strongly heteroskedastic. Therefore, when
using a consistent estimate, the standard deviation is almost doubled compared to the standard one. By
contrast, in the log-log model, which is not affected by heteroskedasticity, there is little difference between
the standard errors obtained with both procedures.

6.5.5 The treatment of heteroskedasticity


In order to estimate a model with heteroskedastic disturbances it is necessary to
know or, if it is unknown, to estimate the pattern of heteroskedasticity. Thus, suppose that
the standard deviation of the disturbances follows this scheme:
$$
\sigma_i = f(x_{ji}) \tag{6-41}
$$
As indicated in section 6.1, the GLS method allows BLUE estimators to be
obtained when disturbances are heteroskedastic. If we know scheme (6-41), the
application of GLS is performed in two stages. In the first stage, the original model is
transformed by dividing both sides by the standard deviation. Therefore, according to
(6-41), the transformed model is given by
$$
\frac{y_i}{f(x_{ji})} = \beta_1 \frac{1}{f(x_{ji})} + \beta_2 \frac{x_{2i}}{f(x_{ji})} + \beta_3 \frac{x_{3i}}{f(x_{ji})} + \cdots + \beta_k \frac{x_{ki}}{f(x_{ji})} + \frac{u_i}{f(x_{ji})} \tag{6-42}
$$
It is easily seen that the disturbances of the previous model, (ui/f(xji)), are
homoskedastic. Therefore, in the second stage OLS is applied to the transformed model,
thus obtaining BLUE estimators. When we divide each observation by f(xji), we are
weighting by the inverse of the value taken by this function. For this reason the above
procedure is often called weighted least squares (WLS). In this case, the weighting factor
is 1/f(xji).
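The homoskedasticity of the transformed disturbances can be verified in one line. This is a sketch under the assumptions that scheme (6-41) holds exactly, that $x_{ji}$ is treated as fixed and that $f(x_{ji}) > 0$:

$$
Var\!\left(\frac{u_i}{f(x_{ji})}\right) = \frac{Var(u_i)}{[f(x_{ji})]^2} = \frac{\sigma_i^2}{\sigma_i^2} = 1 \quad \forall i
$$

The transformed disturbances thus have the same variance for every observation.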
If the function f(xji) is not known, it is necessary to estimate it. In that case, the
estimation method will not be exactly the GLS method because the application of this
method requires knowledge of the covariance matrix or, at least, of a
matrix that is proportional to it. If we estimate the covariance matrix, in addition to the
parameters, it is said that feasible GLS is applied. In the case of heteroskedastic
disturbances, the particularization of the feasible GLS method is called WLS (weighted
least squares) in two stages. In the first stage the function f(xji) is estimated, whereas in
the second stage OLS is applied to the model transformed using the f(xji) estimates.
To see how to apply the WLS method in two stages, let us consider the following
relationship, which simply defines the variance of the disturbances, in the case of
heteroskedasticity,
$$
E(u_i^2) = \sigma_i^2 \tag{6-43}
$$
Therefore, the squared disturbance can be written, as in the regression model,
as its expectation plus a random variable. That is to say:
$$
u_i^2 = \sigma_i^2 + \varepsilon_i \tag{6-44}
$$
As the disturbances are not observable, one can establish a relationship analogous
to the above using residuals instead of disturbances. Therefore,
$$
\hat{u}_i^2 = \sigma_i^2 + \eta_i \tag{6-45}
$$
It should be noted that the above relationship does not have exactly the same
properties as (6-44) because the residuals are correlated and heteroskedastic, even if the
disturbances fulfill the CLM assumptions. However, in large samples they will have the
same properties.
If we use the residuals as the regressand instead of the squared residuals, we must
take their absolute values, since the standard deviation takes only positive values. Taking
into account (6-45), the following relationship can be established:
$$
|\hat{u}_i| = \sqrt{\sigma_i^2} + \eta_i = f(x_{ji}) + \eta_i \tag{6-46}
$$
Since the function f(xji) is generally unknown, different functions are often tried.
Some of the most common are the following:
$$
\begin{aligned}
|\hat{u}_i| &= \alpha_1 + \alpha_2 x_{ji} + \eta_i \\
|\hat{u}_i| &= \alpha_1 + \alpha_2 \sqrt{x_{ji}} + \eta_i \\
|\hat{u}_i| &= \alpha_1 + \alpha_2 \frac{1}{x_{ji}} + \eta_i \\
|\hat{u}_i| &= \alpha_1 + \alpha_2 \ln(x_{ji}) + \eta_i
\end{aligned}
\tag{6-47}
$$
The functional form with the best fit (the highest coefficient of determination or the
smallest AIC statistic) is selected. For the transformation two circumstances are
contemplated, depending on the significance of the intercept. If this coefficient is
statistically significant, the model is transformed by dividing by the fitted values of the
selected equation. If it is not statistically significant, the model is transformed by dividing
by the regressor corresponding to the selected equation. Thus, if the selected equation
were the second one of (6-47), with the intercept not being significant, the transformed
model would be as follows:
$$
\frac{y_i}{\sqrt{x_{ji}}} = \beta_1 \frac{1}{\sqrt{x_{ji}}} + \beta_2 \frac{x_{2i}}{\sqrt{x_{ji}}} + \beta_3 \frac{x_{3i}}{\sqrt{x_{ji}}} + \cdots + \beta_k \frac{x_{ki}}{\sqrt{x_{ji}}} + \frac{u_i}{\sqrt{x_{ji}}} \tag{6-48}
$$
Note that if the intercept is not significant, the estimated parameters are not
involved in the transformation of the model, but they are if the intercept is significant. As
the estimators in models (6-47) are biased, although consistent, it is not advisable to
transform the models by applying the fitted values of $|\hat{u}_i|$ (obtained by using $\hat{\alpha}_1$ and $\hat{\alpha}_2$)
except when the intercept is highly significant (e.g., significant at the 1% level).
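The two-stage procedure can be sketched as follows. The code is only an illustration in Python with NumPy: it uses the small sample of table 6.5 rather than the hostel file, selects the functional form by the highest R2 alone, and simply assumes that the intercept of the selected equation is not significant, so that the model is divided by the corresponding regressor. The helper `ols` and all names are hypothetical.

```python
import numpy as np

# Data from table 6.5 (illustration only).
hostel = np.array([17, 24, 7, 17, 31, 3, 8, 42, 30, 9], dtype=float)
inc = np.array([500, 700, 250, 430, 810, 200, 300, 760, 650, 320], dtype=float)
n = len(hostel)
X = np.column_stack([np.ones(n), inc])


def ols(y, X):
    """Hypothetical helper: OLS coefficients, residuals and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2


# OLS on the original model; absolute residuals are the regressand of stage 1.
beta_ols, u_hat, _ = ols(hostel, X)
abs_u = np.abs(u_hat)

# Stage 1: fit the four candidate forms and keep the one with the highest R2.
candidates = {
    "x": inc,
    "sqrt(x)": np.sqrt(inc),
    "1/x": 1.0 / inc,
    "ln(x)": np.log(inc),
}
fits = {}
for form, g in candidates.items():
    alpha, _, r2 = ols(abs_u, np.column_stack([np.ones(n), g]))
    fits[form] = (alpha, r2)
best = max(fits, key=lambda form: fits[form][1])

# Stage 2: ASSUMING the intercept of the selected equation is not significant,
# divide every variable (including the column of ones) by the selected regressor
# and apply OLS to the transformed model.
f = candidates[best]
beta_wls, *_ = np.linalg.lstsq(X / f[:, None], hostel / f, rcond=None)

print("selected form:", best)
print("OLS:", beta_ols.round(4), "WLS:", beta_wls.round(4))
```

If the intercept were significant, `f` would instead be the vector of fitted values of the selected equation, provided they are all positive.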
EXAMPLE 6.10 Application of weighted least squares in the demand of hotel services (Continuation of
example 6.8)
Since the two tests applied to the model to explain expenditure on hotel services indicate that the
disturbances are heteroskedastic, we apply the weighted least squares method to estimate the model (6-40).
First, we estimate the four models (6-47), using as the regressand the residuals $\hat{u}_i$ (in absolute
value) obtained in the estimation of model (6-40) by OLS. The results are presented below:
$$
\begin{aligned}
\widehat{|\hat{u}_i|} &= \underset{(0.143)}{0.0239} + \underset{(2.73)}{0.0003}\; inc & R^2 &= 0.1638 \\
\widehat{|\hat{u}_i|} &= \underset{(-1.34)}{-0.4198} + \underset{(2.82)}{0.0235}\, \sqrt{inc} & R^2 &= 0.1733 \\
\widehat{|\hat{u}_i|} &= \underset{(5.39)}{0.8857} - \underset{(-2.87)}{532.1}\, \frac{1}{inc} & R^2 &= 0.1780 \\
\widehat{|\hat{u}_i|} &= \underset{(-2.46)}{-2.7033} + \underset{(2.88)}{0.4389}\, \ln(inc) & R^2 &= 0.1788
\end{aligned}
$$
In the above results, the t-statistic appears below each coefficient.
The functional form in which ln(inc) appears as a regressor is selected because it corresponds to
the highest R2 obtained. Since the intercept is not statistically significant at 1%,
following the recommendation, WLS is applied taking 1/ln(inc) as the weighting variable. In estimating
by WLS, the following results were obtained:
$$
\widehat{\ln(hostel_i)} = \underset{(2.15)}{-16.21} + \underset{(0.309)}{2.709}\, \ln(inc_i) + \underset{(0.247)}{1.401}\, secstud_i + \underset{(0.326)}{2.982}\, terstud_i - \underset{(0.085)}{0.445}\, hhsize_i
$$
R2=0.914 n=40
Compared to the OLS estimates of example 6.8, it can be seen that the differences are very small,
which is indicative of the robustness of the model.

6.6 Autocorrelation
The no autocorrelation, or no serial correlation, assumption (assumption 8 of the CLM)
states that disturbances with different subscripts are not correlated with each other:
$$
E(u_i u_j) = 0 \quad i \neq j \tag{6-49}
$$
That is, the disturbances corresponding to different periods of time, or to different
individuals, are not correlated with each other. Figure 6.3 shows a plot corresponding to
disturbances which are not autocorrelated. The x axis is time. As can be seen, disturbances
are randomly distributed above and below the line 0 (theoretical mean of u). In the figure,
each disturbance is linked by a line to the disturbance of the following period: in total this
line crosses the line 0 on 13 occasions.
FIGURE 6.3. Plot of non-autocorrelated disturbances.


Violations of the no autocorrelation assumption occur quite frequently in
models using time series data. It should also be noted that autocorrelation can be positive
as well as negative. Positive autocorrelation is characterized by leaving a trail over time,
because the value of each disturbance is near the value of the disturbance which precedes
it. In practice, positive autocorrelation occurs much more frequently than
negative autocorrelation. Figure 6.4 shows a plot corresponding to disturbances which are positively
autocorrelated. As can be seen, the line which links successive disturbances crosses the
line 0 only 4 times.
By contrast, disturbances affected by negative autocorrelation present a saw tooth
configuration, since each disturbance often takes the opposite sign of the disturbance
which precedes it. In figure 6.5, the plot corresponds to disturbances which are negatively
autocorrelated. Now the line 0 is crossed 21 times by the line which links successive
disturbances.
FIGURE 6.4. Plot of positive autocorrelated disturbances.
FIGURE 6.5. Plot of negative autocorrelated disturbances.
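The link between the sign of the autocorrelation and the number of times the series crosses the line 0 can be explored by simulation. The sketch below, in Python with NumPy, is not how the figures were produced: it assumes, for illustration, that each disturbance equals a coefficient rho times the previous disturbance plus a standard normal shock, and the seed, sample size and values of rho are arbitrary choices.

```python
import numpy as np


def simulate_ar1(rho, n, rng):
    """Generate u_t = rho * u_{t-1} + eps_t with standard normal eps_t."""
    eps = rng.standard_normal(n)
    u = np.empty(n)
    u[0] = eps[0]  # simple start-up value, not a draw from the stationary law
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u


def zero_crossings(u):
    """Count sign changes between successive values of the series."""
    signs = np.sign(u)
    return int(np.sum(signs[1:] * signs[:-1] < 0))


rng = np.random.default_rng(0)
n = 2000
crossings = {
    rho: zero_crossings(simulate_ar1(rho, n, rng)) for rho in (0.9, 0.0, -0.9)
}
for rho, count in crossings.items():
    print(f"rho = {rho:+.1f}: {count} crossings of the line 0 in {n} periods")
```

Positively autocorrelated series cross the line 0 least often and negatively autocorrelated series most often, as in figures 6.3 to 6.5.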

6.6.1 Causes of autocorrelation


There are several reasons for the presence of autocorrelation in a model, some of
which are as follows:
a) Specification bias. That is, autocorrelation can be caused by using an incorrect functional
form or by the omission of a relevant variable.
Let us suppose the correct functional form for determining wage as a function of
years of experience (exp) is as follows:
$$
wage = \beta_1 + \beta_2\, exp + \beta_3\, exp^2 + u
$$
Instead of this model, the following one is fitted:
$$
wage = \beta_1 + \beta_2\, exp + v
$$
In the second model, the disturbance has a systematic component
($v = \beta_3\, exp^2 + u$). In figure 6.6, a scatter diagram (generated for the first model) and the
fitted function of the second model are represented. As can be seen, for the low values of
exp the fitted model overestimates wages; for intermediate values of exp wages are
underestimated; finally, for high values the fitted model again overestimates wages. This
example illustrates a case in which the use of an incorrect functional form provokes
positive autocorrelation.
On the other hand, the omission of a relevant variable in the model could induce
positive autocorrelation if that variable has, for example, a cyclical behavior.
FIGURE 6.6. Autocorrelated disturbances due to a specification bias.
b) Inertia. The disturbance term in a regression equation reflects the influence of
those variables affecting the dependent variable that have not been included in the
regression equation. Inertia, or the persisting effects of variables excluded
from the model (and therefore included in u), is probably the most frequent cause of positive
autocorrelation. As is well known, macroeconomic time series (such as GDP, production,
employment and price indexes) tend to move together: during expansion periods these
series tend to increase in parallel, while in times of contraction they tend to decrease, also
in parallel. For this reason, in regressions involving time series data, successive
observations of the disturbance are likely to be dependent on the previous ones. Thus, this
cyclical behavior can produce autocorrelation in the disturbances.
c) Data Transformation. As an example let us consider the following model to
explain consumption as a function of income:
$$
cons_t = \beta_1 + \beta_2\, inc_t + u_t \tag{6-50}
$$
For the observation t-1, we can write
$$
cons_{t-1} = \beta_1 + \beta_2\, inc_{t-1} + u_{t-1} \tag{6-51}
$$
If we subtract (6-51) from (6-50), we obtain
$$
\Delta cons_t = \beta_2\, \Delta inc_t + \Delta u_t \tag{6-52}
$$
where $\Delta cons_t = cons_t - cons_{t-1}$, $\Delta inc_t = inc_t - inc_{t-1}$ and
$v_t = \Delta u_t = u_t - u_{t-1}$.
The equation (6-50) is known as a level form equation, while the equation (6-52)
is known as the first difference form equation. Both of them are used in empirical analysis.
If the disturbance in (6-50) is not autocorrelated, the disturbance in (6-52), which is equal to
$v_t = u_t - u_{t-1}$, will be autocorrelated, because vt and vt-1 have a common element (ut-1). In
any case it should be noted that the model (6-52), as specified, poses other econometric
problems which will not be addressed here.
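The autocorrelation induced by differencing can be made explicit. The following derivation is a sketch under the assumptions that $u_t$ has zero mean, constant variance $\sigma^2$ and no autocorrelation:

$$
\begin{aligned}
Var(v_t) &= Var(u_t) + Var(u_{t-1}) = 2\sigma^2 \\
Cov(v_t, v_{t-1}) &= E\left[(u_t - u_{t-1})(u_{t-1} - u_{t-2})\right] = -E(u_{t-1}^2) = -\sigma^2 \\
Corr(v_t, v_{t-1}) &= \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2}
\end{aligned}
$$

Under these assumptions the first-order autocorrelation of $v_t$ is negative, and the covariances between $v_t$ and $v_{t-s}$ are zero for $s \geq 2$ because the two differences then share no common element.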

6.6.2 Consequences of autocorrelation


The consequences of autocorrelation for OLS are somewhat similar to those of
heteroskedasticity. Thus, if the disturbances are autocorrelated, then the OLS estimator is
not BLUE because one can find an alternative unbiased estimator with smaller variance.
In addition to not being BLUE, the estimator obtained by OLS under the assumption of
autocorrelation presents the problem that the estimation of the covariance matrix of the
estimators calculated by the OLS usual formulas is biased. Consequently, the t and F
statistics based on this covariance matrix can lead to erroneous inferences.

6.6.3 Autocorrelation tests


In order to test autocorrelation, a scheme of autocorrelation of disturbances in the
alternative hypothesis must be defined. We will examine three of the best known tests. In
two of them (the Durbin and Watson test and Durbin’s h test) the alternative hypothesis
is a first-order autoregressive scheme, while the third one, called the Breusch–Godfrey
test, is a general test of autocorrelation applicable to higher-order autoregressive schemes.

Durbin and Watson test


The econometricians Durbin and Watson proposed the d test in 1950. DW is also
used to refer to this statistic.
Durbin and Watson proposed the following scheme for the disturbances ut:
$$
u_t = \rho u_{t-1} + \varepsilon_t \qquad |\rho| < 1 \qquad \varepsilon_t \sim NID(0, \sigma^2) \tag{6-53}
$$
The proposed scheme for ut is a first-order autoregressive scheme, since the
disturbances appear as regressand and also as regressor lagged a period. In the
terminology of time series analysis, the scheme (6-53) is called AR(1), that is to say, an
autoregressive process of order 1. The coefficient of this scheme is ρ, which is required
to be less than 1 in absolute value so that the disturbances do not have an explosive
character when n grows indefinitely. The variable εt is a random variable with a normal
and independent distribution (which is what NID means) with mean 0 and variance σ2.
Consequently, the variable εt fulfills the same assumptions as ut in the CLM.
Variables with these properties are often called white noise variables.
Depending on whether the sign of ρ is positive or negative, the autocorrelation will be
positive or negative. On the other hand, a one-tailed test is almost always performed,
namely the alternative hypothesis is taken as either positive autocorrelation or negative
autocorrelation.
The problem of constructing an autocorrelation test is that the disturbances are not
observable. The test must therefore be based on the residuals obtained from the OLS
estimation. This raises problems, since, under the null hypothesis that disturbances are
not autocorrelated, residuals are autocorrelated. In the construction of their test, Durbin
and Watson took these factors into account.
Let us now apply this test. Taking as a reference the scheme defined in (6-53),
Durbin and Watson formulate the following null and alternative hypothesis of positive
autocorrelation
$$\begin{aligned} H_0 &: \rho = 0 \\ H_1 &: \rho > 0 \end{aligned} \tag{6-54}$$
Thus, ut=εt is verified under the null hypothesis, i.e. the model fulfills the CLM
assumptions.
The statistic used by Durbin and Watson for testing hypotheses (6-54) is the d or
DW statistic, defined as follows:
$$d = DW = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2} \tag{6-55}$$

The statistical distribution of d, which is symmetrical with a mean equal to 2, is
very complicated, since it depends on the particular form of the matrix of regressors X, the
sample size (n) and the number of regressors (k) excluding the intercept.
However, for different levels of significance, Durbin and Watson obtained two
values (dL and dU) for each value of n and k. The rules to test positive autocorrelation are:
If $d < d_L$, there is positive autocorrelation.
If $d_L \le d \le d_U$, the test is not conclusive.          (6-56)
If $d > d_U$, there is no positive autocorrelation.
As can be seen, there are values where the test is not conclusive. This is due to the
effect that the particular configuration of the matrix X has on the distribution of d.
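The statistic (6-55) and the decision rule (6-56) are easy to compute once the OLS residuals are available. The sketch below is illustrative only: the function names are hypothetical, and the bounds dL and dU are assumed to be read from the Durbin and Watson tables by the user.

```python
import numpy as np

def durbin_watson(residuals):
    """d statistic of (6-55): sum of squared first differences over sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def dw_positive_test(d, d_lower, d_upper):
    """Decision rule (6-56) for H1: rho > 0, given the tabulated bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "not conclusive"
    return "no positive autocorrelation"

# Toy residuals that alternate in sign: numerator 4 + 4 + 4 = 12, denominator 4.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # 3.0
```

Residuals that change sign at every observation give a value of d well above 2, whereas residuals that keep the same sign for long runs give a value close to 0.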
If you want to test negative autocorrelation, the alternative hypothesis is the
following:
$$H_1: \rho < 0 \tag{6-57}$$
In order to apply the negative autocorrelation test, it is taken into account that the
statistic d has a symmetrical distribution ranging between 0 and 4. The rules, therefore,
are the following:

If $d > 4 - d_L$, there is negative autocorrelation.
If $4 - d_U \le d \le 4 - d_L$, the test is not conclusive.          (6-58)
If $d < 4 - d_U$, there is no negative autocorrelation.
The Durbin and Watson test is not applicable if there are lagged endogenous
variables as regressors.
To be applied to quarterly data, Wallis considered a fourth-order autoregressive
scheme:
$$u_t = \rho_4 u_{t-4} + \varepsilon_t, \qquad |\rho_4| < 1, \qquad \varepsilon_t \sim NID(0, \sigma^2) \tag{6-59}$$
The above scheme is similar to (6-53), the difference being that the disturbance of
the right hand side is lagged four periods. The Wallis statistic is similar to (6-55), but
takes into account that the residuals are lagged four periods. This author designed ad hoc
tables for testing models in which disturbances follow scheme (6-59).
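A statistic of this kind can be sketched by replacing the one-period difference in (6-55) with a four-period difference. The function name `lagged_dw` is hypothetical, and the resulting value must be compared with the tables designed by Wallis, not with the usual d tables.

```python
import numpy as np

def lagged_dw(residuals, lag=4):
    """Sum of squared differences between e_t and e_{t-lag}, divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    diff = e[lag:] - e[:-lag]
    return np.sum(diff ** 2) / np.sum(e ** 2)

# Toy residuals with an exact four-period pattern: every fourth-order difference is zero.
print(lagged_dw([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]))   # 0.0
```

With `lag=1` the function reproduces the d statistic of (6-55).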
EXAMPLE 6.11 Autocorrelation in the model to determine the efficiency of the Madrid Stock Exchange
In example 4.5, a model was formulated to determine the efficiency of the Madrid stock exchange.
Graphic 6.4 shows the standardized residuals (see footnote 4) corresponding to the estimation of this model, using file
bolmadef. The DW statistic is equal to 2.04. (The DW statistic appears in the output of any econometric
package). As the DW table does not have values for a sample size of 247, we use the corresponding values
to n=200 and k’= 1. (In the nomenclature of this test, k' is used for the total number of regressors excluding
the intercept). As the sample size is large we use a significance level of 1%. Upper and lower tabulated
values, which correspond to the above specification, are as follows:
dL=1.664; dU=1.684
Since DW=2.04>dU, we do not reject the null hypothesis that the disturbances are not
autocorrelated for a significance level of α=0.01, i.e. of 1%, versus the alternative hypothesis of positive
autocorrelation according to the scheme (6-53).
GRAPHIC 6.4. Standardized residuals in the estimation of the model to determine the efficiency of
the Madrid Stock Exchange.
EXAMPLE 6.12 Autocorrelation in the model for the demand for fish

Footnote 4: Standardized residuals are equal to the residuals divided by $\hat{\sigma}$.


In example 4.9 we estimated model (4-44), using file fishdem, to explain the demand for fish in
Spain. Graphic 6.5 shows the standardized residuals obtained in the estimation of this model. This
graph does not reveal any significant autocorrelation scheme. In this regard, it should be noted
that, over a total of 28 observations, the line joining the points of the residuals crosses the zero axis 11 times,
which indicates a degree of randomness in the distribution of the residuals.
The value of the DW statistic for testing the scheme (6-53) is 1.202. For n=28 and k'=3, and for a
significance level of 1%, we get the following tabulated values:
dL=0.969 dU=1.415
Since dL<1.202<dU, the test is not conclusive: the null hypothesis of no autocorrelation can be neither rejected nor maintained.
GRAPHIC 6.5. Standardized residuals in the model on the demand for fish.

Durbin’s h test
Durbin (1970) proposed a statistic, called h, to test the hypothesis (6-54) in the
case that one or more lagged endogenous variables appear as explanatory variables. The
expression of the h statistic is the following:
$$h = \hat{\rho} \sqrt{\frac{n}{1 - n\,\widehat{\operatorname{var}}(\hat{\beta}_j)}} \tag{6-60}$$
where $\hat{\rho}$ is the correlation coefficient between $\hat{u}_t$ and $\hat{u}_{t-1}$, n is the sample size, and
$\widehat{\operatorname{var}}(\hat{\beta}_j)$ is the estimated variance corresponding to the coefficient of the lagged endogenous
variable.
The statistic $\hat{\rho}$ can be estimated using the approximation $d \simeq 2(1 - \hat{\rho})$.
If the regressand appears with different time lags as regressors, the variance
corresponding to the regressor with the lowest lag is selected.
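The approximation between d and the estimated autocorrelation coefficient follows from expanding the numerator of (6-55). The derivation below is a sketch under two assumptions that are not stated in the text: the sums of squares of $\hat{u}_t$ and $\hat{u}_{t-1}$ are treated as equal (they differ in only one term), and $\hat{\rho}$ is taken to be the ratio of the sum of cross products to the sum of squares of the residuals.

```latex
\begin{aligned}
d &= \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}
   = \frac{\sum_{t=2}^{n} \hat{u}_t^2 + \sum_{t=2}^{n} \hat{u}_{t-1}^2
           - 2\sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{n} \hat{u}_t^2} \\
  &\simeq 2\left(1 - \frac{\sum_{t=2}^{n} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{n} \hat{u}_t^2}\right)
   = 2(1 - \hat{\rho})
\end{aligned}
```

Under this approximation, d is close to 0 when $\hat{\rho}$ is close to 1, close to 2 when $\hat{\rho}$ is close to 0, and close to 4 when $\hat{\rho}$ is close to −1, which agrees with the range between 0 and 4 of the d statistic.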
Under the null hypothesis in (6-54), the h statistic has the following asymptotic distribution:
$$h \xrightarrow[n \to \infty]{} N(0,1) \tag{6-61}$$
The critical region is therefore in the tails of the standard normal distribution: the
tail on the right for positive autocorrelation and the tail on the left for negative
autocorrelation.
The statistic (6-60) cannot be calculated if $n\,\widehat{\operatorname{var}}(\hat{\beta}_j) \ge 1$. In this case, Durbin
proposed an alternative procedure based on an auxiliary regression: the residuals are
taken as the regressand, and the regressors are those of the original model plus the
residuals lagged one period. This procedure is a particular case of the Breusch–Godfrey
test, which we will see next.
EXAMPLE 6.13 Autocorrelation in the case of Lydia E. Pinkham
In example 5.5 with the case of Lydia E. Pinkham, a model to explain the sales of a herbal extract
was estimated using file pinkham. Graphic 6.6 shows the graph of standardized residuals corresponding to
this model. As can be seen, it appears that the residuals are not distributed in a random way. Note, for
example, that from 1936 the residuals take positive values for 8 consecutive years.
The adequate test for autocorrelation in this model is Durbin’s h statistic, as there is a lagged
endogenous variable salest-1 in this model. The h statistic is:
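$$h = \hat{\rho}\sqrt{\frac{n}{1 - n\,\widehat{\operatorname{var}}(\hat{\beta}_j)}} = \left[1 - \frac{d}{2}\right]\sqrt{\frac{n}{1 - n\,\widehat{\operatorname{var}}(\hat{\beta}_j)}} = \left[1 - \frac{1.2012}{2}\right]\sqrt{\frac{53}{1 - 53 \times 0.0814^2}} = 3.61$$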

Given this value of h, the null hypothesis of no autocorrelation is rejected for α=0.01 or, even, for
α=0.001, according to the table of the normal distribution.
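The calculation of this example can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, $\hat{\rho}$ is obtained from the approximation $1 - d/2$ as in the example, and the upper-tail probability of the standard normal distribution is computed with the complementary error function from the standard library.

```python
import math

def durbin_h(d, n, var_beta_lag):
    """h statistic of (6-60), with rho estimated as 1 - d/2."""
    rho_hat = 1.0 - d / 2.0
    denom = 1.0 - n * var_beta_lag
    if denom <= 0:
        raise ValueError("h cannot be calculated when n * var(beta_j) >= 1")
    return rho_hat * math.sqrt(n / denom)

def upper_tail_normal(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Values of example 6.13: d = 1.2012, n = 53, standard error 0.0814.
h = durbin_h(d=1.2012, n=53, var_beta_lag=0.0814 ** 2)
print(round(h, 2))                  # 3.61
print(upper_tail_normal(h) < 0.001) # True
```

The function raises an error in the case in which the statistic cannot be calculated, so that the auxiliary regression proposed by Durbin can be used instead.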


GRAPHIC 6.6. Standardized residuals in the estimation of the model of the Lydia E. Pinkham case.

Breusch–Godfrey (BG) test


The Breusch–Godfrey (1978) test is a general test of autocorrelation applicable to
higher-order autoregressive schemes, and it can be used when there are stochastic
regressors such as the lagged regressand. This is an asymptotic test which is also known
as the LM (Lagrange multipliers) general test for autocorrelation.
In the BG test, it is assumed that the disturbances ut follow a pth-order
autoregressive model AR(p):
$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_p u_{t-p} + \varepsilon_t, \qquad |\rho| < 1, \qquad \varepsilon_t \sim NID(0, \sigma^2) \tag{6-62}$$
This is simply the extension of the AR(1) scheme of the Durbin and Watson test.
The null hypothesis and the alternative hypotheses to be tested are:
$$\begin{aligned} H_0 &: \rho_1 = \rho_2 = \cdots = \rho_p = 0 \\ H_1 &: H_0 \text{ is not true} \end{aligned}$$
The BG test involves the following steps:

Step 1. The original model is estimated and the OLS residuals ($\hat{u}_t$) are calculated.

Step 2. An auxiliary regression is estimated, in which the residuals ($\hat{u}_t$) are taken
as the regressand, and the regressors of the original model together with the residuals
lagged 1, 2, ... and p periods are taken as regressors:
$$\hat{u}_t = \alpha_1 + \alpha_2 x_{2t} + \cdots + \alpha_k x_{kt} + \gamma_1 \hat{u}_{t-1} + \cdots + \gamma_p \hat{u}_{t-p} + \varepsilon_t \tag{6-63}$$
The auxiliary regression should have an intercept, even if the original
model is estimated without it. In accordance with expression (6-63), in the
auxiliary regression there are k+p regressors in addition to the intercept.
Step 3. Designating by $R_{ar}^2$ the coefficient of determination of the auxiliary
regression, the statistic $nR_{ar}^2$ is calculated.
Under the null hypothesis, the BG statistic is distributed as follows:
$$BG = nR_{ar}^2 \xrightarrow[n \to \infty]{} \chi^2_{k+p} \tag{6-64}$$
The BG statistic is used to test the overall significance of the model (6-63).
For this purpose, the F statistic can also be used. However, in this case it
has only asymptotic validity, in the same way as with the BG statistic.
Step 4. For a significance level α, and designating by $\chi^{2(\alpha)}_{k+p}$ the corresponding value
in the χ2 table, the decision to make is the following:
If $BG > \chi^{2(\alpha)}_{k+p}$, H0 is rejected.
If $BG \le \chi^{2(\alpha)}_{k+p}$, H0 is not rejected.

As a particular case, the BG test can be applied to quarterly data using an AR(4)
scheme.
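Steps 1 to 3 of the BG test can be sketched with NumPy as follows. This is an illustrative implementation, not the procedure of any particular econometric package: the names `ols_residuals` and `breusch_godfrey` are hypothetical, the matrix `X` is assumed to contain a column of ones, and the first p observations are dropped in the auxiliary regression (some implementations fill the missing lagged residuals with zeros instead). The function returns only the statistic and the number of observations used; the critical value is taken from the χ2 table as described in Step 4.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals and R-squared of an OLS regression of y on X (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return resid, 1.0 - np.sum(resid ** 2) / tss

def breusch_godfrey(y, X, p):
    """Return (n_aux * R2_aux, n_aux) for the auxiliary regression (6-63)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    u, _ = ols_residuals(y, X)                      # Step 1: OLS residuals
    n = len(u)
    # Step 2: columns u_{t-1}, ..., u_{t-p}, aligned with observations p+1, ..., n.
    lags = np.column_stack([u[p - j:n - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[p:], lags])
    _, r2_aux = ols_residuals(u[p:], Z)
    n_aux = n - p
    return n_aux * r2_aux, n_aux                    # Step 3: n times R-squared

# Simulated data (hypothetical): disturbances follow an AR(1) scheme with rho = 0.8.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
bg, n_aux = breusch_godfrey(y, X, p=2)
print(bg, n_aux)
```

With strongly autocorrelated disturbances, as in this simulated sample, the statistic is far above the usual critical values of the χ2 distribution.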
EXAMPLE 6.14 Autocorrelation in a model to explain the expenditures of residents abroad
To explain the expenditures of residents abroad (turimp), the following model was estimated by
using quarterly data for the Spanish economy (file qnatacsp):
$$\widehat{\ln(turimp_t)} = \underset{(3.43)}{-17.31} + \underset{(0.276)}{2.0155} \ln(gdp_t)$$
$R^2 = 0.531$   DW = 2.055   n = 49
where gdp is gross domestic product.


GRAPHIC 6.7. Standardized residuals in the estimation of the model explaining the expenditures of
residents abroad.
Graphic 6.7 shows the standardized residuals corresponding to this model. As can be seen, the
residuals do not appear to be randomly distributed: for example, there are peaks every
4 quarters, which suggests that the autocorrelation follows an AR(4) scheme.
The BG statistic, calculated for an AR(4) scheme, is equal to $nR_{ar}^2 = 36.35$. Given this value of BG,
the null hypothesis of no autocorrelation is rejected for α=0.01, since $\chi^{2(\alpha)}_5 = 15.09$. In the auxiliary
regression, in which $\hat{u}_{t-1}$, $\hat{u}_{t-2}$, $\hat{u}_{t-3}$ and $\hat{u}_{t-4}$ have been used as regressors, $\hat{u}_{t-4}$ is the only significant
regressor.

6.6.4 HAC standard errors


As an extension of White’s heteroskedasticity-consistent standard errors that we
have seen in section 6.5.2, Newey and West proposed a method known as HAC
(heteroskedasticity and autocorrelation consistent) standard errors that allows OLS
standard errors to be corrected not only in situations of autocorrelation, but also in the
case of heteroskedasticity. Remember that the White method was designed specifically
for heteroskedasticity. It is important to point out that the Newey and West procedure is,
strictly speaking, valid in large samples and may not be appropriate in small ones. Note
that a sample of 50 observations is a reasonably large sample.
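The text does not give the formula of the HAC covariance matrix. The sketch below assumes the usual Newey and West form with Bartlett weights $1 - l/(L+1)$, where the truncation lag L (`max_lag`) is chosen by the user and no small-sample correction is applied. The function name `ols_with_hac` is hypothetical and `X` is assumed to include a column of ones.

```python
import numpy as np

def ols_with_hac(y, X, max_lag):
    """OLS coefficients with conventional and Newey-West (Bartlett weights) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta

    # Conventional covariance matrix: sigma^2 (X'X)^{-1}.
    cov_ols = (u @ u / (n - k)) * XtX_inv

    # Middle matrix: White term (lag 0) plus weighted autocovariance terms.
    Xu = X * u[:, None]                  # row t contains x_t * u_t
    S = Xu.T @ Xu
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)
        G = Xu[lag:].T @ Xu[:-lag]       # sum over t of u_t u_{t-lag} x_t x_{t-lag}'
        S += weight * (G + G.T)
    cov_hac = XtX_inv @ S @ XtX_inv

    return beta, np.sqrt(np.diag(cov_ols)), np.sqrt(np.diag(cov_hac))
```

With `max_lag=0` only the first term remains, and the result coincides with White's heteroskedasticity-consistent standard errors (without degrees-of-freedom correction). The coefficients are the same in all cases; only the standard errors, and therefore the t statistics, change.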
EXAMPLE 6.15 HAC standard errors in the case of Lydia E. Pinkham (Continuation of example 6.13)
Given the existence of autocorrelation in the model for the case of Lydia E. Pinkham, we have
calculated the standard errors according to the Newey and West procedure. These standard errors allow us
to carry out hypothesis tests on the parameters correctly. The available sample has 53 observations. Table 6.9
shows the t statistics obtained by the conventional procedure and by the HAC procedure, and the ratio
between them. The t statistics obtained by the HAC procedure are generally lower in absolute value than those obtained by the
conventional method, except for the advexp coefficient, whose t is surprisingly much higher when the procedure
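HAC is applied. The conclusions of the significance tests are not always the same for the two methods: at the
0.05 level the intercept is significant with the conventional t (2.64) but not with the HAC t (1.78), and at the
0.01 level d2 is significant only with the conventional t.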


TABLE 6.9. The t statistics, conventional and HAC, in the case of Lydia E. Pinkham.
regressor    t conventional    t HAC        ratio
intercept     2.644007          1.779151    1.49
advexp        3.928965          5.723763    0.69
sales(-1)     7.45915           6.9457      1.07
d1           -1.499025         -1.502571    1.00
d2            3.225871          2.274312    1.42
d3           -3.019932         -2.658912    1.14

6.6.5 Autocorrelation treatment


In order to estimate an econometric model where the disturbances follow the
AR(1) scheme, we first consider the case that the value of ρ is known. Although this is
more an academic assumption which would not happen in reality, it is convenient to adopt
this assumption initially for presentation purposes. Let us suppose the following linear
regression model:
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t \tag{6-65}$$
If we lag (6-65) one period and multiply both sides by ρ, we obtain
$$\rho y_{t-1} = \rho\beta_1 + \rho\beta_2 x_{2,t-1} + \rho\beta_3 x_{3,t-1} + \cdots + \rho\beta_k x_{k,t-1} + \rho u_{t-1} \tag{6-66}$$
Subtracting (6-66) from (6-65), we have:
$$y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2(x_{2t} - \rho x_{2,t-1}) + \cdots + \beta_k(x_{kt} - \rho x_{k,t-1}) + (u_t - \rho u_{t-1}) \tag{6-67}$$
According to the scheme given in (6-53), the disturbance term of (6-67) is
$u_t - \rho u_{t-1} = \varepsilon_t$, so it fulfills the CLM assumptions.
Model (6-67) can be estimated directly by least squares if the value of ρ is known.
The estimator obtained is close to the GLS estimator if the sample is large enough. Strictly
speaking, the GLS method requires not only transforming observations 2 through n according to
(6-67), but also transforming the first observation in the following way:
$$\sqrt{1-\rho^2}\,y_1 = \beta_1\sqrt{1-\rho^2} + \beta_2\sqrt{1-\rho^2}\,x_{21} + \cdots + \beta_k\sqrt{1-\rho^2}\,x_{k1} + \sqrt{1-\rho^2}\,u_1 \tag{6-68}$$
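The transformations (6-67) and (6-68) for a known value of ρ can be sketched as follows. The function names `ar1_transform` and `gls_ar1` and the toy data are hypothetical; `X` is assumed to include a column of ones, which the transformation turns into a column equal to $1-\rho$, so that the coefficient of that column estimates β1 directly.

```python
import numpy as np

def ar1_transform(y, X, rho, keep_first=True):
    """Quasi-difference y and X as in (6-67); optionally rescale the first row as in (6-68)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_star = y[1:] - rho * y[:-1]
    X_star = X[1:] - rho * X[:-1]        # a column of ones becomes (1 - rho)
    if keep_first:
        scale = np.sqrt(1.0 - rho ** 2)
        y_star = np.concatenate([[scale * y[0]], y_star])
        X_star = np.vstack([scale * X[0], X_star])
    return y_star, X_star

def gls_ar1(y, X, rho, keep_first=True):
    """OLS on the transformed data; the coefficients estimate beta_1, ..., beta_k."""
    y_star, X_star = ar1_transform(y, X, rho, keep_first)
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta

# Toy data (hypothetical) lying exactly on the line y = 1 + 2x.
x = np.arange(1.0, 9.0)
X = np.column_stack([np.ones(8), x])
y = 1.0 + 2.0 * x
print(gls_ar1(y, X, rho=0.5))            # approximately [1. 2.]
```

Setting `keep_first=False` drops the first observation, which corresponds to estimating (6-67) with observations 2 through n only.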
When ρ is estimated together with the other model parameters, the method is
called feasible GLS.
In general, in the application of feasible GLS the transformation of the first
observation according to (6-68) is ignored. Feasible GLS methods for estimating a model
in which the disturbances follow an AR(1) scheme can be grouped into three blocks: a)
two-step methods, b) iterative methods, and c) scanning methods.
Here we present two methods from block a): the direct method and Durbin's two-stage
method.


In the first stage of these two methods, ρ is estimated. In the direct method, ρ is
easily estimated from the DW statistic, using the approximation $DW \simeq 2(1 - \hat{\rho})$. In
Durbin's two-stage method, we estimate the following regression model, in which
the explanatory variables are the regressors of the original model, the regressors lagged
one period and the endogenous variable lagged one period:
$$y_t = \alpha_1 + \alpha_{2,0} x_{2t} + \alpha_{2,1} x_{2,t-1} + \cdots + \alpha_{k,0} x_{kt} + \alpha_{k,1} x_{k,t-1} + \rho y_{t-1} + \upsilon_t \tag{6-69}$$
The coefficient of the lagged endogenous variable is precisely the parameter ρ. In
the first stage, the model (6-69) is estimated by OLS, taking from it the estimate of ρ. In
the second stage, applicable to both methods, the model is transformed with the estimate
of ρ calculated in the first stage as follows:
$$y_t - \hat{\rho} y_{t-1} = \beta_1(1-\hat{\rho}) + \beta_2(x_{2t} - \hat{\rho} x_{2,t-1}) + \cdots + \beta_k(x_{kt} - \hat{\rho} x_{k,t-1}) + \xi_t \tag{6-70}$$

Applying OLS to the transformed model we obtain the parameter estimates. An
exposition of iterative and scanning methods can be found in Uriel, E.; Contreras, D.;
Moltó, M. L. and Peiró, A. (1990): Econometría. El modelo lineal. Editorial AC. Madrid.
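The two stages of the direct method can be put together in a short sketch. The names `ols` and `feasible_gls_direct` and the simulated data are hypothetical; `X` is assumed to include a column of ones, and the first observation is ignored in the second stage, as is usual in feasible GLS.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def feasible_gls_direct(y, X):
    """Two-step direct method: rho from DW, then OLS on the transformed model (6-70)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Stage 1: OLS residuals, DW statistic and rho_hat from DW ~ 2(1 - rho_hat).
    _, u = ols(y, X)
    dw = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)
    rho_hat = 1.0 - dw / 2.0
    # Stage 2: quasi-difference observations 2, ..., n and apply OLS.
    y_star = y[1:] - rho_hat * y[:-1]
    X_star = X[1:] - rho_hat * X[:-1]    # the column of ones becomes (1 - rho_hat)
    beta_fgls, _ = ols(y_star, X_star)
    return beta_fgls, rho_hat

# Simulated data (hypothetical): y = 1 + 2x + u with AR(1) disturbances, rho = 0.6.
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
beta_fgls, rho_hat = feasible_gls_direct(y, X)
print(beta_fgls, rho_hat)
```

Because the transformed intercept column is $1-\hat{\rho}$, the first element of `beta_fgls` is an estimate of β1 itself and needs no further rescaling.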

Exercises
Exercise 6.1 Let us consider that the population model is the following:
$$y_i = \beta_1 + \beta_2 x_i + u_i \tag{1}$$
Instead, the following model is estimated:
$$\tilde{y}_i = \tilde{\beta}_2 x_i \tag{2}$$

Is $\tilde{\beta}_2$, obtained by applying OLS in (2), an unbiased estimator of $\beta_2$?

Exercise 6.2 Let us consider that the population model is the following:
$$y_i = \beta_2 x_i + u_i \tag{1}$$
Instead, the following model is estimated:
$$\tilde{y}_i = \tilde{\beta}_1 + \tilde{\beta}_2 x_i \tag{2}$$
Is $\tilde{\beta}_2$, obtained by applying OLS in (2), an unbiased estimator of $\beta_2$?

Exercise 6.3 Let the following models be:


$$imp = \beta_1 + \beta_2\, gdp + \beta_3\, rpimp + u \tag{1}$$
$$\ln(imp) = \beta_1 + \beta_2 \ln(gdp) + \beta_3 \ln(rpimp) + u \tag{2}$$
where imp is the import of goods, gdp is gross domestic product at market prices, and
rpimp are the relative prices imports/gdp. The magnitudes imp and gdp are expressed in
millions of pesetas.
a) Using a sample of the period 1971-1997 for Spain (file importsp), estimate
models (1) and (2).
b) Interpret coefficients β2 and β3 in both models.


c) Apply the RESET procedure to model (1).


d) Apply the RESET procedure to model (2).
e) Choose the most adequate specification using the p-values obtained in
sections c) and d).
Exercise 6.4 Consider the following model of food demand
$$food = \beta_1 + \beta_2\, rp + \beta_3\, inc + u$$
where food is spending on food, rp are the relative prices and inc is disposable income.
Researcher A omitted variable inc, obtaining the following estimation:
$$\widehat{food}_i = \underset{(11.85)}{89.97} + \underset{(0.118)}{0.107}\, rp_i$$
Researcher B, who is more careful, got the following estimation:
$$\widehat{food}_i = \underset{(5.84)}{92.05} - \underset{(0.067)}{0.142}\, rp_i + \underset{(0.031)}{0.236}\, inc_i$$
(The numbers in parentheses are standard errors of estimators.)


During the discussion between researcher A and researcher B about which of
the two estimated models is more appropriate, researcher A tries to justify omitting inc
by arguing that the omission was due to a problem of multicollinearity.
a) In favor of which researcher would you be in view of the results obtained?
Explain your choice.
b) Obtain analytically the bias of the estimator of β2 in the estimation carried
out by researcher A.
Exercise 6.5 The following production function is formulated:
$$\ln(output) = \beta_1 + \beta_2 \ln(labor) + \beta_3 \ln(capital) + u$$
where output is the amount of output produced, labor is the amount of labor, capital is
the amount of capital.
The following data correspond to 9 companies:
outputi 230 140 180 270 300 240 230 350 120
labori 30 10 20 40 50 20 30 60 40
capitali 160 50 100 200 240 190 160 300 150

A researcher estimates the model, mistakenly using only 8 observations, and obtains the
following results:
$$\widehat{output}_i = \underset{(1.956)}{97.259} + \underset{(0.124)}{0.970}\, labor_i + \underset{(0.027)}{0.650}\, capital_i$$
$R^2 = 0.999$   F = 3422
The numbers in parentheses are the standard errors of the estimators and the F
statistic corresponds to the test of the whole model.
When he realizes his mistake, he estimates the model with all observations (n=9),
obtaining in this case the following results:
$$\widehat{output}_i = \underset{(32.046)}{75.479} - \underset{(1.742)}{1.970}\, labor_i + \underset{(0.376)}{1.272}\, capital_i$$
$R^2 = 0.824$   F = 14.056


His confusion is great when comparing the two estimates, and he cannot
understand why the results become very different when using one more observation. Can
we find any reason that could justify these differences?
Exercise 6.6 Suppose in the model
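$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
the R-squared obtained from regressing x1 on x2, which will be called $R^2_{1/2}$, is zero.
Run the following regressions:
$$y = \lambda_0 + \lambda_1 x_1 + u$$
$$y = \gamma_0 + \gamma_1 x_2 + u$$
a) Will $\hat{\lambda}_1$ be equal to $\hat{\beta}_1$ and $\hat{\gamma}_1$ be equal to $\hat{\beta}_2$?
b) Will $\hat{\beta}_0$ be equal to $\hat{\lambda}_0$ or $\hat{\beta}_0$ be equal to $\hat{\gamma}_0$?
c) Will var($\hat{\lambda}_1$) be equal to var($\hat{\beta}_1$) and var($\hat{\gamma}_1$) be equal to var($\hat{\beta}_2$)?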

Exercise 6.7 An analyst wants to estimate the following model using the observations of
the attached table:
$$y_i = e^{\beta_1} x_{2i}^{\beta_2} x_{3i}^{\beta_3} x_{4i}^{\beta_4} e^{u_i}$$
x2 x3 x4
3 12 4
2 10 5
4 4 1
3 9 3
2 6 3
5 5 1
What problems can occur in the estimation of this model with these data?
Exercise 6.8 In exercise 4.8, using the file airqualy, the following model was estimated:
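$$\widehat{airqual}_i = \underset{(10.19)}{97.35} + \underset{(0.0311)}{0.0956}\, popln_i - \underset{(0.0055)}{0.0170}\, medincm_i - \underset{(0.0089)}{0.0254}\, poverty_i - \underset{(0.0017)}{0.0031}\, fueoil_i - \underset{(0.0025)}{0.0011}\, valadd_i$$
$R^2 = 0.415$   n = 30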
a) Calculate the statistic VIF for each coefficient.
b) What is your conclusion?
Exercise 6.9 To examine the effects of firm performance on CEO salary, the following
model is formulated:
$$\ln(salary) = \beta_1 + \beta_2\, roa + \beta_3 \ln(sales) + \beta_4\, profits + \beta_5\, tenure + \beta_6\, age + u$$
where roa is the ratio profits/assets expressed as a percentage, tenure is the number of
years as CEO (=0 if less than six months), and age is age in years. Salaries are expressed
in thousands of dollars, and sales and profits in millions of dollars.
a) Using the full sample (447 observations) of the file ceoforbes, estimate the
model by OLS.


b) Apply the normality test to the residuals.


c) Using the first 60 observations, estimate the model by OLS. Compare the
coefficients and the R2 of this estimation with that obtained in section a).
What is your conclusion?
d) Apply the normality test to the residuals obtained in section c). What is
your conclusion comparing this result with that obtained in section b)?
Exercise 6.10 Let the following model be
$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad [1]$$
where
$$\sigma_i^2 = \sigma^2 x_i, \qquad x_i > 0, \; \forall i$$
Apply generalized least squares to estimate β2 in model [1].
Exercise 6.11 Let the following model be
$$y_i = \beta x_i + u_i \qquad [1]$$
where
$$\sigma_i^2 = \sigma^2 x_i, \qquad x_i > 0, \; \forall i$$
a) Estimate β in model [1] using generalized least squares.
b) Calculate the variance of the estimator of β.
Exercise 6.12 Let the model be
$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad [1]$$
where the variance of the disturbances is equal to
$$\sigma_i^2 = \sigma^2 x_i, \qquad x_i > 0, \; \forall i$$
1) Applying OLS to the model [1] and taking into account the Gauss-Markov
assumptions, the variance of the estimator according to (2-16) is
$$\frac{\sigma^2}{\sum (x_i - \bar{x})^2} \qquad [2]$$
2) Applying OLS to the model [1] and considering that $\sigma_i^2 = \sigma^2 x_i$ and the
remaining Gauss-Markov assumptions, the variance of the estimator is
therefore equal to
$$\frac{\sigma^2 \sum (x_i - \bar{x})^2 x_i}{\left(\sum (x_i - \bar{x})^2\right)^2} \qquad [3]$$
3) Applying GLS to model [1] and considering that $\sigma_i^2 = \sigma^2 x_i$ and the
remaining Gauss-Markov assumptions, the variance of the estimator is
therefore equal to
$$\frac{\sigma^2}{\sum \dfrac{(x_i - \bar{x})^2}{x_i}} \qquad [4]$$

a) Are the variances [2] and [3] correct?

b) Show that [4] is less than or equal to [3]. (Hint: Apply the Cauchy-Schwarz
inequality, which says that $\left[\sum w_i z_i\right]^2 \le \left[\sum w_i^2\right]\left[\sum z_i^2\right]$.)

Exercise 6.13 Let the following model be


$$hostel = \alpha_1 + \alpha_2\, inc + u$$
where hostel is the spending on hotels and inc the yearly disposable income.
The following information on 9 families was obtained:
family hostel inc
1 13 300
2 3 200
3 38 700
4 47 900
5 14 400
6 18 500
7 25 800
8 1 100
9 21 600

Hostel and income variables are expressed in thousands of pesetas.


a) Estimate the model by OLS.
b) Apply the White heteroskedasticity test.
c) Apply the Breusch-Pagan-Godfrey heteroskedasticity test.
d) Do you think it is appropriate to use the above heteroskedasticity tests in
this case?
Exercise 6.14 With reference to the model seen in exercise 4.5, we assume now that
var(ε i ) = σ 2 ln( yi )
a) Are, in this case, the OLS estimators unbiased?
b) Are the OLS estimators efficient?
c) Could you suggest an estimator better than OLS?
Exercise 6.15 Indicate and explain which of the following statements are true when there
is heteroskedasticity:
a) The OLS estimators are no longer BLUE.
b) The OLS estimators $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \ldots, \hat{\beta}_k$ are inconsistent.
c) The conventional t and F tests are not valid.
Exercise 6.16 In exercise 3.19, using the file consumsp, the Brown model was estimated
for the Spanish economy in the period 1954-2010. The results obtained were the
following:
$$\widehat{conspc}_t = \underset{(84.88)}{-7.156} + \underset{(0.0857)}{0.3965}\, incpc_t + \underset{(0.0903)}{0.5771}\, conspc_{t-1}$$
$R^2 = 0.997$   RSS = 1891320   n = 56


Using the residuals of the above fitted model, the following regression was
obtained:

$$\widehat{\hat{u}_t^2} = 141568 + 89.71\, incpc_t - 149.2\, conspc_{t-1} - 0.183\, incpc_t^2 - 0.221\, conspc_{t-1}^2 + 0.406\, incpc_t \times conspc_{t-1}$$
$R^2 = 0.285$
a) Is there heteroskedasticity in the consumption function?
b) The following estimation, with White heteroskedasticity-consistent
standard errors, is obtained:
$$\widehat{conspc}_t = \underset{(66.92)}{?} + \underset{(0.0669)}{?}\, incpc_t + \underset{(0.0741)}{?}\, conspc_{t-1}$$
Can you fill the blanks above? Please do so.
Explain the difference between the White heteroskedasticity-consistent
standard errors and the usual standard errors of the initial equation.
c) Test whether the coefficient on incpc is equal to 0.5. What standard errors
are you going to use in the inference process? Why?
Exercise 6.17 Assume the following specification:
$$c_i = \gamma_1 + \gamma_2 h_i + \gamma_3 m_i + u_i$$
$$\sigma_i^2 = \sigma^2 h_i^2$$
Would it be appropriate to perform the following transformation in order to
eliminate the heteroskedasticity?
$$\frac{c_i}{h_i} = \gamma_1 + \gamma_2 h_i + \gamma_3 m_i + u_i$$
Explain your answer.
Exercise 6.18 Let the following model be
y =β1 + β 2 x + u
and we have the following information:
yi xi uˆi
2 -3 1.37
3 -2 -0.42
7 -1 0.79
6 0 -3.00
15 1 3.21
8 2 -6.58
22 3 4.63

a) Apply the White heteroskedasticity test.


b) Apply the Breusch-Pagan-Godfrey heteroskedasticity test.
c) Why is the significance obtained with both tests so different?
Exercise 6.19 Answer the following questions
a) Explain in detail what is the problem of heteroskedasticity in the linear
regression model.
b) Illustrate briefly the problem of heteroskedasticity with an example.
c) Propose solutions to the heteroskedasticity problem.

Exercise 6.20 Using a sample corresponding to 17 regions, the following estimations
were obtained:
$$\hat{y}_i = -309.8 + 0.76 z_i + 3.05 h_i \qquad R^2 = 0.989$$
$$\hat{u}_i^2 = -1737.2 - 17.8 z_i + 0.09 z_i^2 + 0.65 z_i h_i + 10.6 h_i - 0.31 h_i^2 \qquad R^2 = 0.705$$
where y is the expenditure on education, z is GDP and h is the number of inhabitants.
a) Is there a problem of heteroskedasticity? Detail the procedure followed in
testing.
b) Assuming that the presence of heteroskedasticity is detected in the
regression model, what solution would you take to test the significance of
the explanatory variables of the model? Explain your answer.
Exercise 6.21 Using data from Spanish economy for the period 1971-1997 (file importsp),
the following model was estimated to explain the Spanish imports (imp):
$$\widehat{\ln(imp_t)} = \underset{(2.81)}{-26.58} + \underset{(0.162)}{2.4336} \ln(gdp_t) - \underset{(0.021)}{0.4494} \ln(rpimp_t)$$
$R^2 = 0.997$   n = 27
where gdp is the gross domestic product at market prices, and rpimp are the relative prices
imports/gdp. The variables imp and gdp are expressed in millions of pesetas.
a) Set up and estimate the auxiliary regression to perform the Breusch-Pagan-
Godfrey heteroskedasticity test.
b) Apply the Breusch-Pagan-Godfrey heteroskedasticity test using the
auxiliary regression run in section a).
c) Set up the auxiliary regression to perform the complete White
heteroskedasticity test.
d) Apply the complete White heteroskedasticity test using the auxiliary
regression run in section c).
e) Set up the auxiliary regression to perform the simplified White
heteroskedasticity test.
f) Apply the simplified White heteroskedasticity test using the auxiliary
regression run in section e).
g) Compare the results of the test carried out in sections b), d) and f).
Exercise 6.22 Using data from file tradocde, the following model has been estimated to
explain the imports (impor) in OECD countries:
$$\widehat{\ln(impor_i)} = \underset{(6.67)}{18.01} + \underset{(0.658)}{1.6425} \ln(gdp_i) - \underset{(0.636)}{0.5151} \ln(popul_i)$$
$R^2 = 0.614$   n = 34
where gdp is gross domestic product at market prices, and popul is the population of each
country.
a) What is the interpretation of the coefficient on ln(gdp)?
b) Set up the auxiliary regression to perform the White heteroskedasticity test.
c) Apply the White heteroskedasticity test using the auxiliary regression run
in section b).
d) Test whether the import/gdp elasticity is greater than 1. To make this test,
do you need to use the White heteroskedasticity-robust standard errors?


Exercise 6.23 Explain in detail what the appropriate autocorrelation test would be in each
situation:
a) When the model has no lagged endogenous variables and the observations
are annual.
b) When the model has lagged endogenous variables and the observations are
annual.
c) When the model has no lagged endogenous variables and the observations
are quarterly.
Exercise 6.24 Two alternative models were used to estimate the average cost of annual
car production of a particular brand in the period 1980-1999:
$$c = \alpha + \beta p + u \qquad R^2 = 0.848; \quad \bar{R}^2 = 0.812; \quad d = DW = 0.51$$
$$c = \alpha + \beta p + \gamma p^2 + u \qquad R^2 = 0.852; \quad \bar{R}^2 = 0.811; \quad d = DW = 2.11$$
a) When comparing the two estimations, indicate if you detect any
econometric problem. Explain it.
b) Depending on your answer to the previous section, which of the two
models would you choose?
Exercise 6.25 In the period 1950-1980, the following production function is estimated:
$$\ln(o_t) = \underset{(0.24)}{-3.94} + \underset{(0.083)}{1.45} \ln(l_t) + \underset{(0.048)}{0.38} \ln(k_t)$$
$R^2 = 0.994$   DW = 0.858   $\hat{\rho} = 0.559$
where o is output, l is labor, and k is capital.
(The numbers in parentheses are standard errors of the estimators.)
a) Test whether there is autocorrelation.
b) If the model had a lagged endogenous variable as an explanatory variable,
indicate how you would test whether there is autocorrelation.
Exercise 6.26 Using 38 annual observations, the following demand function for a product
was estimated:
$$d_i = 2.47 + 0.35 p_i + 0.9 d_{i-1} \qquad R^2 = 0.98 \qquad DW = 1.82$$
(0.39) (0.06)
where d is the quantity demanded, and p is the price.
(The numbers in parentheses are standard errors of the estimators.)
a) Is there a problem of autocorrelation? Explain your answer.
b) List the conditions under which it would be appropriate to use the Durbin
Watson statistic.
Exercise 6.27 The following model of housing demand with annual
observations for the period 1960-1994 is estimated:
$$\widehat{\ln(rent_t)} = \underset{(0.15)}{-0.39} + \underset{(0.05)}{0.31} \ln(inc_t) - \underset{(0.02)}{0.67} \ln(price_t) + \underset{(0.04)}{0.70} \ln(rent_{t-1})$$
$R^2 = 0.999$   DW = 0.52
where rent is spending on rent, inc is disposable income, and price is the price of housing.
(The numbers in parentheses are standard deviations of the estimators.)


a) Test whether there is autocorrelation.


b) Taking into account the conclusions reached in section a), how would you
carry out the significance tests for each one of the coefficients? Explain
your answer.
Exercise 6.28 Answer the following questions:
a) In a model to explain the sales, the estimation is carried out using quarterly
data. Explain how you can reasonably test whether there is autocorrelation.
b) Describe in detail, introducing assumptions that you consider appropriate,
how you would estimate the model when the null hypothesis of no
autocorrelation is rejected.
Exercise 6.29 In the estimation of the Keynesian consumption function for the French
economy, the following results were obtained:
$$\widehat{cons}_t = \underset{(0.73)}{1.22} + \underset{(79.39)}{0.854}\, inc_t$$
$R^2 = 0.983$   DW = 0.4205   n = 30


(The numbers in parentheses are the t statistics of the estimators).
A researcher believes the focus should be placed on the saving function, rather
than on the consumption function, proposing the following model:
$$saving_t = \alpha_1 + \alpha_2\, inc_t + v_t$$
where
$$saving_t = inc_t - cons_t$$
a) Obtain the estimates of α1 and α2.
b) Estimate the variances of α̂1 and α̂ 2 .
c) Calculate the DW statistic of the saving model.
d) Calculate the R2 of the saving model.
Exercise 6.30 Let the model be
$$y_t = \beta x_t + u_t \qquad [1]$$
$$u_t = \rho u_{t-1} + \varepsilon_t; \qquad E[\varepsilon_t^2] = \sigma^2 \;\; \forall t$$
a) If model [1] is transformed by taking first differences, under what
circumstances is the transformed model preferable to model [1]?
b) Is it appropriate to use the R2 to compare model [1] and the transformed
model? Explain your answer.
Exercise 6.31 Let the model be:
$$y_t = \beta_1 + \beta_2 x_t + u_t \qquad [1]$$
The following sample of observations is available for the variables x and y:
y_i    6    3    1    1    1    4    6   16   25   36   49   64
x_i   -4   -3   -2   -1    1    2    3    4    5    6    7    8

a) Estimate model [1] by OLS and calculate the corresponding adjusted
coefficient of determination.
b) Calculate the Durbin-Watson statistic for the estimation made in a).


c) In view of the Durbin-Watson test and the plot of the fitted
line and residuals, is it appropriate to reformulate model [1]? Justify your
answer and, if so, estimate the alternative model that you consider
the most appropriate for the data.
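Parts a) and b) can be carried out with a few lines of code. The following is a minimal sketch using NumPy only, with the twelve observations given above; the variable names (`b1`, `b2`, `adj_r2`, `dw`) are illustrative choices, not notation from the text.

```python
import numpy as np

# Sample given in the exercise.
y = np.array([6, 3, 1, 1, 1, 4, 6, 16, 25, 36, 49, 64], dtype=float)
x = np.array([-4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 7, 8], dtype=float)

n, k = len(y), 2                        # observations and estimated parameters
X = np.column_stack([np.ones(n), x])    # regressors: constant and x

# OLS estimates of beta_1 and beta_2.
(b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - (b1 + b2 * x)

# R-squared and adjusted R-squared.
ssr = np.sum(resid ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ssr / sst
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Durbin-Watson statistic.
dw = np.sum(np.diff(resid) ** 2) / ssr

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"adjusted R2 = {adj_r2:.4f}, DW = {dw:.4f}")
```

Inspecting the signs of `resid` in order is a quick substitute for the residual plot mentioned in part c): long runs of residuals with the same sign point to positive autocorrelation, a functional form problem, or both.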
Exercise 6.32 Let the model be:
$$y_t = \beta_1 + \beta_2 x_t + u_t$$
$$u_t = \rho u_{t-1} + \varepsilon_t; \qquad \varepsilon_t \sim NI(0, \sigma^2)$$
The following additional information is also available:
$\rho = 0.5$
y_i   22   26   32   31   40   46   46   50
x_i    4    6   10   12   13   16   20   22

a) Estimate the model by OLS.


b) Estimate the model by GLS without transforming the first observation.
c) Which of the two estimators of β2 is more efficient?
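Parts a) and b) can be sketched as follows. The GLS step here is implemented as OLS on quasi-differenced data with the known value $\rho = 0.5$, dropping the first observation as the exercise asks; this is one standard way of computing it, and the variable names are illustrative.

```python
import numpy as np

# Data and autocorrelation coefficient given in the exercise.
y = np.array([22, 26, 32, 31, 40, 46, 46, 50], dtype=float)
x = np.array([4, 6, 10, 12, 13, 16, 20, 22], dtype=float)
rho = 0.5

X = np.column_stack([np.ones_like(x), x])

# a) OLS on the original data.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# b) GLS without the first observation: quasi-difference y and every
#    column of X. The constant column becomes (1 - rho), so the
#    coefficient on it is still beta_1.
y_star = y[1:] - rho * y[:-1]
X_star = X[1:] - rho * X[:-1]
b_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print("OLS:", b_ols)
print("GLS:", b_gls)
```

Comparing the two slope estimates does not by itself answer part c), which concerns the variances of the estimators under the assumed AR(1) disturbance.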
Exercise 6.33 In a study on product demand, the following results were obtained:
$$\hat{y}_t = \underset{(7.17)}{2.30} + \underset{(0.05)}{0.86}\,x_t$$
$$R^2 = 0.9687 \qquad DW = 3.4 \qquad n = 15$$
(The numbers in parentheses are standard errors of the estimators.)
Furthermore, the following additional information about the residual regressions is
available:
1. $$\hat{u}_t = \underset{(0.210)}{0.167} + \underset{(0.180)}{0.127}\,x_t$$

2. $$\hat{u}_t = \underset{(0.098)}{0.231} + \underset{(0.095)}{0.218}\,x_t^{1/2}$$

a) Detect whether there is autocorrelation.


b) Detect whether there is heteroskedasticity.
c) What would be the most appropriate procedure to solve the potential
problem of heteroskedasticity?
Exercise 6.34 Using a sample of the period 1971-1997 (file importsp), the following
model was estimated, using HAC standard errors, to explain the imports of goods in Spain
(imp):
$$\widehat{\ln(imp_t)} = \underset{(3.65)}{-26.58} + \underset{(0.210)}{2.434}\,\ln(gdp_t) - \underset{(0.023)}{0.4494}\,\ln(rpimp_{t-1})$$
$$R^2 = 0.997 \qquad DW = 0.73 \qquad n = 27$$
where gdp is gross domestic product at market prices, and rpimp are the relative prices
import/gdp. Both magnitudes are expressed in millions of pesetas.
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the coefficient on rpimp.
b) Is there autocorrelation in this model?


c) Test whether the imp/gdp elasticity plus four times the imp/rpimp elasticity
is equal to zero. (Additional information: $var(\hat{\beta}_2) = 0.044247$; $var(\hat{\beta}_3) = 0.000540$;
and $cov(\hat{\beta}_2, \hat{\beta}_3) = 0.004464$.)
d) Test the overall significance of this model.
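The statistic needed in part c) is the t ratio of a linear combination of coefficients. The following is a minimal sketch of the arithmetic, assuming that the third figure in the additional information is the covariance between $\hat{\beta}_2$ and $\hat{\beta}_3$; the variable names are illustrative.

```python
import math

# Estimates and (co)variances taken from the exercise.
b2, b3 = 2.434, -0.4494
var_b2, var_b3 = 0.044247, 0.000540
cov_b23 = 0.004464  # assumed to be cov(b2, b3)

# H0: beta_2 + 4 * beta_3 = 0
estimate = b2 + 4 * b3
# var(b2 + 4*b3) = var(b2) + 16*var(b3) + 2*4*cov(b2, b3)
variance = var_b2 + 16 * var_b3 + 2 * 4 * cov_b23
std_error = math.sqrt(variance)
t_stat = estimate / std_error

print(f"estimate = {estimate:.4f}, se = {std_error:.4f}, t = {t_stat:.3f}")
```

The resulting `t_stat` is then compared with the critical value for the chosen significance level. Because the model was estimated with HAC standard errors, the reference distribution is only approximately valid in a sample of this size.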
Exercise 6.35 Using a sample for the period 1954-2009 (file electsp), the following model
was estimated to explain the electricity consumption in Spain (conselec):
$$\widehat{\ln(conselec_t)} = \underset{(0.46)}{-9.98} + \underset{(0.035)}{1.469}\,\ln(gdp_t)$$
$$R^2 = 0.9805 \qquad DW = 0.18 \qquad n = 37 \qquad (1)$$


where gdp is gross domestic product at 1986 market prices. The variable conselec is
expressed in thousands of tonnes of oil equivalent (ktoe) and gdp is expressed in millions
of pesetas.
(The numbers in parentheses are standard errors of the estimators.)
a) Test whether there is autocorrelation applying the Durbin-Watson statistic.
b) Test whether there is autocorrelation applying the Breusch-Godfrey
statistic for an AR(2) scheme.
c) The following model is also estimated:
$$\widehat{\log(conselec_t)} = \underset{(0.75)}{-0.917} + \underset{(0.107)}{0.164}\,\log(gdp_t) + \underset{(0.072)}{0.871}\,\log(conselec_{t-1})$$
$$R^2 = 0.997 \qquad DW = 0.93 \qquad n = 36 \qquad (2)$$
Test whether there is autocorrelation applying the procedure you consider
appropriate.
d) Test whether the conselec/gdp elasticity in an equilibrium situation
($\ln(conselec^e) = \beta_1 + \beta_2 \ln(gdp^e) + \beta_3 \ln(conselec^e)$) is greater than 1,
using an adequate procedure.
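Because equation (2) includes the lagged dependent variable as a regressor, the DW statistic is biased toward 2, and one possible procedure for part c) is Durbin's h statistic, $h = \hat{\rho}\sqrt{n / (1 - n\,var(\hat{\beta}_3))}$ with $\hat{\rho} \approx 1 - DW/2$. The following is a minimal sketch of that calculation using the figures reported for equation (2); the function name `durbin_h` is illustrative, and other procedures (for example, Breusch-Godfrey) are equally admissible.

```python
import math

def durbin_h(dw, n, se_lag):
    """Durbin's h from the DW statistic, the sample size and the standard
    error of the coefficient on the lagged dependent variable."""
    rho_hat = 1.0 - dw / 2.0            # usual approximation to rho
    denom = 1.0 - n * se_lag ** 2
    if denom <= 0:
        # The statistic cannot be computed when n * var >= 1.
        raise ValueError("n * var(coefficient on the lag) must be below 1")
    return rho_hat * math.sqrt(n / denom)

# Figures reported for equation (2) in the exercise.
h = durbin_h(dw=0.93, n=36, se_lag=0.072)
print(f"Durbin h = {h:.3f}")
```

Under the null hypothesis of no autocorrelation, h is asymptotically standard normal, so the value obtained is compared with the N(0,1) critical value for the chosen significance level.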
Exercise 6.36 The Phillips curve represents the relationship between the rate of inflation
(inf) and the unemployment rate (unempl). While it has been observed that there is a stable
short-run trade-off between unemployment and inflation, this has not been observed in the
long run.
The following model reflects the Phillips curve:
$$inf = \beta_1 + \beta_2\,unempl + u$$
Using a sample for the Spanish economy in the period 1970-2010 (file phillipsp),
the following results were obtained:
$$\widehat{inf}_t = \underset{(1.79)}{12.59} - \underset{(0.120)}{0.3712}\,unempl_t$$
$$R^2 = 0.198; \qquad DW = 0.219; \qquad n = 41$$


(The numbers in parentheses are standard deviations of the estimators).
a) Interpret the coefficient on unempl.
b) Test whether there is first-order autocorrelation using the Durbin-Watson
test.
c) Using the information available so far, can you test the significance of the
coefficient on unempl adequately?


d) Using the HAC standard errors, test the significance of the coefficient on
unempl.
Exercise 6.37 It is important to remark that the Phillips curve is a relative relationship.
Inflation is considered low or high relative to the expected rate of inflation and
unemployment is considered low or high relative to the so-called natural rate of
unemployment. In the augmented Phillips curve this is taken into account:
$$inf_t - inf^{e}_{t/t-1} = \beta_2 (unempl_t - \lambda_0) + u_t$$
where $\lambda_0$ is the natural rate of unemployment and $inf^{e}_{t/t-1}$ is the expected rate of inflation
for t formed in t-1. If we consider that the expected inflation for t is equal to the inflation
in t-1 ($inf^{e}_{t/t-1} = inf_{t-1}$) and $\beta_1 = -\beta_2 \lambda_0$, the augmented Phillips curve can be written as:
$$inf_t - inf_{t-1} = \beta_1 + \beta_2\,unempl_t + u_t$$
a) Using file phillipsp, estimate the above model.
b) Interpret the coefficient on unempl.
c) Test whether there is second order autocorrelation.
d) Test whether the natural rate of unemployment is greater than 10.

Appendix 6.1
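First, we express $\tilde{\beta}_2$, taking into account that y is generated by
model (6-8):
$$
\begin{aligned}
\tilde{\beta}_2 &= \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
= \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,y_i}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)(\beta_1+\beta_2 x_{2i}+\beta_3 x_{3i}+u_i)}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \beta_2\,\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{2i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
+ \beta_3\,\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
+ \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,u_i}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \beta_2 + \beta_3\,\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
+ \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,u_i}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
\end{aligned}
\tag{6-71}
$$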

If we take expectations on both sides of (6-71), we have


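$$
\begin{aligned}
E(\tilde{\beta}_2) &= \beta_2 + \beta_3\,\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
+ \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,E(u_i \mid x_2, x_3)}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \beta_2 + \beta_3\,\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
\end{aligned}
\tag{6-72}
$$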
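The decomposition behind this result can be verified numerically. The following is a minimal sketch with simulated data: the sample size, the parameter values and the way the two regressors are generated are all assumptions made for illustration. It computes the slope of the regression of y on the first regressor alone and compares it with the true slope plus the bias term plus the disturbance term.

```python
import numpy as np

# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(42)
n = 200
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)   # omitted regressor, correlated with x2
u = rng.normal(size=n)
beta1, beta2, beta3 = 1.0, 2.0, -1.5
y = beta1 + beta2 * x2 + beta3 * x3 + u

d = x2 - x2.mean()                   # deviations of x2 from its mean
sxx = np.sum(d ** 2)

# Slope of the misspecified regression of y on a constant and x2 only.
beta2_tilde = np.sum(d * y) / sxx

# Right-hand side of the decomposition: true slope + bias term + noise term.
bias_term = beta3 * np.sum(d * x3) / sxx
noise_term = np.sum(d * u) / sxx
rhs = beta2 + bias_term + noise_term

print(f"beta2_tilde = {beta2_tilde:.6f}, decomposition = {rhs:.6f}")
print(f"bias term = {bias_term:.6f}")
```

The two printed values coincide up to rounding error because the decomposition is an algebraic identity. Taking expectations removes the disturbance term and leaves the bias term, which vanishes only if the omitted regressor has a zero coefficient or is uncorrelated in the sample with the included one.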
