Introduction To Econometrics 12-09-2019
INTRODUCTION TO ECONOMETRICS
Ezequiel Uriel
2019
University of Valencia
I would like to thank Professors Luisa Moltó, Amado Peiró, Paz Rico, Pilar Beneito and Javier Ferri for their suggestions, for the errata they detected in previous versions, and for providing me with data to formulate exercises. Some students have also helped to detect errata. In any case, I am solely responsible for any errata that remain undetected.
Summary
1 Econometrics and economic data
1.1 What is econometrics?
1.2 Steps in developing an econometric model
1.3 Economic data
2 The simple regression model: estimation and properties
2.1 Some definitions in the simple regression model
2.1.1 Population regression model and population regression function
2.1.2 Sample regression function
2.2 Obtaining the Ordinary Least Squares (OLS) Estimates
2.2.1 Different criteria of estimation
2.2.2 Application of least squares criterion
2.3 Some characteristics of OLS estimators
2.3.1 Algebraic implications of the estimation
2.3.2 Decomposition of the variance of y
2.3.3 Goodness of fit: Coefficient of determination (R²)
2.3.4 Regression through the origin
2.4 Units of measurement and functional form
2.4.1 Units of Measurement
2.4.2 Functional Form
2.5 Assumptions and statistical properties of OLS
2.5.1 Statistical assumptions of the CLM in simple linear regression
2.5.2 Desirable properties of the estimators
2.5.3 Statistical properties of OLS estimators
Exercises
Annex 2.1 Case study: Engel curve for demand of dairy products
Appendixes
Appendix 2.1 Two alternative forms to express $\hat{\beta}_2$
Appendix 2.2 Proof: $r_{xy}^2 = R^2$
Appendix 2.3 Proportional change versus change in logarithms
Appendix 2.4 Proof: OLS estimators are linear and unbiased
Appendix 2.5 Calculation of variance of $\hat{\beta}_2$
Appendix 2.6 Proof of Gauss-Markov Theorem for the slope in simple regression
Appendix 2.7 Proof: $\hat{\sigma}^2$ is an unbiased estimator of the variance of the disturbance
Appendix 2.8 Consistency of the OLS estimator
Appendix 2.9 Maximum likelihood estimator
3 Multiple linear regression: estimation and properties
3.1 The multiple linear regression model
3.1.1 Population regression model and population regression function
3.1.2 Sample regression function
3.2 Obtaining the OLS estimates, interpretation of the coefficients, and other characteristics
3.2.1 Obtaining the OLS estimates
3.2.2 Interpretation of the coefficients
3.2.3 Algebraic implications of the estimation
3.3 Assumptions and statistical properties of the OLS estimators
3.3.1 Statistical assumptions of the CLM in multiple linear regression
3.3.2 Statistical properties of the OLS estimator
3.4 More on functional forms
3.4.1 Use of logarithms in the econometric models
3.4.2 Polynomial functions
3.5 Goodness-of-fit and selection of regressors
3.5.1 Coefficient of determination
3.5.2 Adjusted R-Squared
3.5.3 Akaike information criterion (AIC) and Schwarz criterion (SC)
Exercises
Appendixes
Appendix 3.1 Proof of the theorem of Gauss-Markov
Appendix 3.2 Proof: $\hat{\sigma}^2$ is an unbiased estimator of the variance of the disturbance
Appendix 3.3 Consistency of the OLS estimator
Appendix 3.4 Maximum likelihood estimator
4 Hypothesis testing in the multiple regression model
4.1 Hypothesis testing: an overview
4.1.1 Formulation of the null hypothesis and the alternative hypothesis
4.1.2 Test statistic
4.1.3 Decision rule
4.2 Testing hypotheses using the t test
4.2.1 Test of a single parameter
4.2.2 Confidence intervals
4.2.3 Testing hypotheses about a single linear combination of the parameters
4.2.4 Economic importance versus statistical significance
4.3 Testing multiple linear restrictions using the F test
4.3.1 Exclusion restrictions
4.3.2 Model significance
4.3.3 Testing other linear restrictions
4.3.4 Relation between F and t statistics
4.4 Testing without normality
4.5 Prediction
4.5.1 Point prediction
4.5.2 Interval prediction
4.5.3 Predicting y in a ln(y) model
4.5.4 Forecast evaluation and dynamic prediction
Exercises
5 Multiple regression analysis with qualitative information
5.1 Introducing qualitative information in econometric models
5.2 A single dummy independent variable
5.3 Multiple categories for an attribute
5.4 Several attributes
5.5 Interactions involving dummy variables
5.5.1 Interactions between two dummy variables
5.5.2 Interactions between a dummy variable and a quantitative variable
5.6 Testing structural changes
5.6.1 Using dummy variables
5.6.2 Using separate regressions: The Chow test
Exercises
6 Relaxing the assumptions in the linear classical model
6.1 Relaxing the assumptions in the linear classical model: an overview
6.2 Misspecification
6.2.1 Consequences of misspecification
6.2.2 Specification tests: the RESET test
6.3 Multicollinearity
6.3.1 Introduction
6.3.2 Detection
6.3.3 Solutions
6.4 Normality test
6.5 Heteroskedasticity
6.5.1 Causes of heteroskedasticity
6.5.2 Consequences of heteroskedasticity
6.5.3 Heteroskedasticity tests
6.5.4 Estimation of heteroskedasticity-consistent covariance
6.5.5 The treatment of the heteroskedasticity
6.6 Autocorrelation
6.6.1 Causes of autocorrelation
6.6.2 Consequences of autocorrelation
6.6.3 Autocorrelation tests
6.6.4 HAC standard errors
6.6.5 Autocorrelation treatment
Exercises
Appendix 6.1
1 ECONOMETRICS AND ECONOMIC DATA
for example, in Spain in the last quarter of the 20th century. In addition to the estimation,
in which numerical values are obtained, econometric methods allow us to perform tests
of hypothesis; for example, in a production function, is the hypothesis of constant returns
to scale admissible?
2) Economic policy simulation. Econometric methods can be used to simulate the
effects of alternative policies. For example, with an appropriate econometric
model we could see, in quantitative terms, how different increases in the
tobacco tax affect the consumption of tobacco.
3) Prediction or forecasting. Very often econometric methods are used to predict
values of economic variables in the future. By making predictions we try to
reduce our uncertainty about the future of the economy. This is not an easy task,
since in general the predictions are only satisfactory when there are no drastic
changes in the economy. Although it would be useful to be able to predict these
drastic changes accurately, both econometric and other alternative methods
tend to be imprecise.
(a) Specification
In this first step, the model or models used must be defined, as well as data to be
used in the estimation stage.
In the specification step, we will refer to four elements: the economic model, the
econometric model, the statistical assumptions of the model and the data. In this section
we will refer to the first three elements; in the following section we will examine different
types of data used in econometric analysis.
The first element we need is an economic model. In some cases, a formal
economic model is constructed entirely using economic theory. In other cases, economic
theory is used less formally in constructing an economic model.
After we have an economic model, we must convert it into an econometric model.
Two examples illustrate this conversion.
EXAMPLE 1.1 Keynesian consumption function
Keynes formulated his well-known consumption function in three propositions:
Proposition 1: Consumption is a function of income, and both variables are measured in real terms.
If the variables are measured in real terms, it means that when consumers decide the proportion of income
devoted to consumption, they are not affected by monetary illusion.
Analytically, proposition 1 can be expressed in the following way:
$$\frac{\beta_1}{inc} + \beta_2 > \beta_2 \quad \text{or} \quad \frac{\beta_1}{inc} > 0 \tag{1-8}$$
That is to say,
β1 > 0 (1-9)
Once the model has been estimated, testing proposition 3 is equivalent to testing whether the
intercept is significantly greater than 0.
EXAMPLE 1.2 Wage determination
Economic model:
Formal economic theory - human capital theory- says that education (educ), experience (exper)
and training are factors that affect productivity and hence the wage. Therefore, an economic model for
wage determination could be the following:
wage = f (educ, exper , training ) (1-10)
Incidentally, do you think there is any variable missing in this model?
Econometric model:
The corresponding econometric model, using a mathematical linear form, is the following:
$$wage = \beta_1 + \beta_2\, educ + \beta_3\, exper + \beta_4\, training + u \tag{1-11}$$
To sum up, to convert an economic model into an econometric model:
a) The form of the function f(.) has been specified.
b) A disturbance variable has been included to reflect the effect of other variables
affecting wage, but not appearing in the model.
An important element in the specification of the model is the formulation of a set
of statistical assumptions, which are used in subsequent steps. These statistical
assumptions play a key role in hypothesis testing and, in general, throughout the inference
process carried out with the model.
(b) Estimation
In the estimation process we obtain numerical values of the coefficients of an
econometric model. To complete this stage, data are required on all observable variables
that appear in the specified econometric model, while it is also necessary to select the
appropriate estimation method, taking into account the implications of this choice on the
statistical properties of estimators of the coefficients. The distinction between estimator and estimate should be made clear. An estimator is the result of applying an estimation method to an econometric specification: a formula that can be evaluated for any sample. An estimate, on the other hand, is the numerical value that an estimator takes for a given sample. For example, applying a very simple estimation method, called ordinary least squares, to the specification of the consumption function (1-4) provides expressions which determine the estimators $\hat{\beta}_1$ and $\hat{\beta}_2$. Substituting the sample data into these expressions, two numbers are obtained, one for $\hat{\beta}_1$ and one for $\hat{\beta}_2$, which provide estimates of the parameters $\beta_1$ and $\beta_2$.
In general, it is possible to obtain analytical expressions of the estimators,
particularly in the case of estimating linear relationships. But in non-linear procedures of
estimation it is often difficult to establish their analytical expression.
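The distinction between estimator and estimate can be sketched in code: the estimator is a rule that accepts any sample, and an estimate is the pair of numbers it returns for one particular sample. The function name `ols_estimator` and the two small samples are hypothetical, chosen only for illustration; the formulas are the standard ordinary least squares expressions for a simple regression with an intercept.

```python
def ols_estimator(x, y):
    """Estimator: a rule mapping any sample (x, y) to (b1, b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx           # slope
    b1 = y_bar - b2 * x_bar    # intercept
    return b1, b2

# Two hypothetical samples (illustrative numbers, not from the text).
sample_a = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
sample_b = ([1.0, 2.0, 3.0, 4.0], [1.8, 4.3, 5.9, 8.4])

# Estimates: the numbers the same estimator yields for each sample.
estimate_a = ols_estimator(*sample_a)
estimate_b = ols_estimator(*sample_b)
print("estimate from sample A:", estimate_a)
print("estimate from sample B:", estimate_b)
```

The same function produces different numbers for the two samples, which is exactly the sense in which one estimator gives rise to many possible estimates.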
(c) Validation
The results are assessed in the validation stage, where we assess whether the
estimates obtained in the previous stage are acceptable, both theoretically and from the
statistical point of view. On the one hand, we analyze whether the estimates of the model
parameters have the expected signs and magnitudes: that is to say, whether they satisfy
the constraints established by economic theory.
From the statistical point of view, on the other hand, statistical tests are performed
on the significance of the parameters of the model, using the statistical assumptions made
in the specification step. In turn, it is important to test whether the statistical assumptions
of the econometric model are fulfilled, although it should be noted that not all assumptions
are testable. The violation of any of these assumptions implies, in general, the application
of another estimation method that allows us to obtain estimators whose statistical
properties are as good as possible.
One way to establish the ability of a model to make predictions is to use the model
to forecast outside the sample period, and then to compare the predicted values of the
endogenous variable with the values actually observed.
firms (for example, industrial firm survey) or other economic agents. Surveys are a typical
source for cross-sectional data. In many contemporary econometric cross sectional
studies the sample size is quite large.
In cross sectional data, observations must be obtained by random sampling. Thus,
cross sectional observations are mutually independent. The ordering of observations in
cross sectional data does not matter for econometric analysis. If the data are not obtained
with a random sample, we have a sample selection problem.
So far we have referred to micro data type, but there may also be cross sectional
data relating to aggregate units such as countries, regions, etc. Of course, data of this type
are not obtained by random sampling.
Panel Data
Panel data (or longitudinal data) are time series for each cross sectional member
in a data set. The key feature is that the same cross sectional units are followed over a
given time period. Panel data combine elements of cross sectional and time series data.
These data sets consist of a set of individuals (typically people, households, or
corporations) surveyed repeatedly over time. The common modeling assumption is that
the individuals are mutually independent of one another, but for a given individual,
observations are mutually dependent. Thus, the ordering in the cross section of a panel
data set does not matter, but the ordering in the time dimension matters a great deal. If we
do not take into account the time in panel data, we say that we are using pooled cross
sectional data.
2 THE SIMPLE REGRESSION MODEL: ESTIMATION
AND PROPERTIES
FIGURE 2.1. The population regression function (PRF).
FIGURE 2.2. The scatter diagram.
We can express the population model for each observation of the sample:
$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad i = 1, 2, \ldots, n \tag{2-3}$$
In figure 2.3 the population regression function and the scatter diagram are put
together, but it is important to keep in mind that although β1 and β 2 are fixed, they are
unknown. According to the model, it is possible to make the following decomposition
from a theoretical point of view:
$$y_i = \mu_{y_i} + u_i \qquad i = 1, 2, \ldots, n \tag{2-4}$$
which is represented in figure 2.3 for the ith observation. However, from an empirical point of view, this decomposition cannot be computed, because $\beta_1$ and $\beta_2$ are unknown parameters and $u_i$ is not observable.
allows us to calculate the fitted value ($\hat{y}_i$) for y when $x = x_i$. In the SRF, $\hat{\beta}_1$ and $\hat{\beta}_2$ are estimators of the parameters $\beta_1$ and $\beta_2$. For each $x_i$ we have an observed value ($y_i$) and a fitted value ($\hat{y}_i$).
In other words, the residual $\hat{u}_i$ is the difference between the sample value $y_i$ and the fitted value $\hat{y}_i$, as can be seen in figure 2.4. In this case, it is possible to calculate the decomposition:
$$y_i = \hat{y}_i + \hat{u}_i$$
for a given sample.
FIGURE 2.3. The population regression function and the scatter diagram.
FIGURE 2.4. The sample regression function and the scatter diagram.
To sum up, $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{y}_i$ and $\hat{u}_i$ are the sample counterparts of $\beta_1$, $\beta_2$, $\mu_{y_i}$ and $u_i$ respectively. It is possible to calculate $\hat{\beta}_1$ and $\hat{\beta}_2$ for a given sample, but the estimates will change for each sample. On the contrary, $\beta_1$ and $\beta_2$ are fixed, but unknown.
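A small simulation can make this contrast concrete. The sketch below assumes a hypothetical population with $\beta_1 = 2$ and $\beta_2 = 0.5$, a uniformly distributed regressor and standard normal disturbances; none of these choices come from the text. Only in a simulation are the parameters and disturbances known, which is what allows the comparison.

```python
import random

def ols(x, y):
    """Simple-regression OLS estimates (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

# Hypothetical population parameters: fixed, and unknown in practice.
BETA_1, BETA_2 = 2.0, 0.5
rng = random.Random(0)  # seeded so the run is reproducible

def draw_sample(n):
    """Draw n observations from the assumed population model."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobservable in practice
    y = [BETA_1 + BETA_2 * xi + ui for xi, ui in zip(x, u)]
    return x, y

# Each sample yields a different sample regression function.
estimates = [ols(*draw_sample(50)) for _ in range(3)]
for b1, b2 in estimates:
    print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The three printed pairs differ from one another, while `BETA_1` and `BETA_2` never change.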
The first criterion takes as estimators those values of $\hat{\beta}_1$ and $\hat{\beta}_2$ that make the sum of all the residuals as near to zero as possible. According to this criterion, the expression to minimize would be the following:
$$\min \sum_{i=1}^{n} \hat{u}_i \tag{2-7}$$
The main problem of this procedure is that residuals of different signs can offset one another. Such a situation can be observed graphically in figure 2.5, in which three aligned observations are graphed, $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$. In this case the following happens:
$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{y_3 - y_1}{x_3 - x_1}$$
FIGURE 2.5. The problems of criterion 1.
If a straight line is fitted so that it passes through the three points, each one of the residuals will take the value zero, and therefore
$$\sum_{i=1}^{3} \hat{u}_i = 0$$
This fit could be considered optimal. But it is also possible to obtain $\sum_{i=1}^{3} \hat{u}_i = 0$ by rotating the straight line - from the point $(x_2, y_2)$ - in any direction, as figure 2.5 shows, because $\hat{u}_3 = -\hat{u}_1$. In other words, by rotating in this way the result $\sum_{i=1}^{3} \hat{u}_i = 0$ is always obtained. This simple example shows that this criterion is not appropriate for the estimation of the parameters given that, for any set of observations, an infinite number of straight lines exist that satisfy this criterion.
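The rotation argument can be checked numerically. The sketch below uses three hypothetical aligned and equally spaced points, so that the middle point coincides with the point of means $(\bar{x}, \bar{y})$; any line forced through that point has residuals that sum to zero, whatever its slope. The helper name `residual_sum` is illustrative.

```python
# Three hypothetical aligned, equally spaced observations.
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

def residual_sum(slope):
    """Sum of residuals for the line with this slope through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return sum(yi - intercept - slope * xi for xi, yi in zip(x, y))

# Very different lines, all passing through the middle point.
sums = {slope: residual_sum(slope) for slope in (-2.0, 0.0, 1.0, 5.0)}
for slope, total in sums.items():
    print(f"slope {slope:5.1f}: sum of residuals = {total}")
```

Only the line with slope 1 fits the points, yet criterion 1 cannot tell it apart from the others.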
Criterion 2
In order to prevent positive residuals from offsetting negative ones, the absolute values of the residuals are taken. In this case, the following expression would be minimized:
$$\min \sum_{i=1}^{n} \left| \hat{u}_i \right| \tag{2-8}$$
The estimators obtained are called least squares (LS) estimators, and they enjoy certain desirable statistical properties, which will be studied later on. Unlike the first criterion, squaring the residuals prevents them from offsetting one another; and, unlike the second criterion, the least squares estimators are simple to obtain. It is important to note that, once we square the residuals, we penalize the bigger residuals proportionally more than the smaller ones (if a residual is double the size of another one, its square will be four times greater). This characterizes least squares estimation with respect to other possible procedures.
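$$\min_{\hat{\beta}_1, \hat{\beta}_2} S = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2 \tag{2-10}$$
$$\frac{\partial S}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)$$
$$\frac{\partial S}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)\, x_i$$
$$\sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \tag{2-11}$$
$$\sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)\, x_i = 0 \tag{2-12}$$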
In operations with summations, the following rules must be taken into account:
$$\sum_{i=1}^{n} a = na$$
$$\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$
Operating with the normal equations, we have
$$\sum_{i=1}^{n} y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum_{i=1}^{n} x_i \tag{2-13}$$
$$\sum_{i=1}^{n} y_i x_i = \hat{\beta}_1 \sum_{i=1}^{n} x_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2 \tag{2-14}$$
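The substitution carried out next relies on solving the first normal equation for the intercept. As a short sketch of that step, using only (2-13) and the sample means $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, dividing (2-13) by n gives

$$\bar{y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{x} \quad \Longrightarrow \quad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$$

A consequence worth noting is that the fitted line always passes through the point of sample means $(\bar{x}, \bar{y})$.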
Substituting this value of β̂1 in the second normal equation (2-14), we have
$$\sum_{i=1}^{n} y_i x_i = (\bar{y} - \hat{\beta}_2 \bar{x}) \sum_{i=1}^{n} x_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2$$
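$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i} \tag{2-17}$$
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2-18}$$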
In the preceding sections $\hat{\beta}_1$ and $\hat{\beta}_2$ have been used to designate generic estimators. From now on, this notation will designate OLS estimators only.
EXAMPLE 2.1 Estimation of the consumption function
Given the Keynesian consumption function,
$$cons = \beta_1 + \beta_2\, inc + u_i$$
we will estimate it using data from six households that appear in table 2.1.
TABLE 2.1. Data and calculations to estimate the consumption function.
Observ.   $cons_i$   $inc_i$   $cons_i \times inc_i$   $inc_i^2$   $cons_i - \overline{cons}$   $inc_i - \overline{inc}$   $(cons_i - \overline{cons}) \times (inc_i - \overline{inc})$   $(inc_i - \overline{inc})^2$
1          5          6          30                     36         -4                          -5                          20                                                          25
2          7          9          63                     81         -2                          -2                           4                                                           4
3          8         10          80                    100         -1                          -1                           1                                                           1
4         10         12         120                    144          1                           1                           1                                                           1
5         11         13         143                    169          2                           2                           4                                                           4
6         13         16         208                    256          4                           5                          20                                                          25
Sums      54         66         644                    786          0                           0                          50                                                          60
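Calculating $\overline{cons}$ and $\overline{inc}$, and applying formula (2-17), or alternatively (2-18), to the data in table 2.1, we obtain
$$\overline{cons} = \frac{54}{6} = 9; \qquad \overline{inc} = \frac{66}{6} = 11$$
$$\text{(2-17): } \hat{\beta}_2 = \frac{644 - 9 \times 66}{786 - 11 \times 66} = 0.83; \qquad \text{(2-18): } \hat{\beta}_2 = \frac{50}{60} = 0.83$$
Then, by applying (2-16), we obtain $\hat{\beta}_1 = 9 - 0.8333 \times 11 \approx -0.17$.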
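The arithmetic of this example can be reproduced in a few lines. The sketch below takes the six observations of table 2.1 and evaluates the slope with both (2-17) and (2-18); the variable names are illustrative.

```python
# Data from table 2.1: consumption and income of six households.
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]
n = len(cons)

cons_bar = sum(cons) / n  # sample mean of consumption
inc_bar = sum(inc) / n    # sample mean of income

# Slope via (2-17): raw sums of cross products and squares.
sum_xy = sum(c * i for c, i in zip(cons, inc))
sum_xx = sum(i * i for i in inc)
b2_raw = (sum_xy - cons_bar * sum(inc)) / (sum_xx - inc_bar * sum(inc))

# Slope via (2-18): deviations from the sample means.
s_xy = sum((c - cons_bar) * (i - inc_bar) for c, i in zip(cons, inc))
s_xx = sum((i - inc_bar) ** 2 for i in inc)
b2_dev = s_xy / s_xx

# Intercept: mean of cons minus slope times mean of inc.
b1 = cons_bar - b2_dev * inc_bar

print(f"slope (2-17): {b2_raw:.4f}")
print(f"slope (2-18): {b2_dev:.4f}")
print(f"intercept:    {b1:.4f}")
```

Both formulas give the same slope, as they must, since they are algebraically equivalent.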
$$\sum_{i=1}^{n} \hat{u}_i = 0 \tag{2-19}$$
$$\sum_{i=1}^{n} \hat{u}_i = \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \tag{2-21}$$
which is precisely the first equation (2-11) of the system of normal equations.
Note that, if (2-19) holds, it implies that
$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i \tag{2-22}$$
$$\sum_{i=1}^{n} x_i \hat{u}_i = 0 \tag{2-25}$$
$$\sum_{i=1}^{n} x_i \hat{u}_i = \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$
given in (2-12).
4. The sample cross product between the fitted values ( ŷ ) and the OLS residuals
is zero.
That is to say,
$$\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = 0 \tag{2-26}$$
Proof
Taking into account the algebraic implications 1 -(2-19)- and 3 -(2-25)-, we have
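$$\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = \sum_{i=1}^{n} (\hat{\beta}_1 + \hat{\beta}_2 x_i)\hat{u}_i = \hat{\beta}_1 \sum_{i=1}^{n} \hat{u}_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i \hat{u}_i = \hat{\beta}_1 \times 0 + \hat{\beta}_2 \times 0 = 0$$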
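$$y_i - \bar{y} = \hat{y}_i - \bar{\hat{y}} + \hat{u}_i$$
Squaring both sides:
$$[y_i - \bar{y}]^2 = \left[(\hat{y}_i - \bar{\hat{y}}) + \hat{u}_i\right]^2 = (\hat{y}_i - \bar{\hat{y}})^2 + \hat{u}_i^2 + 2\hat{u}_i(\hat{y}_i - \bar{\hat{y}})$$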
Summing over all the observations and taking into account algebraic implications 1 and 4, the third term of the right-hand side is equal to 0. Analytically,
$$\sum \hat{u}_i(\hat{y}_i - \bar{\hat{y}}) = \sum \hat{u}_i \hat{y}_i - \bar{\hat{y}} \sum \hat{u}_i = 0 \tag{2-28}$$
Therefore, we have
$$\sum [y_i - \bar{y}]^2 = \sum (\hat{y}_i - \bar{\hat{y}})^2 + \sum \hat{u}_i^2 \tag{2-29}$$
In words,
Total sum of squares (TSS) =
Explained sum of squares (ESS)+Residual sum of squares (RSS)
It must be stressed that the relation (2-19) is needed to ensure that (2-28) is equal to 0. We must remember that (2-19) is associated with the first normal equation: that is to say, with the equation corresponding to the intercept. If there is no intercept in the fitted model, then in general the decomposition (2-29) will not hold.
This decomposition can be made with variances, by dividing both sides of (2-29)
by n:
$$\frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum (\hat{y}_i - \bar{\hat{y}})^2}{n} + \frac{\sum \hat{u}_i^2}{n} \tag{2-30}$$
In words,
Total variance=explained variance+ residual variance
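$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{2-31}$$
$$\sum (\hat{y}_i - \bar{\hat{y}})^2 = \sum (y_i - \bar{y})^2 - \sum \hat{u}_i^2$$
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{RSS}{TSS} \tag{2-32}$$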
Therefore, $R^2$ is equal to 1 minus the proportion of the total sum of squares (TSS) that is not explained by the regression (RSS).
According to the definition of $R^2$, the following must hold:
$$0 \le R^2 \le 1$$
Extreme cases:
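a) If we have a perfect fit, then $\hat{u}_i = 0 \;\; \forall i$. This implies that
$$\hat{y}_i = y_i \;\; \forall i \;\Rightarrow\; \sum (\hat{y}_i - \bar{\hat{y}})^2 = \sum (y_i - \bar{y})^2 \;\Rightarrow\; R^2 = 1$$
b) If $\hat{y}_i = c \;\; \forall i$, it implies that
$$\bar{\hat{y}} = c \;\Rightarrow\; \hat{y}_i - \bar{\hat{y}} = c - c = 0 \;\; \forall i \;\Rightarrow\; \sum (\hat{y}_i - \bar{\hat{y}})^2 = 0 \;\Rightarrow\; R^2 = 0$$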
If R 2 is close to 0, it implies that we have a poor fit. In other words, very little
variation in y is explained by x.
In many cases, a high R 2 is obtained when the model is fitted using time series
data, due to the effect of a common trend. On the contrary, when we use cross sectional
data a low value is obtained in many cases, but it does not mean that the fitted model is
bad.
What is the relationship between the coefficient of determination and the
coefficient of correlation studied in descriptive statistics? The coefficient of
determination is equal to the squared coefficient of correlation, as can be seen in appendix
2.2:
$$r_{xy}^2 = R^2 \tag{2-33}$$
(This equality is valid only in the simple regression model, not in the multiple regression model.)
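As a quick numerical check of (2-33), the sums in table 2.1 are enough to evaluate the squared correlation coefficient between consumption and income. The cross-product sum is 50, the sum of squared income deviations is 60, and the squared consumption deviations listed in that table add up to $16 + 4 + 1 + 1 + 4 + 16 = 42$. Writing the correlation coefficient in its usual descriptive-statistics form,

$$r_{xy}^2 = \frac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2} = \frac{50^2}{60 \times 42} = \frac{2500}{2520} \approx 0.992$$

which agrees with the value of $R^2$ reported for the consumption function in example 2.2.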
EXAMPLE 2.2 Fulfilling algebraic implications and calculating R2 in the consumption function
In column 2 of table 2.2, $\widehat{cons}_i$ is calculated; in columns 3, 4 and 5, you can see the fulfillment of algebraic implications 1, 3 and 4 respectively. The remainder of the columns shows the calculations to obtain
$$TSS = 42 \qquad ESS = 41.67 \qquad RSS = 42 - 41.67 = 0.33 \qquad R^2 = \frac{41.67}{42} = 0.992$$
or, alternatively, $R^2 = 1 - \frac{0.33}{42} = 0.992$
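TABLE 2.2. Data and calculations to estimate the consumption function.
Observ.   $\widehat{cons}_i$   $\hat{u}_i$   $\hat{u}_i \times inc_i$   $\widehat{cons}_i \times \hat{u}_i$   $cons_i^2$   $(cons_i - \overline{cons})^2$   $\widehat{cons}_i^2$   $(\widehat{cons}_i - \overline{\widehat{cons}})^2$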
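The quantities behind table 2.2 can be recomputed directly from the data of table 2.1. The sketch below fits the consumption function by OLS, checks algebraic implications 1, 3 and 4, and then obtains TSS, ESS, RSS and $R^2$; the variable names are illustrative.

```python
# Data from table 2.1.
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]
n = len(cons)
cons_bar, inc_bar = sum(cons) / n, sum(inc) / n

# OLS fit of cons on inc.
b2 = (sum((c - cons_bar) * (i - inc_bar) for c, i in zip(cons, inc))
      / sum((i - inc_bar) ** 2 for i in inc))
b1 = cons_bar - b2 * inc_bar

fitted = [b1 + b2 * i for i in inc]            # fitted consumption
resid = [c - f for c, f in zip(cons, fitted)]  # residuals

# Algebraic implications 1, 3 and 4: all three sums are zero.
sum_u = sum(resid)
sum_xu = sum(i * u for i, u in zip(inc, resid))
sum_fu = sum(f * u for f, u in zip(fitted, resid))

# Sums of squares and the coefficient of determination.
fitted_bar = sum(fitted) / n
tss = sum((c - cons_bar) ** 2 for c in cons)
ess = sum((f - fitted_bar) ** 2 for f in fitted)
rss = sum(u ** 2 for u in resid)
r2 = ess / tss

print(f"sum u = {sum_u:.2e}, sum x*u = {sum_xu:.2e}, sum fitted*u = {sum_fu:.2e}")
print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}, R2 = {r2:.3f}")
```

Because the three sums are computed in floating point, they come out as very small numbers rather than exact zeros.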
FIGURE 2.6. A regression through the origin.
Now, we are going to estimate a regression line through the origin. The fitted
model is the following:
yi = β2 xi (2-34)
Therefore, we must minimize
$$\min_{\beta_2} S = \min_{\beta_2} \sum_{i=1}^{n} (y_i - \beta_2 x_i)^2 \tag{2-35}$$
$$\beta_2 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2} \tag{2-37}$$
Another problem with fitting a regression line through the origin is that the
following generally happens:
$$\sum (y_i - \bar{y})^2 \ne \sum (\hat{y}_i - \bar{\hat{y}})^2 + \sum \hat{u}_i^2$$
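To see this failure with numbers, the sketch below fits a line through the origin to the data of table 2.1 using (2-37) and compares the two sides of the decomposition. Applying a no-intercept model to these data is an assumption made purely for illustration; the variable names are also illustrative.

```python
# Data from table 2.1, used here only to illustrate a fit without intercept.
cons = [5, 7, 8, 10, 11, 13]
inc = [6, 9, 10, 12, 13, 16]
n = len(cons)

# Slope of the regression through the origin, formula (2-37).
slope = sum(c * i for c, i in zip(cons, inc)) / sum(i * i for i in inc)

fitted = [slope * i for i in inc]
resid = [c - f for c, f in zip(cons, fitted)]

cons_bar = sum(cons) / n
fitted_bar = sum(fitted) / n
tss = sum((c - cons_bar) ** 2 for c in cons)
ess = sum((f - fitted_bar) ** 2 for f in fitted)
rss = sum(u ** 2 for u in resid)

print(f"slope = {slope:.4f}")
print(f"sum of residuals = {sum(resid):.4f}")  # no longer zero
print(f"TSS = {tss:.3f}, ESS + RSS = {ess + rss:.3f}")
```

The residuals no longer sum to zero, because there is no first normal equation to force it, and so the cross-product term in the decomposition does not vanish.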
If we now express income in euros (multiplication by 1000) and call it ince, the fitted model with
the new units of measurement of income would be the following:
$$\widehat{cons}_i = 0.2 + 0.00085 \times ince_i$$
As can be seen, changing the units of measurement of the explanatory variable does not affect the
intercept.
EXAMPLE 2.6
Let us suppose that the average consumption is 15 thousand euros. If we define the variable
$consd_i = cons_i - \overline{cons}$ and both variables are measured in thousands of euros, the fitted model with this change in the
origin will be the following:
$$\widehat{cons}_i - 15 = 0.2 - 15 + 0.85 \times inc_i$$
that is to say,
$$\widehat{consd}_i = -14.8 + 0.85 \times inc_i$$
Note that R2 is invariant to changes in the units of x and/or y, and also is invariant to the origin of
the variables.
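The invariance of R2 to changes of units and origin can be checked numerically. The sketch below uses hypothetical income and consumption figures (not the data of the examples) and two helper functions, `ols` and `r_squared`, written only for this illustration.

```python
def ols(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def r_squared(x, y):
    """Coefficient of determination, computed as 1 - RSS/TSS."""
    b1, b2 = ols(x, y)
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return 1 - rss / tss


# Hypothetical data in thousands of euros.
inc = [5.0, 10.0, 15.0, 20.0, 25.0]
cons = [5.0, 7.0, 13.0, 17.0, 22.0]

ince = [1000 * v for v in inc]     # income in euros (change of units)
consd = [v - 15 for v in cons]     # consumption minus 15 (change of origin)

b1, b2 = ols(inc, cons)
b1_e, b2_e = ols(ince, cons)       # slope divided by 1000, same intercept
b1_d, b2_d = ols(inc, consd)       # same slope, intercept reduced by 15

r2, r2_e, r2_d = r_squared(inc, cons), r_squared(ince, cons), r_squared(inc, consd)
```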
The proportional change (or relative variation rate) between x1 and x0 is given
by:
$$\frac{\Delta x_1}{x_0} = \frac{x_1 - x_0}{x_0} \tag{2-43}$$
Multiplying a proportional change by 100, we obtain a proportional change in %.
That is to say:
$$100 \, \frac{\Delta x_1}{x_0} \, \% \tag{2-44}$$
a) Linear model
The β̂ 2 coefficient measures the effect of the regressor x on y. Let us look at this
in detail. The observation i of the sample regression function is given according to (2-5)
by
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i \tag{2-50}$$
Let us consider the observation h of the fitted model whereupon the value of the
regressor and, consequently, of the regressand has changed with respect to (2-50):
$$\hat{y}_h = \hat{\beta}_1 + \hat{\beta}_2 x_h \tag{2-51}$$
Subtracting (2-51) from (2-50), we see that x has a linear effect on ŷ :
1 1.00 89
2 1.00 86
3 1.00 74
4 1.00 79
5 1.00 68
6 1.00 84
7 0.95 139
8 0.95 122
9 0.95 102
10 0.85 186
11 0.85 179
12 0.85 187
Interpretation of the coefficient $\hat{\beta}_2$: if the price of coffee increases by 1 French franc, the
quantity of coffee sold will decrease by 693.33 thousand units. As the price of coffee is a small
magnitude, the following interpretation is preferable: if the price of coffee increases by 1 cent of a French
franc, the quantity sold will decrease by 6.93 thousand units.
1
The data of this exercise were obtained from a controlled marketing experiment in stores in Paris
on coffee expenditure, as reported in A. C. Bemmaor and D. Mouchoux, “Measuring the Short-Term Effect
of In-Store Promotion and Retail Advertising on Brand Sales: A Factorial Experiment”, Journal of
Marketing Research, 28 (1991), 202–14.
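As a check on the interpretation above, the slope can be recomputed from the twelve observations listed in the table. The sketch assumes that the second column is the price in French francs and the third is the quantity sold in thousands of units; the column headers did not survive in the table, so these names are an assumption.

```python
# Data as listed in the table of the example (column meaning assumed).
price = [1.00] * 6 + [0.95] * 3 + [0.85] * 3
quantity = [89, 86, 74, 79, 68, 84, 139, 122, 102, 186, 179, 187]

n = len(price)
pbar = sum(price) / n
qbar = sum(quantity) / n

sxx = sum((p - pbar) ** 2 for p in price)
sxy = sum((p - pbar) * (q - qbar) for p, q in zip(price, quantity))

beta2_hat = sxy / sxx                  # effect of a 1 franc price increase
beta1_hat = qbar - beta2_hat * pbar    # intercept
effect_per_cent = beta2_hat / 100      # effect of a 1 cent price increase
```

Under that reading of the columns the slope reproduces the value quoted in the interpretation (about -693.33 per franc, or -6.93 per cent of a franc).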
b) Linear-log model
A linear-log model is given by
$$y = \beta_1 + \beta_2 \ln(x) + u \tag{2-53}$$
The corresponding fitted function is the following:
$$\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 \ln(x) \tag{2-54}$$
Taking first order differences in (2-54), and then multiplying and dividing the
right hand side by 100, we have
$$\Delta\hat{y} = \frac{\hat{\beta}_2}{100} \times 100 \times \Delta\ln(x)\,\%$$
Therefore, if x increases by 1%, then $\hat{y}$ will increase by $\hat{\beta}_2/100$ units.
c) Log-linear model
A log-linear model is given by
$$\ln(y) = \beta_1 + \beta_2 x + u \tag{2-55}$$
The above model can be obtained by taking natural logs on both sides of the
following model:
$$y = \exp(\beta_1 + \beta_2 x + u)$$
For this reason, the model (2-55) is also called exponential.
The corresponding sample regression function to (2-55) is the following:
$$\widehat{\ln(y)} = \hat{\beta}_1 + \hat{\beta}_2 x \tag{2-56}$$
Taking first order differences in (2-56), and then multiplying both sides by 100,
we have
$$100 \times \Delta\widehat{\ln(y)}\,\% = 100 \times \hat{\beta}_2 \, \Delta x$$
Therefore, if x increases by 1 unit, then $\hat{y}$ will change by approximately $100\hat{\beta}_2$ %.
d) Log-log model
The model given in (2-49) is a log-log model or, before the transformation, a
potential model (2-48). This model is also called a constant elasticity model.
The corresponding fitted model to (2-49) is the following:
$$\widehat{\ln(y)} = \hat{\beta}_1 + \hat{\beta}_2 \ln(x) \tag{2-57}$$
Interpretation of the coefficient β̂ 2 : if the price of coffee increases by 1%, the quantity sold of
coffee will decrease by 5.13%. In this case β̂ 2 is the estimated demand/price elasticity.
EXAMPLE 2.10 Explaining market capitalization of Spanish banks. Log-log model (Continuation of
example 2.8)
Using data from example 2.8, the following log-log model has been estimated:
$$\widehat{\ln(marktval)} = 0.6756 + 0.938 \ln(bookval)$$
R2=0.928 n=20
Interpretation of the coefficient β̂ 2 : if the book value of a bank increases by 1%, the market
capitalization of this bank will increase by 0.938%. In this case β̂ 2 is the estimated market value/book
value elasticity.
In table 2.5 and for the fitted model, the interpretation of β̂ 2 in these four models is shown. If we
are considering the population model, the interpretation of β 2 is the same but taking into account that ∆u
must be equal to 0.
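The four interpretations can be collected in a small helper. The function `effect_on_y` below is a hypothetical name introduced for illustration; it simply encodes the approximate rules derived for each functional form, where the change in x is given in units of x for the linear and log-linear models and in percent for the linear-log and log-log models.

```python
def effect_on_y(model, beta2_hat, change_in_x):
    """Approximate change in fitted y and the unit in which it is expressed."""
    if model == "linear":
        return beta2_hat * change_in_x, "units"
    if model == "linear-log":
        return beta2_hat / 100 * change_in_x, "units"
    if model == "log-linear":
        return 100 * beta2_hat * change_in_x, "%"
    if model == "log-log":
        return beta2_hat * change_in_x, "%"
    raise ValueError(f"unknown model: {model}")


# Log-log model of example 2.10: a 1% increase in book value.
bank_effect = effect_on_y("log-log", 0.938, 1)

# Linear coffee model: a price increase of 0.01 francs (1 cent).
coffee_effect = effect_on_y("linear", -693.33, 0.01)
```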
log-log        1%        $\hat{\beta}_2$ %
4) The sample variance of x is different from 0 and has a finite limit as n tends to
infinity
Therefore, this assumption implies that
$$S_X^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \neq 0 \tag{2-59}$$
This is not a restrictive assumption, since we can always use β1 to normalize E(u)
to 0. Let us suppose, for example, that E (u ) = 4 . We could then redefine the model in the
following way:
y = ( β1 + 4) + β 2 x + v
where v= u − 4 . Therefore, the expectation of the new disturbance, v, is 0 and the
expectation of u has been absorbed by the intercept.
7) The disturbances have a constant variance
$$var(u_i) = \sigma^2 \qquad i = 1, 2, \ldots, n \tag{2-61}$$
This assumption is called the homoskedasticity assumption. The word comes from
the Greek: homo (equal) and skedasticity (spread). This means that the variation of y
around the regression line is the same across the x values; that is to say, it neither increases
nor decreases as x varies. This can be seen in figure 2.7, part a), where disturbances are
homoskedastic.
FIGURE 2.7. Random disturbances: a) homoskedastic; b) heteroskedastic.
If this assumption is not satisfied, as happens in part b) of figure 2.7, the OLS
regression coefficients are not efficient. Disturbances in this case are heteroskedastic
(hetero means different).
8) The disturbances with different subscripts are not correlated with each other
(no autocorrelation assumption):
$$E(u_i u_j) = 0 \qquad i \neq j \tag{2-62}$$
That is, the disturbances corresponding to different individuals or different periods
of time are not correlated with each other. This assumption of no autocorrelation or no
serial correlation, like the previous one, is testable a posteriori. It is violated
quite frequently in models using time series data.
9) The disturbance u is normally distributed
Taking into account assumptions 6, 7 and 8, we have
$$u_i \sim NID(0, \sigma^2) \qquad i = 1, 2, \ldots, n \tag{2-63}$$
where NID stands for normally and independently distributed.
The reason for this assumption is that if u is normally distributed, so are y and the
estimated regression coefficients, and this will be useful in performing tests of hypotheses
and constructing confidence intervals for β1 and β2. The justification for the assumption
depends on the Central Limit Theorem. In essence, this theorem states that, if a random
variable is the composite result of the effects of an indefinite number of variables, it will
have an approximately normal distribution even if its components do not, provided that
none of them is dominant.
FIGURE 2.8. Unbiased estimator. FIGURE 2.9. Biased estimator.
The estimator $\hat{\beta}_2$ is unbiased, i.e., its expected value is equal to the parameter that
is estimated, $\beta_2$. The estimator $\hat{\beta}_2$ is a random variable. In each sample of y's (the x's
are fixed in repeated samples according to assumption 2), $\hat{\beta}_2$ takes a different value, but
on average it is equal to the parameter $\beta_2$, bearing in mind the infinite number of values $\hat{\beta}_2$
can take. In each sample of y's a specific value of $\hat{\beta}_2$, that is to say, an estimate of $\beta_2$,
is obtained. In figure 2.8 two estimates of $\beta_2$ ($\hat{\beta}_{2(1)}$ and $\hat{\beta}_{2(2)}$) are shown. The first
estimate is relatively close to $\beta_2$, while the second one is much farther away. In any case,
unbiasedness is a desirable property because it ensures that, on average, the estimator is
centered on the parameter value.
The estimator $\tilde{\beta}_2$ is biased, since its expectation is not equal to $\beta_2$. The bias is
precisely $E(\tilde{\beta}_2) - \beta_2$. In this case two hypothetical estimates, $\tilde{\beta}_{2(1)}$ and $\tilde{\beta}_{2(2)}$, are
represented in figure 2.9. As can be seen, $\tilde{\beta}_{2(1)}$ is closer to $\beta_2$ than the unbiased estimate
$\hat{\beta}_{2(1)}$, but this is a matter of chance. In any case, a biased estimator is not centered on
the parameter value. An unbiased estimator will always be preferable, regardless of what
happens in a specific sample, because it has no systematic deviation from the parameter
value.
Another desirable property is efficiency. This property refers to the variance of
the estimators. In figures 2.10 and 2.11 two hypothetical unbiased estimators, which are
also called $\hat{\beta}_2$ and $\tilde{\beta}_2$, are represented. The first one has a smaller variance than the
second one.
FIGURE 2.10. Estimator with small variance. FIGURE 2.11. Estimator with big variance.
In both figures we have represented two estimates: $\hat{\beta}_{2(3)}$ and $\hat{\beta}_{2(4)}$ for the
estimator with the smaller variance; and $\tilde{\beta}_{2(3)}$ and $\tilde{\beta}_{2(4)}$ for the estimator with the
larger variance. To highlight the role played by chance, the estimate that is closest to $\beta_2$
is precisely $\tilde{\beta}_{2(3)}$. In any case, it is preferable that the variance of the estimator is as small
as possible. For example, when using the estimator $\hat{\beta}_2$ it is practically impossible for an
estimate to be as far from $\beta_2$ as it can be in the case of $\tilde{\beta}_2$, because the range of $\hat{\beta}_2$ is much
smaller than the range of $\tilde{\beta}_2$.
Similarly, one can show that the OLS estimator $\hat{\beta}_1$ is also unbiased. Remember that
unbiasedness is a general property of the estimator, but in a given sample we may be
“near” or “far” from the true parameter. In any case, its distribution will be centered at
the population parameter.
In order to obtain the variances of β̂1 and β̂ 2 , assumptions 7 and 8 are needed,
in addition to the first six assumptions. These variances are the following:
$$Var(\hat{\beta}_1) = \frac{\sigma^2 \, n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad Var(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2-64}$$
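Unbiasedness and the variance formulas in (2-64) can be illustrated with a small Monte Carlo experiment. The sketch below is not part of the original text: the parameter values, the fixed regressor values and the number of replications are arbitrary assumptions, and the helper names are hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def theoretical_variances(x, sigma2):
    """Var(beta1_hat) and Var(beta2_hat) according to (2-64)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sigma2 * (sum(xi ** 2 for xi in x) / n) / sxx, sigma2 / sxx


random.seed(12345)
beta1, beta2, sigma = 1.0, 2.0, 1.0          # assumed population values
x = [float(i) for i in range(1, 11)]         # regressor fixed in repeated samples
reps = 5000

slopes = []
for _ in range(reps):
    # Homoskedastic, uncorrelated, normal disturbances (assumptions 6 to 9).
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    slopes.append(ols(x, y)[1])

mean_slope = sum(slopes) / reps
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / (reps - 1)
var_b1, var_b2 = theoretical_variances(x, sigma ** 2)
```

With these settings the average of the simulated slopes should lie close to 2, and their sample variance close to the theoretical value `var_b2`.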
[Figure: within the class of linear unbiased estimators, the OLS estimators $\hat{\beta}_1$, $\hat{\beta}_2$ are the best (BLUE).]
Hence $\hat{u}_i$ is not the same as $u_i$, although the difference between them,
$-(\hat{\beta}_1 - \beta_1) - (\hat{\beta}_2 - \beta_2) x_i$, does have an expected value of zero. Therefore, a first estimator
of σ2 could be the residual variance:
$$\tilde{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n} \tag{2-66}$$
However, this estimator is biased, essentially because it does not account for the
two following restrictions that must be satisfied by the OLS residuals in the simple
regression model:
$$\sum_{i=1}^{n} \hat{u}_i = 0 \qquad \sum_{i=1}^{n} x_i \hat{u}_i = 0 \tag{2-67}$$
One way to view these restrictions is the following: if we know n–2 of the
residuals, we can get the other two residuals by using the restrictions implied by the
normal equations.
Thus, there are only n–2 degrees of freedom in the OLS residuals, as opposed to
n degrees of freedom in the disturbances. In the unbiased estimator of σ2 shown below an
adjustment is made taking into account the degrees of freedom:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n-2} \tag{2-68}$$
Under assumptions 1-8 (Gauss-Markov assumptions), and as can be seen in
appendix 2.7, we obtain
$$E(\hat{\sigma}^2) = \sigma^2 \tag{2-69}$$
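The two restrictions on the residuals and the degrees-of-freedom adjustment can be seen in a short numerical sketch. The data are hypothetical and the variable names are chosen only for this illustration.

```python
# Hypothetical sample, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
beta2_hat = sxy / sxx
beta1_hat = ybar - beta2_hat * xbar

residuals = [yi - beta1_hat - beta2_hat * xi for xi, yi in zip(x, y)]

# Restrictions implied by the normal equations.
sum_res = sum(residuals)
sum_x_res = sum(xi * ri for xi, ri in zip(x, residuals))

rss = sum(r ** 2 for r in residuals)
sigma2_tilde = rss / n        # residual variance, biased
sigma2_hat = rss / (n - 2)    # unbiased estimator with n - 2 degrees of freedom
```

Dividing by n - 2 instead of n always gives the larger of the two estimates, which compensates for the two restrictions the residuals must satisfy.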
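$$sd(\hat{\beta}_2) = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \tag{2-70}$$
$$se(\hat{\beta}_2) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \tag{2-71}$$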
Note that se( βˆ2 ) , due to the presence of the estimator σˆ in (2-71), is a random
variable as is β̂ 2 . The standard error of any estimate gives us an idea of how precise the
estimator is.
Under assumptions 1 through 6, the OLS estimators, β̂1 and β̂ 2 , are consistent.
The proof for β̂ 2 can be seen in appendix 2.8.
OLS estimators are maximum likelihood estimators (ML) and minimum variance
unbiased estimators (MVUE)
Now we are going to introduce the assumption 9 on normality of the disturbance
u. The set of assumptions 1 through 9 is known as the classical linear model (CLM)
assumptions.
Under the CLM assumptions, the OLS estimators are also maximum likelihood
estimators (ML), as can be seen in appendix 2.8.
On the other hand, under CLM assumptions, OLS estimators are not only BLUE,
but are also the minimum variance unbiased estimators (MVUE). This means that OLS
estimators have the smallest variance among all unbiased estimators, linear or nonlinear,
as can be seen in figure 2.13. Therefore, we no longer have to restrict our comparison to
estimators that are linear in the $y_i$'s.
It also holds that any linear combination of $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \ldots, \hat{\beta}_k$ is
normally distributed, and any subset of the $\hat{\beta}_j$'s has a joint normal distribution.
In conclusion, we have seen that the OLS estimator has very desirable properties
when the statistical basic assumptions are met.
Exercises
Exercise 2.1 The following model has been formulated to explain the annual sales (sales)
of manufacturers of household cleaning products as a function of a relative price
index (rpi):
$$sales = \beta_1 + \beta_2 \, rpi + u$$
where the variable sales is expressed in thousands of millions of euros and rpi is an
index obtained as the ratio between the prices of each firm and the prices of firm 1 of the
sample. Thus, the value 110 in firm 2 indicates that its price is 10% higher than in firm 1.
$$\sum x_i y_i = 349.486; \qquad \sum y_i^2 = 2396.504$$
a) Estimate β 1 and β 2 by OLS.
b) Decompose the variance of the variable y into variance explained by the
regression and residual variance.
c) Calculate the coefficient of determination.
d) Estimate total consumption, in thousands of pounds, for a flight program
consisting of 100 half-hour flights, 200 one hour flights and 100 two hours
flights.
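$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} = 20 \qquad \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = 10 \qquad \bar{y} = 8 \qquad \bar{x} = 4$$
$$\hat{\beta}_2 = 3$$
$$\sum_{i=1}^{n} x_i = 0 \qquad \sum_{i=1}^{n} y_i = 0 \qquad \sum_{i=1}^{n} x_i^2 = B \qquad \sum_{i=1}^{n} y_i^2 = E \qquad \sum_{i=1}^{n} x_i y_i = F$$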
a) Estimate β 2 and β1
b) Calculate the sum of squared residuals.
c) Calculate the coefficient of determination.
d) Calculate the coefficient of determination under the assumption that
2F 2 = BE
Exercise 2.6 Company A is dedicated to mounting prefabricated panels for industrial
buildings. So far, the company has completed eight orders, in which the number of square
meters of panels and working hours employed in the assembly are as follows:
Number of square meters (thousands)    Number of hours
 4                                      7400
 6                                      9800
 2                                      4600
 8                                     12200
10                                     14000
 5                                      8200
 3                                      5800
12                                     17000
Company A wishes to participate in a tender to mount 14,000 m2 of panels in a
warehouse, for which a budget is required.
In order to prepare the budget, we know the following:
a) The budget must relate exclusively to the assembly costs, since the
material is already provided.
b) The cost of the working hour for Company A is 30 euros.
c) To cover the remaining costs, Company A must charge 20% on the total
cost of labor employed in the assembly.
Company A is interested in participating in the tender with a budget that only
covers the costs. Under these conditions, and under the assumption that the number of
hours worked is a linear function of the number of square meters of panels mounted, what
would be the budget provided by company A?
Exercise 2.7 Consider the following equalities:
1. $E[u] = 0$.
2. $E[\hat{u}] = 0$.
3. $\bar{u} = 0$.
4. $\bar{\hat{u}} = 0$.
In the context of the basic linear model, indicate whether each of the above
equalities is true or not. Justify your answer.
Exercise 2.8 The parameters β 1 and β 2 of the following model have been estimated by
OLS:
y =β1 + β 2 x + u
A sample of size 3 was used and the observations for x i were {1,2,3}. It is also
known that the residual for the first observation was 0.5.
From the above information, is it possible to calculate the sum of squared residuals
and obtain an estimate of σ2? If so, carry out the corresponding calculations.
Exercise 2.9 The following data are available to estimate a relationship between y and x:
y x
-2 -2
-1 0
0 1
1 0
2 1
a) Estimate the parameters α and β of the following model by OLS:
y =α + β x + ε
b) Estimate var(ε i ).
c) Estimate the parameters γ and δ of the following model by OLS:
x =γ + δ y + υ
d) Are the two fitted regression lines the same? Explain the result in terms
of the least-square method.
$$\bar{u} = \frac{\sum_{i=1}^{n} u_i}{n} = 0; \qquad \bar{\hat{u}} = \frac{\sum_{i=1}^{n} \hat{u}_i}{n} = 0; \qquad E[x_i u_i] = 0; \qquad E[u_i] = 0$$
b) Establish the relationship between the following expressions:
$$E[u_i^2] = \sigma^2; \qquad \hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-k}$$
Exercise 2.12 Answer the following questions:
a) Define the probabilistic properties of OLS estimator under the basic
assumptions of the linear regression model. Explain your answer.
b) What happens with the estimation of the linear regression model if the
sample variance of the explanatory variable is null? Explain your answer.
Exercise 2.13 A researcher believes that the relationship between consumption (cons)
and disposable income (inc) should be strictly proportional, and, therefore formulates the
following model:
cons=β 2 inc+u
a) Derive the formula for estimating β 2.
b) Derive the formula for estimating σ2 .
c) In this model, is $\sum_{i=1}^{n} \hat{u}_i$ equal to 0?
d) In this model, is $\sum_{i=1}^{n} \hat{u}_i$ equal to 0?
Exercise 2.16 The following model relates expenditure on education (exped) and
disposable income (inc):
exped=β 1 +β 2 inc+u
Using the information obtained from a sample of 10 families, the following results
have been obtained:
$$\overline{exped} = 7 \qquad \overline{inc} = 50 \qquad \sum_{i=1}^{10} inc_i^2 = 30{,}650 \qquad \sum_{i=1}^{10} exped_i^2 = 622 \qquad \sum_{i=1}^{10} inc_i \times exped_i = 4{,}345$$
Exercise 2.19 The following model was formulated to explain sleeping time (sleep) as a
function of time devoted to paid work (paidwork):
$$sleep = \beta_1 + \beta_2 \, paidwork + u$$
where sleep and paidwork are measured in minutes per day.
Using a random subsample extracted from the file timuse03, the following results
were obtained
$$\widehat{sleep}_i = 550.17 - 0.1783 \, paidwork_i$$
$$stsfglo = \beta_1 + \beta_2 \, lifexpec + u$$
where lifexpec is life expectancy at birth: that is to say, number of years a newborn infant
is expected to live.
Using the work file HDR2010, the fitted model obtained is the following:
$$\widehat{stsfglo} = -1.499 + 0.1062 \, lifexpec$$
R2= 0.6135 n=144
a) Interpret the coefficient on lifexpec.
b) What would be the average overall satisfaction for a country with 80 years
of life expectancy at birth?
c) What should be the life expectancy at birth to obtain a global satisfaction
equal to six?
Exercise 2.21 In economics, Research and Development intensity (or simply R&D
intensity) is the ratio of a company's investment in Research and Development compared
to its sales.
For the estimation of a model which explains R&D intensity, it is necessary to
have an appropriate database. In Spain it is possible to use the Survey of Entrepreneurial
Strategies (Encuesta sobre Estrategias Empresariales) produced by the Ministry of
Industry. This survey, on an annual basis, provides in-depth knowledge of the industrial
sector's evolution over time by means of multiple data concerning business development
and company decisions. This survey is also designed to generate microeconomic
information that enables econometric models to be specified and tested. As far as its
coverage is concerned, the reference population of this survey is companies with 10 or
more workers from the manufacturing industry. The geographical area of reference is
Spain, and the variables have a timescale of one year. One of the most outstanding
characteristics of this survey is its high degree of representativeness.
Using the work file rdspain, which is a dataset consisting of 1,983 Spanish firms
for 2006, the following equation is estimated to explain expenditures on research and
development (rdintens):
$$\widehat{rdintens} = -2.639 + 0.2123 \ln(sales)$$
R2= 0.0350 n=1983
where rdintens is expressed as a percentage of sales, and sales are measured in millions
of euros.
Annex 2.1 Case study: Engel curve for demand of dairy products
The Engel curve shows the relationship between the various quantities of a good
that a consumer is willing to purchase at varying income levels.
In a survey with 40 households, data were obtained on expenditure on dairy
products and income. These data appear in table 2.6 and in work file demand. In order to
avoid distortions due to the different size of households, both consumption and income
have been expressed in per capita terms. The data are expressed in euros
per month.
There are several demand models. We will consider the following models: linear,
inverse, semi-logarithmic, potential, exponential and inverse exponential. In the first three
models, the regressand of the equation is the endogenous variable, whereas in the last
three the regressand is the natural logarithm of the endogenous variable.
In all the models we will calculate the marginal propensity to expenditure, as well
as the expenditure/income elasticity.
TABLE 2.6. Expenditure on dairy products (dairy) and disposable income (inc) in per
capita terms. Unit: euros per month.
household  dairy    inc      household  dairy    inc
 1          8.87   1,250     21         16.20   2,100
 2          6.59     985     22         10.39   1,470
 3         11.46   2,175     23         13.50   1,225
 4         15.07   1,025     24          8.50   1,380
 5         15.60   1,690     25         19.77   2,450
 6          6.71     670     26          9.69     910
 7         10.02   1,600     27          7.90     690
 8          7.41     940     28         10.15   1,450
 9         11.52   1,730     29         13.82   2,275
10          7.47     640     30         13.74   1,620
11          6.73     860     31          4.91     740
12          8.05     960     32         20.99   1,125
13         11.03   1,575     33         20.06   1,335
14         10.11   1,230     34         18.93   2,875
15         18.65   2,190     35         13.19   1,680
16         10.30   1,580     36          5.86     870
17         15.30   2,300     37          7.43   1,620
18         13.75   1,720     38          7.15     960
19         11.49     850     39          9.10   1,125
20          6.69     780     40         15.31   1,875
Linear model
The linear model for demand of dairy products will be the following:
$$dairy = \beta_1 + \beta_2 \, inc + u \tag{2-73}$$
The marginal propensity indicates the change in expenditure as income varies and
it is obtained by differentiating the expenditure with respect to income in the demand
equation. In the linear model the marginal propensity of the expenditure on dairy is given
by
$$\frac{d\,dairy}{d\,inc} = \beta_2 \tag{2-74}$$
In other words, in the linear model the marginal propensity is constant and,
therefore, independent of the level of income. This makes the linear model poorly
suited to describing consumer behavior, especially when there are
important differences in household income. Thus, it is unreasonable that the marginal
propensity of expenditure on dairy products should be the same in a low-income family and a
high-income family. However, if the variation of income in the sample is not very large,
a linear model can be used to describe the demand for certain goods.
In this model the expenditure/income elasticity is the following:
$$\varepsilon^{linear}_{dairy/inc} = \frac{d\,dairy}{d\,inc} \, \frac{inc}{dairy} = \beta_2 \, \frac{inc}{dairy} \tag{2-75}$$
Estimating the model (2-73) with the data from table 2.6, we obtain
$$\widehat{dairy} = 4.012 + 0.005288 \times inc \qquad R^2 = 0.4584 \tag{2-76}$$
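To see how the elasticity in (2-75) varies along the fitted line, it can be evaluated at any income level using the coefficients of (2-76). In the sketch below the income level of 1,500 euros is a hypothetical choice, and `linear_elasticity` is a name introduced only for this illustration.

```python
def linear_elasticity(beta2, inc, dairy):
    """Expenditure/income elasticity in the linear model: beta2 * inc / dairy."""
    return beta2 * inc / dairy


beta1_hat, beta2_hat = 4.012, 0.005288   # fitted coefficients from (2-76)

inc = 1500.0                             # hypothetical income level (euros per month)
dairy_fitted = beta1_hat + beta2_hat * inc
elasticity = linear_elasticity(beta2_hat, inc, dairy_fitted)

# The marginal propensity is the same at every income level,
# but the elasticity changes with the point of evaluation.
elasticity_low = linear_elasticity(beta2_hat, 700.0, beta1_hat + beta2_hat * 700.0)
```

With a positive intercept, the elasticity evaluated on the fitted line is below 1 and rises with income, even though the marginal propensity stays fixed.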
Inverse model
In an inverse model there is a linear relationship between the expenditure and the
inverse of income. Therefore, this model is directly linear in the parameters and it is
expressed in the following way:
$$dairy = \beta_1 + \beta_2 \, \frac{1}{inc} + u \tag{2-77}$$
The sign of the coefficient β2 will be negative if income is positively correlated
with expenditure. It is easy to see that, when income tends towards
infinity, expenditure tends towards a limit which is equal to β1. In other words, β1
represents the maximum consumption of this good.
In figure 2.14, we can see a double representation of the population function
corresponding to this model. In the first one, the relationship between the dependent
variable and explanatory variable has been represented. In the second one, the relationship
between the regressand and regressor has been represented. The second function is linear
as can be seen in the figure.
[Figure 2.14: the inverse model, E(dairy) = β1 + β2 (1/inc), with dairy plotted against inc (approaching β1) and against 1/inc.]
Linear-log model
This model is called the linear-log model, because expenditure is a linear
function of the logarithm of income, that is to say,
$$dairy = \beta_1 + \beta_2 \ln(inc) + u \tag{2-81}$$
In this model the marginal propensity to expenditure is given by
$$\frac{d\,dairy}{d\,inc} = \frac{d\,dairy}{d\,inc} \, \frac{inc}{inc} = \frac{d\,dairy}{d\ln(inc)} \, \frac{1}{inc} = \beta_2 \, \frac{1}{inc} \tag{2-82}$$
and the elasticity expenditure/income is given by
$$\varepsilon^{lin\text{-}log}_{dairy/inc} = \frac{d\,dairy}{d\,inc} \, \frac{inc}{dairy} = \frac{d\,dairy}{d\ln(inc)} \, \frac{1}{dairy} = \beta_2 \, \frac{1}{dairy} \tag{2-83}$$
The marginal propensity is inversely proportional to the level of income in the
linear-log model, while the elasticity is inversely proportional to the level of expenditure
on dairy products.
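The two results for the linear-log model can be verified numerically by comparing the analytical marginal propensity with a finite-difference derivative. The coefficients used below are hypothetical, since the fitted linear-log equation is not reproduced here.

```python
import math


def fitted(inc, b1, b2):
    """Fitted expenditure in a linear-log model."""
    return b1 + b2 * math.log(inc)


def marginal_propensity(inc, b2):
    """Marginal propensity in the linear-log model: b2 / inc."""
    return b2 / inc


def elasticity(dairy, b2):
    """Expenditure/income elasticity in the linear-log model: b2 / dairy."""
    return b2 / dairy


b1, b2 = -30.0, 6.0     # hypothetical coefficients
inc = 1200.0            # hypothetical income level
dairy = fitted(inc, b1, b2)

# Central finite difference as an independent check of the derivative.
h = 1e-3
numeric_derivative = (fitted(inc + h, b1, b2) - fitted(inc - h, b1, b2)) / (2 * h)

analytic_derivative = marginal_propensity(inc, b2)
analytic_elasticity = elasticity(dairy, b2)
```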
[Figure: linear-log model, E(dairy) = β1 + β2 ln(inc), with dairy plotted against inc and against ln(inc).]
[Figure: potential (log-log) model, E(dairy) = β1 inc^β2, with dairy plotted against inc and ln(dairy) against ln(inc).]
[Figure: exponential model, E(dairy) = e^(β1 + β2 inc), with dairy plotted against inc and ln(dairy) against inc.]
Table 2.7. Marginal propensity, expenditure/income elasticity and R2 in the fitted models.
Model         Marginal propensity              Elasticity                         R2
Linear        β̂2 = 0.0053                      β̂2 (inc/dairy) = 0.6505            0.4440
Inverse       −β̂2 (1/inc^2) = 0.0044           −β̂2 (1/(dairy × inc)) = 0.5361     0.4279
Linear-log    β̂2 (1/inc) = 0.0052              β̂2 (1/dairy) = 0.6441              0.4566
Log-log       β̂2 (dairy/inc) = 0.0056          β̂2 = 0.6864                        0.5188
Inverse-log   −β̂2 (dairy/inc^2) = 0.0047       −β̂2 (1/inc) = 0.5815               0.5038
The R2 obtained in the first three models are not comparable with the R2 obtained
in the last three because the functional form of the regressand is different: y in the first
three models and ln(y) in the last three.
Comparing the first three models the best fit is obtained by the linear-log model,
if we use the R2 as goodness of fit measure. Comparing the last three models the best fit
is obtained by the log-log model. If we had used the Akaike Information Criterion (AIC),
which allows the comparison of models with different functional forms for the regressand,
then the log-log model would have been the best among the six models fitted. The AIC
measure will be studied in chapter 3.
Appendixes
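$$\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) &= \sum_{i=1}^{n} (y_i x_i - \bar{x} y_i - \bar{y} x_i + \bar{y}\bar{x}) = \sum_{i=1}^{n} y_i x_i - \bar{x} \sum_{i=1}^{n} y_i - \bar{y} \sum_{i=1}^{n} x_i + n\bar{y}\bar{x} \\
&= \sum_{i=1}^{n} y_i x_i - n\bar{x}\bar{y} - \bar{y} \sum_{i=1}^{n} x_i + n\bar{y}\bar{x} = \sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i
\end{aligned}$$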
On the other hand, we have
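$$\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} (x_i^2 - 2\bar{x} x_i + \bar{x}^2) = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i
\end{aligned}$$
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$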
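$$\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2 = \hat{\beta}_2^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Taking into account the previous equivalence, we have
$$\begin{aligned}
R^2 &= \frac{\sum (\hat{y}_i - \bar{\hat{y}})^2}{\sum (y_i - \bar{y})^2} = \frac{\hat{\beta}_2^2 \sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2} = \frac{\left[\sum (y_i - \bar{y})(x_i - \bar{x})\right]^2 \sum (x_i - \bar{x})^2}{\left[\sum (x_i - \bar{x})^2\right]^2 \sum (y_i - \bar{y})^2} \\
&= \frac{\left[\sum (y_i - \bar{y})(x_i - \bar{x})\right]^2}{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2} = r_{xy}^2
\end{aligned}$$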
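The equality between the coefficient of determination and the squared correlation coefficient can be checked numerically. The data below are hypothetical and the variable names are chosen only for this sketch.

```python
import math

# Hypothetical sample, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Coefficient of correlation from descriptive statistics.
r_xy = sxy / math.sqrt(sxx * syy)

# Coefficient of determination from the fitted regression.
beta2_hat = sxy / sxx
beta1_hat = ybar - beta2_hat * xbar
fitted = [beta1_hat + beta2_hat * xi for xi in x]
ess = sum((fi - ybar) ** 2 for fi in fitted)
r2 = ess / syy
```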
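$$\begin{aligned}
\ln(x_1) - \ln(x_0) &= \ln\left(\frac{x_1}{x_0}\right) \\
&= \ln(1) + \left[\frac{1}{x_1/x_0}\right]_{x_1/x_0 = 1} \left(\frac{x_1}{x_0} - 1\right) - \frac{1}{2} \left[\frac{1}{(x_1/x_0)^2}\right]_{x_1/x_0 = 1} \left(\frac{x_1}{x_0} - 1\right)^2 \\
&\quad + \frac{1}{3 \times 2} \left[\frac{2}{(x_1/x_0)^3}\right]_{x_1/x_0 = 1} \left(\frac{x_1}{x_0} - 1\right)^3 + \cdots \\
&= \left(\frac{x_1}{x_0} - 1\right) - \frac{1}{2} \left(\frac{x_1}{x_0} - 1\right)^2 + \frac{1}{3} \left(\frac{x_1}{x_0} - 1\right)^3 + \cdots \\
&= \frac{\Delta x_1}{x_0} - \frac{1}{2} \left(\frac{\Delta x_1}{x_0}\right)^2 + \frac{1}{3} \left(\frac{\Delta x_1}{x_0}\right)^3 + \cdots
\end{aligned} \tag{2-100}$$
Therefore, if we take the linear approximation in this expansion, we have
$$\Delta \ln(x) = \ln(x_1) - \ln(x_0) = \ln\left(\frac{x_1}{x_0}\right) \approx \frac{\Delta x_1}{x_0} \tag{2-101}$$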
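The quality of the linear approximation in (2-101) depends on the size of the change. The sketch below compares the exact change in logs with the proportional change for a small and a large variation; the values of x0 and x1 are hypothetical.

```python
import math


def log_change(x0, x1):
    """Exact change in natural logs, ln(x1) - ln(x0)."""
    return math.log(x1) - math.log(x0)


def proportional_change(x0, x1):
    """Proportional change (x1 - x0) / x0."""
    return (x1 - x0) / x0


# A 1% change: the approximation is very accurate.
small_error = abs(log_change(100.0, 101.0) - proportional_change(100.0, 101.0))

# A 20% change: the neglected higher-order terms become noticeable.
large_error = abs(log_change(100.0, 120.0) - proportional_change(100.0, 120.0))
```

This is why the percentage interpretations of the models in logs are best reserved for small changes.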
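$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2-102}$$
because
$$\sum_{i=1}^{n} (x_i - \bar{x}) \bar{y} = \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) = \bar{y} \times 0 = 0$$
Now (2-102) will be expressed in the following way:
$$\hat{\beta}_2 = \sum_{i=1}^{n} c_i y_i \tag{2-103}$$
where
$$c_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2-104}$$
$$\sum_{i=1}^{n} c_i = 0 \tag{2-105}$$
$$\sum_{i=1}^{n} c_i^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^2} = \frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2-106}$$
$$\sum_{i=1}^{n} c_i x_i = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) x_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 1 \tag{2-107}$$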
Since the regressors are assumed to be nonstochastic (assumption 2), the c i are
nonstochastic too. Therefore, β̂ 2 is an estimator that is a linear function of u’s.
Taking expectations in (2-108) and taking into account assumption 6, and
implicitly assumptions 3 through 5, we obtain
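$$E(\hat{\beta}_2) = \beta_2 + \sum_{i=1}^{n} c_i E(u_i) = \beta_2 \tag{2-109}$$
$$Var(\hat{\beta}_2) = \cdots = \sigma^2 \sum_{i=1}^{n} c_i^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{n S_X^2} \tag{2-110}$$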
In the above proof, to pass from the second to the third equality, we have taken
into account assumptions 6 and 7.
Appendix 2.6. Proof of Gauss-Markov Theorem for the slope in simple regression
The plan for the proof is the following. First, we are going to define an arbitrary
estimator $\tilde{\beta}_2$ which is linear in y. Second, we will impose restrictions implied by
unbiasedness. Third, we will show that the variance of the arbitrary estimator must be
larger than, or at least equal to, the variance of $\hat{\beta}_2$.
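$$\sum_{i=1}^{n} h_i = 0 \qquad \sum_{i=1}^{n} h_i x_i = 1 \tag{2-113}$$
Therefore,
$$\tilde{\beta}_2 = \beta_2 + \sum_{i=1}^{n} h_i u_i \tag{2-114}$$
$$\begin{aligned}
E(\tilde{\beta}_2 - \beta_2)^2 = \sigma^2 \sum_{i=1}^{n} h_i^2 &= \sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}\right]^2 + \sigma^2 \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}\right]^2 \\
&\quad + 2\sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}\right] \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}
\end{aligned}$$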
The third term of the last equality is 0, as shown below:
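$$\begin{aligned}
2\sigma^2 \sum_{i=1}^{n} \left[h_i - \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}\right] \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}
&= 2\sigma^2 \sum_{i=1}^{n} h_i \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} - 2\sigma^2 \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\left[\sum (x_i - \bar{x})^2\right]^2} \\
&= \frac{2\sigma^2}{\sum (x_i - \bar{x})^2} \times 1 - \frac{2\sigma^2}{\sum (x_i - \bar{x})^2} \times 1 = 0
\end{aligned} \tag{2-116}$$
where $c_i = \dfrac{x_i - \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$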
The second term of the last equality is the variance of $\hat{\beta}_2$, while the first term is
always positive because it is a sum of squares, unless $h_i = c_i$ for all i, in which case it
is equal to 0, and then $\tilde{\beta}_2 = \hat{\beta}_2$. So,
$$E(\tilde{\beta}_2 - \beta_2)^2 \geq E(\hat{\beta}_2 - \beta_2)^2 \tag{2-118}$$
Appendix 2.7. Proof: $\hat{\sigma}^2$ is an unbiased estimator of the variance of the disturbance
The population model is by definition:
$$y_i = \beta_1 + \beta_2 x_i + u_i \tag{2-119}$$
If we sum up both sides of (2-119) for all i and divide by n, we have
$$\bar{y} = \beta_1 + \beta_2 \bar{x} + \bar{u} \tag{2-120}$$
Subtracting (2-120) from (2-119), we have
$$y_i - \bar{y} = \beta_2 (x_i - \bar{x}) + (u_i - \bar{u}) \tag{2-121}$$
Subtracting (2-123) from (2-122), and taking into account that û =0,
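$$\sum_{i=1}^{n} \hat{u}_i^2 = (\hat{\beta}_2 - \beta_2)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (u_i - \bar{u})^2 - 2(\hat{\beta}_2 - \beta_2) \sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u}) \tag{2-126}$$
$$\begin{aligned}
E\left[\sum_{i=1}^{n} \hat{u}_i^2\right] &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \, E(\hat{\beta}_2 - \beta_2)^2 + E\left[\sum_{i=1}^{n} (u_i - \bar{u})^2\right] - 2E\left[(\hat{\beta}_2 - \beta_2) \sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u})\right] \\
&= \sum_{i=1}^{n} (x_i - \bar{x})^2 \, \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} + (n-1)\sigma^2 - 2\sigma^2 = (n-2)\sigma^2
\end{aligned} \tag{2-127}$$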
To obtain the first term of the last equality of (2-127), we have used (2-64). In
(2-128) and (2-129), you can find the developments used to obtain the second and the
third term of the last equality of (2-127) respectively. In both cases, assumptions 7 and 8
have been taken into account.
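$$\begin{aligned}
E\left[\sum_{i=1}^{n} (u_i - \bar{u})^2\right] &= E\left[\sum_{i=1}^{n} u_i^2 - n\bar{u}^2\right] = E\left[\sum_{i=1}^{n} u_i^2 - n\left(\frac{\sum_{i=1}^{n} u_i}{n}\right)^2\right] \\
&= E\left[\sum_{i=1}^{n} u_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} u_i^2 + \sum_{i \neq j} u_i u_j\right)\right] = n\sigma^2 - \sigma^2 = (n-1)\sigma^2
\end{aligned} \tag{2-128}$$
$$\begin{aligned}
E\left[(\hat{\beta}_2 - \beta_2) \sum_{i=1}^{n} (x_i - \bar{x})(u_i - \bar{u})\right] &= E\left[\frac{1}{\sum (x_i - \bar{x})^2} \sum_{i=1}^{n} (x_i - \bar{x}) u_i \sum_{i=1}^{n} (x_i - \bar{x}) u_i\right] \\
&= \frac{1}{\sum (x_i - \bar{x})^2} E\left[\sum_{i=1}^{n} (x_i - \bar{x}) u_i\right]^2 \\
&= \frac{1}{\sum (x_i - \bar{x})^2} \left[\sum_{i=1}^{n} (x_i - \bar{x})^2 E(u_i^2) + \sum\sum_{i \neq j} (x_i - \bar{x})(x_j - \bar{x}) E(u_i u_j)\right] = \sigma^2
\end{aligned} \tag{2-129}$$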
According to (2-127), we have
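$$E\left[\sum_{i=1}^{n} \hat{u}_i^2\right] = (n-2)\sigma^2 \tag{2-130}$$
Therefore, an unbiased estimator is given by
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n-2} \tag{2-131}$$
such that
$$E(\hat{\sigma}^2) = \frac{1}{n-2} E\left[\sum_{i=1}^{n} \hat{u}_i^2\right] = \sigma^2 \tag{2-132}$$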
This means that if $\hat{\theta}$ is a consistent estimator of θ, then $1/\hat{\theta}$ and $\ln(\hat{\theta})$ are also
consistent estimators of 1/θ and ln(θ) respectively. Note that these properties do not hold
true for the expectation operator E; for example, if $\hat{\theta}$ is an unbiased estimator of θ [that is
to say, $E(\hat{\theta}) = \theta$], it is not true in general that $1/\hat{\theta}$ is an unbiased estimator of 1/θ; that is,
$E(1/\hat{\theta}) \neq 1/E(\hat{\theta}) = 1/\theta$. This is due to the fact that the expectation operator can be
interchanged only with linear functions of random variables. On the other hand, the plim operator
can be interchanged with any continuous function.
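A two-point example makes the point about the expectation operator concrete. The distribution below is hypothetical and chosen only because the arithmetic is exact: an estimator that takes the values 1 and 3 with equal probability has expectation 2, but the expectation of its reciprocal is not 1/2.

```python
from fractions import Fraction

# Hypothetical sampling distribution of an estimator: value -> probability.
distribution = {Fraction(1): Fraction(1, 2), Fraction(3): Fraction(1, 2)}

# E(theta_hat)
mean = sum(value * prob for value, prob in distribution.items())

# E(1 / theta_hat), computed directly from the distribution
mean_of_inverse = sum(prob / value for value, prob in distribution.items())

# 1 / E(theta_hat)
inverse_of_mean = 1 / mean
```

Here E(1/θ̂) = 2/3 while 1/E(θ̂) = 1/2, so the reciprocal of an unbiased estimator is biased in this example.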
Under assumptions 1 through 6, the OLS estimators, β̂1 and β̂ 2 , are consistent.
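$$\begin{aligned}
\hat{\beta}_2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_1 + \beta_2 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
&= \beta_1 \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} + \beta_2 \frac{\sum_{i=1}^{n} (x_i - \bar{x}) x_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_2 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned} \tag{2-134}$$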
In order to prove consistency, we need to take plims in (2-134) and apply the Law of Large Numbers. This law states that, under general conditions, the sample moments converge to their corresponding population moments. Thus, taking plims in (2-134):
$$\operatorname*{plim}_{n\to\infty}\hat\beta_2=\operatorname*{plim}_{n\to\infty}\left[\beta_2+\frac{\sum_{i=1}^{n}(x_i-\bar x)u_i}{\sum_{i=1}^{n}(x_i-\bar x)^2}\right]=\beta_2+\frac{\operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)u_i}{\operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2}\tag{2-135}$$
In the last equality we have divided the numerator and denominator by n, because otherwise both summations would go to infinity as n goes to infinity.
If we apply the law of large numbers to the numerator and denominator of (2-135),
they will converge in probability to the population moments cov(x,u) and var(x)
respectively. Provided var(x)≠0 (assumption 4), we can use the properties of the
probability limits to obtain
$$\operatorname{plim}\hat\beta_2=\beta_2+\frac{\operatorname{cov}(x,u)}{\operatorname{var}(x)}=\beta_2\tag{2-136}$$
To reach the last equality, using assumptions 2 and 6, we obtain
$$\operatorname{cov}(x,u)=E[(x-\bar x)u]=(x-\bar x)E[u]=(x-\bar x)\times 0=0\tag{2-137}$$
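The consistency result can be illustrated with a small simulation. The following is only a sketch under assumed values: the parameters (β1 = 1, β2 = 2, σ = 3), the uniform design for x and the function names are hypothetical choices, not taken from the text. It estimates the slope for increasing sample sizes and records how far the estimate is from the true β2.

```python
import random


def ols_slope(x, y):
    """OLS slope of the simple regression: sum((x-xbar)(y-ybar)) / sum((x-xbar)^2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slope(n, beta1=1.0, beta2=2.0, sigma=3.0, seed=0):
    """Draw one sample of size n from y = beta1 + beta2*x + u and return the OLS slope."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    u = [rng.gauss(0.0, sigma) for _ in range(n)]  # disturbance drawn independently of x
    y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]
    return ols_slope(x, y)


# Absolute estimation error of the slope for growing sample sizes.
errors = {n: abs(simulate_slope(n) - 2.0) for n in (20, 2000, 200000)}
for n, err in errors.items():
    print(n, round(err, 4))
```

Because the disturbance is generated independently of x, cov(x,u) = 0 holds by construction, and the error for the largest sample should be very small; a single run does not guarantee that the error falls monotonically with n.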
where
$$f(y_i)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\frac{[y_i-\beta_1-\beta_2x_i]^2}{\sigma^2}\right\}\tag{2-141}$$
which is the density function of a normally distributed variable with the given mean and
variance.
Substituting (2-141) into (2-140) for each $y_i$, we obtain
$$f(y_1,y_2,\ldots,y_n)=f(y_1)f(y_2)\cdots f(y_n)=\frac{1}{\sigma^n\left(\sqrt{2\pi}\right)^n}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\frac{[y_i-\beta_1-\beta_2x_i]^2}{\sigma^2}\right\}\tag{2-142}$$
If $y_1,y_2,\ldots,y_n$ are known or given, but $\beta_1$, $\beta_2$ and $\sigma^2$ are not known, the function in (2-142) is called a likelihood function, denoted by $L(\beta_1,\beta_2,\sigma^2)$ or simply L. If we take natural logarithms in (2-142), we obtain
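$$\begin{aligned}
\ln L&=-n\ln\sigma-\frac{n}{2}\ln(2\pi)-\frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-\beta_1-\beta_2x_i)^2}{\sigma^2}\\
&=-\frac{n}{2}\ln\sigma^2-\frac{n}{2}\ln(2\pi)-\frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-\beta_1-\beta_2x_i)^2}{\sigma^2}
\end{aligned}\tag{2-143}$$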
The maximum likelihood (ML) method, as the name suggests, consists in estimating the unknown parameters in such a manner that the probability of observing the given $y_i$'s is as high (or maximum) as possible. Therefore, we have to find the maximum of the function (2-143). To maximize (2-143) we must differentiate with respect to $\beta_1$, $\beta_2$ and $\sigma^2$ and set the derivatives equal to 0. If $\tilde\beta_1$, $\tilde\beta_2$ and $\tilde\sigma^2$ denote the ML estimators, we obtain:
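$$\begin{aligned}
\frac{\partial\ln L}{\partial\beta_1}&=-\frac{1}{\tilde\sigma^2}\sum\left(y_i-\tilde\beta_1-\tilde\beta_2x_i\right)(-1)=0\\
\frac{\partial\ln L}{\partial\beta_2}&=-\frac{1}{\tilde\sigma^2}\sum\left(y_i-\tilde\beta_1-\tilde\beta_2x_i\right)(-x_i)=0\\
\frac{\partial\ln L}{\partial\sigma^2}&=-\frac{n}{2\tilde\sigma^2}+\frac{1}{2\tilde\sigma^4}\sum\left(y_i-\tilde\beta_1-\tilde\beta_2x_i\right)^2=0
\end{aligned}\tag{2-144}$$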
$$\sum y_i=n\tilde\beta_1+\tilde\beta_2\sum x_i\tag{2-145}$$
$$\sum y_ix_i=\tilde\beta_1\sum x_i+\tilde\beta_2\sum x_i^2\tag{2-146}$$
As can be seen, (2-145) and (2-146) are equal to (2-13) and (2-14). That is to say,
the ML estimators, under the CLM assumptions, are equal to the OLS estimators.
Substituting $\tilde\beta_1$ and $\tilde\beta_2$, obtained by solving (2-145) and (2-146), in the third equation of (2-144), we have
$$\tilde\sigma^2=\frac{1}{n}\sum\left(y_i-\tilde\beta_1-\tilde\beta_2x_i\right)^2=\frac{1}{n}\sum\left(y_i-\hat\beta_1-\hat\beta_2x_i\right)^2=\frac{1}{n}\sum\hat u_i^2\tag{2-147}$$
The ML estimator for $\sigma^2$ is biased, since, according to (2-127),
$$E(\tilde\sigma^2)=\frac{1}{n}E\left[\sum_{i=1}^{n}\hat u_i^2\right]=\frac{n-2}{n}\sigma^2\tag{2-148}$$
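The difference between the unbiased estimator (2-131) and the ML estimator (2-147) can be checked numerically. The sketch below is a Monte Carlo experiment under assumed values (n = 10, σ² = 4, a fixed regressor taking the values 1 to 10, and hypothetical function names); it averages the residual sum of squares over many samples and divides it by n − 2 and by n.

```python
import random


def residual_ss(x, y):
    """Residual sum of squares of the simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def average_estimates(n=10, sigma2=4.0, reps=20000, seed=1):
    """Return the Monte Carlo means of SSR/(n-2) and SSR/n."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]  # regressor held fixed across samples
    sigma = sigma2 ** 0.5
    total = 0.0
    for _ in range(reps):
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, sigma) for xi in x]
        total += residual_ss(x, y)
    mean_ssr = total / reps
    return mean_ssr / (n - 2), mean_ssr / n


unbiased_mean, ml_mean = average_estimates()
print(unbiased_mean, ml_mean)
```

With these assumed values, the first average should be close to σ² = 4 and the second close to (n − 2)σ²/n = 3.2, in line with (2-132) and (2-148).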
3 MULTIPLE LINEAR REGRESSION: ESTIMATION AND
PROPERTIES
The matrix X is called the matrix of regressors. Also included among the
regressors is the regressor corresponding to the intercept. This one, which is often called
dummy regressor, takes the value 1 for all the observations.
The model of multiple linear regression (3-11) expressed in matrix notation is the
following:
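$$\begin{pmatrix}y_1\\y_2\\\vdots\\y_n\end{pmatrix}=\begin{pmatrix}1&x_{21}&x_{31}&\cdots&x_{k1}\\1&x_{22}&x_{32}&\cdots&x_{k2}\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&x_{2n}&x_{3n}&\cdots&x_{kn}\end{pmatrix}\begin{pmatrix}\beta_1\\\beta_2\\\beta_3\\\vdots\\\beta_k\end{pmatrix}+\begin{pmatrix}u_1\\u_2\\\vdots\\u_n\end{pmatrix}\tag{3-12}$$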
If we take into account the denominations given to vectors and matrices, the model
of multiple linear regression can be expressed in the following way:
y = Xβ + u (3-13)
where y is a vector n ×1 , X is a matrix n × k , β is a vector k ×1 and u is a vector n ×1 .
$$\hat y_i=\hat\beta_1+\hat\beta_2x_{2i}+\hat\beta_3x_{3i}+\cdots+\hat\beta_kx_{ki}\qquad i=1,2,\ldots,n\tag{3-14}$$
The above expression allows us to calculate the fitted value ($\hat y_i$) for each $y_i$. In the SRF, $\hat\beta_1,\hat\beta_2,\hat\beta_3,\ldots,\hat\beta_k$ are the estimators of the parameters $\beta_1,\beta_2,\beta_3,\ldots,\beta_k$.
$$\hat u_i=y_i-\hat y_i=y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\tag{3-15}$$
In other words, the residual uˆi is the difference between a sample value and its
corresponding fitted value.
The system of equations (2-5) can be expressed in a compact form by using matrix
notation. Thus, we are going to denote
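$$\hat{\mathbf y}=\begin{pmatrix}\hat y_1\\\hat y_2\\\vdots\\\hat y_n\end{pmatrix}\qquad\hat{\boldsymbol\beta}=\begin{pmatrix}\hat\beta_1\\\hat\beta_2\\\hat\beta_3\\\vdots\\\hat\beta_k\end{pmatrix}\qquad\hat{\mathbf u}=\begin{pmatrix}\hat u_1\\\hat u_2\\\vdots\\\hat u_n\end{pmatrix}$$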
For all observations of the sample, the corresponding fitted model will be the
following:
ŷ = Xβˆ (3-16)
The residual vector is equal to the difference between the vector of observed
values and the vector of fitted values, that is to say,
uˆ = y - yˆ = y - Xβˆ (3-17)
3.2 Obtaining the OLS estimates, interpretation of the coefficients, and other
characteristics
$$S=\sum_{i=1}^{n}\hat u_i^2=\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right]^2\tag{3-18}$$
To apply the least squares criterion in the model of multiple linear regression, we calculate the first derivative of S with respect to each $\hat\beta_j$ in expression (3-18):
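$$\begin{aligned}
\frac{\partial S}{\partial\hat\beta_1}&=2\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right][-1]\\
\frac{\partial S}{\partial\hat\beta_2}&=2\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right][-x_{2i}]\\
\frac{\partial S}{\partial\hat\beta_3}&=2\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right][-x_{3i}]\\
&\;\;\vdots\\
\frac{\partial S}{\partial\hat\beta_k}&=2\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right][-x_{ki}]
\end{aligned}\tag{3-19}$$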
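The least squares estimators are obtained by setting the previous derivatives equal to 0:
$$\begin{aligned}
\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right]&=0\\
\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right]x_{2i}&=0\\
\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right]x_{3i}&=0\\
\vdots\qquad&\\
\sum_{i=1}^{n}\left[y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}\right]x_{ki}&=0
\end{aligned}\tag{3-20}$$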
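$$\begin{pmatrix}n&\sum_{i=1}^{n}x_{2i}&\cdots&\sum_{i=1}^{n}x_{ki}\\\sum_{i=1}^{n}x_{2i}&\sum_{i=1}^{n}x_{2i}^2&\cdots&\sum_{i=1}^{n}x_{2i}x_{ki}\\\vdots&\vdots&\ddots&\vdots\\\sum_{i=1}^{n}x_{ki}&\sum_{i=1}^{n}x_{ki}x_{2i}&\cdots&\sum_{i=1}^{n}x_{ki}^2\end{pmatrix}\begin{pmatrix}\hat\beta_1\\\hat\beta_2\\\vdots\\\hat\beta_k\end{pmatrix}=\begin{pmatrix}\sum_{i=1}^{n}y_i\\\sum_{i=1}^{n}x_{2i}y_i\\\vdots\\\sum_{i=1}^{n}x_{ki}y_i\end{pmatrix}\tag{3-22}$$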
Note that:
a) X′X / n is the matrix of second order sample moments with respect to the origin, of
the regressors, among which a dummy regressor (x 1i ) associated to the intercept is
included. This regressor takes the value x 1i =1 for all i.
b) X′y / n is the vector of sample moments of second order, with respect to the origin,
between the regressand and the regressors.
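In this system there are k equations and k unknowns $(\hat\beta_1,\hat\beta_2,\hat\beta_3,\ldots,\hat\beta_k)$. This system can easily be solved using matrix algebra. In order to solve the system (3-21) uniquely with respect to $\hat{\boldsymbol\beta}$, the rank of the matrix $\mathbf X'\mathbf X$ must be equal to k. If this holds, both members of (3-21) can be premultiplied by $[\mathbf X'\mathbf X]^{-1}$:
$$[\mathbf X'\mathbf X]^{-1}\mathbf X'\mathbf X\hat{\boldsymbol\beta}=[\mathbf X'\mathbf X]^{-1}\mathbf X'\mathbf y$$
with which the expression of the vector of least squares estimators, or more precisely, the vector of ordinary least squares (OLS) estimators, is obtained, because $[\mathbf X'\mathbf X]^{-1}\mathbf X'\mathbf X=\mathbf I$:
$$\hat{\boldsymbol\beta}=[\mathbf X'\mathbf X]^{-1}\mathbf X'\mathbf y$$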
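The solution of the normal equations can be sketched numerically. The example below uses NumPy on simulated data; the data-generating values, the seed and the function name `ols` are assumptions made for illustration and do not come from the text. It checks the rank condition on X′X before solving X′Xβ̂ = X′y.

```python
import numpy as np


def ols(X, y):
    """Solve the normal equations X'X b = X'y; requires rank(X'X) = k."""
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) < X.shape[1]:
        raise ValueError("X'X is singular: the regressors are linearly dependent")
    # Solving the linear system is numerically preferable to forming the inverse.
    return np.linalg.solve(XtX, X.T @ y)


rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2, x3])  # first column: regressor of the intercept
y = 1.0 + 2.0 * x2 - 0.5 * x3 + rng.normal(scale=0.1, size=n)

beta_hat = ols(X, y)
print(beta_hat)
```

The function solves the linear system rather than inverting X′X explicitly; both routes give the same estimates when the rank condition holds, but solving the system is usually more stable numerically.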
where $\Delta\hat y=\hat y_i-\hat y_h$, $\Delta x_2=x_{2i}-x_{2h}$, $\Delta x_3=x_{3i}-x_{3h}$, $\ldots$, $\Delta x_k=x_{ki}-x_{kh}$.
The previous expression captures the variation of $\hat y$ due to the changes in all regressors. If only $x_j$ changes, we will have
$$\Delta\hat y=\hat\beta_j\Delta x_j\tag{3-27}$$
If $x_j$ increases by one unit, we will have
$$\Delta\hat y=\hat\beta_j\quad\text{for }\Delta x_j=1\tag{3-28}$$
R²=0.694 n=48
The interpretation of β̂ 2 is the following: holding fixed tenure and wage, if age increases by one
year, worker absenteeism will be reduced by 0.096 days per year. The interpretation of β̂ 3 is as follows:
holding fixed the age and wage, if the tenure increases by one year, worker absenteeism will be reduced by
0.078 days per year. Finally, the interpretation of β̂ 4 is the following: holding fixed the age and tenure, if
the wage increases by 1000 euros per year, worker absenteeism will be reduced by 0.036 days per year.
EXAMPLE 3.2 Demand for hotel services
The following model is formulated to explain the demand for hotel services:
ln (hostel ) = b1 + b2 ln(inc) + b3 hhsize + u (3-29)
where hostel is spending on hotel services, inc is disposable income, both of which are expressed in euros
per month. The variable hhsize is the number of household members.
The estimated equation with a sample of 40 households, using file hostel, is the following:
$$\widehat{\ln(hostel_i)}=-27.36+4.442\ln(inc_i)-0.523\,hhsize_i$$
R²=0.738 n=40
As the results show, hotel services are a luxury good. Thus, the income elasticity of demand for this good is very high (4.44), which is typical of luxury goods. This means that if income increases by 1%, spending on hotel services increases by 4.44%, holding fixed the size of the household. On the other hand, if the household size increases by one member, then spending on hotel services will decrease by approximately 52%.
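The 52% figure for household size uses the usual approximation 100·β̂ for a regressor in levels when the regressand is in logs. For a coefficient as large as −0.523 the approximation is rough, and the exact proportional change 100·(exp(β̂) − 1) can be computed as in the following sketch (the function names are hypothetical; only the coefficient comes from the example).

```python
import math


def approx_pct_change(beta, delta=1.0):
    """Approximate % change in y for a change delta in x when the regressand is ln(y)."""
    return 100.0 * beta * delta


def exact_pct_change(beta, delta=1.0):
    """Exact % change in y implied by ln(y) changing by beta*delta."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


b3 = -0.523  # estimated coefficient on hhsize in Example 3.2
approx = approx_pct_change(b3)
exact = exact_pct_change(b3)
print(round(approx, 1), round(exact, 1))
```

Under this calculation the exact predicted fall in spending for one additional household member is about 41%, noticeably smaller in absolute value than the approximate 52%.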
EXAMPLE 3.3 A hedonic regression for cars
The hedonic model of price measurement is based on the assumption that the value of a good is
derived from the value of its characteristics. Thus, the price of a car will depend on the value the
buyer places on both qualitative (e.g. automatic gear, power, diesel, assisted steering, air conditioning), and
quantitative attributes (e.g. fuel consumption, weight, performance displacement, etc.). The data set for this
exercise is file hedcarsp (hedonic car price for Spain) and covers years 2004 and 2005. A first model based
only on quantitative attributes is the following:
$$\ln(price)=\beta_1+\beta_2\,volume+\beta_3\,fueleff+u$$
where volume is length×width×height in m³ and fueleff is the liters per 100 km/horsepower ratio expressed as a percentage.
The estimated equation with a sample of 214 observations is the following:
R2=0.765 n=214
The interpretation of $\hat\beta_2$ and $\hat\beta_3$ is the following. Holding fixed fueleff, if volume increases by 1 m³, the price of a car will rise by 9.56%. Holding fixed volume, if the ratio liters per 100 km/horsepower increases by 1 percentage point, the price of a car will fall by 16.08%.
EXAMPLE 3.4 Sales and advertising: the case of Lydia E. Pinkham
A model with time series data is estimated in order to measure the effect of advertising expenses,
realized over different time periods, on current sales. Denoting by Vt and Pt sales and advertising
expenditures, made at time t, the model proposed initially to explain sales, as a function of current and past
advertising expenses is as follows:
$$V_t=\alpha+\beta_1P_t+\beta_2P_{t-1}+\beta_3P_{t-2}+\cdots+u_t\tag{3-30}$$
In the above expression the dots indicate that past expenditure on advertising continues to have an
indefinite influence, although it is assumed that with a decreasing impact on sales. The above model is not
operational given that it has an indefinite number of coefficients. Two approaches can be adopted in order
to solve the problem. The first approach is to fix a priori the maximum number of periods during which
advertising effects on sales are maintained. In the second approach, the coefficients behave according to
some law which determines their value based on a small number of parameters, also allowing further
simplification.
In the first approach the problem that arises is that, in general, there are no precise criteria or
sufficient information to fix a priori the maximum number of periods. For this reason, we shall look at a
special case of the second approach that is interesting due to the plausibility of the assumption and easy
application. Specifically, we will consider the case in which the coefficients βi decrease geometrically as
we move backward in time according to the following scheme:
$$\beta_i=\beta_1\lambda^i\quad\forall i\qquad 0<\lambda<1\tag{3-31}$$
The above transformation is called the Koyck transformation, as it was this author who in 1954 introduced scheme (3-31) for the study of investment.
Substituting (3-31) in (3-30), we obtain
$$V_t=\alpha+\beta_1P_t+\beta_1\lambda P_{t-1}+\beta_1\lambda^2P_{t-2}+\cdots+u_t\tag{3-32}$$
The above model still has infinite terms, but only three parameters and can also be simplified.
Indeed, if we express equation (3-32) for period t-1 and multiply both sides by λ we obtain
$$\lambda V_{t-1}=\alpha\lambda+\beta_1\lambda P_{t-1}+\beta_1\lambda^2P_{t-2}+\beta_1\lambda^3P_{t-3}+\cdots+\lambda u_{t-1}\tag{3-33}$$
Subtracting (3-33) from (3-32), and taking into account that the factors $\lambda^i$ tend to 0 as i tends to infinity, the result is the following:
$$V_t=\alpha(1-\lambda)+\beta_1P_t+\lambda V_{t-1}+u_t-\lambda u_{t-1}\tag{3-34}$$
The model has been simplified so that it only has three regressors, although, in contrast, it has
moved to a compound disturbance term. Before seeing the application of this model, we will analyze the
significance of the coefficient λ and the duration of the effects of advertising expenditures on sales. The
parameter λ is the decay rate of the effects of advertising expenditures on current and future sales. The
cumulative effects that the advertising expenditure of one monetary unit have on sales after m periods are
given by
$$\beta_1\left(1+\lambda+\lambda^2+\lambda^3+\cdots+\lambda^{m-1}\right)\tag{3-35}$$
To calculate the cumulative sum of effects, given in (3-35), we note that this expression is the sum of the terms of a geometric progression (see footnote 2), which can be expressed as follows:
$$\frac{\beta_1(1-\lambda^m)}{1-\lambda}\tag{3-36}$$
When m tends to infinity, the sum of the cumulative effects is given by
$$\frac{\beta_1}{1-\lambda}\tag{3-37}$$
An interesting point is to determine how many periods of time are required to obtain the p% (e.g., 50%) of the total effect. Denoting by h the number of periods required to obtain this percentage, we have
$$p=\frac{\text{Effect in }h\text{ periods}}{\text{Total effect}}=\frac{\dfrac{\beta_1(1-\lambda^h)}{1-\lambda}}{\dfrac{\beta_1}{1-\lambda}}=1-\lambda^h\tag{3-38}$$
Setting p, h can be calculated according to (3-38). Solving for h in this expression, the following is obtained
$$h=\frac{\ln(1-p)}{\ln\lambda}\tag{3-39}$$
This model was used by Kristian S. Palda in his doctoral thesis published in 1964, entitled The
Measurement of Cumulative Advertising Effects, to analyze the cumulative effects of advertising
expenditures in the case of the company Lydia E. Pinkham. This case has been the basis for research on the
effects of advertising expenditures. We will see below some features of this case:
1) The Lydia E. Pinkham Medicine Company manufactured a herbal extract diluted in an alcohol solution. This product was originally advertised as an analgesic and also as a remedy for a wide variety of diseases.
2) In general, in different types of products there is often competition among different brands, as
in the paradigmatic case of Coca-Cola and Pepsi-Cola. When this occurs, the behavior of the main
competitors is taken into account when analyzing the effects of advertising expenditure. Lydia E. Pinkham
had the advantage of having no competitors, acting as a monopolist in practice in its product line.
3) Another feature of the Lydia E. Pinkham case was that most of the distribution costs were
allocated to advertising because the company had no commercial agents, with the relationship between
advertising expenses and sales being very high.
4) The product went through a number of vicissitudes. Thus, in 1914 the Food and Drug Administration (the United States agency that establishes controls for food and medicines) accused the firm of misleading advertising, and so the firm had to change its advertising messages. Also, the Internal Revenue Service (IRS) threatened to apply a tax on alcohol, since the alcohol content of the product was 18%. For all these reasons there were changes in the presentation and content during the period 1915-1925. In 1925 the Food and Drug Administration banned the product from being advertised as medicine, so that it had to be distributed as a tonic drink. In the period 1926-1940 spending on advertising was significantly increased, and shortly afterwards the sales of the product declined.
Footnote 2: Denoting by $a_p$, $a_u$ and r the first term, the last term and the common ratio respectively, the sum of the terms of a geometric progression is given by
$$\frac{a_p-a_ur}{1-r}$$
which reduces to $a_p/(1-r)$ for a convergent progression with infinitely many terms.

The estimation of the model (3-34) with data from 1907 to 1960, using file pinkham, is the following:
$$\widehat{sales}_t=138.7+0.3288\,advexp_t+0.7593\,sales_{t-1}$$
R²=0.877 n=53
The sum of the cumulative effects of advertising expenditures on sales is calculated by the formula (3-37):
$$\frac{\hat\beta_1}{1-\hat\lambda}=\frac{0.3288}{1-0.7593}=1.3660$$
According to this result, every additional dollar spent on advertising produces an accumulated total sale of 1.366 units. Since it is important not only to determine the overall effect, but also how long the effect lasts, we will now answer the following question: how many periods of time are required to reach half of the total effects? Applying the formula (3-39) for the case of p = 0.5, the following result is obtained:
$$\hat h(0.5)=\frac{\ln(1-0.5)}{\ln(0.7593)}=2.5172$$
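The two calculations above follow directly from (3-37) and (3-39) and can be reproduced with a few lines of code. This is a minimal sketch; the function names are hypothetical, and only the estimates 0.3288 and 0.7593 are taken from the example.

```python
import math


def total_effect(beta1, lam):
    """Long-run cumulative effect beta1 / (1 - lambda), formula (3-37)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie strictly between 0 and 1")
    return beta1 / (1.0 - lam)


def periods_to_reach(p, lam):
    """Number of periods h needed to reach a share p of the total effect, formula (3-39)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(1.0 - p) / math.log(lam)


beta1_hat, lambda_hat = 0.3288, 0.7593
long_run = total_effect(beta1_hat, lambda_hat)
half_life = periods_to_reach(0.5, lambda_hat)
print(round(long_run, 4), round(half_life, 4))
```

Calling `periods_to_reach(0.9, lambda_hat)` in the same way gives the number of periods needed to accumulate 90% of the total effect.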
$$\sum_{i=1}^{n}\hat u_i=0\tag{3-40}$$
$$\sum_{i=1}^{n}\hat u_i=\sum_{i=1}^{n}y_i-n\hat\beta_1-\hat\beta_2\sum_{i=1}^{n}x_{2i}-\cdots-\hat\beta_k\sum_{i=1}^{n}x_{ki}\tag{3-42}$$
On the other hand, the first equation of the system of normal equations (3-20) is
$$\sum_{i=1}^{n}y_i-n\hat\beta_1-\hat\beta_2\sum_{i=1}^{n}x_{2i}-\cdots-\hat\beta_k\sum_{i=1}^{n}x_{ki}=0\tag{3-43}$$
$$\sum_{i=1}^{n}y_i=\sum_{i=1}^{n}\hat y_i\tag{3-44}$$
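$$\sum_{i=1}^{n}x_{ji}\hat u_i=0\qquad j=2,3,\ldots,k\tag{3-47}$$
Using the last k−1 normal equations of (3-20) and taking into account that by definition $\hat u_i=y_i-\hat\beta_1-\hat\beta_2x_{2i}-\hat\beta_3x_{3i}-\cdots-\hat\beta_kx_{ki}$, we can see that
$$\begin{aligned}
\sum_{i=1}^{n}\hat u_ix_{2i}&=0\\
\sum_{i=1}^{n}\hat u_ix_{3i}&=0\\
\vdots\quad&\\
\sum_{i=1}^{n}\hat u_ix_{ki}&=0
\end{aligned}\tag{3-48}$$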
4. The sample cross product between the fitted values ( ŷ ) and the OLS residuals
is zero.
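$$\sum_{i=1}^{n}\hat y_i\hat u_i=0\tag{3-49}$$
$$\sum_{i=1}^{n}\hat y_i\hat u_i=\sum_{i=1}^{n}\left(\hat\beta_1+\hat\beta_2x_{2i}+\cdots+\hat\beta_kx_{ki}\right)\hat u_i=\hat\beta_1\sum_{i=1}^{n}\hat u_i+\hat\beta_2\sum_{i=1}^{n}x_{2i}\hat u_i+\cdots+\hat\beta_k\sum_{i=1}^{n}x_{ki}\hat u_i$$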
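The algebraic properties of the OLS residuals stated above can be verified numerically on any data set that includes an intercept. The sketch below uses NumPy with simulated data; the design, the coefficients and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
# Regressor matrix with a column of ones for the intercept (hypothetical data).
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates
y_hat = X @ beta_hat                              # fitted values
u_hat = y - y_hat                                 # residuals

checks = {
    "sum of residuals (3-40)": u_hat.sum(),
    "mean(y) - mean(y_hat) (3-44)": y.mean() - y_hat.mean(),
    "max |sum x_ji * u_hat_i| (3-47)": np.abs(X.T @ u_hat).max(),
    "sum y_hat_i * u_hat_i (3-49)": y_hat @ u_hat,
}
for name, value in checks.items():
    print(f"{name}: {value:.2e}")
```

All four quantities should be zero up to floating-point rounding. If the column of ones is dropped from X, the first two properties generally no longer hold.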
y = Xβ + u (3-52)
The OLS estimator can be expressed in this way so that the property of linearity is
clearer:
$$\hat{\boldsymbol\beta}=\boldsymbol\beta+[\mathbf X'\mathbf X]^{-1}\mathbf X'\mathbf u=\boldsymbol\beta+\mathbf A\mathbf u\tag{3-61}$$
In the third step of the above proof it is taken into account that, according to (3-60), $\hat{\boldsymbol\beta}-\boldsymbol\beta=[\mathbf X'\mathbf X]^{-1}\mathbf X'\mathbf u$. Assumption 2 is taken into account in the fourth step. Finally,
covariance matrix, the variance of each element $\hat\beta_j$ appears on the main diagonal, while the covariances between each pair of elements are outside of the main diagonal.
Specifically, the variance of $\hat\beta_j$ (for j=2,3,…,k) is equal to $\sigma^2$ multiplied by the corresponding element of the main diagonal of $[\mathbf X'\mathbf X]^{-1}$. After operating, the variance of $\hat\beta_j$ can be expressed as
$$\operatorname{var}(\hat\beta_j)=\frac{\sigma^2}{nS_j^2(1-R_j^2)}\tag{3-64}$$
where $R_j^2$ is the R-squared from regressing $x_j$ on all other x's, n is the sample size and $S_j^2$ is the sample variance of the regressor $x_j$.
Formula (3-64) is valid for all slope coefficients, but not for the intercept.
The square root of (3-64) is called the standard deviation of $\hat\beta_j$:
$$\operatorname{sd}(\hat\beta_j)=\frac{\sigma}{\sqrt{nS_j^2(1-R_j^2)}}\tag{3-65}$$
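Formula (3-64) can be checked against the corresponding diagonal element of σ²[X′X]⁻¹. The sketch below uses NumPy with simulated, deliberately correlated regressors; the data, the value σ² = 2 and the function name are assumptions for illustration, and $S_j^2$ is computed with divisor n, which is what makes $nS_j^2$ equal to the sum of squared deviations.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)          # x3 is correlated with x2
X = np.column_stack([np.ones(n), x2, x3])
sigma2 = 2.0                                 # disturbance variance, assumed known here

cov_matrix = sigma2 * np.linalg.inv(X.T @ X)  # sigma^2 [X'X]^{-1}


def variance_via_r2(X, j, sigma2):
    """var(beta_j) = sigma^2 / (n * S_j^2 * (1 - R_j^2)) for a slope coefficient j."""
    n = X.shape[0]
    xj = X[:, j]
    others = np.delete(X, j, axis=1)         # remaining regressors, intercept included
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    resid = xj - others @ coef
    s2_j = np.mean((xj - xj.mean()) ** 2)    # sample variance of x_j (divisor n)
    r2_j = 1.0 - (resid @ resid) / (n * s2_j)
    return sigma2 / (n * s2_j * (1.0 - r2_j))


for j in (1, 2):
    print(j, cov_matrix[j, j], variance_via_r2(X, j, sigma2))
```

The two columns printed for each slope should coincide up to rounding, which also illustrates that a higher $R_j^2$ inflates the variance of $\hat\beta_j$.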
) and unbiased (so the weights, $w_j$, must satisfy some restrictions). The property of $\hat\beta_j$ being a BLUE estimator has the following implications when comparing its variance with the variance of $\tilde\beta_j$:
1) The variance of the coefficient $\tilde\beta_j$ is greater than, or equal to, the variance of $\hat\beta_j$ obtained by OLS:
2) The variance of any linear combination of $\tilde\beta_j$'s is greater than, or equal to, the variance of the corresponding linear combination of $\hat\beta_j$'s.
$$\hat\sigma^2=\frac{\sum_{i=1}^{n}\hat u_i^2}{n-k}\tag{3-67}$$
Under assumptions 1 to 8, we obtain
E (σˆ 2 ) = σ 2 (3-68)
See appendix 3.2 for the proof.
The square root of (3-67), $\hat\sigma$, is called the standard error of the regression and is an estimator of $\sigma$.
$$\widehat{\operatorname{var}}(\hat\beta_j)=\frac{\hat\sigma^2}{nS_j^2(1-R_j^2)}\tag{3-70}$$
d) The higher $R_j^2$ (i.e., the higher the correlation of regressor j with the rest of the regressors), the greater the variance of $\hat\beta_j$.
[Figure: two scatter plots of y against $x_j$; panel a) $\hat\sigma^2$ big, panel b) $\hat\sigma^2$ small.]
FIGURE 3.1. Influence of $\hat\sigma^2$ on the estimator of the variance.
[Figure: two scatter plots of y against $x_j$; panel a) $S_j^2$ small, panel b) $S_j^2$ big.]
FIGURE 3.2. Influence of $S_j^2$ on the estimator of the variance.
$$\operatorname{se}(\hat\beta_j)=\frac{\hat\sigma}{\sqrt{nS_j^2(1-R_j^2)}}\tag{3-71}$$
because it narrows the range of variables, which makes estimates less sensitive to extreme
observations on the dependent or the independent variables. The CLM assumptions are
satisfied more often in models using ln(y) as a regressand than in models using y without
any transformation. Thus, the conditional distribution of y is frequently heteroskedastic,
while ln(y) can be homoskedastic.
One limitation of the log transformation is that it cannot be used when the original
variable takes zero or negative values. On the other hand, variables measured in years and
variables that are a proportion or a percentage, are often used in level (or original) form.
Quadratic functions
An interesting case of polynomial functions is the quadratic function, which is a
second-degree polynomial function. When there are only regressors corresponding to the
quadratic function, we have a quadratic model:
$$y=\beta_1+\beta_2x+\beta_3x^2+u\tag{3-73}$$
Quadratic functions are used quite often in applied economics to capture decreasing or increasing marginal effects. It is important to remark that, in such a case, $\beta_2$ does not measure the change in y with respect to x because it makes no sense to hold $x^2$ fixed while changing x. The marginal effect of x on y, which depends linearly on the value of x, is the following:
$$me=\frac{dy}{dx}=\beta_2+2\beta_3x\tag{3-74}$$
In a particular application this marginal effect would be evaluated at specific values of x. If $\beta_2$ and $\beta_3$ have opposite signs, the turning point will be at
$$x^*=-\frac{\beta_2}{2\beta_3}\tag{3-75}$$
If β 2 >0 and β 3 <0, then the marginal effect of x on y is positive at first, but it will
be negative for values of x greater than x* . If β 2 <0 and β 3 >0, this marginal effect is
negative at first, but it will be positive for values of x greater than x* .
Example 3.5 Salary and tenure
Using the data in ceosal2 to study the type of relation between the salary of the Chief Executive Officers (CEOs) in US corporations and the number of years in the company as CEO (ceoten), the following model was estimated:
$$\widehat{\ln(salary)}=\underset{(0.086)}{6.246}+\underset{(0.0001)}{0.0006}\,profits+\underset{(0.0156)}{0.0440}\,ceoten-\underset{(0.00052)}{0.0012}\,ceoten^2$$
R²=0.1976 n=177
where company profits are in millions of dollars and salary is annual compensation in thousands of dollars.
Thus, if a CEO with 10 years in a company spends one more year in that company, their salary will increase by 2%. Equating to zero the previous expression and solving for ceoten, we find that the maximum effect of tenure as CEO on salary is reached at about 18 years. That is, up to 18 years the marginal effect of CEO tenure on the salary is positive. On the contrary, from 18 years onwards this marginal effect is negative.
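The figures quoted in this example follow from (3-74) and (3-75) applied to the estimated coefficients on ceoten and ceoten². A minimal sketch of the arithmetic is given below; the function names are hypothetical, and the marginal effect is multiplied by 100 because the regressand is ln(salary).

```python
def marginal_effect(b2, b3, x):
    """Marginal effect of x in a quadratic model: b2 + 2*b3*x, formula (3-74)."""
    return b2 + 2.0 * b3 * x


def turning_point(b2, b3):
    """Value of x at which the marginal effect is zero: -b2 / (2*b3), formula (3-75)."""
    return -b2 / (2.0 * b3)


b_ceoten, b_ceoten2 = 0.0440, -0.0012  # estimates from Example 3.5

me_at_10 = 100.0 * marginal_effect(b_ceoten, b_ceoten2, 10)  # in percent
x_star = turning_point(b_ceoten, b_ceoten2)
print(round(me_at_10, 2), round(x_star, 2))
```

With the rounded coefficients reported in the example, the marginal effect at 10 years is 2% and the turning point is about 18.3 years.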
Cubic functions
Another interesting case is the cubic function, or third-degree polynomial
function. If in the model there are only regressors corresponding to the cubic function, we
have a cubic model:
$$y=\beta_1+\beta_2x+\beta_3x^2+\beta_4x^3+u\tag{3-76}$$
Cubic models are used quite often in applied economics to capture decreasing or increasing marginal effects, particularly in cost functions. The marginal effect (me) of x on y, which depends on x in a quadratic form, will be the following:
$$me=\frac{dy}{dx}=\beta_2+2\beta_3x+3\beta_4x^2\tag{3-77}$$
The minimum of me will occur where
$$\frac{d\,me}{dx}=2\beta_3+6\beta_4x=0\tag{3-78}$$
Therefore, the value of x at which me reaches its minimum is
$$x_{me_{\min}}=\frac{-\beta_3}{3\beta_4}\tag{3-79}$$
In a cubic model of a cost function, the restriction $\beta_3^2<3\beta_4\beta_2$ must be met to guarantee that the minimum marginal cost is positive. Other restrictions that a cost function must satisfy are as follows: $\beta_1$, $\beta_2$, and $\beta_4>0$; and $\beta_3<0$.
Example 3.6 The marginal effect in a cost function
Using the data on 11 pulp mill firms (file costfunc) to study the cost function, the following model
was estimated:
$$\widehat{cost}=\underset{(1.602)}{29.16}+\underset{(0.2167)}{2.316}\,output-\underset{(0.0081)}{0.0914}\,output^2+\underset{(0.000086)}{0.0013}\,output^3$$
R²=0.9984 n=11
where output is the production of pulp in thousands of tons and cost is the total cost in millions of euros.
The marginal cost is the following:
$$\widehat{marcost}=2.316-2\times0.0914\,output+3\times0.0013\,output^2$$
Thus, if a firm with a production of 30 thousand tons of pulp increases the pulp production by one thousand tons, the cost will increase by 0.754 million euros. Calculating the minimum of the above expression and solving for output, we find that the minimum marginal cost is reached at a production of 23.222 thousand tons of pulp.
where $R_{j-1}^2$ is the R-squared in a model with j−1 regressors, and $R_j^2$ is the R-squared in a model with an additional regressor. That is to say, if we add variables to a given model, R² will never decrease, even if these variables do not have a significant influence.
b) If the model has no intercept, the coefficient of determination does not have a clear interpretation because the decomposition given in (3-80) is not fulfilled. In addition,
the two forms of calculation mentioned - (3-81) and (3-82) - generally lead to different
results, which in some cases may fall outside the interval [0, 1].
c) The coefficient of determination cannot be used to compare models in which
the functional form of the endogenous variable is different. For example, R2 cannot be
applied to compare two models in which the regressand is the original variable, y, and
ln(y) respectively.
However, we have better estimates for these variances, $\sigma_u^2$ and $\sigma_y^2$, than the ones used in (3-85). So, let us use unbiased estimates for these variances:
$$\bar R^2=1-\frac{SCR/(n-k)}{SCT/(n-1)}=1-\frac{n-1}{n-k}\left(1-R^2\right)\tag{3-87}$$
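Formula (3-87) is straightforward to apply once R², n and k are known. As an illustration, the sketch below applies it to the figures reported in Example 3.5 (R² = 0.1976, n = 177), taking k = 4 on the assumption that the intercept is counted among the regressors; the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k), formula (3-87).

    k is the number of estimated coefficients, including the intercept.
    """
    if n <= k:
        raise ValueError("the sample size n must exceed the number of coefficients k")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


r2_example = 0.1976  # R-squared reported in Example 3.5
adj = adjusted_r2(r2_example, n=177, k=4)
print(round(adj, 4))
```

For k greater than 1 and R² below 1, the adjusted value is smaller than R², and the gap widens as more regressors are added for a given sample size.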
The observations b) and c) made to the R squared remain valid for the adjusted R
squared.
Exercises
Exercise 3.1 Consider the linear regression model y = Xβ + u , where X is a matrix
50×5.
Answer the following questions, justifying your answers:
a) What are the dimensions of the vectors y , β, u ?
b) How many equations are there in the system of normal equations
X′Xβˆ = X′y ?
c) What conditions are needed in order to obtain β̂ ?
Exhibit 3.1
1) Calculation of X’X and X’y
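$$\hat{\mathbf u}'\hat{\mathbf u}=\mathbf y'\mathbf y-\hat{\mathbf y}'\hat{\mathbf y}=\mathbf y'\mathbf y-\hat{\boldsymbol\beta}'\mathbf X'\mathbf X\hat{\boldsymbol\beta}=\mathbf y'\mathbf y-\hat{\boldsymbol\beta}'\mathbf X'\mathbf y=\text{R.5}-\text{R.6}=953-883=70$$
$$\hat\sigma^2=\frac{\hat{\mathbf u}'\hat{\mathbf u}}{n-2}=\frac{70}{8}=8.6993$$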
5) Calculation of covariance matrix of β̂
$$\operatorname{var}(\hat{\boldsymbol\beta})=\hat\sigma^2\left[\mathbf X'\mathbf X\right]^{-1}=8.6993\begin{pmatrix}3.8696&-0.0370\\-0.0370&0.0004\end{pmatrix}=\begin{pmatrix}33.6624&-0.3215\\-0.3215&0.0032\end{pmatrix}$$
Exercise 3.3 The following model was formulated to explain the annual sales (sales) of
the manufacturers of household cleaning products as a function of a relative price index
(rpi) and the advertising expenditures (adv):
$$sales=\beta_1+\beta_2\,rpi+\beta_3\,adv+u$$
where the variable sales is expressed in a thousand million euros and rpi is a relative price
index obtained as a ratio between the prices of each firm and the prices of firm 1 of the
sample; adv is the annual expenditures on advertising and promotional campaigns and
media diffusion, expressed in millions of euros.
Data on ten manufacturers of household cleaning products appear in the attached
table.
firm sales rpi adv
1 10 100 300
2 8 110 400
3 7 130 600
4 6 100 100
5 13 80 300
6 6 80 100
7 12 90 600
8 7 120 200
9 9 120 400
10 15 90 700
Using an Excel spreadsheet,
a) Estimate the parameters of the proposed model.
b) Estimate the covariance matrix.
c) Calculate the coefficient of determination.
Note: In exhibit 3.1 the model $sales=\beta_1+\beta_2\,rpi+u$ is estimated using Excel. Instructions are also included.
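As an alternative to the spreadsheet, the calculations of exhibit 3.1 for the two-parameter model $sales=\beta_1+\beta_2\,rpi+u$ can be sketched with NumPy. The code assumes that the exhibit was computed with the sales and rpi columns of the table above, which is consistent with the figures it reports (y′y = 953, n − 2 = 8); the variable names are illustrative only.

```python
import numpy as np

# sales and rpi columns of the table for the ten firms
sales = np.array([10, 8, 7, 6, 13, 6, 12, 7, 9, 15], dtype=float)
rpi = np.array([100, 110, 130, 100, 80, 80, 90, 120, 120, 90], dtype=float)

X = np.column_stack([np.ones_like(rpi), rpi])  # intercept regressor and rpi
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ sales               # OLS estimates

resid = sales - X @ beta_hat
n, k = X.shape
ssr = resid @ resid                            # residual sum of squares
sigma2_hat = ssr / (n - k)                     # unbiased estimator of sigma^2
cov_beta = sigma2_hat * XtX_inv                # estimated covariance matrix

print(beta_hat)
print(ssr, sigma2_hat)
print(cov_beta)
```

The unrounded residual sum of squares is about 69.59, which the exhibit displays as 70; dividing it by 8 gives the reported 8.6993. The same steps, with adv added as a third column of X, apply to the model proposed in the exercise.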
Exercise 3.4 A researcher, who is developing an econometric model to explain income,
formulates the following specification:
inc=α+βcons+γsave+u [1]
where inc is the household disposable income, cons is the total consumption and save is
the total savings of the household.
The researcher did not take into account that the above three magnitudes are
related by the identity
inc=cons+save [2]
The equivalence between the models [1] and [2] requires that, in addition to the disappearance of the disturbance term, the parameters of model [1] take the following values:
α =0, β =1, and γ =1
If you estimate equation [1] with the data for a given country, can you expect, in general, that the estimates will take the values $\hat\alpha=0$, $\hat\beta=1$, $\hat\gamma=1$?
Please justify your answer using mathematical notation.
Exercise 3.5 A researcher proposes the following econometric model to explain tourism
revenue (turtot) in a given country:
$$turtot=\beta_1+\beta_2\,turmean+\beta_3\,numtur+u$$
where turmean is the average expenditure per tourist and numtur is the total number of
tourists.
a) It is obvious that turtot, numtur and turmean are also linked by the relationship turtot=turmean×numtur. Will this somehow affect the estimation of the parameters of the proposed model?
b) Is there a model with another functional form involving tighter restrictions
on the parameters? If so, indicate it.
c) What is your opinion about using the proposed model to explain the
behavior of tourism revenue? Is it reasonable?
Exercise 3.6 Let us suppose you have to estimate the model
$$\ln(y)=\beta_1+\beta_2\ln(x_2)+\beta_3\ln(x_3)+\beta_4\ln(x_4)+u$$
using the following observations:
x2 x3 x4
3 12 4
2 10 5
4 4 1
3 9 3
2 6 3
5 5 1
What problems can arise in the estimation of this model?
Exercise 3.7 Answer the following questions:
a) Explain the determination coefficient ($R^2$) and the adjusted determination coefficient ($\bar R^2$). What can you use them for? Justify your answer.
b) Given the models
ln(y)=β 1 +β 2 ln(x)+u (1)
ln(y)=β 1 +β 2 ln(x)+β 3 ln(z)+u (2)
ln(y)=β 1 +β 2 ln(z)+u (3)
y=β 1 +β 2 z+u (4)
indicate what measure of goodness of fit is appropriate to compare the
following pairs of models: (1) - (2), (1) - (3), and (1) - (4). Explain your
answer.
Exercise 3.8 Let us suppose that the following model is estimated by OLS:
$$\ln(y)=\beta_1+\beta_2\ln(x)+\beta_3\ln(z)+u$$
a) Can least square residuals all be positive? Explain your answer.
b) Under the assumption of no autocorrelation of disturbances, are the OLS
residuals independent? Explain your answer
c) Assuming that the disturbances are not normally distributed, will the OLS
estimators be unbiased? Explain your answer.
Exercise 3.9 Consider the linear regression model
y=Xβ+u
where y and u are vectors 8×1, X is a matrix 8×3 and β is a vector 3×1. Also the following
information is available:
$$\mathbf X'\mathbf X=\begin{pmatrix}2&0&0\\0&3&0\\0&0&3\end{pmatrix}\qquad\hat{\mathbf u}'\hat{\mathbf u}=22$$
Answer the following questions, by justifying your answer:
a) Indicate the sample size, the number of regressors, the number of
parameters and the degrees of freedom of the residual sum of squares.
b) Derive the covariance matrix of the vector β̂ , making explicit the
assumptions used. Estimate the variances of the estimators.
c) Does the regression have an intercept? What implications does the answer
to this question have on the meaning of R2 in this model?
Exercise 3.10 Discuss whether the following statements are true or false:
a) In a linear regression model, the sum of the residuals is zero.
b) The coefficient of determination ( R 2 ) is always a good measure of the
model’s quality.
c) The least squares estimators are biased.
Exercise 3.11 The following model is formulated to explain time spent sleeping:
$$sleep=\beta_1+\beta_2\,totalwrk+\beta_3\,leisure+u$$
where sleep, totalwrk (paid and unpaid work) and leisure (time not devoted to sleep or
work) are measured in minutes per day.
The estimated equation with a sample of 1000 observations, using file timuse03,
is the following:
$$\widehat{sleep}=1440-1\times total\_work-1\times leisure$$
R2=1.000 n=1000
a) What do you think about these results?
b) What is the meaning of the estimated intercept?
Exercise 3.12 Using a subsample of the Structural Survey of Wages (Encuesta de
estructura salarial) for Spain in 2006 (file wage06sp), the following model is estimated
to explain wage:
$$\widehat{\ln(wage)}=1.565+0.0730\,educ+0.0177\,tenure+0.0065\,age$$
R2=0.337 n=800
where educ (education), tenure (experience in the firm) and age are measured in years
and wage in euros per hour.
a) What is the interpretation of coefficients on educ, tenure and age?
b) How many years does the age have to increase in order to have a similar
effect to an increase of one year in education, holding fixed in each case
the other two regressors?
c) Knowing that educ =10.2, tenure =7.2 and age =42.0, calculate the
elasticities of wage with respect to educ, tenure and age for these values,
holding fixed the other regressors. Do you consider these elasticities to be
high or low?
Exercise 3.13 The following equation describes the price of housing in terms of
bedrooms (number of bedrooms), bathrms (number of full bathrooms) and lotsize (the lot
size of a property in square feet):
$$\mathit{price} = \beta_1 + \beta_2\,\mathit{bedrooms} + \beta_3\,\mathit{bathrms} + \beta_4\,\mathit{lotsize} + u$$
where price is the price of a house measured in dollars.
Using the data for the city of Windsor contained in file housecan, the following
model is estimated:
$$\widehat{\mathit{price}} = -2418 + 5827\,\mathit{bedrooms} + 19750\,\mathit{bathrms} + 5.411\,\mathit{lotsize}$$
R²=0.486 n=546
a) What is the estimated increase in price for a house with one more bedroom
and one more bathroom, holding lotsize constant?
b) What percentage of the variation in price is explained jointly by the
number of bedrooms, the number of full bathrooms and the lot size?
c) Find the predicted selling price for a house of the sample with bedrooms=3,
bathrms=2 and lotsize=3880.
d) The actual selling price of the house in c) was $66,000. Find the residual
for this house. Does the result suggest that the buyer underpaid or overpaid
for the house?
Exercise 3.14 To examine the effects of a firm’s performance on a CEO salary, the
following model was formulated:
$$\ln(\mathit{salary}) = \beta_1 + \beta_2\,\mathit{roa} + \beta_3 \ln(\mathit{sales}) + \beta_4\,\mathit{profits} + \beta_5\,\mathit{tenure} + u$$
where roa is the ratio profits/assets expressed as a percentage and tenure is the number
of years as CEO (=0 if less than 6 months). Salaries are expressed in thousands of dollars,
and sales and profits in millions of dollars.
The file ceoforbes has been used for the estimation. This file contains data on 447
CEOs of America's 500 largest corporations. (52 of the 500 firms were excluded because
of missing data on one or more variables. Apple Computer was also excluded since Steve
Jobs, the acting CEO of Apple in 1999, received no compensation during this period.)
Company data come from Fortune magazine for 1999; CEO data come from Forbes
magazine for 1999 too. The results obtained were the following:
$$\widehat{\ln(\mathit{salary})} = 4.641 + 0.0054\,\mathit{roa} + 0.2893 \ln(\mathit{sales}) + 0.0000564\,\mathit{profits} + 0.0122\,\mathit{tenure}$$
R²=0.232 n=447
a) Interpret the coefficient on the regressor roa
b) Interpret the coefficient on the regressor ln(sales). What is your opinion
about the magnitude of the elasticity salary/sales?
c) Interpret the coefficient on the regressor profits.
d) What is the salary/profits elasticity at the sample mean (mean salary = 2028 and
mean profits = 700)?
Exercise 3.15 (Continuation of exercise 2.21) Using a dataset consisting of 1,983 firms
surveyed in 2006 (file rdspain), the following equation was estimated:
$$\widehat{\mathit{rdintens}} = -1.8168 + 0.1482 \ln(\mathit{sales}) + 0.0110\,\mathit{exponsal}$$
Exercise 3.16 The following hedonic regression for cars (see example 3.3) is formulated:
$$\ln(\mathit{price}) = \beta_1 + \beta_2\,\mathit{cid} + \beta_3\,\mathit{hpweight} + \beta_4\,\mathit{fueleff} + u$$
where cid is the cubic inch displacement, hpweight is the ratio horsepower/weight in kg
expressed as percentage and fueleff is the ratio liters per 100 km/horsepower expressed as
a percentage.
a) What are the probable signs of β 2 , β 3 and β 4 ? Explain them.
b) Estimate the model using the file hedcarsp and write out the results in
equation form.
c) Interpret the coefficient on the regressor cid.
d) Interpret the coefficient on the regressor hpweight.
e) To expand the model, add a regressor relative to car size, such as volume
or weight. What happens if you add both of them? What is the relationship
between weight and volume?
Exercise 3.17 The concept of work covers a broad spectrum of possible activities in the
productive economy. An important part of work is unpaid; it does not pass through the
market and therefore has no price. The most important unpaid work is housework
(houswork) carried out mainly by women. In order to analyze the factors that influence
housework, the following model is formulated:
$$\mathit{houswork} = \beta_1 + \beta_2\,\mathit{educ} + \beta_3\,\mathit{hhinc} + \beta_4\,\mathit{age} + \beta_5\,\mathit{paidwork} + u$$
where educ is the years of education attained, hhinc is the household income in euros per
month. The variables houswork and paidwork are measured in minutes per day.
Use the data in the file timuse03 to estimate the model. This file contains 1000
observations corresponding to a random subsample extracted from the time use survey
for Spain carried out in 2002-2003.
a) Which signs do you expect for β 2 , β 3, β 4 and β 5 ? Explain.
b) Write out the results in equation form.
c) Do you think there are relevant factors omitted in the above equation?
Explain.
d) Interpret the coefficient on the regressors educ, hhinc, age and paidwork.
Appendixes
Appendix 3.1 Proof of the theorem of Gauss-Markov
To prove this theorem, the CLM assumptions 1 through 9 are used.
Let us now consider another estimator β̃ which is a function of y (remember that
β̂ is also a function of y), given by
$$\tilde{\beta} = \left[ [X'X]^{-1}X' + A \right] y \tag{3-93}$$
where A is an arbitrary k × n matrix that is a function of X and/or other non-stochastic
variables, but not a function of y. For β̃ to be unbiased, certain conditions must be
satisfied.
Taking (3-52) into account, we have
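$$\tilde{\beta} = \beta + \left[ [X'X]^{-1}X' + A \right] u \tag{3-97}$$
Taking into account assumptions 7 and 8, and (3-96), Var(β̃) is equal to
$$\operatorname{Var}(\tilde{\beta}) = E\left[ (\tilde{\beta} - \beta)(\tilde{\beta} - \beta)' \right] = E\left[ \left( [X'X]^{-1}X' + A \right) uu' \left( X[X'X]^{-1} + A' \right) \right] = \sigma^2 \left[ [X'X]^{-1} + AA' \right] \tag{3-98}$$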
The difference between both variances is the following:
The difference between the variance of an arbitrary linear unbiased estimator β̃
and the variance of the estimator β̂ is a positive semidefinite matrix.
Consequently, β̂ is the Best Linear Unbiased Estimator; that is to say, it is BLUE.
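The comparison of variances can be sketched in one line. The sketch assumes the standard OLS covariance matrix, Var(β̂) = σ²[X′X]⁻¹, which is not restated in this appendix, and it uses an arbitrary k × 1 vector c, a symbol introduced here only for illustration:

```latex
\begin{aligned}
\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta})
  &= \sigma^2 \left[ [X'X]^{-1} + AA' \right] - \sigma^2 [X'X]^{-1} = \sigma^2 AA' \\
c' \left( \sigma^2 AA' \right) c
  &= \sigma^2 (A'c)'(A'c) = \sigma^2 \lVert A'c \rVert^2 \ge 0
\end{aligned}
```

Since the quadratic form is non-negative for every c, the matrix σ²AA′ cannot be negative in any direction, which is the property the conclusion relies on.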
Appendix 3.2 Proof: σ̂² is an unbiased estimator of the variance of the disturbance
In order to see which is the most appropriate estimator of σ², we shall first
analyze the properties of the sum of squared residuals, which is precisely the numerator
of the residual variance.
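Taking into account (3-17) and (3-23), we are going to express the vector of
residuals as a function of the regressand and then of the disturbances:
$$\hat{u} = y - X\hat{\beta} = \left[ I - X[X'X]^{-1}X' \right] y = X\beta - X\beta + \left[ I - X[X'X]^{-1}X' \right] u = \left[ I - X[X'X]^{-1}X' \right] u = Mu \tag{3-102}$$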
Taking into account (3-102), the sum of squared residuals (SSR) can be expressed
in the following form:
$$\hat{u}'\hat{u} = u'M'Mu = u'Mu \tag{3-103}$$
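The simplification M′M = M used in (3-103) rests on M being symmetric and idempotent. A short check, writing P = X[X′X]⁻¹X′ as a shorthand introduced here (so that M = I − P), could run as follows:

```latex
\begin{aligned}
M' &= I - \left( X[X'X]^{-1}X' \right)' = I - X[X'X]^{-1}X' = M \\
PP &= X[X'X]^{-1}X'X[X'X]^{-1}X' = X[X'X]^{-1}X' = P \\
M'M &= MM = (I - P)(I - P) = I - 2P + PP = I - P = M
\end{aligned}
```

The first line uses the symmetry of [X′X]⁻¹; the second shows that P is idempotent, from which the idempotency of M follows.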
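$$\operatorname{var}\left[ \overline{x_i u_i} \right] = E\left[ \overline{x_i u_i} \left( \overline{x_i u_i} \right)' \right] = \frac{1}{n} X' E[uu'] X \frac{1}{n} = \frac{\sigma^2}{n} \frac{X'X}{n} = \frac{\sigma^2}{n} Q \tag{3-112}$$
since E[uu′] = σ²I, according to assumptions 7 and 8.
Taking limits in (3-112), it then follows that
$$\lim_{n \to \infty} \operatorname{var}\left[ \overline{x_i u_i} \right] = \lim_{n \to \infty} \frac{\sigma^2}{n} Q = 0(Q) = 0 \tag{3-113}$$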
Since the expectation of $\overline{x_i u_i}$ is identically zero and its variance converges to zero,
$\overline{x_i u_i}$ converges in mean square to zero. Convergence in mean square implies
convergence in probability, and so plim($\overline{x_i u_i}$) = 0. Therefore,
$$\operatorname{plim}(\hat{\beta}) = \beta + Q^{-1} \operatorname{plim}\left( \overline{x_i u_i} \right) = \beta + Q^{-1} \operatorname{plim}\left[ \frac{1}{n} X'u \right] = \beta + Q^{-1} \times 0 = \beta \tag{3-114}$$
Consequently, β̂ is a consistent estimator.
$$\operatorname{var}(y) = E\left[ (y - X\beta)(y - X\beta)' \right] = E[uu'] = \sigma^2 I \tag{3-117}$$
Therefore,
$$y \sim N(X\beta, \sigma^2 I) \tag{3-118}$$
The probability density of y (or likelihood function), considering X and y fixed
and β and σ² variable, will be in accordance with (3-118) equal to
$$L = f(y \mid \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \right\} \tag{3-119}$$
L and ln(L) reach their maximum at the same point, given that the
logarithm is a monotonically increasing function; thus, in order to maximize the function, we can
work with ln(L) instead of L. Therefore,
$$\ln(L) = -\frac{n \ln(2\pi)}{2} - \frac{n \ln(\sigma^2)}{2} - \frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \tag{3-120}$$
To maximize ln(L), we differentiate it with respect to β and σ²:
$$\frac{\partial \ln(L)}{\partial \beta} = -\frac{1}{2\sigma^2} (-2X'y + 2X'X\beta) \tag{3-121}$$
$$\tilde{\beta} = [X'X]^{-1} X'y \tag{3-124}$$
Consequently, the maximum likelihood estimator of β, under the assumptions of
the CLM, coincides with OLS estimator, that is to say,
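$$\tilde{\beta} = \hat{\beta} \tag{3-125}$$
Therefore,
$$(y - X\tilde{\beta})'(y - X\tilde{\beta}) = (y - X\hat{\beta})'(y - X\hat{\beta}) = \hat{u}'\hat{u} \tag{3-126}$$
Equating (3-122) to zero and substituting β̃ for β, we obtain:
$$-\frac{n}{2\tilde{\sigma}^2} + \frac{\hat{u}'\hat{u}}{2\tilde{\sigma}^4} = 0 \tag{3-127}$$
where we have designated by σ̃² the maximum likelihood estimator of the variance of
the random disturbances. From (3-127), it follows that
$$\tilde{\sigma}^2 = \frac{\hat{u}'\hat{u}}{n} \tag{3-128}$$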
As we can see, the maximum likelihood estimator is not equal to the unbiased
estimator that has been obtained in (3-106). In fact, if we take expectations in (3-128),
$$E[\tilde{\sigma}^2] = \frac{1}{n} E[\hat{u}'\hat{u}] = \frac{n-k}{n} \sigma^2 \tag{3-129}$$
That is to say, the maximum likelihood estimator, σ̃², is a biased estimator,
although its bias tends to zero as n tends to infinity, since
$$\lim_{n \to \infty} \frac{n-k}{n} = 1 \tag{3-130}$$
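A small numerical sketch may help to see how the two estimators differ. The values of the sum of squared residuals, n and k below are hypothetical and chosen only for illustration; the function names are likewise our own.

```python
def sigma2_ml(ssr, n):
    """Maximum likelihood estimator of the disturbance variance: SSR / n."""
    return ssr / n


def sigma2_unbiased(ssr, n, k):
    """Unbiased estimator of the disturbance variance: SSR / (n - k)."""
    return ssr / (n - k)


# Hypothetical figures, not taken from the text
ssr, n, k = 50.0, 20, 4

ml = sigma2_ml(ssr, n)
unbiased = sigma2_unbiased(ssr, n, k)
print(f"ML estimate       = {ml:.3f}")
print(f"Unbiased estimate = {unbiased:.3f}")
print(f"Ratio ML/unbiased = {ml / unbiased:.3f}  (equals (n-k)/n)")

# The factor (n-k)/n approaches 1 as the sample size grows, with k fixed
shrink = {size: (size - k) / size for size in (20, 200, 2000)}
for size, factor in shrink.items():
    print(f"n={size:5d}  (n-k)/n = {factor:.3f}")
```

The ratio between both estimates is exactly (n−k)/n, so the downward bias of the maximum likelihood estimator becomes negligible in large samples.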
4 HYPOTHESIS TESTING IN THE MULTIPLE
REGRESSION MODEL
c) H0 : β1=β2 =0
d) H0 : β2+β3 =1
We will also define an alternative hypothesis, denoted by H 1 , which will be our
conclusion if the experimental test indicates that H 0 is false.
Although the alternative hypotheses can be simple or composite, in the regression
model we will always take a composite hypothesis as an alternative hypothesis. This
hypothesis, which shall be called H1, is formulated using the operator “inequality” in most
cases. Thus, for example, given the H 0 :
H0 : β j = 1 (4-1)
we can formulate the following H1 :
H1 : β j ≠ 1 (4-2)
which is a “two-sided alternative” hypothesis.
The following hypotheses are called “one-sided alternative” hypotheses:
H1 : β j < 1 (4-3)
H1 : β j > 1 (4-4)
Known σ2 N Chi-square
The statistic used for the test is built taking into account the H0 and the sample
data. In practice, as σ 2 is always unknown, we will use the distributions t and F.
decision rule, we shall examine the types of mistakes that can be made in hypothesis
testing.
Type I error
We can reject H 0 when it is in fact true. This is called Type I error. Generally, we
define the significance level (α) of a test as the probability of making a Type I error.
Symbolically,
α = Pr( Reject H 0 | H 0 ) (4-5)
In other words, the significance level is the probability of rejecting H 0 given that
H 0 is true. Hypothesis testing rules are constructed making the probability of a Type I
error fairly small. Common values for α are 0.10, 0.05 and 0.01, although sometimes
0.001 is also used.
After we have made the decision of whether or not to reject H 0 , we have either
decided correctly or we have made an error. We shall never know with certainty whether
an error was made. However, we can compute the probability of making either a Type I
error or a Type II error.
Type II error
We can fail to reject H 0 when it is actually false. This is called Type II error.
β = Pr( Do not reject H 0 | H1 ) (4-6)
In other words, β is the probability of not rejecting H 0 given that H 1 is true.
It is not possible to minimize both types of error simultaneously. In practice, what
we do is select a low significance level.
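The trade-off between both types of error can be illustrated with a simple sketch. It assumes a right-tail test based on a statistic that is N(0,1) under H0 and N(δ,1) under H1; this setting and the value δ = 2 are assumptions made here only for illustration and do not come from the text.

```python
from statistics import NormalDist

std_normal = NormalDist()   # N(0, 1)
delta = 2.0                 # hypothetical mean of the statistic under H1


def error_probabilities(alpha, delta):
    """Return (critical value, Type II error probability) of a right-tail test.

    H0 is rejected when the statistic exceeds the critical value.
    """
    critical = std_normal.inv_cdf(1 - alpha)
    # Under H1 the statistic is N(delta, 1): P(statistic <= critical | H1)
    type_2 = std_normal.cdf(critical - delta)
    return critical, type_2


results = {a: error_probabilities(a, delta) for a in (0.10, 0.05, 0.01)}
for a, (critical, type_2) in results.items():
    print(f"alpha={a:.2f}  critical value={critical:.3f}  "
          f"Type II error probability={type_2:.3f}")
```

In this setting, lowering the significance level moves the critical value to the right and raises the probability of a Type II error, which is why both probabilities cannot be reduced at the same time for a given sample.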
FIGURE 4.1. Hypothesis testing: classical approach. (The critical value c of the statistic W separates the non-rejection region, NRR, from the rejection region, RR.)
Type I error. More technically, the p value is defined as the lowest significance level at
which a null hypothesis can be rejected.
Once the p-value has been determined, we know that the null hypothesis is
rejected for any α ≥ p-value, while the null hypothesis is not rejected when α<p-value.
Therefore, the p-value is an indicator of the level of admissibility of the null hypothesis:
the higher the p-value, the weaker the evidence against the null hypothesis. The use
of the p-value turns hypothesis testing around. Thus, instead of fixing the
significance level a priori, the p-value is calculated to allow us to determine the significance
levels at which the null hypothesis is rejected.
In the following sections, we will see the use of p value in hypothesis testing put
into practice.
The t test
Under the CLM assumptions 1 through 9,
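If we standardize,
$$\frac{\hat{\beta}_j - \beta_j}{\sqrt{\operatorname{var}(\hat{\beta}_j)}} = \frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim N[0, 1] \qquad j = 1, 2, 3, \ldots, k \tag{4-9}$$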
The claim for normality is usually made on the basis of the Central Limit
Theorem (CLT), but this is restrictive in some cases. That is to say, normality cannot
always be assumed. In any application, whether normality of u can be assumed is really
an empirical matter. It is often the case that using a transformation, e.g. taking logs, yields
a distribution that is closer to normality, which is easy to handle from a mathematical
point of view. Large samples will allow us to drop normality without affecting the results
too much.
Under the CLM assumptions 1 through 9, we obtain a Student’s t distribution
$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k} \tag{4-10}$$
where k is the number of unknown parameters in the population model (k-1 slope
parameters and the intercept, β1). The expression (4-10) is important because it allows
us to test a hypothesis on βj.
If we compare (4-10) with (4-9), we see that the Student’s t distribution derives
from the fact that the parameter σ in sd(β̂j) has been replaced by its estimator σ̂,
which is a random variable. Thus, the degrees of freedom of t are n-k, corresponding
to the degrees of freedom used in the estimation of σ̂².
When the degrees of freedom (df) in the t distribution are large, the t distribution
approaches the standard normal distribution. In figure 4.2, the density functions for the
normal and t distributions for different df are represented. As can be seen, the t density
functions have a lower peak and heavier tails than the normal density function,
but as df increases, the t density functions get closer to the normal density. In fact, what
happens is that the t distribution takes into account that σ² is estimated because it is
unknown. Given this uncertainty, the t distribution is more spread out than the normal one.
However, as the df grow, the t distribution gets nearer to the normal distribution because
the uncertainty of not knowing σ² decreases.
Therefore, the following convergence in distribution should be kept in mind:
$$t_n \xrightarrow[n \to \infty]{} N(0, 1) \tag{4-11}$$
Thus, when the number of degrees of freedom of a Student’s t tends to infinity,
the t distribution converges towards a N(0,1) distribution. In the context of testing a
hypothesis, if the sample size grows, so will the degrees of freedom. This means that for
large sample sizes the normal distribution can be used to test hypotheses involving a single
restriction, even when the population variance is unknown. As a practical rule, when
the df are larger than 120, we can take the critical values from the normal distribution.
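The practical rule can be checked numerically. The following sketch uses only the Python standard library: the t density is integrated with Simpson's rule and the critical value is found by bisection. The helper names (t_pdf, t_upper_tail, t_critical) and the chosen df values are our own and serve only as an illustration.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: Simpson's rule on [0, x] plus symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Critical value c with P(T > c) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05                                    # two-tail significance level
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # normal critical value
t_crits = {df: t_critical(alpha / 2, df) for df in (5, 30, 120, 1000)}
for df, crit in t_crits.items():
    print(f"df={df:5d}  t critical={crit:.3f}  normal critical={z_crit:.3f}")
```

The t critical values decrease towards the normal critical value as df grows; with 120 df the difference is already of the order of two hundredths.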
FIGURE 4.2. Density functions: normal and t for different degrees of freedom.
Consider the null hypothesis,
H0 : β j = 0
Since β j measures the partial effect of x j on y after controlling for all other
independent variables, H 0 : β j = 0 means that, once x 2 , x 3 , …,x j −1 , x j+1 ,…, x k have been
accounted for, x j has no effect on y. This is called a significance test. The statistic we use
to test H0: βj = 0, against any alternative, is called the t statistic or the t ratio of β̂j and
is expressed as
$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$
the null hypothesis could be true, whereas a large value will indicate a false null
hypothesis. The question is: how far is βˆ j from zero?
We must recognize that there is a sampling error in our estimate β̂j, and thus the
size of β̂j must be weighed against its sampling error. This is precisely what we do when
we use $t_{\hat{\beta}_j}$, since this statistic measures how many standard errors β̂j is away from zero.
In order to determine a rule for rejecting H 0 , we need to decide on the relevant alternative
hypothesis. There are three possibilities: one-tail alternative hypotheses (right and left
tail), and two-tail alternative hypothesis.
be seen in figure 4.3. It is very clear that to reject H0 against H1: βj > 0, we must get a
positive $t_{\hat{\beta}_j}$. A negative $t_{\hat{\beta}_j}$, no matter how large, provides no evidence in favor of
H1: βj > 0. On the other hand, in order to obtain $t^{\alpha}_{n-k}$ in the t statistical table, we only
need the significance level α and the degrees of freedom.
It is important to remark that as α decreases, $t^{\alpha}_{n-k}$ increases.
To a certain extent, the classical approach is somewhat arbitrary, since we need to
choose α in advance, and eventually H 0 is either rejected or not.
FIGURE 4.3. Rejection region using t: right-tail alternative hypothesis.
FIGURE 4.4. p-value using t: right-tail alternative hypothesis.
EXAMPLE 4.1 Is the marginal propensity to consume smaller than the average propensity to consume?
As seen in example 1.1, testing the 3rd proposition of the Keynesian consumption function in a
linear model is equivalent to testing whether the intercept is significantly greater than 0. That is to say,
in the model
$$\mathit{cons} = \beta_1 + \beta_2\,\mathit{inc} + u$$
we must test whether
$$\beta_1 > 0$$
With a random sample of 42 observations, the following results have been obtained
$$\widehat{\mathit{cons}}_i = \underset{(0.350)}{0.41} + \underset{(0.062)}{0.843}\,\mathit{inc}_i$$
The numbers in parentheses, below the estimates, are standard errors (se) of the estimators.
The question we pose is the following: is the third proposition of the Keynesian theory admissible?
Next, we answer this question.
1) In this case, the null and alternative hypotheses are the following:
H 0 : β1 = 0
H1 : β1 > 0
2) The test statistic is:
$$t = \frac{\hat{\beta}_1 - \beta_1^0}{se(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - 0}{se(\hat{\beta}_1)} = \frac{0.41}{0.35} = 1.171$$
3) Decision rule
It is useful to use several significance levels. Let us begin with a significance level of 0.10 because
the value of t is relatively small (smaller than 1.5). In this case, the degrees of freedom are 40 (42
observations minus 2 estimated parameters). If we look at the t statistical table (row 40 and column 0.10,
or 0.20, in statistical tables with one tail, or two tails, respectively), we find $t^{0.10}_{40} = 1.303$.
As t<1.303, we do not reject H0 for α=0.10, and therefore we cannot reject it for α=0.05
($t^{0.05}_{40} = 1.684$) or α=0.01 ($t^{0.01}_{40} = 2.423$) either, as can be seen in figure 4.5. In this figure, the rejection region
corresponds to α=0.10. Therefore, we cannot reject H0 in favor of H1. In other words, the sample data do not
provide sufficient evidence in favor of Keynes’s proposition 3.
In the alternative approach, as can be seen in figure 4.6, the p-value corresponding to $t_{\hat{\beta}_1}$ = 1.171
for a t with 40 df is equal to 0.124. For α<0.124 (for example, 0.10, 0.05 and 0.01), H0 is not rejected.
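The calculations of this example can be reproduced with a short sketch. It relies only on the Python standard library; the helper functions (t_pdf, t_upper_tail, t_critical) are our own, and they approximate the t distribution numerically rather than reading it from a table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: Simpson's rule on [0, x] plus symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Critical value c with P(T > c) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Figures reported in example 4.1
beta1_hat, se_beta1 = 0.41, 0.35
n, k = 42, 2
df = n - k

t_stat = beta1_hat / se_beta1          # statistic for H0: beta1 = 0
p_value = t_upper_tail(t_stat, df)     # right-tail alternative H1: beta1 > 0

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.3f}")
for alpha in (0.10, 0.05, 0.01):
    crit = t_critical(alpha, df)
    # Classical rule (t > critical value) and p-value rule (p < alpha)
    print(f"alpha={alpha:.2f}  critical={crit:.3f}  "
          f"reject (classical)={t_stat > crit}  reject (p-value)={p_value < alpha}")
```

Both decision rules lead to the same conclusion at every significance level, since they are two ways of reading the same tail probability.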
FIGURE 4.5. Example 4.1: Rejection region using t with a right-tail alternative hypothesis.
FIGURE 4.6. Example 4.1: p-value using t with right-tail alternative hypothesis.
we must get a negative $t_{\hat{\beta}_j}$. A positive $t_{\hat{\beta}_j}$, no matter how large it is, provides no evidence
in favor of H1: βj < 0.
In figure 4.8 the alternative approach is represented. Once the p-value has been
determined, we know that H 0 is rejected for any level of significance of α>p-value, while
the null hypothesis is not rejected when α<p-value.
FIGURE 4.7. Rejection region using t: left-tail alternative hypothesis.
FIGURE 4.8. p-value using t: left-tail alternative hypothesis.
The numbers in parentheses, below the estimates, are standard errors (se) of the estimators.
One of the questions posed by researchers is whether income has a negative influence on infant
mortality. To answer this question, the following hypothesis testing is carried out:
The null and alternative hypotheses, and the test statistic, are the following:
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 < 0 \qquad t = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} = \frac{-0.000826}{0.00028} = -2.966$$
Since the t value is relatively high in absolute value, let us start testing with a level of 1%. For α=0.01,
$t^{0.01}_{130-1-2} \approx t^{0.01}_{60} = 2.390$. Given that t<-2.390, as is shown in figure 4.9, we reject H0 in favour of H1.
Therefore, the gross national income per capita has a significantly negative influence on the mortality
of children under 5. That is to say, the higher the gross national income per capita, the lower the percentage
of mortality of children under 5. As H0 has been rejected for α=0.01, it will also be rejected for levels of
5% and 10%.
In the alternative approach, as can be seen in figure 4.10, the p-value corresponding to $t_{\hat{\beta}_2}$ = -2.966
for a t with 61 df is approximately 0.002. For all α>0.002, such as 0.01, 0.05 and 0.10, H0 is rejected.
FIGURE 4.9. Example 4.2: Rejection region using t with a left-tail alternative hypothesis.
FIGURE 4.10. Example 4.2: p-value using t with a left-tail alternative hypothesis.
H0 : β j = 0
against the alternative hypothesis
H1 : β j ≠ 0
This is the relevant alternative when the sign of β j is not well determined by theory
or common sense. When the alternative is two-sided, we are interested in the absolute
value of the t statistic. This is a significance test.
In this case, the decision rule is the following:
Decision rule
can be seen in figure 4.11. In this case, in order to reject H0 against H1: βj ≠ 0, we must
obtain a $t_{\hat{\beta}_j}$ that is large enough in absolute value, whether positive or negative.
FIGURE 4.11. Rejection region using t: two-tail alternative hypothesis.
FIGURE 4.12. p-value using t: two-tail alternative hypothesis.
When a specific alternative hypothesis is not stated, it is usually considered to be
two-sided hypothesis testing. If H 0 is rejected in favor of H 1 at a given α, we usually say
that “x j is statistically significant at the level α”.
EXAMPLE 4.3 Does the crime rate play a role in the price of houses in an area?
To explain housing prices in an American town, the following model is estimated:
$$\mathit{price} = \beta_1 + \beta_2\,\mathit{rooms} + \beta_3\,\mathit{lowstat} + \beta_4\,\mathit{crime} + u$$
where rooms is the number of rooms of the house, lowstat is the percentage of people of “lower status” in
the area and crime is crimes committed per capita in the area.
The output for the fitted model, using the file hprice2 (first 55 observations), appears in table 4.2
and has been taken from E-views. The meaning of the first three columns is clear: “t-Statistic” is the
statistic used to perform a significance test, that is to say, it is the ratio between the “Coefficient” and the “Std.
Error”; and “Prob” is the p-value for a two-tailed test.
In relation to this model, the researcher questions whether the rate of crime in an area plays a role
in the price of houses in that area.
To answer this question, the following procedure has been carried out.
In this case, the null and alternative hypotheses and the test statistic are the following:
$$H_0: \beta_4 = 0 \qquad H_1: \beta_4 \neq 0 \qquad t = \frac{\hat{\beta}_4}{se(\hat{\beta}_4)} = \frac{-3854}{960} = -4.016$$
TABLE 4.2. Standard output in the regression explaining house price. n=55.
Variable Coefficient Std. Error t-Statistic Prob.
C -15693.61 8021.989 -1.956324 0.0559
ROOMS 6788.401 1210.720 5.606910 0.0000
LOWSTAT -268.1636 80.70678 -3.322690 0.0017
CRIME -3853.564 959.5618 -4.015962 0.0002
Since the t value is relatively high in absolute value, let us start testing with a level of 1%. For α=0.01,
$t^{0.01/2}_{51} \approx t^{0.01/2}_{50} = 2.69$. (In the usual statistical tables for the t distribution, there is no information for each df
above 20). Given that |t| > 2.69, we reject H0 in favour of H1. Therefore, crime has a significant influence
on housing prices for a significance level of 1% and, thus, of 5% and 10%.
In the alternative approach, we can perform the test with more precision. In table 4.2 we see that
the p-value for the coefficient of crime is 0.0002. That means that the probability of the t statistic being
greater than 4.016 is 0.0001 and the probability of t being smaller than -4.016 is 0.0001. That is to say, the
p-value, as shown in Figure 4.13, is distributed in the two tails. As can be seen in this figure, H 0 is rejected
for all significance levels greater than 0.0002, such as 0.01, 0.05 and 0.10.
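As a check, the two-tailed p-value for crime can be approximated from the coefficient and standard error shown in table 4.2. The sketch below uses only the Python standard library, with the t density integrated numerically; the helper names are our own and the result is an approximation of the value reported by the software.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: Simpson's rule on [0, x] plus symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# CRIME row of table 4.2
coef, std_err = -3853.564, 959.5618
n, k = 55, 4
df = n - k

t_stat = coef / std_err
# Two-tail alternative: the p-value adds the probability of both tails
p_two_tail = 2 * t_upper_tail(abs(t_stat), df)

print(f"t = {t_stat:.3f}, df = {df}, two-tail p-value = {p_two_tail:.4f}")
```

Each tail contributes half of the p-value, which is why the one-tail probability is doubled.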
FIGURE 4.13. Example 4.3: p-value using t with a two-tail alternative hypothesis.
So far we have seen significance tests with one-tail and two-tail alternatives, in which a parameter
takes the value 0 in H0. Now we are going to look at a more general case where the
parameter in H0 takes any value:
$$H_0: \beta_j = \beta_j^0$$
Thus, the appropriate t statistic is
$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j - \beta_j^0}{se(\hat{\beta}_j)}$$
where Pt is the share price at the end of period t, Dt are the dividends received by the share during the
period t, and At is the value of the rights that eventually corresponded to the share during the period t.
Thus, the numerator of (4-15) summarizes the three types of capital gains that have been received
for the maintenance of a share in year t; that is to say, an increase or decrease in quotation, dividends and
rights on capital increase. Dividing by Pt-1, we obtain the rate of profit on share value at the end of the
previous period. Of these three components, the most important one is the increase in quotation.
Considering only that component, the rate of return of the share can be expressed by
$$RA1_t = \frac{\Delta P_t}{P_{t-1}} \tag{4-16}$$
or, alternatively, if we use a relative rate of variation, by
$$RA2_t = \Delta \ln P_t \tag{4-17}$$
In the same way as RAt represents the rate of return of a particular share in either of the two
expressions, we can also calculate the rate of return of all shares listed in the stock exchange. The latter rate
of return, which will be denoted by RMt, is called the market rate of return.
So far we have considered the rate of return in a year, but we can also apply expressions such as
(4-16), or (4-17), to obtain daily rates of return. It is interesting to know whether the rates of return in the
past are useful for predicting rates of return in the future. This question is related to the concept of market
efficiency. A market is efficient if prices incorporate all available information, so there is no possibility of
making abnormal profits by using this information.
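To fix ideas, the two rates of return can be computed from a series of prices. The sketch assumes that (4-16) denotes the relative price change between consecutive periods and (4-17) the difference in log prices; the closing prices below are hypothetical.

```python
import math

# Hypothetical daily closing prices, used only for illustration
prices = [100.0, 102.0, 101.0, 103.5]

# Relative price change between consecutive days (assumed form of (4-16))
simple_returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

# Difference of log prices between consecutive days (assumed form of (4-17))
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

for day, (simple, log_r) in enumerate(zip(simple_returns, log_returns), start=1):
    print(f"day {day}: simple return = {simple:.5f}, log return = {log_r:.5f}")
```

For small daily changes both measures are very close, so either can be used in practice.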
In order to test the efficiency of a market, we define the following model, using daily rates of
return defined by (4-16):
$$\mathit{rmad92}_t = \beta_1 + \beta_2\,\mathit{rmad92}_{t-1} + u_t \tag{4-18}$$
If a market is efficient, then the parameter β2 of the previous model must be 0. Let us now test
whether the Madrid Stock Exchange is efficient as a whole.
The model (4-18) has been estimated with daily data from the Madrid Stock Exchange for 1992,
using file bolmadef. The results obtained are the following:
$$\widehat{\mathit{rmad92}}_t = \underset{(0.0007)}{-0.0004} + \underset{(0.0629)}{0.1267}\,\mathit{rmad92}_{t-1}$$
R²=0.0163 n=247
The results are paradoxical. On the one hand, the coefficient of determination is very low (0.0163),
which means that only 1.63% of the total variance of the rate of return is explained by the previous day’s
rate of return. On the other hand, the coefficient corresponding to the rate of return of the previous
day is statistically significant at a level of 5% but not at a level of 1%, given that the t statistic is equal to
0.1267/0.0629=2.02, which is slightly larger in absolute value than $t^{0.05/2}_{245} \approx t^{0.05/2}_{60} = 2.00$. The reason for this
apparent paradox is that the sample size is very large. Thus, although the impact of the explanatory variable
on the endogenous variable is relatively small (as indicated by the coefficient of determination), this finding
is significant (as evidenced by the t statistic) because the sample is sufficiently large.
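The role of the sample size can be made explicit. In a regression with an intercept and a single regressor, the squared t statistic of the slope equals (n−2)R²/(1−R²), a standard identity that is not derived in this chapter. The sketch below applies it to the R² of this example; the smaller sample size of 50 is hypothetical.

```python
import math


def slope_t_from_r2(r2, n):
    """|t| of the slope in a regression with an intercept and ONE regressor."""
    return math.sqrt(r2 * (n - 2) / (1 - r2))


r2 = 0.0163                            # coefficient of determination in example 4.5
t_full = slope_t_from_r2(r2, 247)      # sample size used in example 4.5
t_small = slope_t_from_r2(r2, 50)      # hypothetical smaller sample, same R2

print(f"n=247: |t| = {t_full:.3f}")
print(f"n= 50: |t| = {t_small:.3f}")
```

With 247 observations the statistic is about 2.01, in line with the ratio of the rounded coefficient and standard error, whereas the same R² with only 50 observations would give a t statistic below 1.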
To answer the question as to whether the Madrid Stock Exchange is an efficient market, we can
say that it is not entirely efficient. However, this response should be qualified. In financial economics there
is a dependency relationship of the rate of return of one day with respect to the rate corresponding to the
previous day. This relationship is not very strong, although it is statistically significant in many world stock
markets due to market frictions. In any case, market players cannot exploit this phenomenon, and thus the
market is not inefficient, according to the above definition of the concept of efficiency.
EXAMPLE 4.6 Is the rate of return of the Madrid Stock Exchange affected by the rate of return of the
Tokyo Stock Exchange?
The study of the relationship between different stock markets (NYSE, Tokyo Stock Exchange
Madrid Stock Exchange, London Stock Exchange, etc.) has received much attention in recent years due to
the greater freedom in the movement of capital and the use of foreign markets to reduce the risk in portfolio
management. This is because the absence of perfect market integration allows diversification of risk. In any
case, there is a world trend toward a greater global integration of financial markets in general and stock
markets in particular.
If markets are efficient, and we have seen in example 4.5 that they are, the innovations (new
information) will be reflected in the different markets for a period of 24 hours.
It is important to distinguish between two types of innovations: a) global innovations, which is
news generated around the world and has an influence on stock prices in all markets, b) specific innovations,
which is the information generated during a 24 hour period and only affects the price of a particular market.
Thus, information on the evolution of oil prices can be considered as a global innovation, while a new
financial sector regulation in a country would be considered a specific innovation.
According to the above discussion, stock prices quoted at a session of a particular stock market
are affected by the global innovations of a different market which had closed earlier. Thus, global
innovations included in the Tokyo market will influence the market prices of Madrid on the same day. The
following model shows the transmission of effects between the Tokyo Stock Exchange and the Madrid
Stock Exchange in 1992:
$$\mathit{rmad92}_t = \beta_1 + \beta_2\,\mathit{rtok92}_t + u_t \tag{4-19}$$
where rmad92t is the rate of return of the Madrid Stock Exchange in period t and rtok92 t is the rate of
return of the Tokyo Stock Exchange in period t. The rates of return have been calculated according to (4-16).
In the working file madtok you can find general indices of the Madrid Stock Exchange and the
Tokyo Stock Exchange during the days both exchanges were open simultaneously in 1992. That is, we
eliminated observations for those days when any one of the two stock exchanges was closed. In total, the
number of observations is 234, compared to the 247 and 246 days that the Madrid and Tokyo Stock
Exchanges were open.
The estimation of the model (4-19) is as follows:
$$\widehat{\mathit{rmad92}}_t = \underset{(0.0007)}{-0.0005} + \underset{(0.0375)}{0.1244}\,\mathit{rtok92}_t$$
R²=0.0452 n=235
Note that the coefficient of determination is relatively low. However, for testing H 0 : β 2 =0, the
statistic t = (0.1244/0.0375) = 3.32, which implies that we reject the hypothesis that the rate of return of the
Tokyo Stock Exchange has no effect on the rate of return of the Madrid Stock Exchange, for a significance
level of 0.01.
Once again we find the same apparent paradox which appeared when we analyzed the efficiency
of the Madrid Stock Exchange in example 4.5 except for one difference. In the latter case, the rate of return
from the previous day appeared as significant due to problems arising in the elaboration of the general index
of the Madrid Stock Exchange.
Consequently, the fact that the null hypothesis is rejected implies that there is empirical evidence
supporting the theory that global innovations from the Tokyo Stock Exchange are transmitted to the quotes
of the Madrid Stock Exchange that day.
HYPOTHESIS TESTING IN THE MULTIPLE REGRESSION MODEL
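$$\Pr\left[-t_{n-k}^{\alpha/2} \le \frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \le t_{n-k}^{\alpha/2}\right] = 1-\alpha$$
Operating to put the unknown $\beta_j$ alone in the middle of the interval, we have
$$\Pr\left[\hat\beta_j - se(\hat\beta_j)\times t_{n-k}^{\alpha/2} \le \beta_j \le \hat\beta_j + se(\hat\beta_j)\times t_{n-k}^{\alpha/2}\right] = 1-\alpha$$
Therefore, the lower and upper bounds of a (1-α) CI respectively are given by
$$\underline{\beta}_j = \hat\beta_j - se(\hat\beta_j)\times t_{n-k}^{\alpha/2}$$
$$\overline{\beta}_j = \hat\beta_j + se(\hat\beta_j)\times t_{n-k}^{\alpha/2}$$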
If random samples were obtained over and over again, with $\underline{\beta}_j$ and $\overline{\beta}_j$ computed
each time, then the (unknown) population value $\beta_j$ would lie in the interval $(\underline{\beta}_j, \overline{\beta}_j)$ for
100(1 − α)% of the samples. Unfortunately, for the single sample that we use to construct the CI,
we do not know whether $\beta_j$ is actually contained in the interval.
Once a CI is constructed, it is easy to carry out two-tailed hypothesis tests. If the
null hypothesis is $H_0: \beta_j = a_j$, then $H_0$ is rejected against $H_1: \beta_j \neq a_j$ at (say) the 5%
significance level if, and only if, $a_j$ is not in the 95% CI.
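A minimal sketch of this equivalence follows, using the slope and standard error reported for model (4-19). The helper names `confidence_interval` and `rejects` are hypothetical, and the normal quantile is used in place of the t quantile as a large-sample assumption.

```python
from statistics import NormalDist


def confidence_interval(coef, se, critical):
    """Lower and upper bounds: coef -/+ se * critical value."""
    half_width = se * critical
    return coef - half_width, coef + half_width


def rejects(a, interval):
    """H0: beta_j = a is rejected if, and only if, a lies outside the interval."""
    lower, upper = interval
    return not (lower <= a <= upper)


# Assumption: normal quantile instead of the t quantile (large sample).
crit_95 = NormalDist().inv_cdf(0.975)
ci_95 = confidence_interval(0.1244, 0.0375, crit_95)

print(ci_95)
print(rejects(0.0, ci_95))   # H0: beta_2 = 0
print(rejects(0.15, ci_95))  # H0: beta_2 = 0.15
```

Any hypothesized value inside the interval is not rejected at the 5% level, and any value outside it is rejected, so the interval summarizes a whole family of two-tailed tests at once.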
To illustrate this matter, in figure 4.14 we constructed confidence intervals of 90%,
95% and 99% for the marginal propensity to consume ($\beta_2$) corresponding to example
4.1.
[Figure: confidence intervals centred on the point estimate 0.843. 0.90: (0.739, 0.947); 0.95: (0.718, 0.968); 0.99: (0.675, 1.011).]
FIGURE 4.14. Confidence intervals for marginal propensity to consume in example 4.1.
EXAMPLE 4.7 Are there constant returns to scale in the chemical industry?
To examine whether there are constant returns to scale in the chemical sector, we are going to use
the Cobb-Douglas production function, given by
$$\ln(output) = \beta_1 + \beta_2 \ln(labor) + \beta_3 \ln(capital) + u \tag{4-20}$$
In the above model, parameters $\beta_2$ and $\beta_3$ are elasticities (output/labor and output/capital).
Before making inferences, remember that returns to scale refers to a technical property of the
production function examining changes in output subsequent to a change of the same proportion in all
inputs, which are labor and capital in this case. If output increases by that same proportional change then
there are constant returns to scale. Constant returns to scale imply that if the factors labor and capital
increase at a certain rate (say 10%), output will increase at the same rate (e.g., 10%). If output increases by
more than that proportion, there are increasing returns to scale. If output increases by less than that
proportional change, there are decreasing returns to scale. In the above model, the following occurs
- if β 2 +β 3 =1, there are constant returns to scale.
- if β 2 +β 3 >1, there are increasing returns to scale.
- if β 2 +β 3 <1, there are decreasing returns to scale.
Data used for this example are a sample of 27 companies of the primary metal sector (workfile
prodmet), where output is gross value added, labor is a measure of labor input, and capital is the gross
value of plant and equipment. Further details on construction of the data are given in Aigner, et al. (1977)
and in Hildebrand and Liu (1957); these data were used by Greene in 1991. The results obtained in the
estimation of model (4-20), using any econometric software available, appear in table 4.4.
Two procedures will be used to test this hypothesis. In the first, the covariance matrix of the
estimators is used. In the second, the model is reparameterized by introducing a new parameter.
Procedure: using covariance matrix of estimators
According to $H_0$, it is stated that $\beta_2 + \beta_3 - 1 = 0$. Therefore, the t statistic must now be based on
whether the estimated sum $\hat\beta_2 + \hat\beta_3 - 1$ is sufficiently different from 0 to reject $H_0$ in favor of $H_1$. To
account for the sampling error in our estimators, we standardize this sum by dividing by its standard error:
$$t_{\hat\beta_2+\hat\beta_3} = \frac{\hat\beta_2 + \hat\beta_3 - 1}{se(\hat\beta_2 + \hat\beta_3)}$$
Therefore, if $t_{\hat\beta_2+\hat\beta_3}$ is large enough in absolute value, we will conclude, in a two-sided alternative test, that there are
not constant returns to scale. On the other hand, if $t_{\hat\beta_2+\hat\beta_3}$ is positive and large enough, we will reject, in a
one-sided alternative test (right), $H_0$ in favour of $H_1: \beta_2 + \beta_3 > 1$. In that case, we conclude that there are increasing returns to
scale.
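The standard error in the denominator is not the sum of the two individual standard errors, because the estimators are in general correlated. As a reminder, the usual rule for the variance of a sum (a standard result, stated here as a sketch since the passage does not spell it out) gives:

```latex
\operatorname{var}(\hat\beta_2+\hat\beta_3)
  = \operatorname{var}(\hat\beta_2)+\operatorname{var}(\hat\beta_3)
    +2\operatorname{cov}(\hat\beta_2,\hat\beta_3)
\quad\Longrightarrow\quad
se(\hat\beta_2+\hat\beta_3)
  = \sqrt{\widehat{\operatorname{var}}(\hat\beta_2)
    +\widehat{\operatorname{var}}(\hat\beta_3)
    +2\,\widehat{\operatorname{cov}}(\hat\beta_2,\hat\beta_3)}
```

The three estimated terms are read from the covariance matrix of the estimators, which is why this procedure requires that matrix.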
TABLE 4.6. Estimation output for the production function: reparameterized model.
Variable Coefficient Std. Error t-Statistic Prob.
constant 1.170644 0.326782 3.582339 0.0015
ln(labor) -0.021290 0.062577 -0.340227 0.7366
ln(capital/labor) 0.375710 0.085346 4.402204 0.0002
$$se(\hat\beta_3 - \hat\beta_2) = \sqrt{79.644 + 12.992 - 2\times 2.941} = 9.3142$$
$$t_{\hat\beta_3-\hat\beta_2} = \frac{\hat\beta_3 - \hat\beta_2}{se(\hat\beta_3 - \hat\beta_2)} = \frac{30.697 - 18.637}{9.3142} = 1.295$$
For α=0.10, we find that $t_{15}^{0.10} = 1.341$. As t<1.341, we do not reject $H_0$ for α=0.10, nor for α=0.05
or α=0.01. Therefore, there is no empirical evidence that a dollar spent on special incentives has a higher
incidence on sales than a dollar spent on advertising.
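The calculation above can be checked with a few lines of code. The figures are those printed in the example; the function name `se_difference` is hypothetical, and the critical value is taken from the text rather than computed.

```python
import math


def se_difference(var_a, var_b, cov_ab):
    """Standard error of (a - b): sqrt(var(a) + var(b) - 2 cov(a, b))."""
    return math.sqrt(var_a + var_b - 2.0 * cov_ab)


# Variances, covariance and point estimates as printed in the example.
se_diff = se_difference(79.644, 12.992, 2.941)
t_stat = (30.697 - 18.637) / se_diff

critical_10 = 1.341  # one-sided 10% critical value for 15 df, as quoted in the text
print(f"se = {se_diff:.4f}, t = {t_stat:.3f}, reject H0: {t_stat > critical_10}")
```

Note the minus sign on the covariance term: it appears because the hypothesis concerns a difference of coefficients rather than a sum.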
EXAMPLE 4.9 Testing the hypothesis of homogeneity in the demand for fish
In the case study in chapter 2, models for demand for dairy products have been estimated from
cross-sectional data, using disposable income as an explanatory variable. However, the price of the product
itself and, to a greater or lesser extent, the prices of other goods are determinants of the demand. The
demand analysis based on cross sectional data has precisely the limitation that it is not possible to examine
the effect of prices on demand because prices remain constant, since the data refer to the same point in time.
To analyze the effect of prices it is necessary to use time series data or, alternatively, panel data. We will
briefly examine some aspects of the theory of demand for a good and then move to the estimation of a
demand function with time series data. As a postscript to this case, we will test one of the hypotheses which,
under certain circumstances, a theoretical model must satisfy.
The demand for a commodity - say good j - can be expressed, according to an optimization process
carried out by the consumer, in terms of disposable income, the price of the good and the prices of the other
goods. Analytically:
$$q_j = f_j(p_1, p_2, \ldots, p_j, \ldots, p_m, di) \tag{4-21}$$
where
$$\sum_{h=1}^{m} \varepsilon_{q_j/p_h} + \varepsilon_{q_j/R} = 0 \tag{4-24}$$
As can be seen, the signs of the elasticities are correct: the elasticity of demand is negative with
respect to the price of the good, while the elasticities with respect to the price of the substitute good and
total consumption are positive.
In model (4-26) the homogeneity restriction implies the following null hypothesis:
$$\beta_2 + \beta_3 + \beta_4 = 0 \tag{4-27}$$
To carry out this test, we will use a similar procedure to the one used in example 4.6. Now, the
parameter θ is defined as follows
$$\theta = \beta_2 + \beta_3 + \beta_4 \tag{4-28}$$
Setting $\beta_2 = \theta - \beta_3 - \beta_4$, the following model has been estimated:
$$\ln(fish) = \beta_1 + \theta \ln(fishpr) + \beta_3 \ln(meatpr/fishpr) + \beta_4 \ln(cons/fishpr) + u \tag{4-29}$$
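The substitution step can be written out as follows. This is a sketch that assumes model (4-26), which is not reproduced in this passage, is the log-linear demand function in ln(fishpr), ln(meatpr) and ln(cons) with coefficients β2, β3 and β4:

```latex
\begin{aligned}
\ln(fish) &= \beta_1 + (\theta-\beta_3-\beta_4)\ln(fishpr)
             + \beta_3\ln(meatpr) + \beta_4\ln(cons) + u \\
          &= \beta_1 + \theta\ln(fishpr)
             + \beta_3\,[\ln(meatpr)-\ln(fishpr)]
             + \beta_4\,[\ln(cons)-\ln(fishpr)] + u \\
          &= \beta_1 + \theta\ln(fishpr)
             + \beta_3\ln(meatpr/fishpr)
             + \beta_4\ln(cons/fishpr) + u
\end{aligned}
```

After the reparameterization, the restriction on three coefficients becomes a restriction on the single coefficient θ, so an ordinary t test can be used.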
The results obtained were the following:
$$\widehat{\ln(fish_i)} = \underset{(2.30)}{7.788} - \underset{(0.1334)}{0.4596}\,\ln(fishpr_i) + \underset{(0.112)}{0.554}\,\ln(meatpr_i) + \underset{(0.137)}{0.322}\,\ln(cons_i)$$
Using (4-28), testing the null hypothesis (4-27) is equivalent to testing that the coefficient of
ln(fishpr) in (4-29) is equal to 0. Since the t statistic for this coefficient is equal to -3.44 and $t_{24}^{0.01/2} = 2.8$,
we reject the hypothesis of homogeneity regarding the demand for fish.
whereas the economic significance of a variable is related to the size (and sign) of βˆ j .
Too much focus on statistical significance can lead to the false conclusion that a variable
is “important” for explaining y, even though its estimated effect is modest.
Therefore, even if a variable is statistically significant, you need to discuss the
magnitude of the estimated coefficient to get an idea of its practical or economic
importance.
Let us suppose that there are q exclusion restrictions to test. H0 states that q of the
variables have zero coefficients. Assuming that they are the last q variables, H0 is stated
as
$$H_0: \beta_{k-q+1} = \beta_{k-q+2} = \cdots = \beta_k = 0 \tag{4-32}$$
The restricted model is obtained by imposing the q restrictions of H0 on the
unrestricted model:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_{k-q} x_{k-q} + u \tag{4-33}$$
H1 is stated as
H1: H0 is not true (4-34)
$$F_{q,n-k} = \frac{\chi_q^2 / q}{\chi_{n-k}^2 / (n-k)} \tag{4-37}$$
where $\chi_q^2$ and $\chi_{n-k}^2$ are Chi-square distributions that are independent of each other.
In (4-35) we see that the degrees of freedom corresponding to RSSUR (dfUR) are n-k.
Remember that
$$\hat\sigma_{UR}^2 = \frac{RSS_{UR}}{n-k} \tag{4-38}$$
On the other hand, the degrees of freedom corresponding to RSSR (dfR) are n-k+q,
because in the restricted model k-q parameters are estimated. The degrees of freedom
corresponding to RSSR-RSSUR are
(n-k+q)-(n-k)=q = numerator degrees of freedom=dfR-dfUR
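The degrees-of-freedom bookkeeping can be sketched in code together with the F ratio in its RSS form, that is, [(RSSR − RSSUR)/q] / [RSSUR/(n − k)]. The function name `f_statistic` and the numbers passed to it are hypothetical and serve only to exercise the formula.

```python
def f_statistic(rss_r, rss_ur, q, n, k):
    """F ratio: [(RSS_R - RSS_UR) / q] / [RSS_UR / (n - k)]."""
    if rss_r < rss_ur:
        raise ValueError("restricted RSS cannot be smaller than unrestricted RSS")
    df_ur = n - k        # degrees of freedom of RSS_UR
    df_r = n - (k - q)   # degrees of freedom of RSS_R: k - q parameters estimated
    assert df_r - df_ur == q  # numerator degrees of freedom
    return ((rss_r - rss_ur) / q) / (rss_ur / df_ur)


# Hypothetical numbers, chosen only to illustrate the calculation.
f_value = f_statistic(rss_r=120.0, rss_ur=100.0, q=2, n=53, k=3)
print(f_value)
```

The guard on the residual sums of squares reflects the fact that imposing restrictions can never improve the fit of the model.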
Decision rule
The $F_{q,n-k}$ distribution is tabulated and available in statistical tables, where we look
for the critical value ($F_{q,n-k}^{\alpha}$), which depends on α (the significance level), q (the df of the
numerator), and n-k (the df of the denominator). Taking into account the above, the
decision rule is quite simple.
Decision rule
If $F \ge F_{q,n-k}^{\alpha}$, reject $H_0$
If $F < F_{q,n-k}^{\alpha}$, do not reject $H_0$ (4-40)
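[Figure: density of $F_{q,n-k}$. The rejection region (RR) lies to the right of the critical value $F_{q,n-k}^{\alpha}$ and the non rejection region (NRR) to its left.]
FIGURE 4.15. Rejection region and non rejection region using F distribution.
[Figure: density of $F_{q,n-k}$ with the p-value as the area to the right of F. H0 is rejected for α>p-value and not rejected for α<p-value.]
FIGURE 4.16. p-value using F distribution.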
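The p-value rule can be illustrated without statistical tables in one special case. When there are exactly two restrictions (q = 2), the upper-tail probability of the F distribution has the closed form (1 + 2F/d)^(−d/2), where d is the denominator degrees of freedom. This is a known property of the F distribution, not a formula from this text, and the sketch below does not apply to other values of q. The function names are hypothetical.

```python
def f_pvalue_two_restrictions(f_value, df_denominator):
    """P(F_{2,d} > f) = (1 + 2 f / d) ** (-d / 2); valid only for q = 2."""
    if f_value < 0:
        raise ValueError("an F statistic cannot be negative")
    return (1.0 + 2.0 * f_value / df_denominator) ** (-df_denominator / 2.0)


def reject(p_value, alpha):
    """Reject H0 when the significance level exceeds the p-value."""
    return alpha > p_value


# Check against a tabulated value: 5.61 is the 1% critical value of F(2, 24).
p = f_pvalue_two_restrictions(5.61, 24)
print(round(p, 4), reject(p, 0.05), reject(p, 0.001))
```

For general q, the p-value is normally read from econometric software or from a statistical library.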
[Figure: density of $F_{2,40}$ with the critical values 2.44, 3.23 and 5.18 and the value 1.980 marked.]
FIGURE 4.17. Example 4.10: Rejection region using F distribution (α values are from a $F_{2,40}$).
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0 \tag{4-41}$$
However, this is not the adequate $H_0$ to test for the global significance of the
model. If $\beta_2 = \beta_3 = \cdots = \beta_k = 0$, then the restricted model would be the following:
$$y = \beta_1 + u \tag{4-42}$$
If we take expectations in (4-42), then we have
E ( y ) = β1 (4-43)
Thus, $H_0$ in (4-41) states not only that the explanatory variables have no
influence on the endogenous variable, but also that the mean of the endogenous variable
(for example, the consumption mean) is equal to 0.
Therefore, if we want to know whether the model is globally significant, the H 0
must be the following:
$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0 \tag{4-44}$$
The corresponding restricted model given in (4-42) does not explain anything and,
therefore, $R_R^2$ is equal to 0. Testing the $H_0$ given in (4-44), which involves q = k − 1 restrictions, is very easy by using the
R-squared form of the F statistic:
$$F = \frac{R^2 / (k-1)}{(1 - R^2) / (n - k)} \tag{4-45}$$
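As an illustration, the overall significance statistic can be computed directly from the coefficient of determination. The sketch below uses k − 1 numerator degrees of freedom, the number of restrictions in (4-44), and applies it to the R² and sample size printed for model (4-19), where k = 2. The function name `overall_f` is hypothetical.

```python
def overall_f(r_squared, n, k):
    """F statistic for H0: beta_2 = ... = beta_k = 0 (k - 1 restrictions)."""
    q = k - 1
    return (r_squared / q) / ((1.0 - r_squared) / (n - k))


# R-squared and n as printed for model (4-19); k = 2 parameters.
f_value = overall_f(0.0452, n=235, k=2)
print(round(f_value, 2))
```

Only R², n and k are needed, which is why this form is convenient when the residual sums of squares are not reported.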
[Figure: density of the F distribution with the critical values 2.12, 2.67 and 3.92 and the value 26.93 marked.]
FIGURE 4.18. Example 4.11: p-value using F distribution (α values are for a $F_{3,140}$).
The first restriction imposes constant returns to scale. The second restriction imposes
that $\beta_1$, the parameter linked to total factor productivity, is equal to 0.
Substituting the restrictions of H0 in the original model (unrestricted model), we have
$$\ln(output) = (1 - \beta_3) \ln(labor) + \beta_3 \ln(capital) + u$$
Operating, we obtain the restricted model:
$$\ln(output / labor) = \beta_3 \ln(capital / labor) + u$$
In estimating the unrestricted and restricted models, we get RSSR=3.1101 and RSSUR=0.8516.
Therefore, the F ratio is
$$F = \frac{(RSS_R - RSS_{UR}) / q}{RSS_{UR} / (n - k)} = \frac{(3.1101 - 0.8516) / 2}{0.8516 / (27 - 3)} = 13.551$$
There are two reasons for not using R2 in this case. First, the restricted model has no intercept.
Second, the regressand of the restricted model is different from the regressand of the unrestricted model.
Since the F value is relatively high, let us start by testing with a level of 1%. For α=0.01,
$F_{2,24}^{0.01} = 5.61$. Given that F>5.61, we reject H0 in favour of H1. Therefore, we reject the joint hypotheses that
there are constant returns to scale and that the parameter $\beta_1$ is equal to 0. If H0 is rejected for α=0.01, it will
also be rejected for levels of 5% and 10%.
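[Figure: density of $F_{1,n-k}$ with the critical value $F_{1,n-k}^{\alpha}$, and density of $t_{n-k}$ with the critical values $-t_{n-k}^{\alpha/2}$ and $t_{n-k}^{\alpha/2}$.]
FIGURE 4.19. Relationship between a $F_{1,n-k}$ and a $t_{n-k}$.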
Moreover, since the t statistics are also easier to obtain than the F statistics, there
is no good reason for using an F statistic to test a hypothesis with a unique restriction.
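The equivalence behind this remark can be stated compactly. For a single restriction $H_0: \beta_j = a_j$ tested against a two-sided alternative, the following standard relations hold (given here as a reminder, not as a derivation):

```latex
t = \frac{\hat\beta_j - a_j}{se(\hat\beta_j)}, \qquad
F = t^2, \qquad
F_{1,n-k}^{\alpha} = \left(t_{n-k}^{\alpha/2}\right)^2
```

Both tests therefore lead to the same decision, but only the t statistic keeps the sign, which is what allows one-sided alternatives to be tested.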
4.5 Prediction
In this section two types of prediction will be examined: point and interval
prediction.
$$E[\hat\theta^0] = E[\hat\beta_1 + \hat\beta_2 x_2^0 + \hat\beta_3 x_3^0 + \cdots + \hat\beta_k x_k^0] = \beta_1 + \beta_2 x_2^0 + \beta_3 x_3^0 + \cdots + \beta_k x_k^0 = \theta^0 \tag{4-50}$$
On the other hand, adopting the Gauss Markov assumptions (1 to 8), it can be
proved that this point predictor is the best linear unbiased estimator (BLUE).
We have a point prediction for $\theta^0$, but what is the point prediction for $y^0$? To
answer this question, we have to predict $u^0$. As the error is not observable, the best
predictor for $u^0$ is its expected value, which is 0. Therefore,
$$\hat y^0 = \hat\theta^0 \tag{4-51}$$
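$$\Pr\left[-t_{n-k}^{\alpha/2} \le \frac{\theta^0 - \hat\theta^0}{se(\hat\theta^0)} \le t_{n-k}^{\alpha/2}\right] = 1-\alpha$$
Operating, we can construct a 100(1-α)% confidence interval (CI) for $\theta^0$ with the
following structure:
$$\Pr\left[\hat\theta^0 - se(\hat\theta^0)\times t_{n-k}^{\alpha/2} \le \theta^0 \le \hat\theta^0 + se(\hat\theta^0)\times t_{n-k}^{\alpha/2}\right] = 1-\alpha$$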
To obtain a CI for $\theta^0$, we need to know the standard error $se(\hat\theta^0)$ of $\hat\theta^0$.
There is an easy way to calculate it. Solving (4-48) for $\beta_1$, we find that
$\beta_1 = \theta^0 - \beta_2 x_2^0 - \beta_3 x_3^0 - \cdots - \beta_k x_k^0$. Plugging this into equation (4-47), we obtain
$$y = \theta^0 + \beta_2 (x_2 - x_2^0) + \beta_3 (x_3 - x_3^0) + \cdots + \beta_k (x_k - x_k^0) + u \tag{4-53}$$
Applying OLS to (4-53), in addition to the point prediction, we obtain $se(\hat\theta^0)$,
which is the standard error corresponding to the intercept in this regression. The previous
method allows us to put a CI around the OLS estimate of E(y), for any values of the x's.
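The device of shifting the regressors can be sketched for the simple regression case in plain Python. The data below are hypothetical, as is the helper `ols_simple`; the point is only that the intercept of the regression on $x - x^0$ reproduces the point prediction at $x^0$, and that its standard error is the one required for the CI.

```python
import math


def ols_simple(x, y):
    """OLS of y on an intercept and x; returns (b1, b2, se_b1, sigma_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    se_b1 = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    return b1, b2, se_b1, sigma_hat


# Hypothetical data, for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [5.1, 6.0, 6.9, 7.4, 8.6, 9.0]
x0 = 7.0

b1, b2, _, _ = ols_simple(x, y)

# Regress y on (x - x0): the intercept now estimates theta0 = beta1 + beta2 * x0.
theta_hat, slope, se_theta, _ = ols_simple([xi - x0 for xi in x], y)

print(theta_hat, b1 + b2 * x0, se_theta)
```

The slope is unchanged by the shift; only the intercept and its standard error take on a new meaning.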
$$\hat e_2^0 = y^0 - \hat y^0 = \theta^0 + u^0 - \hat y^0 \tag{4-55}$$
Taking into account (4-51) and (4-50), and that $E(u^0)=0$, then the expected
prediction error is zero. In finding the variance of $\hat e_2^0$, it must be taken into account that
$u^0$ is uncorrelated with $\hat y^0$ because $x_2^0, x_3^0, \ldots, x_k^0$ is not in the sample.
Therefore, the variance of the prediction error (conditional on the x's) is the sum
of the variances:
$$Var(\hat e_2^0) = Var(\hat y^0) + Var(u^0) = Var(\hat y^0) + \sigma^2 \tag{4-56}$$
1. The sampling error in ŷ 0 , which arises because we have estimated the βj’s.
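$$se(\hat e_2^0) = \left\{ \left[se(\hat\theta^0)\right]^2 + \hat\sigma^2 \right\}^{1/2} \tag{4-57}$$
Usually $\hat\sigma^2$ is larger than $[se(\hat\theta^0)]^2$. Under the assumptions of the CLM,
$$\frac{\hat e_2^0}{se(\hat e_2^0)} \sim t_{n-k} \tag{4-58}$$
Therefore, we can write that
$$\Pr\left[-t_{n-k}^{\alpha/2} \le \frac{\hat e_2^0}{se(\hat e_2^0)} \le t_{n-k}^{\alpha/2}\right] = 1-\alpha \tag{4-59}$$
Plugging in $\hat e_2^0 = y^0 - \hat y^0$ into (4-59) and rearranging it gives a 100(1-α)% prediction
interval for $y^0$:
$$\Pr\left[\hat y^0 - se(\hat e_2^0)\times t_{n-k}^{\alpha/2} \le y^0 \le \hat y^0 + se(\hat e_2^0)\times t_{n-k}^{\alpha/2}\right] = 1-\alpha \tag{4-60}$$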
EXAMPLE 4.13 What is the expected score in the final exam with 7 marks in the first short exam?
The following model has been estimated to compare the marks in the final exam (finalmrk) and in
the first short exam (shortex1) of Econometrics:
$$\widehat{finalmrk}_i = \underset{(0.715)}{4.155} + \underset{(0.123)}{0.491}\; shortex1_i$$
$$\overline{\theta}^0 = \hat\theta^0 + se(\hat\theta^0) \times t_{14}^{0.05/2} = 7.593 + 0.497 \times 2.14 = 8.7$$
Therefore, with 95% confidence, the expected final mark of a student with 7 marks in the first short exam lies
between 6.5 and 8.7.
The point prediction could be also obtained from the first estimated equation:
$$\widehat{finalmrk} = 4.155 + 0.491 \times 7 = 7.593$$
Now, we are going to estimate a 95% probability interval for the individual value. The se of $\hat e_2^0$ is
equal to
$$se(\hat e_2^0) = \left\{ \left[se(\hat y^0)\right]^2 + \hat\sigma^2 \right\}^{1/2} = \sqrt{0.497^2 + 1.649^2} = 1.722$$
where 1.649 is the “S. E. of regression” obtained from the E-views output directly.
The lower and upper bounds of a 95% probability interval respectively are given by
$$\underline{y}^0 = \hat y^0 - se(\hat e_2^0) \times t_{14}^{0.025} = 7.593 - 1.722 \times 2.14 = 3.9$$
$$\overline{y}^0 = \hat y^0 + se(\hat e_2^0) \times t_{14}^{0.025} = 7.593 + 1.722 \times 2.14 = 11.3$$
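Both intervals of this example can be reproduced from the four figures quoted in the text. The sketch below is only a numerical check; the variable names are hypothetical and the critical value 2.14 is the one used in the example, not computed here.

```python
import math

theta_hat = 7.593  # point prediction at shortex1 = 7
se_theta = 0.497   # standard error of the point prediction
sigma_hat = 1.649  # "S. E. of regression"
t_crit = 2.14      # t critical value for 14 df and alpha/2 = 0.025, as used in the text

# Interval for the expected value E(y | x0).
ci_mean = (theta_hat - se_theta * t_crit, theta_hat + se_theta * t_crit)

# Interval for an individual value y0: the error variance is added.
se_pred = math.sqrt(se_theta ** 2 + sigma_hat ** 2)
pi_individual = (theta_hat - se_pred * t_crit, theta_hat + se_pred * t_crit)

print([round(b, 1) for b in ci_mean])
print(round(se_pred, 3), [round(b, 1) for b in pi_individual])
```

The interval for the individual mark is much wider than the interval for the expected mark, because the variance of the disturbance dominates the sampling variance of the prediction.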
The predicted salaries and the corresponding se( θˆ0 ) for selected values (maximum, mean, median
and minimum), using a model as (4-53), appear in table 4.11.
TABLE 4.11. Predictions for selected values.
                    Prediction θ̂0    Std. Error se(θ̂0)
Mean values              2026               71
Median value             1688               78
Maximum values          14124             1110
Minimum values            760              195
$$\tilde y = \exp(\widehat{\ln(y)}) = \exp(\hat\beta_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k) \tag{4-63}$$
The MAE is defined as the average of the absolute values of the errors:
$$MAE = \frac{\sum_{i=n+1}^{n+h} \left| \hat y_i - y_i \right|}{h} \tag{4-67}$$
Absolute values are taken so that positive errors are not compensated by the negative
ones.
Mean absolute percentage error (MAPE)
$$MAPE = \frac{\sum_{i=n+1}^{n+h} \left| \dfrac{\hat y_i - y_i}{y_i} \right|}{h} \times 100 \tag{4-68}$$
Root of the mean squared error (RMSE)
This statistic is defined as the square root of the mean of the squared error:
$$RMSE = \sqrt{\frac{\sum_{i=n+1}^{n+h} (\hat y_i - y_i)^2}{h}} \tag{4-69}$$
As the errors are squared, the compensation between positive and negative errors
is avoided. It is important to remark that the RMSE places a greater penalty on large
forecast errors than the MAE.
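The three statistics defined in (4-67) to (4-69) can be sketched in a few lines. The forecasts and outcomes below are hypothetical values for h = 4 periods, used only to show the calculations.

```python
import math


def mae(forecast, actual):
    """Mean absolute error, as in (4-67)."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def mape(forecast, actual):
    """Mean absolute percentage error, as in (4-68)."""
    return 100.0 * sum(abs((f - a) / a) for f, a in zip(forecast, actual)) / len(actual)


def rmse(forecast, actual):
    """Root of the mean squared error, as in (4-69)."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


# Hypothetical forecasts and outcomes for h = 4 periods.
actual = [100.0, 110.0, 120.0, 125.0]
forecast = [98.0, 113.0, 119.0, 131.0]

mae_value = mae(forecast, actual)
mape_value = mape(forecast, actual)
rmse_value = rmse(forecast, actual)
print(mae_value, round(mape_value, 3), round(rmse_value, 3))
```

In this example the single large error of 6 raises the RMSE above the MAE, which illustrates the heavier penalty on large forecast errors.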
Theil Inequality Coefficient (U)
This coefficient is defined as follows:
$$U = \frac{\sqrt{\dfrac{\sum_{i=n+1}^{n+h} (\hat y_i - y_i)^2}{h}}}{\sqrt{\dfrac{\sum_{i=n+1}^{n+h} \hat y_i^2}{h}} + \sqrt{\dfrac{\sum_{i=n+1}^{n+h} y_i^2}{h}}} \tag{4-70}$$
The smaller U is, the more accurate the predictions are. The scaling of U is such
that it will always lie between 0 and 1. If U=0, then $y_i = \hat y_i$ for all forecasts; if U=1 the
predictive performance is as bad as it can be. Theil’s U statistic can be rescaled and
decomposed into three proportions: bias, variance and covariance. Of course the sum of
these three proportions is 1. The interpretation of these three proportions is as follows:
1) The bias reflects systematic errors. Whatever the value of U, we would hope
that the bias is close to 0. A large bias suggests a systematic over or under
prediction.
2) The variance also reflects systematic errors. The size of this proportion is an
indication of the inability of the forecasts to replicate the variability of the
variable to be forecasted.
3) The covariance measures unsystematic errors. Ideally, this should have the
highest proportion of Theil inequality.
In addition to the coefficient defined in (4-70), Theil proposed other coefficients
for forecast evaluation.
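A sketch of the coefficient (4-70) and of its decomposition follows. The text does not give the formulas of the three proportions; the ones used here are the decomposition of the mean squared error commonly reported by econometric software, which is an assumption of this example, as are the function names and the data.

```python
import math


def theil_u(forecast, actual):
    """Theil inequality coefficient, as in (4-70)."""
    h = len(actual)
    rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / h)
    root_f = math.sqrt(sum(f ** 2 for f in forecast) / h)
    root_a = math.sqrt(sum(a ** 2 for a in actual) / h)
    return rmse / (root_f + root_a)


def theil_proportions(forecast, actual):
    """Bias, variance and covariance proportions of the mean squared error."""
    h = len(actual)
    mse = sum((f - a) ** 2 for f, a in zip(forecast, actual)) / h
    mean_f = sum(forecast) / h
    mean_a = sum(actual) / h
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / h)
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / h)
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecast, actual)) / h
    bias = (mean_f - mean_a) ** 2 / mse
    variance = (sd_f - sd_a) ** 2 / mse
    covariance = 2.0 * (sd_f * sd_a - cov) / mse
    return bias, variance, covariance


# Hypothetical forecasts and outcomes.
actual = [100.0, 110.0, 120.0, 125.0]
forecast = [98.0, 113.0, 119.0, 131.0]

u = theil_u(forecast, actual)
bias, variance, covariance = theil_proportions(forecast, actual)
print(round(u, 4), round(bias, 3), round(variance, 3), round(covariance, 3))
```

The proportions are undefined when all forecasts are exact, since the mean squared error is then zero; a production implementation would need to handle that case.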
Dynamic prediction
Let the following model be given:
$$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + u_t \tag{4-71}$$
Suppose that the forecast sample is i=n+1,…,n+h, and denote the actual and
forecasted value in period i as $y_i$ and $\hat y_i$, respectively. The forecast for the period n+1 is
Exercises
Exercise 4.1 To explain the housing price in an American town, the following model is
formulated:
$$price = \beta_1 + \beta_2\, rooms + \beta_3\, lowstat + \beta_4\, crime + u$$
where rooms is the number of rooms in the house, lowstat is the percentage of people of
“lower status” in the area and crime is crimes committed per capita in the area. Prices of
houses are measured in dollars.
Using the data in hprice2, the following model has been estimated:
$$\widehat{price} = \underset{(8022)}{-15694} + \underset{(1211)}{6788}\; rooms - \underset{(81)}{268}\; lowstat - \underset{(960)}{3854}\; crime$$
$R^2 = 0.771 \qquad n = 55$
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the meaning of the coefficients β̂ 2 , β̂3 and β̂ 4 .
b) Does the percentage of people of “lower status” have a negative influence
on the price of houses in that area?
c) Does the number of rooms have a positive influence on the price of
houses?
Exercise 4.2 Consider the following model:
$$\ln(fruit) = \beta_1 + \beta_2 \ln(inc) + \beta_3\, hhsize + \beta_4\, punder5 + u$$
where fruit is expenditure in fruit, inc is disposable income of a household, hhsize is the
number of household members and punder5 is the proportion of children under five in
the household.
Using the data in workfile demand, the following model has been estimated:
$$\widehat{\ln(fruit)} = \underset{(3.701)}{-9.768} + \underset{(0.512)}{2.005}\,\ln(inc) - \underset{(0.179)}{1.205}\; hhsize - \underset{(0.013)}{0.0179}\; punder5$$
$R^2 = 0.728 \qquad n = 40$
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the meaning of the coefficients β̂ 2 , β̂3 and β̂ 4 .
b) Does the number of household members have a statistically significant
effect on the expenditure in fruit?
c) Is the proportion of children under five in the household a factor that has
a negative influence on the expenditure of fruit?
d) Is fruit a luxury good?
Exercise 4.3 (Continuation of exercise 2.5). Given the model
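$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad i = 1, 2, \ldots, n$$
the following results have been obtained with a sample size of 11 observations:
$$\sum_{i=1}^{n} x_i = 0 \qquad \sum_{i=1}^{n} y_i = 0 \qquad \sum_{i=1}^{n} x_i^2 = B \qquad \sum_{i=1}^{n} y_i^2 = E \qquad \sum_{i=1}^{n} x_i y_i = F$$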
(Remember that $\hat\beta_2 = \dfrac{\sum_{i=1}^{n} y_i x_i - \bar y \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar x \sum_{i=1}^{n} x_i}$)
$R^2 = 0.996; \qquad \sum \hat u_t^2 = 0.196$
(The numbers in parentheses are standard errors of the estimators.)
a) Test the null hypothesis that the coefficient of rpfood is less than 0.
b) Obtain a confidence interval of 95% for the marginal propensity to
consume food in relation to income.
$R^2 = 0.838 \qquad RSS = 8090 \qquad (1)$
$$\widehat{ac}_i = \underset{(29.44)}{310.07} - \underset{(33.81)}{85.39}\; qty_i + \underset{(11.61)}{26.73}\; qty_i^2 - \underset{(1.22)}{1.40}\; qty_i^3$$
$R^2 = 0.978 \qquad RSS = 1097 \qquad (2)$
where ac is the average cost and qty is the quantity produced.
(The numbers in parentheses are standard errors of estimators.)
a) Test whether the quadratic and cubic terms of the quantity produced are
significant in determining the average cost.
b) Test the overall significance in the model 2.
Exercise 4.7 Using a sample of 35 observations, the following models have been
estimated to explain expenditures on coffee:
$$\widehat{\ln(coffee)} = 21.32 + 0.11 \ln(inc) - 1.33 \ln(cprice) + 1.35 \ln(tprice) \tag{1}$$
(0.01) (0.23)
$R^2 = 0.905 \qquad RSS = 254$
$$\widehat{\ln(coffee)} = 19.9 + 0.14 \ln(inc) - 1.42 \ln(cprice) \tag{2}$$
(0.02) (0.21)
$RSS = 529$
where inc is disposable income, cprice is coffee price and tprice is tea price.
(The numbers in parentheses are standard errors of estimators.)
$R^2 = 0.339 \qquad n = 30$
d) If you omit the variable poverty in the first model, the following results are
obtained:
$$\widehat{airqual}_i = \underset{(10.02)}{82.98} + \underset{(0.031)}{0.0523}\; popln_i - \underset{(0.0055)}{0.0097}\; medincm_i$$
$R^2 = 0.218 \qquad n = 30$
Are the slope coefficients individually significant at 10% in the new model?
Do you consider these results to be reasonable in comparison with those
obtained in part b)?
Comparing the R2 of the two estimated models, what is the role played by
poverty in determining air quality?
e) If you regress airqual using as regressors only the intercept and poverty,
you will obtain that R2=0.037. Do you consider this value to be reasonable
taking into account the results obtained in part d)?
$$\hat y_i = 12.7 + 14.2\, x_{1i} + 2.1\, x_{2i}$$
$$\hat\sigma^2 [X'X]^{-1} = \begin{bmatrix} 4.1 & -0.95 & -0.266 \\ -0.95 & 3.8 & 0.5 \\ -0.266 & 0.5 & 1.9 \end{bmatrix}$$
a) Test the null hypothesis $\alpha_0 = \alpha_1$.
b) Test whether $\alpha_1 / \alpha_2 = 7$.
c) Are the coefficients $\alpha_0$, $\alpha_1$ and $\alpha_2$ individually significant?
Exercise 4.13 Using a sample of 30 companies, the following cost functions have been
estimated:
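a) $\widehat{cost}_i = \underset{(11.97)}{172.46} + \underset{(3.70)}{35.72}\; x_i \qquad R^2 = 0.838 \quad \bar R^2 = 0.829 \quad RSS = 8090$
b) $\widehat{cost}_i = \underset{(29.44)}{310.07} - \underset{(33.81)}{85.39}\; x_i + \underset{(11.61)}{26.73}\; x_i^2 - \underset{(1.22)}{1.40}\; x_i^3 \qquad R^2 = 0.978 \quad \bar R^2 = 0.974 \quad RSS = 1097$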
where cost is the average cost and x is the quantity produced.
(The numbers in parentheses are standard errors of estimators.)
a) Which of the two models would you choose? What would be the criteria?
b) Test whether the quadratic and cubic terms of the quantity produced are
significant in determining the average cost.
c) Test the overall significance of the model b).
Exercise 4.14 A researcher formulates the following model:
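$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
Using a sample of 13 observations the following results are obtained:
$$\hat y_i = 1.00 - 1.82\, x_{2i} + 0.36\, x_{3i} \tag{1}$$
$R^2 = 0.50 \qquad n = 13$
$$var(\hat\beta) = \begin{bmatrix} 0.25 & -0.01 & 0.04 \\ -0.01 & 0.16 & -0.15 \\ 0.04 & -0.15 & 0.81 \end{bmatrix}$$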
a) Test the null hypothesis that $\beta_2 = 0$ against the alternative hypothesis that
$\beta_2 < 0$.
b) Test the null hypothesis that $\beta_2 + \beta_3 = -1$ against the alternative
hypothesis that $\beta_2 + \beta_3 \neq -1$, with a significance level of 5%.
c) Is the whole model significant?
d) Assuming that the variables in the estimated model are measured in natural
logarithms, what is the interpretation of the coefficient for x3?
Exercise 4.15 With a sample of 50 automotive companies the following production
functions were estimated taking the gross value added of the automobile production (gva)
as the endogenous variable and labor input (labor) and capital input (capital) as
explanatory variables.
RSS=0.4277
$$\widehat{\ln(teceil_i \times pteceil)} = 0.74 + 0.26 \ln(inc_i) + 0.20 \ln(pcofbras_i)$$
(0.16) (0.15)
RSS=0.6788
(The numbers in parentheses are standard errors of the estimators.)
a) Test the significance of disposable income.
b) Test the hypothesis that $\beta_3 = -1$ and $\beta_4 = 0$, and explain the procedure
applied.
c) If instead of having information on RSS, only R2 was known for each model,
how would you proceed to test the hypothesis of section b)?
Exercise 4.18 The following fitted models are obtained to explain the deaths of children
under 5 years per 1000 live births (deathu5) using a sample of 64 countries.
1) $\widehat{deathun5}_i = 263.64 - 0.0056\; inc_i + 2.23\; fertrate_i$; $\quad R^2 = 0.7077$
(0.0019) (0.21)
2) $\widehat{deathun5}_i = 168.31 - 0.0055\; inc_i + 1.76\; femilrat_i + 12.87\; fertrate_i$, $\quad R^2 = 0.7474$
(0.0018) (0.25)
where inc is income per capita, femiltrat is the female illiteracy rate, and fertrate is the
fertility rate.
(The numbers in parentheses are standard errors of the estimators.)
a) Test the joint significance of income, illiteracy and fertility rates.
b) Test the significance of the fertility rate.
c) Which of the two models would you choose? Explain your answer.
Exercise 4.19 Using a sample of 32 annual observations, the following estimations were
obtained to explain the car sales (car) of a particular brand:
$$\widehat{car}_i = \underset{(6.48)}{104.8} - \underset{(3.19)}{6.64}\; pcar_i + \underset{(0.16)}{2.98}\; adv_i$$
$\bar R^2 = 0.30$
where the values between parentheses are standard deviations and the coefficient of
determination is the adjusted one.
a) Is the coefficient of the variable x2 significant?
b) Is the coefficient of the variable x3 significant?
c) Discuss the joint significance of the model.
Exercise 4.22 Consider the following econometric specification:
$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$$
With a sample of 26 observations, the following estimations were obtained:
1) $\hat y_i = 2 + 3.5\, x_{1i} - 0.7\, x_{2i} - 2\, x_{3i} + u_i \qquad R^2 = 0.982$
(1.9) (2.2) (1.5)
2) $\hat y_i = 1.5 + 3\,(x_{1i} + x_{2i}) - 0.6\, x_{3i} + u_i \qquad R^2 = 0.876$
(2.7) (2.4)
$$F = \frac{(RSS_R - RSS_{UR}) / r}{RSS_{UR} / (n - k)} \qquad\qquad F = \frac{(R_{UR}^2 - R_R^2) / q}{(1 - R_{UR}^2) / (n - k)}$$
b) Test the null hypothesis β2= β3.
Exercise 4.23 In the estimation of the Brown model in exercise 3.19, using the workfile
consumsp, we obtained the following results:
$$\widehat{conspc}_t = \underset{(84.88)}{-7.156} + \underset{(0.0857)}{0.3965}\; incpc_t + \underset{(0.0903)}{0.5771}\; conspc_{t-1}$$
R2=0.703 n=39
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the above model are individually
significant at 1% and at 5%?
b) Test the overall significance of the model.
c) What is the predicted value of salMBAgr for a graduate student who paid
100000$ tuition fees in a two-year MBA master and previously had a
salMBApr equal to 70000$? How many years of work does the student
require to offset tuition expenses? To answer this question, suppose that
the discount rate equals the expected rate of salary increase and that the
student received no wage income during the two master courses.
d) If we added the regressor rank2010 (the rank of each business school in
2010), the following results were obtained:
$$\widehat{salMBAgr}_i = \underset{(8520)}{61320} + \underset{(0.0626)}{0.1229}\; tuition_i + \underset{(0.1055)}{0.4662}\; salMBApr_i$$
R2=0.755 n=39
Which of the regressors included in this model are individually significant
at 5%?
What is the interpretation of the coefficient on rank2010?
e) The variable rank2010 is based on three components: gradpoll is a rank
based on surveys of MBA grads and contributes 45 percent to final
ranking; corppoll is a rank based on surveys of MBA recruiters and
contributes 45 percent to final ranking; and intellec is a rank based on a
review of faculty research published over a five-year period in 20 top
academic journals and faculty books reviewed in The New York Times, The
Wall Street Journal, and Bloomberg Businessweek over the same period;
this last rank contributes 10 percent to the final ranking. In the following
estimated model rank2010 has been substituted for its three components:
$$\widehat{salMBAgr}_i = \underset{(10700)}{79904} + \underset{(0.0696)}{0.0305}\; tuition_i + \underset{(0.107)}{0.3751}\; salMBApr_i$$
R2=0.337 n=800
(The numbers in parentheses are standard errors of the estimators.)
a) Test the overall significance of the model.
b) Is tenure statistically significant at 10%? Is age positively significant at
10%?
c) Is it admissible that the coefficient of educ is equal to that of tenure? Is it
admissible that the coefficient of educ is three times that of tenure? To answer
these questions you have the following additional information:
$$\widehat{\ln(wage_i)} = \underset{(0.073)}{1.565} + \underset{(0.0042)}{0.0271}\; educ_i + \underset{(0.0019)}{0.0177}\,(educ_i + tenure_i) + \underset{(0.0016)}{0.0065}\; age_i$$
R2=0.486 n=546
a) Test the overall significance of this model.
b) Test the null hypothesis that an additional bathroom has the same influence
on housing prices as four additional bedrooms. Alternatively, test that
an additional bathroom has more influence on housing prices than four
additional bedrooms. (Additional information: $var(\hat\beta_2) = 1455813$;
$var(\hat\beta_3) = 3186523$; and $cov(\hat\beta_2, \hat\beta_3) = -764846$).
c) If we add the regressor stories (number of stories excluding the basement)
to the model, the following results have been obtained:
$$\widehat{price}_i = \underset{(3603)}{-4010} + \underset{(1215)}{2825}\; bedrooms_i + \underset{(1734)}{17105}\; bathrms_i$$
R2=0.232 n=447
(The numbers in parentheses are standard errors of the estimators.)
a) Does roa have a significant effect on salary? Does roa have a significant
positive effect on salary? Carry out both tests at the 10% and 5%
significance level.
b) If roa increases by 20 points, by what percentage is salary predicted to
increase?
c) Test the null hypothesis that the elasticity salary/sales is equal to 0.4.
d) If we add the regressor age, the following results are obtained:
$$\widehat{\ln(salary_i)} = \underset{(0.442)}{4.159} + \underset{(0.0033)}{0.0055}\; roa_i + \underset{(0.0423)}{0.2903}\,\ln(sales_i) + \underset{(0.0000220)}{0.0000539}\; profits_i$$
R2=0.240 n=447
Are the estimated coefficients very different from the estimates in the
reference model? What about the coefficient on tenure? Explain it.
e) Does age have a significant effect on the salary of a CEO?
f) Is it admissible that the coefficient of age is equal to the coefficient of
tenure? (Additional information: $var(\hat\beta_5) = 1.24\text{E-}05$; $var(\hat\beta_6) = 1.82\text{E-}05$;
and $cov(\hat\beta_5, \hat\beta_6) = -6.09\text{E-}06$).
Exercise 4.28 (Continuation of exercise 3.15). Let us take the population model of this
exercise as the reference model. Using workfile rdspain, the estimated model was the
following:
$$\widehat{rdintens}_i = \underset{(0.428)}{-1.8168} + \underset{(0.0278)}{0.1482}\,\ln(sales_i) + \underset{(0.0021)}{0.0110}\; exponsal_i$$
$R^2 = 0.048 \qquad n = 1983$
(The numbers in parentheses are standard errors of the estimators.)
a) Is the sales variable individually significant at 1%?
b) Test the null hypothesis that the coefficient on sales is equal to 0.2.
c) Test the overall significance of the reference model.
d) If we add the regressor ln(workers), the following results are obtained:
$$\widehat{\mathit{rdintens}} = \underset{(0.750)}{0.480} - \underset{(0.0687)}{0.08585}\,\ln(\mathit{sales}) + \underset{(0.0021)}{0.01049}\,\mathit{exponsal} + \underset{(0.09198)}{0.3422}\,\ln(\mathit{workers})$$
R2=0.055 n=1983
Is sales individually significant at 1% in the new estimated model?
e) Test the null hypothesis that the coefficient on ln(workers) is greater than
0.5.
Exercise 4.29 (Continuation of exercise 3.16). Let us take the population model of this
exercise as the reference model. Using workfile hedcarsp, the corresponding fitted model
is the following:
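$$\widehat{\ln(\mathit{price}_i)} = \underset{(0.154)}{14.42} + \underset{(0.0000438)}{0.000581}\,\mathit{cid}_i + \underset{(0.0079)}{0.003823}\,\mathit{hpweight}_i - \underset{(0.0122)}{0.07854}\,\mathit{fueleff}_i$$
R2=0.830 n=214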
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the reference model are individually
significant at 1%?
b) Add the variable volume to the reference model. Does volume have a
statistically significant effect on ln(price)? Does volume have a statistically
significant positive effect on ln(price)?
c) Is it admissible that the coefficient of volume estimated in part b) is equal
in magnitude but opposite in sign to the coefficient of fueleff?
d) Add the variables length, width and height to the model estimated in part
b). Taking into account that volume=length×width×height, is there perfect
multicollinearity in the new model? Why or why not? Estimate the new
model if it is possible.
e) Add the variable ln(volume) to the reference model. Test the null
hypothesis that the price/volume elasticity is equal to 1.
f) What happens if you add the regressors ln(length), ln(width) and ln(height)
to the model estimated in part e)?
Exercise 4.30 (Continuation of exercise 3.17). Let us take the population model of this
exercise as the reference model. Using workfile timuse03, the corresponding fitted model
is the following:
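$$\widehat{\mathit{houswork}}_i = \underset{(23.27)}{141.9} + \underset{(1.621)}{3.850}\,\mathit{educ}_i - \underset{(0.00539)}{0.00917}\,\mathit{hhinc}_i + \underset{(0.311)}{1.767}\,\mathit{age}_i - \underset{(0.0229)}{0.2289}\,\mathit{paidwork}_i$$
R2=0.1440 n=1000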
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the reference model are individually
significant at 5% and at 1%?
b) Estimate a model in which you could test directly whether one additional
year of education has the same effect on time devoted to house work as
two additional years of age. What is your conclusion?
c) Test the joint significance of educ and hhinc.
R2=0.642 n=144
(The numbers in parentheses are standard errors of the estimators.)
a) Which of the regressors included in the reference model are individually
significant at 1%?
b) Run a regression by adding the variables popnosan (population in
percentage without access to improved sanitation services) and gnirank
(rank in gni) to the reference model. Which of the regressors included in
the new model are individually significant at 1%? Interpret the coefficients
on popnosan and gnirank.
c) Are popnosan and gnirank jointly significant?
d) Test the overall significance of the model formulated in b).
Exercise 4.32 Using a sample of 42 observations, the following model has been estimated:
$$\hat{y}_t = -670.591 + 1.008\,x_t$$
For observation 43, it is known that the value of x is 1571.9.
a) Calculate the point predictor for observation 43.
b) Knowing that the variance of the prediction error $\hat{e}_2^{43} = y_{43} - \hat{y}_{43}$ is equal to
(24.9048)², calculate a 90% probability interval for the individual value.
Exercise 4.33 Besides the estimation presented in exercise 4.23, the following estimation
on the Brown consumption function is also available:
$$\widehat{\mathit{conspc}}_t = \underset{(64.35)}{12729} + \underset{(0.0857)}{0.3965}\,(\mathit{incpc}_t - 13500) + \underset{(0.0903)}{0.5771}\,(\mathit{conspc}_{t-1} - 12793.6)$$
R2=0.997 RSS=1891320 n=56
(The numbers in parentheses are standard errors of the estimators.)
a) Obtain the point predictor for consumption per capita in 2011, knowing
that incpc2011=13500 and conspc2010=12793.6.
b) Obtain a 95% confidence interval for the expected value of consumption
per capita in 2011.
c) Obtain a 95% prediction interval for the individual value of consumption
per capita in 2011.
Exercise 4.34 (Continuation of exercise 4.30) Answer the following questions:
a) Using the first estimation in exercise 4.30, obtain a prediction for
houswork (minutes devoted to house-work per day), when you plug in the
5 MULTIPLE REGRESSION ANALYSIS WITH
QUALITATIVE INFORMATION
$$\mathit{wage} = \beta_1 + \beta_2\,\mathit{educ} + u \tag{5-1}$$
To measure gender wage discrimination, we introduce a dummy variable for
gender as an independent variable in the model defined above,
$$\mathit{wage} = \beta_1 + \delta_1\,\mathit{female} + \beta_2\,\mathit{educ} + u \tag{5-2}$$
The attribute gender has two categories: male and female. The female category
has been included in the model, while the male category, which was omitted, is the
reference category. Model 1 is shown in Figure 5.1, taking δ1<0. The interpretation of δ1
is the following: δ1 is the difference in hourly wage between females and males, given
the same amount of education (and the same error disturbance u). Thus, the coefficient δ1
determines whether there is discrimination against women or not. If δ1<0 then, for the
same level of other factors (education, in this case), women earn less than men on average.
Assuming that the disturbance mean is zero, if we take expectation for both categories we
obtain:
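$$\begin{aligned}
\mu_{\mathit{wage}|\mathit{female}} &= E(\mathit{wage} \mid \mathit{female}=1, \mathit{educ}) = \beta_1 + \delta_1 + \beta_2\,\mathit{educ}\\
\mu_{\mathit{wage}|\mathit{male}} &= E(\mathit{wage} \mid \mathit{female}=0, \mathit{educ}) = \beta_1 + \beta_2\,\mathit{educ}
\end{aligned} \tag{5-3}$$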
As can be seen in (5-3), the intercept is β1 for males, and β1+δ1 for females.
Graphically, as can be seen in Figure 5.1, there is a shift of the intercept, but the lines for
men and women are parallel.
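[Figure 5.1: wage (vertical axis) against educ (horizontal axis); two parallel lines with slope β2, with intercept β1 for men and β1 + δ1 for women.]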
Nothing has changed with the new equation, except the interpretation of α1 and
γ1: α1 is the intercept for women, which is now the reference category, and α1+γ1 is the
intercept for men. This implies the following relationship between the coefficients:
α1=β1+δ1 and α1+γ1=β1⇒ γ1=−δ1
In any application, it does not matter how we choose the reference category, since
this only affects the interpretation of the coefficients associated with the dummy variables,
but it is important to keep track of which category is the reference category. Choosing a
reference category is usually a matter of convenience. It would also be possible to drop
the intercept and to include a dummy variable for each category. The equation would then
be
$$\mathit{wage} = \mu_1\,\mathit{male} + \nu_1\,\mathit{female} + \beta_2\,\mathit{educ} + u \tag{5-5}$$
where the intercept is µ1 for men and ν1 for women.
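The equivalence of the three parameterizations can be checked numerically. The following is a minimal sketch, assuming purely hypothetical values for β1, δ1 and β2 (they are not estimates from any file used in this book); the function names are also illustrative.

```python
# Hypothetical parameter values for model (5-2), where male is the reference category.
beta1, delta1, beta2 = 5.0, -1.2, 0.6

# Female as the reference category: alpha1 is the intercept for women,
# gamma1 the shift for men.
alpha1 = beta1 + delta1
gamma1 = -delta1

# No overall intercept and one dummy per category, as in (5-5).
mu1, nu1 = beta1, beta1 + delta1


def wage_male_ref(female, educ):
    """Systematic part of (5-2): male is the reference category."""
    return beta1 + delta1 * female + beta2 * educ


def wage_female_ref(female, educ):
    """Same model with female as the reference category."""
    male = 1 - female
    return alpha1 + gamma1 * male + beta2 * educ


def wage_no_intercept(female, educ):
    """Systematic part of (5-5): no intercept, one dummy per category."""
    male = 1 - female
    return mu1 * male + nu1 * female + beta2 * educ


for female in (0, 1):
    for educ in (8, 12, 16):
        print(female, educ,
              wage_male_ref(female, educ),
              wage_female_ref(female, educ),
              wage_no_intercept(female, educ))
```

Each row prints the same value three times, up to floating-point rounding: the choice of reference category changes the interpretation of the coefficients, not the fitted wage.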
Hypothesis testing is performed as usual. In model (5-2), the null hypothesis of no
difference between men and women is H0: δ1 = 0, while the alternative hypothesis that
there is discrimination against women is H1: δ1 < 0. Therefore, in this case, we must
apply a one-sided (left-tail) t test.
A common specification in applied work has the dependent variable as the
logarithm transformation ln(y) in models of this type. For example:
$$\ln(\mathit{wage}) = \beta_1 + \delta_1\,\mathit{female} + \beta_2\,\mathit{educ} + u \tag{5-6}$$
Let us see the interpretation of the coefficient of the dummy variable in a log
model. In model (5-6), taking u=0, the wage for a female and for a male is as follows:
$$\ln(\mathit{wage}_F) = \beta_1 + \delta_1 + \beta_2\,\mathit{educ} \tag{5-7}$$
$$\ln(\mathit{wage}_M) = \beta_1 + \beta_2\,\mathit{educ} \tag{5-8}$$
Given the same amount of education, if we subtract (5-8) from (5-7), we have
$$\ln(\mathit{wage}_F) - \ln(\mathit{wage}_M) = \delta_1 \tag{5-9}$$
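Taking antilogs in (5-9) and subtracting 1 from both sides of (5-9), we
get
$$\frac{\mathit{wage}_F}{\mathit{wage}_M} - 1 = e^{\delta_1} - 1 \tag{5-10}$$
That is to say
$$\frac{\mathit{wage}_F - \mathit{wage}_M}{\mathit{wage}_M} = e^{\delta_1} - 1 \tag{5-11}$$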
According to (5-11), the proportional difference between the female wage and the
male wage, for the same amount of education, is equal to $e^{\delta_1} - 1$. Therefore, the exact
percentage difference in hourly wage between women and men is $100 \times (e^{\delta_1} - 1)$. As an
approximation to this difference, 100×δ1 can be used. However, if the magnitude of the
percentage is large, this approximation is not very accurate.
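To see how quickly the approximation deteriorates, the exact and approximate percentages can be compared for a few values of δ1. This is a small sketch; the values of δ1 are illustrative assumptions, and the function names are hypothetical.

```python
import math


def exact_pct_difference(delta):
    """Exact percentage difference implied by a dummy coefficient in a log model."""
    return 100.0 * (math.exp(delta) - 1.0)


def approx_pct_difference(delta):
    """Usual approximation: 100 times the coefficient."""
    return 100.0 * delta


# Illustrative coefficient values (assumed, not estimated).
for delta in (0.05, -0.25, 0.69):
    exact = exact_pct_difference(delta)
    approx = approx_pct_difference(delta)
    print(f"delta={delta:+.2f}  exact={exact:7.2f}%  approx={approx:7.2f}%  gap={exact - approx:6.2f}")
```

For a coefficient near 0.05 the two figures nearly coincide, whereas for a coefficient near 0.7 the gap is of the order of 30 percentage points.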
RSS=35.672 R2=0.893 n=92
The marketcap/bookvalue elasticity is equal to 0.675; that is to say, if the book value increases by
1%, then the market capitalization of the quoted stocks will increase by 0.675%.
Testing whether the stocks included in ibex35 have on average a higher capitalization implies
testing H0: δ1 = 0 against H1: δ1 > 0. Given that the t statistic is (0.690/0.179)=3.85, we reject the null
hypothesis for the usual levels of significance. On the other hand, we see that the stocks included in ibex35
are quoted 99.4% higher than the stocks not included. The percentage is obtained as follows:
$100 \times (e^{0.690} - 1) = 99.4\%$.
In the case of β2, we can test H0: β2 = 0 against H1: β2 ≠ 0. Given that the t statistic is
(0.675/0.037)=18, we reject the null hypothesis for the usual levels of significance.
EXAMPLE 5.3 Do people living in urban areas spend more on fish than people living in rural areas?
To see whether people living in urban areas spend more on fish than people living in rural areas,
the following model is proposed:
$$\ln(\mathit{fish}) = \beta_1 + \delta_1\,\mathit{urban} + \beta_2 \ln(\mathit{inc}) + u \tag{5-13}$$
where fish is expenditure on fish, urban is a dummy variable which takes the value 1 if the person lives in
an urban area and inc is disposable income.
Using a sample of size 40 (file demand), model (5-13) was estimated:
$$\widehat{\ln(\mathit{fish})} = \underset{(0.511)}{-6.375} + \underset{(0.055)}{0.140}\,\mathit{urban} + \underset{(0.070)}{1.313}\,\ln(\mathit{inc})$$
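$$X = \begin{bmatrix}
1 & 1 & 0 & 0 & \mathit{educ}_1\\
1 & 1 & 0 & 0 & \mathit{educ}_2\\
1 & 0 & 1 & 0 & \mathit{educ}_3\\
1 & 0 & 1 & 0 & \mathit{educ}_4\\
1 & 0 & 0 & 1 & \mathit{educ}_5\\
1 & 0 & 0 & 1 & \mathit{educ}_6
\end{bmatrix}$$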
As can be seen in matrix X, column 1 of this matrix is equal to the sum of columns
2, 3 and 4. Therefore, there is perfect multicollinearity due to the so-called dummy
variable trap. Generalizing, if an attribute has g categories, we need to include only g−1
dummy variables in the model along with the intercept. The intercept for the reference
category is the overall intercept in the model, and the dummy variable coefficient for a
particular group represents the estimated difference in intercepts between that category
and the reference category. If we include g dummy variables along with an intercept, we
will fall into the dummy variable trap. An alternative is to include g dummy variables and
to exclude an overall intercept. In the case we are examining, the model would be the
following:
wage = θ 0 small + θ1medium + θ 2large + β 2 educ + u (5-16)
This solution is not advisable for two reasons. First, with this configuration of the model
it is more difficult to test differences with respect to a reference category. Second, this
solution only works in the case of a model with a single attribute.
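The linear dependence behind the dummy variable trap can be verified directly on a matrix with the same layout as X. The sketch below assumes hypothetical values for educ and only checks this particular dependence (intercept column equal to the sum of the dummy columns); it is not a general rank test.

```python
# Hypothetical years of education for the six observations (assumed values).
educ = [8, 12, 10, 16, 9, 14]

# Columns: intercept, small, medium, large (same pattern as matrix X in the text).
dummies = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (1, 0, 1, 0),
    (1, 0, 0, 1),
    (1, 0, 0, 1),
]
X = [row + (e,) for row, e in zip(dummies, educ)]


def intercept_equals_dummy_sum(matrix, dummy_cols):
    """True if, in every row, column 0 equals the sum of the given dummy columns."""
    return all(row[0] == sum(row[j] for j in dummy_cols) for row in matrix)


# With g = 3 dummies plus the intercept, the dependence holds exactly.
print(intercept_equals_dummy_sum(X, [1, 2, 3]))

# Keeping only g - 1 = 2 dummies (small is the reference category) removes it.
print(intercept_equals_dummy_sum(X, [2, 3]))
```

The first call returns True (perfect multicollinearity) and the second returns False, which is why only g−1 dummies are included along with the intercept.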
EXAMPLE 5.4 Does firm size influence wage determination?
Using the sample of example 5.1 (file wage02sp), model (5-14), taking log for wage, was
estimated:
$$\widehat{\ln(\mathit{wage})} = \underset{(0.027)}{1.566} + \underset{(0.025)}{0.281}\,\mathit{medium} + \underset{(0.024)}{0.162}\,\mathit{large} + \underset{(0.0025)}{0.0480}\,\mathit{educ}$$
RSS=406 R2=0.218 n=2000
To answer the question above, we will not perform an individual test on θ1 or θ2. Instead we must
jointly test whether the size of firms has a significant influence on wage. That is to say, we must test whether
medium and large firms together have a significant influence on the determination of wage. In this case,
the null and the alternative hypothesis, taking (5-14) as the unrestricted model, will be the following:
$$H_0: \theta_1 = \theta_2 = 0$$
$$H_1: H_0 \text{ is not true}$$
The restricted model in this case is the following:
$$\ln(\mathit{wage}) = \beta_1 + \beta_2\,\mathit{educ} + u \tag{5-17}$$
The estimation of this model is the following:
$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{1.657} + \underset{(0.0026)}{0.0525}\,\mathit{educ}$$
RSS=433 R2=0.166 n=2000
Therefore, the F statistic is
$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[433 - 406]/2}{406/(2000-4)} = 66.4$$
So, according to the value of the F statistic, we can conclude that the size of the firm has a
significant influence on wage determination for the usual levels of significance.
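The arithmetic of this F statistic is easy to reproduce. The following sketch uses the RSS values reported in this example; the function name `f_stat_rss` is a hypothetical helper, not part of any econometrics package.

```python
def f_stat_rss(rss_r, rss_ur, q, n, k):
    """F statistic for q linear restrictions from restricted and unrestricted RSS.

    rss_r  : residual sum of squares of the restricted model
    rss_ur : residual sum of squares of the unrestricted model
    q      : number of restrictions
    n      : sample size
    k      : number of parameters in the unrestricted model
    """
    return ((rss_r - rss_ur) / q) / (rss_ur / (n - k))


# Values reported in example 5.4: RSS_R = 433, RSS_UR = 406, q = 2, n = 2000, k = 4.
f_value = f_stat_rss(433, 406, q=2, n=2000, k=4)
print(round(f_value, 1))
```

The same helper applies to any test in this chapter that is stated in terms of the restricted and unrestricted RSS; the critical value still has to be looked up in the F table with q and n−k degrees of freedom.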
EXAMPLE 5.5 In the case of Lydia E. Pinkham, are the time dummy variables introduced significant
individually or jointly?
In example 3.4, we considered the case of Lydia E. Pinkham in which sales of a herbal extract
from this company (expressed in thousands of dollars) were explained in terms of advertising expenditures
in thousands of dollars (advexp) and last year's sales ($\mathit{sales}_{t-1}$). However, in addition to these two variables,
the author included three time dummy variables: d1, d2 and d3. These dummy variables encompass the
various situations which took place in the company. Thus, d1 takes 1 in the period 1907-1914 and 0 in the
remaining periods, d2 takes 1 in the period 1915-1925 and 0 in other periods, and finally, d3 takes 1 in the
period 1926 - 1940 and 0 in the remaining periods. Thus, the reference category is the period 1941-1960.
The final formulation of the model was therefore the following:
$$\mathit{sales}_t = \beta_1 + \beta_2\,\mathit{advexp}_t + \beta_3\,\mathit{sales}_{t-1} + \beta_4\,d1_t + \beta_5\,d2_t + \beta_6\,d3_t + u_t \tag{5-18}$$
The results obtained in the regression, using file pinkham, were the following:
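$$\widehat{\mathit{sales}}_t = \underset{(96.3)}{254.6} + \underset{(0.136)}{0.5345}\,\mathit{advexp}_t + \underset{(0.0814)}{0.6073}\,\mathit{sales}_{t-1} - \underset{(89)}{133.35}\,d1_t + \underset{(67)}{216.84}\,d2_t - \underset{(67)}{202.50}\,d3_t$$
R2=0.929 n=53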
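To test whether the dummy variables individually have a significant effect on sales, the null and
alternative hypotheses, writing θ1, θ2 and θ3 for the coefficients on d1, d2 and d3 (β4, β5 and β6 in (5-18)), are:
$$\begin{cases} H_0: \theta_i = 0\\ H_1: \theta_i \neq 0 \end{cases} \qquad i = 1, 2, 3$$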
The corresponding t statistics are the following:
$$t_{\hat\theta_1} = \frac{-133.35}{89} = -1.50 \qquad t_{\hat\theta_2} = \frac{216.84}{67} = 3.22 \qquad t_{\hat\theta_3} = \frac{-202.50}{67} = -3.02$$
As can be seen, the regressor d1 is not significant for any of the usual levels of significance,
whereas on the contrary the regressors d2 and d3 are significant for any of the usual levels.
The interpretation of the coefficient of the regressor d2, for example, is as follows: holding fixed
the advertising spending and given the previous year's sales, sales for one year of the period 1915-1925 are
216.84 thousand dollars higher than for a year of the period 1941-1960.
To test jointly the effect of the time dummy variables, the null and alternative hypotheses are
$$\begin{cases} H_0: \theta_1 = \theta_2 = \theta_3 = 0\\ H_1: H_0 \text{ is not true} \end{cases}$$
and the corresponding test statistic is
$$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} = \frac{(0.9290 - 0.8770)/3}{(1 - 0.9290)/(53 - 6)} = 11.47$$
For any of the usual significance levels the null hypothesis is rejected. Therefore, the time dummy
variables have a significant effect on sales.
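The R2 form of the F statistic used here can be reproduced as follows. This is a minimal sketch with the values reported in this example; `f_stat_r2` is a hypothetical helper name, and the formula assumes that the restricted and unrestricted models share the same dependent variable.

```python
def f_stat_r2(r2_ur, r2_r, q, n, k):
    """F statistic for q restrictions computed from the two coefficients of determination.

    r2_ur : R-squared of the unrestricted model
    r2_r  : R-squared of the restricted model
    q     : number of restrictions
    n     : sample size
    k     : number of parameters in the unrestricted model
    """
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k))


# Values reported in example 5.5: R2_UR = 0.9290, R2_R = 0.8770, q = 3, n = 53, k = 6.
f_value = f_stat_r2(0.9290, 0.8770, q=3, n=53, k=6)
print(round(f_value, 2))
```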
Each of these two attributes has a reference category, which is the omitted
category. In this case, male is the reference category for gender and full-time for type of
contract. If we take expectations for the four categories involved, we obtain:
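$$\begin{aligned}
\mu_{\mathit{wage}|\mathit{female},\mathit{partime}} &= E[\mathit{wage} \mid \mathit{female}, \mathit{partime}, \mathit{educ}] = \beta_1 + \delta_1 + \varphi_1 + \beta_2\,\mathit{educ}\\
\mu_{\mathit{wage}|\mathit{female},\mathit{fulltime}} &= E[\mathit{wage} \mid \mathit{female}, \mathit{fulltime}, \mathit{educ}] = \beta_1 + \delta_1 + \beta_2\,\mathit{educ}\\
\mu_{\mathit{wage}|\mathit{male},\mathit{partime}} &= E[\mathit{wage} \mid \mathit{male}, \mathit{partime}, \mathit{educ}] = \beta_1 + \varphi_1 + \beta_2\,\mathit{educ}\\
\mu_{\mathit{wage}|\mathit{male},\mathit{fulltime}} &= E[\mathit{wage} \mid \mathit{male}, \mathit{fulltime}, \mathit{educ}] = \beta_1 + \beta_2\,\mathit{educ}
\end{aligned} \tag{5-20}$$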
The overall intercept in the equation reflects the effect of both reference categories,
male and full-time, and so full-time male is the reference category. From (5-20), you can
see the intercept for each combination of categories.
EXAMPLE 5.6 The influence of gender and length of the workday on wage determination
Model (5-19), taking log for wage, was estimated by using data from the wage structure survey
of Spain for 2006 (file wage06sp):
$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{2.006} - \underset{(0.021)}{0.233}\,\mathit{female} - \underset{(0.027)}{0.087}\,\mathit{partime} + \underset{(0.0023)}{0.0531}\,\mathit{educ}$$
RSS=161.95 R2=0.760 n=48
Next, we will look at whether bluecoll is significant. Testing H0: δ1 = 0 against H1: δ1 ≠ 0, the
t statistic is (0.968/0.669)=1.45. As $t_{40}^{0.10/2}$=1.68, we fail to reject the null hypothesis for α=0.10. And so
there is no empirical evidence to state that absenteeism amongst blue collar workers is different from that of white
collar workers. But if we test H0: δ1 = 0 against H1: δ1 > 0, as $t_{40}^{0.10}$=1.30 for α=0.10, then we reject the null
hypothesis in favor of the alternative: at this level there is some evidence that absenteeism amongst blue collar
workers is greater than amongst white collar workers.
On the contrary, in the case of the male dummy, testing H0: ϕ1 = 0 against H1: ϕ1 ≠ 0, given that
the t statistic is (2.049/0.712)=2.88 and $t_{40}^{0.01/2}$=2.70, we reject that absenteeism is equal in men and women
for the usual levels of significance.
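The contrast between the two-sided and the one-sided decision for bluecoll can be made explicit. The sketch below takes the estimates, standard errors and tabulated critical values quoted in this example as given; the helper names are hypothetical.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / std_error


def reject_two_sided(t_value, critical):
    """Reject H0 against a two-sided alternative if |t| exceeds the critical value."""
    return abs(t_value) > critical


def reject_right_tail(t_value, critical):
    """Reject H0 against H1: coefficient > null value if t exceeds the critical value."""
    return t_value > critical


# bluecoll: estimate 0.968, standard error 0.669 (values quoted in the text).
t_bluecoll = t_statistic(0.968, 0.669)
print(reject_two_sided(t_bluecoll, 1.68))   # two-sided test, alpha = 0.10
print(reject_right_tail(t_bluecoll, 1.30))  # right-tail test, alpha = 0.10

# male: estimate 2.049, standard error 0.712.
t_male = t_statistic(2.049, 0.712)
print(reject_two_sided(t_male, 2.70))       # two-sided test, alpha = 0.01
```

With the same t statistic, the two-sided test does not reject at 10% while the right-tail test does, because the one-sided critical value places the whole 10% in one tail.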
EXAMPLE 5.8 Size of firm and gender in determining wage
In order to know whether the size of the firm and gender jointly are two relevant factors in
determining wage, the following model is formulated:
$$\ln(\mathit{wage}) = \beta_1 + \delta_1\,\mathit{female} + \theta_1\,\mathit{medium} + \theta_2\,\mathit{large} + \beta_2\,\mathit{educ} + u \tag{5-22}$$
In this case, we must perform a joint test where the null and the alternative hypotheses are
$$H_0: \delta_1 = \theta_1 = \theta_2 = 0$$
$$H_1: H_0 \text{ is not true}$$
In this case, the restricted model is model (5-17) which was estimated in example 5.4 (file
wage02sp). The estimation of the unrestricted model is the following:
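$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{1.639} - \underset{(0.021)}{0.327}\,\mathit{female} + \underset{(0.023)}{0.308}\,\mathit{medium} + \underset{(0.023)}{0.168}\,\mathit{large} + \underset{(0.0024)}{0.0499}\,\mathit{educ}$$
RSS=361 R2=0.305 n=2000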
The F statistic is
$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[433 - 361]/3}{361/(2000-5)} = 133$$
Therefore, according to the value of F, we can conclude that the size of the firm and gender jointly
have a significant influence in wage determination.
RSS=363 R2=0.238 n=2000
To answer the question posed, we have to test H0: ϕ1 = 0 against H1: ϕ1 ≠ 0. Given that the t
statistic is (0.167/0.058)=2.89 and taking into account that $t_{60}^{0.01/2}$=2.66, we reject the null hypothesis in
favor of the alternative hypothesis. Therefore, there is empirical evidence that the interaction between
females and part-time work is statistically significant.
EXAMPLE 5.10 Do small firms discriminate against women more or less than larger firms?
To answer this question, we formulate the following model:
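$$\begin{aligned}
\ln(\mathit{wage}) = {} & \beta_1 + \delta_1\,\mathit{female} + \theta_1\,\mathit{medium} + \theta_2\,\mathit{large}\\
& + \varphi_1\,\mathit{female} \times \mathit{medium} + \varphi_2\,\mathit{female} \times \mathit{large} + \beta_2\,\mathit{educ} + u
\end{aligned} \tag{5-24}$$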
Using the sample of example 5.1 (file wage02sp), model (5-24) was estimated:
$$\widehat{\ln(\mathit{wage})} = \underset{(0.027)}{1.624} - \underset{(0.034)}{0.262}\,\mathit{female} + \underset{(0.028)}{0.361}\,\mathit{medium} + \underset{(0.027)}{0.179}\,\mathit{large}$$
RSS=359 R2=0.308 n=2000
If in (5-24) the parameters ϕ1 and ϕ2 are equal to 0, this will imply that in the equation for wage
determination, there will be no interaction between gender and firm size. Thus to answer the above
question, we take (5-24) as the unrestricted model. The null and the alternative hypothesis will be the
following:
$$H_0: \varphi_1 = \varphi_2 = 0$$
$$H_1: H_0 \text{ is not true}$$
In this case, the restricted model is therefore model (5-22) estimated in example 5.8. The F statistic
takes the value
$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[361 - 359]/2}{359/(2000-7)} = 5.55$$
For α=0.01, we find that $F_{2,1993}^{0.01} \simeq F_{2,60}^{0.01} = 4.98$. As F>4.98, we reject H0 in favor of H1. As H0 has
been rejected for α=0.01, it will also be rejected for levels of 5% and 10%. Therefore, for the usual levels of
significance, the interaction between gender and firm size is relevant for wage determination.
[Figure: lines plotted against educ with a common intercept β1 and slopes β2 and β2 + δ1.]
RSS=400 R2=0.229 n=2000
In this case, we need to test H0: δ1 = 0 against H1: δ1 < 0. Given that the t statistic is
(−0.0274/0.0021)=−12.81, we reject the null hypothesis in favor of the alternative hypothesis for any level of
significance. That is to say, there is empirical evidence that the return for an additional year of education is
greater for men than for women.
For women the intercept is β1 + δ1 , and the slope β 2 + δ 2 . For female=0, we obtain
equation (5-1). In this case, for men the intercept is β1 , and the slope β 2 . Therefore, δ1
measures the difference in intercepts between men and women and, δ2 measures the
difference in the return to education between males and females. Figure 5.3 shows a lower
intercept and a lower slope for women than for men. This means that women earn less
than men at all levels of education, and the gap increases as educ gets larger; that is to
say, an additional year of education shows a lower return for women than for men.
Estimating (5-27) is equivalent to estimating two wage equations separately, one
for men and another for women. The only difference is that (5-27) imposes the same
variance across the two groups, whereas separate regressions do not. This set-up is ideal,
as we will see later on, for testing the equality of slopes, equality of intercepts, and
equality of both intercepts and slopes across groups.
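The equivalence between the interaction model and two separate regressions can be illustrated numerically. The sketch below assumes small hypothetical samples for men and women; it fits each group by simple OLS, builds β1, β2, δ1 and δ2 from those fits, and then checks that the pooled residuals satisfy the four normal equations of the interaction model, which characterizes its OLS solution.

```python
def simple_ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data (assumed for illustration only).
educ_m, wage_m = [8, 10, 12, 14, 16], [9.0, 10.5, 12.5, 13.0, 16.0]
educ_f, wage_f = [8, 11, 12, 15, 16], [8.0, 9.0, 10.5, 11.0, 12.5]

# Separate regressions for men and women.
a_m, b_m = simple_ols(educ_m, wage_m)
a_f, b_f = simple_ols(educ_f, wage_f)

# Implied coefficients of the model with a dummy and an interaction.
beta1, beta2 = a_m, b_m
delta1, delta2 = a_f - a_m, b_f - b_m

# Pooled sample as (female, educ, wage) rows and the residuals of the interaction model.
rows = [(0, e, w) for e, w in zip(educ_m, wage_m)]
rows += [(1, e, w) for e, w in zip(educ_f, wage_f)]
resid = [w - (beta1 + delta1 * f + beta2 * e + delta2 * f * e) for f, e, w in rows]

# Normal equations: residuals orthogonal to 1, female, educ and female x educ.
moments = [
    sum(resid),
    sum(r * f for r, (f, e, w) in zip(resid, rows)),
    sum(r * e for r, (f, e, w) in zip(resid, rows)),
    sum(r * f * e for r, (f, e, w) in zip(resid, rows)),
]
print(moments)
```

All four sums are zero up to rounding error, so the coefficients built from the separate regressions are the OLS estimates of the pooled interaction model.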
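[Plot of wage (vertical axis) against educ (horizontal axis): intercept β1 and slope β2 for men; intercept β1 + δ1 and slope β2 + δ2 for women.]
FIGURE 5.3. Different slope, different intercept.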
EXAMPLE 5.12 Is the wage equation valid for both men and women?
If parameters δ1 and δ2 are equal to 0 in model (5-27), this will imply that the equation for wage
determination is the same for men and women. In order to answer the question posed, we take (5-27), as
the unrestricted model but express wage in logs. The null and the alternative hypothesis will be the
following:
$$H_0: \delta_1 = \delta_2 = 0$$
$$H_1: H_0 \text{ is not true}$$
Therefore, the restricted model is model (5-17). Using the same sample as in example 5.1 (file
wage02sp), we have obtained the following estimation of models (5-27) and (5-17):
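$$\widehat{\ln(\mathit{wage})} = \underset{(0.030)}{1.739} - \underset{(0.0546)}{0.3319}\,\mathit{female} + \underset{(0.0030)}{0.0539}\,\mathit{educ} - \underset{(0.0054)}{0.0027}\,\mathit{educ} \times \mathit{female}$$
RSS=393 R2=0.243 n=2000
$$\widehat{\ln(\mathit{wage})} = \underset{(0.026)}{1.657} + \underset{(0.0026)}{0.0525}\,\mathit{educ}$$
RSS=433 R2=0.166 n=2000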
The F statistic takes the value
$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[433 - 393]/2}{393/(2000-4)} = 102$$
It is clear that for any level of significance, the equations for men and women are different.
When we tested in example 5.1 whether there was discrimination in Spain against women
( H 0 : δ1 = 0 against H1 : δ1 < 0 ), it was assumed that the slope of educ (model (5-6)) is the same for men
and women. Now it is also possible to use model (5-27) to test the same null hypothesis, but assuming a
different slope. Given that the t statistic is (-0.3319/0.0546)=-6.06, we reject the null hypothesis by using
this more general model than the one in example 5.1.
In example 5.11 it was tested whether the coefficient δ2 in model (5-25), taking log for wage, was
0, assuming that the intercept is the same for males and females. Now, if we take (5-27), taking log for
wage, as the unrestricted model, we can test the same null hypothesis, but assuming that the intercept is
different for males and females. Given that the t statistic is (−0.0027/0.0054)=−0.49, we cannot reject the null
hypothesis which states that there is no interaction between gender and education.
EXAMPLE 5.13 Would urban consumers have the same pattern of behavior as rural consumers regarding
expenditure on fish?
To answer this question, we formulate the following model which is taken as the unrestricted
model:
RSS=1.123 R2=0.904 n=40
$$\widehat{\ln(\mathit{fish})} = \underset{(0.542)}{-6.224} + \underset{(0.075)}{1.302}\,\ln(\mathit{inc})$$
RSS=1.325 R2=0.887 n=40
The F statistic takes the value
$$F = \frac{[RSS_R - RSS_{UR}]/q}{RSS_{UR}/(n-k)} = \frac{[1.325 - 1.123]/2}{1.123/(40-4)} = 3.24$$
If we look up in the F table for 2 df in the numerator and 35 df in the denominator for α=0.10, we
find $F_{2,36}^{0.10} \simeq F_{2,35}^{0.10} = 2.46$. As F>2.46 we reject H0. However, as $F_{2,36}^{0.05} \simeq F_{2,35}^{0.05} = 3.27$, we fail to reject H0
in favour of H1 for α=0.05 and, therefore, for α=0.01. Conclusion: there is no strong evidence that families
living in urban areas have a different pattern of fish consumption than families living in rural areas.
EXAMPLE 5.14 Has the productive structure of Spanish regions changed?
The question to be answered is specifically the following: Did the productive structure of Spanish
regions change between 1995 and 2008? The problem posed is a problem of structural stability. To specify
the model to be taken as a reference in the test, let us define the dummy y2008, which takes the value 1 if
the year is 2008 and 0 if the year is 1995.
The reference model is a Cobb-Douglas model, which introduces additional parameters to collect
the structural changes that may have occurred. Its expression is:
$$\ln(q) = \gamma_1 + \alpha_1 \ln(k) + \beta_1 \ln(l) + \gamma_2\,y2008 + \alpha_2\,y2008 \times \ln(k) + \beta_2\,y2008 \times \ln(l) + u \tag{5-31}$$
It is easily seen, according to the definition of the dummy y2008, that the elasticities
production/capital are different in the periods 1995 and 2008. Specifically, they take the following values:
$$\varepsilon_{Q/K}(1995) = \frac{\partial \ln(Q)}{\partial \ln(K)} = \alpha_1 \qquad \varepsilon_{Q/K}(2008) = \frac{\partial \ln(Q)}{\partial \ln(K)} = \alpha_1 + \alpha_2$$
In the case that α2 is equal to 0, then the elasticity of production/capital is the same in both periods.
Similarly, the production/labor elasticities for the two periods are given by
$$\varepsilon_{Q/L}(1995) = \frac{\partial \ln(Q)}{\partial \ln(L)} = \beta_1 \qquad \varepsilon_{Q/L}(2008) = \frac{\partial \ln(Q)}{\partial \ln(L)} = \beta_1 + \beta_2$$
The intercept in the Cobb-Douglas is a parameter that measures efficiency. In model (5-31), the
possibility that the efficiency parameter (PEF) is different in the two periods is considered. Thus
$$PEF(1995) = \gamma_1 \qquad PEF(2008) = \gamma_1 + \gamma_2$$
If the parameters α2, β2 and γ2 are zero in model (5-31), the production function is the same in both
periods. Therefore, in testing structural stability of the production function, the null and alternative
hypotheses are:
$$\begin{aligned} &H_0: \gamma_2 = \alpha_2 = \beta_2 = 0\\ &H_1: H_0 \text{ is not true} \end{aligned} \tag{5-32}$$
Under the null hypothesis, the restrictions given in (5-32) lead to the following restricted model:
$$\ln(q) = \gamma_1 + \alpha_1 \ln(k) + \beta_1 \ln(l) + u \tag{5-33}$$
The file prodsp contains information for each of the Spanish regions in 1995 and 2008 on gross
value added in millions of euros (gva), occupation in thousands of jobs (labor), and productive capital in
millions of euros (captot). You can also find the dummy y2008 in that file.
The results of the unrestricted regression model (5-31) are shown below. It is evident that we
cannot reject the null hypothesis that each of the coefficients α2, β2 and γ2, taken individually, is 0, since
none of the t statistics reaches 0.1 in absolute value.
$$\widehat{\ln(\mathit{gva})} = \underset{(0.916)}{0.0559} + \underset{(0.185)}{0.6743}\,\ln(\mathit{captot}) + \underset{(0.185)}{0.3291}\,\ln(\mathit{labor})$$
R2=0.99394 n=34
The results of the restricted model (5-33) are the following:
$$\widehat{\ln(\mathit{gva})} = \underset{(0.200)}{-0.0690} + \underset{(0.036)}{0.6959}\,\ln(\mathit{captot}) + \underset{(0.042)}{0.311}\,\ln(\mathit{labor})$$
R2=0.99392 n=34
As can be seen, the R2 of the two models are virtually identical: they differ only in the
fifth decimal place. It is not surprising, therefore, that the F statistic for testing the null hypothesis (5-32) takes a
value close to 0:
$$F = \frac{(R_{UR}^2 - R_R^2)/q}{(1 - R_{UR}^2)/(n-k)} = \frac{(0.99394 - 0.99392)/3}{(1 - 0.99394)/(34 - 6)} = 0.0308$$
Thus, the null hypothesis of structural stability cannot be rejected at any of the usual significance levels: there is
no evidence of structural change in the productive economy of the Spanish regions between 1995 and 2008.
the pooled (P) regression. Thus, we will consider that the RSSR and RSSP are equivalent
expressions.
Therefore, the F statistic will be the following:
$$F = \frac{[RSS_P - (RSS_1 + RSS_2)]/k}{[RSS_1 + RSS_2]/[n - 2k]} \tag{5-35}$$
It is important to remark that, under the null hypothesis, the error variances for the
groups must be equal. Note that we have k restrictions: the slope coefficients (interactions)
plus the intercept. Note also that in the unrestricted model we estimate two different
intercepts and two different slope coefficients, and so the df of the model are n−2k.
One important limitation of the Chow test is that under the null hypothesis there
are no differences at all between the groups. In most cases, it is more interesting to allow
partial differences between both groups as we have done using dummy variables.
The Chow test can be generalized to more than two groups in a natural way. From
a practical point of view, to run separate regressions for each group to perform the test is
probably easier than using dummy variables.
In the case of three groups, the F statistic in the Chow test will be the following:
$$F = \frac{[RSS_P - (RSS_1 + RSS_2 + RSS_3)]/(2 \times k)}{(RSS_1 + RSS_2 + RSS_3)/(n - 3k)} \tag{5-36}$$
Note that, as a general rule, the number of the df of the numerator is equal to the
(number of groups-1) × k, while the number of the df of the denominator is equal to n
minus (number of groups) × k.
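The general rule for the degrees of freedom can be written as a single function of the pooled RSS and the list of group RSS values. The following is a sketch; `chow_f` is a hypothetical helper name, and the numbers in the usage lines are assumed values chosen only to illustrate the call.

```python
def chow_f(rss_pooled, rss_groups, n, k):
    """Chow F statistic for any number of groups.

    rss_pooled : RSS of the pooled (restricted) regression
    rss_groups : list with the RSS of the separate regression for each group
    n          : total number of observations
    k          : number of parameters in each group equation
    """
    g = len(rss_groups)               # number of groups
    rss_unrestricted = sum(rss_groups)
    num_df = (g - 1) * k              # (number of groups - 1) x k
    den_df = n - g * k                # n - (number of groups) x k
    return ((rss_pooled - rss_unrestricted) / num_df) / (rss_unrestricted / den_df)


# Illustrative (assumed) values: two groups, k = 2 parameters, n = 100 observations.
f_two_groups = chow_f(500.0, [200.0, 250.0], n=100, k=2)
print(round(f_two_groups, 4))
```

With two groups the function reduces to (5-35), and with three groups to (5-36).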
EXAMPLE 5.15 Another way to approach the question of wage determination by gender
Using the same sample as in example 5.1 (file wage02sp), we have obtained the estimation of the
equations in (5-34), taking log for wage, for men and women, which taken together gives the estimation of
the unrestricted model:
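Female equation
$$\widehat{\ln(\mathit{wage})} = \underset{(0.042)}{1.407} + \underset{(0.0041)}{0.0566}\,\mathit{educ}$$
RSS=104 R2=0.236 n=617
Male equation
$$\widehat{\ln(\mathit{wage})} = \underset{(0.031)}{1.739} + \underset{(0.0032)}{0.0539}\,\mathit{educ}$$
RSS=289 R2=0.175 n=1383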
The restricted model, estimated in example 5.4, has the same configuration as the equations in
(5-34) but in this case refers to the whole sample. Therefore, it is the pooled regression corresponding to
the restricted model. The F statistic takes the value
$$F = \frac{[RSS_P - (RSS_F + RSS_M)]/k}{(RSS_F + RSS_M)/(n - 2k)} = \frac{[433 - (104 + 289)]/2}{(104 + 289)/(2000 - 2 \times 2)} = 102$$
The F statistic must be, and is, the same as in example 5.12. The conclusions are therefore the
same.
EXAMPLE 5.16 Is the model of wage determination the same for different firm sizes?
In other examples the intercept, or the slope on education, was different for three different firm
sizes (small, medium and large). Now we shall consider a completely different equation for each firm size.
Therefore, the unrestricted model will be composed of three equations:
H1 : No H 0
Given this null hypothesis, the restricted model is model (5-2).
The estimations of the three equations of (5-37), by using file wage02sp, are the following:
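small
$$\widehat{\ln(\mathit{wage})} = \underset{(0.034)}{1.706} - \underset{(0.031)}{0.249}\,\mathit{female} + \underset{(0.0038)}{0.0396}\,\mathit{educ}$$
RSS=121 R2=0.160 n=801
medium
$$\widehat{\ln(\mathit{wage})} = \underset{(0.051)}{1.934} - \underset{(0.039)}{0.422}\,\mathit{female} + \underset{(0.0046)}{0.0548}\,\mathit{educ}$$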
$$F = \frac{[RSS_P - (RSS_S + RSS_M + RSS_L)]/(2 \times k)}{(RSS_S + RSS_M + RSS_L)/(n - 3k)} = \frac{[393 - (121 + 123 + 114)]/6}{(121 + 123 + 114)/(2000 - 3 \times 3)} = 32.4$$
For any level of significance, we reject that the equations for wage determination are the same for
different firm sizes.
EXAMPLE 5.17 Is the Pinkham model valid for the four periods?
In example 5.5, we introduced time dummy variables and we tested whether the intercept was
different for each period. Now, we are going to test whether the whole model is valid for the four periods
considered. Therefore, the unrestricted model will be composed of four equations:
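$$\begin{aligned}
\text{1907-1914:}\quad \mathit{sales}_t &= \beta_{11} + \beta_{21}\,\mathit{advexp}_t + \beta_{31}\,\mathit{sales}_{t-1} + u_t\\
\text{1915-1925:}\quad \mathit{sales}_t &= \beta_{12} + \beta_{22}\,\mathit{advexp}_t + \beta_{32}\,\mathit{sales}_{t-1} + u_t\\
\text{1926-1940:}\quad \mathit{sales}_t &= \beta_{13} + \beta_{23}\,\mathit{advexp}_t + \beta_{33}\,\mathit{sales}_{t-1} + u_t\\
\text{1941-1960:}\quad \mathit{sales}_t &= \beta_{14} + \beta_{24}\,\mathit{advexp}_t + \beta_{34}\,\mathit{sales}_{t-1} + u_t
\end{aligned} \tag{5-38}$$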
The null and the alternative hypothesis will be the following:
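$$H_0: \begin{cases} \beta_{11} = \beta_{12} = \beta_{13} = \beta_{14}\\ \beta_{21} = \beta_{22} = \beta_{23} = \beta_{24}\\ \beta_{31} = \beta_{32} = \beta_{33} = \beta_{34} \end{cases}$$
$$H_1: H_0 \text{ is not true}$$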
Given this null hypothesis, the restricted model is the following model:
$$\mathit{sales}_t = \beta_1 + \beta_2\,\mathit{advexp}_t + \beta_3\,\mathit{sales}_{t-1} + u_t \tag{5-39}$$
The estimations of the four equations of (5-38) are the following:
1907-1914: $\widehat{\mathit{sales}}_t = \underset{(603)}{64.84} + \underset{(1.025)}{0.9149}\,\mathit{advexp}_t + \underset{(0.425)}{0.4630}\,\mathit{sales}_{t-1}$, SSR = 36017, n = 7
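1915-1925: $\widehat{\mathit{sales}}_t = \underset{(190)}{221.5} + \underset{(0.557)}{0.1279}\,\mathit{advexp}_t + \underset{(0.300)}{0.9319}\,\mathit{sales}_{t-1}$, SSR = 400605, n = 11
1926-1940: $\widehat{\mathit{sales}}_t = \underset{(112)}{446.8} + \underset{(0.115)}{0.4638}\,\mathit{advexp}_t + \underset{(0.0827)}{0.4445}\,\mathit{sales}_{t-1}$, SSR = 201614, n = 15
1941-1960: $\widehat{\mathit{sales}}_t = \underset{(134)}{-182.4} + \underset{(0.241)}{1.6753}\,\mathit{advexp}_t + \underset{(0.111)}{0.3042}\,\mathit{sales}_{t-1}$, SSR = 187332, n = 20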
$$F = \frac{[SSR_P - (SSR_1 + SSR_2 + SSR_3 + SSR_4)]/(3 \times k)}{(SSR_1 + SSR_2 + SSR_3 + SSR_4)/(n - 4k)} = \frac{[2527215 - (36017 + 400605 + 201614 + 187332)]/9}{(36017 + 400605 + 201614 + 187332)/(53 - 4 \times 3)} = 9.16$$
For any level of significance, we reject that the model (5-39) is the same for the four periods
considered.
Exercises
Exercise 5.1 Answer the following questions for a model with explanatory dummy
variables:
a) What is the interpretation of the dummy coefficients?
b) Why is the number of dummy variables included in the model not equal to
the number of categories?
Exercise 5.2 Using a sample of 560 families, the following estimations of demand for
rental are obtained:
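$$\hat{q}_i = \underset{(0.11)}{4.17} - \underset{(0.017)}{0.247}\, p_i + \underset{(0.026)}{0.960}\, y_i$$
$R^2 = 0.371 \quad n = 560$
$$\hat{q}_i = \underset{(0.13)}{5.27} - \underset{(0.030)}{0.221}\, p_i + \underset{(0.031)}{0.920}\, y_i + \underset{(0.120)}{0.341}\, d_i y_i$$
$R^2 = 0.380$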
where qi is the log of expenditure on rental housing of the ith family, pi is the logarithm of
rent per m2 in the living area of the ith family, yi is the log of household disposable income
of the ith family and di is a dummy variable that takes value one if the family lives in an
urban area and zero in a rural area.
(The numbers in parentheses are standard errors of the estimators.)
a) Test the hypothesis that the elasticity of expenditure on rental housing with
respect to income is 1, in the first fitted model.
b) Test whether the interaction between the dummy variable and income is
significant. Is there a significant difference in the housing expenditure
elasticity between urban and rural areas? Justify your answer.
Exercise 5.3 In a linear regression model with dummy variables, answer the following
questions:
a) The meaning and interpretation of the coefficients of dummy variables in
models with endogenous variable in logs.
$$\sum_{t=1}^{n} (\hat{y}_t - \bar{y})^2 = 109.24 \qquad \sum_{t=1}^{n} \hat{u}_t^2 = 20.22$$
where q is output, k is capital, l is labor and f is a dummy variable that takes the value 1
for 1995 data and 0 for 2000.
a) Test whether there is structural change between 1995 and 2000.
b) Compare the results of estimations (3) and (4) with estimation (1).
c) Test the overall significance of model (1).
Exercise 5.10 With a sample of 300 service sector firms, the following cost function was
estimated:
$$\widehat{cost}_i = 0.847 + 0.899\, qty_i \qquad RSS = 901.074 \quad n = 300$$
(0.025)
RSS=1.1575 R2=0.9286
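$$\widehat{\ln(mag_i)} = \underset{(0.020)}{1.26} + \underset{(0.007)}{0.811}\, \ln(inc_i) + \underset{(0.0002)}{0.030}\, age_i + \underset{(0.003)}{0.003}\, male_i - \underset{(0.004)}{0.250}\, prim_i + \underset{(0.005)}{0.108}\, sec_i$$
$RSS = 0.0306 \quad R^2 = 0.9981$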
a) Is education a relevant factor to explain spending on magazines? What is
the reference category for education?
b) In the first model, is spending on magazines higher for men than for
women? Justify your answer.
c) Interpret the coefficient on the male variable in the second model. Is
spending on magazines higher for men than for women? Compare with the
result obtained in section a).
Exercise 5.12 Let fruit be the expenditure on fruit expressed in euros over a year carried
out by a household and let r1, r2, r3, and r4 be dichotomous variables which reflect the
four regions of a country.
a) If you regress fruit only on r1, r2, r3, and r4 without an intercept, what is
the interpretation of the coefficients?
b) If you regress fruit only on r1, r2, r3, and r4 with an intercept, what would
happen? Why?
c) If you regress fruit only on r2, r3, and r4 without an intercept, what is the
interpretation of the coefficients?
d) If you regress fruit only on r1- r2, r2, r4-r3, and r4 without an intercept, what
is the interpretation of the coefficients?
Exercise 5.13 Consider the following model
$$wage = \beta_1 + \delta_1\, female + \beta_2\, educ + u$$
Now, we are going to consider three possibilities of defining the female dummy
variable.
1) female = 1 for female, 0 for male
2) female = 2 for female, 1 for male
3) female = 2 for female, 0 for male
a) Interpret the dummy variable coefficient for each definition.
b) Is one dummy variable definition preferable to another? Justify the
answer.
Exercise 5.14 In the following regression model:
$$wage = \beta_1 + \delta_1\, female + u$$
where female is a dummy variable, taking value 1 for female and value 0 for a male.
Prove that applying the OLS formulas for simple regression you obtain that
$$\hat{\beta}_1 = \overline{wage}_M$$
$R^2 = 0.9412 \quad n = 18$
3)
$$- \underset{(0.546)}{0.1650}\, educ_i \times female_i + \underset{(0.112)}{0.1019}\, age_i \times female_i - \underset{(0.009)}{0.02625}\, paidwork_i \times female_i$$ (3)
$R^2 = 0.306 \quad n = 1000$
a) In model (1), is there a statistically significant tradeoff between time
devoted to paid work and time devoted to housework?
b) All other factors being equal and taking as a reference model (2), is there
evidence that women devote more time to housework than men?
c) Compare the R2 of models (1) and (2). What is your conclusion?
hightech×workers    0.00153 (0.000271)
medtech×workers    -0.000326 (0.000222)
b) In model (2), all other factors being equal, is there evidence that rdintens
in medium technology firms is equal to low technology firms? How strong
is the evidence?
c) Taking as reference model (2), if you had to test the hypothesis that
rdintens in high technology firms is equal to medium technology firms,
formulate a model that allows you to test this hypothesis without using
information on the covariance matrix of the estimators.
d) Is the influence of workers on rdintens associated with the level of
technology in the firms?
e) Is the model (1) valid for all firms regardless of their technological level?
Exercise 5.19 To explain the overall satisfaction of people (stsfglo), the following models
were estimated using data from the file hdr2010:
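$$\widehat{stsfglo}_i = \underset{(0.584)}{-0.375} + \underset{(0.00000617)}{0.0000207}\, gnipc_i + \underset{(0.009)}{0.0858}\, lifexpec_i$$ (1)
$R^2 = 0.642 \quad n = 144$
$$\widehat{stsfglo}_i = \underset{(0.897)}{2.911} + \underset{(0.00000572)}{0.0000381}\, gnipc_i + \underset{(0.18)}{1.215}\, lifexpec_i$$
$R^2 = 0.748 \quad n = 144$
$$\widehat{stsfglo}_i = \underset{(1.014)}{1.701} + \underset{(0.000006)}{0.0000327}\, gnipc_i + \underset{(0.0147)}{0.0527}\, lifexpec_i + \underset{(0.177)}{1.166}\, dlatam_i$$
$R^2 = 0.760 \quad n = 144$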
where
- gnipc is gross national income per capita expressed in PPP 2008 US dollar
terms,
- lifexpec is life expectancy at birth, i.e., number of years a newborn infant
could be expected to live,
- dafrica is a dummy variable that takes value 1 if the country is in Africa,
- dlatam is a dummy variable that takes value 1 if the country is in Latin
America.
a) In model (2), what is the interpretation of the coefficients on dlatam and
dafrica?
b) In model (2), do dlatam and dafrica individually have a significant positive
influence on global satisfaction?
c) In model (2), do dlatam and dafrica have a joint influence on global
satisfaction?
d) Is the influence of life expectancy on global satisfaction smaller in Africa
than in other regions of the world?
e) Is the influence of the variable gnipc greater in Africa than in other regions
of the world at 10%?
f) Are the interactions of people living in Africa and the variables gnipc and
lifexpec jointly significant?
Exercise 5.20 The equations which appear in the attached table have been estimated using
data from the file timuse03. This file contains 1000 observations corresponding to a
random subsample extracted from the time use survey for Spain carried out in 2002-2003.
The following variables appear in the table:
- educ is years of education attained,
- sleep, paidwork and unpaidwk are measured in minutes per day,
- female, workday (Monday to Friday), spaniard and houswife are dummy
variables.
a) In model (1), is there a statistically significant tradeoff between time
devoted to paid work and time devoted to sleep?
b) In model (1), is the coefficient on unpaidwk statistically significant?
c) In model (1), is there evidence that women sleep more than men?
d) In model (2), are workday and spaniard individually significant? Are they
jointly significant?
e) Is the coefficient on housewife statistically significant?
f) Are the interactions between female and educ, paidwork and unpaidwk
jointly significant?
houswife           -14.71 (10.42)
unpaidwk×female    0.00607 (0.00726)
paidwork×female    -0.000324 (0.00540)
Exercise 5.21 To study infant mortality in the world, the following models have been
estimated using data from the file hdr2010:
$$\widehat{deathinf}_i = \underset{(4.58)}{93.02} - \underset{(0.0002)}{0.00037}\, gnipc_i - \underset{(0.1866)}{0.6046}\, physicn_i - \underset{(0.003)}{0.003}\, contrcep_i$$ (1)
$RSS = 40285 \quad R^2 = 0.6598 \quad n = 108$
$$\widehat{deathinf}_i = \underset{(5.96)}{78.55} - \underset{(0.0002)}{0.00042}\, gnipc_i - \underset{(0.1879)}{0.3809}\, physicn_i - \underset{(0.1042)}{0.6989}\, contrcep_i + \underset{(5.05)}{17.92}\, dafrica_i$$ (2)
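$$\widehat{watchtv} = \underset{(9.915)}{127} - \underset{(0.615)}{3.653}\, educ + \underset{(0.129)}{1.291}\, age - \underset{(0.010)}{0.120}\, paidwork - \underset{(4.903)}{25.146}\, female + \underset{(5.247)}{17.137}\, y2009$$ (2)
$R^2 = 0.184 \quad n = 2000$
$$\widehat{watchtv} = \underset{(10.01)}{123} - \underset{(0.615)}{3.583}\, educ + \underset{(0.129)}{1.302}\, age - \underset{(0.012)}{0.105}\, paidwork - \underset{(4.899)}{24.869}\, female + \underset{(6.115)}{24.536}\, y2009 - \underset{(0.021)}{0.050}\, y2009 \times paidwork$$ (3)
$R^2 = 0.186 \quad n = 2000$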
where
- educ is years of education attained,
- watchtv and paidwork are measured in minutes per day.
- female is a dummy variable that takes value 1 if the interviewee is a female
- y2009 is a dummy variable that takes value 1 if the survey was carried out in
2008-2009
a) In model (1), what is the interpretation of the coefficient on educ?
b) In model (1), is there a statistically significant tradeoff between time
devoted to work and time devoted to watching television?
c) All other factors being equal and taking as reference model (2), is there
evidence that men watch television more than women? How strong is the
evidence?
d) In model (2), what is the estimated difference in watching television
between females surveyed in 2008-2009 and males surveyed in 2002-2003?
Is this difference statistically significant?
e) In model (3), what is the marginal effect of time devoted to paid work on
time devoted to watching television?
f) Is there a significant interaction between the year of the survey and time
devoted to paid work?
Exercise 5.23 Using the file consumsp, the following models were estimated to analyze
if the entry of Spain into the European community in 1986 had any impact on the behavior
of Spanish consumers:
$$\widehat{conspc}_t = \underset{(84.88)}{-7.156} + \underset{(0.0857)}{0.3965}\, incpc_t + \underset{(0.0903)}{0.5771}\, conspc_{t-1}$$ (1)
$R^2 = 0.9967 \quad RSS = 1891320 \quad n = 56$
$$\widehat{conspc}_t = \underset{(108)}{-102.4} + \underset{(0.0879)}{0.3573}\, incpc_t + \underset{(0.0901)}{0.5992}\, conspc_{t-1} + \underset{(92.56)}{148.60}\, y1986_t$$ (2)
$R^2 = 0.9968 \quad RSS = 1802007 \quad n = 56$
$$\widehat{conspc}_t = \underset{(114)}{79.17} + \underset{(0.1100)}{0.5181}\, incpc_t + \underset{(0.1199)}{0.4186}\, conspc_{t-1} + \underset{(456.3)}{819.82}\, y1986_t$$
6 RELAXING THE ASSUMPTIONS IN THE LINEAR
CLASSICAL MODEL
$$E(\mathbf{u}\mathbf{u}') = \sigma^2 \mathbf{I}$$ (6-2)
When one or both assumptions indicated are not fulfilled, then the covariance
matrix will be less restrictive. Thus, we will consider the following covariance matrix of
the disturbances:
$$E(\mathbf{u}\mathbf{u}') = \sigma^2 \mathbf{\Omega}$$ (6-3)
where the only restriction imposed on Ω is that it is a positive definite matrix.
When the covariance matrix is a non-scalar matrix such as (6-3), then one can
obtain linear, unbiased and best estimators by applying the method of generalized least
squares (GLS). The expression of these estimators is as follows:
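$$\hat{\boldsymbol{\beta}} = \left[\mathbf{X}'\mathbf{\Omega}^{-1}\mathbf{X}\right]^{-1} \mathbf{X}'\mathbf{\Omega}^{-1}\mathbf{y}$$ (6-4)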
In practice, formula (6-4) is not directly applied. Instead a two-step process that
leads to exactly the same results is applied.
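The equivalence between formula (6-4) and a two-step procedure can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are simulated, Ω is taken as a known diagonal matrix, and the two-step process is read as "transform the model with a matrix P such that P'P = Ω⁻¹, then apply OLS", which is one common way of implementing GLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant and one regressor
omega = np.diag(np.linspace(0.5, 4.0, n))               # hypothetical known Omega
u = rng.normal(size=n) * np.sqrt(np.diag(omega))        # disturbances with covariance Omega
y = X @ np.array([1.0, 2.0]) + u

# Direct application of (6-4)
omega_inv = np.linalg.inv(omega)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# Two steps: (1) transform the data with P = L^{-1}, where Omega = L L',
# so that P'P = Omega^{-1}; (2) apply OLS to the transformed model.
L = np.linalg.cholesky(omega)
X_star = np.linalg.solve(L, X)
y_star = np.linalg.solve(L, y)
beta_two_step = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

print(np.allclose(beta_gls, beta_two_step))
```

In practice Ω is unknown and must be estimated or modelled, which is why the transformed regression is the form usually applied.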
In section 6.5, we will examine the tests to determine whether there is
heteroskedasticity, as well as the particularization of the GLS method in this case. Section
6.6 will present testing methods and the appropriate treatment of autocorrelation.
Assumption 9 of normality postulated in the CLM allows us to make statistical
inferences with known distributions. If the normality assumption is not adequate, then the
tests will only be approximately valid. In section 6.4, a normality test of the disturbances
is used to determine whether this assumption is acceptable.
6.2 Misspecification
Misspecification occurs when we estimate a different model from the population
model. The problem in social sciences, and in particular in economics, is that we do not
usually know the population model.
Bearing in mind this observation, we shall consider three types of misspecification:
- Inclusion of irrelevant variables.
- Exclusion of relevant variables.
- Incorrect functional form.
$$Bias(\tilde{\beta}_2) = \beta_3 \frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)\, x_{3i}}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}$$ (6-11)
The bias is null if, according to (6-11), the covariance between x2 and x3 is 0. It is
important to remark that the ratio
$$\frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)\, x_{3i}}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}$$
is just the OLS slope ( δˆ2 ) coefficient from regression of x3 on x2. That is to say,
$$\hat{x}_3 = \hat{\delta}_1 + \hat{\delta}_2 x_2 = \hat{\delta}_1 + \frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)\, x_{3i}}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}\, x_2$$ (6-12)
Thus, according to (6-72), in appendix 6.1, and (6-12), we can write that
$$E(\tilde{\beta}_2) = \beta_2 + \beta_3 \hat{\delta}_2$$ (6-13)
Therefore, the bias is equal to $\beta_3 \hat{\delta}_2$. Table 6.1 summarizes the sign of
the bias in $\tilde{\beta}_2$ when x3 is omitted from the estimating equation. It must be taken into account that the
sign of $\hat{\delta}_2$ is the same as the sign of the sample correlation between x2 and x3.
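A small simulation may help to see where the bias term comes from. The sketch below uses invented data and parameter values (they are assumptions, not taken from any file of the book) and checks the in-sample counterpart of (6-13): the slope of the short regression equals the slope of the long regression plus the coefficient on the omitted variable times the slope of the auxiliary regression of x3 on x2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)            # x3 is positively correlated with x2
y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(size=n)


def ols(dep, *regressors):
    """OLS coefficients of dep on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0]


b_full = ols(y, x2, x3)     # correctly specified model: [b1, b2, b3]
b_short = ols(y, x2)        # x3 omitted: [b1_tilde, b2_tilde]
delta = ols(x3, x2)         # auxiliary regression of x3 on x2: [d1, d2]

# d2 coincides with the ratio that appears in (6-11)
ratio = np.sum((x2 - x2.mean()) * x3) / np.sum((x2 - x2.mean()) ** 2)

print(b_short[1], b_full[1] + b_full[2] * delta[1])
```

Since β3 and the correlation between x2 and x3 are both positive in this simulated design, the short regression overstates the effect of x2.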
Step 1. The initial model is estimated and the fitted values, $\hat{y}_i$, are calculated.
Step 2. The augmented model, which can include one or more powers of $\hat{y}_i$, is
estimated.
Step 3. Taking the $R^2_{init}$ corresponding to the initial model and the $R^2_{augm}$
corresponding to the augmented model, the F statistic is calculated:
$$F = \frac{(R^2_{augm} - R^2_{init}) / r}{(1 - R^2_{augm}) / (n - h)}$$ (6-19)
where r is the number of new parameters added to the initial model, and h
is the number of parameters of the augmented model, including the
intercept.
Under the null hypothesis, this statistic is distributed as follows:
$$F \mid H_0 \sim F_{r,\,n-h}$$ (6-20)
Step 4. For a significance level α, and designating by $F^{\alpha}_{r,\,n-h}$ the corresponding value
in the F table, the decision to make is the following:
If $F \geq F^{\alpha}_{r,\,n-h}$, reject $H_0$
If $F < F^{\alpha}_{r,\,n-h}$, do not reject $H_0$
Therefore, high values of the statistic lead to the rejection of the initial model.
In the RESET test we test the null hypothesis against an alternative hypothesis that
does not indicate what the correct specification should be. This test is therefore a
misspecification test which may indicate that there is some form of misspecification but
does not give any indication of what the correct specification should be.
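The four steps of the RESET test can be sketched as follows. The function names and the simulated data are hypothetical; the statistic is computed from the two R-squared values exactly as in (6-19), with the squares and cubes of the fitted values as added regressors.

```python
import numpy as np


def fit_r2(y, X):
    """Return the R-squared and the fitted values of an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    resid = y - fitted
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return r2, fitted


def reset_f(y, X, powers=(2, 3)):
    """RESET F statistic; X must already contain the column of ones."""
    r2_init, fitted = fit_r2(y, X)                               # step 1
    X_aug = np.column_stack([X] + [fitted ** p for p in powers])
    r2_augm, _ = fit_r2(y, X_aug)                                # step 2
    r = len(powers)              # new parameters added
    h = X_aug.shape[1]           # parameters of the augmented model
    n = len(y)
    return ((r2_augm - r2_init) / r) / ((1.0 - r2_augm) / (n - h))   # step 3


# Simulated example: the true relation is quadratic but a linear model is fitted
rng = np.random.default_rng(2)
n = 150
x = rng.normal(size=n)
y = 1.0 + x + 0.5 * x ** 2 + 0.5 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
f_stat = reset_f(y, X)
print(f_stat)   # step 4: compare with the critical value of F(r, n - h)
```

Because the linear specification ignores the quadratic term, the statistic in this simulated case should be far above the usual critical values.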
EXAMPLE 6.1 Misspecification in a model for determination of wages
Using a subsample of data from the wage structure survey of Spain for 2006 (file wage06sp), the
following model is estimated:
$$\widehat{wage}_i = \underset{(1.55)}{4.679} + \underset{(0.146)}{0.681}\, educ_i + \underset{(0.071)}{0.293}\, tenure_i$$
$R^2 = 0.249 \quad n = 150$
where educ (education) and tenure (experience in the firm) are measured in years and wage in euros per
hour.
Considering that we may have a problem of incorrect functional form, an augmented model is
estimated. In this augmented model, besides educ, tenure, and the intercept, $\widehat{wage}_i^2$ and $\widehat{wage}_i^3$ from the
initial model are included as regressors. The F statistic calculated using $R^2_{init}$ and $R^2_{augm}$, according
to (6-19), is equal to 4.18. Given that $F^{0.05}_{2,145} \simeq F^{0.05}_{2,60} = 3.15$, we reject that, for the levels α=0.05 and α=0.10,
the linear form is adequate to explain wage determination. On the contrary, given that $F^{0.01}_{2,145} \simeq F^{0.01}_{2,60} = 4.98$,
H0 is not rejected for α=0.01.
6.3 Multicollinearity
6.3.1 Introduction
Perfect multicollinearity is not usually seen in practice, unless the model is
wrongly designed as we saw in chapter 5. Instead, an approximately linear relationship
between the regressors often exists. In this case, the estimators obtained will generally
not be very accurate, despite still being BLUE. In other words, the relationship between
regressors makes it difficult to quantify accurately the effect each one has on the
regressand. This is due to the fact that the variances of the estimators are high. When there
is an approximately linear relationship between the regressors, then it is said that there is
non-perfect multicollinearity. The multicollinearity problem arises because there is
insufficient information to get an accurate estimation of model parameters.
To analyze the problem of multicollinearity, we will examine the variance of an
estimator. In the multiple linear regression model, the estimator of the variance of any
slope coefficient, for example $\hat{\beta}_j$, is equal, as we saw in (3-68), to
$$\widehat{var}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{n S_j^2 (1 - R_j^2)}$$ (6-21)
where $\hat{\sigma}^2$ is the unbiased estimator of σ2, n is the sample size, $S_j^2$ is the sample variance
of the regressor xj, and $R_j^2$ is the R-squared obtained from regressing xj on all other x’s.
The last of these four factors which determines the value of the variance of $\hat{\beta}_j$,
$(1 - R_j^2)$, is precisely an indicator of multicollinearity. Multicollinearity arises in
estimating βj when $R_j^2$ is “close” to one, but there is no absolute number that we can
quote to conclude that multicollinearity is really a problem for the precision of the
estimators. Although the problem of multicollinearity cannot be clearly defined, it is true
that, for estimating βj, the lower the correlation between xj and the other independent
variables the better. If $R_j^2$ is equal to 1, then we would have perfect multicollinearity and
it is not possible to obtain the estimators of the coefficients. In any case, when one or
more $R_j^2$ are close to 1, multicollinearity is a serious problem. In this case, when making
inferences with the model, the following problems arise:
a) The variances of the estimators are very large.
b) The estimated coefficients will be very sensitive to small changes in the
data.
6.3.2 Detection
Multicollinearity is a problem of the sample, because it is associated with the
specific configuration of the sample of the x’s. For this reason, there are no statistical
tests. (Remember that statistical tests only work with population parameters). Instead,
many practical rules were developed attempting to determine to what extent
multicollinearity seriously affects the inference made with a model. These rules are not
always reliable, and in some cases are questionable. In any case, we are going to look at
some measures that are very useful to detect the degree of multicollinearity: the variance
inflation factor (VIF) and the tolerance, and the condition number and the coefficient
variance decomposition.
$$Tolerance(\hat{\beta}_j) = \frac{1}{VIF(\hat{\beta}_j)} = 1 - R_j^2$$ (6-24)
Thus, $VIF(\hat{\beta}_j)$ is the ratio between the estimated variance and the one that there
would have been if xj were uncorrelated with the other regressors in the model. In other
words, the VIF shows the extent to which the variance of the estimator is "inflated" as a
result of the non-orthogonality of the regressors. It is readily seen that the higher the VIF (or
the lower the tolerance index), the higher the variance of $\hat{\beta}_j$.
The procedure is to choose each one of the regressors at a time as the dependent
variable and to regress them against a constant and the remaining explanatory variables.
We would then get k values for the VIF’s. If any of them is high, then multicollinearity is
detected. Unfortunately, however, there is no theoretical indicator to determine whether
the VIF is “high.” Also, there is no theory that tells us what to do if multicollinearity is
found.
The variance inflation factor (VIF) and the tolerance are both widely used
measures of the degree of multicollinearity. Unfortunately, several rules of thumb
associated with the VIF, most commonly the rule of 10, are regarded by many practitioners
as a sign of severe or serious multicollinearity (this rule appears in both scholarly articles
and advanced statistical textbooks), but this rule has no scientific justification.
The problem with the VIF (or the tolerance) is that it does not provide any
information that could be used to treat the problem.
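The procedure described above (regress each regressor on a constant and the remaining regressors, then use the resulting R-squared) can be sketched as follows. The function `vif` and the simulated regressors are hypothetical; they are not the data of the file absent.

```python
import numpy as np


def vif(X):
    """Variance inflation factors for the columns of X (X excludes the constant)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        xj = X[:, j]
        # auxiliary regression of x_j on a constant and the other regressors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, xj, rcond=None)[0]
        resid = xj - others @ beta
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


# Simulated regressors: the first two are strongly related, the third is not
rng = np.random.default_rng(3)
n = 500
z1 = rng.normal(size=n)
z2 = 0.9 * z1 + 0.3 * rng.normal(size=n)
z3 = rng.normal(size=n)
vifs = vif(np.column_stack([z1, z2, z3]))
tolerance = 1.0 / vifs          # as in (6-24)
print(vifs, tolerance)
```

The tolerance is simply the reciprocal of each VIF, so either measure carries the same information.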
EXAMPLE 6.2 Analyzing multicollinearity in the case of labor absenteeism
In example 3.1 a model was formulated and estimated, using file absent, to explain absenteeism
from work as a function of the variables age, tenure and wage.
Table 6.2 provides information on the tolerance and the VIF of each regressor. According to these
statistics, multicollinearity does not appear to affect the wage but there is a certain degree of
multicollinearity in the variables age and tenure. In any case, the problem of multicollinearity in this model
does not appear to be serious because all VIF are below 5.
TABLE 6.2. Tolerance and VIF.
Collinearity statistics
Tolerance VIF
age 0.2346 4.2634
tenure 0.2104 4.7532
wage 0.7891 1.2673
by one or more eigenvalues of X’X being “small”. The closer a matrix is to singularity
the smaller the eigenvalues. The condition number (κ) is defined as the square root of the
largest eigenvalue (λmax) divided by the smallest eigenvalue (λmin):
$$\kappa = \sqrt{\frac{\lambda_{max}}{\lambda_{min}}}$$ (6-25)
When there is no multicollinearity at all, then all the eigenvalues and the condition
number will be equal to one. As multicollinearity increases, eigenvalues will be both
greater and smaller than 1 (eigenvalues close to zero indicate a multicollinearity problem),
and the condition number will increase. An informal rule of thumb is that if the condition
number is greater than 15, multicollinearity is a concern; if it is greater than 30
multicollinearity is a very serious concern.
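The condition number in (6-25) can be computed from the eigenvalues of X'X. In the sketch below the columns of X are first scaled to unit length; this scaling is an assumption made here so that the case without multicollinearity gives eigenvalues equal to one, as stated in the text. The function name and the data are illustrative.

```python
import numpy as np


def condition_number(X):
    """Condition number of (6-25) and eigenvalues of X'X after scaling columns to unit length."""
    Xs = X / np.linalg.norm(X, axis=0)
    eigenvalues = np.linalg.eigvalsh(Xs.T @ Xs)    # returned in ascending order
    return np.sqrt(eigenvalues[-1] / eigenvalues[0]), eigenvalues


# Orthogonal regressors: all eigenvalues and the condition number equal one
kappa_orth, eig_orth = condition_number(np.eye(4)[:, :3])

# Two nearly identical regressors: one eigenvalue is close to zero
rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
X_bad = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100)])
kappa_bad, eig_bad = condition_number(X_bad)

print(kappa_orth, kappa_bad)
```

The second design is well above the informal threshold of 30 mentioned above, because its smallest eigenvalue is close to zero.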
The variance of βˆ j can be decomposed into the contributions from each one of
the eigenvalues and can be expressed in the following way:
$$var(\hat{\beta}_j) = \sigma^2 \sum_{h} \frac{u_{jh}^2}{\lambda_h}$$ (6-26)
As can be seen in table 6.3 (see footnote 3), the greatest proportions associated with the smallest eigenvalue, which
is the main cause of multicollinearity in this model, correspond to the regressors educ and age. These two
regressors are inversely correlated. The greatest proportions associated with the second smallest eigenvalue
correspond to the regressors educ and the household income, which are positively correlated.
TABLE 6.3. Eigenvalues and variance decomposition proportions.
Eigenvalues 7.03E-06 0.000498 0.025701 1.861396 542.1400
Variance decomposition proportions
Associated Eigenvalue
Variable 1 2 3 4 5
6.3.3 Solutions
In principle, the problem of multicollinearity is related to deficiencies in the
sample. The non-experimental design of the sample is often responsible for these
deficiencies. Let us look at some of the solutions to solve the problem of multicollinearity.
Elimination of variables
Multicollinearity can be mitigated if the regressors most affected by
multicollinearity are removed. The problem with this solution is that the estimators of the
new model would be biased if the original model was correct. On this issue the following
reflection should be made. In any case, the researcher is interested in obtaining an
unbiased estimator (or at least with very small bias) with a reduced variance. The mean
square error (MSE) includes both factors. Thus, for the estimator βˆ j , the MSE is defined
as follows:
$$MSE(\hat{\beta}_j) = \left[bias(\hat{\beta}_j)\right]^2 + var(\hat{\beta}_j)$$ (6-28)
Footnote 3. In table 6.3, the eigenvalues are ordered from lowest to highest, as are the associated
eigenvalues in the variance decomposition proportions. It is important to remark that in EViews
eigenvalues are ordered from the highest to the lowest. In addition, in this package the condition number is
defined differently from the usual definition in the econometrics manuals which we have followed.
Using ratios
If instead of the regressand and the regressors of the original model, we use ratios
with respect to the regressor most affected by collinearity, the correlations among the
regressors of the model may decrease. Such a solution is very attractive for the
simplicity of implementation. However, the transformations of the original variables of
the model using ratios can cause other problems. Assuming the original model fulfills the
CLM assumptions, this transformation implicitly modifies the properties of the model,
and therefore the disturbances of the transformed model will no longer be homoskedastic
but heteroskedastic.
$$\gamma_1(\hat{u}) = \frac{\sum \hat{u}_i^3 / n}{\left(\sum \hat{u}_i^2 / n\right)^{3/2}}$$ (6-29)
$$\gamma_2(\hat{u}) = \frac{\sum \hat{u}_i^4 / n}{\left(\sum \hat{u}_i^2 / n\right)^{2}}$$ (6-30)
The indication n → ∞ means that BJ is an asymptotic test, i.e. valid when the
sample is sufficiently large.
EXAMPLE 6.4 Is the hypothesis of normality acceptable in the model to analyze the efficiency of the
Madrid Stock Exchange?
In example 4.5, using file bolmadef, we analyzed the market efficiency of the Madrid Stock
Exchange in 1992, using a model that relates the daily rate of return on the rate of the previous day. Now
we will test the normality assumption on the disturbances of this model. Given the low proportion of the
variance explained with this model (see example 4.5), the test of normality of the disturbances is roughly
equivalent to test the normality of the endogenous variable.
Table 6.4 shows the coefficients of skewness, kurtosis and the Bera and Jarque statistic, applied to
the residuals. The skewness coefficient (-0.04) is not far from the value 0 corresponding to a
N(0,1) distribution. On the other hand, the coefficient of kurtosis (4.43) is slightly different from 3, which is the value
in the normal distribution. In this case, we reject the assumption of normality for the usual levels of
significance, as the Bera and Jarque statistic takes the value of 21.02, which is larger than $\chi^2_2(0.01) = 9.21$.
TABLE 6.4. Normality test in the model on the Madrid Stock Exchange.
skewness coefficient kurtosis coefficient Bera and Jarque statistic
-0.0421 4.4268 21.0232
The fact that the normality assumption is rejected may seem paradoxical, since the values of
kurtosis and especially of skewness do not differ substantially from the values taken by these coefficients
in a normal distribution. However, the discrepancies are significant enough because they are supported by
a large sample size (247 observations). If n (the size of the sample) had been 60 rather than 247, the BJ
statistic, calculated according to (6-31) and using the same coefficients of skewness and kurtosis, would take the
value of 5.11, which is smaller than $\chi^2_2(0.01) = 9.21$. To put it another way, with the same coefficients, but
with a smaller sample, there is not enough empirical evidence to reject the null hypothesis of normality.
Note that this is due to the fact that the BJ statistic increases proportionally to the size of the sample, but
the degrees of freedom (2) remain unchanged.
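The figures of this example can be reproduced with a few lines. The sketch assumes that (6-31) is the usual form of the statistic, BJ = n[γ1²/6 + (γ2 − 3)²/24], which is consistent with the values quoted above; the function names are illustrative.

```python
def skewness_kurtosis(residuals):
    """Coefficients (6-29) and (6-30) computed from a list of residuals."""
    n = len(residuals)
    m2 = sum(u ** 2 for u in residuals) / n
    m3 = sum(u ** 3 for u in residuals) / n
    m4 = sum(u ** 4 for u in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def bera_jarque(n, skewness, kurtosis):
    """BJ statistic, assumed here to be n * [g1^2 / 6 + (g2 - 3)^2 / 24]."""
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)


# Coefficients of table 6.4 with the actual and the hypothetical sample size
bj_247 = bera_jarque(247, -0.0421, 4.4268)
bj_60 = bera_jarque(60, -0.0421, 4.4268)
print(round(bj_247, 2), round(bj_60, 2))
```

The statistic is proportional to n, so the same coefficients lead to rejection with 247 observations but not with 60.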
6.5 Heteroskedasticity
The homoskedasticity assumption (assumption 7 of the CLM) states that the
disturbances have a constant variance, that is to say:
$$var(u_i) = \sigma^2 \qquad i = 1, 2, \ldots, n$$ (6-33)
Assuming that there is only one independent variable, the homoskedasticity
assumption means that the variability around the regression line is the same for any
value of x. In other words, variability does not increase or decrease when x varies, as
shown in figure 2.7, part a) of chapter 2. In figure 6.1, a scatter plot is shown
corresponding to a model in which disturbances are homoskedastic.
If the homoskedasticity assumption is not satisfied, then there is
heteroskedasticity, or disturbances are heteroskedastic. In figure 2.7, part b) a model with
heteroskedastic disturbances was represented: the dispersion increases with increasing
values of x. Figure 6.2 shows the scatter diagram corresponding to a model in which the
dispersion grows when x grows.
FIGURE 6.1. Scatter diagram (y against x) corresponding to a model with homoskedastic disturbances.
FIGURE 6.2. Scatter diagram (y against x) corresponding to a model with heteroskedastic disturbances.
be seen. Logically, low income families are unlikely to spend large amounts on hotels,
and in this case we can expect that the differences in expenditure from one family to
another are not significant. In contrast, in high-income families a greater variability in
this type of expenditure can be expected. Indeed, high-income families may choose
between spending a substantial part of their income on hotels or spending virtually
nothing. The scatter diagram in figure 6.2 may be adequate to represent what happens in
a model to explain the demand for a luxury good such as spending on hotels.
b) The presence of outliers can cause heteroskedasticity. An outlier is an
observation generated apparently by a different population to that generating the
remaining sample observations. When the sample size is small, the inclusion or exclusion
of such an observation can substantially alter the results of regression analysis and cause
heteroskedasticity.
c) Data transformation. As we saw in a previous section, one of the solutions to
solve the problem of multicollinearity consisted in transforming the model taking ratios
with respect to a variable (say xji), i.e. dividing both sides of the model by xji. Therefore,
the disturbance will now be ui/xji, instead of ui. Assuming that ui fulfills the
homoskedasticity assumption, the disturbances of the transformed model (ui/xji) will no
longer be homoskedastic but heteroskedastic.
$$H_0: \alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$$ (6-35)
The steps involved in this test are as follows:
Step 1. The original model is estimated and the OLS residuals are calculated.
Step 2. The following auxiliary regression is estimated, taking as the regressand the
square of the residuals ($\hat{u}_i^2$) obtained in estimating the original model,
since we know neither $\sigma_i^2$ nor $u_i^2$:
using data from table 6.5, the following estimated model is obtained:
$$\widehat{hostel}_i = \underset{(3.48)}{-7.427} + \underset{(0.0065)}{0.0533}\, inc_i$$
uˆi -2.226 -5.888 1.100 1.505 -4.751 -0.234 -0.565 8.913 2.777 -0.631
Step 4. Given that $\chi^2_1(0.05) = 3.84$, the null hypothesis of homoskedasticity is rejected for a
significance level of 5%, because BPG>3.84, but not for the significance level of 1%.
Note that the validity of this test is asymptotic. However, the sample used in this example is very
small.
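The steps of the test can be sketched as follows, using the version of the statistic that appears later in this section (n times the R-squared of the auxiliary regression). The data are simulated with a variance that grows with income; they are not the data of table 6.5, and the function name is hypothetical.

```python
import numpy as np


def n_r2_statistic(y, X, Z):
    """n * R-squared of the regression of squared OLS residuals on Z.

    X : regressors of the original model, including the constant
    Z : variables assumed to drive the variance, including the constant
    """
    n = len(y)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]      # step 1
    u2 = resid ** 2
    aux_resid = u2 - Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]  # step 2
    r2_aux = 1.0 - aux_resid @ aux_resid / np.sum((u2 - u2.mean()) ** 2)
    return n * r2_aux                                         # step 3


rng = np.random.default_rng(5)
n = 500
inc = rng.uniform(100, 1000, size=n)                          # simulated income
hostel = -7.0 + 0.05 * inc + 0.02 * inc * rng.normal(size=n)  # spread grows with inc
X = np.column_stack([np.ones(n), inc])
bpg = n_r2_statistic(hostel, X, X)        # here the variance is related to inc itself
print(bpg, bpg > 3.84)                    # step 4: compare with the chi-square critical value
```

With one variable in the auxiliary regression the statistic is compared with a chi-square with one degree of freedom.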
White test
In the White test the hypothetical variables determining the heteroskedasticity are
not specified. This test is a non-constructive test because it gives no indication of the
heteroskedasticity scheme when the null hypothesis is rejected.
The White test is based on the fact that the standard errors are asymptotically valid
if we substitute the homoskedasticity assumption for the weaker assumption that the
squared disturbance $u^2$ is uncorrelated with all the regressors, their squares, and their cross
products. Taking this into account, White proposed to carry out the auxiliary regression
of $\hat{u}_i^2$, since $u_i^2$ is unknown, on the factors mentioned above. If the coefficients of the
auxiliary regression are jointly non-significant, then we can admit that the disturbances
are homoskedastic. According to the assumption adopted, the White test is an asymptotic
test.
The application of the White test can pose problems in models with many
regressors. For example, if the original model has five independent variables, the White
auxiliary regression would involve 20 regressors (five levels, five squares and ten cross
products, unless some are redundant), which implies that the estimation is done with a loss
of 20 degrees of freedom. For this reason,
when the model has many regressors a simplified version of the White test is often
applied. In the simplified version, the cross products are omitted from the auxiliary
regression.
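The count of regressors in the auxiliary regression can be checked with a short sketch. The function name `white_regressor_count` is hypothetical, and the count assumes that no regressor is redundant (for instance, no dummies, whose squares coincide with the variables themselves):

```python
from math import comb

def white_regressor_count(k, cross_products=True):
    """Regressors (excluding the intercept) in the White auxiliary regression
    for a model with k non-redundant explanatory variables."""
    levels = k                                     # the original regressors
    squares = k                                    # one square per regressor
    cross = comb(k, 2) if cross_products else 0    # pairwise cross products
    return levels + squares + cross

# Illustrative call for a model with three explanatory variables.
print(white_regressor_count(3))                        # complete version
print(white_regressor_count(3, cross_products=False))  # simplified version
```

The complete version grows quadratically with the number of regressors, whereas the simplified version grows only linearly, which is why the latter is preferred in models with many regressors.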
The steps involved in the complete version of the White test are as follows:
Step 1. The original model is estimated and the OLS residuals are calculated.
Step 2. The following auxiliary regression is estimated, taking as the regressand the
square of the residuals obtained in the previous step:
$$\hat{u}_i^2 = \alpha_1 + \alpha_2\psi_{2i} + \alpha_3\psi_{3i} + \cdots + \alpha_m\psi_{mi} + \varepsilon_i \qquad (6\text{-}38)$$
In graphic 6.1, the scatter plot between the residuals in absolute value (ordinate) and the variable
bookval (in abscissa) is represented. This graphic shows that the absolute values of the residuals, which are
indicative of the spread of this series, grow with increasing values of the variable bookval. In other words,
this graph provides an indication but not a formal proof of the existence of heteroskedasticity of the
disturbances associated with the variable bookval.
GRAPHIC 6.1. Scatter plot between the residuals in absolute value and the variable bookval in the
linear model. [Ordinate: residuals in absolute value; abscissa: valcon.]
The BPG statistic takes the following value:
$$BPG = nR_{ra}^2 = 20 \times 0.5220 = 10.44$$
In graphic 6.2 the scatter plot between the residuals in absolute value (ordinate), corresponding to
this estimated model, and the variable ln(bookval) (in abscissa) is represented. As shown, the two largest
residuals correspond to two banks with small market value. Even disregarding these two cases, apparently
there is no relationship between the residuals and the explanatory variable of the model.
GRAPHIC 6.2. Scatter plot between the residuals in absolute value and the variable bookval in the
log-log model. [Abscissa: ln(valcon).]
The results of the two tests of heteroskedasticity applied are shown in table 6.7.
TABLE 6.7. Tests of heteroskedasticity on the log-log model to explain the market
value of Spanish banks.
Test Statistic Table values
where inc is disposable income of a household, hhsize is the number of household members, and secstud
and terstud are two dummies that take the value one if individuals have completed secondary and tertiary
studies respectively.
The results obtained, using file hostel, are the following:
$$\widehat{\ln(hostel_i)} = \underset{(2.26)}{-16.37} + \underset{(0.324)}{2.732}\ln(inc_i) + \underset{(0.258)}{1.398}\, secstud_i + \underset{(0.333)}{2.972}\, terstud_i - \underset{(0.088)}{0.444}\, hhsize_i$$
$R^2$=0.921 n=40
Note that hostel services are a luxury good, as the income elasticity of demand for this good is
very high (2.73). This means that if income increases by 1%, spending on hostel services will increase, on
average, by 2.73%. As can be seen, families where the main breadwinner has secondary studies (secstud)
or, especially, higher education (terstud), spend more on hostel services than if the main breadwinner only
has primary education. However, spending on hostel services will decrease as household size (hhsize)
increases.
Graphic 6.3 shows the scatter plot between the residuals in absolute value and the variable ln(inc).
Income (or a transformation of it) is the main candidate, if not the only one, to explain the hypothetical
heteroskedasticity in the disturbances. As shown in the graphic, the dispersion of residuals is smaller for
low incomes than for middle or upper incomes.
We will now apply the two tests of heteroskedasticity that have been discussed in this section.
GRAPHIC 6.3. Scatter plot between the residuals in absolute value and the variable ln(inc) in the
hostel model. [Ordinate: residuals in absolute value; abscissa: ln(inc).]
The results of the two tests of heteroskedasticity applied are shown in table 6.8.
TABLE 6.8. Tests of heteroskedasticity in the model of demand for hostel services.
Test                     Statistic                   Table values
Breusch-Pagan-Godfrey    $BPG = nR_{ra}^2$ = 7.83    $\chi_2^{2(0.05)}$ = 5.99
White                    $W = nR_{ra}^2$ = 12.24     $\chi_2^{2(0.01)}$ = 9.21
In the BPG test we reject the null hypothesis of homoskedasticity for a significance level of α=0.05,
but not for α=0.01.
Since there are several dummy variables in the model, including cross products in the auxiliary
regression can lead to serious problems of multicollinearity. For this reason, cross products are not
included in the auxiliary regression. Nor are the squares of secstud and terstud included among its
regressors: because they are dummies, each square coincides with the variable itself. Given the value obtained in the White
statistic, we reject the null hypothesis of homoskedasticity for a significance level of α=0.01. Therefore,
the White test is more conclusive in rejecting the homoskedasticity assumption.
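The decisions taken from table 6.8 can be reproduced with a minimal sketch. The helper names `rejects` and `chi2_sf_2df` are hypothetical; the p-value formula used, exp(-x/2), is the exact upper-tail probability of a chi-square distribution with 2 degrees of freedom only, which is the number of degrees of freedom reported in the table:

```python
import math

def chi2_sf_2df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with
    2 degrees of freedom, whose closed form is exp(-x / 2)."""
    return math.exp(-x / 2.0)

def rejects(statistic, critical_value):
    """Homoskedasticity is rejected when the statistic exceeds the table value."""
    return statistic > critical_value

# Statistics and table values taken from table 6.8.
crit_05, crit_01 = 5.99, 9.21
for name, stat in (("BPG", 7.83), ("White", 12.24)):
    print(name,
          "reject at 5%:", rejects(stat, crit_05),
          "reject at 1%:", rejects(stat, crit_01),
          "p-value:", round(chi2_sf_2df(stat), 4))
```

Reporting the p-value alongside the decision makes it easier to see how far each statistic is from the usual significance thresholds.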
However, it is important to note that this estimator does not work well if the sample is
small, given that it is an asymptotic approximation.
Most econometric packages allow standard errors to be calculated by the White
procedure. By using these consistent standard deviations, adequate tests can be made
under the heteroskedasticity assumption.
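As an illustration of what such packages compute, the following sketch obtains the usual and the White standard error of the slope in a simple regression. It assumes the basic White formula without small-sample correction (often labeled HC0); packages may apply corrections, so their output can differ slightly. The function names and the data are hypothetical:

```python
import math

def ols_simple(x, y):
    """OLS estimates and residuals for y = b1 + b2*x + u."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    return b1, b2, resid, xbar, sxx

def slope_standard_errors(x, y):
    """Return (slope, usual standard error, White standard error)."""
    n = len(x)
    b1, b2, resid, xbar, sxx = ols_simple(x, y)
    # Usual formula: sigma^2 / sum (x - xbar)^2, valid under homoskedasticity.
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_usual = math.sqrt(sigma2 / sxx)
    # White formula: sum (x - xbar)^2 * u_hat^2 / [sum (x - xbar)^2]^2.
    meat = sum((xi - xbar) ** 2 * u ** 2 for xi, u in zip(x, resid))
    se_white = math.sqrt(meat / sxx ** 2)
    return b2, se_usual, se_white

# Hypothetical data whose dispersion grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.4, 7.2, 11.5, 10.1, 16.8, 12.9]
print(slope_standard_errors(x, y))
```

The point estimate of the slope is the same in both procedures; only the standard error, and therefore the t statistic, changes.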
EXAMPLE 6.9 Heteroskedasticity consistent standard errors in the models explaining the market value of
Spanish banks (Continuation of example 6.7)
In the following estimated equation of the linear model, using file bolmad95, standard deviations
of the estimates are calculated by the White procedure and therefore they are consistent under
heteroskedasticity:
$$\widehat{marktval} = \underset{(18.67)}{29.42} + \underset{(0.249)}{1.219}\, bookval$$
As can be seen, the standard error of the bookval coefficient goes from 0.127 in the usual procedure
to 0.249 in the White procedure. However, the p-value remains very low (0.0001). Accordingly, the
significance of the variable bookval for all usual levels is still maintained. By contrast, the intercept, which
has no special meaning in the model, now has a standard error (18.67), which is lower than that obtained
with the usual procedure (30.85).
If we apply the White procedure to the log-log model, the following results are obtained:
$$\widehat{\ln(marktval)} = \underset{(0.3218)}{0.676} + \underset{(0.0698)}{0.9384}\ln(bookval)$$
In this case, the standard error of ln(bookval) coefficient is practically the same in the two
procedures.
From the above results, the following conclusions can be obtained. In determining the market
value of Spanish banks, disturbances of the linear model are strongly heteroskedastic. Therefore, when
using a consistent estimate, the standard deviation is almost doubled compared to the standard one. By
contrast, in the log-log model, which is not affected by heteroskedasticity, there is little difference between
the standard errors obtained with both procedures.
procedure is often called weighted least squares (WLS). In this case, the weighting factor
is 1/f(xji).
If the function f(xji) is not known, it is necessary to estimate it. In that case, the
estimation method will not be exactly the GLS method because the application of this
method involves the knowledge of the covariance matrix, or, at least, knowledge of a
matrix that is proportional to it. If we estimate the covariance matrix, in addition to the
parameters, it is said that feasible GLS is applied. In the case of heteroskedastic
disturbances, the particularization of the feasible GLS method is called WLS (weighted
least squares) in two stages. In the first stage, the function f(xji) is estimated, whereas in
the second stage OLS is applied to the model transformed using the f(xji) estimates.
To see how to apply the WLS method in two stages, let us consider the following
relationship, which simply defines the variance of the disturbances, in the case of
heteroskedasticity,
$$E(u_i^2) = \sigma_i^2 \qquad (6\text{-}43)$$
Therefore, the squared disturbance can be made equal, as in the regression model,
to its expectation plus a random variable. That is to say:
$$u_i^2 = \sigma_i^2 + \varepsilon_i \qquad (6\text{-}44)$$
As the disturbances are not observable, one can establish a relationship analogous
to the above using residuals instead of disturbances. Therefore,
$$\hat{u}_i^2 = \sigma_i^2 + \eta_{2i} \qquad (6\text{-}45)$$
It should be noted that the above relationship does not have exactly the same
properties as (6-44) because the residuals are correlated and heteroskedastic, even if the
disturbances fulfill the CLM assumptions. However, in large samples they will have the
same properties.
If we use the residuals as the regressand instead of the squared residuals, we must
take the absolute values, since the standard deviation takes only positive values. Taking
into account (6-45), the following relationship can be established:
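$$|\hat{u}_i| = \sqrt{\sigma_i^2} + \eta_{2i} = f(x_{ji}) + \eta_{2i} \qquad (6\text{-}46)$$
Since the function f(xji) is generally unknown, different functions are often tried.
Some of the most common are the following:
$$\begin{aligned}
|\hat{u}_i| &= \alpha_1 + \alpha_2 x_{ji} + \eta_{2i} \\
|\hat{u}_i| &= \alpha_1 + \alpha_2 \sqrt{x_{ji}} + \eta_{2i} \\
|\hat{u}_i| &= \alpha_1 + \alpha_2 \frac{1}{x_{ji}} + \eta_{2i} \\
|\hat{u}_i| &= \alpha_1 + \alpha_2 \ln(x_{ji}) + \eta_{2i}
\end{aligned} \qquad (6\text{-}47)$$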
The functional form with the best fit (a higher coefficient of determination or a
smaller AIC statistic) is selected. For the transformation two circumstances are
contemplated, depending on the significance of the intercept. If this coefficient is
statistically significant, the model is transformed by dividing by the fitted values of the
selected equation. If it is not statistically significant, the model is transformed by dividing
by the regressor corresponding to the selected equation. Thus, if the selected equation
were the second one of (6-47), with the intercept not being significant, the transformed
model would be as follows:
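$$\frac{y_i}{\sqrt{x_{ji}}} = \beta_1\frac{1}{\sqrt{x_{ji}}} + \beta_2\frac{x_{2i}}{\sqrt{x_{ji}}} + \beta_3\frac{x_{3i}}{\sqrt{x_{ji}}} + \cdots + \beta_k\frac{x_{ki}}{\sqrt{x_{ji}}} + \frac{u_i}{\sqrt{x_{ji}}} \qquad (6\text{-}48)$$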
Note that if the intercept is not significant, the estimated parameters are not
involved in the transformation of the model, but they are if the intercept is significant. As
the estimators in models (6-47) are biased, although consistent, it is not advisable to
transform the models by applying the fitted values of $|\hat{u}_i|$ (obtained by using $\hat{\alpha}_1$ and $\hat{\alpha}_2$)
except when the intercept is highly significant (e.g., significant at the 1% level).
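The second stage can be sketched for a simple regression, under the assumption that the selected scheme is the square-root one with a non-significant intercept, so that every term of the model is divided by the square root of the regressor. The function name and the data are hypothetical, and the regressor must be strictly positive:

```python
import math

def wls_sqrt_transform(x, y):
    """WLS for y = b1 + b2*x + u when the model is divided by sqrt(x).

    The transformed model y/sqrt(x) = b1*(1/sqrt(x)) + b2*sqrt(x) + u/sqrt(x)
    has no intercept, so OLS is applied to two regressors through the origin.
    """
    ys = [yi / math.sqrt(xi) for xi, yi in zip(x, y)]
    z1 = [1.0 / math.sqrt(xi) for xi in x]      # transformed intercept term
    z2 = [xi / math.sqrt(xi) for xi in x]       # transformed regressor
    # Normal equations of the regression of ys on z1 and z2.
    a11 = sum(v * v for v in z1)
    a12 = sum(v * w for v, w in zip(z1, z2))
    a22 = sum(w * w for w in z2)
    c1 = sum(v * t for v, t in zip(z1, ys))
    c2 = sum(w * t for w, t in zip(z2, ys))
    det = a11 * a22 - a12 * a12
    b1 = (c1 * a22 - c2 * a12) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2

# Hypothetical data.
x = [1.0, 4.0, 9.0, 16.0, 25.0]
y = [5.3, 13.1, 30.2, 47.5, 80.4]
print(wls_sqrt_transform(x, y))
```

Note that the coefficients of the transformed model are the same parameters as in the original model, so the estimates are interpreted in the original units.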
EXAMPLE 6.10 Application of weighted least squares in the demand of hotel services (Continuation of
example 6.8)
Since the two tests applied to the model to explain the cost of hotel services indicate that the
disturbances are heteroskedastic, we apply the weighted least squares method to estimate the model (6-40).
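First, we estimate the four models (6-47), using as the regressand the residuals $\hat{u}_i$ (in absolute
value) obtained in the estimation of model (6-40) by OLS. The results are presented below:
$$\widehat{|\hat{u}_i|} = \underset{(0.143)}{0.0239} + \underset{(2.73)}{0.0003}\, inc \qquad R^2 = 0.1638$$
$$\widehat{|\hat{u}_i|} = \underset{(-1.34)}{-0.4198} + \underset{(2.82)}{0.0235}\sqrt{inc} \qquad R^2 = 0.1733$$
$$\widehat{|\hat{u}_i|} = \underset{(5.39)}{0.8857} - \underset{(-2.87)}{532.1}\frac{1}{inc} \qquad R^2 = 0.1780$$
$$\widehat{|\hat{u}_i|} = \underset{(-2.46)}{-2.7033} + \underset{(2.88)}{0.4389}\ln(inc) \qquad R^2 = 0.1788$$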
R2=0.914 n=40
Compared to the OLS estimates of example 6.5, it can be seen that the differences are very small,
which is indicative of the robustness of the model.
6.6 Autocorrelation
No autocorrelation, or no serial correlation assumption (assumption 8 of the CLM)
states that disturbances with different subscripts are not correlated with each other:
$$E(u_i u_j) = 0 \qquad i \neq j \qquad (6\text{-}49)$$
That is, the disturbances corresponding to different periods of time, or to different
individuals, are not correlated with each other. Figure 6.3 shows a plot corresponding to
disturbances which are not autocorrelated. The x axis is time. As can be seen, disturbances
are randomly distributed above and below the line 0 (theoretical mean of u). In the figure,
each disturbance is linked by a line to the disturbance of the following period: in total this
line crosses the line 0 on 13 occasions.
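Counting how often the line joining consecutive values crosses the zero line is easy to automate. The following is a minimal sketch with a hypothetical function name and hypothetical series; values exactly equal to zero are skipped, which is an assumption of this sketch:

```python
def count_zero_crossings(series):
    """Number of times the line joining consecutive values crosses zero."""
    signs = [value > 0 for value in series if value != 0]
    return sum(1 for prev, curr in zip(signs, signs[1:]) if prev != curr)

# Hypothetical series: the first changes sign often, the second rarely.
erratic = [0.8, -0.4, 1.1, -0.9, 0.3, -1.2, 0.6]
persistent = [0.5, 0.9, 1.3, 0.7, -0.2, -0.8, -1.1]
print(count_zero_crossings(erratic), count_zero_crossings(persistent))
```

Few crossings relative to the number of observations suggest positive autocorrelation, while a sign change at almost every period suggests negative autocorrelation.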
[Figures 6.3, 6.4 and 6.5: plots of the disturbances u (ordinate) against time (abscissa).]
FIGURE 6.4. Plot of positive autocorrelated disturbances.
FIGURE 6.5. Plot of negative autocorrelated disturbances.
Let us suppose the correct functional form for determining wage as a function of
years of experience (exp) is as follows:
$$wage = \beta_1 + \beta_2 exp + \beta_3 exp^2 + u$$
Instead of this model, the following one is fitted:
$$wage = \beta_1 + \beta_2 exp + v$$
In the second model, the disturbance has a systematic component
($v = \beta_3 exp^2 + u$). In figure 6.6, a scatter diagram (generated for the first model) and the
fitted function of the second model are represented. As can be seen, for the low values of
exp the fitted model overestimates wages; for intermediate values of exp wages are
underestimated; finally, for high values the fitted model again overestimates wages. This
example illustrates a case in which the use of an incorrect functional form provokes
positive autocorrelation.
On the other hand, the omission of a relevant variable in the model could induce
positive autocorrelation if that variable has, for example, a cyclical behavior.
FIGURE 6.6. Autocorrelated disturbances due to a specification bias. [Ordinate: y; abscissa: x.]
b) Inertia. The disturbance term in a regression equation reflects the influence of
those variables affecting the dependent variable that have not been included in the
regression equation. To be precise, inertia, or the persisting effects of variables excluded
from the model (and therefore included in u), is probably the most frequent cause of positive
autocorrelation. As is well known, macroeconomic time series (such as GDP, production,
employment and price indexes) tend to move together: during expansion periods these
series tend to increase in parallel, while in times of contraction they also tend to decrease
in parallel. For this reason, in regressions involving time series data, successive
observations of the disturbance are likely to be dependent on the previous ones. Thus, this
cyclical behavior can produce autocorrelation in the disturbances.
c) Data Transformation. As an example let us consider the following model to
explain consumption as a function of income:
$$cons_t = \beta_1 + \beta_2 inc_t + u_t \qquad (6\text{-}50)$$
For the observation t-1, we can write
$$d = DW = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2} \qquad (6\text{-}55)$$
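The statistic in (6-55) can be computed directly from a list of residuals. The sketch below uses a hypothetical function name and hypothetical residual series chosen only to show the two extreme patterns:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(u * u for u in residuals)
    return numerator / denominator

# Hypothetical residuals: alternating signs versus persistent signs.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
persistent = [1.0, 1.2, 0.9, -0.8, -1.1, -1.0]
print(durbin_watson(alternating), durbin_watson(persistent))
```

Residuals that alternate in sign give values well above 2, while residuals that keep the same sign over several periods give values well below 2.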
GRAPHIC 6.4. Standardized residuals in the estimation of the model to determine the efficiency of
the Madrid Stock Exchange.
EXAMPLE 6.12 Autocorrelation in the model for the demand for fish
4 Standardized residuals are equal to residuals divided by $\hat{\sigma}$.
In example 4.9 we estimated model (4-44), using file fishdem, to explain the demand for fish in
Spain. Graphic 6.5 shows the standardized residuals obtained in the estimation of this model. This
graph does not show a clear autocorrelation scheme. In this regard, it should be noted
that, over a total of 28 observations, the line joining the points of the residuals crosses the zero axis 11 times,
which indicates a degree of randomness in the distribution of the residuals.
The value of the DW statistic for testing the scheme (6-53) is 1.202. For n=28 and k'=3, and for a
significance level of 1%, we get the following tabulated values:
$d_L$=0.969 $d_U$=1.415
Since $d_L$<1.202<$d_U$, the test is inconclusive: there is not enough evidence either to reject the null hypothesis or to accept it.
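The decision can be written as a small function. This sketch assumes the usual five-region rule of the Durbin-Watson test, with the regions for negative autocorrelation obtained by symmetry around 2; the function name and the returned labels are hypothetical:

```python
def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the tabulated bounds."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "do not reject H0"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "reject H0: negative autocorrelation"

# Values of the fish demand example: d=1.202, dL=0.969, dU=1.415.
print(dw_decision(1.202, 0.969, 1.415))
```

The two inconclusive regions are a known limitation of this test and one of the reasons why alternative tests are used in practice.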
GRAPHIC 6.5. Standardized residuals in the model on the demand for fish.
Durbin’s h test
Durbin (1970) proposed a statistic, called h, to test the hypothesis (6-54) in the
case that one or more lagged endogenous variables appear as explanatory variables. The
expression of the h statistic is the following:
$$h = \hat{\rho}\sqrt{\frac{n}{1 - n\,\widehat{var}(\hat{\beta}_j)}} \qquad (6\text{-}60)$$
where $\hat{\rho}$ is the correlation coefficient between $\hat{u}_i$ and $\hat{u}_{i-1}$, n is the sample size, and
$\widehat{var}(\hat{\beta}_j)$ is the variance corresponding to the coefficient of the lagged endogenous
variable.
The statistic $\hat{\rho}$ can be estimated using the following approximation: $d \simeq 2(1-\hat{\rho})$.
If the regressand appears with different time lags as regressors, the variance
corresponding to the regressor with the lowest lag is selected.
Under assumptions (6-54), the h statistic has the following distribution:
$$h \xrightarrow[n \to \infty]{} N(0,1) \qquad (6\text{-}61)$$
The critical region is therefore in the tails of the standard normal distribution: the
tail on the right for positive autocorrelation and the tail on the left for negative
autocorrelation.
The statistic (6-60) cannot be calculated if $n\,\widehat{var}(\hat{\beta}_j) \geq 1$. In this case, Durbin
taken as the regressand, the regressors are the same as those of the original model and the
residuals also lagged a period. This procedure is a particular case of the Breusch–Godfrey
test, which we will see next.
EXAMPLE 6.13 Autocorrelation in the case of Lydia E. Pinkham
In example 5.5 with the case of Lydia E. Pinkham, a model to explain the sales of a herbal extract
was estimated using file pinkham. Graphic 6.6 shows the graph of standardized residuals corresponding to
this model. As can be seen, it appears that the residuals are not distributed in a random way. Note, for
example, that from 1936 the residuals take positive values for 8 consecutive years.
The appropriate test for autocorrelation in this model is Durbin's h statistic, as there is a lagged
endogenous variable, $sales_{t-1}$, in this model. The h statistic is:
$$h = \hat{\rho}\sqrt{\frac{n}{1 - n\,\widehat{var}(\hat{\beta}_j)}} \simeq \left[1 - \frac{d}{2}\right]\sqrt{\frac{n}{1 - n\,\widehat{var}(\hat{\beta}_j)}} = \left[1 - \frac{1.2012}{2}\right]\sqrt{\frac{53}{1 - 53 \times 0.0814^2}} = 3.61$$
Given this value of h, the null hypothesis of no autocorrelation is rejected for α=0.01 or, even, for
α=0.001, according to the table of the normal distribution.
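The calculation of h can be checked with a short sketch. The function names are hypothetical; the inputs (d=1.2012, n=53 and a standard error of 0.0814 for the coefficient of the lagged endogenous variable) are those of the example, and the one-sided p-value is obtained from the standard normal distribution:

```python
import math

def durbin_h(d, n, var_coef):
    """Durbin's h, using rho_hat = 1 - d/2 as the estimate of rho."""
    rho_hat = 1.0 - d / 2.0
    denominator = 1.0 - n * var_coef
    if denominator <= 0:
        raise ValueError("h cannot be calculated: n * var(beta_j) >= 1")
    return rho_hat * math.sqrt(n / denominator)

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

h = durbin_h(d=1.2012, n=53, var_coef=0.0814 ** 2)
print(round(h, 2), normal_upper_tail(h))
```

The explicit check on the denominator mirrors the condition under which the statistic cannot be calculated.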
GRAPHIC 6.6. Standardized residuals in the estimation of the model of the Lydia E. Pinkham case.
Step 1. The original model is estimated and the OLS residuals ($\hat{u}_t$) are calculated.
Step 2. An auxiliary regression is estimated, in which the residuals ($\hat{u}_t$) are taken
as the regressand and the regressors of the original model and the residuals
lagged 1, 2, ... and p periods are taken as regressors:
$$\hat{u}_t = \alpha_1 + \alpha_2 x_{2t} + \cdots + \alpha_k x_{kt} + \gamma_1\hat{u}_{t-1} + \cdots + \gamma_p\hat{u}_{t-p} + \varepsilon_t \qquad (6\text{-}63)$$
The auxiliary regression should have an intercept, even if the original
model is estimated without it. In accordance with expression (6-63), in the
auxiliary regression there are k+p regressors in addition to the intercept.
Step 3. Designating by $R_{ar}^2$ the coefficient of determination of the auxiliary
regression, the statistic $nR_{ar}^2$ is calculated.
Under the null hypothesis, the BG statistic is distributed as follows:
$$BG = nR_{ar}^2 \xrightarrow[n \to \infty]{} \chi_{k+p}^2 \qquad (6\text{-}64)$$
The BG statistic is used to test the overall significance of the model (6-63).
For this purpose, the F statistic can also be used. However, in this case it
has only asymptotic validity, in the same way as with the BG statistic.
Step 4. For a significance level α, and designating by $\chi_{k+p}^{2(\alpha)}$ the corresponding value
in the χ2 table, the decision to make is the following:
If $BG > \chi_{k+p}^{2(\alpha)}$, H0 is rejected
GRAPHIC 6.7. Standardized residuals in the estimation of the model explaining the expenditures of
residents abroad.
Graphic 6.7 shows the standardized residuals corresponding to this model. As can be seen, it
appears that the residuals are not distributed in a random way because, for example, there are peaks every
4 quarters, indicating that the autocorrelation follows an AR(4) scheme.
The BG statistic, calculated for an AR(4) scheme, is equal to $nR_{ar}^2$=36.35. Given this value of BG,
the null hypothesis of no autocorrelation is rejected for α=0.01, since $\chi_5^{2(0.01)}$=15.09. In the auxiliary
regression, in which $\hat{u}_{t-1}$, $\hat{u}_{t-2}$, $\hat{u}_{t-3}$ and $\hat{u}_{t-4}$ have been used as regressors, $\hat{u}_{t-4}$ is the only significant
regressor.
TABLE 6.9. The t statistics, conventional and HAC, in the case of Lydia E. Pinkham.
regressor    t conventional    t HAC        ratio
intercept     2.644007          1.779151    1.49
advexp        3.928965          5.723763    0.69
sales(-1)     7.45915           6.9457      1.07
d1           -1.499025         -1.502571    1.00
d2            3.225871          2.274312    1.42
d3           -3.019932         -2.658912    1.14
$$y_t\sqrt{1-\rho^2} = \beta_1\sqrt{1-\rho^2} + \beta_2\sqrt{1-\rho^2}\,x_{2t} + \cdots + \beta_k\sqrt{1-\rho^2}\,x_{kt} + \varepsilon_t \qquad (6\text{-}68)$$
When we estimate ρ together with the other model parameters, then the method is
called feasible GLS.
In general, in the application of feasible GLS the transformation of the first
observation according to (6-68) is ignored. Feasible GLS methods for estimating a model
in which the disturbances follow an AR(1) scheme can be grouped into three blocks: a)
two-step methods, b) iterative methods, and c) scanning methods.
Here we present two methods for block a), called the direct method and Durbin's
two-stage method.
In the first stage of these two methods, ρ is estimated. In the direct method, ρ is
easily estimated from the DW statistic, using the approximate relationship $DW \simeq 2(1-\hat{\rho})$. In
Durbin's two-stage method, we estimate the following regression model in which
the explanatory variables are the regressors of the original model, the regressors lagged
one period and the endogenous variable lagged one period:
$$y_t = \alpha_1 + \alpha_{2,0}x_{2t} + \alpha_{2,1}x_{2,t-1} + \cdots + \alpha_{k,0}x_{kt} + \alpha_{k,1}x_{k,t-1} + \rho y_{t-1} + \upsilon_t \qquad (6\text{-}69)$$
The coefficient of the lagged endogenous variable is precisely the parameter ρ. In
the first stage, the model (6-69) is estimated by OLS, taking from it the estimate of ρ. In
the second stage, applicable to both methods, the model is transformed with the estimate
of ρ calculated in the first stage as follows:
Exercises
Exercise 6.1 Let us consider that the population model is the following:
$$y_i = \beta_1 + \beta_2 x_i + u_i \qquad (1)$$
Instead, the following model is estimated:
$$\tilde{y}_i = \tilde{\beta}_2 x_{2i} \qquad (2)$$
Exercise 6.2 Let us consider that the population model is the following:
$$y_i = \beta_2 x_i + u_i \qquad (1)$$
Instead, the following model is estimated:
$$\tilde{y}_i = \tilde{\beta}_1 + \tilde{\beta}_2 x_{2i} \qquad (2)$$
Is $\tilde{\beta}_2$, obtained by applying OLS in (2), an unbiased estimator of $\beta_2$?
A researcher estimates the model, mistakenly using only 8 observations, and obtains the
following results:
$$\widehat{output}_i = \underset{(1.956)}{97.259} + \underset{(0.124)}{0.970}\, labor_i + \underset{(0.027)}{0.650}\, capital_i$$
$R^2$ = 0.999 F=3422
The numbers in parentheses are the standard errors of the estimators and the F
statistic corresponds to the test of the whole model.
When he realizes his mistake, he estimates the model with all observations (n=9),
obtaining in this case the following results:
$$\widehat{output}_i = \underset{(32.046)}{75.479} - \underset{(1.742)}{1.970}\, labor_i + \underset{(0.376)}{1.272}\, capital_i$$
$R^2$ = 0.824 F= 14.056
His confusion is great when comparing the two estimates, and he cannot
understand why the results become very different when using one more observation. Can
we find any reason that could justify these differences?
Exercise 6.6 Suppose in the model
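$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
the R-squared obtained from regressing x1 on x2, which will be called $R_{1/2}^2$, is zero.
Run the following regressions:
$$y = \lambda_0 + \lambda_1 x_1 + u$$
$$y = \gamma_0 + \gamma_1 x_2 + u$$
a) Will $\hat{\lambda}_1$ be equal to $\hat{\beta}_1$ and $\hat{\gamma}_1$ be equal to $\hat{\beta}_2$?
b) Will $\hat{\beta}_0$ be equal to $\hat{\lambda}_0$ or $\hat{\beta}_0$ be equal to $\hat{\gamma}_0$?
c) Will var($\hat{\lambda}_1$) be equal to var($\hat{\beta}_1$) and var($\hat{\gamma}_1$) be equal to var($\hat{\beta}_2$)?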
Exercise 6.7 An analyst wants to estimate the following model using the observations of
the attached table:
$$y_i = e^{\beta_1} x_{2i}^{\beta_2} x_{3i}^{\beta_3} x_{4i}^{\beta_4} e^{u_i}$$
x2 x3 x4
3 12 4
2 10 5
4 4 1
3 9 3
2 6 3
5 5 1
What problems can occur in the estimation of this model with these data?
Exercise 6.8 In exercise 4.8, using the file airqualy, the following model was estimated:
$$\widehat{airqual}_i = \underset{(10.19)}{97.35} + \underset{(0.0311)}{0.0956}\, popln_i - \underset{(0.0055)}{0.0170}\, medincm_i - \underset{(0.0089)}{0.0254}\, poverty_i$$
$R^2$=0.415 n=30
a) Calculate the statistic VIF for each coefficient.
b) What is your conclusion?
Exercise 6.9 To examine the effects of firm performance on CEO salary, the following
model is formulated:
$$\ln(salary) = \beta_1 + \beta_2 roa + \beta_3\ln(sales) + \beta_4 profits + \beta_5 tenure + \beta_6 age + u$$
where roa is the ratio profits/assets expressed as a percentage, tenure is the number of
years as CEO (=0 if less than six months), and age is age in years. Salaries are expressed
in thousands of dollars, and sales and profits in millions of dollars.
a) Using the full sample (447 observations) of the file ceoforbes, estimate the
model by OLS.
b) Show that [4] is less than or equal to [3]. (Hint: Apply the Cauchy-Schwarz
inequality, which says that $\left[\sum w_i z_i\right]^2 \leq \left[\sum w_i^2\right]\left[\sum z_i^2\right]$.)
R2=0.997 n=27
where gdp is the gross domestic product at market prices, and rpimp are the relative prices
imports/gdp. The variables imp and gdp are expressed in millions of pesetas.
a) Set up and estimate the auxiliary regression to perform the Breusch-Pagan-
Godfrey heteroskedasticity test.
b) Apply the Breusch-Pagan-Godfrey heteroskedasticity test using the
auxiliary regression run in section a).
c) Set up the auxiliary regression to perform the complete White
heteroskedasticity test.
d) Apply the complete White heteroskedasticity test using the auxiliary
regression run in section c).
e) Set up the auxiliary regression to perform the simplified White
heteroskedasticity test.
f) Apply the simplified White heteroskedasticity test using the auxiliary
regression run in section e).
g) Compare the results of the test carried out in sections b), d) and f).
Exercise 6.22 Using data from file tradocde, the following model has been estimated to
explain the imports (impor) in OECD countries:
$$\widehat{\ln(impor_i)} = \underset{(6.67)}{18.01} + \underset{(0.658)}{1.6425}\ln(gdp_i) - \underset{(0.636)}{0.5151}\ln(popul_i)$$
$R^2$=0.614 n=34
where gdp is gross domestic product at market prices, and popul is the population of each
country.
a) What is the interpretation of the coefficient on ln(gdp)?
b) Set up the auxiliary regression to perform the White heteroskedasticity test.
c) Apply the White heteroskedasticity test using the auxiliary regression run
in section b).
d) Test whether the import/gdp elasticity is greater than 1. To make this test,
do you need to use the White heteroskedasticity-robust standard errors?
Exercise 6.23 Explain in detail what the appropriate autocorrelation test would be in each
situation:
a) When the model has no lagged endogenous variables and the observations
are annual.
b) When the model has lagged endogenous variables and the observations are
annual.
c) When the model has no lagged endogenous variables and the observations
are quarterly.
Exercise 6.24 Two alternative models were used to estimate the average cost of annual
car production of a particular brand in the period 1980-1999:
$c = \alpha + \beta p + u$   $R^2 = 0.848$; $\bar{R}^2 = 0.812$; $d = DW = 0.51$
$c = \alpha + \beta p + \gamma p^2 + u$   $R^2 = 0.852$; $\bar{R}^2 = 0.811$; $d = DW = 2.11$
a) When comparing the two estimations, indicate if you detect any
econometric problem. Explain it.
b) Depending on your answer to the previous section, which of the two
models would you choose?
Exercise 6.25 In the period 1950-1980, the following production function is estimated:
$$\ln(o_t) = \underset{(0.24)}{-3.94} + \underset{(0.083)}{1.45}\ln(l_t) + \underset{(0.048)}{0.38}\ln(k_t)$$
$R^2$ = 0.994   DW = 0.858   $\hat{\rho}$ = 0.559
where o is output, l is labor, and k is capital.
(The numbers in parentheses are standard errors of the estimators.)
a) Test whether there is autocorrelation.
b) If the model had a lagged endogenous variable as an explanatory variable,
indicate how you would test whether there is autocorrelation.
Exercise 6.26 Using 38 annual observations, the following demand function for a product
was estimated:
$$d_i = 2.47 + 0.35 p_i + 0.9 d_{i-1} \qquad R^2 = 0.98 \qquad DW = 1.82$$
(0.39) (0.06)
a) Estimate the model [1] by OLS and calculate the corresponding adjusted
determination coefficient.
b) Calculate the Durbin-Watson statistic for the estimations made in a).
c) In view of the Durbin and Watson test and the representation of the fitted
line and residuals, is it appropriate to reformulate model [1]? Justify your
answer and, if it is yes, estimate the alternative model that you consider
the most appropriate for the data.
Exercise 6.32 Let the model be:
$$y_t = \beta_1 + \beta_2 x_t + u_t$$
$$u_t = \rho u_{t-1} + \varepsilon_t; \qquad \varepsilon_t \sim NI(0, \sigma^2)$$
The following additional information is also available:
$\rho$ = 0.5
yi 22 26 32 31 40 46 46 50
xi 4 6 10 12 13 16 20 22
$R^2$ = 0.9687   DW=3.4   n = 15
(The numbers in parentheses are standard errors of the estimators.)
Furthermore, the following additional information about the residual regressions is
available:
1. $\hat{u}_t = \underset{(0.210)}{0.167} + \underset{(0.180)}{0.127}\, x_t$
2. $\hat{u}_t = \underset{(0.098)}{0.231} + \underset{(0.095)}{0.218}\, x_t^{1/2}$
R 2 = 0.997 DW=0.73 n = 27
where gdp is gross domestic product at market prices, and rpimp are the relative prices
import/gdp. Both magnitudes are expressed in millions of pesetas.
(The numbers in parentheses are standard errors of the estimators.)
a) Interpret the coefficient on rpimp.
b) Is there autocorrelation in this model?
c) Test whether the imp/gdp elasticity plus four times the imp/rpimp elasticity
is equal to zero. (Additional information: var($\hat{\beta}_2$) = 0.044247; var($\hat{\beta}_3$)
= 0.000540; and cov($\hat{\beta}_2$, $\hat{\beta}_3$) = 0.004464).
d) Test the overall significance of this model.
Exercise 6.35 Using a sample for the period 1954-2009 (file electsp), the following model
was estimated to explain the electricity consumption in Spain (conselec):
$$\widehat{\ln(conselec_t)} = \underset{(0.46)}{-9.98} + \underset{(0.035)}{1.469}\ln(gdp_t)$$
d) Using the HAC standard errors, test the significance of the coefficient on
unempl.
Exercise 6.37 It is important to remark that the Phillips curve is a relative relationship.
Inflation is considered low or high relative to the expected rate of inflation and
unemployment is considered low or high relative to the so-called natural rate of
unemployment. In the augmented Phillips curve this is taken into account:
$$inf_t - inf^e_{t/t-1} = \beta_2(unempl_t - \lambda_0) + u_t$$
where $\lambda_0$ is the natural rate of unemployment and $inf^e_{t/t-1}$ is the expected rate of inflation
for t formed in t-1. If we consider that the expected inflation for t is equal to the inflation
in t-1 ($inf^e_{t/t-1} = inf_{t-1}$) and $\beta_1 = -\beta_2\lambda_0$, the augmented Phillips curve can be written as:
$$inf_t - inf_{t-1} = \beta_1 + \beta_2 unempl_t + u_t$$
a) Using file phillipsp, estimate the above model.
b) Interpret the coefficient on unempl.
c) Test whether there is second order autocorrelation.
d) Test whether the natural rate of unemployment is greater than 10.
Appendix 6.1
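First we are going to express $\tilde{\beta}_2$ taking into account that y is generated by the
model (6-8):
$$\begin{aligned}
\tilde{\beta}_2 &= \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} = \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,y_i}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)(\beta_1+\beta_2 x_{2i}+\beta_3 x_{3i}+u_i)}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \beta_2\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{2i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} + \beta_3\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} + \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,u_i}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
\end{aligned}$$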
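$$\begin{aligned}
E(\tilde{\beta}_2) &= \beta_2\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{2i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} + \beta_3\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} + \frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,E(u_i \mid x_2, x_3)}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2} \\
&= \beta_2 + \beta_3\frac{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)\,x_{3i}}{\sum_{i=1}^{n}(x_{2i}-\bar{x}_2)^2}
\end{aligned} \qquad (6\text{-}72)$$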