Econometrics I - Lecture 6 (Wooldridge)
ECONOMETRICS I
National Economics University
2020
MA Hai Duong
MULTIPLE REGRESSION: FUNCTIONAL FORM, MODEL SELECTION, PREDICTION
RECAP.
We have studied the multiple regression model and learnt:
1. to express it for a single observation and, using matrix form, for all n observations;
2. the OLS estimator and its derivation in matrix form;
3. the assumptions needed for the OLS estimator to be
3.1. an unbiased estimator,
3.2. the best linear unbiased estimator,
3.3. normally distributed;
4. to interpret the parameters of a regression model;
5. to test a simple hypothesis about a single parameter;
6. to perform a joint test of multiple linear restrictions, and in particular to test the overall significance of a model;
7. to test a hypothesis involving a linear combination of parameters.
LECTURE OUTLINE
The world is non-linear: how useful can a linear model be? (textbook reference 2-4b, 6-2b, 6-2c)
The interpretation of parameters in log-level, level-log and log-log models
Transformation of persistent time series data (textbook reference 11-3b)
Model selection: the adjusted R-squared (textbook reference section 6.3) and other model selection criteria
Specification error: the Ramsey RESET test
Confidence interval for the conditional mean and prediction interval for the target (textbook reference 6.4)
IS A LINEAR MODEL USEFUL IN
A NON-LINEAR WORLD?
We all feel that the world is non-linear. Our speed in learning (or any other activity) accelerates as we grow up, reaches a peak and eventually goes downhill. How good is a linear model in this non-linear world?
But the linear regression model only needs to be linear in the parameters.
The variables $y$ and $x_1$ to $x_k$ can be non-linear transformations of observed variables.
It is quite common that $y$ and some of the $x$ variables are logarithms of observed variables, and that some $x$ variables are quadratic functions of measured variables.
Example: recall the wage example.
MODELS INVOLVING
LOGARITHMS: LOG-LEVEL
This is not satisfactory because it predicts that regardless of what
your wage currently is, an extra year of schooling will add $42.06 to
your wage.
It is more realistic to assume that it adds a constant percentage to your wage, not a constant dollar amount.
How can we incorporate this in the model?
Regress the logarithm of $y$ on the $x$ variables:
$\log(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$
Note: $100\,\beta_j$ is (approximately) the percentage change in predicted $y$ as $x_j$ increases by 1 unit, keeping the other $x$ variables constant.
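As an illustration, here is a minimal sketch of a log-level wage regression with statsmodels; the DataFrame `df` with columns `wage`, `educ` and `exper` is simulated, a hypothetical stand-in for the lecture's wage data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the lecture's wage data (hypothetical values)
rng = np.random.default_rng(0)
n = 500
educ = rng.integers(8, 21, n)
exper = rng.integers(0, 40, n)
wage = np.exp(0.5 + 0.08 * educ + 0.01 * exper + rng.normal(0, 0.4, n))
df = pd.DataFrame({"wage": wage, "educ": educ, "exper": exper})

# Log-level model: log(wage) on educ and exper
res = smf.ols("np.log(wage) ~ educ + exper", data=df).fit()
b_educ = res.params["educ"]

# 100*beta is the approximate % change in wage per extra year of education
print(f"One more year of education changes predicted wage by about {100 * b_educ:.2f}%")
```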
MODELS INVOLVING
LOGARITHMS: LOG-LEVEL
In our example, we use the natural logarithm of wage as the dependent variable.
MODELS INVOLVING LOGARITHMS: LOG-LOG
Let's revisit the CEO salary equation, in which salary, sales and market value enter in logarithms.
The coefficient of $\log(sales)$: in firms with the exact same market valuation and with CEOs who have the same level of experience, a 1% increase in sales increases the predicted CEO salary by 0.16%.
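The interpretation follows from the general log-log form, in which the slope coefficient is an elasticity:
\[
\log(y) = \beta_0 + \beta_1 \log(x_1) + \cdots + u
\quad\Rightarrow\quad
\beta_1 = \frac{\partial \log y}{\partial \log x_1} \approx \frac{\%\Delta y}{\%\Delta x_1},
\]
so with $\hat{\beta}_1 = 0.16$, a 1% increase in sales predicts a 0.16% increase in salary, other regressors held constant.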
CONSIDERATIONS FOR USING
LEVELS OR LOGARITHMS
1. A variable must have a strictly positive range to be a candidate for
logarithmic transformation.
2. Thinking about the problem: does it make sense that a unit change in $x$ leads to a constant change in the magnitude of $y$, or to a constant % change in $y$?
3. Looking at the scatter plot, if there is only one $x$.
4. Explanatory variables that are measured in years, such as years of
education, experience or age, are not logged.
5. Variables that are already in percentages (such as an interest rate or a tax rate) are not logged. A unit change in such a variable is already a change of one percentage point.
6. If a variable is positively skewed (like income or wealth), taking logarithms
makes its distribution less skewed.
Pay careful attention to Section 6-2a (page 171).
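As a quick illustration of point 6, here is a hedged sketch showing that logging a positively skewed variable reduces its skewness; the "income" data below are simulated (log-normal), not real data.

```python
import numpy as np
from scipy.stats import skew

# Simulated positively skewed "income" data, purely illustrative
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10, sigma=1, size=10_000)

print(f"skewness of income:      {skew(income):.2f}")         # strongly positive
print(f"skewness of log(income): {skew(np.log(income)):.2f}")  # much closer to zero
```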
TRANSFORMATION OF
PERSISTENT TIME SERIES DATA
A number of economic and financial series, such as interest rates, foreign exchange rates and asset price series, tend to be highly persistent.
This means that the past heavily affects the future (but not vice versa).
A time series can be subject to different types of persistence (deterministic or stochastic).
A common feature of persistence is lack of mean-reversion. This is evident from visual inspection of a line chart of the time series.
EMPIRICAL EXAMPLE
E.g., below is displayed the Standard & Poor's Composite Price Index from January 1985 to July 2017 (monthly observations).
COMPONENTS OF A TIME SERIES
[Charts illustrating the components of a time series]
TRANSFORMATION OF
PERSISTENT TIME SERIES DATA
In such cases the researcher transforms the time series by taking differences over the preceding period.
The transformed series is then easier to handle and has more attractive statistical properties.
More precisely, assume that the S&P price index at time $t$ is denoted by $P_t$.
The said log differencing is expressed as:
$r_t = \log(P_t) - \log(P_{t-1}) = \log\left(\frac{P_t}{P_{t-1}}\right)$
TRANSFORMATION OF
PERSISTENT TIME SERIES DATA
By applying this log-difference transformation to our S&P 500 price series, we obtain the S&P 500 returns (sometimes called log returns).
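A minimal sketch of this transformation in Python, assuming `prices` is a pandas Series of monthly index levels; the series below is simulated, since the actual S&P data file is not part of these notes.

```python
import numpy as np
import pandas as pd

# Simulated monthly price index as a stand-in for the S&P series
rng = np.random.default_rng(2)
dates = pd.date_range("1985-01", "2017-07", freq="MS")
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.005, 0.04, len(dates)))),
                   index=dates, name="price")

# Log returns: r_t = log(P_t) - log(P_{t-1})
log_returns = np.log(prices).diff().dropna()
print(log_returns.head())
```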
OTHER NON-LINEAR
MODELS: QUADRATIC TERMS
We can have $x$ as well as $x^2$ in a multiple regression model:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$
EXAMPLES OF THE
QUADRATIC MODEL
Sleep and age: predicting how long women sleep from their age and education level. Data: age, years of education, and minutes slept in a week, recorded by women who participated in a survey.
Keeping education constant, the predicted sleep reaches its minimum at the age $age^* = -\hat{\beta}_1/(2\hat{\beta}_2)$, where $\hat{\beta}_1$ and $\hat{\beta}_2$ are the estimated coefficients of age and age squared (see the derivation below).
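The turning-point formula comes from setting the derivative of the fitted quadratic to zero:
\[
\widehat{sleep} = \hat{\beta}_0 + \hat{\beta}_1\, age + \hat{\beta}_2\, age^2 + \cdots
\quad\Rightarrow\quad
\frac{\partial\, \widehat{sleep}}{\partial\, age} = \hat{\beta}_1 + 2\hat{\beta}_2\, age = 0
\quad\Rightarrow\quad
age^* = -\frac{\hat{\beta}_1}{2\hat{\beta}_2}.
\]
This is a minimum when $\hat{\beta}_2 > 0$ (a U-shape) and a maximum when $\hat{\beta}_2 < 0$ (an inverted U).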
House price and distance to the nearest train station: Data: price (000$s), area (m²), number of bedrooms and distance from the train station (km) for 120 houses sold in a suburb of Melbourne in a certain month.
CONSIDERATIONS FOR
USING A LINEAR OR A
QUADRATIC MODEL
1. Thinking about the problem: is a unit increase in $x$ likely to lead to a constant change in $y$ for all values of $x$, or is it likely to lead to a change that is increasing or decreasing in $x$?
2. Is there an optimal or peak level of $x$ for $y$? Example: wage and age, house price and distance to the train station.
3. If there is only one $x$, looking at the scatter plot can give us insights.
4. In multiple regression, there are tests that we can use to check the specification of the functional form (the RESET test, to be covered later).
5. When in doubt, we can add the quadratic term and check its statistical significance, or see if it improves the adjusted R-squared.
MODEL SELECTION CRITERIA
Parsimony is very important in predictive analytics (which includes
forecasting). You may have heard about the KISS principle. If not,
google it! (Keep It Simple Stupid)
We want models that have predictive power, but are as
parsimonious as possible
We cannot use R-squared to select models, because R-squared always increases as we make the model bigger, even when we add irrelevant and insignificant predictors (for nested models).
One can use t-statistics and drop insignificant predictors, but when there are many predictors, and several of them are insignificant, the model that we end up with depends on which predictor we drop first.
MODEL SELECTION CRITERIA
Model selection criteria are designed to help us select among competing models.
All model selection criteria balance the (lack of) fit of the model (given by its sum of squared residuals) against the size of the model (given by the number of parameters).
These criteria can be used when modelling time series data as well.
There are many model selection criteria, differing in the penalty that they place on the lack of parsimony.
MODEL SELECTION CRITERIA
Criterion and formula (with $n$ observations, $k$ regressors and SSR the sum of squared residuals):
Adjusted R-squared (also known as $\bar{R}^2$): $\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}$
Akaike information criterion: $AIC = \log(SSR/n) + \frac{2(k+1)}{n}$
Hannan-Quinn criterion: $HQ = \log(SSR/n) + \frac{2(k+1)\log(\log n)}{n}$
Schwarz (Bayesian) information criterion: $BIC = \log(SSR/n) + \frac{(k+1)\log n}{n}$
Remember that with BIC, HQ or AIC, we choose the model with the smallest value for the criterion, whereas with $\bar{R}^2$, we choose the model with the largest.
Different software packages may report different values for the same criterion. That is because some include additive constants (such as those involving $\log 2\pi$ in the Gaussian log-likelihood) and some don't. The outcome of the model selection exercise does not depend on these constants, so regardless of the software, the final results should be the same.
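A hedged sketch of these formulas in Python, using the SSR-based versions above; as noted, software such as EViews may add likelihood constants, shifting the levels but not the ranking.

```python
import numpy as np

def selection_criteria(ssr: float, sst: float, n: int, k: int) -> dict:
    """SSR-based model selection criteria for a model with k regressors
    (plus an intercept) fitted on n observations."""
    adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
    aic = np.log(ssr / n) + 2 * (k + 1) / n
    hq  = np.log(ssr / n) + 2 * (k + 1) * np.log(np.log(n)) / n
    bic = np.log(ssr / n) + (k + 1) * np.log(n) / n
    return {"adj_R2": adj_r2, "AIC": aic, "HQ": hq, "BIC": bic}

# Example: compare a small and a large model (illustrative numbers)
print(selection_criteria(ssr=950.0, sst=1000.0, n=200, k=2))
print(selection_criteria(ssr=940.0, sst=1000.0, n=200, k=6))
```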
MODEL SELECTION CRITERIA: AN EXAMPLE
Example: making an app to predict birth weight using CIGS (the number of cigarettes smoked per day while the mother was pregnant), FAMINC (family income) and MOTHEDUC (the mother's years of schooling). Data file: bwght.wf1

Predictors   R-squared   Adjusted R-squared   AIC        HQ         SC
CIGS         0.022729    0.022024             8.843598   8.846420   8.851142
FAMINC       0.011867    0.011154             8.854651   8.857473   8.862195
MOTHEDUC     0.004779    0.004060             8.862284   8.865107   8.869832
C, F         0.029805    0.028404             8.837772   8.842005   8.849089
C, M         0.024203    0.022793             8.844015   8.848250   8.855338
F, M         0.012265    0.010838             8.856175   8.860410   8.867498
C, F, M      0.029774    0.027670             8.839731   8.845378   8.854828

All criteria agree here: the model with CIGS and FAMINC has the largest adjusted R-squared and the smallest AIC, HQ and SC.
SPECIFICATION ERROR: THE RAMSEY RESET TEST
Recall the zero conditional mean assumption: if this assumption is violated, then the OLS estimators are biased. That is why checking that this assumption is satisfied is one criterion in model selection.
Consider the multiple regression model
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$, where $E(u \mid x_1, \dots, x_k) = 0$ (1)
Assume an important explanatory variable Z (maybe more than one variable) is omitted from the model above; then the model is misspecified, leading to biased OLS estimators.
• If data on the Z variables can be collected, add these variables to (1), then use a t-test or F-test to check whether they belong in the model.
• If data on the Z variables cannot be obtained, the model builder uses proxy variables (powers of the predicted dependent variable $\hat{y}$, such as $\hat{y}^2$ and $\hat{y}^3$), then follows the same testing procedure.
SPECIFICATION ERROR: THE RAMSEY RESET TEST
Ramsey Regression Equation Specification Error Test:
Initial model:
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$, where $E(u \mid x_1, \dots, x_k) = 0$ (1)
Estimate (1) by OLS, obtain the fitted values $\hat{y}$, and generate $\hat{y}^2$ and $\hat{y}^3$.
Auxiliary regression:
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + error$
Pair of hypotheses:
$H_0: \delta_1 = \delta_2 = 0$ (no functional form misspecification) versus $H_1$: at least one of $\delta_1, \delta_2$ differs from zero.
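A minimal sketch of the RESET procedure in Python with statsmodels; the data are simulated with a deliberately omitted quadratic term, so the test should reject. (statsmodels also ships a ready-made `linear_reset` diagnostic; the manual version below follows the steps above.)

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: the true relationship is quadratic,
# but the initial model (1) is estimated as linear
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 10, n)
y = 1 + 2 * x - 0.15 * x**2 + rng.normal(0, 1, n)
df = pd.DataFrame({"y": y, "x": x})

# Step 1: estimate the initial model and obtain fitted values
base = smf.ols("y ~ x", data=df).fit()
df["yhat2"] = base.fittedvalues ** 2
df["yhat3"] = base.fittedvalues ** 3

# Step 2: auxiliary regression with powers of the fitted values
aux = smf.ols("y ~ x + yhat2 + yhat3", data=df).fit()

# Step 3: F-test of H0: delta1 = delta2 = 0
print(aux.f_test("yhat2 = 0, yhat3 = 0"))  # small p-value => misspecification
```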
RAMSEY RESET TEST ON EVIEWS
[EViews screenshots showing how to run the Ramsey RESET test and read its output]
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
Example: birth weight app (bwght.wf1)
We want to make a birth weight app: users enter a pregnant woman's information (number of cigarettes per day while pregnant, family income and her years of education), and she can get a prediction of the baby's birth weight by herself on this app. We have data on 1,388 women after they gave birth. After we estimate the model of birth weight (BW) conditional on the number of cigarettes (CIGS) and family income (FAMINC), we can plug in any pregnant woman's CIGS and FAMINC and get a point prediction for her newborn baby's weight.
Here is our estimated model:
$\widehat{BW} = \hat{\beta}_0 + \hat{\beta}_1\, CIGS + \hat{\beta}_2\, FAMINC$
The prediction of BW for a woman with CIGS = 10 cigs/day and FAMINC = 10 thousand USD is obtained by plugging these values into the estimated equation.
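A hedged sketch of the point prediction in Python; since bwght.wf1 is an EViews workfile, assume its contents have been exported to a CSV named `bwght.csv` with columns `bwght`, `cigs` and `faminc` (hypothetical file name).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumes the EViews workfile has been exported to CSV (hypothetical path)
df = pd.read_csv("bwght.csv")

# Estimate BW conditional on CIGS and FAMINC
res = smf.ols("bwght ~ cigs + faminc", data=df).fit()

# Point prediction for a woman with 10 cigs/day and family income of 10 ($000s)
new = pd.DataFrame({"cigs": [10], "faminc": [10]})
print(res.predict(new))
```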
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
Or we can use our knowledge of the geometry of OLS to trick the computer with a suitable reparameterization. For example, for CIGS = 10 and FAMINC = 10, regress BW on (CIGS - 10) and (FAMINC - 10): the intercept of this reparameterized regression equals the predicted BW at those values, and its reported standard error is the standard error of that prediction.
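Continuing the sketch above, the same centering trick in Python (same assumed `bwght.csv` export):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("bwght.csv")  # same assumed export as before

# Center the regressors at the prediction point (CIGS = 10, FAMINC = 10)
df["cigs_c"] = df["cigs"] - 10
df["faminc_c"] = df["faminc"] - 10

res = smf.ols("bwght ~ cigs_c + faminc_c", data=df).fit()

# The intercept is the predicted BW at CIGS = 10, FAMINC = 10, and its
# standard error measures the estimation uncertainty of that prediction
print(res.params["Intercept"], res.bse["Intercept"])
```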
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
The advantage of this is that we also get the standard error, which shows how well $\hat{y}^0$ estimates the conditional mean $E(y \mid x = x^0)$. This error is due to estimation of $\beta$, so it can be called "estimation uncertainty".
There are two sources of error that make our prediction uncertain.
1. Estimation uncertainty, caused by not knowing the values of the true parameters. This uncertainty gets smaller with the sample size and is often ignored when the sample size is large.
2. The more important source of uncertainty is the error term $u$, which is not predictable by our predictors even if we knew the true values of $\beta$. This one is independent of the sample size.
In the BW example, the first is much smaller than the second.
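Combining the two sources gives the standard error of the prediction error and hence the prediction interval (a standard construction, as in Wooldridge Section 6-4):
\[
se(\hat{e}^0) = \sqrt{\,se(\hat{y}^0)^2 + \hat{\sigma}^2\,},
\qquad
\text{95\% prediction interval: } \hat{y}^0 \pm t_{0.025,\,n-k-1}\cdot se(\hat{e}^0).
\]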
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
From the output we read the predicted value and its standard error (there is a small difference from the previous manual calculation due to rounding).
COMPUTER EXERCISES
C2 (page 199)
C4, C6 (page 200)
C10 (page 201)
Extra request: use bwght.wf1 to make a prediction of the birth weight of a woman's baby given cigs = 20 and faminc = 20.
THANK YOU FOR YOUR ATTENDANCE - Q & A