Econometrics I - Lecture 6 (Wooldridge)

The document outlines the concepts and techniques of multiple regression analysis, including model selection, specification-error testing, and the use of logarithmic transformations. It discusses the importance of understanding the functional form of models and provides criteria for selecting appropriate models based on parsimony and predictive power. Additionally, it covers the Ramsey Regression Equation Specification Error Test (RESET) and the distinction between confidence intervals for conditional means and prediction intervals.


BFI Program

ECONOMETRICS I
National Economics
University
2020
MA Hai Duong
MULTIPLE REGRESSION:
FUNCTIONAL
FORM, MODEL SELECTION,
PREDICTION
RECAP.
We have studied the multiple regression model and learnt:
1. to express it for a single observation and, using matrix form, for all $n$ observations;
2. the OLS estimator and its derivation in matrix form (see the numerical sketch after this list)
3. the assumptions needed for the OLS estimator to be
3.1. an unbiased estimator
3.2. the best linear unbiased estimator
3.3. normally distributed
4. to interpret the parameters of a regression model
5. to test a simple hypothesis about a single parameter
6. to perform a joint test of multiple linear restrictions, and in particular
testing the overall significance of a model
7. to test a hypothesis involving a linear combination of parameters
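To make point 2 concrete, here is a minimal numerical sketch of the OLS estimator in matrix form, $\hat\beta = (X'X)^{-1}X'y$, in Python with simulated data (the data and coefficient values are illustrative, not from the course files):

```python
import numpy as np

# Simulate a small dataset: y = 1.0 + 0.5*x1 - 0.3*x2 + u
rng = np.random.default_rng(0)
n = 200
x1, x2, u = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u

# Stack the design matrix X (n rows, k+1 columns, first column of ones)
X = np.column_stack([np.ones(n), x1, x2])

# OLS in matrix form: beta_hat = (X'X)^{-1} X'y (solve avoids an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [1.0, 0.5, -0.3]
```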
LECTURE OUTLINE
The world is non-linear: How useful can a linear model be? (textbook
reference 2-4b, 6-2b, 6-2c)
The interpretation of parameters in log-level, level-log and log-log
models
Transformation of persistent time series data (textbook reference 11-3b)
Model selection: the adjusted R-Squared (textbook reference section 6.3)
and other model selection criteria
Specification error: the Ramsey RESET test
Confidence interval for the conditional mean and prediction interval for
the target (textbook reference 6.4)
IS A LINEAR MODEL USEFUL IN
A NON-LINEAR WORLD?
We all feel that the world is non-linear. Our speed in learning (or
any other activity) accelerates as we grow up, gets to a peak and
goes downhill eventually. How good is a linear model in this non-
linear world?
But the linear regression model only needs to be linear in parameters.
The variables $y$ and $x_1$ to $x_k$ can be non-linear transformations of observed variables.
It is quite usual that $y$ and some of the $x$ variables are logarithms of observed variables, and some $x$ variables can be quadratic functions of measured variables.
Example: Recall the wage example, in which the estimated coefficient on educ was 42.06 (dollars of wage per additional year of schooling).
MODELS INVOLVING
LOGARITHMS: LOG-LEVEL
This is not satisfactory because it predicts that regardless of what
your wage currently is, an extra year of schooling will add $42.06 to
your wage.
It is more realistic to assume that it adds a constant percentage to
your wage, not a constant dollar amount
How can we incorporate this in the model?
Logarithm of $y$ on $x$:
$$\log(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$
Note: $100\cdot\beta_1$ is (approximately) the percentage change in predicted $y$ as $x_1$ increases by 1 unit, keeping the other regressors constant.

MODELS INVOLVING
LOGARITHMS: LOG-LEVEL
In our example, we use the natural logarithm of wage as the dependent variable:
$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 IQ + u$$
Holding $IQ$ (and $u$) fixed,
$$\Delta \log(wage) = \beta_1\, \Delta educ$$
Or
$$\%\Delta wage \approx (100\cdot\beta_1)\,\Delta educ$$
Use a result from calculus:
$$100\cdot\Delta\log(y) \approx \%\Delta y \quad \text{for small } \Delta y$$
([A.23] Appendix A - page 637)
Keep in mind that [A.23] is only correct for small changes in $y$.
MODELS INVOLVING
LOGARITHMS: LOG-LEVEL
This leads to a simple interpretation of $\beta_1$:
$$\%\Delta wage \approx (100\cdot\beta_1)\,\Delta educ, \quad \text{holding } IQ \text{ constant}$$
If we do not multiply by 100, we have the decimal version (the proportionate change).
In this example, $\beta_1$ is often called the return to education (just like the return on an investment). This measure is free of the units of measurement of wage (currency, price level).
MODELS INVOLVING
LOGARITHMS: LOG-LEVEL
Let's revisit the wage equation, now with log(wage) as the dependent variable.

These results tell us that ...

Wage increases by $100\cdot\hat\beta_{educ}$ percent, on average, for every additional year of education, holding IQ constant.
Wage increases by $100\cdot\hat\beta_{IQ}$ percent, on average, for every additional IQ point, holding the number of years of education constant.
Warning: this R-squared is not directly comparable to the R-squared when wage is the dependent variable. We can only compare the R-squared of two models if they have the same dependent variable. The total variation (SST) in $wage$ and in $\log(wage)$ are completely different.
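As an illustration of the log-level model, here is a minimal sketch in Python with statsmodels; the simulated data merely stand in for the textbook's wage data, so only the variable names (wage, educ, IQ) come from the example:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data standing in for the wage example (educ in years, IQ in points)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"educ": rng.integers(9, 19, n).astype(float),
                   "IQ": rng.normal(100, 15, n)})
df["wage"] = np.exp(1.5 + 0.06 * df["educ"] + 0.005 * df["IQ"]
                    + rng.normal(0, 0.3, n))

# Log-level model: 100*beta_educ ~ % change in wage per extra year of education
res = smf.ols("np.log(wage) ~ educ + IQ", data=df).fit()
print(f"Return to education: about {100 * res.params['educ']:.1f}% per year, holding IQ fixed")
```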
MODELS INVOLVING
LOGARITHMS: LEVEL-LOG
We can use a logarithmic transformation of $x$ as well.
$y$ on log of $x$:
$$y = \beta_0 + \beta_1 \log(x_1) + \dots + u$$
Note: $\beta_1/100$ is (approximately) the change in predicted $y$ as $x_1$ increases by 1%, keeping the other regressors constant.
Example: The context: determining the effect of cigarette smoking during pregnancy on the health of babies. Data: birth weight in kg, family income in $s, mother's education in years, and the number of cigarettes smoked per day by the mother during pregnancy.
The coefficient of $\log(faminc)$: Consider newborn babies whose mothers have the same level of education and the same smoking habits. Every one-percent increase in family income increases the predicted birth weight by 0.0005 kg = 0.5 g.
MODELS INVOLVING
LOGARITHMS: LOG-LOG
Log of $y$ on log of $x$:
$$\log(y) = \beta_0 + \beta_1 \log(x_1) + \dots + u$$
Note: $\beta_1$ is (approximately) the percentage change in predicted $y$ as $x_1$ increases by 1%, keeping the other regressors constant. The estimate in this case is also called the estimated elasticity of $y$ with respect to $x_1$, all else constant.
Example 6.7 in the textbook: Predicting CEO salaries based on sales, the market value of the firm (mktval) and the years the CEO has been in his/her current position (tenure):
The coefficient of $\log(sales)$: In firms with the exact same market valuation and CEOs who have the same level of experience, a 1% increase in sales increases the predicted CEO salary by 0.16%.
CONSIDERATIONS FOR USING
LEVELS OR LOGARITHMS
1. A variable must have a strictly positive range to be a candidate for logarithmic transformation.
2. Thinking about the problem: does it make sense that a unit change in $x$ leads to a constant change in the magnitude of $y$, or to a constant % change in $y$?
3. Looking at the scatter plot, if there is only one $x$.
4. Explanatory variables that are measured in years, such as years of education, experience or age, are not logged.
5. Variables that are already in percentages (such as an interest rate or tax rate) are not logged. A unit change in these variables already is a one-percentage-point change.
6. If a variable is positively skewed (like income or wealth), taking logarithms makes its distribution less skewed.
Pay careful attention to Section 6-2a (page 171).
TRANSFORMATION OF
PERSISTENT TIME SERIES DATA
A number of economic and financial series, such as interest rates, foreign exchange rates and asset price series, tend to be highly persistent.
This means that the past heavily affects the future (but not vice versa).
A time series can be subject to different types of persistence (deterministic or stochastic).
A common feature of persistence is a lack of mean-reversion. This is evident from visual inspection of a line chart of the time series.

EMPIRICAL EXAMPLE
E.g., displayed below is the Standard and Poor's Composite Price Index from January 1985 to July 2017 (monthly observations).

COMPONENTS OF A TIME-
SERIES

TRANSFORMATION OF
PERSISTENT TIME SERIES DATA
In such cases the researcher transforms the time series by differencing over the preceding period.
The transformed series is then easier to handle and has more attractive statistical properties.
More precisely, assume that the S&P price index at time $t$ is denoted by $P_t$.
The said log differencing is expressed as:
$$r_t = \log(P_t) - \log(P_{t-1}) \approx \frac{P_t - P_{t-1}}{P_{t-1}}$$
where $r_t$ denotes the (net) return from $t-1$ to $t$ (the approximation holds for small changes).
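A minimal sketch of this log-differencing in Python; the short price series below is illustrative (in practice one would load the monthly S&P 500 index here):

```python
import numpy as np
import pandas as pd

# Illustrative price levels P_t; replace with the actual S&P 500 series
price = pd.Series([100.0, 102.0, 101.0, 105.0, 108.0])

log_return = np.log(price).diff()    # r_t = log(P_t) - log(P_{t-1})
simple_return = price.pct_change()   # (P_t - P_{t-1}) / P_{t-1}

# For small changes the two are nearly identical, as the approximation states
print(pd.DataFrame({"log-return": log_return, "simple return": simple_return}))
```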
TRANSFORMATION OF
PERSISTENT TIME SERIES DATA
By applying this log-differencing to our S&P 500 price series, we obtain the S&P 500 returns (sometimes called log-returns).

OTHER NON-LINEAR
MODELS: QUADRATIC TERMS
We can have $x$ as well as $x^2$ in a multiple regression model:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$$
In this model:
$$\Delta\hat{y} \approx (\hat\beta_1 + 2\hat\beta_2 x)\,\Delta x$$
that is, the change in predicted $y$ as $x$ increases depends on $x$.
Here, the coefficients of $x$ and $x^2$ on their own do not have meaningful interpretations.
The predicted $y$ has a turning point at $x^* = -\hat\beta_1/(2\hat\beta_2)$. At this level of $x$ the predicted $y$ is at its maximum if $\hat\beta_2 < 0$, and it is at its minimum if $\hat\beta_2 > 0$.
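A minimal sketch of the quadratic model and its turning point $x^* = -\hat\beta_1/(2\hat\beta_2)$ in Python, using simulated data whose true peak is at $x = 6$:

```python
import numpy as np
import statsmodels.api as sm

# Simulate y = 2 + 3x - 0.25x^2 + u, which peaks at x = 3/(2*0.25) = 6
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=300)
y = 2 + 3 * x - 0.25 * x**2 + rng.normal(size=300)

# Regress y on x and x^2 (plus an intercept)
X = sm.add_constant(np.column_stack([x, x**2]))
res = sm.OLS(y, X).fit()
b1, b2 = res.params[1], res.params[2]

x_star = -b1 / (2 * b2)  # turning point of the fitted quadratic
print(f"x* = {x_star:.2f} ({'maximum' if b2 < 0 else 'minimum'}, since b2 = {b2:.3f})")
```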
EXAMPLES OF THE
QUADRATIC MODEL
Sleep and age: Predicting how long women sleep from their age and education level. Data: age, years of education, and minutes slept in a week, recorded by women who participated in a survey.
Keeping education constant, the predicted sleep reaches its minimum at the age:
House price and distance to the nearest train station: Data: price ($000s), area (m²), number of bedrooms and distance from the train station (km) for 120 houses sold in a suburb of Melbourne in a certain month:
CONSIDERATIONS FOR
USING A LINEAR OR A
QUADRATIC MODEL
1. Thinking about the problem: is a unit increase in $x$ likely to lead to a constant change in $y$ for all values of $x$, or is it likely to lead to a change that is increasing or decreasing in $x$?
2. Is there an optimal or peak level of $x$ for $y$? Examples: wage and age, house price and distance to the train station.
3. If there is only one $x$, looking at a scatter plot can give us insights.
4. In multiple regression, there are tests that we can use to check the specification of the functional form (the RESET test, to be covered later).
5. When in doubt, we can add the quadratic term and check its statistical significance, or see if it improves the adjusted R².
MODEL SELECTION CRITERIA
Parsimony is very important in predictive analytics (which includes forecasting). You may have heard of the KISS principle. If not, google it! (Keep It Simple, Stupid)
We want models that have predictive power but are as parsimonious as possible.
We cannot use R² to select models, because R² always increases as we make the model bigger, even when we add irrelevant and insignificant predictors (for nested models).
One can use t-stats and drop insignificant predictors, but when there are many predictors, and several of them are insignificant, the model that we end up with depends on which predictor we drop first.
MODEL SELECTION CRITERIA
Model selection criteria are designed to help us with selecting
among competing models
All model selection criteria balance the (lack of) fit of the model
(given by its sum of squared residuals) with the size of the model
(given by the number of parameters)
These criteria can be used when modelling time series data as well
There are many model selection criteria, differing in the penalty that they place on the lack of parsimony.

MODEL SELECTION CRITERIA
Criterion and formula:
Adjusted $R^2$ (also known as $\bar{R}^2$): $\bar{R}^2 = 1 - \dfrac{SSR/(n-k-1)}{SST/(n-1)}$
Akaike Information Criterion (AIC): $AIC = c_1 + \ln(SSR/n) + \dfrac{2(k+1)}{n}$
Hannan-Quinn Criterion (HQ): $HQ = c_1 + \ln(SSR/n) + \dfrac{2(k+1)\ln(\ln n)}{n}$
Schwarz or Bayesian Information Criterion (SIC or BIC): $BIC = c_2 + \ln(SSR/n) + \dfrac{(k+1)\ln n}{n}$

$c_1$ and $c_2$ are constants that do not depend on the fit or the number of parameters, so they play no important role. ln is the natural logarithm. Also, all models are assumed to include an intercept.
MODEL SELECTION CRITERIA
BIC gives the largest penalty to lack of parsimony, i.e., if we use BIC to select among models, the model we end up with will be the same as or smaller than the model we end up with if we use any of the other criteria.
The ordering of the penalties that each criterion places on parsimony relative to fit is (for $n \ge 16$):
$$\text{penalty}_{BIC} > \text{penalty}_{HQ} > \text{penalty}_{AIC}$$
Remember that with BIC, HQ or AIC, we choose the model with the smallest value of the criterion, whereas with $\bar{R}^2$, we choose the model with the largest.
Different software packages may report different values for the same criterion. That is because some include the constants $c_1$ and $c_2$ and some don't. The outcome of the model selection exercise does not depend on these constants, so regardless of the software, the final results should be the same.
MODEL SELECTION
CRITERIA: AN EXAMPLE
Example: Making an app to predict birth weight using CIGS (the number of cigarettes smoked per day during pregnancy), FAMINC (family income) and MOTHEDUC (the mother's years of schooling). Data file: bwght.wf1

Predictors   R-squared   Adjusted R-squared   AIC        HQ         SC
CIGS         0.022729    0.022024             8.843598   8.846420   8.851142
FAMINC       0.011867    0.011154             8.854651   8.857473   8.862195
MOTHEDUC     0.004779    0.004060             8.862284   8.865107   8.869832
C, F         0.029805    0.028404             8.837772   8.842005   8.849089
C, M         0.024203    0.022793             8.844015   8.848250   8.855338
F, M         0.012265    0.010838             8.856175   8.860410   8.867498
C, F, M      0.029774    0.027670             8.839731   8.845378   8.854828

(C = CIGS, F = FAMINC, M = MOTHEDUC; SC denotes the Schwarz Criterion, i.e., BIC)
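The criteria in the table can be reproduced with a short Python helper. This is a sketch under the assumption that the SSR-based formulas above are used with the constants $c_1$, $c_2$ omitted, so the values may differ from EViews output by a constant (which does not affect the ranking of models):

```python
import numpy as np
import statsmodels.api as sm

def selection_criteria(res):
    """Adjusted R2, AIC, HQ and BIC from a fitted statsmodels OLS result,
    using the SSR-based formulas (additive constants omitted)."""
    n = res.nobs
    p = res.df_model + 1  # number of parameters, including the intercept
    base = np.log(res.ssr / n)
    return {"adj_R2": res.rsquared_adj,
            "AIC": base + 2 * p / n,
            "HQ":  base + 2 * p * np.log(np.log(n)) / n,
            "BIC": base + p * np.log(n) / n}

# Usage: fit each candidate model, e.g. res = sm.OLS(y, sm.add_constant(X)).fit(),
# then compare selection_criteria(res) across models.
```

In the table above, all four criteria agree: the CIGS + FAMINC model has the largest adjusted R-squared and the smallest AIC, HQ and SC.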
SPECIFICATION ERROR: RAMSEY R.E.S.E.T TEST
Recall the Zero Conditional Mean assumption: if this assumption is violated, then the OLS estimators are biased. That is why checking that this assumption is satisfied is one criterion of model selection.
Consider the multiple regression model
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u, \quad \text{where } E(u \mid x_1,\dots,x_k) = 0 \qquad (1)$$
Assume an important explanatory variable $Z$ (maybe more than one variable) is omitted from the model above; then a specification error exists in this model, leading to biased OLS estimators.
• If data on the $Z$ variables can be collected, plug these variables into (1), then use a t-test or F-test to check whether they are indispensable in (1).
• If data on the $Z$ variables cannot be obtained, the model builder uses proxy variables (powers of the predicted dependent variable, such as $\hat{y}^2$ and $\hat{y}^3$), then follows the same testing procedure.
SPECIFICATION ERROR: RAMSEY R.E.S.E.T TEST
RAMSEY Regression Equation Specification Error Test:
Initial model:
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u, \quad \text{where } E(u \mid x_1,\dots,x_k) = 0 \qquad (1)$$
Estimate (1) by OLS, obtain $\hat{y}$, and generate $\hat{y}^2$ and $\hat{y}^3$.
Auxiliary regression:
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + error$$
Pair of hypotheses:
$$H_0:\ \delta_1 = \delta_2 = 0 \ \text{(no specification error)} \qquad H_1:\ \text{at least one } \delta_j \neq 0$$
An F-test is then applied.
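A minimal sketch of this RESET procedure in Python, written out step by step with simulated data (statsmodels also ships a ready-made version, statsmodels.stats.diagnostic.linear_reset):

```python
import numpy as np
import statsmodels.api as sm

# Simulate data whose true relationship is quadratic, then fit a linear model
rng = np.random.default_rng(2)
x = rng.uniform(1, 5, size=200)
y = 1 + x**2 + rng.normal(size=200)

X = sm.add_constant(x)
restricted = sm.OLS(y, X).fit()   # model (1), deliberately misspecified
yhat = restricted.fittedvalues

# Auxiliary regression: add yhat^2 and yhat^3 and F-test their joint significance
X_aux = np.column_stack([X, yhat**2, yhat**3])
unrestricted = sm.OLS(y, X_aux).fit()
f_stat, p_value, _ = unrestricted.compare_f_test(restricted)
print(f"RESET F = {f_stat:.2f}, p-value = {p_value:.4f}")  # small p => reject H0
```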
RAMSEY R.E.S.E.T TEST IN EVIEWS
Initial model, estimated from the data file:

RAMSEY R.E.S.E.T TEST IN EVIEWS
RAMSEY R.E.S.E.T test

RAMSEY R.E.S.E.T TEST IN EVIEWS
RAMSEY R.E.S.E.T test

RAMSEY R.E.S.E.T TEST IN EVIEWS
RAMSEY R.E.S.E.T test

Probability = 0.0173 < significance level = 5%, so we reject the null hypothesis: there is enough evidence to conclude that a specification error exists.
CONFIDENCE INTERVALS FOR
THE CONDITIONAL MEAN
VERSUS PREDICTION
INTERVALS
Remember the population model
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i, \quad \text{for all } i \qquad (1)$$
with the CLM assumptions implying that
$$E(y_i \mid x_{i1},\dots,x_{ik}) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, \quad \text{for all } i \qquad (2)$$
Our estimated regression model provides:
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik}, \quad \text{for all } i \qquad (3)$$
Comparing (3) and (2), we see that $\hat{y}$ gives us the best estimate of the conditional expectation of $y$ given $x_1,\dots,x_k$.
Also, since $u$ is not predictable given $x_1,\dots,x_k$, $\hat{y}$ is also our best prediction for $y$.
CONFIDENCE INTERVALS FOR
THE CONDITIONAL MEAN
VERSUS PREDICTION
INTERVALS
As an estimator of the conditional mean, the error in $\hat{y}$ is only due to estimation uncertainty in $\hat\beta$.
We can compute $se(\hat{y})$ using the estimated variance-covariance matrix of the estimated parameters, or we can get it with a cool trick (shown two slides below).
The 95% confidence interval for $E(y \mid x_1,\dots,x_k)$ is
$$\hat{y} \pm t_{0.025,\,n-k-1}\, se(\hat{y})$$
When predicting $y$ itself, the unobserved error $u$ adds further uncertainty. Therefore, the estimated variance of the prediction error is $se(\hat{y})^2 + \hat\sigma^2$, or
$$se(\hat{e}) = \sqrt{se(\hat{y})^2 + \hat\sigma^2}$$
The 95% prediction interval for $y$ is
$$\hat{y} \pm t_{0.025,\,n-k-1}\, se(\hat{e})$$
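Both intervals can be computed in Python with statsmodels' get_prediction; a sketch with simulated data (all names and values illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Simulate y = 1 + 0.5*x1 - 0.3*x2 + u and fit by OLS
rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=500)
res = sm.OLS(y, X).fit()

# Predict at one point x0 = (1, 0.2, -1.0), intercept included
x0 = np.array([[1.0, 0.2, -1.0]])
frame = res.get_prediction(x0).summary_frame(alpha=0.05)

print(frame[["mean", "mean_ci_lower", "mean_ci_upper"]])  # 95% CI for E(y|x0)
print(frame[["obs_ci_lower", "obs_ci_upper"]])            # 95% prediction interval for y
```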
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
Example: Birth weight app (bwght.wf1)
We want to make a birth weight app: users enter a pregnant woman's information (number of cigarettes per day during pregnancy, family income and her years of education), and she can get a prediction of her baby's birth weight by herself on this app. We have data on 1,388 women, collected after they gave birth. After we estimate the model of birth weight (BW) conditional on the number of cigarettes (CIGS) and family income (FAMINC), we can plug in any pregnant woman's CIGS and FAMINC and get a point prediction for her newborn baby's weight.
Here is our estimated model:
The prediction of BW for a woman with CIGS = 10 cigs/day and FAMINC = 10 thousand USD is
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
Or we can use our knowledge of the geometry of OLS to trick the computer with a suitable reparameterization. For example, for CIGS = 10 and FAMINC = 10, regress BW on $(CIGS - 10)$ and $(FAMINC - 10)$: the estimated intercept of this regression is exactly the point prediction at those values, and its reported standard error is $se(\hat{y})$.
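A sketch of this reparameterization trick in Python: demean each regressor at the chosen values, and the intercept of the new regression is the point prediction, with its standard error equal to $se(\hat{y})$. The simulated data below merely stand in for bwght.wf1:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative stand-in for the bwght data (BW in ounces, FAMINC in $1000s)
rng = np.random.default_rng(4)
cigs = rng.poisson(3, size=1388).astype(float)
faminc = rng.uniform(5, 60, size=1388)
bw = 120 - 0.5 * cigs + 0.1 * faminc + rng.normal(scale=20, size=1388)

# Reparameterize at CIGS = 10, FAMINC = 10: intercept becomes yhat at that point
c0, f0 = 10.0, 10.0
X0 = sm.add_constant(np.column_stack([cigs - c0, faminc - f0]))
res = sm.OLS(bw, X0).fit()

print(f"point prediction at (10, 10): {res.params[0]:.2f}")
print(f"se(yhat), estimation uncertainty only: {res.bse[0]:.3f}")
```

The slope estimates are unchanged by the shift; only the intercept and its standard error are re-centred at the prediction point.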
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
The advantage of this is that we also get the standard error, which shows how well $\hat{y}$ estimates the conditional mean. This error is due to the estimation of $\beta$, so it can be called "estimation uncertainty".
There are two sources of error that make our prediction uncertain:
1. Estimation uncertainty, caused by not knowing the values of the true parameters. This uncertainty gets smaller as the sample size grows, and it is often ignored when the sample size is large.
2. The more important source of uncertainty is $u$, which is not predictable from our predictors even if we knew the true values of $\beta$. This one is independent of the sample size.
In the BW example, the first is considerably smaller than the second.
COMPUTING PREDICTIONS AND
PREDICTION INTERVALS
From the output we obtain $se(\hat{y})$ (slightly different from the previous manual calculation) and $\hat\sigma$.
The 95% prediction interval is
$$\hat{y} \pm t_{0.025}\,\sqrt{se(\hat{y})^2 + \hat\sigma^2}$$
computed in ounces and then converted to kilograms.
In practice, we can use $\hat{y} \pm 2\hat\sigma$ for the 95% prediction interval.
SUMMARY
Modelling non-linear relationships: The linear regression model is only linear in parameters. By using non-linear transformations (such as logarithmic or quadratic) of $y$ or any of the $x$ variables, we can model non-linear relationships with the regression model.
The adjusted R²: We can use $\bar{R}^2$ (and other model selection criteria) to help us choose the best model for predictive analytics.
Point prediction and prediction interval: We explained how to provide a point prediction and a prediction interval for the target variable, given specific values of the explanatory variables, using our estimated model.
We now understand why people use $\hat{y} \pm 2\hat\sigma$ as a rule of thumb for providing prediction intervals. Provided that the size of the estimation sample is large, this is a pretty good rule of thumb!
TEXTBOOK EXERCISES
Problem 3 (page 196)
Problems 4, 6, 7 (page 197)
Problem 10 (page 198)

COMPUTER EXERCISES
C2 (page 199)
C4, C6 (page 200)
C10 (page 201)
Extra request: Use bwght.wf1 to predict the birth weight of a woman's baby given cigs = 20 and faminc = 20.

THANK YOU FOR YOUR ATTENDANCE - Q & A

