Econometrics Ch6 Applications


Martin Luther University of Halle-Wittenberg

Department of Economics
Chair of Econometrics

Econometrics
Lecture
6. Applications

Summer 2015
1 / 49

Key questions and objectives


This chapter focuses on the following key questions:

How does changing the units of measurement of variables affect the OLS regression results (intercept and slope estimates, standard errors, t statistics, F statistics, and confidence intervals)?

How can we specify an appropriate functional form for the relationship between the explained and explanatory variables?

How can we obtain confidence intervals for a prediction from the OLS regression line?

2 / 49

Applications

6 Applications

6.1 Effects of data scaling on OLS statistics


6.2 Functional form specification
6.2.1 Using logarithmic functional forms
6.2.2 Models with quadratics
6.2.3 Models with interaction terms

6.3 Goodness-of-fit and selection of regressors


6.3.1 Adjusted R-squared
6.3.2 Selection of regressors

6.4 Prediction
6.4.1 Confidence intervals for predictions
6.4.2 Predicting y when ln y is the dependent variable

3 / 49


Applications
Effects of data scaling on OLS statistics

6.1 Effects of data scaling on OLS statistics


In general, the coefficients, standard errors, confidence intervals, t statistics, and F statistics change in ways that preserve all measured effects and testing outcomes when variables are rescaled.
Data scaling is often used to reduce the number of zeros after a decimal point in an estimated coefficient.
Example: birth weight and cigarette smoking
Regression model:

bwght-hat = β̂0 + β̂1·cigs + β̂2·faminc,    (6.1)

where
bwght  = child birth weight, in ounces,
cigs   = number of cigarettes smoked by the pregnant mother per day,
faminc = annual family income, in thousands of dollars.
5 / 49

Applications


Effects of data scaling on OLS statistics

Table 6.1: Effects of Data Scaling

Dependent Variable:      (1) bwght       (2) bwghtlbs    (3) bwght

Independent Variables
cigs                      -.4634           -.0289            —
                          (.0916)          (.0057)
packs                        —                —            -9.268
                                                            (1.832)
faminc                     .0927            .0058            .0927
                          (.0292)          (.0018)          (.0292)
intercept                 116.974           7.3109         116.974
                          (1.049)          (.0656)          (1.049)

Observations              1,388            1,388            1,388
R-Squared                 .0298            .0298            .0298
SSR                       557,485.51       2,177.6778       557,485.51
SER                       20.063           1.2539           20.063

Source: Wooldridge (2013), Table 6.1


The estimates of this equation, obtained using the data in BWGHT.RAW, are given in Table 6.1.

6 / 49

Applications
Effects of data scaling on OLS statistics

Conversion of the dependent variable:

All OLS estimates change. But once the effects are transformed into the same units, we get exactly the same answer, regardless of how the dependent variable is measured.
Standard errors and confidence intervals change.
Residuals and SSR change.
Statistical significance is not affected: t and p values remain unchanged.
R-squared is not affected.

Conversion of an explanatory variable affects only its coefficient and standard error.

Question: in the birth weight equation, suppose that faminc is measured in dollars rather than in thousands of dollars. Thus, define the variable fincdol = 1,000·faminc. How will the OLS statistics change when fincdol is substituted for faminc? Do you think it is better to measure income in dollars or in thousands of dollars? A small simulation below illustrates this.
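To see these claims in action, here is a minimal sketch (Python with statsmodels, not the lecture's Stata session) that simulates data in the spirit of equation 6.1 and re-estimates the model with faminc rescaled to dollars. The variable names mirror the slides, but the data and the coefficients used to generate them are made up.

# Sketch: rescaling faminc to dollars (fincdol = 1,000*faminc) divides its
# coefficient and standard error by 1,000 while t statistics and R-squared
# are unchanged. Synthetic data, roughly mimicking equation 6.1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1388
cigs = rng.poisson(2, n)
faminc = rng.gamma(4, 8, n)                      # income in thousands of dollars
bwght = 117 - 0.46 * cigs + 0.09 * faminc + rng.normal(0, 20, n)

X1 = sm.add_constant(np.column_stack([cigs, faminc]))
X2 = sm.add_constant(np.column_stack([cigs, 1000 * faminc]))  # fincdol, in dollars

fit1 = sm.OLS(bwght, X1).fit()
fit2 = sm.OLS(bwght, X2).fit()

print(fit1.params[2], fit2.params[2] * 1000)      # same effect once units match
print(fit1.tvalues[2], fit2.tvalues[2])           # identical t statistics
print(fit1.rsquared, fit2.rsquared)               # identical R-squared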
7 / 49

Applications
Effects of data scaling on OLS statistics

If the dependent variable appears in logarithmic form, changing its unit of measurement does not affect the slope coefficients:

Conversion: ln(c·yi) = ln(c) + ln(yi),  c > 0
New intercept: β̂0,new = β̂0,old + ln(c)

Similarly, changing the unit of measurement of any explanatory variable xj, where ln(xj) appears in the regression, affects only the intercept:

Conversion: ln(c·xij) = ln(c) + ln(xij),  c > 0
New intercept: β̂0,new = β̂0,old − β̂j·ln(c)

8 / 49


Applications
Functional form specification

6.2.1 Using logarithmic functional forms


Example: housing prices and air pollution
Estimated equation:

ln(price)-hat = 9.23 − .718·ln(nox) + .306·rooms    (6.7)
(standard errors: 0.19, .066, and .019)

The coefficient β̂1 is the elasticity of price with respect to nox: if nox increases by 1%, price is predicted to fall by .718%, ceteris paribus.
The coefficient β̂2 is the semi-elasticity of price with respect to rooms. It is the change in ln(price) when Δrooms = 1. When multiplied by 100, this is the approximate percentage change in price: one more room increases price by about 30.6%.
The approximation error occurs because, as the change in ln(y) becomes larger and larger, the approximation %Δŷ ≈ 100·Δln(ŷ) becomes more and more inaccurate.
10 / 49

Applications
Functional form specification

For the exact interpretation, consider the general estimated model:

ln(y)-hat = β̂0 + β̂1·ln(x1) + β̂2·x2.

Holding x1 fixed, we have Δln(y)-hat = β̂2·Δx2.
Exact percentage change:

%Δŷ = 100·[exp(β̂2·Δx2) − 1],    (6.8)

where the multiplication by 100 turns the proportionate change into a percentage change.
When Δx2 = 1,

%Δŷ = 100·[exp(β̂2) − 1].    (6.9)

In the housing price example, %Δprice-hat = 100·[exp(.306) − 1] ≈ 35.8%, which is notably larger than the approximate percentage change, 30.6%.
11 / 49

Applications
Functional form specification

Adjustment in 6.8 is not as crucial for small percentage changes.

  β̂2     Approximate: 100·β̂2     Exact: 100·[exp(β̂2) − 1]
 0.05            5                        5.13
 0.10           10                       10.52
 0.15           15                       16.18
 0.20           20                       22.14
 0.30           30                       34.99
 0.50           50                       64.87
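A quick numerical check of this table (plain arithmetic, nothing estimated):

# Approximate vs. exact percentage change for a log dependent variable:
# approximate = 100*b2, exact = 100*(exp(b2) - 1), as in equations 6.8/6.9.
import math

for b2 in (0.05, 0.10, 0.15, 0.20, 0.30, 0.50):
    approx = 100 * b2
    exact = 100 * (math.exp(b2) - 1)
    print(f"beta2 = {b2:.2f}: approx = {approx:5.1f}%, exact = {exact:5.2f}%")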
Advantages of using logarithmic variables:
Appealing interpretations.
When y > 0, models using ln(y) as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y.
Taking the log of a variable often narrows its range (e.g. monetary values, such as firms' annual sales). Narrowing the range of the dependent and independent variables can make OLS estimates less sensitive to outliers.
12 / 49

Applications
Functional form specification

Using explanatory variables that are measured as percentages:

ln(wage)-hat = 0.3 − 0.05·unemployment rate
ln(wage)-hat = 0.3 − 0.05·ln(unemployment rate)

The first equation says that an increase in the unemployment rate of one percentage point (e.g. a change from 8 to 9) decreases wages by about 5%.
The second equation says that an increase in the unemployment rate of one percent (e.g. a change from 8 to 8.08) decreases wages by about 0.05%.
Limitations of logarithms: logs cannot be used if a variable takes on zero or negative values. Sometimes, ln(1 + y) is used. However, this approach is acceptable only when the data on y contain relatively few zeros. Alternatives are Tobit and Poisson models.
13 / 49

Applications
Functional form specification

6.2.2 Models with quadratics

Quadratic functions are also often used to capture decreasing or increasing marginal effects.
Example:

ŷ = β̂0 + β̂1·x + β̂2·x²,    (6.10)

where y = wage and x = exper.
Interpretation: the effect of x on y depends on the value of x:

Δŷ ≈ (β̂1 + 2·β̂2·x)·Δx,  so  Δŷ/Δx ≈ β̂1 + 2·β̂2·x.    (6.11)

Typically, we might plug in the average value of x in the sample, or some other interesting values, such as the median or the lower and upper quartile values.
14 / 49

Applications
Functional form specification

Example: wage regression


Estimated equation:

wage-hat = 3.73 + .298·exper − .0061·exper²    (6.12)

Equation 6.12 implies that exper has a diminishing effect on wage.
The first year of experience is worth $.298 (about 30 cents) per hour.
The second year of experience is worth less: .298 − 2(.0061)(1) ≈ .286.
In going from 10 to 11 years of experience, wage is predicted to increase by about .298 − 2(.0061)(10) = .176.
The turning point (the maximum of the function) is achieved at the coefficient on x over twice the absolute value of the coefficient on x²:

x* = β̂1 / (2·|β̂2|) = .298 / (2·(.0061)) ≈ 24.4.    (6.13)

15 / 49
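The marginal-effect and turning-point arithmetic in equations 6.11 and 6.13 can be checked in a few lines; this sketch uses only the coefficients reported in equation 6.12.

# Marginal effects and turning point of the estimated wage quadratic (6.12).
b1, b2 = 0.298, -0.0061

def marginal_effect(exper):
    """Approximate change in hourly wage from one more year of experience."""
    return b1 + 2 * b2 * exper

print(marginal_effect(0))    # first year: about 0.298 dollars
print(marginal_effect(10))   # from 10 to 11 years: about 0.176 dollars
print(-b1 / (2 * b2))        # turning point: about 24.4 years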

Applications
Functional form specification

Figure 6.1: Quadratic relationship between wage and exper. The fitted wage is 3.73 at exper = 0 and reaches its maximum of about 7.37 at exper ≈ 24.4.

Source: Wooldridge (2013), Figure 6.1

16 / 49

Applications
Functional form specification

Example: effects of pollution on housing prices


Model: ln(price) = β0 + β1·ln(nox) + β2·ln(dist) + β3·rooms + β4·rooms² + β5·stratio + u

. reg lprice lnox ldist c.rooms##c.rooms stratio

      Source |       SS       df       MS              Number of obs =     506
-------------+------------------------------           F(  5,   500) =  151.77
       Model |  50.9872385     5  10.1974477           Prob > F      =  0.0000
    Residual |  33.5949865   500  .067189973           R-squared     =  0.6028
-------------+------------------------------           Adj R-squared =  0.5988
       Total |   84.582225   505  .167489554           Root MSE      =  .25921

--------------------------------------------------------------------------------
         lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+---------------------------------------------------------------
           lnox |   -.901682   .1146869    -7.86   0.000     -1.12701   -.6763544
          ldist |  -.0867814   .0432807    -2.01   0.045    -.1718159    -.001747
          rooms |   -.545113   .1654541    -3.29   0.001     -.870184   -.2200419
                |
c.rooms#c.rooms |   .0622612    .012805     4.86   0.000      .037103    .0874194
                |
        stratio |  -.0475902   .0058542    -8.13   0.000     -.059092   -.0360884
          _cons |   13.38548   .5664731    23.63   0.000     12.27252    14.49844
--------------------------------------------------------------------------------

17 / 49

Applications
Functional form specification

Interpretation: what is the effect of rooms on ln(price)?

Because the coefficient on rooms is negative and the coefficient on rooms² is positive, this equation implies that, at low values of rooms, an additional room has a negative effect on ln(price).
At some point, the effect becomes positive, and the quadratic shape means that the semi-elasticity of price with respect to rooms is increasing as rooms increases.
Turnaround value of rooms:

rooms* = .5451 / (2·(.0623)) ≈ 4.4

18 / 49

Applications
Functional form specification

Figure 6.2: log(price) as a quadratic function of rooms; the fitted relationship turns around at rooms = 4.4.

Source: Wooldridge (2013), Figure 6.2

19 / 49

Applications
Functional form specification

Only five of the 506 communities in the sample have houses averaging 4.4 rooms or less, about 1% of the sample. Hence, the quadratic to the left of 4.4 can, for practical purposes, be ignored.
To the right of 4.4, we see that adding another room has an increasing effect on the percentage change in price:

%Δprice-hat ≈ 100·[−.545 + 2(.062)·rooms]·Δrooms = (−54.5 + 12.4·rooms)·Δrooms

An increase in rooms from, say, five to six increases price by about −54.5 + 12.4(5) = 7.5%.
An increase from six to seven increases price by −54.5 + 12.4(6) = 19.9%.

20 / 49
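The same arithmetic for the rooms quadratic, again using only the reported coefficients:

# Semi-elasticity of price with respect to rooms from the quadratic above.
def pct_change_price(rooms, delta=1):
    """Approximate % change in price from `delta` more rooms, starting at `rooms`."""
    return (-54.5 + 12.4 * rooms) * delta

print(pct_change_price(5))   # five -> six rooms: about 7.5%
print(pct_change_price(6))   # six -> seven rooms: about 19.9%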

Applications
Functional form specification

If the coefficients on the level and squared terms have the same sign (either both positive or both negative) and the explanatory variable is nonnegative, then there is no turning point for values x > 0.
Quadratic functions may also be used to allow for a nonconstant elasticity.
Example:

ln(price) = β0 + β1·ln(nox) + β2·(ln(nox))² + ... + u.    (6.15)

The elasticity depends on the level of nox:

%Δprice ≈ [β1 + 2·β2·ln(nox)]·%Δnox.    (6.16)

Further (higher-order) polynomial terms can be included in regression models:

y = β0 + β1·x + β2·x² + β3·x³ + β4·x⁴ + u.
21 / 49

Applications
Functional form specification

6.2.3 Models with interaction terms

Sometimes, the partial effect, elasticity, or semi-elasticity of the dependent variable with respect to an explanatory variable depends on the magnitude of another explanatory variable.
Example: in the model

price = β0 + β1·sqrft + β2·bdrms + β3·sqrft·bdrms + β4·bthrms + u

the partial effect of bdrms on price is

∂price/∂bdrms = β2 + β3·sqrft.    (6.17)

Interaction effect between square footage and number of bedrooms: if β3 > 0, then an additional bedroom yields a higher increase in housing price for larger houses.
22 / 49

Applications
Functional form specification

Example: did returns to education change between 1978 and 1985?

Consider the following wage regression:

ln(wage) = β1 + β2·y85 + β3·educ + β4·(y85·educ) + ... + u.

Returns to education are:

∂ln(wage)/∂educ = β3 + β4·y85 = β3 if y85 = 0;  β3 + β4 if y85 = 1.

23 / 49

Applications
Functional form specification

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
  [output omitted]
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

Returns to education in 1978: 7.47%.
Returns to education in 1985: (.0747 + .0185)·100 = 9.32%.
Returns to education increased between 1978 and 1985 by β̂4 = 0.0185, i.e. by 1.85 percentage points.
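A back-of-the-envelope check of this interpretation, using only the coefficients reported in the output above:

# Returns to education in each year, from the interaction specification.
b_educ, b_y85educ = 0.0747209, 0.0184605

ret_1978 = 100 * b_educ                   # return to education when y85 = 0
ret_1985 = 100 * (b_educ + b_y85educ)     # return to education when y85 = 1

print(f"1978: {ret_1978:.2f}% per year of education")
print(f"1985: {ret_1985:.2f}% per year of education")
print(f"increase: {100 * b_y85educ:.2f} percentage points")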


24 / 49


Applications
Goodness-of-fit and selection of regressors

6.3.1 Adjusted R-squared


R-squared is the proportion of the total sample variation in y that is explained by x1, x2, ..., xk.
The size of R-squared does not affect unbiasedness.
R-squared never decreases when additional explanatory variables are added to the model, because SSR never goes up (and usually falls) as more variables are added:

R² = 1 − SSR/SST.

The adjusted R-squared imposes a penalty for adding additional independent variables to a model:

R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] = 1 − σ̂² / [SST/(n − 1)].    (6.21)
26 / 49

Applications
Goodness-of-fit and selection of regressors

SSR/(n − k − 1) can go up or down when a new independent variable is added to a regression.
If we add a new independent variable to a regression equation, R̄² increases if, and only if, the t statistic on the new variable is greater than one in absolute value.
It holds that

R̄² = 1 − (1 − R²)·(n − 1)/(n − k − 1).    (6.22)

R̄² can be negative, indicating a very poor model fit relative to the number of degrees of freedom.

27 / 49

Applications
Goodness-of-fit and selection of regressors

Adjusted R-squared can be used to choose between nonnested models. (Two equations are nonnested when neither equation is a special case of the other.)
Example: explaining major league baseball players' salaries

Model 1: ln(salary) = β0 + β1·yrs + β2·games + β3·bavg + β4·hrunsyr + u,  R̄² = .6211
Model 2: ln(salary) = β0 + β1·yrs + β2·games + β3·bavg + β4·rbisyr + u,  R̄² = .6226

Based on the adjusted R-squared, there is a very slight preference for the model with rbisyr.

28 / 49

Applications
Goodness-of-fit and selection of regressors

Example: explaining R&D intensity

Model 1: rdintens = β0 + β1·ln(sales) + u,  R² = .061,  R̄² = .030
Model 2: rdintens = β0 + β1·sales + β2·sales² + u,  R² = .148,  R̄² = .090

The first model captures a diminishing return by including sales in logarithmic form; the second model does this by using a quadratic. Thus, the second model contains one more parameter than the first.
Neither R² nor R̄² can be used to choose between different functional forms for the dependent variable.
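Equation 6.22 makes these comparisons easy to reproduce. The sketch below assumes n = 32, the sample size of the underlying Wooldridge R&D example; the slide itself does not report n.

# Adjusted R-squared from R-squared, n and k, as in equation (6.22).
def adjusted_r2(r2, n, k):
    """R-bar-squared = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.061, 32, 1))   # Model 1: about .030 (assuming n = 32)
print(adjusted_r2(0.148, 32, 2))   # Model 2: about .089, close to the reported .090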

29 / 49

Applications
Goodness-of-fit and selection of regressors

6.3.2 Selection of regressors


A long regression (i.e. one with many explanatory variables) is more likely to have a ceteris paribus interpretation than a short regression.
Furthermore, a long regression generates more precise estimates of the coefficients on the variables included in a short regression, because the additional covariates reduce the residual variance.
However, it is also possible to control for too many variables in a regression analysis (over controlling).

30 / 49

Applications
Goodness-of-fit and selection of regressors

Example: impact of state beer taxes on traffic fatalities


Idea: a higher tax on beer will reduce alcohol consumption, and likewise drunk driving, resulting in fewer traffic fatalities.
Model to measure the ceteris paribus effect of taxes on fatalities:

fatalities = β0 + β1·tax + β2·miles + β3·percmale + β4·perc16_21 + ...,

where
miles = total miles driven,
percmale = percentage of the state population that is male,
perc16_21 = percentage of the population between ages 16 and 21.

The model does not include a variable measuring per capita beer consumption. Are we committing an omitted variables error?
No, because controlling for beer consumption would imply that we measure the difference in fatalities due to a one percentage point increase in tax, holding beer consumption fixed. This is not interesting.
31 / 49


Applications
Prediction

6.4.1 Confidence intervals for predictions

(a) CI for E(y | x1, ..., xk) (for the average value of y for the subpopulation with a given set of covariates)
Predictions are subject to sampling variation because they are obtained using the OLS estimators.
Estimated equation:

ŷ = β̂0 + β̂1·x1 + β̂2·x2 + ... + β̂k·xk.    (6.27)

Plugging in particular values of the independent variables, we obtain a prediction for y. The parameter we would like to estimate is:

θ0 = β0 + β1·c1 + β2·c2 + ... + βk·ck = E(y | x1 = c1, x2 = c2, ..., xk = ck).    (6.28)

The estimator of θ0 is

θ̂0 = β̂0 + β̂1·c1 + β̂2·c2 + ... + β̂k·ck.    (6.29)
33 / 49

Applications
Prediction

The uncertainty in this prediction is represented by a confidence interval for θ0.
With a large df, we can construct a 95% confidence interval for θ0 using the rule of thumb θ̂0 ± 2·se(θ̂0).
How do we obtain the standard error of θ̂0? Trick:
Write β0 = θ0 − β1·c1 − β2·c2 − ... − βk·ck.
Plug this into y = β0 + β1·x1 + β2·x2 + ... + βk·xk + u.
This gives

y = θ0 + β1·(x1 − c1) + β2·(x2 − c2) + ... + βk·(xk − ck) + u.    (6.30)

That is, we run a regression where we subtract the value cj from each observation on xj.
The predicted value and its standard error are then obtained from the intercept in regression 6.30.
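A small illustration of this trick on synthetic data (Python/statsmodels; the point c and the data-generating process are made up):

# Recentring trick of (6.30): regressing y on (x_j - c_j) makes the intercept
# equal to the prediction at x = c, and its standard error is se of that prediction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(5, 2, n), rng.normal(0, 1, n)
y = 1 + 0.5 * x1 - 0.3 * x2 + rng.normal(0, 1, n)

c1, c2 = 6.0, 0.5                                   # point at which we predict
fit = sm.OLS(y, sm.add_constant(np.column_stack([x1 - c1, x2 - c2]))).fit()

print(fit.params[0])                                # predicted E(y | x1=c1, x2=c2)
print(fit.bse[0])                                   # its standard error
print(fit.conf_int()[0])                            # 95% CI for the prediction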
34 / 49

Applications
Prediction

Example: confidence interval for predicted college GPA


Estimation results for predicting college GPA:
      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030504     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
      hsperc |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
       hsize |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
     hsizesq |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   1.492652   .0753414    19.81   0.000     1.344942    1.640362
------------------------------------------------------------------------------

Note: definition of variables: colgpa = GPA after fall semester, sat = combined SAT score, hsperc = high school percentile (from the top), hsize = size of graduating class (in hundreds).

35 / 49

Applications
Prediction

What is the predicted college GPA when sat = 1,200, hsperc = 30, and hsize = 5 (which means 500, since hsize is in hundreds)?

Define a new set of independent variables: sat0 = sat − 1,200, hsperc0 = hsperc − 30, hsize0 = hsize − 5, and hsizesq0 = hsize² − 25.

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030503     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sat0 |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
     hsperc0 |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
      hsize0 |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
    hsizesq0 |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   2.700075   .0198778   135.83   0.000     2.661104    2.739047
------------------------------------------------------------------------------

36 / 49

Applications
Prediction

The variance of the prediction is smallest at the mean values of the xj (because the variance of the intercept estimator is smallest when each explanatory variable has zero sample mean).

(b) CI for a particular unit from the population: prediction interval
In forming a confidence interval for an unknown outcome on y, we must account for the variance in the unobserved error.
Let y⁰ be the value for an individual not in our original sample.
Let x1⁰, x2⁰, ..., xk⁰ be the new values of the independent variables.
Let u⁰ be the unobserved error.
Model for observation (y⁰, x1⁰, ..., xk⁰):

y⁰ = β0 + β1·x1⁰ + β2·x2⁰ + ... + βk·xk⁰ + u⁰.    (6.33)

Prediction:

ŷ⁰ = β̂0 + β̂1·x1⁰ + β̂2·x2⁰ + ... + β̂k·xk⁰.

Prediction error:

ê⁰ = y⁰ − ŷ⁰ = (β0 + β1·x1⁰ + β2·x2⁰ + ... + βk·xk⁰) + u⁰ − ŷ⁰.    (6.34)
37 / 49

Applications
Prediction

The expected prediction error is zero, E(ê⁰) = 0, because E(ŷ⁰) = β0 + β1·x1⁰ + ... + βk·xk⁰ (as the β̂j are unbiased) and u⁰ has zero mean.
The variance of the prediction error is the sum of the variances, because u⁰ and ŷ⁰ are uncorrelated:

Var(ê⁰) = Var(ŷ⁰) + Var(u⁰) = Var(ŷ⁰) + σ².    (6.35)

There are two sources of variation in ê⁰:
1. Sampling error in ŷ⁰, which arises because we have estimated the βj; it decreases with the sample size.
2. σ², the variance of the error in the population; it does not change with the sample size.

Standard error of ê⁰:

se(ê⁰) = {[se(ŷ⁰)]² + σ̂²}^(1/2).    (6.36)

38 / 49

Applications
Prediction

It holds that ê⁰/se(ê⁰) has a t distribution with n − k − 1 degrees of freedom.
Therefore,

P(−t_{α/2} ≤ ê⁰/se(ê⁰) ≤ t_{α/2}) = 1 − α
P(−t_{α/2} ≤ (y⁰ − ŷ⁰)/se(ê⁰) ≤ t_{α/2}) = 1 − α
P(ŷ⁰ − t_{α/2}·se(ê⁰) ≤ y⁰ ≤ ŷ⁰ + t_{α/2}·se(ê⁰)) = 1 − α

39 / 49

Applications
Prediction

Example: prediction interval (for GPA) for any particular student

The recentred regression from the previous slide applies here: ŷ⁰ = 2.700075 with se(ŷ⁰) = .0198778, and Root MSE σ̂ = .55986 (output as on the previous slide).

se(ê⁰) = [(.020)² + (.560)²]^(1/2) ≈ .560.
Prediction interval: 2.70 ± 1.96·(.560) = [1.60, 3.80].
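The interval arithmetic, reproduced from the reported numbers:

# 95% prediction interval from the reported intercept (2.70), its standard
# error (.020) and the Root MSE (.560), following equation (6.36).
import math

y_hat, se_theta, sigma_hat = 2.70, 0.020, 0.560
se_e0 = math.sqrt(se_theta**2 + sigma_hat**2)     # eq. (6.36)

lower = y_hat - 1.96 * se_e0
upper = y_hat + 1.96 * se_e0
print(se_e0, (lower, upper))                      # roughly .560 and [1.60, 3.80]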
40 / 49

Applications
Prediction

6.4.2 Predicting y when ln y is the dependent variable


Given the OLS estimators, we can predict ln(y) for any value of the explanatory variables:

ln(y)-hat = β̂0 + β̂1·x1 + β̂2·x2 + ... + β̂k·xk.    (6.39)

How do we predict y?
N.B.: ŷ ≠ exp(ln(y)-hat). Simply exponentiating the predicted value for ln(y) does not work; in fact, it will systematically underestimate the expected value of y.
It can be shown that

E(y | x) = exp(σ²/2)·exp(β0 + β1·x1 + β2·x2 + ... + βk·xk),

where σ² is the variance of u.
41 / 49

Applications
Prediction

Hence, the prediction of y is:

ŷ = exp(σ̂²/2)·exp(ln(y)-hat),    (6.40)

where σ̂² is the unbiased estimator of σ².
The prediction in 6.40 relies on the normality of the error term, u.
How can we obtain a prediction that does not rely on normality?
General model:

E(y | x) = α0·exp(β0 + β1·x1 + β2·x2 + ... + βk·xk),    (6.41)

where α0 is the expected value of exp(u).
Given an estimate α̂0, we can predict y as

ŷ = α̂0·exp(ln(y)-hat).    (6.42)

42 / 49

Applications
Prediction

First approach to estimating α0: a consistent, but not unbiased, smearing estimate is

α̂0 = n⁻¹ · Σ_{i=1..n} exp(ûᵢ).    (6.43)

Second approach to estimating α0:
Define mi = exp(β0 + β1·xi1 + β2·xi2 + ... + βk·xik).
Replace the βj with their OLS estimates and obtain m̂i = exp(ln(yi)-hat).
Estimate a simple regression of yi on m̂i without an intercept. The slope estimate is a consistent, but not unbiased, estimate of α0.
With a consistent estimate of α0, the prediction for y can be calculated as α̂0·exp(ln(y)-hat).

43 / 49
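A sketch of both estimators on synthetic log-linear data (not the CEO data; the data-generating process is made up). When u is normal, both should be close to exp(σ²/2).

# Smearing estimate (6.43) and the no-intercept regression estimate of alpha0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(0, 1, n)
u = rng.normal(0, 0.5, n)
y = np.exp(1.0 + 0.8 * x + u)                        # ln(y) is linear in x

fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()
lny_hat = fit.fittedvalues
uhat = fit.resid

alpha0_smear = np.mean(np.exp(uhat))                 # smearing estimate
m_hat = np.exp(lny_hat)
alpha0_reg = sm.OLS(y, m_hat).fit().params[0]        # slope of y on m_hat, no intercept

print(alpha0_smear, alpha0_reg, np.exp(0.5**2 / 2))  # both near exp(sigma^2/2)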

Applications
Prediction

Example: predicting CEO salaries


Model:
ln(salary) = β0 + β1·ln(sales) + β2·ln(mktval) + β3·ceoten + u

Estimation results:

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  3,   173) =   26.91
       Model |  20.5672434     3  6.85574779           Prob > F      =  0.0000
    Residual |  44.0789697   173  .254791732           R-squared     =  0.3182
-------------+------------------------------           Adj R-squared =  0.3063
       Total |  64.6462131   176  .367308029           Root MSE      =  .50477

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   .1628545   .0392421     4.15   0.000     .0853995    .2403094
     lmktval |    .109243   .0495947     2.20   0.029     .0113545    .2071315
      ceoten |   .0117054   .0053261     2.20   0.029      .001193    .0222178
       _cons |   4.503795   .2572344    17.51   0.000     3.996073    5.011517
------------------------------------------------------------------------------

44 / 49

Applications
Prediction

The smearing estimate of α0 is:

. predict uhat, res
. gen euhat = exp(uhat)
. su euhat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       euhat |       177    1.135661    .6970541   .0823372   6.378018

45 / 49

Applications
Prediction

The regression estimate of α0 is:

. predict lsalary_hat
(option xb assumed; fitted values)
. gen m_hat = exp(lsalary_hat)
. reg salary m_hat, nocons

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  1,   176) =  562.39
       Model |   147352711      1   147352711          Prob > F      =  0.0000
    Residual |    46113901    176  262010.801          R-squared     =  0.7616
-------------+------------------------------           Adj R-squared =  0.7603
       Total |   193466612    177  1093031.71          Root MSE      =  511.87

------------------------------------------------------------------------------
      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       m_hat |   1.116857   .0470953    23.71   0.000     1.023912    1.209801
------------------------------------------------------------------------------

46 / 49

Applications
Prediction

Prediction for sales = 5,000 (which means $5 billion, because sales is in millions), mktval = 10,000 (or $10 billion), and ceoten = 10:

ln(salary)-hat = 4.503 + 0.163·ln(5,000) + 0.109·ln(10,000) + 0.012·10 = 7.013.

Naive prediction: exp(7.013) = 1110.983.
Prediction using the smearing estimate: 1.136·exp(7.013) = 1262.076.
Prediction using the regression estimate: 1.117·exp(7.013) = 1240.967.
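Reproducing these predictions from the full-precision coefficients in the output above (the results differ slightly from the slide's figures, which round the exponent to 7.013):

# CEO salary predictions from the reported estimates.
import math

lsalary_hat = 4.503795 + 0.1628545 * math.log(5000) \
            + 0.109243 * math.log(10000) + 0.0117054 * 10
naive = math.exp(lsalary_hat)

print(lsalary_hat)          # about 7.01 (slide rounds to 7.013)
print(naive)                # naive prediction, close to the slide's 1110.98
print(1.135661 * naive)     # smearing-adjusted, close to 1262
print(1.116857 * naive)     # regression-adjusted, close to 1241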

47 / 49

Key terms

adjusted R-squared
interaction effect
nonnested models
over controlling
prediction error
prediction interval
predictions
quadratic functions
smearing estimate
variance of the prediction error

48 / 49

References
Textbook: Chapter 6 in Wooldridge (2013).
Further readings: Chapter 8, Chapter 9 in Stock and Watson (2012).
Chapter 6, Chapter 10 in Hill et al. (2001)
Hill, R. C., Griffiths, W. E., and Judge, G. G. (2001). Undergraduate
Econometrics. John Wiley & Sons, New York.
Stock, J. H. and Watson, M. W. (2012). Introduction to Econometrics.
Pearson, Boston.
Wooldridge, J. M. (2013). Introductory Econometrics: A Modern Approach.
Cengage Learning, Mason, OH.

49 / 49
