
Chapter 4

Prediction, Goodness-of-Fit, and Modeling Issues

Principles of
Econometrics
Fifth Edition
R. Carter Hill William E. Griffiths Guay C. Lim
Chapter Outline
 4.1 Least Squares Prediction

 4.2 Measuring Goodness-of-Fit

 4.3 Modeling Issues

 4.4 Polynomial Models

 4.5 Log-Linear Models

 4.6 Log-Log Models



4.1 Least Squares Prediction (1 of 7)
 The ability to predict is important to:
 Business economists and financial analysts who attempt to forecast the sales and
revenues of specific firms.
 Government policymakers who attempt to predict the rates of growth in national
income, inflation, investment, saving, social insurance program expenditures, and
tax revenues.
 Local businesses that need predictions of growth in neighborhood
populations and income so that they may expand or contract their provision of
services.
 Accurate predictions provide a basis for better decision-making in every type of planning
context.



4.1 Least Squares Prediction (2 of 7)
 In order to use regression analysis as a basis for prediction, we must assume that y0 and x0 are related to one another by the same regression model that describes our sample of data, so that, in particular, SR1 holds for these observations:

(4.1) y0 = β1 + β2x0 + e0

where e0 is a random error.



4.1 Least Squares Prediction (3 of 7)
 The task of predicting y0 is related to the problem of estimating E(y0|x0) = β1 + β2x0.

 Although E(y0|x0) = β1 + β2x0 is not random, the outcome y0 is random.

 Consequently, as we will see, there is a difference between the interval estimate of

E(y0|x0) = β1 + β2x0 and the prediction interval for y0.

 The least squares predictor of y0 comes from the fitted regression line:

(4.2) ŷ0 = b1 + b2x0



4.1 Least Squares Prediction (4 of 7)

 To evaluate how well this predictor performs, we define the

forecast error, which is analogous to the least squares residual:

(4.3) f = y0 − ŷ0 = (β1 + β2x0 + e0) − (b1 + b2x0)

 We would like the forecast error to be small, implying that our

forecast is close to the value we are predicting.



4.1 Least Squares Prediction (5 of 7)
 Taking the expected value of f, we find that E(f) = 0,

which means that, on average, the forecast error is zero and ŷ0 is an unbiased predictor of y0.

 However, unbiasedness does not necessarily imply that a particular forecast will be close

to the actual value.

 ŷ0 is the best linear unbiased predictor (BLUP) of y0 if assumptions SR1–SR5 hold.



4.1 Least Squares Prediction (6 of 7)
 The variance of the forecast is: (4.4) var(f) = σ²[1 + 1/N + (x0 − x̄)²/Σ(xi − x̄)²]

 The variance of the forecast is smaller when:

 The overall uncertainty in the model is smaller, as measured by the


variance of the random errors σ2.
 The sample size N is larger.

 The variation in the explanatory variable is larger.

 The value of (x0 − x̄)² is smaller, that is, x0 is closer to the sample mean x̄.



4.1 Least Squares Prediction (7 of 7)
 In practice we use the estimate σ̂² in place of σ² in the forecast variance.

 The standard error of the forecast is: (4.5) se(f) = √var(f), computed using σ̂².

 The 100(1 – α)% prediction interval is:

(4.6) ŷ0 ± tc se(f)



4.2 Measuring Goodness-of-Fit (1 of 6)

 There are two major reasons for analyzing the model:

(4.7) yi = β1 + β2xi + ei

1. To explain how the dependent variable (yi) changes as the

independent variable (xi) changes.

2. To predict y0 given an x0.



4.2 Measuring Goodness-of-Fit (2 of 6)

 To develop a measure of the variation in yi that is explained by the

model, we begin by separating yi into its explainable and

unexplainable components.

 (4.8)

 E(yi|x) is the explainable or systematic part.

 ei is the random, unsystematic and unexplainable component.


4.2 Measuring Goodness-of-Fit (3 of 6)
 Recall that the sample variance of yi is s²y = Σ(yi − ȳ)²/(N − 1).

 Writing each observation as in (4.10), yi − ȳ = (ŷi − ȳ) + êi, then squaring and summing both sides, and using the fact that Σ(ŷi − ȳ)êi = 0, we get:

(4.11) Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σêi²

 Eq. 4.11 decomposes the ‘‘total sample variation’’ in y into explained and unexplained components.

 These are called ‘‘sums of squares.’’


4.2 Measuring Goodness-of-Fit (4 of 6)
 Specifically:

 Σ(yi − ȳ)² = total sum of squares = SST

 Σ(ŷi − ȳ)² = sum of squares due to regression = SSR

 Σêi² = sum of squares due to error = SSE

 Using these abbreviations, (4.11) becomes SST = SSR + SSE.



4.2 Measuring Goodness-of-Fit (5 of 6)

 Let’s define the coefficient of determination, or R2 , as the

proportion of variation in y explained by x within the regression

model:
 (4.12) R² = SSR/SST = 1 − SSE/SST

 The closer R2 is to 1, the closer the sample values yi are to the fitted

regression equation.
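A short sketch, again on simulated data with illustrative parameters, verifying the decomposition (4.11) and the two equivalent forms of R² in (4.12):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 2, 50)

b2, b1 = np.polyfit(x, y, 1)                 # least squares slope and intercept
yhat = b1 + b2 * x
SST = np.sum((y - y.mean()) ** 2)            # total sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)         # explained sum of squares
SSE = np.sum((y - yhat) ** 2)                # unexplained sum of squares

print(np.isclose(SST, SSR + SSE))            # the decomposition in (4.11)
print("R^2 =", SSR / SST, "= 1 - SSE/SST =", 1 - SSE / SST)
```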
4.2 Measuring Goodness-of-Fit (6 of 6)

 If R2 = 1, then all the sample data fall exactly on the fitted least

squares line, so SSE = 0, and the model fits the data ‘‘perfectly.’’

 If the sample data for y and x are uncorrelated and show no linear

association, then the least squares fitted line is ‘‘horizontal,’’ and

identical to ȳ, so that SSR = 0 and R2 = 0.



4.2.1 Correlation Analysis (1 of 2)
 The correlation coefficient ρxy between x and y is defined as:

(4.13) ρxy = cov(x, y)/√[var(x)var(y)] = σxy/(σxσy)

 Substituting sample values, we get the sample correlation coefficient: rxy = sxy/(sxsy)



4.2.1 Correlation Analysis (2 of 2)
 Where:
sxy = Σ(xi − x̄)(yi − ȳ)/(N − 1)

sx = √[Σ(xi − x̄)²/(N − 1)]

sy = √[Σ(yi − ȳ)²/(N − 1)]

 The sample correlation coefficient rxy has a value between −1 and 1, and it measures the strength of the linear association between the observed values of x and y.
4.2.2 Correlation Analysis and R2

 Two relationships between R2 and rxy:

1. r²xy = R²

2. R2 can also be computed as the square of the sample

correlation coefficient between yi and b1 + b2 xi.
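Both relationships can be checked numerically; a minimal sketch, assuming simulated data with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1 + 0.5 * x + rng.normal(scale=0.8, size=100)

r_xy = np.corrcoef(x, y)[0, 1]               # sample correlation coefficient
b2, b1 = np.polyfit(x, y, 1)
yhat = b1 + b2 * x                           # fitted values b1 + b2*xi
R2 = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(r_xy ** 2, R2))                        # relationship 1
print(np.isclose(np.corrcoef(y, yhat)[0, 1] ** 2, R2))  # relationship 2
```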



4.3.1 The Effects of Scaling the Data (1 of 4)
 What are the effects of scaling the variables in a regression model?

 Consider the food expenditure example.

 We report weekly expenditures in dollars, but we report income in $100 units,


so a weekly income of $2,000 is reported as x = 20.
 If we had estimated the regression using income in dollars, the results would
have been:

FOOD_EXP = 83.42 + 0.1021 INCOME($)   R² = 0.385
(se)       (43.41)*  (0.0209)***



4.3.1 The Effects of Scaling the Data (2 of 4)
 Possible effects of scaling the data:

1. Changing the scale of x: the coefficient of x must be multiplied by c, the


scaling factor.
 When the scale of x is altered, the only other change occurs in the
standard error of the regression coefficient, but it changes by the same
multiplicative factor as the coefficient, so that their ratio, the t-statistic,
is unaffected.
 All other regression statistics are unchanged.



4.3.1 The Effects of Scaling the Data (3 of 4)
 Possible effects of scaling the data:

2. Changing the scale of y: If we change the units of measurement of y, but not x,


then all the coefficients must change in order for the equation to remain valid.
 Because the error term is scaled in this process, the least squares residuals
will also be scaled.
 This will affect the standard errors of the regression coefficients, but it will
not affect t-statistics or R2.



4.3.1 The Effects of Scaling the Data (4 of 4)
 Possible effects of scaling the data:

3. Changing the scale of y and x by the same factor: there will be no change in
the reported regression results for b2 , but the estimated intercept and
residuals will change.
 t-statistics and R2 are unaffected.
 The interpretation of the parameters is made relative to the new units of
measurement.
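These scaling effects are easy to demonstrate. A sketch on simulated data; the variable names and the helper slope_se_t are illustrative inventions, not the text's data files:

```python
import numpy as np

rng = np.random.default_rng(3)
income_100 = rng.uniform(5, 30, 40)          # income in $100 units
food_exp = 80 + 10 * income_100 + rng.normal(0, 30, 40)

def slope_se_t(x, y):
    # slope, its standard error, and the t-statistic from a simple regression
    N = len(x)
    b2, b1 = np.polyfit(x, y, 1)
    ehat = y - (b1 + b2 * x)
    sig2 = np.sum(ehat ** 2) / (N - 2)
    se = np.sqrt(sig2 / np.sum((x - x.mean()) ** 2))
    return b2, se, b2 / se

for xvar, label in [(income_100, "$100 units"), (income_100 * 100, "dollars")]:
    b2, se, t = slope_se_t(xvar, food_exp)
    print(f"income in {label:>10}: b2 = {b2:.4f}, se(b2) = {se:.4f}, t = {t:.2f}")
# the slope and its se both shrink by the factor 100; the t-statistic is identical
```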



4.3.2 Choosing a Functional Form (1 of 3)
 The starting point in all econometric analyses is economic theory.

 What does economics really say about the relation between food expenditure

and income, holding all else constant?

 We expect there to be a positive relationship between these variables because

food is a normal good.

 But nothing says the relationship must be a straight line.



4.3.2 Choosing a Functional Form (2 of 3)
 By transforming the variables y and x we can represent many curved, nonlinear
relationships and still use the linear regression model.
 Choosing an algebraic form for the relationship means choosing
transformations of the original variables.
 The most common are:
 Power: If x is a variable, then xp means raising the variable to the power p.
 Quadratic (x2)
 Cubic (x3)
 Natural logarithm: If x is a variable, then its natural logarithm is ln(x).



4.3.2 Choosing a Functional Form (3 of 3)
 Summary of three configurations:

1. In the log-log model both the dependent and independent variables are
transformed by the ‘‘natural’’ logarithm.

 The parameter β2 is the elasticity of y with respect to x.

2. In the log-linear model only the dependent variable is transformed by the


logarithm.

3. In the linear-log model the variable x is transformed by the natural


logarithm.



4.3.3 A Linear-Log Food Expenditure Model (1 of 2)
 A linear-log equation has a linear, untransformed term on the left-hand side

and a logarithmic term on the right-hand side: y = β1 + β2ln(x).

 The elasticity of y with respect to x is: ε = slope × x/y = β2/y

 A convenient interpretation is: Δy = y1 − y0 = β2[ln(x1) − ln(x0)] = (β2/100) × 100[ln(x1) − ln(x0)] ≈ (β2/100)(%Δx)

 That is, the change in y, in its units of measure, is approximately β2/100 times the percentage change in x; a numerical check follows below.
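A quick check of the approximation with illustrative parameter values (β1 = 50, β2 = 30 are made-up numbers, not estimates from the text):

```python
import numpy as np

beta1, beta2 = 50.0, 30.0                    # illustrative linear-log parameters
x0, x1 = 100.0, 110.0                        # a 10% increase in x

exact = beta2 * (np.log(x1) - np.log(x0))    # exact change in y
approx = (beta2 / 100) * 10                  # (beta2/100) times %change in x
print(f"exact dy = {exact:.3f}, approximation = {approx:.3f}")  # 2.859 vs 3.000
```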
4.3.3 A Linear-Log Food Expenditure Model (2 of 2)
 Given alternative models that involve different transformations of the dependent
and independent variables, and some of which have similar shapes, what are
some guidelines for choosing a functional form?

1. Choose a shape that is consistent with what economic theory tells us about the
relationship.

2. Choose a shape that is sufficiently flexible to “fit” the data.

3. Choose a shape so that assumptions SR1–SR6 are satisfied, ensuring that the
least squares estimators have the desirable properties described in Chapters 2
and 3.
4.3.4 Using Diagnostic Residual Plots (1 of 5)
 When specifying a regression model, we may inadvertently choose an
inadequate or incorrect functional form.

1. Examine the regression results.


 There are formal statistical tests to check for:

 Homoskedasticity

 Serial correlation

2. Use residual plots.



4.3.5 Are the Regression Errors Normally Distributed?
 Hypothesis tests and interval estimates for the coefficients rely on the
assumption that the errors, and hence the dependent variable y, are normally
distributed.
 A histogram of the least squares residuals gives us a graphical representation
of the empirical distribution.
 There are many tests for normality:

 The Jarque–Bera test for normality is valid in large samples.

 It is based on two measures: skewness and kurtosis.
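The Jarque–Bera statistic is JB = (N/6)[S² + (K − 3)²/4], where S is the skewness and K the kurtosis of the residuals; under normality it is approximately chi-square with 2 degrees of freedom in large samples. A sketch using normal draws as a stand-in for least squares residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
ehat = rng.normal(size=500)                  # stand-in for least squares residuals

S = stats.skew(ehat)                         # skewness
K = stats.kurtosis(ehat, fisher=False)       # kurtosis; equals 3 under normality
JB = len(ehat) / 6 * (S ** 2 + (K - 3) ** 2 / 4)
pval = 1 - stats.chi2.cdf(JB, df=2)          # JB ~ chi-square(2) in large samples
print(f"JB = {JB:.3f}, p-value = {pval:.3f}")
print(stats.jarque_bera(ehat))               # SciPy's built-in version agrees
```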



4.3.6 Identifying Influential Observations (1 of 2)
 One worry in data analysis is that we may have some unusual and/or influential observations. Sometimes these are termed “outliers.”

 If an unusual observation is the result of a data error, then we should correct it.

 Understanding how it came about, the story behind it, can be informative.

 One way to detect whether an observation is influential is to delete it and re-estimate

the model.



4.3.6 Identifying Influential Observations (2 of 2)
 The studentized residual is the standardized residual based on the delete-

one sample.

 If the studentized residual falls outside a 95% interval estimate, then the observation is worth examining because it is “unusually” large.

 Another measure of the influence of a single observation on the least

squares estimates is called DFBETAS.
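A rough delete-one sketch of these ideas on simulated data; the planted outlier and names are illustrative, and the reported slope shift is the raw change in b2, not the scaled DFBETAS statistic software reports:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 30)
y = 1 + 2 * x + rng.normal(size=30)
y[0] += 8.0                                  # plant one unusual observation

b2_all, b1_all = np.polyfit(x, y, 1)         # full-sample estimates
for i in range(len(x)):
    keep = np.arange(len(x)) != i
    xk, yk = x[keep], y[keep]
    b2_i, b1_i = np.polyfit(xk, yk, 1)       # delete-one estimates
    ehat = yk - (b1_i + b2_i * xk)
    sig2 = np.sum(ehat ** 2) / (len(xk) - 2)
    # forecast variance of observation i from the delete-one fit (eq. 4.4)
    var_f = sig2 * (1 + 1 / len(xk)
                    + (x[i] - xk.mean()) ** 2 / np.sum((xk - xk.mean()) ** 2))
    t_i = (y[i] - (b1_i + b2_i * x[i])) / np.sqrt(var_f)  # studentized residual
    if abs(t_i) > 2:                         # roughly outside a 95% interval
        print(f"obs {i}: studentized residual {t_i:.2f}, "
              f"slope shift {b2_all - b2_i:+.4f}")
```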



4.4 Polynomial Models
 In addition to estimating linear equations, we can also estimate

quadratic and cubic equations.

 Economics students will have seen many average and marginal

cost curves (U-shaped) and average and marginal product curves

(inverted-U shaped) in their studies.



4.4.1 Quadratic and Cubic Equations
 The general form of a quadratic equation is: y = a0 + a1x + a2x²

 The general form of a cubic equation is: y = a0 + a1x + a2x² + a3x³
 A problem with the linear equation is that it implies y changes at the same
constant rate everywhere, when one might expect the rate to be increasing; see the sketch below.
 Polynomial models may provide a better fit.
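A minimal comparison of polynomial fits on simulated U-shaped data (the data-generating process is an illustrative invention):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(1, 10, 60)
y = 5 - 2 * x + 0.4 * x ** 2 + rng.normal(0, 1, 60)  # U-shaped, e.g. a cost curve

for deg, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    coeffs = np.polyfit(x, y, deg)           # fits a polynomial of degree deg
    sse = np.sum((y - np.polyval(coeffs, x)) ** 2)
    print(f"{name:>9}: SSE = {sse:.1f}")     # the quadratic fits far better here
```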



4.5 Log-Linear Models (1 of 2)
 Econometric models that employ natural logarithms are very common.

 Logarithmic transformations are often used for variables that are monetary

values.

 Wages, salaries, income, prices, sales, and expenditures

 In general, for variables that measure the ‘‘size’’ of something

 These variables have the characteristic that they are positive and often

have distributions that are positively skewed, with a long tail to the right.



4.5 Log-Linear Models (2 of 2)
 The log-linear model, ln(y) = β1 + β2x, has a logarithmic term on the left-hand side of

the equation and an untransformed (linear) variable on the right-hand side.

 Both its slope and elasticity change at each point and have the same sign as β2.

 In the log-linear model, a one-unit increase in x leads, approximately, to a 100β2% change in y:

100[ln(y1) − ln(y0)] ≈ %Δy = 100β2(x1 − x0) = 100β2Δx



4.5.1 Prediction in the Log-Linear Model (1 of 3)
 In a log-linear regression the R2 value automatically reported by statistical

software is the percent of the variation in ln(y) explained by the model.

 However, our objective is to explain the variations in y, not ln(y).

 Furthermore, the fitted regression line predicts ln(y)^ = b1 + b2x, whereas we want to predict y.



4.5.1 Prediction in the Log-Linear Model (2 of 3)
 A natural choice for prediction is ŷn = exp(ln(y)^) = e^(b1 + b2x).

 The subscript “n” is for “natural.”

 But a better alternative is ŷc = ŷn e^(σ̂²/2).

 The subscript “c” is for “corrected.”

 This uses the properties of the log-normal distribution; a simulation sketch follows below.
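A simulation sketch of the natural versus corrected predictors, with made-up parameters (β1 = 1.0, β2 = 0.5, σ = 0.6 are illustrative, not the wage equation):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1000
x = rng.uniform(0, 4, N)
lny = 1.0 + 0.5 * x + rng.normal(0, 0.6, N)  # ln(y) = b1 + b2*x + e
y = np.exp(lny)

b2, b1 = np.polyfit(x, np.log(y), 1)
ehat = np.log(y) - (b1 + b2 * x)
sig2_hat = np.sum(ehat ** 2) / (N - 2)

yhat_n = np.exp(b1 + b2 * x)                 # natural predictor
yhat_c = yhat_n * np.exp(sig2_hat / 2)       # corrected predictor
print("mean of y:                ", y.mean())
print("mean natural prediction:  ", yhat_n.mean())  # systematically too low
print("mean corrected prediction:", yhat_c.mean())
print("general R^2:", np.corrcoef(y, yhat_c)[0, 1] ** 2)  # Section 4.5.2 measure
```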



4.5.1 Prediction in the Log-Linear Model (3 of 3)
 Recall that σ̂² must be greater than zero and that e⁰ = 1.

 Thus, the effect of the correction is always to increase the value of the prediction, because e^(σ̂²/2) is always greater than one.

 The natural predictor tends to systematically underpredict the value of y in a log-linear model; the correction offsets this downward bias in large samples.



Example 4.11 Prediction in the Log-Linear Model
 The wage equation, evaluated for a worker with 12 years of education:

ln(WAGE)^ = 1.5968 + 0.0988 × EDUC = 1.5968 + 0.0988 × 12 = 2.7819

 The natural predictor is: ŷn = exp(2.7819) ≈ 16.15

 The corrected predictor is: ŷc = ŷn e^(σ̂²/2)



4.5.2 A Generalized Measure
 A general goodness-of-fit measure, or general R², is: R²g = [corr(y, ŷ)]² = r²yŷ

 For the wage equation, the general R² is: R²g = [corr(y, ŷc)]² = 0.4647² = 0.2159

 Compare this to the reported R2 = 0.2577.



4.5.3 Prediction Intervals in the Log-Linear Model
 If we prefer a prediction or forecast interval over a “point” predictor for y, then we must rely on the natural predictor ŷn.

 A 100(1 – α)% prediction interval for y is: [exp(ln(y)^ − tc se(f)), exp(ln(y)^ + tc se(f))]
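A sketch of this interval on simulated log-linear data (all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
N = 200
x = rng.uniform(0, 4, N)
lny = 1.0 + 0.5 * x + rng.normal(0, 0.6, N)  # simulated log-linear data

b2, b1 = np.polyfit(x, lny, 1)
ehat = lny - (b1 + b2 * x)
sig2_hat = np.sum(ehat ** 2) / (N - 2)

x0 = 2.0
lny0_hat = b1 + b2 * x0                      # prediction on the log scale
se_f = np.sqrt(sig2_hat * (1 + 1 / N
                           + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)))
tc = stats.t.ppf(0.975, df=N - 2)
lower, upper = np.exp(lny0_hat - tc * se_f), np.exp(lny0_hat + tc * se_f)
print(f"95% prediction interval for y: [{lower:.2f}, {upper:.2f}]")
```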



Example 4.12 Prediction Intervals for a Log-Linear Model
 For the wage equation, a 95% prediction interval for the wage of a worker with

12 years of education is:

 [exp(2.7819 − 1.96 × 0.4850), exp(2.7819 + 1.96 × 0.4850)] = [6.2358, 41.8233]

 The interval prediction is $6.24–$41.82, which is so wide that it is basically

useless.

 Our model is not an accurate predictor of individual behavior in this case.



4.6 Log-Log Models (1 of 2)

 The log-log function, ln(y) = β1 + β2ln(x), is widely used to describe

demand equations and production functions.

 In order to use this model, all values of y and x must be positive.

 The slopes of these curves change at every point, but the elasticity

is constant and equal to β2.



4.6 Log-Log Models (2 of 2)

 If β2 > 0, then y is an increasing function of x.

 If β2 > 1, then the function increases at an increasing rate.

 If 0 < β2 < 1, then the function is increasing, but at a

decreasing rate.

 If β2 < 0, then there is an inverse relationship between y and x.
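The constant-elasticity property is easy to see in a simulation; a sketch with an invented elasticity of −1.2 (not the poultry example that follows):

```python
import numpy as np

rng = np.random.default_rng(9)
price = rng.uniform(1, 5, 100)
lnq = 2.0 - 1.2 * np.log(price) + rng.normal(0, 0.1, 100)  # elasticity -1.2

b2, b1 = np.polyfit(np.log(price), lnq, 1)   # regress ln(q) on ln(p)
print("estimated elasticity:", b2)           # close to -1.2 at every price level
```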



Example 4.13 Log-Log Poultry Demand Equation (1 of 2)
 The estimated model is:

(4.15) ln(Q̂) = b1 − 1.121 ln(P)

(se) (0.022) (0.049)

 We estimate that the price elasticity of demand is 1.121: a 1% increase

in real price is estimated to reduce quantity consumed by 1.121%.



Example 4.13 Log-Log Poultry Demand Equation (2 of 2)
 Using the estimated error variance σ̂² = 0.0139, the corrected predictor is: Q̂c = Q̂n e^(σ̂²/2) = Q̂n e^0.00695 ≈ 1.007 Q̂n

 The generalized goodness-of-fit is: R²g = [corr(Q, Q̂c)]² = 0.939² = 0.8817



Key Words
 coefficient of determination
 correlation
 forecast error
 functional form
 goodness-of-fit
 growth model
 influential observations
 Jarque–Bera test
 kurtosis
 least squares predictor
 linear model
 linear relationship
 linear-log model
 log-linear model
 log-log model
 log-normal distribution
 prediction
 prediction interval
 R²
 residual diagnostics
 scaling data
 skewness
 standard error of the forecast



Copyright
Copyright © 2018 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation of this work beyond that permitted in
Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Requests for further information should be addressed to the
Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies
for his/her own use only and not for distribution or resale. The Publisher assumes no
responsibility for errors, omissions, or damages caused by the use of these programs or
from the use of the information contained herein.

