Advanced Topics in Analysis of Economic and Financial Data Using R
ZONGWU CAI
E-mail address: [email protected]
Department of Mathematics & Statistics and Department of Economics, University of North Carolina, Charlotte, NC 28223, U.S.A.

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.
Preface
This is an advanced course in econometrics and financial econometrics, with some basic theory and heavy applications. Our focus is on the SKILLS of analyzing real data using advanced econometric techniques and statistical software such as SAS and R. This is in line with our WISE spirit: STRONG THEORETICAL FOUNDATION and SKILL EXCELLENCE. In other words, this course covers some advanced topics in the analysis of economic and financial data, particularly nonlinear time series models and some models related to economic and financial applications. The topics covered range from classical approaches to modern modeling techniques, even up to the research frontiers. The difference between this course and others is that you will learn, step by step, how to build a model based on data (so-called "letting the data speak for themselves") through real data examples using statistical software, and how to explore real data using what you have learned. Therefore, there is no single book that serves as a textbook for this course, so materials from several books and articles will be provided. Necessary handouts, including computer codes such as SAS and R codes, will be provided (you might be asked to print out the materials yourself).

Five or six projects, including heavy computer work, are assigned throughout the term. Group discussion is allowed for the projects, particularly for writing the computer codes, but the final report for each project must be written in your own language; copying from each other will be regarded as cheating. If you use the R language, which is similar to S-PLUS, you can download it from the public web site at http://www.r-project.org/ and install it on your own computer, or you can use the PCs in our labs. You are STRONGLY encouraged (but not limited) to use the package R, since it is a very convenient programming language for statistical analysis and Monte Carlo simulations, as well as various applications in quantitative economics and finance. Of course, you are welcome to use any other package such as SAS, MATLAB, GAUSS, STATA, SPSS or EViews, but I may not be able to help you if you do so.

Some materials are based on the lecture notes given by Professor Robert H. Shumway, Department of Statistics, University of California at Davis, and by my colleague, Professor Stanislav Radchenko, Department of Economics, University of North Carolina at Charlotte, and on the book by Tsay (2005). Some datasets are provided by Professor Robert H. Shumway, Department of Statistics, University of California at Davis, and by Professor Philip Hans Franses at Erasmus University Rotterdam, the Netherlands. I am very grateful to them for providing their lecture notes and datasets. Finally, I express my many thanks to our Master's student, Ms. Huiqun Ma, for writing all the SAS codes.
How to Install R?
The main package used is R, which is free from the R Project for Statistical Computing. You will also need Ox with G@RCH to fit various GARCH models. Students may use other packages or programs if they prefer.
Install R
(1) Go to the web site http://www.r-project.org/;
(2) click CRAN;
(3) choose a site for downloading, say http://cran.cnr.Berkeley.edu;
(4) click Windows (95 and later);
(5) click base;
(6) click R-2.6.1-win32.exe (version of November 26, 2007) to save this file first, and then run it to install. (Note that the setup program is 29 megabytes and R is updated every three months.)
The above steps install the basic R on your computer. If you need to install other packages, do the following:
(7) After R is installed, there is an icon on the screen; click the icon to get into R.
(8) Go to the top, find Packages, and click it.
(9) Go down to "Install package(s)..." and click it.
(10) A new window appears; choose a location from which to download packages, say USA(CA1), and click OK.
(11) A new window lists all packages; you can select any one of them and click OK, or you can select all of them and then click OK.
Install Ox
OxMetrics™ is a family of software packages providing an integrated solution for the econometric analysis of time series, forecasting, financial econometric modelling, and statistical analysis of cross-section and panel data. OxMetrics consists of a front-end program called OxMetrics and individual application modules such as PcGive, STAMP, etc. OxMetrics Enterprise™ is a single product that includes all the important components: OxMetrics desktop, G@RCH, Ox Professional, PcGive and STAMP. To install Ox Console, please download the file installation Ox [email protected] from http://www.math.uncc.edu/~zcai/ and follow the steps.
Contents
1 Review of Multiple Regression Models
  1.1 Least Squares Estimation
  1.2 Model Diagnostics
    1.2.1 Box-Cox Transformation
    1.2.2 Reading Materials
  1.3 Computer Codes
  1.4 References

2 Classical and Modern Model Selection Methods
  2.1 Introduction
  2.2 Subset Approaches
  2.3 Sequential Methods
  2.4 Likelihood-Based Criteria
  2.5 Cross-Validation and Generalized Cross-Validation
  2.6 Penalized Methods
  2.7 Implementation in R
    2.7.1 Classical Models
    2.7.2 LASSO Type Methods
    2.7.3 Example
  2.8 Computer Codes
  2.9 References

3 Regression Models With Correlated Errors
  3.1 Methodology
  3.2 Nonparametric Models with Correlated Errors
  3.3 Predictive Regression Models
  3.4 Computer Codes
  3.5 References

4 Estimation of Covariance Matrix
  4.1 Methodology
  4.2 Details (see the paper by Zeileis)
  4.3 Computer Codes
  4.4 References

5 Seasonal Time Series Models
  5.1 Characteristics of Seasonality
  5.2 Modeling
  5.3 Nonlinear Seasonal Time Series Models
  5.4 Computer Codes
  5.5 References

6 Robust and Quantile Regressions
  6.1 Robust Regression
  6.2 Quantile Regression
  6.3 Computer Codes
  6.4 References

7 How to Analyze Boston House Price Data?
  7.1 Description of Data
  7.2 Analysis Methods
    7.2.1 Linear Models
    7.2.2 Nonparametric Models
  7.3 Computer Codes
  7.4 References

8 Value at Risk
  8.1 Methodology
  8.2 R Commands (R Menu)
    8.2.1 Generalized Pareto Distribution
    8.2.2 Background of the Generalized Pareto Distribution
  8.3 Reading Materials I and II (see Handouts)
  8.4 New Developments (Nonparametric Approaches)
    8.4.1 Introduction
    8.4.2 Framework
    8.4.3 Nonparametric Estimating Procedures
    8.4.4 Real Examples
  8.5 References

9 Long Memory Models and Structural Changes
  9.1 Long Memory Models
    9.1.1 Methodology
    9.1.2 Spectral Density
    9.1.3 Applications
  9.2 Related Problems and New Developments
    9.2.1 Long Memory versus Structural Breaks
    9.2.2 Testing for Breaks (Instability)
    9.2.3 Long Memory versus Trends
  9.3 Computer Codes
  9.4 References
List of Tables
2.1 AICC values for ten models for the recruits series
List of Figures
1.1 Scatterplot with regression line and lowess smoothed curve.
1.2 Scatterplot with regression line and both lowess and loess smoothed curves as well as the Theil-Sen estimated line.
1.3 Residual plots.
1.4 (a) Leverage plot. (b) Influential plot. (c) Plots without observation #34.
2.1 Monthly SOI (left) and simulated recruitment (right) from a model (n = 453 months, 1950-1987).
2.2 The SOI series (black solid line) compared with a 12-point moving average (red thicker solid line). The left panel: original data; the right panel: filtered series.
2.3 Multiple lagged scatterplots showing the relationship between SOI at the present (x_t) versus the lagged values (x_{t+h}) at lags 1 ≤ h ≤ 16.
2.4 Autocorrelation functions of SOI and recruitment and cross-correlation function between SOI and recruitment.
2.5 Multiple lagged scatterplots showing the relationship between the SOI at time t + h, say x_{t+h} (x-axis), versus recruits at time t, say y_t (y-axis), 0 ≤ h ≤ 15.
2.6 Multiple lagged scatterplots showing the relationship between the SOI at time t, say x_t (x-axis), versus recruits at time t + h, say y_{t+h} (y-axis), 0 ≤ h ≤ 15.
2.7 Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.
2.8 ACF of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).
3.1 Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log-transformed earnings (right panel).
3.2 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0,0,0) × (1,0,0)_4 residuals.
3.3 Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line (red) the Treasury 3-year constant maturity rate.
3.4 Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is 3-year rate versus 1-year rate, and the right panel is changes in 3-year rate versus changes in 1-year rate.
3.5 Residual series of linear regression Model I for two U.S. weekly interest rates: the left panel is the time plot and the right panel is the ACF.
3.6 Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate by the red dashed line.
3.7 Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).
5.1 US Retail Sales Data from 1967-2000.
5.2 Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01-1994.13.
5.3 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0,1,1) × (0,1,1)_12 model.
5.4 Autocorrelation functions and partial autocorrelation functions for the birth series (top two panels), the first difference (second two panels), an ARIMA(0,1,0) × (0,1,1)_12 model (third two panels) and an ARIMA(0,1,1) × (0,1,1)_12 model (last two panels).
5.5 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), an ARIMA(0,1,0) × (1,0,0)_4 model (third two panels), and an ARIMA(0,1,1) × (1,0,0)_4 model (last two panels).
5.6 ACF and PACF for the ARIMA(0,1,1) × (0,1,1)_4 model (top two panels) and the residual plots of ARIMA(0,1,1) × (1,0,0)_4 (left bottom panel) and ARIMA(0,1,1) × (0,1,1)_4 (right bottom panel).
5.7 Monthly simple return of CRSP Decile 1 index from January 1960 to December 2003: time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for the January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return (right bottom panel).
6.1 Different loss functions: quadratic, Huber's ρ_c(·), ρ_{c,0}(·), ρ_{0.05}(·), LAD, and ρ_{0.95}(·), where c = 1.345.
7.1 The results from model (7.1).
7.2 (a) Residual plot for model (7.1). (b) Plot of g_1(x_6) versus x_6. (c) Residual plot for model (7.2). (d) Density estimate of Y.
7.3 Boston Housing Price Data: Displayed in (a)-(d) are the scatter plots of the house price versus the covariates X_13, X_6, X_1 and X_1, respectively.
7.4 Boston Housing Price Data: The plots of the estimated coefficient functions for three quantiles τ = 0.05 (solid line), τ = 0.50 (dashed line), and τ = 0.95 (dotted line) for model (7.4), and the mean regression (dot-dashed line): a_{0,τ}(u) and a_0(u) versus u in (a), a_{1,τ}(u) and a_1(u) versus u in (b), and a_{2,τ}(u) and a_2(u) versus u in (c). The thick dashed lines indicate the 95% point-wise confidence interval for the median estimate with the bias ignored.
8.1 (a) 5% CVaR estimate for DJI index. (b) 5% CES estimate for DJI index.
8.2 (a) 5% CVaR estimates for IBM stock returns. (b) 5% CES estimates for IBM stock returns. (c) 5% CVaR estimates for three different values of lagged negative IBM returns (0.275, 0.025, 0.325). (d) 5% CVaR estimates for three different values of lagged negative DJI returns (0.225, 0.025, 0.425). (e) 5% CES estimates for three different values of lagged negative IBM returns (0.275, 0.025, 0.325). (f) 5% CES estimates for three different values of lagged negative DJI returns (0.225, 0.025, 0.425).
9.1 Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes; sample partial autocorrelation function of the absolute series (middle panels); the log smoothed spectral density estimate of the absolute series (bottom panels).
9.2 Break testing results for the Nile River data: (a) plot of F-statistics; (b) the scatterplot with the breakpoint; (c) plot of the empirical fluctuation process with linear boundaries; (d) plot of the empirical fluctuation process with alternative boundaries.
9.3 Break testing results for the oil price data: (a) plot of F-statistics; (b) scatterplot with the breakpoint; (c) plot of the empirical fluctuation process with linear boundaries; (d) plot of the empirical fluctuation process with alternative boundaries.
9.4 Break testing results for the consumer price index data: (a) plot of F-statistics; (b) scatterplot with the breakpoint; (c) plot of the empirical fluctuation process with linear boundaries; (d) plot of the empirical fluctuation process with alternative boundaries.
1 Review of Multiple Regression Models

1.1 Least Squares Estimation

We begin our discussion of univariate and multivariate regression (time series) models by considering the idea of a simple regression model, which we have met before in other contexts such as statistics or econometrics courses. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed known functions of time, say z_{t1}, z_{t2}, . . . , z_{tq}, that are influencing our output y_t, which we know to be random. We express this relation between the inputs (predictors, independent variables, covariates, or exogenous variables) and the output (dependent or response variable) as

  y_t = β_1 z_{t1} + β_2 z_{t2} + · · · + β_q z_{tq} + e_t,   (1.1)

where β_1, . . . , β_q are unknown fixed regression coefficients and e_t is a random error or noise, assumed to be white noise; this means that the observations have zero means, equal variances σ², and are independent (uncorrelated). We traditionally assume also that the white noise series, e_t, is Gaussian or normally distributed. Finally, of course, the basic assumption is that E(y_t | z_{t1}, . . . , z_{tq}) is a linear function of the inputs, namely β_1 z_{t1} + β_2 z_{t2} + · · · + β_q z_{tq}.

Question: If at least one of those (four) assumptions is violated, what should we do?

The linear regression model described by (1.1) can be conveniently written in slightly more general matrix notation by defining the column vectors z_t = (z_{t1}, . . . , z_{tq})^T and β = (β_1, . . . , β_q)^T, so that we can write (1.1) in the alternate form

  y_t = β^T z_t + e_t.   (1.2)
To find estimators for β and σ², it is natural to determine the coefficient vector β minimizing the sum of squared errors (SSE)

  SSE = Σ_{t=1}^{n} e_t² = Σ_{t=1}^{n} (y_t − β^T z_t)²;

if the distribution of {e_t} is known, one should instead maximize the likelihood. This yields the least squares estimator (LSE) or the maximum likelihood estimator (MLE) β̂, and the maximum likelihood estimator for σ²,

  σ̂² = (1/n) Σ_{t=1}^{n} (y_t − β̂^T z_t)².   (1.3)
Note that the LSE is exactly the same as the MLE when the distribution of {e_t} is normal (WHY?). An alternate way of writing the model (1.2) is as

  y = Zβ + e,   (1.4)
where Z^T = (z_1, z_2, . . . , z_n) is a q × n matrix composed of the values of the input variables at the observed time points, y^T = (y_1, y_2, . . . , y_n) is the vector of observed outputs, and the errors are stacked in the vector e = (e_1, e_2, . . . , e_n)^T. The ordinary least squares estimator solves the normal equations Z^T Z β = Z^T y. You need not be concerned as to how this equation is solved in practice, as all computer packages have efficient software for inverting the q × q matrix Z^T Z to obtain

  β̂ = (Z^T Z)^{−1} Z^T y.   (1.5)
An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say

  Cov(β̂) = σ² (Z^T Z)^{−1} ≡ σ² C ≡ σ² (c_{ij}).   (1.6)

Then Cov(β̂_i, β̂_j) = σ² c_{ij}, and a 100(1 − α)% confidence interval for β_i is

  β̂_i ± t_{df}(α/2) σ̂ √c_{ii},   (1.7)

where t_{df}(α/2) denotes the upper α/2 critical value of a t distribution with df degrees of freedom. What is the df for our case?
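As a quick check (a minimal sketch with simulated data; the variable names here are mine, not from the text), the closed-form quantities in (1.5)-(1.7) can be reproduced and compared with R's lm(); this also answers the df question: df = n − q here.

# Minimal sketch: closed-form LSE, variance estimate, and CIs versus lm().
set.seed(1)
n <- 100
z1 <- rnorm(n); z2 <- rnorm(n)
y  <- 2*z1 - z2 + rnorm(n)
Z  <- cbind(z1, z2)                          # n x q design (no intercept here)
beta.hat <- solve(t(Z) %*% Z, t(Z) %*% y)    # equation (1.5)
df  <- n - ncol(Z)                           # degrees of freedom: n - q
s2  <- sum((y - Z %*% beta.hat)^2)/df        # (unbiased) estimate of sigma^2
C   <- solve(t(Z) %*% Z)                     # the matrix C in (1.6)
se  <- sqrt(s2*diag(C))
ci  <- cbind(beta.hat - qt(0.975, df)*se,
             beta.hat + qt(0.975, df)*se)    # 95% intervals as in (1.7)
fit <- lm(y ~ -1 + z1 + z2)
cbind(coef(fit), beta.hat)                   # should agree
confint(fit); ci                             # should agree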
Question: If at least one of those (four) assumptions is violated, are equations (1.6) and (1.7) still true? If not, how can we fix or modify them?
It seems that it is VERY IMPORTANT to make sure that all assumptions are satisfied. The question is how to do this task. Well, that is what is called model checking or model diagnostics, discussed in the next section.
1.2 Model Diagnostics

1.2.1 Box-Cox Transformation
If the distribution of the error is not normal, a naive way to deal with this problem is to apply a transformation to the dependent variable. A simple and easy choice in a regression model is the Box-Cox transformation. When the errors are heterogeneous and often non-Gaussian, a Box-Cox power transformation of the dependent variable is a useful method to alleviate heteroscedasticity when the distribution of the dependent variable is not known. For situations in which the dependent variable Y is known to be positive, the following transformation can be used:

  Y^(λ) = (Y^λ − 1)/λ   if λ ≠ 0,
  Y^(λ) = log(Y)        if λ = 0.
Given the vector of data observations {Y_i}_{i=1}^{n}, one way to select the power λ is to use the λ that maximizes the logarithm of the likelihood function

  f(λ) = −(n/2) log S_n(λ) + (λ − 1) Σ_{i=1}^{n} log(Y_i),

where S_n(λ) is the sample variance of the transformed data {Y_i^(λ)}. Generally, we find the maximizer of the function f(λ) by a grid search.
Note that the Box-Cox transformation approach can also be applied to handle the case when a covariate enters nonlinearly. For this case, you need to apply a transformation to each individual covariate, and when you transform covariates, you need to minimize the SSE by running a regression instead of maximizing the likelihood. There is no built-in function for this in any statistical package, so when implementing this version of the Box-Cox transformation, you have to write your own code. Another way to choose λ is to use the Q-Q plot. The command in R for the Q-Q plot is qqnorm() for the Q-Q normal plot or qqplot() for the Q-Q plot of one sample versus another.
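For the response transformation itself, R does provide a profile-likelihood tool: boxcox() in the package MASS evaluates f(λ) over a grid. A minimal sketch follows (the data here are simulated for illustration; the model and grid are my assumptions, not from the text):

# Minimal sketch: choosing lambda for a positive response with MASS::boxcox().
library(MASS)
set.seed(42)
x <- runif(200, 1, 10)
y <- (2 + 0.5*x + rnorm(200, sd = 0.3))^2        # positive, skewed response
bc <- boxcox(y ~ x, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]              # lambda maximizing f(lambda)
yt <- if (abs(lambda.hat) > 1e-8) (y^lambda.hat - 1)/lambda.hat else log(y)
summary(lm(yt ~ x))                              # refit on the transformed scale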
1.2.2 Reading Materials

See materials from the file simple-reg.htm. Also, the file r-reference.htm contains references about how to use R.
1.3 Computer Codes
To fit a multiple regression in R, one can use lm() or glm(); see the following for details:

lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE,
   x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL,
   offset, ...)

glm(formula, family = gaussian, data, weights, subset, na.action,
    start = NULL, etastart, mustart, offset, control = glm.control(...),
    model = TRUE, method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...)

To fit a regression model without an intercept, you need to use

fit1=lm(y~-1+x1+...+x9)

where fit1 is the fitted object containing all the outputs you need. If you want to do model diagnostic checking, you need to use

plot(fit1)

For multivariate data, it is usually a good idea to view the data as a whole using the pairwise scatter plots generated by the pairs() function:

pairs(data)

The smoothing parameter in lowess can be adjusted using the f= argument. The default is f = 2/3, meaning that 2/3 of the data points are used in calculating the fitted value at each point. If we set f = 1, we get the ordinary regression line; specifying f = 1/3 should yield a choppier curve.

lowess(x, y = NULL, f = 2/3, iter = 3, delta = 0.01 * diff(range(xy$x[o])))
There's a second loess function in R, loess, that has more options and generates more output.

loess(formula, data, weights, subset, na.action, model = FALSE, span = 0.75,
      enp.target, degree = 2, parametric = FALSE, drop.square = FALSE,
      normalize = TRUE, family = c("gaussian", "symmetric"),
      method = c("loess", "model.frame"), control = loess.control(...), ...)
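A minimal illustration of the three fits on R's built-in cars data (an illustrative stand-in, not the bee data used in the example below):

# Regression line versus lowess smooths with different f values.
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
abline(lm(dist ~ speed, data = cars), lwd = 2)                   # regression line
lines(lowess(cars$speed, cars$dist), lty = 2, col = 2)           # default f = 2/3
lines(lowess(cars$speed, cars$dist, f = 1/3), lty = 3, col = 4)  # choppier curve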
Example 1.1: We examine fitting a simple linear regression model to data using R. The data are from Harder and Thompson (1989). As part of a study to investigate reproductive strategies in plants, these biologists recorded the time spent at sources of pollen and the proportions of pollen removed by bumblebee queens and honeybee workers pollinating a species of Erythronium lily. The data set consists of three variables: (1) removed: proportion of pollen removed; (2) duration: duration of visit in seconds; and (3) code: 1 = bumblebee queens, 2 = honeybee workers. The response variable of interest is removed; the predictor is duration. Because removed is a proportion, it offers special challenges for analysis. We will ignore these issues for the moment. We will also look only at the queen bumblebees; I will leave consideration of the full collection of bees as a homework exercise. The following R code for Example 1.1 can be found in the file ex1-1.r.

# 10-12-2006
graphics.off()
beepollen<-read.table("c:/res-teach/xiamen12-06/data/ex1-1.txt",header=T)
#beepollen
queens<-beepollen[beepollen[,3]==1,]
attach(queens)
#print(names(queens))
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-1.1.eps",
           horizontal=F,width=6,height=6)
#win.graph()
par(bg="light yellow")
plot(duration,removed)
bee.reg<-lm(removed~duration)
bee.reg
print(anova(bee.reg))
print(summary(bee.reg))
print(names(bee.reg))
abline(bee.reg,lty=1,lwd=3)
lowess(duration,removed)
lines(lowess(duration,removed),lty=2,col=2,lwd=2)
legend(35,.25,c("regression","loess"),lty=c(1,2),
       col=c(1,2),lwd=c(2,2),bty="n")
dev.off()

postscript(file="c:/res-teach/xiamen12-06/figs/fig-1.2.eps",
           horizontal=F,width=6,height=6)
#win.graph()
par(bg="light grey")
plot(duration,removed)
abline(bee.reg,lty=1,lwd=3)
lines(lowess(duration,removed,f=1/3),lty=2,col=6,lwd=2)
loess.bee<-loess(removed~duration,family="symmetric")
print(names(loess.bee))
lines(sort(loess.bee$x),loess.bee$fitted[order(loess.bee$x)],
      col=4,lty=2,lwd=2)
tsp1reg<-function(x,y){
#
# Compute the Theil-Sen regression estimator.
# Only a single predictor is allowed in this version.
#
temp<-matrix(c(x,y),ncol=2)
temp<-elimna(temp)    # Remove any pairs with missing values
x<-temp[,1]
y<-temp[,2]
ord<-order(x)
xs<-x[ord]
ys<-y[ord]
vec1<-outer(ys,ys,"-")
vec2<-outer(xs,xs,"-")
v1<-vec1[vec2>0]
v2<-vec2[vec2>0]
slope<-median(v1/v2)
coef<-0
coef[1]<-median(y)-slope*median(x)
coef[2]<-slope
res<-y-slope*x-coef[1]
list(coefficients=coef,residuals=res)
}
elimna<-function(m){
#
# Remove any rows of data having missing values
#
ikeep<-c(1:nrow(m))
for(i in 1:nrow(m))if(sum(is.na(m[i,])>=1))ikeep[i]<-0
elimna<-m[ikeep[ikeep>=1],]
elimna
}
ts.reg<-tsp1reg(duration,removed)
abline(reg=ts.reg,col=2,lty=1,lwd=4)
dev.off()

########################################################
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-1.3.eps",
           horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light pink")
plot(bee.reg)
dev.off()

postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-1.4.eps",
           horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light blue")
h<-hatvalues(bee.reg)
plot(h,type="h")
abline(h=c(2,3)*mean(h),col=4,lty=2)
plot(c(0,.6),c(-3,2),xlab="leverage",
     ylab="studentized residual",type="n")
cook<-sqrt(cooks.distance(bee.reg))
points(hatvalues(bee.reg),rstudent(bee.reg),
       cex=10*cook/max(cook))
abline(h=c(-2,0,2),lty=2,col=2)
abline(v=c(2,3)*mean(hatvalues(bee.reg)),lty=2,col=4)
#identify(hatvalues(bee.reg),rstudent(bee.reg))  # takes a long time to do this
bee.reg1<-update(bee.reg,subset=-34)
bee.reg2<-update(bee.reg,subset=-c(14,34))
plot(duration,removed)
abline(bee.reg,col=1,lty=1,lwd=3)
abline(bee.reg1,col=2,lty=2,lwd=3)
abline(bee.reg2,col=3,lty=2,lwd=3)
abline(reg=ts.reg,col=4,lty=1,lwd=3)
legend(25,.3,c("all observations","without #34","without #14, #34",
       "Theil-Sen"),lty=c(1,2,2,1),col=1:4,lwd=rep(2,4),bty="n")
dev.off()
print("I am done")
Figure 1.1: Scatterplot with regression line and lowess smoothed curve.
Figure 1.2: Scatterplot with regression line and both lowess and loess smoothed curves as well as the Theil-Sen estimated line.
Figure 1.3: Residual plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage panels).
Figure 1.4: (a) Leverage plot. (b) Influential plot. (c) Plots without observation #34.
1.4 References
Harder, L. D. and J. D. Thompson (1989). Evolutionary options for maximizing pollen dispersal of animal-pollinated plants. American Naturalist, 133, 323-344.
2 Classical and Modern Model Selection Methods

2.1 Introduction

Given a possibly large set of potential predictors, which ones do we include in our model? Suppose X_1, X_2, . . . is a pool of potential predictors. The model containing all the predictors,

  Y = β_0 + β_1 X_1 + β_2 X_2 + · · · + ε,

is the most general model. It holds even if some of the individual β_j's are zero. But if some β_j's are zero or close to zero, it is better to omit those X_j's from the model. There are two reasons why you should omit variables whose coefficients are close to zero:

(a) Parsimony principle: Given two models that perform equally well in terms of prediction, one should choose the model that is more parsimonious (simpler).

(b) Prediction principle: The model should give predictions that are as accurate as possible, not just for the current observations, but for future observations as well. Including unnecessary predictors can apparently improve prediction for the current data, but can harm prediction for future data.

Note that the sum of squared errors (SSE) never increases as we add more predictors. Next, we discuss the methods available in the literature.
2.2 Subset Approaches
The all-possible-regressions procedure calls for considering all possible subsets of the pool of potential predictors and identifying, for detailed examination, a few good subsets according to some criterion. The purpose of the all-possible-regressions approach is to identify a small group of regression models that are good according to a specified criterion (summary statistic), so that a detailed examination can be made of these models, leading to the selection of the final regression model to be employed. The main problem with this approach is that it is computationally expensive. For example, with 10 predictors, we need to investigate 2^10 = 1024 potential regression models. With the aid of modern computing power, this computation is possible, but carefully examining 1024 possible models would still be an overwhelming task for a data analyst. Different criteria for comparing the regression models may be used with the all-possible-regressions selection procedure. We discuss several summary statistics:
(i) R²_p (or SSE_p);
(ii) R²_{adj;p} (or MSE_p);
(iii) C_p;
(iv) PRESS_p;
(v) sequential methods;
(vi) AIC-type criteria.

We shall denote the number of potential predictors in the pool by K − 1. Hence, including an intercept parameter β_0, we have K potential parameters. The number of predictors in a subset will be denoted by p − 1, as always, so that there are p parameters in the regression function for this subset of predictors. Thus 1 ≤ p ≤ K. Now, we discuss each criterion in detail.

1. R²_p (or SSE_p)

R²_p indicates that there are p parameters (or p − 1 predictors) in the regression model:

  R²_p = 1 − SSE_p / SSTO.

It measures the proportion of the variance of Y explained by the p − 1 predictors. R²_p always goes up as we add more predictors, and it varies inversely with SSE_p because SSTO is constant for all possible regression models. That is, choosing the model with the largest R²_p is equivalent to choosing the model with the smallest SSE_p. Since R²_p increases with the number of predictors, it cannot be used to compare models of different sizes.

2. R²_{adj;p} (or MSE_p)

The adjusted coefficient of multiple determination R²_{adj;p} has been suggested as an alternative criterion:

  R²_{adj;p} = 1 − [(n − 1)/(n − p)] · SSE_p/SSTO = 1 − MSE_p / [SSTO/(n − 1)].

It is like R²_p but with a penalty for adding unnecessary variables: R²_{adj;p} can go down when useless predictors are added. R²_{adj;p} varies inversely with MSE_p because SSTO/(n − 1) is constant for all possible regression models. That is, choosing the model with the largest R²_{adj;p} is equivalent to choosing the model with the smallest MSE_p.
3. Mallows' C_p

The Mallows C_p is concerned with the total mean squared error of the n fitted values for each subset regression model. The mean squared error concept involves the total error in each fitted value,

  Ŷ_i − μ_i = [Ŷ_i − E(Ŷ_i)] + [E(Ŷ_i) − μ_i],

where the first term is the random error, the second term is the bias, and μ_i is the true mean response at the ith observation. The mean squared error for Ŷ_i is defined as the expected value of the square of the total error above. It can be shown that

  MSE(Ŷ_i) = E[(Ŷ_i − μ_i)²] = Var(Ŷ_i) + [Bias(Ŷ_i)]²,

where Bias(Ŷ_i) = E(Ŷ_i) − μ_i. The total mean squared error for all n fitted values Ŷ_i is the sum over the observations i:

  Σ_{i=1}^{n} MSE(Ŷ_i) = Σ_{i=1}^{n} Var(Ŷ_i) + Σ_{i=1}^{n} [Bias(Ŷ_i)]².

It can be shown that Σ_{i=1}^{n} Var(Ŷ_i) = p σ² and Σ_{i=1}^{n} [Bias(Ŷ_i)]² = (n − p)[E(S²_p) − σ²], where S²_p is the MSE from the current model. Using this, we have

  Σ_{i=1}^{n} MSE(Ŷ_i) = p σ² + (n − p)[E(S²_p) − σ²].   (2.1)

If the model does not fit well, then S²_p is a biased estimate of σ². We can estimate E(S²_p) by MSE_p and estimate σ² by the MSE from the maximal model (the largest model we can consider), i.e., σ̂² = MSE_{K−1} = MSE(X_1, . . . , X_{K−1}). Using these estimators for E(S²_p) and σ² gives

  C_p = p + (n − p)(MSE_p − σ̂²)/σ̂².

A small C_p is a good thing: it indicates that the model is relatively precise (has small variance) in estimating the true regression coefficients and predicting future responses, and this precision will not improve much by adding more predictors. Look for models with small C_p.
If we have enough predictors in the regression model so that all the significant predictors are included, then MSE_p ≈ MSE(X_1, . . . , X_{K−1}) and it follows that C_p ≈ p. Thus C_p close to p is evidence that the predictors in the pool of potential predictors (X_1, . . . , X_{K−1}) that are not in the current model are not important. Models with considerable lack of fit have values of C_p larger than p. The C_p criterion can be used to compare models of different sizes. If we use all the potential predictors, then C_p = K.

4. PRESS_p

The PRESS (prediction sum of squares) criterion is defined as

  PRESS_p = Σ_{i=1}^{n} ε̂²_{(i)},

where ε̂_{(i)} is called the PRESS residual for the ith observation, similar to the jackknife residual. The PRESS residual is defined as ε̂_{(i)} = Y_i − Ŷ_{(i)}, where Ŷ_{(i)} is the fitted value obtained by leaving out the ith observation. Models with small PRESS_p fit well in the sense of having small prediction errors. PRESS_p can be calculated without fitting the model n times (each time deleting one of the n cases): one can show that

  ε̂_{(i)} = ε̂_i / (1 − h_{ii}),

where h_{ii} is the ith diagonal element of the hat matrix H = X(X^T X)^{−1} X^T.
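A minimal sketch (simulated data; the names are mine) computing PRESS_p via the hat-value shortcut and Mallows' C_p exactly as defined above:

# PRESS via e_i/(1 - h_ii) and Mallows' Cp for a candidate subset model.
set.seed(7)
n <- 80
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)    # x3 is irrelevant by design
y  <- 1 + 2*x1 - x2 + rnorm(n)
full <- lm(y ~ x1 + x2 + x3)                      # maximal model, estimates sigma^2
sub  <- lm(y ~ x1 + x2)                           # candidate subset model
press  <- sum((resid(sub)/(1 - hatvalues(sub)))^2)
p      <- length(coef(sub))
sigma2 <- summary(full)$sigma^2                   # MSE(X1,...,X_{K-1})
Cp     <- p + (n - p)*(summary(sub)$sigma^2 - sigma2)/sigma2
c(PRESS = press, Cp = Cp)                         # Cp should be close to p = 3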
2.3 Sequential Methods
1. Forward selection

(i) Start with the null model.
(ii) Add the most significant variable if its p-value is less than p_enter (equivalently, F is larger than F_enter).
(iii) Continue until no more variables enter the model.

2. Backward elimination
(i) Start with the full model.
(ii) Eliminate the least significant variable whose p-value is larger than p_remove (equivalently, F is smaller than F_remove).
(iii) Continue until no more variables can be discarded from the model.

3. Stepwise selection

(i) Start with any model.
(ii) Check each predictor that is currently in the model. Suppose the current model contains X_1, . . . , X_k. Then the F statistic for X_i is

  F = [SSE(X_1, . . . , X_{i−1}, X_{i+1}, . . . , X_k) − SSE(X_1, . . . , X_k)] / MSE(X_1, . . . , X_k) ~ F(1, n − k − 1).

Eliminate the least significant variable whose p-value is larger than p_remove (equivalently, F is smaller than F_remove).
(iii) Continue until no more variables can be discarded from the model.
(iv) Add the most significant variable if its p-value is less than p_enter (equivalently, F is larger than F_enter).
(v) Go to step (ii).
(vi) Repeat until no more predictors can be entered and no more can be discarded.
2.4 Likelihood-Based Criteria
The following is based on Akaike's approach (Akaike, 1973, and subsequent papers); see also the book by Burnham and Anderson (2003). Suppose that f(y) is the true model (unknown) giving rise to the data y (y is a vector of data), and g(y, θ) is a candidate model with parameter vector θ. We want to find a model g(y, θ) close to f(y). The Kullback-Leibler discrepancy (K-L distance) is

  K(f, g) = E_f [ log( f(Y) / g(Y, θ) ) ].
This is a measure of how far model g is from model f (with reference to model f). Its properties are K(f, g) ≥ 0, and K(f, g) = 0 if and only if f(·) = g(·).

Of course, we can never know how far our model g is from f. But Akaike (1973) showed that we might be able to estimate something almost as good. Suppose we have two models under consideration: g(y, θ) and h(y, φ). Akaike (1973) showed that we can estimate K(f, g) − K(f, h). It turns out that the difference of maximized log-likelihoods, corrected for a bias, estimates the difference of K-L distances. The maximized likelihoods are L_g(y, θ̂) and L_h(y, φ̂), where θ̂ and φ̂ are the ML estimates of the parameters. Akaike's result: [log(L_g) − q] − [log(L_h) − r] is an asymptotically unbiased estimate (i.e., the bias approaches zero as the sample size increases) of K(f, g) − K(f, h). Here q is the number of parameters estimated in θ (model g) and r is the number of parameters estimated in φ (model h). This is the price of parameters: the likelihoods in the above expression are penalized by the number of parameters. The Akaike Information Criterion (AIC) for model g is

  AIC = −2 log(L_g) + 2q.   (2.2)

A bias-corrected version of AIC was proposed by Hurvich and Tsai (1989), called AICC, defined by

  AICC = AIC + 2q(q + 1)/(n − q − 1) = −2 log(L_g) + 2qn/(n − q − 1).   (2.3)

The difference between AIC and AICC is the penalty term. Intuitively, one can think of 2qn/(n − q − 1) in (2.3) as a penalty term to discourage over-parameterization. Shibata (1976) suggested that the AIC has a tendency to overestimate q. By comparing the penalty terms in (2.2) and (2.3), we can see that the two factors, 2qn/(n − q − 1) and 2q, are asymptotically equivalent as n → ∞, so that the AICC and AIC statistics are asymptotically equivalent. The AICC statistic, however, has a more extreme penalty for larger-order models, which counteracts the over-fitting tendency of the AIC.

Another approach is given by the much older notion of Bayesian statistics. In the Bayesian approach, we assume that a priori uncertainty about the value of the model parameters is represented by a prior distribution. Upon observing the data, this prior is updated, yielding a posterior distribution. In order to make inferences about the model (rather than its parameters), we integrate across the posterior distribution. Under the assumption that all models are a priori equally likely (because the Bayesian approach requires model priors as well as parameter priors), Bayesian model selection chooses the model with the highest marginal likelihood. The ratio of two marginal likelihoods is called a Bayes factor (BF), which is a widely used method of model selection in Bayesian inference. The two integrals in the Bayes factor are nontrivial to compute unless they form a conjugate family. Monte Carlo methods are usually required to compute the BF, especially for highly parameterized models. A large-sample approximation of the BF yields the easily computable Bayesian information criterion (BIC),

  BIC = −2 log(L_g) + q log n.   (2.4)

In sum, both AIC and BIC, as well as their generalizations, have the similar form

  LC = −2 log(L_g) + λ q,   (2.5)

where λ is a fixed constant. From (2.2), (2.3), and (2.4), we can see that the BIC statistic imposes a much heavier penalty when n is large than AIC and AICC, to overcome over-fitting. Also, from (2.5), it is easy to see that LC includes AIC, AICC and BIC as special cases. Recent developments suggest the use of a data-adaptive penalty to replace the fixed penalty λ; see Bai, Rao and Wu (1999) and Shen and Ye (2002). That is, λ is estimated from the data in a complexity form based on the concept of generalized degrees of freedom.
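A minimal sketch (simulated data) of these criteria in R; AIC() and BIC() follow the −2 log L + penalty convention of (2.2) and (2.4), and the AICC correction of (2.3) is applied by hand:

# Likelihood-based criteria for two nested candidate models.
set.seed(3)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.8*x1 + rnorm(n)                  # x2 is truly irrelevant
g  <- lm(y ~ x1)                             # smaller candidate model
h  <- lm(y ~ x1 + x2)                        # larger candidate model
q  <- length(coef(g)) + 1                    # parameters, counting sigma^2
aicc.g <- AIC(g) + 2*q*(q + 1)/(n - q - 1)   # AICC as in (2.3)
rbind(AIC = c(g = AIC(g), h = AIC(h)),
      BIC = c(g = BIC(g), h = BIC(h)))       # both should favor model g
aicc.g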
2.5 Cross-Validation and Generalized Cross-Validation

Cross-validation (CV) is the most commonly used method for model assessment and selection. The main idea is a direct estimate of the extra-sample error. The general version of CV splits the data into K roughly equal-sized parts, fits the model to K − 1 parts, and calculates the prediction error on the remaining part. For leave-one-out CV,

  CV = Σ_{i=1}^{n} (Y_i − Ŷ_{(i)})²,   (2.6)

where Ŷ_{(i)} is the fitted value computed with the ith part of the data removed, and Y_i − Ŷ_{(i)} is the jackknife residual.

A convenient approximation to CV for linear fitting with squared error loss is generalized cross-validation (GCV). A linear fitting method has the following property: Ŷ = S Y, where Ŷ_i is the fitted value using the whole data set and S = (s_{ij})_{n×n} is the smoothing (hat) matrix. For many linear fitting methods with leave-one-out (k = 1), it can be shown easily that

  CV = Σ_{i=1}^{n} (Y_i − Ŷ_{(i)})² = Σ_{i=1}^{n} [ (Y_i − Ŷ_i) / (1 − s_{ii}) ]².

Due to the intensive computation, the CV can be approximated by the GCV, defined by

  GCV = Σ_{i=1}^{n} [ (Y_i − Ŷ_i) / (1 − trace(S)/n) ]² = Σ_{i=1}^{n} (Y_i − Ŷ_i)² / (1 − trace(S)/n)².   (2.7)

It has been shown that both the CV and GCV methods are very appealing in nonparametric modeling; see the book by Hastie and Tibshirani (1990). It follows from (2.7) that

  GCV = n σ̂² / (1 − trace(S)/n)²,   (2.8)

where σ̂² = n^{−1} Σ_{i=1}^{n} (Y_i − Ŷ_i)².

Recently, the leave-one-out cross-validation method was challenged by Shao (1993). Shao (1993) claimed that the popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the AIC, the C_p, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞, and he showed that the inconsistency of the leave-one-out cross-validation can be rectified by using leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞.
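For a linear fit, both quantities come almost for free from the hat values; a minimal sketch (simulated data) of the leave-one-out identity and the GCV of (2.7)-(2.8):

# Leave-one-out CV and GCV for lm() without refitting n times.
set.seed(9)
n <- 100
x <- rnorm(n)
y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)
h   <- hatvalues(fit)                         # the diagonal s_ii of S
cv  <- sum((resid(fit)/(1 - h))^2)            # leave-one-out CV
trS <- sum(h)                                 # trace(S); equals #parameters here
gcv <- sum(resid(fit)^2)/(1 - trS/n)^2        # GCV as in (2.7)-(2.8)
c(CV = cv, GCV = gcv)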
2.6 Penalized Methods

1. Bridge and ridge: Frank and Friedman (1993) proposed the L_q (q > 0) penalized least squares

  Σ_{i=1}^{n} (Y_i − Σ_j β_j X_{ij})² + λ Σ_j |β_j|^q,

whose minimizer is called the bridge estimator. If q = 2, the resulting estimator is called the ridge estimator, given by

  β̂ = (X^T X + λ I)^{−1} X^T Y.
2. LASSO: Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO), which is the minimizer of the following penalized least squares:

  Σ_{i=1}^{n} (Y_i − Σ_j β_j X_{ij})² + λ Σ_j |β_j|.
3. Non-concave penalized LS: Fan and Li (2001) proposed the non-concave penalized least squares

  Σ_{i=1}^{n} (Y_i − Σ_j β_j X_{ij})² + Σ_j p_λ(|β_j|).

One example is the hard thresholding penalty function p_λ(|β|) = λ² − (|β| − λ)² I(|β| < λ), which results in the hard thresholding rule β̂_j = β̂⁰_j I(|β̂⁰_j| > λ), where β̂⁰_j is the ordinary least squares estimate. Finally, Fan and Li (2001) proposed the so-called smoothly clipped absolute deviation (SCAD) model selection criterion, with the penalty function defined through its derivative

  p'_λ(θ) = λ { I(θ ≤ λ) + [(aλ − θ)_+ / ((a − 1)λ)] I(θ > λ) }   for some a > 2 and θ > 0,

which results in the estimator

  β̂_j = sign(β̂⁰_j)(|β̂⁰_j| − λ)_+                  when |β̂⁰_j| ≤ 2λ,
  β̂_j = [(a − 1)β̂⁰_j − sign(β̂⁰_j) aλ] / (a − 2)   when 2λ < |β̂⁰_j| ≤ aλ,
  β̂_j = β̂⁰_j                                       when |β̂⁰_j| > aλ.

Also, Fan and Li (2001) showed that the SCAD estimator satisfies three properties: (1) unbiasedness for large true coefficients, to avoid unnecessary estimation bias; (2) sparsity, estimating small coefficients as 0 to reduce model complexity; and (3) continuity of the resulting estimator, to avoid unnecessary variation in model prediction. Fan and Peng (2004) considered the case in which the number of regressors can depend on the sample size and goes to infinity at a certain rate. To choose the appropriate tuning parameters a and λ in the SCAD penalty function, the CV, GCV or BIC method can be used; see Fan and Li (2001), Fan and Peng (2004) and Wang, Li and Tsai (2007). This approach can be generalized to a more general setting, called penalized likelihood, which can be applied to other purposes, say to select copulas in modeling the dependence structure
of multivariate financial variables; see Cai, Chen, Fan and Wang (2007)¹ and Cai and Wang (2007)². Also, it can be applied to quantile regression settings.
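The ridge formula and the SCAD thresholding rule above can be coded directly; a minimal sketch (simulated data; λ and a = 3.7, the value recommended by Fan and Li (2001), are illustrative choices):

# Ridge estimator from its closed form, and the componentwise SCAD rule.
set.seed(11)
n <- 50
X <- scale(matrix(rnorm(n*3), n, 3))          # standardized predictors
y <- X %*% c(1, 0.5, 0) + rnorm(n)
lambda <- 2
ridge <- solve(t(X) %*% X + lambda*diag(3), t(X) %*% y)
ols   <- solve(t(X) %*% X, t(X) %*% y)
cbind(ols, ridge)                             # ridge shrinks toward zero

scad.thresh <- function(b0, lambda, a = 3.7) {
  ab <- abs(b0)
  ifelse(ab <= 2*lambda, sign(b0)*pmax(ab - lambda, 0),          # soft zone
  ifelse(ab <= a*lambda, ((a - 1)*b0 - sign(b0)*a*lambda)/(a - 2),
         b0))                                  # large coefficients untouched
}
scad.thresh(c(-3, -0.8, 0.05, 1.5, 2.5), lambda = 0.5)  # small ones go to 0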
2.7 Implementation in R

2.7.1 Classical Models
To fit a multiple regression in R, one can use lm() or glm(); see the following for details:

lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE,
   x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL,
   offset, ...)

glm(formula, family = gaussian, data, weights, subset, na.action,
    start = NULL, etastart, mustart, offset, control = glm.control(...),
    model = TRUE, method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...)

To fit a regression model without an intercept, you need to use

fit1=lm(y~-1+x1+...+x9)

where fit1 is the fitted object containing all the outputs you need. If you want to do model diagnostic checking, you need to use

plot(fit1)

For multivariate data, it is usually a good idea to view the data as a whole using the pairwise scatter plots generated by the pairs() function:

pairs(data)

To drop or add one variable from or to a regression model, you use the command drop1() or add1(); for example,

drop1(fit1)
add1(fit1,~x10+x11+...+x20)
¹ Cai, Z., X. Chen, Y. Fan and X. Wang (2007). Selection of Copulas with Applications in Finance. Working paper.
² Cai, Z. and X. Wang (2007). Selection of Mixture Distributions for Time Series. Working paper.
The last command means that you choose the best one from X_10 to X_20 to add to the model. Adding and dropping terms using add1() and drop1() is a useful method for selecting a model when only a few terms are involved, but it can quickly become tedious. The functions add1() and drop1() are based on the C_p criterion. The step() function provides an automatic procedure for conducting stepwise model selection; it requires an initial model, often constructed explicitly as an intercept-only model. For example, suppose that we want to find the best model involving X_1, . . . , X_10; we could create an intercept-only model and then call step() as follows:

fit0=lm(y~1)
fit2=step(fit0,~x1+x2+...+x10, trace=F)

step(object, scope, scale = 0, direction = c("both", "backward", "forward"),
     trace = 1, keep = NULL, steps = 1000, k = 2, ...)

With trace=T or trace=1, step() displays the output of each step of the selection process. The step() function is based on AIC or BIC, chosen by specifying k in the function. Also, one can use the function stepAIC() in the package MASS, which works for a wider range of object classes.
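A minimal usage sketch (simulated data) contrasting AIC- and BIC-based stepwise search via the k argument:

# Stepwise selection from an intercept-only start, by AIC and by BIC.
set.seed(8)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2*d$x1 + rnorm(n)
fit0 <- lm(y ~ 1, data = d)
fitA <- step(fit0, scope = ~ x1 + x2 + x3, trace = FALSE)             # k = 2 (AIC)
fitB <- step(fit0, scope = ~ x1 + x2 + x3, trace = FALSE, k = log(n)) # BIC penalty
formula(fitA); formula(fitB)                  # both should pick y ~ x1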
2.7.2 LASSO Type Methods
The package lasso2 provides many features for solving regression problems while imposing L1 constraints on the estimates, and the package lars provides efficient procedures for computing the entire LASSO path at the cost of a single least squares fit. LAR is the least angle regression, proposed by Efron, Hastie, Johnstone and Tibshirani (2004). In lars, you can use the function lars()

lars(x, y, type = c("lasso", "lar", "forward.stagewise"), trace = FALSE,
     Gram, eps = .Machine$double.eps, max.steps, use.Gram = TRUE)

and the function cv.lars() to compute the K-fold cross-validated mean squared prediction error for least angle regression (LAR), LASSO, or forward stagewise:

cv.lars(x, y, K = 10, fraction = seq(from = 0, to = 1, length = 100),
        trace = FALSE, plot.it = TRUE, se = TRUE, ...)
In the package lasso2, the function l1ce() is for regression fitting with an L1 constraint on the parameters:

l1ce(formula, data = sys.parent(), weights, subset, na.action,
     sweep.out = ~ 1, x = FALSE, y = FALSE, contrasts = NULL,
     standardize = TRUE, trace = FALSE,
     guess.constrained.coefficients = double(p), bound = 0.5,
     absolute.t = FALSE)

Alternatively, the function gl1ce() fits a generalized regression problem while imposing an L1 constraint on the parameters:

gl1ce(formula, data = sys.parent(), weights, subset, na.action,
      family = gaussian, control = glm.control(...), sweep.out = ~ 1,
      x = FALSE, y = TRUE, contrasts = NULL, standardize = TRUE,
      guess.constrained.coefficients = double(p), bound = 0.5, ...)

Also, you can use the function gcv() to extract the generalized cross-validation score(s) from fitted model objects:

gcv(object, ...)
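A minimal usage sketch of the lars functions just listed (simulated data; note that the names of the components returned by cv.lars() have varied across versions of the package, so check str(cvr) on your installation):

# LASSO path with lars() and tuning by 10-fold cross-validation.
library(lars)
set.seed(5)
x <- matrix(rnorm(100*8), 100, 8)
y <- x %*% c(3, 1.5, 0, 0, 2, 0, 0, 0) + rnorm(100)
fit <- lars(x, y, type = "lasso")             # whole solution path
cvr <- cv.lars(x, y, K = 10, plot.it = FALSE)
s.best <- cvr$fraction[which.min(cvr$cv)]     # fraction minimizing CV error
coef(fit, s = s.best, mode = "fraction")      # coefficients at the chosen point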
2.7.3 Example
We begin by introducing several environmental, economic and financial time series to serve as illustrative data for time series methodology. Figure 2.1 shows monthly values of an environmental series called the Southern Oscillation Index (SOI) and the associated recruitment series (number of new fish) computed from a model by Pierre Kleiber, Southwest Fisheries Center, La Jolla, California. This data set is provided by Shumway (2006). Both series are for a period of 453 months ranging over the years 1950-1987. The SOI measures changes in air pressure that are related to sea surface temperatures in the central Pacific. The central Pacific Ocean warms up every three to seven years due to the El Niño effect, which has been blamed, in particular, for floods in the midwestern portions of the U.S.
Both series in Figure 2.1 tend to exhibit repetitive behavior, with regularly repeating (stochastic) cycles that are easily visible. This periodic behavior is of interest because the underlying processes of interest may be regular, and the rate or frequency of oscillation characterizing the behavior of the underlying series would help to identify them. One can also remark that the cycles of the SOI are repeating at a faster rate than those of the recruitment series. The recruitment series also shows several kinds of oscillations: a faster frequency that seems to repeat about every 12 months and a slower frequency that seems to repeat about every 50 months. The study of the kinds of cycles and their strengths will be discussed later. The two series also tend to be somewhat related; it is easy to imagine that somehow the fish population is dependent on the SOI. Perhaps there is even a lagged relation, with the SOI signalling changes in the fish population.

The study of the variation in the different kinds of cyclical behavior in a time series can be aided by computing the power spectrum, which shows the variance as a function of the frequency of oscillation. Comparing the power spectra of the two series would then give valuable information relating to the relative cycles driving each one. One might also want to know whether or not the cyclical variations of a particular frequency in one of the series, say the SOI, are associated with the frequencies in the recruitment series. This would be measured by computing the correlation as a function of frequency, called the coherence. The study of systematic periodic variations in time series is called spectral analysis. See
Shumway (1988), Shumway (2006), and Shumway and Stoffer (2000) for details.

Figure 2.1: Monthly SOI (left) and simulated recruitment (right) from a model (n = 453 months, 1950-1987).
We will need a characterization for the kind of stability that is exhibited by the environmental and sh series. One can note that the two series seem to oscillate fairly regularly around central values (0 for SOI and 64 for recruitment). Also, the lengths of the cycles and their orientations relative to each other do not seem to be changing drastically over the time histories. We consider the twelve month moving average aj = 1/12, j = 0, 1, 2, 3, 4, 5,
2.2. It is clear that this lter removes some higher oscillations and produces a smoother
1.0
6 and zero otherwise. The result of applying this lter to the SOI index is shown in Figure
0.5
0.5
0.0
1.0
100
200
300
400
0.5
0
0.0
0.5
100
200
300
400
Figure 2.2: The SOI series (black solid line) compared with a 12 point moving average (red thicker solid line). The left panel: original data and the right panel: ltered series. series. In fact, the yearly oscillations have been ltered out (see the right panel in Figure 2.2) and a lower frequency oscillation appears with a cycling rate of about 42 months. This is the so-called El Nio eect that accounts for all kinds of phenomena. This ltering eect n will be examined further later on spectral analysis since it is extremely important to know exactly how one is inuencing the periodic oscillations by ltering. In Figure 2.3, we have made a lagged scatterplot of the SOI series at time t + h against the SOI series at time t and obtained a high correlation, 0.412, between the series xt+12 and The scatterplot shows the direction of the relation which tends to be positive for lags 1, signicant nonlinearities to be present. In order to develop a measure for this self correlation the series xt shifted by 12 years. Lower order lags at t 1, t 2 also show correlation.
2, 11, 12, 13, and tends to be negative for lags 6, 7, 8. The scatterplot can also show no
27
Figure 2.3: Multiple lagged scatterplots showing the relationship between the present SOI values (x_t) and the lagged values (x_{t+h}) at lags 1 ≤ h ≤ 16.

In order to develop a measure for this self correlation or autocorrelation, we utilize a sample version of the scaled autocovariance function, say ρ̂_x(h) = γ̂_x(h)/γ̂_x(0), where

    γ̂_x(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄).

If the underlying process x_t is white noise, the approximate standard error of the sample ACF is

    σ_ρ̂ = 1/√n.    (2.9)

That is, ρ̂_x(h) is approximately normal with mean 0 and variance 1/n. As an illustration, consider the autocorrelation functions computed for the environmental and recruitment series shown in the top two panels of Figure 2.4. Both of the autocorrelation functions show some evidence of periodic repetition. The ACF of SOI seems to repeat at periods of 12, while the recruitment has a dominant period that repeats at about 12 to 16 time points. Again, the maximum values are well above two standard errors shown as dotted lines above and below the horizontal axis.
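The sample version above can be computed by hand and checked against R's built-in acf(); a minimal sketch, assuming the series is in a numeric vector y:

rho.hat=function(y,h){                     # sample ACF at lag h, as defined above
  n=length(y); ybar=mean(y)
  g.h=sum((y[(h+1):n]-ybar)*(y[1:(n-h)]-ybar))/n
  g.0=sum((y-ybar)^2)/n
  g.h/g.0}
rho.hat(y,12)                              # matches acf(y,lag.max=12,plot=F)$acf[13]
2/sqrt(length(y))                          # the two-standard-error band in (2.9)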
Figure 2.4: Autocorrelation functions of SOI and recruitment and cross correlation function between SOI and recruitment.

In order to examine this possibility, consider the lagged scatterplot matrices shown in Figures 2.5 and 2.6. Figure 2.5 plots the SOI at time t + h, x_{t+h}, versus the recruitment series y_t at lags 0 ≤ h ≤ 15.
Figure 2.5: Multiple lagged scatterplots showing the relationship between the SOI at time t + h, say x_{t+h} (x-axis), versus recruits at time t, say y_t (y-axis), 0 ≤ h ≤ 15.
There are no particularly strong linear relations apparent in these plots, i.e., future values of SOI are not related to current recruitment. This means that the temperatures are not responding to past recruitment. In Figure 2.6, the current SOI values, x_t, are plotted against the future recruitment values, y_{t+h}, for 0 ≤ h ≤ 15. It is clear from Figure 2.6 that the series are correlated negatively for lags h = 5, ..., 9.
Figure 2.6: Multiple lagged scatterplots showing the relationship between the SOI at time t, say x_t (x-axis), versus recruits at time t + h, say y_{t+h} (y-axis), 0 ≤ h ≤ 15.

The correlation at lag 6, for example, is −0.60, implying that increases in the SOI lead decreases in the number of recruits by about 6 months. On the other hand, the series are hardly correlated (0.025) at all in the conventional sense, measured at lag h = 0. The general pattern suggests that predicting recruits might be possible using the El Niño at lags of 5, 6, 7, ... months. We show in Figure 2.7 the partial autocorrelation functions of the SOI series (left panel) and the recruits series (right panel). Note that the PACF of the SOI has a single peak at lag h = 1 and then relatively small values. This means, in effect, that fairly good prediction can be achieved by using the immediately preceding point and that adding further values does not really improve the situation. Hence we might try an autoregressive model with p = 1. The recruits series has two peaks and then small values, implying that the pure correlation between points is summarized by the first two lags.
Figure 2.7: Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.

Table 2.1: AICC values for ten models for the recruits series

p      1     2     3     4     5     6     7     8     9     10
AICC   5.75  5.52  5.53  5.54  5.54  5.55  5.55  5.56  5.57  5.58
We consider the simple problem of modeling the recruit series shown in the right panel of Figure 2.1 using an autoregressive model. The top right panel of Figure 2.4 and the right panel of Figure 2.7 show the autocorrelation and partial autocorrelation functions of the recruit series. The PACF has large values for h = 1 and 2 and then is essentially zero for higher order lags. This implies, by the property of an autoregressive model, that a second order (p = 2) AR model might provide a good fit. Running the regression program for an AR(2) model with intercept,

    x_t = φ0 + φ1 x_{t−1} + φ2 x_{t−2} + w_t,

leads to the estimators φ̂0 = 61.8439 (4.0121), φ̂1 = 1.3512 (0.0417), and φ̂2 = −0.4612 (0.0416), with σ̂² = 89.53, where the estimated standard deviations are in parentheses. To determine whether the above order is the best choice, we fitted models for 1 ≤ p ≤ 10, obtaining corrected AIC (AICC) values summarized in Table 2.1 using (2.3). This shows that the minimum AICC is obtained for p = 2, and we choose the second order model.
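The comparison in Table 2.1 can be reproduced with a short loop; a minimal sketch, assuming the recruits series is in a numeric vector x as in Section 2.8, with the AICC formula taken from the code there:

n=length(x)
aicc=rep(0,10)
for(p in 1:10){
  fit=arima(x,order=c(p,0,0))             # fit an AR(p) with intercept
  aicc[p]=log(fit$sigma2)+(n+p)/(n-p-2)}  # corrected AIC, as in (2.3)
which.min(aicc)                           # the minimum occurs at p=2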
The previous example uses various autoregressive models for the recruits series; for example, one can fit a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the SOI given in previous discussions. We have noted in our discussions of Figure 2.7, from the partial autocorrelation function, that a plausible model for this series might be a first order autoregression of the form given above with p = 1. Again, putting the model above into the regression framework (1.2) for a single coefficient leads to the estimators φ̂1 = 0.59 with standard error 0.04, σ̂² = 0.09218, and AICC(1) = −1.375. The ACF of these residuals, shown in the left panel of Figure 2.8, however, still shows cyclical variation, and it is clear that a number of values still exceed the 1.96/√n threshold. A suggested procedure is to try higher order autoregressive models.
Figure 2.8: ACF of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).

Successive models for 1 ≤ p ≤ 30 were fitted, and the AIC and AICC values are
plotted in the right panel of Figure 2.8. There is a clear minimum for a p = 16 order model.
The coefficient vector, with components and their standard errors in parentheses, is: 0.4050 (0.0469), 0.0740 (0.0505), 0.1527 (0.0499), 0.0915 (0.0505), 0.0377 (0.0500), 0.0803 (0.0493), 0.0281 (0.0504), 0.1902 (0.0501), 0.1283 (0.0510), 0.0413 (0.0476), 0.0743 (0.0493), 0.0679 (0.0492), 0.0096 (0.0492), 0.1108 (0.0491), 0.1707 (0.0492), 0.1606 (0.0499), and σ̂² = 0.07166.
Exercise: Please use a LASSO-type approach (you can choose any one) to re-run this example to see what you obtain. As for the R command, you can use the package lasso2 or lars; a possible starting point is sketched below.
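The following sketch uses the lars package; the design matrix of 16 lags is built with embed(), and x.soi is assumed to hold the SOI series as in the code of Section 2.8. It is only a starting point, not the full exercise.

library(lars)
p=16
X=embed(x.soi,p+1)               # columns: x_t, x_{t-1}, ..., x_{t-p}
fit.lasso=lars(X[,-1],X[,1],type="lasso")
plot(fit.lasso)                  # coefficient paths as the penalty decreases
summary(fit.lasso)               # Cp values to help choose the penalty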
2.8
Computer Codes
#####################################################################
# This is Example 2.1 for Southern Oscillation Index and Recruits data
#####################################################################
y<-read.table("c:/res-teach/xiamen12-06/data/ex2-1a.txt",header=F)  # read data file
x<-read.table("c:/res-teach/xiamen12-06/data/ex2-1b.txt",header=F)
y=y[,1]
x=x[,1]
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.1.eps",
horizontal=F,width=6,height=6)  # save the graph as a postscript file
#win.graph()
par(mfrow=c(1,2),mex=0.4,bg="yellow")
ts.plot(y,type="l",lty=1,ylab="",xlab="")  # make a time series plot
title(main="Southern Oscillation Index",cex=0.5)  # set up the title of the plot
abline(0,0)  # make a straight line
#win.graph()
ts.plot(x,type="l",lty=1,ylab="",xlab="")
abline(mean(x),0)
title(main="Recruit",cex=0.5)
dev.off()
n=length(y)
n2=n-12
yma=rep(0,n2)
for(i in 1:n2){yma[i]=mean(y[i:(i+12)])}  # compute the 13-point moving average
yy=y[7:(n2+6)]
yy0=yy-yma  # detrended series
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.2.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4)
ts.plot(yy,type="l",lty=1,ylab="",xlab="")
points(1:n2,yma,type="l",lty=1,lwd=3,col=2)
ts.plot(yy0,type="l",lty=1,ylab="",xlab="")
points(1:n2,yma,type="l",lty=1,lwd=3,col=2)
abline(0,0)
dev.off()
m=17
n1=n-m
y.soi=rep(0,n1*m)
dim(y.soi)=c(n1,m)
y.rec=y.soi
for(i in 1:m){
y.soi[,i]=y[i:(n1+i-1)]
y.rec[,i]=x[i:(n1+i-1)]}
text_soi=c("1","2","3","4","5","6","7","8","9","10","11","12","13",
"14","15","16")
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.3.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,4),mex=0.4,bg="light blue")
for(i in 2:17){
plot(y.soi[,1],y.soi[,i],type="p",pch="o",ylab="",xlab="",
ylim=c(-1,1),xlim=c(-1,1))  # make a point plot
text(0.8,-0.8,text_soi[i-1],cex=2)}
dev.off()
text1=c("ACF of SOI Index")
text2=c("ACF of Recruits")
text3=c("CCF of SOI and Recruits")
SOI=y
Recruits=x
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.4.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
acf(y,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")  # make an ACF plot
legend(10,0.8,text1)  # set up the legend
acf(x,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")
legend(10,0.8,text2)
ccf(y,x,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")
legend(-40,0.8,text3)
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.5.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,4),mex=0.4,bg="light green")
for(i in 1:16){
plot(y.soi[,i],y.rec[,1],type="p",pch="o",ylab="",xlab="",
ylim=c(0,100),xlim=c(-1,1))
text(-0.8,10,text_soi[i],cex=2)}
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.6.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,4),mex=0.4,bg="light grey")
for(i in 1:16){
plot(y.soi[,1],y.rec[,i],type="p",pch="o",ylab="",xlab="",
ylim=c(0,100),xlim=c(-1,1))
text(-0.8,10,text_soi[i],cex=2)}
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.7.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light blue")
pacf(y,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="")
text(10,0.9,"PACF of SOI")
pacf(x,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="")
text(10,0.9,"PACF of Recruits")
dev.off()
################################################################
x<-read.table("c:/res-teach/xiamen12-06/data/ex2-1a.txt",header=T)
x.soi=x[,1]
n=length(x.soi)
aicc=0
if(aicc==1){
aic.value=rep(0,30)  # max.lag=30
aicc.value=aic.value
sigma.value=rep(0,30)
for(i in 1:30){
fit3=arima(x.soi,order=c(i,0,0))  # fit an AR(i)
aic.value[i]=fit3$aic/n-2  # compute AIC
sigma.value[i]=fit3$sigma2  # obtain the estimated sigma^2
aicc.value[i]=log(sigma.value[i])+(n+i)/(n-i-2)  # compute AICC
print(c(i,aic.value[i],aicc.value[i]))}
data=cbind(aic.value,aicc.value)
write(t(data),"c:/res-teach/xiamen12-06/soi_aic.dat",ncol=2)
}else{
data<-matrix(scan("c:/res-teach/xiamen12-06/soi_aic.dat"),byrow=T,ncol=2)
}
text4=c("AIC","AICC")
fit11=arima(x.soi,order=c(1,0,0))
resid1=fit11$residual
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.8.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light yellow")
acf(resid1,ylab="",xlab="",lag.max=20,ylim=c(-0.5,1),main="")
text(10,0.8,"ACF of residuals of AR(1) for SOI")
matplot(1:30,data,type="b",pch="o",col=c(1,2),ylab="",xlab="Lag",cex=0.6)
legend(16,-1.40,text4,lty=1,col=c(1,2))
dev.off()
#fit2=arima(x.soi,order=c(16,0,0))
#print(fit2)
2.9
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory (B.N. Petrov and F. Csáki, eds.), 267-281. Akadémiai Kiadó, Budapest.
Bai, Z., C.R. Rao and Y. Wu (1999). Model selection with data-oriented penalty. Journal of Statistical Planning and Inference, 77, 103-117.
Burnham, K.P. and D. Anderson (2003). Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach, 2nd edition. New York: Springer-Verlag.
Efron, B., T. Hastie, I. Johnstone and R. Tibshirani (2004). Least angle regression. The Annals of Statistics, 32, 407-451.
Eicker, F. (1967). Limit theorems for regression with unequal and dependent errors. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (L. LeCam and J. Neyman, eds.), University of California Press, Berkeley.
Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
Fan, J. and H. Peng (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32, 928-961.
Frank, I.E. and J.H. Friedman (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics, 35, 109-148.
Hastie, T.J. and R.J. Tibshirani (1990). Generalized Additive Models. London: Chapman and Hall.
Hurvich, C.M. and C.-L. Tsai (1989). Regression and time series model selection in small samples. Biometrika, 76, 297-307.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.
Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486-494.
Shen, X.T. and J.M. Ye (2002). Adaptive model selection. Journal of the American Statistical Association, 97, 210-221.
Shibata, R. (1976). Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika, 63, 117-126.
Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.
Shumway, R.H. (2006). Lecture Notes on Applied Time Series Analysis. Department of Statistics, University of California at Davis.
Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.
Tibshirani, R.J. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.
Wang, H., R. Li and C.-L. Tsai (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553-568.
Chapter 3
Regression Models with Correlated Errors

In many applications, the relationship between two time series is of major interest. The market model in finance is an example that relates the return of an individual stock to the return of a market index. The term structure of interest rates is another example in which the time evolution of the relationship between interest rates with different maturities is investigated. These examples lead to the consideration of a linear regression in the form y_t = β1 + β2 x_t + e_t, where y_t and x_t are two time series and e_t denotes the error term. The least squares (LS) method is often used to estimate the above model. If {e_t} is a white noise series, then the LS method produces consistent estimates. In practice, however, it is common to see that the error term e_t is serially correlated. In this case, we have a regression model with time series errors, and the LS estimates of β1 and β2 may not be consistent and efficient. The regression model with time series errors is widely applicable in economics and finance, but it is one of the most commonly misused econometric models because the serial dependence in e_t is often overlooked. It pays to study the model carefully. The standard method for dealing with correlated errors e_t in the regression model y_t = β^T z_t + e_t is to try to transform the errors e_t into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let P be an n × n matrix
that transforms the vector e = (e_1, ..., e_n)^T into a set of independent identically distributed variables with variance σ². Then, transform the matrix version (1.4) to P y = P Z β + P e and proceed as before. Of course, the major problem is deciding on what to choose for P, but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find, for example, a reasonable ARMA model for the residuals, say, for example, the ARMA(p, 0, 0) model

    e_t = Σ_{k=1}^{p} φ_k e_{t−k} + w_t,    (3.1)

which defines a linear transformation of the correlated e_t to a sequence of uncorrelated w_t. We can ignore the problems near the beginning of the series by starting at t = p. In the ARMA notation, using the back-shift operator L, we may write φ(L) e_t = w_t, where

    φ(L) = 1 − Σ_{k=1}^{p} φ_k L^k,    (3.2)
and applying the operator to both sides of (1.2) leads to the model

    φ(L) y_t = β^T φ(L) z_t + w_t,    (3.3)
where the {w_t}'s now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients φ_k (1 ≤ k ≤ p) in the transformation (3.2). However, if we knew the residuals e_t, it would be easy to estimate the coefficients, since (3.2) can be written in the form

    e_t = φ^T e_{t−1} + w_t,    (3.4)

which is exactly the usual regression model (1.2) with φ = (φ_1, ..., φ_p)^T replacing β and e_{t−1} = (e_{t−1}, e_{t−2}, ..., e_{t−p})^T replacing z_t. The above comments suggest a general approach known as the Cochrane-Orcutt (1949) procedure for dealing with the problem of correlated errors in the time series context.
1. Begin by fitting the original regression model (1.2) by least squares, obtaining β̂ and the residuals ê_t = y_t − β̂^T z_t.

2. Fit an ARMA model to the estimated residuals, say φ(L) ê_t = θ(L) w_t.

3. Apply the ARMA transformation found to both sides of the regression equation (1.2) to obtain

    (φ(L)/θ(L)) y_t = β^T (φ(L)/θ(L)) z_t + w_t.
4. Run an ordinary least squares on the transformed values to obtain the new β̂.

5. Return to step 2 if desired.

Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators. Note that there is a function in R to compute the Cochrane-Orcutt estimator:

arima(x, order = c(0, 0, 0), seasonal = list(order = c(0, 0, 0),
    period = NA), xreg = NULL, include.mean = TRUE, transform.pars = TRUE,
    fixed = NULL, init = NULL, method = c("CSS-ML", "ML", "CSS"),
    n.cond, optim.control = list(), kappa = 1e6)

by specifying xreg=..., where xreg is a vector or matrix of external regressors, which must have the same number of rows as x.
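To make the steps concrete, the following is a bare-bones sketch of one Cochrane-Orcutt iteration for the AR(1) error case; the function name co.fit and the vectors y and z are hypothetical, and in practice the joint fit via arima() with xreg above is preferable.

co.fit=function(y,z){
  fit.ols=lm(y~z)                     # step 1: ordinary least squares
  e=residuals(fit.ols)
  n=length(e)
  phi=sum(e[-1]*e[-n])/sum(e[-n]^2)   # step 2: AR(1) coefficient of the residuals
  y.star=y[-1]-phi*y[-n]              # step 3: quasi-difference both sides
  z.star=z[-1]-phi*z[-n]
  lm(y.star~z.star)}                  # step 4: OLS on the transformed data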
Example 3.1: The data shown in Figure 3.1 represent quarterly earnings per share for the American company Johnson & Johnson from the fourth quarter of 1970 to the first quarter of 1980. We might consider an alternative approach to treating the Johnson and Johnson earnings series, assuming that y_t = log(x_t) = β1 + β2 t + e_t. In order to analyze the data with this approach, first we fit the model above, obtaining β̂1 = 0.6678 (0.0349) and β̂2 = 0.0417 (0.0071). The residuals ê_t = y_t − β̂1 − β̂2 t can be computed easily, and their ACF and PACF are shown in the top two panels of Figure 3.2. Note that the ACF and PACF suggest that a seasonal AR series will fit well, and we show the ACF and PACF of these residuals in the bottom panels of Figure 3.2.
Figure 3.1: Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log transformed earnings (right panel).
Figure 3.2: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0,0,0) × (1,0,0)_4 residuals (bottom two panels).
The seasonal AR model is of the form

    e_t = φ1 e_{t−4} + w_t,

and we obtain φ̂1 = 0.7614 (0.0639), with σ̂²_w = 0.00779. Using these values, we transform y_t to

    y_t − φ̂1 y_{t−4} = β1 (1 − φ̂1) + β2 [t − φ̂1 (t − 4)] + w_t

using the estimated value φ̂1 = 0.7614. With this transformed regression, we obtain the new estimators β̂1 = 0.7488 (0.1105) and β̂2 = 0.0424 (0.0018). The new estimator has the advantage of being unbiased and having a smaller generalized variance.
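The same model can also be fitted jointly in one step with arima() and xreg; a sketch, assuming y_log and n are as in the code of Section 3.4:

fit.jj=arima(y_log,order=c(0,0,0),
  seasonal=list(order=c(1,0,0),period=4),xreg=1:n)  # trend regression with seasonal AR(1) errors
fit.jj                                              # compare with the two-step estimates above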
To forecast, we consider the original model, with the newly estimated β̂1 and β̂2. We obtain the approximate forecast ŷ_{t+h} = β̂1 + β̂2 (t + h) + ê_{t+h} for the log transformed series, along with upper and lower limits depending on the estimated variance that only incorporates the prediction variance of ê_{t+h}, considering the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits (the figure is not presented here) are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.

Example 3.2: We consider the relationship between two U.S. weekly interest rate series: x_t, the 1-year Treasury constant maturity rate, and y_t, the 3-year Treasury constant maturity rate. Both series have 1967 observations from January 5, 1962 to September 10, 1999 and are measured in percentages. The series are obtained from the Federal Reserve Bank of St. Louis. Figure 3.3 shows the time plots of the two interest rates, with the solid line denoting the 1-year rate and the dashed line the 3-year rate.
Figure 3.3: Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line the Treasury 3-year constant maturity rate (red).

The left panel of Figure 3.4 plots y_t versus x_t, indicating that, as expected, the two interest rates are highly correlated. A naive way to describe the relationship between the two interest rates is to use the simple model, Model I: y_t = β1 + β2 x_t + e_t. This results in a fitted model ŷ_t = 0.911 + 0.924 x_t + ê_t, with σ̂²_e = 0.538 and R² = 95.8%, where the standard errors of the two coefficients are 0.032 and 0.004, respectively.
Figure 3.4: Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is 3-year rate versus 1-year rate, and the right panel is changes in 3-year rate versus changes in 1-year rate.

This simple model (Model I) confirms the high correlation between the two interest rates. However, the model is seriously inadequate, as shown by Figure 3.5, which gives the time plot and ACF of its residuals.
Figure 3.5: Residual series of linear regression Model I for two U.S. weekly interest rates: the left panel is the time plot and the right panel is the ACF.

In particular, the sample ACF of the residuals is highly significant and decays slowly, showing the pattern of a unit root nonstationary time series. The behavior of the residuals suggests that marked differences exist between the two interest rates.
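A sketch of Model I and its residual diagnostics, assuming the 1-year and 3-year rate series are in numeric vectors rate1 and rate3 (hypothetical names):

fit.m1=lm(rate3~rate1)      # Model I: y_t = beta1 + beta2 x_t + e_t
summary(fit.m1)             # slope near 0.924, R^2 near 95.8%
acf(residuals(fit.m1))      # slow decay: unit-root-like residuals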
Using the modern econometric terminology, if one assumes that the two interest rate series are unit root nonstationary, then the behavior of the residuals indicates that the two interest rates are not co-integrated; see later chapters for discussion of unit root and co-integration. In other words, the data fail to support the hypothesis that there exists a long-term equilibrium between the two interest rates. In some sense, this is not surprising because the pattern of an inverted yield curve did occur during the data span. By the inverted yield curve, we mean the situation under which interest rates are inversely related to their time to maturities. The unit root behavior of both interest rates and the residuals leads to the consideration of the change series of interest rates. Let Δx_t = x_t − x_{t−1} = (1 − L) x_t be changes in the 1-year interest rate and Δy_t = y_t − y_{t−1} = (1 − L) y_t denote changes in the 3-year interest rate. Consider the linear regression, Model II: Δy_t = β1 + β2 Δx_t + e_t. Figure 3.6 shows time plots of the two change series, whereas the right panel of Figure 3.4 provides a scatterplot between them.
Figure 3.6: Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate are indicated by the red dashed line.

The change series remain highly correlated, with a fitted linear regression model given by Δŷ_t = 0.0002 + 0.7811 Δx_t + ê_t, with σ̂²_e = 0.0682 and R² = 84.8%. The standard errors of the two coefficients are 0.0015 and 0.0075, respectively. This model further confirms the strong linear dependence between interest rates. The two top panels of Figure 3.7 show the time plot (left) and sample ACF (right) of the residuals (Model II). Once again, the ACF shows some significant serial correlation in the residuals, but the magnitude of the correlation is much smaller.
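The corresponding sketch for Model II on the change series, continuing the hypothetical vectors rate1 and rate3 from above:

dx=diff(rate1); dy=diff(rate3)  # (1-L)x_t and (1-L)y_t
fit.m2=lm(dy~dx)                # Model II
summary(fit.m2)                 # slope near 0.7811, R^2 near 84.8%
acf(residuals(fit.m2))          # weak but significant serial correlation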
Figure 3.7: Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).

This weak serial dependence in the residuals can be modeled by using the simple time series models discussed in the previous sections, and we have a linear regression with time series errors. The main objective of this section is to discuss a simple approach for building a linear regression model with time series errors. The approach is straightforward. We employ a simple time series model discussed in this chapter for the residual series and estimate the whole model jointly. For illustration, consider the simple linear regression in Model II. Because the residuals of the model are serially correlated, we identify a simple ARMA model for the residuals. From the sample ACF of the residuals shown in the right top panel of Figure 3.7, we specify an MA(1) model for the residuals and modify the linear regression model to (Model III): Δy_t = β1 + β2 Δx_t + e_t and e_t = w_t − θ1 w_{t−1}, where {w_t} is assumed to be a white noise series. In other words, we simply use an MA(1) model, without the constant term, to capture the serial dependence in the error term of Model II. The two bottom panels of Figure 3.7 show the time plot (left) and sample ACF (right) of the residuals (Model III). The resulting model is a simple example of linear regression with time series errors. In practice, more elaborate time series models can be added to a linear regression equation to form a general regression model with time series errors.
Estimating a regression model with time series errors was not easy before the advent of modern computers. Special methods such as the Cochrane-Orcutt estimator have been proposed to handle the serial dependence in the residuals. By now, the estimation is as easy as that of other time series models. If the time series model used is stationary and invertible, then one can estimate the model jointly via the maximum likelihood method or conditional maximum likelihood method. For the U.S. weekly interest rate data, the fitted version of Model III is Δŷ_t = 0.0002 + 0.7824 Δx_t + ê_t and ê_t = ŵ_t + 0.2115 ŵ_{t−1}, with σ̂²_w = 0.0668 and R² = 85.4%. The standard errors of the parameters are 0.0018, 0.0077, and 0.0221, respectively. The model no longer has a significant lag-1 residual ACF, even though some minor residual serial correlations remain at lags 4 and 6. The incremental improvement of adding additional MA parameters at lags 4 and 6 to the residual equation is small and the result is not reported here. Comparing the above three models, we make the following observations. First, the high R² and coefficient 0.924 of Model I are misleading because the residuals of the model show strong serial correlations. Second, for the change series, the R² and the coefficient of Δx_t of Model II and Model III are close. In this particular instance, adding the MA(1) model to the change series only provides a marginal improvement. This is not surprising because the estimated MA coefficient is small numerically, even though it is statistically highly significant. Third, the analysis demonstrates that it is important to check residual serial dependence in linear regression analysis. Because the constant term of Model III is insignificant, the model shows that the two weekly interest rate series are related as y_t = y_{t−1} + 0.782 (x_t − x_{t−1}) + w_t + 0.212 w_{t−1}. The interest rates are concurrently and serially correlated.
Finally, we outline a general procedure for analyzing linear regression models with time series errors. First, fit the linear regression model and check serial correlations of the residuals (see below). Second, if the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables and go back to step 1; if the residual series appears to be stationary, identify an ARMA model for the residuals and modify the linear regression model accordingly. Third, perform a joint estimation via the maximum likelihood method and check the fitted model for further improvement. All of the above steps can be implemented in R with the command arima(). To check the serial correlations of residuals, we recommend that the Ljung-Box statistics
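A minimal sketch of these steps in R (assuming numeric vectors y and x, in the spirit of the code in Section 3.4 below):

fit.ols <- lm(y ~ x)                   # Step 1: ordinary least squares fit
Box.test(resid(fit.ols), lag = 12, type = "Ljung-Box")  # residual serial correlation?
dy <- diff(y); dx <- diff(x)           # Step 2: difference if residuals are nonstationary
fit.joint <- arima(dy, xreg = dx, order = c(0, 0, 1))   # Step 3: joint fit with MA(1) errors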
should be used instead of the Durbin-Watson (DW) statistic, because the latter only considers the lag-1 serial correlation. There are cases in which residual serial dependence appears at higher order lags. This is particularly so when the time series involved exhibits some seasonal behavior.

Remark: For a residual series $e_t$ with $T$ observations, the Durbin-Watson statistic is

$$\mathrm{DW} = \sum_{t=2}^{T} (e_t - e_{t-1})^2 \Big/ \sum_{t=1}^{T} e_t^2.$$

Straightforward calculation shows that $\mathrm{DW} \approx 2\,(1 - \hat\rho_e(1))$, where $\hat\rho_e(1)$ is the lag-1 ACF of $\{e_t\}$.
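This approximation is easy to check numerically; a small illustration on a simulated residual series (any stationary series will do):

set.seed(1)
e <- arima.sim(list(ma = 0.3), n = 500)   # simulated 'residuals' with serial correlation
DW <- sum(diff(e)^2) / sum(e^2)           # Durbin-Watson statistic
rho1 <- acf(e, lag.max = 1, plot = FALSE)$acf[2]   # lag-1 sample ACF
c(DW, 2 * (1 - rho1))                     # the two numbers should be close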
Consider testing that several autocorrelation coefficients are simultaneously zero, i.e., $H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0$. Under the null hypothesis, it is easy to show (see Box and Pierce (1970)) that

$$Q = T \sum_{k=1}^{m} \hat\rho_k^2 \sim \chi_m^2. \qquad (3.5)$$

Ljung and Box (1978) provided the following finite sample correction, which yields a better fit to the $\chi_m^2$ distribution for small sample sizes:

$$Q = T(T+2) \sum_{k=1}^{m} \frac{\hat\rho_k^2}{T-k} \sim \chi_m^2. \qquad (3.6)$$
Both are called Q-tests and are well known in the statistics literature. Of course, they are very useful in applications. The function in R for the Box-Pierce and Ljung-Box tests is

Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"))

and the Durbin-Watson test for autocorrelation of disturbances is

dwtest(formula, order.by = NULL, alternative = c("greater", "two.sided", "less"), iterations = 15, exact = NULL, tol = 1e-10, data = list())
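Note that dwtest() is provided by the package lmtest. A minimal illustration on the residuals of Model II (assuming y_diff and x_diff as constructed in the code of Section 3.4):

fit2 <- lm(y_diff ~ x_diff)
Box.test(resid(fit2), lag = 12, type = "Ljung-Box")   # joint test up to lag 12
library(lmtest)
dwtest(y_diff ~ x_diff)                               # lag-1 Durbin-Watson test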
3.2
3.3
see the papers by Stambaugh (1999), Amihud and Hurvich (2004), Torous, Valkanov and Yan (2004), and Campbell and Yogo (2006) and the references therein.
3.4
Computer Codes
##################################################
# This is Example 3.1 for Johnson and Johnson data
##################################################
y=read.table("c:/res-teach/xiamen12-06/data/ex3-1.dat",header=F)
n=length(y[,1])
y_log=log(y[,1])   # log of data
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.1.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light yellow")
ts.plot(y,type="l",lty=1,ylab="",xlab="")
title(main="J&J Earnings",cex=0.5)
ts.plot(y_log,type="l",lty=1,ylab="",xlab="")
title(main="transformed log(earnings)",cex=0.5)
dev.off()
# MODEL 1: y_t=beta_0+beta_1 t+e_t (fit log(y) versus time trend)
z1=1:n
fit1=lm(y_log~z1)
e1=fit1$resid
# Now, we need to re-fit the model using the transformed data
x1=5:n
y_1=y_log[5:n]
y_2=y_log[1:(n-4)]
y_fit=y_1-0.7614*y_2
x2=x1-0.7614*(x1-4)
x1=(1-0.7614)*rep(1,n-4)
fit2=lm(y_fit~-1+x1+x2)
e2=fit2$resid
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.2.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
acf(e1, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(10,0.8,"detrended")
pacf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")
acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(15,0.8,"ARIMA(1,0,0)_4")
pacf(e2,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
#####################################################
# This is Example 3.2 for weekly interest rate series
#####################################################
z<-read.table("c:/res-teach/xiamen12-06/data/ex3-2.txt",header=F)
# first column  = one year Treasury constant maturity rate
# second column = three year Treasury constant maturity rate
# third column  = date
x=z[,1]
y=z[,2]
n=length(x)
u=seq(1962+1/52,by=1/52,length=n)
x_diff=diff(x)
y_diff=diff(y)
# Fit a simple regression model and examine the residuals
fit1=lm(y~x)   # Model 1
e1=fit1$resid
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.3.eps",
horizontal=F,width=6,height=6)
matplot(u,cbind(x,y),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.4.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light grey")
plot(x,y,type="p",pch="o",ylab="",xlab="",cex=0.5)
plot(x_diff,y_diff,type="p",pch="o",ylab="",xlab="",cex=0.5)
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.5.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light green")
plot(u,e1,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
# Take differences and fit a simple regression again
fit2=lm(y_diff~x_diff)   # Model 2
e2=fit2$resid
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.6.eps",
horizontal=F,width=6,height=6)
matplot(u[-1],cbind(x_diff,y_diff),type="l",lty=c(1,2),col=c(1,2),
ylab="",xlab="")
abline(0,0)
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.7.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
ts.plot(e2,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
# fit a model to the differenced data with an MA(1) error
fit3=arima(y_diff,xreg=x_diff, order=c(0,0,1))   # Model 3
e3=fit3$resid
ts.plot(e3,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e3, ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
3.5
References
Amihud, Y. and C. Hurvich (2004). Predictive regressions: A reduced-bias estimation method. Journal of Financial and Quantitative Analysis, 39, 813-841.

Box, G. and D. Pierce (1970). Distribution of residual autocorrelations in autoregressive integrated moving average time series models. Journal of the American Statistical Association, 65, 1509-1526.

Campbell, J. and M. Yogo (2006). Efficient tests of stock return predictability. Journal of Financial Economics, 81, 27-60.

Cochrane, D. and G.H. Orcutt (1949). Applications of least squares regression to relationships containing autocorrelated errors. Journal of the American Statistical Association, 44, 32-61.

Ljung, G. and G. Box (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297-303.

Stambaugh, R. (1999). Predictive regressions. Journal of Financial Economics, 54, 375-421.

Torous, W., R. Valkanov and S. Yan (2004). On predicting stock returns with nearly integrated explanatory variables. Journal of Business, 77, 937-966.
Xiao, Z., O.B. Linton, R.J. Carroll and E. Mammen (2003). More efficient local polynomial estimation in nonparametric regression with autocorrelated errors. Journal of the American Statistical Association, 98, 980-992.
Consider again the regression model in (1.2). There may exist situations in which the error $e_t$ has serial correlations and/or conditional heteroscedasticity, but the main objective of the analysis is to make inference concerning the regression coefficients $\beta$. When $e_t$ has serial correlations, we discussed methods in Example 3.1 and Example 3.2 above to overcome this difficulty. However, there we assumed that $e_t$ follows an ARIMA-type model, and this assumption might not always be satisfied in some applications. Here, we consider a general situation without making this assumption. In situations under which the ordinary least squares estimates of the coefficients remain consistent, methods are available to provide consistent estimates of the covariance matrix of the coefficients. Two such methods are widely used in economics and finance. The first method is called the heteroscedasticity consistent (HC) estimator; see Eicker (1967) and White (1980). The second method is called the heteroscedasticity and autocorrelation consistent (HAC) estimator; see Newey and West (1987). To ease the discussion, we re-write the regression model as $y_t = \beta^T x_t + e_t$, where $y_t$ is the dependent variable, $x_t = (x_{1t}, \ldots, x_{pt})^T$ is a $p$-dimensional vector of explanatory variables, and $\beta$ is the parameter vector. The LS estimate of $\beta$ is given by

$$\hat\beta = \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1} \sum_{t=1}^{n} x_t y_t,$$
and the associated covariance matrix has the so-called sandwich form

$$\Sigma_\beta \equiv \mathrm{Cov}(\hat\beta) = \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1} C \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1}, \quad \text{where} \quad C = \mathrm{Var}\left( \sum_{t=1}^{n} e_t x_t \right).$$

If $e_t$ is iid, then $C = \sigma_e^2 \sum_{t=1}^{n} x_t x_t^T$, where $\sigma_e^2$ is the variance of $e_t$ and is estimated by the variance of the residuals of the regression. In the
presence of serial correlations or conditional heteroscedasticity, the prior covariance matrix estimator is inconsistent, often resulting in inflating the t-ratios of $\hat\beta$. The estimator of White (1980) is based on the following:

$$\hat\Sigma_{\beta,hc} = \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1} \hat C_{hc} \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1}, \quad \text{where} \quad \hat C_{hc} = \sum_{t=1}^{n} \hat e_t^2\, x_t x_t^T,$$

and the estimator of Newey and West (1987) is

$$\hat\Sigma_{\beta,hac} = \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1} \hat C_{hac} \left( \sum_{t=1}^{n} x_t x_t^T \right)^{-1},$$

where

$$\hat C_{hac} = \sum_{t=1}^{n} \hat e_t^2\, x_t x_t^T + \sum_{j=1}^{l} w_j \sum_{t=j+1}^{n} \left( x_t \hat e_t \hat e_{t-j} x_{t-j}^T + x_{t-j} \hat e_{t-j} \hat e_t x_t^T \right),$$

with $l$ a truncation parameter and $w_j$ a weight function, such as the Bartlett weight function defined by $w_j = 1 - j/(l+1)$. Other weight functions can also be used. Newey and West (1987) showed that if $l \to \infty$ and $l^4/T \to 0$, then $\hat C_{hac}$ is a consistent estimator of $C$.
Newey and West (1987) suggested choosing $l$ to be the integer part of $4(n/100)^{1/4}$, and Newey and West (1994) suggested using some adaptive (data-driven) methods to choose $l$; see Newey and West (1994) for details. In general, this estimator essentially uses a nonparametric method to estimate the covariance matrix of $\sum_{t=1}^{n} e_t x_t$.
A more general class of kernel-based weights was introduced by Andrews (1991). For example, the Bartlett weight $w_j$ above can be replaced by $w_j = K(j/(l+1))$, where $K(\cdot)$ is a kernel function such as the truncated kernel $K(x) = I(|x| \le 1)$, the Tukey-Hanning kernel $K(x) = (1 + \cos(\pi x))/2$ for $|x| \le 1$, the Parzen kernel

$$K(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & 0 \le |x| \le 1/2,\\ 2(1 - |x|)^3, & 1/2 \le |x| \le 1,\\ 0, & \text{otherwise}, \end{cases}$$

and the quadratic spectral kernel

$$K(x) = \frac{25}{12\pi^2 x^2} \left( \frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5) \right).$$

Andrews (1991) suggested using a data-driven method to select the bandwidth $l$: $l = 2.66\,(\hat\alpha T)^{1/5}$ for the Parzen kernel, $l = 1.7462\,(\hat\alpha T)^{1/5}$ for the Tukey-Hanning kernel, and $l = 1.3221\,(\hat\alpha T)^{1/5}$ for the quadratic spectral kernel, where

$$\hat\alpha = \frac{\sum_{i=1}^{p} 4\hat\rho_i^2 \hat\sigma_i^4/(1-\hat\rho_i)^8}{\sum_{i=1}^{p} \hat\sigma_i^4/(1-\hat\rho_i)^4}$$
with $\hat\rho_i$ and $\hat\sigma_i$ being parameters estimated from an AR(1) model fitted to the $i$th component of $u_t = \hat e_t x_t$.

Example 4.1: (Continuation of Example 3.2) For illustration, we consider the first differenced interest rate series in Model II in Example 3.2. The t-ratio of the coefficient of $x_t$ is 104.63 if both serial correlation and conditional heteroscedasticity in the residuals are ignored; it becomes 46.73 when the HC estimator is used, and it reduces to 40.08 when the HAC estimator is employed. To use the HC or HAC estimator, we can use the package sandwich in R, and the commands are vcovHC() or vcovHAC() or meatHAC(). There is also a set of functions implementing a class of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators as introduced by Andrews (1991). In vcovHC(), these estimators differ in their choice of the $\omega_i$ in $\Omega = \mathrm{Var}(e) = \mathrm{diag}\{\omega_1, \ldots, \omega_n\}$; an overview of the most important cases is given in the following:

const: $\omega_i = \sigma^2$
HC0: $\omega_i = \hat e_i^2$
HC1: $\omega_i = \frac{n}{n-k}\, \hat e_i^2$
HC2: $\omega_i = \frac{\hat e_i^2}{1 - h_i}$
HC3: $\omega_i = \frac{\hat e_i^2}{(1 - h_i)^2}$
HC4: $\omega_i = \frac{\hat e_i^2}{(1 - h_i)^{\delta_i}}$

where $h_i = H_{ii}$ are the diagonal elements of the hat matrix and $\delta_i = \min\{4, h_i/\bar h\}$.

vcovHC(x, type = c("HC3", "const", "HC", "HC0", "HC1", "HC2", "HC4"), omega = NULL, sandwich = TRUE, ...)

meatHC(x, type = , omega = NULL)

vcovHAC(x, order.by = NULL, prewhite = FALSE, weights = weightsAndrews, adjust = TRUE, diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", data = list(), ...)

meatHAC(x, order.by = NULL, prewhite = FALSE, weights = weightsAndrews, adjust = TRUE, diagnostics = FALSE, ar.method = "ols", data = list())

kernHAC(x, order.by = NULL, prewhite = 1, bw = bwAndrews, kernel = c("Quadratic Spectral", "Truncated", "Bartlett", "Parzen", "Tukey-Hanning"), approx = c("AR(1)", "ARMA(1,1)"), adjust = TRUE, diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", tol = 1e-7, data = list(), verbose = FALSE, ...)

weightsAndrews(x, order.by = NULL, bw = bwAndrews, kernel = c("Quadratic Spectral", "Truncated", "Bartlett", "Parzen", "Tukey-Hanning"), prewhite = 1, ar.method = "ols", tol = 1e-7, data = list(), verbose = FALSE, ...)

bwAndrews(x, order.by = NULL, kernel = c("Quadratic Spectral", "Truncated", "Bartlett", "Parzen", "Tukey-Hanning"), approx = c("AR(1)", "ARMA(1,1)"), weights = NULL, prewhite = 1, ar.method = "ols", data = list(), ...)
Also, there is a set of functions implementing the Newey and West (1987, 1994) heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators:

NeweyWest(x, lag = NULL, order.by = NULL, prewhite = TRUE, adjust = FALSE, diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", data = list(), verbose = FALSE)

bwNeweyWest(x, order.by = NULL, kernel = c("Bartlett", "Parzen", "Quadratic Spectral", "Truncated", "Tukey-Hanning"), weights = NULL, prewhite = 1, ar.method = "ols", data = list(), ...)
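As a minimal illustration (assuming the OLS fit fit1 of the differenced series from the code in Section 4.3 below, and the package lmtest for coeftest()):

library(sandwich)
library(lmtest)
sqrt(diag(vcovHC(fit1, type = "HC0")))   # White (1980) HC standard errors
sqrt(diag(vcovHAC(fit1)))                # Andrews-type kernel HAC standard errors
sqrt(diag(NeweyWest(fit1)))              # Newey-West (1987, 1994) HAC estimator
coeftest(fit1, vcov = NeweyWest(fit1))   # t-ratios based on HAC standard errors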
4.2

4.3
Computer Codes
##########################################
# This is Example 4.1 of using HC and HAC
##########################################
# HC and HAC are in the package "sandwich"
library(sandwich)
library(zoo)
z<-read.table("c:/res-teach/xiamen12-06/data/ex4-1.txt",header=F)
x=z[,1]
y=z[,2]
x_diff=diff(x)
y_diff=diff(y)
# Fit a simple regression model and examine the residuals
fit1=lm(y_diff~x_diff)
print(summary(fit1))
e1=fit1$resid
# Heteroskedasticity-Consistent Covariance Matrix Estimation
# type=c("const","HC","HC0","HC1","HC2","HC3","HC4")
#hc0=vcovHC(fit1,type="const")
#print(sqrt(diag(hc0)))
# HC0 is the White estimator
hc1=vcovHC(fit1,type="HC0")
print(sqrt(diag(hc1)))
# Heteroskedasticity and autocorrelation consistent (HAC) estimation
# of the covariance matrix of the coefficient estimates in a
# (generalized) linear regression model
hac1=vcovHAC(fit1,sandwich=T)
print(sqrt(diag(hac1)))
4.4
References
Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817-858.

Eicker, F. (1967). Limit theorems for regression with unequal and dependent errors. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (L. LeCam and J. Neyman, eds.), University of California Press, Berkeley.

Newey, W.K. and K.D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.

Newey, W.K. and K.D. West (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61, 631-653.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.

Zeileis, A. (2004). Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software, 11(10).

Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of Statistical Software, 16, 1-16.
When time series (particularly, economic and financial time series) are observed each day or month or quarter, it is often the case that such a series displays a seasonal pattern (deterministic cyclical behavior). Similar to the feature of trend, there is no precise definition of seasonality. Usually we refer to seasonality when observations in certain seasons display strikingly different features from other seasons. For example, the retail sales are always large in the fourth quarter (because of the Christmas spending) and small in the first quarter, as can be observed from Figure 5.1. It may also be possible that seasonality is reflected in the variance of a time series. For example, for daily observed stock market returns the volatility often seems highest on Mondays, basically because investors have to digest three days of news instead of only one day. For more details, see the book by Taylor (2005, Section 4.5).

Example 5.1: For Example 3.1, the data shown in Figure 3.1 represent quarterly earnings per share for the American company Johnson & Johnson from the fourth quarter of 1970 to the first quarter of 1980. It is easy to note some very nonstationary behavior in this series that cannot be eliminated completely by differencing or detrending, because of the larger fluctuations that occur near the end of the record when the earnings are higher. The right panel of Figure 3.1 shows the log-transformed series, and we note that the later peaks have been attenuated, so that the variance of the transformed series seems more stable. One would have to eliminate the trend still remaining in the above series to obtain stationarity. For more details on the current analyses of this series, see the later analyses and the papers by Burman and Shumway (1998) and Cai and Chen (2006).
Example 5.2: In this example we consider the monthly US retail sales series (not seasonally adjusted) from January of 1967 to December of 2000 (in billions of US dollars). The data can be downloaded from the web site http://marketvector.com.

Figure 5.1: US Retail Sales Data from 1967-2000.

The U.S. retail sales index is one of the most important indicators of the US economy. There are vast studies of seasonal series (like this series) in the literature; see, e.g., Franses (1996, 1998), Ghysels and Osborn (2001), and Cai and Chen (2006). From Figure 5.1, we can observe that the peaks occur in December, and we can say that retail sales display seasonality. Also, it can be observed that the trend is basically increasing, but nonlinearly. The same phenomenon can be observed from Figure 3.1 for the quarterly earnings for Johnson & Johnson. If simple graphs are not informative enough to highlight possible seasonal variation, a formal regression model can be used; for example, one might consider the following regression model with seasonal dummy variables:
$$\Delta y_t = y_t - y_{t-1} = \sum_{j=1}^{s} \beta_j D_{j,t} + \varepsilon_t,$$

where $D_{j,t}$ is a seasonal dummy variable and $s$ is the number of seasons. Of course, one can use a seasonal ARIMA model, denoted by ARIMA$(p,d,q) \times (P,D,Q)_s$, which will be discussed later.
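A hedged sketch of this dummy-variable regression in R, assuming y is a monthly ts object with frequency 12 (so s = 12):

dy <- diff(y)                      # change series
season <- factor(cycle(dy))        # seasonal dummies D_{j,t}, j = 1, ..., 12
fit.dummy <- lm(dy ~ season - 1)   # one coefficient beta_j per season, no intercept
summary(fit.dummy)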
Example 5.3: In this example, we consider a time series with pronounced seasonality, displayed in Figure 5.2: logs of four-weekly advertising expenditures on radio and television in The Netherlands for 1978.01-1994.13.
Figure 5.2: Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01-1994.13.

For these two marketing time series one can observe clearly that the television advertising displays quite some seasonal fluctuation throughout the entire sample, and the radio advertising has seasonality only for the last five years. Also, there seems to be a structural break in the radio series around observation 53. This break is related to an increase in radio broadcasting minutes in January 1982. Furthermore, there is visual evidence that the trend changes over time. Generally, it appears that many seasonally observed time series from business and economics, as well as other applied fields, display seasonality in the sense that the observations in certain seasons have properties that differ from those data points in other seasons. A second feature of many seasonal time series is that the seasonality changes over time, as studied by Cai and Chen (2006). Sometimes these changes appear abrupt, as is the case for advertising on the radio in Figure 5.2, and sometimes such changes occur only slowly. To capture these phenomena, Cai and Chen (2006) proposed a more general flexible seasonal effect model having the following form:

$$y_{ij} = \alpha(t_i) + \beta_j(t_i) + e_{ij}, \quad i = 1, \ldots, n, \; j = 1, \ldots, s,$$

where $y_{ij} = y_{(i-1)s+j}$, $t_i = i/n$, $\alpha(\cdot)$ is a (smooth) common trend function in $[0,1]$, $\{\beta_j(\cdot)\}$ are (smooth) seasonal effect functions in $[0,1]$, either fixed or random, subject to a set of constraints, and the error term $e_{ij}$ is assumed to be stationary. For more details, see Cai and Chen (2006).
5.2
Modeling
Some economic and financial as well as environmental time series, such as the quarterly earnings per share of a company, exhibit certain cyclical or periodic behavior; see the later chapters for more discussions on cycles and periodicity. Such a time series is called a seasonal (deterministic cycle) time series. Figure 3.1 shows the time plot of quarterly earnings per share of Johnson and Johnson from the first quarter of 1960 to the last quarter of 1980. The data possess some special characteristics. In particular, the earnings grew exponentially during the sample period and had a strong seasonality. Furthermore, the variability of earnings increased over time. The cyclical pattern repeats itself every year, so that the periodicity of the series is 4. If monthly data are considered (e.g., monthly sales of Wal-Mart Stores), then the periodicity is 12. Seasonal time series models are also useful in pricing weather-related derivatives and energy futures. Analysis of seasonal time series has a long history. In some applications, seasonality is of secondary importance and is removed from the data, resulting in a seasonally adjusted time series that is then used to make inference. The procedure to remove seasonality from a time series is referred to as seasonal adjustment. Most economic data published by the U.S. government are seasonally adjusted (e.g., the growth rate of gross domestic product and the unemployment rate). In other applications, such as forecasting, seasonality is as important as other characteristics of the data and must be handled accordingly. Because forecasting is a major objective of economic and financial time series analysis, we focus on the latter approach and discuss some econometric models that are useful in modeling seasonal time series.

When the autoregressive, differencing, or seasonal moving average behavior seems to occur at multiples of some underlying period $s$, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of $s$ and can often be eliminated by a seasonal differencing operator of the form $\Delta_s^D x_t = (1 - L^s)^D x_t$. For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce $s = 12$, and the ACF will be characterized by slowly decaying spikes at $12, 24, 36, 48, \ldots$; we can obtain a stationary series by transforming with the operator $(1 - L^{12})\, x_t = x_t - x_{t-12}$, which is the difference between the current month and the value one year, or 12 months, ago. If the autoregressive or moving average behavior is seasonal at period $s$, we define formally the operators

$$\Phi(L^s) = 1 - \Phi_1 L^s - \Phi_2 L^{2s} - \cdots - \Phi_P L^{Ps} \qquad (5.1)$$

and

$$\Theta(L^s) = 1 - \Theta_1 L^s - \Theta_2 L^{2s} - \cdots - \Theta_Q L^{Qs}. \qquad (5.2)$$

The final form of the seasonal ARIMA$(p,d,q) \times (P,D,Q)_s$ model is

$$\Phi(L^s)\,\phi(L)\,\Delta_s^D\,\Delta^d\, x_t = \Theta(L^s)\,\theta(L)\, w_t. \qquad (5.3)$$

Note that one special model of (5.3) is ARIMA$(0,1,1) \times (0,1,1)_s$, that is,

$$(1 - L^s)(1 - L)\, x_t = (1 - \theta_1 L)(1 - \Theta_1 L^s)\, w_t.$$
This model is referred to as the airline model or multiplicative seasonal model in the literature; see Box and Jenkins (1970), Box, Jenkins, and Reinsel (1994, Chapter 9), and Brockwell and Davis (1991). It has been found to be widely applicable in modeling seasonal time series. The AR part of the model simply consists of the regular and seasonal differences, whereas the MA part involves two parameters. We may also note the following properties.

Property 5.1: The ACF of a seasonally non-stationary time series decays very slowly at lag multiples $s, 2s, 3s, \ldots$, with zeros in between, where $s$ denotes a seasonal period, usually 4 for quarterly data or 12 for monthly data. The PACF of a non-stationary time series tends to have a peak very near unity at lag $s$.

Property 5.2: For a seasonal autoregressive series of order $P$, the partial autocorrelation function $\Phi_{hh}$ as a function of lag $h$ has nonzero values at $s, 2s, 3s, \ldots, Ps$, with zeros in between, and is zero for $h > Ps$, the order of the seasonal autoregressive process. There should be some exponential decay.

Property 5.3: For a seasonal moving average series of order $Q$, note that the autocorrelation function (ACF) has nonzero values at $s, 2s, 3s, \ldots, Qs$ and is zero for $h > Qs$.

Remark: Note that there is a built-in command in R called arima(), which is a powerful tool for estimating and making inference for an ARIMA model. The command is

arima(x, order=c(0,0,0), seasonal=list(order=c(0,0,0), period=NA), xreg=NULL, include.mean=TRUE, transform.pars=TRUE, fixed=NULL, init=NULL, method=c("CSS-ML","ML","CSS"), n.cond, optim.control=list(), kappa=1e6)

See the manuals of R for details about this command.
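For instance, a minimal sketch of fitting the airline model ARIMA$(0,1,1) \times (0,1,1)_{12}$ to a monthly series x (a ts object with frequency 12):

fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit          # estimated MA coefficients and their standard errors
tsdiag(fit)  # residual diagnostics, including Ljung-Box p-values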
Example 5.4: We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 5.3.

Figure 5.3: Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA$(0,1,1) \times (0,1,1)_{12}$ model.

The period encompasses the boom that followed the Second World War, and there is the expected rise which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible non-stationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panel of Figure 5.4. Note that, by Property 5.1, slow decay of the ACF indicates non-stationarity, and we respond by taking a first difference. The results shown in the second panel of Figure 5.3 indicate that the first difference has eliminated the strong low frequency swing. The ACF, shown in the second panel from the top in Figure 5.4, shows peaks at $12, 24, 36, 48, \ldots$, with no decay. This behavior implies
Figure 5.4: Autocorrelation functions and partial autocorrelation functions for the birth series (top two panels), the first difference (second two panels), an ARIMA$(0,1,0) \times (0,1,1)_{12}$ model (third two panels), and an ARIMA$(0,1,1) \times (0,1,1)_{12}$ model (last two panels).
seasonal non-stationarity, by Property 5.1 above, with $s = 12$. A seasonal difference of the first difference generates an ACF and PACF in Figure 5.4 of the kind we expect for stationary series: taking the seasonal difference of the first difference gives a series that looks stationary and has an ACF with peaks at 1 and 12, and a PACF with a substantial peak at 12 and lesser peaks at $24, 36, \ldots$. This suggests trying either a first-order moving average term, or a first-order seasonal moving average term with $s = 12$, by Property 5.3 above. We choose to eliminate the largest peak first by applying a first-order seasonal moving average model with $s = 12$, i.e., ARIMA$(0,1,0) \times (0,1,1)_{12}$, written as

$$(1 - L)(1 - L^{12})\, x_t = (1 - \Theta_1 L^{12})\, w_t.$$

The ACF and PACF of the residual series from this model are shown in the fourth panel from the top in Figure 5.4. We note that the peak at lag one is still there, with attending exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term, and we consider the model ARIMA$(0,1,1) \times (0,1,1)_{12}$, written as

$$(1 - L)(1 - L^{12})\, x_t = (1 - \theta_1 L)(1 - \Theta_1 L^{12})\, w_t.$$

The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% test of no correlation. Fitting this final ARIMA$(0,1,1) \times (0,1,1)_{12}$ model leads to

$$(1 - L)(1 - L^{12})\, x_t = (1 - 0.4896\, L)(1 - 0.6844\, L^{12})\, w_t$$

with AICC $= 4.95$, $R^2 = 0.9804^2 = 0.961$, and the p-values are (0.000, 0.000). The ARIMA search leads to the model

$$(1 - L)(1 - L^{12})\, x_t = (1 - 0.4088\, L - 0.1645\, L^2)(1 - 0.6990\, L^{12})\, w_t,$$

yielding AICC $= 4.92$ and $R^2 = 0.981^2 = 0.962$, slightly better than the ARIMA$(0,1,1) \times (0,1,1)_{12}$ model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability. The model is expanded as

$$x_t = x_{t-1} + x_{t-12} - x_{t-13} + w_t - \theta_1 w_{t-1} - \Theta_1 w_{t-12} + \theta_1 \Theta_1 w_{t-13}.$$

The forecasts are

$$\hat x_{t+1}^{t} = x_t + x_{t-11} - x_{t-12} - \theta_1 w_t - \Theta_1 w_{t-11} + \theta_1 \Theta_1 w_{t-12},$$

$$\hat x_{t+2}^{t} = \hat x_{t+1}^{t} + x_{t-10} - x_{t-11} - \Theta_1 w_{t-10} + \theta_1 \Theta_1 w_{t-11}.$$

Continuing in the same manner, we obtain

$$\hat x_{t+12}^{t} = \hat x_{t+11}^{t} + x_t - x_{t-1} - \Theta_1 w_t + \theta_1 \Theta_1 w_{t-1}$$
for the 12 month forecast.

Example 5.5: Figure 5.5 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 3.1, and we note the slow decay indicating the nonstationarity which has already been obvious in the Chapter 3 discussion. We may also compare the ACF with that of a random walk and note the close similarity. The partial autocorrelation function is very high at lag one which, under ordinary circumstances, would indicate a first order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to 1 on the unit circle. The only question would be whether differencing or detrending is the better transformation to stationarity. Following the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel, and no simple structure is apparent. To force a next step, we interpret the peaks at $4, 8, 12, 16, \ldots$ as contributing to a possible seasonal autoregressive term, leading to a possible ARIMA$(0,1,0) \times (1,0,0)_4$, and we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated, and there remains some exponentially decaying behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA$(0,1,1) \times (1,0,0)_4$, and we note only relatively minor excursions above and below the 95% intervals under the assumption that the theoretical ACF is white noise. The final model suggested is ($y_t = \log x_t$)

$$(1 - \Phi_1 L^4)(1 - L)\, y_t = (1 - \theta_1 L)\, w_t, \qquad (5.4)$$

with forecast form

$$y_t = y_{t-1} + \Phi_1 (y_{t-4} - y_{t-5}) + w_t - \theta_1 w_{t-1}.$$

The residual plot of the above model is shown in the left bottom panel of Figure 5.6. To forecast the original series for, say, 4 quarters, we compute the forecast limits for $y_t = \log x_t$ and then exponentiate, i.e., $\hat x_{t+h}^{t} = \exp(\hat y_{t+h}^{t})$.
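A hedged sketch of this forecasting step, assuming fit4 is the ARIMA$(0,1,1) \times (1,0,0)_4$ model fitted to y_log in the code of Section 5.4:

fc <- predict(fit4, n.ahead = 4)   # 4-quarter-ahead forecasts on the log scale
exp(fc$pred)                       # point forecasts on the original scale
exp(fc$pred + 1.96 * fc$se)        # approximate upper 95% forecast limits
exp(fc$pred - 1.96 * fc$se)        # approximate lower 95% forecast limits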
Figure 5.5: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), the ARIMA$(0,1,0) \times (1,0,0)_4$ model (third two panels), and the ARIMA$(0,1,1) \times (1,0,0)_4$ model (last two panels).

Based on the exact likelihood method, Tsay (2005) considered the following seasonal ARIMA$(0,1,1) \times (0,1,1)_4$ model:

$$(1 - L)(1 - L^4)\, y_t = (1 - 0.678\, L)(1 - 0.314\, L^4)\, w_t, \qquad (5.5)$$

with $\hat\sigma_w^2 = 0.089$, where the standard errors of the two MA parameters are 0.080 and 0.101, respectively. The Ljung-Box statistics of the residuals show $Q(12) = 10.0$ with p-value 0.44. The model appears to be adequate. The ACF and PACF of the ARIMA$(0,1,1) \times (0,1,1)_4$
model are given in the top two panels of Figure 5.6, and the residual plot is displayed in the right bottom panel of Figure 5.6. Based on the comparison of the ACF and PACF of the two models (5.4) and (5.5) [the last two panels of Figure 5.5 and the top two panels of Figure 5.6], it seems that the ARIMA$(0,1,1) \times (0,1,1)_4$ model in (5.5) might perform better than the ARIMA$(0,1,1) \times (1,0,0)_4$ model in (5.4).

Figure 5.6: ACF and PACF for the ARIMA$(0,1,1) \times (0,1,1)_4$ model (top two panels) and the residual plots of the ARIMA$(0,1,1) \times (1,0,0)_4$ model (left bottom panel) and the ARIMA$(0,1,1) \times (0,1,1)_4$ model (right bottom panel).
To illustrate the forecasting performance of the seasonal model in (5.5), we re-estimate the model using the first 76 observations and reserve the last eight data points for forecasting evaluation. We compute 1-step to 8-step ahead forecasts and their standard errors from the fitted model at the forecast origin $t = 76$. An anti-log transformation is taken to obtain forecasts of earnings per share, using the relationship between normal and log-normal distributions. Figure 2.15 in Tsay (2005, p. 77) shows the forecast performance of the model, where the observed data are in solid line, point forecasts are shown by dots, and the dashed lines show 95% interval forecasts. The forecasts show a strong seasonal pattern and are close to the observed data. For more comparisons of forecasts using different models, including semiparametric and nonparametric models, the reader is referred to the books by Shumway (1988) and Shumway and Stoffer (2000) and the papers by Burman and Shumway (1998) and Cai and Chen (2006).
When the seasonal pattern of a time series is stable over time (e.g., close to a deterministic function), dummy variables may be used to handle the seasonality. This approach is taken by some analysts. However, deterministic seasonality is a special case of the multiplicative seasonal model discussed before. Specifically, if $\Theta_1 = 1$, then the model contains a deterministic seasonal component. Consequently, the same forecasts are obtained by using either dummy variables or a multiplicative seasonal model when the seasonal pattern is deterministic. Yet the use of dummy variables can lead to inferior forecasts if the seasonal pattern is not deterministic. In practice, we recommend that the exact likelihood method should be used to estimate a multiplicative seasonal model, especially when the sample size is small or when there is the possibility of having a deterministic seasonal component.

Example 5.6: To determine deterministic behavior, consider the monthly simple return of the CRSP Decile 1 index from January 1960 to December 2003, for 528 observations. The series is shown in the left top panel of Figure 5.7, and the time series does not show any clear pattern of seasonality. However, the sample ACF of the return series shown in the left
bottom panel of Figure 5.7 contains significant lags at 12, 24, and 36 as well as lag 1.

Figure 5.7: Monthly simple return of CRSP Decile 1 index from January 1960 to December 2003: time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for the January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return (right bottom panel).

If seasonal ARIMA models are entertained, a model in the form

$$(1 - \phi_1 L)(1 - \Phi_1 L^{12})\, x_t = \alpha + (1 - \Theta_1 L^{12})\, w_t$$

is identified, where $x_t$ is the monthly simple return. Using the conditional likelihood, the fitted model is

$$(1 - 0.25\, L)(1 - 0.99\, L^{12})\, x_t = 0.0004 + (1 - 0.92\, L^{12})\, w_t$$

with $\hat\sigma_w = 0.071$. The MA coefficient is close to unity, indicating that the fitted model is close to being non-invertible. If the exact likelihood method is used, we have

$$(1 - 0.264\, L)(1 - 0.996\, L^{12})\, x_t = 0.0002 + (1 - 0.999\, L^{12})\, w_t$$

with $\hat\sigma_w = 0.067$. Cancellation between the seasonal AR and MA factors is clear. This highlights the usefulness of the exact likelihood method, and the estimation result suggests that the seasonal behavior might be deterministic. To further confirm this assertion, we define the dummy variable for January, that is,

$$J_t = \begin{cases} 1 & \text{if } t \text{ is January},\\ 0 & \text{otherwise}, \end{cases}$$

and employ the simple linear regression $x_t = \beta_0 + \beta_1 J_t + e_t$. The right panels of Figure 5.7 show the time series plot and the ACF of the residual series of this simple linear regression. From the ACF, there is no significant serial correlation at any multiple of 12, suggesting that the seasonal pattern has been successfully removed by the January dummy variable. Consequently, the seasonal behavior in the monthly simple return of Decile 1 is due to the January effect.
5.3
See the papers by Burman and Shumway (1998) and Cai and Chen (2006) and the books by Franses (1998) and Ghysels and Osborn (2001). The reading materials are the papers by Burman and Shumway (1998) and Cai and Chen (2006).
5.4
Computer Codes
###########################################
# This is Example 5.2 for retail sales data
###########################################
y=read.table("c:/res-teach/xiamen12-06/data/ex5-2.txt",header=F)
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.1.eps",
horizontal=F,width=6,height=6)
ts.plot(y,type="l",lty=1,ylab="",xlab="")
dev.off()
############################################
# This is Example 5.3 for the marketing data
############################################
text_tv=c("television")
text_radio=c("radio")
data<-read.table("c:/res-teach/xiamen12-06/data/ex5-3.txt",header=T)
TV=log(data[,1])
RADIO=log(data[,2])
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.2.eps",
horizontal=F,width=6,height=6)
ts.plot(cbind(TV,RADIO),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
text(20,10.5,text_tv)
text(165,8,text_radio)
dev.off()
######################
# This is Example 5.4
######################
x<-matrix(scan("c:/res-teach/xiamen12-06/data/ex5-4.txt"),byrow=T,ncol=1)
n=length(x)
x_diff=diff(x)
x_diff_12=diff(x_diff,lag=12)
fit1=arima(x,order=c(0,0,0),seasonal=list(order=c(0,0,0)),include.mean=F)
resid_1=fit1$resid
fit2=arima(x,order=c(0,1,0),seasonal=list(order=c(0,0,0)),include.mean=F)
resid_2=fit2$resid
fit3=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,0),period=12),
include.mean=F)
resid_3=fit3$resid
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.4.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(5,2),mex=0.4,bg="light pink")
acf(resid_1, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="ACF",cex=0.7)
pacf(resid_1,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="PACF",cex=0.7)
text(20,0.7,"data",cex=1.2)
# differenced data
acf(resid_2, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
pacf(resid_2,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)")
# seasonal difference of differenced data
acf(resid_3, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
pacf(resid_3,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)X(0,1,0)_{12}",cex=0.8)
fit4=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,1),
period=12),include.mean=F)
resid_4=fit4$resid
fit5=arima(x,order=c(0,1,1),seasonal=list(order=c(0,1,1),
period=12),include.mean=F)
resid_5=fit5$resid
acf(resid_4, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
# ARIMA(0,1,0)*(0,1,1)_12
pacf(resid_4,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)X(0,1,1)_{12}",cex=0.8)
# ARIMA(0,1,1)*(0,1,1)_12
acf(resid_5, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
pacf(resid_5,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,1)X(0,1,1)_{12}",cex=0.8)
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.3.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light blue")
ts.plot(x,type="l",lty=1,ylab="",xlab="")
text(250,375, "Births")
ts.plot(x_diff,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
text(255,45, "First difference")
abline(0,0)
ts.plot(x_diff_12,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
# time series plot of the seasonal difference (s=12) of differenced data
text(225,40,"ARIMA(0,1,0)X(0,1,0)_{12}")
abline(0,0)
ts.plot(resid_5,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
text(225,40, "ARIMA(0,1,1)X(0,1,1)_{12}")
abline(0,0)
dev.off()
######################
# This is Example 5.5
######################
y=read.table("c:/res-teach/xiamen12-06/data/ex3-1.txt",header=F)
n=length(y[,1])
y_log=log(y[,1])            # log of data
y_diff=diff(y_log)          # first-order difference
y_diff_4=diff(y_diff,lag=4) # first-order seasonal difference
fit1=ar(y_log,order.max=1,aic=F)  # fit AR(1) model
#print(fit1)
library(tseries)            # call library(tseries)
library(zoo)
# do Augmented Dickey-Fuller test for testing unit root
fit1_test=adf.test(y_log)
#print(fit1_test)
fit1=arima(y_log,order=c(0,0,0),seasonal=list(order=c(0,0,0)),
include.mean=F)
resid_21=fit1$resid
fit2=arima(y_log,order=c(0,1,0),seasonal=list(order=c(0,0,0)),
include.mean=F)
resid_22=fit2$resid   # residual for ARIMA(0,1,0)*(0,0,0)
# note that this model is non-stationary so that "CSS" is used
fit3=arima(y_log,order=c(0,1,0),seasonal=list(order=c(1,0,0),period=4),
include.mean=F,method=c("CSS"))
resid_23=fit3$resid   # residual for ARIMA(0,1,0)*(1,0,0)_4
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-5.5.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,2),mex=0.4,bg="light green")
acf(resid_21, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF",cex=0.7)
text(16,0.8,"log(J&J)")
pacf(resid_21,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF",cex=0.7)
acf(resid_22, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"First Difference")
pacf(resid_22,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
acf(resid_23, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"ARIMA(0,1,0)X(1,0,0)_4",cex=0.8)
pacf(resid_23,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
# note that this model is non-stationary
fit4=arima(y_log,order=c(0,1,1),seasonal=list(order=c(1,0,0),
period=4),include.mean=F,method=c("CSS"))
resid_24=fit4$resid   # residual for ARIMA(0,1,1)*(1,0,0)_4
#print(fit4)
fit4_test=Box.test(resid_24,lag=12, type=c("Ljung-Box"))
#print(fit4_test)
# ARIMA(0,1,1)*(1,0,0)_4
acf(resid_24, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"ARIMA(0,1,1)X(1,0,0)_4",cex=0.8)
pacf(resid_24,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
fit5=arima(y_log,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=4),
include.mean=F,method=c("ML"))
resid_25=fit5$resid   # residual for ARIMA(0,1,1)*(0,1,1)_4
#print(fit5)
fit5_test=Box.test(resid_25,lag=12, type=c("Ljung-Box"))
#print(fit5_test)
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-5.6.eps",
horizontal=F,width=6,height=6,bg="light grey")
par(mfrow=c(2,2),mex=0.4)
# ARIMA(0,1,1)*(0,1,1)_4
acf(resid_25, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(16,0.8,"ARIMA(0,1,1)X(0,1,1)_4",cex=0.8)
pacf(resid_25,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")
ts.plot(resid_24,type="l",lty=1,ylab="",xlab="")
title(main="Residual Plot",cex=0.5)
text(40,0.2,"ARIMA(0,1,1)X(1,0,0)_4",cex=0.8)
abline(0,0)
ts.plot(resid_25,type="l",lty=1,ylab="",xlab="")
title(main="Residual Plot",cex=0.5)
text(40,0.18,"ARIMA(0,1,1)X(0,1,1)_4",cex=0.8)
abline(0,0)
dev.off()
######################
# This is Example 5.6
######################
z<-matrix(scan("c:/res-teach/xiamen12-06/data/ex5-6.txt"),byrow=T,ncol=4)
decile1=z[,2]
# Model 1: an ARIMA(1,0,0)*(1,0,1)_12
fit1=arima(decile1,order=c(1,0,0),seasonal=list(order=c(1,0,1),
period=12),include.mean=T)
#print(fit1)
e1=fit1$resid
n=length(decile1)
m=n/12
jan=rep(c(1,0,0,0,0,0,0,0,0,0,0,0),m)
feb=rep(c(0,1,0,0,0,0,0,0,0,0,0,0),m)
mar=rep(c(0,0,1,0,0,0,0,0,0,0,0,0),m)
apr=rep(c(0,0,0,1,0,0,0,0,0,0,0,0),m)
may=rep(c(0,0,0,0,1,0,0,0,0,0,0,0),m)
jun=rep(c(0,0,0,0,0,1,0,0,0,0,0,0),m)
jul=rep(c(0,0,0,0,0,0,1,0,0,0,0,0),m)
aug=rep(c(0,0,0,0,0,0,0,1,0,0,0,0),m)
sep=rep(c(0,0,0,0,0,0,0,0,1,0,0,0),m)
oct=rep(c(0,0,0,0,0,0,0,0,0,1,0,0),m)
nov=rep(c(0,0,0,0,0,0,0,0,0,0,1,0),m)
dec=rep(c(0,0,0,0,0,0,0,0,0,0,0,1),m)
de=cbind(decile1[jan==1],decile1[feb==1],decile1[mar==1],decile1[apr==1],
decile1[may==1],decile1[jun==1],decile1[jul==1],decile1[aug==1],
decile1[sep==1],decile1[oct==1],decile1[nov==1],decile1[dec==1])
# Model 2: a simple regression model without correlated errors
# to see the effect from January
fit2=lm(decile1~jan)
e2=fit2$resid
#print(summary(fit2))
# Model 3: a regression model with correlated errors
fit3=arima(decile1,xreg=jan,order=c(0,0,1),include.mean=T)
e3=fit3$resid
#print(fit3)
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.7.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light yellow")
ts.plot(decile1,type="l",lty=1,col=1,ylab="",xlab="")
title(main="Simple Returns",cex=0.5)
abline(0,0)
ts.plot(e3,type="l",lty=1,col=1,ylab="",xlab="")
title(main="January-adjusted returns",cex=0.5)
abline(0,0)
acf(decile1, ylab="", xlab="",ylim=c(-0.5,1),lag=40,main="ACF")
acf(e3,ylab="",xlab="",ylim=c(-0.5,1),lag=40,main="ACF")
dev.off()
5.5
References
Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis, Forecasting, and Control. Holden Day, San Francisco.
Box, G.E.P., G.M. Jenkins and G.C. Reinsel (1994). Time Series Analysis, Forecasting and Control, 3rd Edn. Englewood Cliffs, NJ: Prentice-Hall.

Brockwell, P.J. and R.A. Davis (1991). Time Series: Theory and Methods. New York: Springer.

Burman, P. and R.H. Shumway (1998). Semiparametric modeling of seasonal time series. Journal of Time Series Analysis, 19, 127-145.

Cai, Z. and R. Chen (2006). Flexible seasonal time series models. Advances in Econometrics, 20B, 63-87.

Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. New York: Cambridge University Press.

Franses, P.H. and D. van Dijk (2000). Nonlinear Time Series Models for Empirical Finance. New York: Cambridge University Press.

Ghysels, E. and D.R. Osborn (2001). The Econometric Analysis of Seasonal Time Series. New York: Cambridge University Press.

Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.

Shumway, R.H., A.S. Azari and Y. Pawitan (1988). Modeling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmental Research, 45, 224-241.

Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.

Tiao, G.C. and R.S. Tsay (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Annals of Statistics, 11, 856-871.

Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. New York: John Wiley & Sons.
6.1
Robust Regression
Robustness means that good statistical procedures should work well even if the underlying assumptions are somewhat in error. For a classical regression model $Y_i = \beta^T X_i + \varepsilon_i$, the general form of a robust estimate of $\beta$ is the minimizer of the function

$$\sum_{i=1}^{n} \rho(Y_i - \beta^T X_i),$$

where $\rho(\cdot)$ is a function (loss function) to be specified. If $\rho(z) = z^2$ (quadratic), then it is the usual least squares criterion. If $\rho(z) = |z|$, it gives the least absolute deviation (LAD) estimator (median regression). Another important choice is Huber's (1981) function

$$\rho_c(z) = \begin{cases} z^2, & \text{if } |z| \le c,\\ c|z| - c^2/2, & \text{if } |z| > c, \end{cases} \qquad \text{or} \qquad \rho_{c,0}(z) = \begin{cases} z^2, & \text{if } |z| \le c,\\ c^2, & \text{if } |z| > c, \end{cases}$$

where $c$ is a fixed constant, and the corresponding estimator is called an $M$-estimator. There are a lot of choices of $\rho(\cdot)$; see the book by Serfling (1980) for details. The choice of $\rho_c(\cdot)$ results in down-weighting
cases with large residuals. For more details on robust regression, see the book by Huber (1981). To implement robust regression in R, you can use the command rlm() ($M$-estimation) or lqs() (resistant regression) in the package MASS [library(MASS)].

rlm(formula, data, weights, ..., subset, na.action, method = c("M", "MM", "model.frame"), wt.method = c("inv.var", "case"), model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)

lqs(formula, data, ..., method = c("lts", "lqs", "lms", "S", "model.frame"), subset, na.action, model = TRUE, x.ret = FALSE, y.ret = FALSE, contrasts = NULL)
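A minimal sketch comparing these fits on simulated data with two gross outliers:

library(MASS)
set.seed(42)
x <- 1:50
y <- 1 + 0.5*x + rnorm(50)
y[c(10,25)] <- y[c(10,25)] + 20      # contaminate two observations
coef(lm(y ~ x))                      # least squares: pulled by the outliers
coef(rlm(y ~ x))                     # Huber M-estimator: resistant
coef(lqs(y ~ x, method = "lts"))     # least trimmed squares: highly resistant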
6.2
Quantile Regression
The above LAD estimator is indeed a special case ($\tau = 1/2$) of the following quantile regression objective:
$$\sum_{i=1}^{n} \rho_\tau(Y_i - \beta^T X_i),$$
where $\rho_\tau(z) = z(\tau - I_{\{z<0\}})$ is the so-called check (loss) function and $I_A$ is the indicator function of any set $A$. All the aforementioned loss functions are plotted in Figure 6.1 using R (the R code for this figure can be found in Section 6.3). Indeed, the above problem can be formulated as a general quantile regression: for any $0 < \tau < 1$,

$$F_y(q_\tau(X_i) \mid X_i) = \tau,$$

where $F_y(y \mid X_i)$ is the conditional distribution of $Y_i$ given $X_i$, and $q_\tau(X_i) = X_i^T \beta(\tau)$, which allows the coefficients to depend on $\tau$. This linear quantile regression model was proposed by Koenker and Bassett (1978, 1982); see those papers for details. There are several advantages of using a quantile regression:

- A quantile regression does not require knowing the distribution of the dependent variable.
Figure 6.1: Different loss functions: quadratic, Huber's $\rho_c(\cdot)$ and $\rho_{c,0}(\cdot)$, $\rho_{0.05}(\cdot)$, LAD, and $\rho_{0.95}(\cdot)$, where $c = 1.345$.

- It does not require the symmetry of the measurement error.
- It can characterize the heterogeneity.
- It can estimate the mean and variance simultaneously.
- It is a robust procedure.

There are a lot more. Clearly, a quantile regression model includes the LAD and the classical mean regression models as special cases. To see this, for a classical regression model $Y_i = \beta^T X_i + \varepsilon_i$, its $\tau$-th quantile is $q_\tau(X_i) = \beta^T X_i + q_{\varepsilon,i}(\tau)$, where $q_{\varepsilon,i}(\tau)$ is the $\tau$-th quantile of $\varepsilon_i$. For example, if $\varepsilon_i = \sigma(X_i)\, u_i$ with iid $\{u_i\}$, then $q_\tau(X_i) = \beta^T X_i + \sigma(X_i)\, q_{\tau,u}$.
This property can be used as an informative (graphical) way to detect whether $\sigma(X_i)$ is constant or not, by plotting two quantile functions $q_{\tau_1}(X_i)$ and $q_{\tau_2}(X_i)$ for two distinct $\tau_1 \neq \tau_2$. If the two quantile curves (lines) are parallel, this means that $\sigma(X_i)$ is a constant. Further, if $\sigma(X_i)$ is a linear function of $X_i$, then the quantile regression model can be used to model conditional heteroscedasticity, such as the autoregressive conditional heteroscedastic (ARCH) and generalized autoregressive conditional heteroscedastic (GARCH) as well as stochastic volatility (SV) models. For details on this aspect, see the papers by Koenker and Zhao (1996) and Xiao (2006). Of course, one can also model some nonlinear-type models such as ARCH or GARCH or SV models. For more details, see the book by Koenker (2005). Finally, another application of conditional quantiles is the construction of prediction intervals for the next value given a small section of recent past values in a stationary time series (Granger, White, and Kamstra, 1989; Koenker, 1994; Zhou and Portnoy, 1996; Koenker and Zhao, 1996; Taylor and Bunn, 1999). Also, Granger, White, and Kamstra (1989), Koenker and Zhao (1996), and Taylor and Bunn (1999) considered interval forecasting for parametric autoregressive conditional heteroscedastic type models. For example, quantile regression can be used to construct a prediction interval $(q_{0.025}(X_t),\, q_{0.975}(X_t))$ such that $P(q_{0.025}(X_t) \le Y_{t+1} \le q_{0.975}(X_t) \mid X_t) = 0.95$. To fit a linear quantile regression using R, one can use the command rq() in the package quantreg. For a nonlinear parametric model, the command is nlrq(). For a nonparametric quantile model in the univariate case, one can use the command lprq(), which implements the local polynomial estimation. For an additive quantile regression, one can use the commands rqss() and qss().

rq(formula, tau=.5, data, subset, weights, na.action, method="br", model = TRUE, contrasts, ...)

lprq(x, y, h, tau = .5, m = 50)
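A minimal sketch with the engel data set that ships with quantreg:

library(quantreg)
data(engel)
fit.med <- rq(foodexp ~ income, tau = 0.5, data = engel)   # median (LAD) regression
summary(fit.med)
# several quantiles at once; roughly parallel lines would suggest constant sigma(X)
fit.all <- rq(foodexp ~ income, tau = c(0.05, 0.25, 0.5, 0.75, 0.95), data = engel)
coef(fit.all)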
6.3
Computer Codes
The following is the R code for making Figure 6.1.

# define functions
rho1=function(x){x^2}
rho2=function(x,c){x^2*(abs(x)<=c)+(c*abs(x)-c^2/2)*(abs(x)>c)}
rho3=function(x,c){x^2*(abs(x)<=c)+c^2*(abs(x)>c)}
rho_tau=function(x,tau){x*(tau-(x<0))}
z=seq(-2,2,by=0.1)
c=1.345
y=cbind(rho1(z),rho2(z,c),rho3(z,c),rho_tau(z,0.05),rho_tau(z,0.5),
rho_tau(z,0.95))
text1=c("quadratic","Huber","Huber0","tau=0.05","LAD","tau=0.95")
c1=c*rep(1,10)
c2=seq(0,c,length=10)
c3=c2^2
postscript(file="c:/res-teach/xiamen12-06/figs/fig-6.1.ps",
horizontal=F,width=6,height=6)   # output the graph as a ps or eps file
#win.graph()   # show the graph on the screen
par(bg="light green")
matplot(z,y,type="l",lty=1:6,ylab="",xlab="",ylim=c(-0.2,4))
points(c1,c3,type="l")
points(-c1,c3,type="l")
title(main="Loss Functions",col.main="red",cex=0.5)
legend(0,4,text1,lty=1:6,col=1:6,cex=0.7)
text(-c,-0.2,"-c")
text(c,-0.2,"c")
abline(0,0)
dev.off()
6.4
References
Huber, P.J. (1981). Robust Statistics. New York: Wiley.

Granger, C.W.J., White, H. and Kamstra, M. (1989). Interval forecasting: an analysis based upon ARCH-quantile estimators. Journal of Econometrics, 40, 87-96.
Koenker, R. (1994). Confidence intervals for regression quantiles. In Proceedings of the Fifth Prague Symposium on Asymptotic Statistics (P. Mandl and M. Huskova, eds.), 349-359. Physica, Heidelberg.

Koenker, R. (2005). Quantile Regression. New York: Cambridge University Press.

Koenker, R. and G.W. Bassett (1978). Regression quantiles. Econometrica, 46, 33-50.

Koenker, R. and G.W. Bassett (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica, 50, 43-61.

Koenker, R. and Q. Zhao (1996). Conditional quantile estimation and inference for ARCH models. Econometric Theory, 12, 793-813.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley.

Taylor, J.W. and Bunn, D.W. (1999). A quantile regression approach to generating prediction intervals. Management Science, 45, 225-237.

Xiao, Z. (2006). Conditional quantile estimation for GARCH models. Working Paper, Department of Economics, Boston College.

Zhou, K.Q. and Portnoy, S.L. (1996). Direct use of regression quantiles to construct confidence sets in linear models. The Annals of Statistics, 24, 287-306.
The well known Boston house price data set consists of 14 variables, collected on each of 506 different houses from a variety of locations. The Boston house-price data set was used originally by Harrison and Rubinfeld (1978), and it was re-analyzed in Belsley, Kuh and Welsch (1980) with various transformations in the table on pages 244-261. The variables, denoted by $X_1, \ldots, X_{13}$ and $Y$, are, in order:

CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX      nitric oxides concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centers
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per 10,000USD
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT    lower status of the population
MEDV     median value of owner-occupied homes in $1000s
The dependent variable is $Y$, the median value of owner-occupied homes in $1,000s (house price). The major factors possibly affecting the house prices used in the literature are: $X_{13}$ = proportion of population of lower educational status, $X_6$ = the average number of rooms per house, $X_1$ = the per capita crime rate, $X_{10}$ = the full property tax rate, and $X_{11}$ = the pupil/teacher ratio. For the complete description of all 14 variables, see Harrison and Rubinfeld (1978), and Gilley and Pace (1996) for corrections.
7.2
Analysis Methods

7.2.1
Linear Models
Harrison and Rubinfeld (1978) were the first to analyze this data set, using a standard regression model of $Y$ versus all 13 variables, including some higher order terms or transformations on $Y$ and the $X_j$'s. The purpose of that study was to examine the effects of pollution on housing prices via the hedonic pricing methodology. Belsley, Kuh and Welsch (1980) used this data set to illustrate the effects of using robust regression and outlier detection strategies. From these results, we might conclude that the model might not be linear and there might exist outliers. Also, Pace and Gilley (1997) added a georeferencing idea (spatial statistics or econometrics) and used a spatial estimation method to analyze this data set.

Exercise: Please use all possible methods to explore this dataset to see what is the best linear model (including higher orders) you can obtain.
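As a starting point, a hedged sketch using the copy of the data shipped with the MASS package (where the variable names are lower case):

library(MASS)
data(Boston)
fit <- lm(log(medv) ~ rm + lstat + crim + tax + ptratio, data = Boston)
summary(fit)                            # the five major factors listed above
fit2 <- update(fit, . ~ . + I(rm^2))    # try a higher-order term
anova(fit, fit2)                        # does the quadratic term help?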
7.2.2 Nonparametric Models
Mean Additive Models

There have been several papers devoted to the analysis of this dataset using nonparametric methods. For example, Breiman and Friedman (1985), Pace (1993), Chaudhuri, Doksum and Samarov (1997), and Opsomer and Ruppert (1998) used four covariates, X6, X10, X11 and X13, or their transformations (including a transformation of Y), to fit the data through a mean additive regression model such as

log(Y) = α + g1(X6) + g2(X10) + g3(X11) + g4(X13) + ε,   (7.1)

where the additive components {gj(·)} are unspecified smooth functions. Pace (1993) and Chaudhuri, Doksum and Samarov (1997) also considered the nonparametric estimation of the first derivative of each additive component, which measures how much the response changes as one covariate is perturbed while the other covariates are held fixed. Let us use model (7.1) to fit the Boston house price data. The results are summarized in Figure 7.1 (the R code can be found in Section 7.3). To fit an additive model or a partially additive model in R, use the function gam() in the package gam; for details, please look at the help command help(gam) after loading the package [library(gam)]:

gam(formula, family = gaussian, data, weights, subset, na.action, start,
  etastart, mustart, control = gam.control(...), model=FALSE, method,
  x=FALSE, y=TRUE, ...)

Note that the function gam() also allows one to fit a semiparametric additive model such as

log(Y) = α + g1(X6) + β2 X10 + β3 X11 + β4 X13 + ε,   (7.2)

which is done by leaving some components unsmoothed. Fan and Jiang (2005) used the generalized likelihood ratio (GLR) test [see Cai, Fan and Li (2000), Cai, Fan and Yao (2000) and Fan, Zhang and Zhang (2001)] to test whether any component is linear or insignificant. In other words, they tested an additive model against a partially additive model or a smaller additive model, that is, H0: gj(x) = a + bx or H0: gj(x) = 0. Let us use model (7.2) to fit the Boston house price data. The results are summarized in Figure 7.2 (the R code can be found in Section 7.3).
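As a rough stand-in for the GLR test (which is not part of the gam package), one can contrast the fits of models (7.1) and (7.2) with an approximate F test; the following hedged sketch reuses the objects fit_gam and fit_gam1 constructed in Section 7.3:

library(gam)
# fit_gam implements model (7.1); fit_gam1 implements model (7.2)
fit_gam=gam(y_log~lo(x6)+lo(x10)+lo(x11)+lo(x13))
fit_gam1=gam(y_log~lo(x6)+x10+x11+x13)
# approximate F test of the smaller model against the full additive model
anova(fit_gam1,fit_gam,test="F")

A small p-value suggests that at least one of the components of X10, X11 and X13 is nonlinear, in the same spirit as, though not equivalent to, the GLR test.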
Figure 7.1: The results from model (7.1).

Quantile Regression Approaches

Some existing analyses (e.g., Breiman and Friedman, 1985) using mean regression models concluded that most of the variation seen in housing prices in the restricted data set can be explained by two major variables: X6 and X13. Indeed, the correlation coefficients between Y and X13 and between Y and X6 are −0.7377 and 0.6954, respectively. Figure 7.3 gives the scatter plots of the house price versus the covariates X13, X6, X1 and log(X1), respectively. The interesting features of this dataset are that the response variable is the median price of a home in a given area and that the distributions of Y and the major covariate X13 are left skewed (please look at the density estimates; see Figure 7.2(d) for the density estimate of Y, which can be obtained using the command density() in R). To overcome the skewness, one way is to use a transformation such as the Box-Cox transformation mentioned above. Another way (a better way) is to use quantile regression.
Figure 7.2: (a) Residual plot for model (7.1). (b) Plot of ĝ1(x6) versus x6. (c) Residual plot for model (7.2). (d) Density estimate of Y.

Therefore, Yu and Lu (2004) employed the additive quantile regression technique to fit the following additive quantile model to the data: for any 0 < τ < 1,
qτ(X) = ατ + gτ,1(X6) + gτ,2(X10*) + gτ,3(X11) + gτ,4(X13*),   (7.3)

where X10* = log(X10), X13* = log(X13), and the additive components gτ,j(·) are unspecified
smooth functions. Unfortunately, the detailed graphs are not provided, and you are asked to write a code for model (7.3) to present the computational results, including graphs. Note that you can use the command rqss() in R to compute the estimated values of each component in model (7.3); a sketch is given below. Recently, Şentürk and Müller (2005a, 2005b, 2006) studied the correlation between the house price Y and the crime rate X1, adjusted by the confounding variable X13, through a varying coefficient model, and they concluded that the expected effect of increasing crime rate on declining house prices seems to be observed only for lower educational status neighborhoods in Boston.
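As a hedged starting point for that exercise (this is not Yu and Lu's code; the smoothing parameters lambda and the quantile tau = 0.5 are illustrative choices only), model (7.3) can be fitted with rqss() from the quantreg package, reusing the variables read in Section 7.3:

library(quantreg)
x10l=log(x10)     # X10* = log(X10)
x13l=log(x13)     # X13* = log(X13)
fit_rqss=rqss(y~qss(x6,lambda=1)+qss(x10l,lambda=1)
  +qss(x11,lambda=1)+qss(x13l,lambda=1),tau=0.5)
summary(fit_rqss)
par(mfrow=c(2,2))
plot(fit_rqss)    # one panel per estimated additive component

Repeating the fit over several values of tau traces out how the components change across the conditional distribution.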
Figure 7.3: Boston Housing Price Data: Displayed in (a)-(d) are the scatter plots of the house price versus the covariates X13, X6, X1 and log(X1), respectively.

Finally, it is surprising that all the existing nonparametric models mentioned above did not include the crime rate X1, which may be an important factor affecting the housing price, and did not consider interaction terms such as that between X13 and X1. Based on the above discussions, Cai and Xu (2005) studied the following nonparametric quantile regression model, which might be well suited to the analysis of this dataset:
qτ(X) = a0,τ(X13) + a1,τ(X13) X6 + a2,τ(X13) X1*,   1 ≤ t ≤ n = 506,   (7.4)
where X1* = log(X1). The reason for using the logarithm of X1 in (7.4), instead of X1 itself, is that the correlation between Yt and X1* is slightly stronger than that between Yt and X1. The estimated curves for the three coefficient functions are displayed in Figure 7.4. In the model fitting, the covariates X6 and X1* can be centered. Of course, you can add more covariates to model (7.4), in which case you need to consider model diagnostics to see whether some covariates are significant. Also, you might explore some semiparametric models for this data set. Unfortunately, the detailed code is not provided and you are asked to write a code by yourself; a simplified sketch follows.
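A much simplified sketch of that exercise (not Cai and Xu's code) estimates the coefficient functions of model (7.4) by a kernel-weighted linear quantile regression at each grid point of X13, i.e., a local constant approximation to the aj,τ(·); the bandwidth h = 2, the grid, and tau = 0.5 are all illustrative choices:

library(quantreg)
x1l=log(x1)                      # X1* = log(X1)
h=2; tau=0.5
u.grid=seq(5,25,length=50)       # grid of points over X13
coefs=matrix(NA,length(u.grid),3)
for (i in seq_along(u.grid)) {
  w=dnorm((x13-u.grid[i])/h)     # kernel weights centered at u.grid[i]
  coefs[i,]=coef(rq(y~x6+x1l,tau=tau,weights=w))
}
matplot(u.grid,coefs,type="l",xlab="x13",
  ylab="estimated coefficient functions")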
Figure 7.4: Boston Housing Price Data: The plots of the estimated coefficient functions for three quantiles, τ = 0.05 (solid line), τ = 0.50 (dashed line), and τ = 0.95 (dotted line), for model (7.4), and the mean regression (dot-dashed line): a0,τ(u) and a0(u) versus u in (a), a1,τ(u) and a1(u) versus u in (b), and a2,τ(u) and a2(u) versus u in (c). The thick dashed lines indicate the 95% point-wise confidence interval for the median estimate with the bias ignored.
7.3 Computer Codes
The following is the R code for making Figures 7.1 and 7.2.

data=read.table("c:/res-teach/xiamen12-06/data/ex7-1.txt")
y=data[,14]
x1=data[,1]
x6=data[,6]
x10=data[,10]
x11=data[,11]
x13=data[,13]
y_log=log(y)
library(gam)
# model (7.1): full additive model
fit_gam=gam(y_log~lo(x6)+lo(x10)+lo(x11)+lo(x13))
resid=fit_gam$residuals
y_hat=fit_gam$fitted
postscript(file="c:/res-teach/xiamen12-06/figs/fig-7.1.eps",
  horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
plot(fit_gam)
title(main="Component of X_13",col.main="red",cex=0.6)
dev.off()
# model (7.2): semiparametric additive model
fit_gam1=gam(y_log~lo(x6)+x10+x11+x13)
s1=fit_gam1$smooth[,1]    # obtain the smoothed component
resid1=fit_gam1$residuals
y_hat1=fit_gam1$fitted
print(summary(fit_gam1))
postscript(file="c:/res-teach/xiamen12-06/figs/fig-7.2.eps",
  horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
plot(y_hat,resid,type="p",pch="o",ylab="",xlab="y_hat")
title(main="Residual Plot of Additive Model",col.main="red",cex=0.6)
abline(0,0)
plot(x6,s1,type="p",pch="o",ylab="s1(x6)",xlab="x6")
title(main="Component of X_6",col.main="red",cex=0.6)
plot(y_hat1,resid1,type="p",pch="o",ylab="",xlab="y_hat")
title(main="Residual Plot of Model II",col.main="red",cex=0.5)
abline(0,0)
plot(density(y),ylab="",xlab="",main="Density of Y")
dev.off()
7.4 References
Belsley, D.A., E. Kuh and R.E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.
Breiman, L. and J.H. Friedman (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580-619.
Cai, Z., J. Fan and R. Li (2000). Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association, 95, 888-902.
Cai, Z., J. Fan and Q. Yao (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941-956.
Cai, Z. and X. Xu (2005). Nonparametric quantile estimations for dynamic smooth coefficient models. Revised for Journal of the American Statistical Association.
Chaudhuri, P., K. Doksum and A. Samarov (1997). On average derivative quantile regression. The Annals of Statistics, 25, 715-744.
Fan, J. and J. Jiang (2005). Nonparametric inference for additive models. Journal of the American Statistical Association, 100, 890-907.
Fan, J., C. Zhang and J. Zhang (2001). Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29, 153-193.
Gilley, O.W. and R.K. Pace (1996). On the Harrison and Rubinfeld data. Journal of Environmental Economics and Management, 31, 403-405.
Harrison, D. and D.L. Rubinfeld (1978). Hedonic housing prices and demand for clean air. Journal of Environmental Economics and Management, 5, 81-102.
Opsomer, J.D. and D. Ruppert (1998). A fully automated bandwidth selection for additive regression models. Journal of the American Statistical Association, 93, 605-618.
Pace, R.K. (1993). Nonparametric methods with applications to hedonic models. Journal of Real Estate Finance and Economics, 7, 185-204.
Pace, R.K. and O.W. Gilley (1997). Using the spatial configuration of the data to improve estimation. Journal of the Real Estate Finance and Economics, 14, 333-340.
Şentürk, D. and H.G. Müller (2005a). Covariate-adjusted regression. Biometrika, 92, 75-89.
Şentürk, D. and H.G. Müller (2005b). Covariate adjusted correlation analysis. Scandinavian Journal of Statistics, 32, 365-383.
Şentürk, D. and H.G. Müller (2006). Inference for covariate adjusted regression via varying coefficient models. The Annals of Statistics, 34, 654-679.
Yu, K. and Z. Lu (2004). Local linear additive quantile regression. Scandinavian Journal of Statistics, 31, 333-346.
Chapter 8 Value at Risk

Value at Risk (VaR) is one of the basic measures of the risk of a loss in a particular portfolio. It is typically stated in a form like: there is a one percent chance that, over a particular day of trading, the position could lose some dollar amount. We can state it more generally as VaR(p, N) = V, where V is the dollar amount of the possible loss, p is the percentage (managers might commonly be interested in the 1%, 5%, or 10%), and N is the time period (one day, one week, one month). To get from one-day VaR to N-day VaR, if the risks are independent and identically distributed, we multiply by √N (Please verify this. Do you need any assumption?). Or, in one irreverent definition, VaR is a number invented by purveyors of panaceas for pecuniary peril intended to mislead senior management and regulators into false confidence that market risk is adequately understood and controlled. VaR is called upon for a variety of tasks (enumerated in Duffie and Pan, 1997): measure risk exposure; ensure correct capital allocation; provide information to counter-parties, regulators, auditors, and other stakeholders; evaluate and provide incentives to profit centers within the firm; and protect against financial distress. Note that the final desired outcome (protection against loss) is not the only desired outcome: being too safe costs money and loses business! VaR is an important component of bank regulation: the Basle Accord sets capital based on the 10-day 1% VaR (so, if risks are iid, then the 10-day VaR is about three times larger than the one-day). The capital is set at least 3 times as high as the 10-day 1% VaR, and can be as much as four times higher if the bank's own VaR calculations have performed poorly in
the past year. If a bank hits its own 1% VaR more than 3 times in a year (of 250 trading days), then its capital adequacy rating is increased for the next year. Poor modeling can carry a high cost! If we graph the distribution of possible returns of a portfolio, we can interpret VaR as a measure of a percentile. If we plot the cumulative distribution function (cdf) of the portfolio value, then the VaR is simply the inverse of the probability: VaR = F^(−1)(p). Or if we graph the probability density function (pdf), then VaR is the value that gives a particular area under the curve. This is obviously similar to hypothesis testing in statistics.
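As a quick, hedged check of the √N rule mentioned above (added here; iid normal returns are assumed), simulation confirms that the 10-day 1% VaR is about √10 times the one-day figure:

set.seed(1)
r=matrix(rnorm(250000,mean=0,sd=0.01),ncol=10)  # iid daily returns
var1=-quantile(r[,1],0.01,names=FALSE)          # 1-day 1% VaR
var10=-quantile(rowSums(r),0.01,names=FALSE)    # 10-day 1% VaR
c(var10/var1,sqrt(10))                          # both close to 3.16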
Typically we are concerned with the lower tail of losses; therefore some people might refer to either the 1% VaR or the 99% VaR. From the mathematical description above, the 99% VaR would seem to be the upper tail probability (therefore the loss in a short position), but this is rarely the intent: the formulation typically refers to the idea that 99% of losses will be smaller (in absolute value) than V.

There are three basic methods of computing the Value at Risk of some portfolio: (a) parametric models that assume some error distribution, (b) historical returns that use a selection of past data, and (c) computationally intensive Monte Carlo simulations that draw thousands of possible future paths. Each of these typically has two steps: first determine a possible path for the value of some underlying asset, then determine the change in the value of the portfolio assets due to it.

The parametric model case assumes that future returns can be modeled with some statistical distribution such as the normal. A statistician would note that P((X − μ)/σ < Z) = p and, in the case of the normal distribution, for p = 0.01 (a 1% one-tailed critical value), Z = −2.326. For distributions other than the normal, there would be different critical values. Sometimes a modeler would use the t-distribution (with some degrees of freedom) or even the Cauchy (a t-distribution with 1 df) to give heavier tails, the generalized Pareto distribution (see later), or some more sophisticated and complex distributions (for example, hyperbolic distributions; see Eberlein and Keller (1995)). Regardless of what distribution is chosen, the above probability can be re-written as P(X < μ + σZ) = p, so that the quantity −(μ + σZ) is exactly V, the value at risk.

An extension of the parametric approach is to use the Greeks that we have derived from the Black-Scholes-Merton model (under the assumption of normally-distributed errors) to calculate possible changes in the portfolio values. We can also incorporate information from options prices into the calculation of forward value at risk. For instance, if S&P 500 option prices imply a high VIX (volatility), then the model could use this volatility instead of the historical average. In extracting information from derivatives prices we must be careful of the difference between the risk-neutral distribution and the distribution of actual price changes: VaR is based on actual changes, not on some perfectly-hedged portfolio!

The basic method of historical returns would just select some reference period of history (say, the past 2 years), order the returns from that period, and select the 100·p% worst one to represent the value at risk based on past trading history. Basically, instead of using some parametric cdf, this method uses a historical (empirical) cdf. The choice of historical period means that this method is not quite as simple as it might sound. The parametric method has a potential misspecification if the true distribution is not the specified one; the historical (empirical) cdf, in contrast, is always a consistent estimate of the true cdf. Therefore, the historical method would be better, although it is statistically more involved.

Monte Carlo simulations use computational power to compute a cumulative distribution function. Possible future paths of asset prices and derivatives are computed and the value of the portfolio is calculated along each one. If the errors are drawn from a parametric distribution then the Monte Carlo method will, in the limit, provide the same result as the parametric method. If the errors are drawn from a historical distribution then the Monte Carlo method will give the same result as that method. Typically a Monte Carlo simulation will use some of each.

Each method has strengths and weaknesses. The parametric method is quick and easy to
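A minimal sketch (not from the original notes) of the parametric and historical calculations just described, using simulated returns with the OEX-like mean (0.018%) and standard deviation (0.013) discussed below; substitute actual daily log returns for r to reproduce the text's numbers:

set.seed(1)
r=rnorm(1000,mean=0.00018,sd=0.013)      # stand-in daily log returns
var.param=-(mean(r)+sd(r)*qnorm(0.01))   # parametric 1-day 1% VaR
var.hist=-quantile(r,0.01,names=FALSE)   # historical 1-day 1% VaR
c(parametric=var.param,historical=var.hist)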
implement with a minimum of computing power. It is good for calculating day-to-day trading swings for thickly-traded assets and conventional derivatives (plain vanilla or with well-specified Greeks). But most common probability distributions were developed for making inferences about behavior in the center of the probability mass (means and medians); the tails tend to be poorly specified and so do not perform as well. Each method must choose which underlying variables and which covariances are considered. The covariances are tough (particularly for historical simulations) since the curse of dimensionality quickly takes hold. For example, compare a histogram of 250 observations (one trading year) of returns on a single variable with the distribution of the same 250 observations now incorporating the joint behavior of two variables: many of the boxes then represent a single observation (they are just 1 unit high), while the possible maximum value is 8 (down from 40 previously). A series of historical simulations that drew from these observations would have very few options, compared with a parametric draw that needs only the correlation between the variables and a specification of a distributional form. The imposition of a normal distribution might be unrealistic, but it might be less bad than the alternatives. To handle the dimensionality problem, we might use principal components analysis, which breaks down complex movements of variables (like the yield curve) into simpler orthogonal factor movements that can then be used in place of the real changes.

To consider the VaR constructions in more detail, consider how to define the VaR for a portfolio with just a single asset. If this asset has a normal distribution, then a parametric VaR would say there is a 1% chance that a day's trading will end more than 2.326 standard deviations below average (see the example above for details on calculating V from a Z statistic). We would use information on the past performance of the asset to determine its average and standard deviation. A historical simulation VaR would rank, say, the past 4 years of data (approximately 1000 observations) to find the worst 10. For the last 4 years of data on the OEX (S&P 100) (please download this dataset and analyze it by yourself), the mean daily return (expressed in logs) is 0.018% and its standard error is 0.013. Using the 1% Z value that we had above, −2.326, we find a parametric 1-day 1% VaR of 3.06%. If we instead use the historical returns, we find a 1-day 1% VaR of
3.49%. How different are these numbers? In that same time period, the parametric VaR was exceeded on 1.3% of days; the historical returns VaR would be hit only 0.4% of the time if the parametric model were correct. We should add a number of caveats, principally to note that the historical period which we are considering includes September 2001, when trading stopped for a week. When trading resumed, the S&P 100 had fallen over 5%, but this underestimates the financial impact: other world markets were open, so any positions that crossed international borders would have lost more. We might choose not to include that episode, in which case the VaR numbers change to a parametric 3.03% or historical 3.28%.
There is a general tradeoff between the amount of historical data used and the degree of imposed inference. No model is completely theory-independent (the historical simulation model assumes that the future will be sampled from some defined past). As the amount of theory increases, more inferences can be drawn, but at a cost: if the theory is incorrect then the inferences will be worse (although if the theory is right then the inferences will be sharper). All these methods share a weakness of looking to the past for guidance, rather than seriously considering future events that are genuinely new. Stress testing takes some of the most extreme price/volume changes of past decades and tries to push them further, by asking how a portfolio position would react to such moves. Back testing acts to check that the percentages seem to line up: not only must the number of actual hits be close to the predicted number, but they should also be independent of each other. Hits that are clustered indicate that the VaR is not adequately dependent upon current market valuations. Back testing can also explore the adequacy of other percentile measures of VaR, on the insight that, if the 5% VaR (or 10%, or 0.5%, or whatever) is wrong, then we might have more skepticism about the validity of the 1% measure. For more details, see Chapter 6 of Crouhy, Galai, and Mark (2001).

All of these methods also find only a single percentile of the distribution. Suppose a firm has some exotic option which will be worth $100 on 99.5% of trading days but will lose $1000 on 0.5%. Then the 1% VaR is $100. Even if the option lost $10,000 or $1,000,000 on 0.5% of trading days, the calculated VaR would not change! In the pictures of the cdf
and pdf, this is simply the observation that the left-hand tail can be drawn out to stretch into huge losses. Therefore some advocate the calculation of a Conditional VaR, which is the expected loss conditional on being among the worst 1% of trading days.

The VaR methods all basically assume that the trading desk is a machine that can generate any desired basket of risk/return on the efficient frontier; the role of management is to specify how much risk they want to take on (and therefore how high a return they want). This assumes that risk is a univariate measure with a defined specification, but there are many sources of risk. If management is genuinely so disconnected from the trading desk that is taking on this risk, then there is a real problem that no VaR can solve! If management is more closely involved, then they should understand the panoply of risks. VaRs are a tool of risk management. Emanuel Derman writes that risk managers need to love markets and know what to worry about: bad marks, misspecified contracts, uncertainty, risk, value-at-risk, illiquid positions, exotic options (in Risk, May 2004). VaR is one part of a long list.
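A one-line empirical version of this Conditional VaR (added as a sketch, reusing the return vector r from the earlier VaR sketch):

var99=-quantile(r,0.01,names=FALSE)   # ordinary 1% VaR
es99=-mean(r[r<=-var99])              # average loss on the worst 1% of days
c(VaR=var99,CVaR=es99)                # the conditional VaR is never smaller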
8.2 R Commands (R Menu)
The package VaR contains mainly two functions: VaR.gpd(), which uses a fit of the Generalized Pareto Distribution (GPD), and VaR.norm(), which uses a log normal distribution. The function VaR.gpd() estimates the Value at Risk and Expected Shortfall (ES) of a single risk factor at a given confidence level by fitting a GPD to the part of the data exceeding a given threshold (the Peak Over Threshold (POT) method). The input data are transformed to percentage daily returns; the transformed data are then sorted, and only the part exceeding a given threshold is kept, where the threshold is calculated according to the expression p.tr*std. A log-likelihood fit is then applied to obtain the values of VaR and ES, after which confidence intervals for these values are calculated; see Embrechts, Kluppelberg and Mikosch (1997) and Tsay (2005) for details. The R menu for the VaR package is VaR.pdf, which can be downloaded, for example, from the web site at http://cran.cnr.berkeley.edu/doc/packages/VaR.pdf.
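If the VaR package is unavailable, a similar POT calculation can be sketched with the gpd() and riskmeasures() functions of the evir package; the simulated t(4) losses and the 95% threshold below are illustrative only and are not part of the original notes:

library(evir)
set.seed(1)
loss=rt(2000,df=4)                    # heavy-tailed stand-in daily losses
fit=gpd(loss,threshold=quantile(loss,0.95,names=FALSE))  # ML fit of GPD to exceedances
riskmeasures(fit,c(0.99,0.999))       # columns: p, VaR quantile, expected shortfall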
8.2.1 Pareto and Generalized Pareto Distributions
The Pareto distribution is a skewed, heavy-tailed distribution that is sometimes used to model the distribution of incomes. Let F(x) = 1 − x^(−1/k) for x > 1, where k > 0 is a parameter. The distribution defined by this function is called the Pareto distribution with shape parameter k, and is named for the economist Vilfredo Pareto. The density function is f(x) = k^(−1) x^(−1−1/k) for x > 1. Clearly, f(x) is decreasing for x > 1, and f(·) decreases faster as k decreases. The Pareto distribution is a heavy-tailed distribution; thus, the mean, variance, and other moments are finite only if the shape parameter k satisfies certain conditions. As with many other distributions, the Pareto distribution is often generalized by adding a scale parameter. Thus, suppose that Z has the Pareto distribution with shape parameter k. If σ > 0, the random variable X = σZ has the Pareto distribution with shape parameter k and scale parameter σ. Note that X takes values in the interval (σ, ∞). Analogues of the results given above follow easily from basic properties of the scale transformation: the density function is f(x) = k^(−1) σ^(1/k) x^(−1−1/k) for x > σ, the distribution function is F(x) = 1 − (x/σ)^(−1/k) for x > σ, and the quantile function is F^(−1)(p) = σ(1 − p)^(−k) for 0 < p < 1.
A more general form of the generalized Pareto distribution, with shape parameter k ≠ 0, scale parameter σ, and threshold parameter θ, has density

f(x) = (1/σ) [1 + k(x − θ)/σ]^(−1−1/k)

for θ < x when k > 0, or for θ < x < θ − σ/k when k < 0. In the limit as k → 0, the density is

f(x) = (1/σ) exp(−(x − θ)/σ)

for θ < x. If k = 0 and θ = 0, the generalized Pareto distribution is equivalent to the exponential distribution. If k > 0 and θ = σ/k, the generalized Pareto distribution is equivalent to the Pareto distribution.
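The density above is easy to code directly; the following small function (an added sketch in the (k, sigma, theta) notation of this section, not part of the original notes) also checks the exponential limit as k → 0:

dgpd.text=function(x,k,sigma,theta=0) {
  z=(x-theta)/sigma
  if (abs(k)<1e-8) return(ifelse(z>0,exp(-z)/sigma,0))   # k = 0 limiting case
  sup=if (k>0) z>0 else (z>0 & z< -1/k)                  # support depends on k
  ifelse(sup,(1+k*z)^(-1-1/k)/sigma,0)
}
x=seq(0.1,5,by=0.1)
max(abs(dgpd.text(x,k=1e-9,sigma=1)-dexp(x)))            # ~ 0: exponential case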
8.2.2 Modeling Tails with the Generalized Pareto Distribution
Like the exponential distribution, the generalized Pareto distribution is often used to model the tails of another distribution. For example, you might have washers from a manufacturing process. If random influences in the process lead to differences in the sizes of the washers, a standard probability distribution, such as the normal, could be used to model those sizes. However, while the normal distribution might be a good model near its mode, it might not
be a good fit to real data in the tails, and a more complex model might be needed to describe the full range of the data. On the other hand, recording only the sizes of washers larger (or smaller) than a certain threshold means you can fit a separate model to those tail data, which are known as exceedances. You can use the generalized Pareto distribution in this way, to provide a good fit to extremes of complicated data. The generalized Pareto distribution allows a continuous range of possible shapes that includes both the exponential and Pareto distributions as special cases. You can use either of those distributions to model a particular dataset of exceedances, but the generalized Pareto distribution allows you to let the data decide which distribution is appropriate. The generalized Pareto distribution has three basic forms, each corresponding to a limiting distribution of exceedance data from a different class of underlying distributions. (1) Distributions whose tails decrease exponentially, such as the normal, lead to a generalized Pareto shape parameter of zero. (2) Distributions whose tails decrease as a polynomial, such as Student's t, lead to a positive shape parameter. (3) Distributions whose tails are finite, such as the beta, lead to a negative shape parameter.
8.3 Reading Materials
The reading materials I and II are from Chapters 5 and 6 in the book by Crouhy, Galai, and Mark (2001) about the basic concepts and methodologies on the value at risk.
8.4 Nonparametric Estimation of Conditional VaR and Expected Shortfall
For details, see the paper by Cai and Wang (2006). If you would like to read the whole paper, you can download it from the web site at http://www.wise.xmu.edu.cn/ under the WORKING PAPER column. Next we present only a part of the whole paper of Cai and Wang (2006).
8.4.1 Introduction
The value-at-risk (hereafter, VaR) and the expected shortfall (hereafter, ES) have become two popular measures of the market risk associated with an asset or a portfolio of assets during the last decade.
In particular, VaR has been chosen by the Basle Committee on Banking Supervision as the benchmark risk measure for capital requirements, and both measures have been used by financial institutions for asset management and minimization of risk, as well as developed rapidly as analytic tools to assess the riskiness of trading activities. See, to name just a few, Morgan (1996), Duffie and Pan (1997), Jorion (2001, 2003), and Duffie and Singleton (2003) for the financial background, statistical inferences, and various applications. In terms of the formal definition, VaR is simply a quantile of the loss distribution (future portfolio values) over a prescribed holding period (e.g., 2 weeks) at a given confidence level, while ES is the expected loss given that the loss is at least as large as some given quantile of the loss distribution (e.g., VaR). It is well known from Artzner, Delbaen, Eber and Heath (1999) that ES is a coherent risk measure in the sense that it satisfies the following four axioms: homogeneity (increasing the size of a portfolio by a factor should scale its risk measure by the same factor), monotonicity (a portfolio must have greater risk if it has systematically lower values than another), risk-free condition or translation invariance (adding some amount of cash to a portfolio should reduce its risk by the same amount), and subadditivity (the risk of a portfolio must be less than the sum of separate risks, or merging portfolios cannot increase risk). VaR satisfies homogeneity, monotonicity, and the risk-free condition, but is not subadditive. See Artzner, et al. (1999) for details. As advocated by Artzner, et al. (1999), ES is preferred due to its better properties, although VaR is widely used in applications.
somewhat greater. Similarly, very good days also increase the VaR, as would be the case for volatility models. Therefore, VaR could depend on the past returns in someway. Hence, an appropriate risk analytical tool or methodology should be allowed to adapt to varying market conditions and to reect the latest available information in a time series setting rather than the iid framework. Most of the existing risk management literature has concentrated on unconditional distributions and the iid setting although there have been some studies on the conditional distributions and time series data. For more background, see Chernozhukov and Umanstev (2001), Cai (2002), Fan and Gu (2003), Engle and Manganelli (2004), Cai and Xu (2005), Scaillet (2005), and Cosma, Scaillet and von Sachs (2006), and references therein for conditional models, and Due and Pan (1997), Artzner, et al. (1999), Rockafellar and Uryasev (2000), Acerbi and Tasche (2002), Frey and McNeil (2002), Scaillet (2004), Chen and Tang (2005), Chen (2006), and among others for unconditional models. Also, most of studies in the literature and applications are limited to parametric models, such as all standard industry models like CreditRisk+ , CreditMetrics, CreditPortfolio View and the model proposed by the KMV corporation. See Chernozhukov and Umanstev (2001), Frey and McNeil (2002), Engle and Manganelli (2004), and references therein on parametric models in practice and Fan and Gu (2003) and references therein for semiparametric models. The main focus of this paper is on studying the conditional value-at-risk (CVaR) and conditional expected shortfall (CES) and proposing a new nonparametric estimation procedure to estimate CVaR and CES functions where the conditional information is allowed to contain economic and market (exogenous) variables and the past observed returns. Parametric models for CVaR and CES can be most ecient if the underlying functions are correctly specied. See Chernozhukov and Umanstev (2001) for a polynomial type regression model and Engle and Manganelli (2004) for a GARCH type parametric model for CVaR based on regression quantile. However, a misspecication may cause serious bias and model constraints may distort the underlying distributions. A nonparametric modeling is appealing in several aspects. One of the advantages for nonparametric modeling is that little or no restrictive prior information on functionals is needed. Further, it may provide a useful insight for further parametric tting. This paper has several contributions to the literature. The rst one is to propose a new nonparametric approach to estimate CVaR and CES. In essence, our estimator for CVaR is based on inverting a newly proposed estimator of the conditional distribution function for
time series data and the estimator for CES is by a plugging-in method based on plugging in the estimated conditional probability density function and the estimated CVaR function. Note that they are analogous to the estimators studied by Scaillet (2005) by using the Nadaraya-Watson (NW) type double kernel (smoothing in both the y and x directions) estimation, and Cai (2002) by utilizing the weighted Nadaraya-Watson (WNW) kernel type technique to avoid the so-called boundary eects as well as Yu and Jones (1998) by employing the double kernel local linear method. More precisely, our newly proposed estimator combines the WNW method of Cai (2002) and the double kernel local linear technique of Yu and Jones (1998), termed as weighted double kernel local linear (WDKLL) estimator. The second contribution is to establish the asymptotic properties for the WDKLL estimators of the conditional probability density function (PDF) and cumulative distribution function (CDF) for the -mixing time series at both boundary and interior points. It is therefore shown that the WDKLL method enjoys the same convergence rates as those of the double kernel local linear estimator of Yu and Jones (1998) and the WNW estimator of Cai (2002). It is also shown that the WDKLL estimators have desired sampling properties at both boundary and interior points of the support of the design density, which seems to be seminal. Finally, we derive the WDKLL estimator of CVaR by inverting the WDKLL conditional distribution estimator and the WDKLL estimator of CES by plugging in the WDKLL estimators of PDF and CVaR. We show that the WDKLL estimator of CVaR exists always due to the WDKLL estimator of CDF being a distribution function itself, and that it inherits all better properties from the WDKLL estimator of CDF; that is, the WDKLL estimator of CDF is a CDF and dierentiable, and it possess the asymptotic properties such as design adaption, avoiding boundary eects, and mathematical eciency. Note that to preserve shape constraints, recently, Cosma, Scaillet and von Sachs (2006) used a wavelet method to estimate conditional probability density and cumulative distribution functions and then to estimate conditional quantiles. Note that CVaR dened here is essentially the conditional quantile or quantile regression of Koenker and Bassett (1978), based on the conditional distribution, rather than CVaR dened in some risk management literature (see, e.g., Rockafellar and Uryasev, 2000; Jorion, 2001, 2003) which is what we call ES here. Also, note that the ES here is called TailVaR in Artzner, et al. (1999). Moreover, as aforementioned, CVaR can be regarded as a special case of quantile regression. See Cai and Xu (2005) for the state-of-the-art about current research
on nonparametric quantile regression, including CVaR. Further, note that both ES and CES have been known for decades among actuary sciences and they are very popular in insurance industry. Indeed, they have been used to assess risk on a portfolio of potential claims, and to design reinsurance treaties. See the book by Embrechts, Kluppelberg, and Mikosch (1997) for the excellent review on this subject and the papers by McNeil (1997), Hrlimann (2003), u Scaillet (2005), and Chen (2006). Finally, ES or CES is also closely related to other applied elds such as the mean residual life function in reliability and the biometric function in biostatistics. See Oakes and Dasu (1990) and Cai and Qian (2000) and references therein. The rest of the paper is organized as follows. Section 8.4.2 provides a blueprint for the basic notations and concepts. In Section 8.4.3, we present the detailed motivations and formulations for the new nonparametric estimation procedures for estimating the conditional PDF, CDF, VaR and ES. We establish the asymptotic properties of these nonparametric estimators at both boundary and interior points with a comparison in Section 4. Together with a convenient and ecient data-driven method for selecting the bandwidth based on the nonparametric Akaike information criterion (AIC), Monte Carol simulation studies and empirical applications on several stock index and stock returns are presented in Section 5. Finally, the derivations of the theorems are given in Section 6 with some lemmas and the Appendix contains the technical proofs of certain lemmas needed in the proofs of the theorems presented in Section 6.
8.4.2 Framework
Assume that the observed data {(Xt, Yt); 1 ≤ t ≤ n}, Xt ∈ R^d, are available and that they are observed from a stationary time series model. Here Yt is the risk or loss variable, which can be the negative logarithm of return (log loss), and Xt is allowed to include both economic and market (exogenous) variables and the lagged variables of Yt; it can also be a vector. But, for expositional purposes, we consider only the case where Xt is a scalar (d = 1). Note that the proposed methodologies and their theory for the univariate case (d = 1) continue to hold for multivariate situations (d > 1); extension to the case d > 1 involves no fundamentally new ideas. Note that models with large d are often not practically useful due to the curse of dimensionality. We now turn to considering the nonparametric estimation of the conditional expected
shortfall μp(x), which is defined as

μp(x) = E[Yt | Yt ≥ νp(x), Xt = x],

where νp(x) is the conditional value-at-risk, defined as the solution of

P(Yt ≥ νp(x) | Xt = x) = S(νp(x) | x) = p,

or, expressed differently, νp(x) = S^(−1)(p | x), where S(y | x) is the conditional survival function of Yt given Xt = x. It is easy to see that

μp(x) = p^(−1) ∫_{νp(x)}^∞ y f(y | x) dy,

where f(y | x) is the conditional probability density function of Yt given Xt = x. To estimate μp(x), one can use the plug-in method

μ̂p(x) = p^(−1) ∫_{ν̂p(x)}^∞ y f̂(y | x) dy,   (8.1)

where ν̂p(x) is a nonparametric estimate of νp(x) and f̂(y | x) is a nonparametric estimate of f(y | x). But the bandwidths for ν̂p(x) and f̂(y | x) do not need to be the same.

Note that Scaillet (2005) used the NW type double kernel method, due to Roussas (1969), to estimate f(y | x) first, denoted by f̃(y | x), then estimated νp(x) by inverting the estimated conditional survival function, denoted by ν̃p(x), where ν̃p(x) = S̃^(−1)(p | x) and S̃(y | x) = ∫_y^∞ f̃(u | x) du, and finally estimated μp(x) by plugging f̃(y | x) and ν̃p(x) into (8.1), denoted by μ̃p(x). But it is well documented (see, e.g., Fan and Gijbels, 1996) that the NW kernel type procedures have serious drawbacks: the asymptotic bias involves the design density, so they cannot be adaptive, and boundary effects exist, so they require boundary modifications. In particular, boundary effects might cause a serious problem for estimating μp(x), since it is only concerned with the tail probability. The question now is how to provide a better estimate for f(y | x) and νp(x) so that we have a good estimate for μp(x). We address this issue in the next section.
8.4.3 Nonparametric Estimation Procedures
We start with the nonparametric estimators of the conditional density function and its distribution function first, and then turn to discussing the nonparametric estimators of the conditional VaR and ES functions.
There are several methods available for estimating νp(x), f(y | x), and F(y | x) in the literature, such as kernel and nearest-neighbor methods. To name just a few, see Lejeune and Sarda (1988), Truong (1989), Samanta (1989), and Chaudhuri (1991) for iid errors; Roussas (1969) and Roussas (1991) for Markovian processes; and Truong and Stone (1992) and Boente and Fraiman (1995) for mixing sequences. To attenuate the drawbacks of the kernel type estimators mentioned in Section 8.4.2, some new methods have recently been proposed to estimate conditional quantiles. The first, a more direct approach using the check function, such as the robustified local linear smoother, was provided by Fan, Hu, and Truong (1994) and further extended by Yu and Jones (1997, 1998) for iid data; a more general nonparametric setting was explored by Cai and Xu (2005) for time series data. This modeling idea was initiated by Koenker and Bassett (1978) for linear regression quantiles and by Fan, Hu, and Truong (1994) for nonparametric models; see Cai and Xu (2005) and references therein for more discussion of models and applications. An alternative procedure is first to estimate the conditional distribution function by using the double kernel local linear technique of Fan, Yao, and Tong (1996), and then to invert the conditional distribution estimator to produce an estimator of a conditional quantile or CVaR. Yu and Jones (1997, 1998) compared these two methods theoretically and empirically and suggested that the double kernel local linear method would be better.

Estimation of Conditional PDF and CDF

To make a connection between the conditional density (distribution) function and the nonparametric regression problem, note from standard kernel estimation theory (see, e.g., Fan and Gijbels, 1996) that, for a given symmetric density function K(·),

E{Kh0(y − Yt) | Xt = x} = f(y | x) + (h0^2/2) μ2(K) f^(2,0)(y | x) + o(h0^2) ≈ f(y | x), as h0 → 0,   (8.2)

where μ2(K) = ∫ u^2 K(u) du and f^(2,0)(y | x) denotes the second derivative of f(y | x) with respect to y,
if h0 = o(h), where h is the bandwidth used in smoothing in the x direction (see (8.3) below). This approximation ignores the higher order terms O(h0^j) for j ≥ 2, since they are negligible, and ≈ denotes an approximation obtained by ignoring such higher order terms. Therefore, the smoothing in the y direction is not important in the context of this subject, so that, intuitively, it should be under-smoothed. Thus, the left hand side of (8.2), Yt*(y) = Kh0(y − Yt), can be
regarded as a nonparametric regression of the observed variable Yt*(y) on Xt, and the local linear (or polynomial) fitting scheme of Fan and Gijbels (1996) can be applied here. This leads us to consider the following locally weighted least squares regression problem:

Σ_{t=1}^n [Yt*(y) − a − b (Xt − x)]^2 Wh(x − Xt),   (8.3)

where W(·) is a kernel function and h = h(n) > 0 is a bandwidth, with h → 0 and nh → ∞. Note that (8.3) involves two kernels, K(·) and W(·); this is the reason for the name "double kernel". Minimizing the locally weighted least squares in (8.3) with respect to a and b, we obtain the locally weighted least squares estimator of f(y | x), denoted by f̂ll(y | x), which is â. From Fan and Gijbels (1996) or Fan, Yao and Tong (1996), f̂ll(y | x) can be re-expressed as

f̂ll(y | x) = Σ_{t=1}^n Wll,t(x, h) Yt*(y),

where, with Sn,j(x) = Σ_{t=1}^n Wh(x − Xt) (Xt − x)^j, the weights {Wll,t(x, h)} are given by

Wll,t(x, h) = [Sn,2(x) − (Xt − x) Sn,1(x)] Wh(x − Xt) / [Sn,0(x) Sn,2(x) − Sn,1(x)^2].

Clearly, {Wll,t(x, h)} satisfy the so-called discrete moment conditions: for 0 ≤ j ≤ 1,

Σ_{t=1}^n Wll,t(x, h) (Xt − x)^j = 1 if j = 0, and 0 otherwise,   (8.4)

based on least squares theory; see (3.12) of Fan and Gijbels (1996, p. 63). Note that the estimator f̂ll(y | x) can range outside [0, ∞). The double kernel local linear estimator of F(y | x) is constructed (see (8) of Yu and Jones (1998)) by integrating f̂ll(y | x):

F̂ll(y | x) = ∫_{−∞}^y f̂ll(u | x) du = Σ_{t=1}^n Wll,t(x, h) Gh0(y − Yt),

where G(·) is the distribution function of K(·) and Gh0(u) = G(u/h0). Clearly, F̂ll(y | x) is continuous and differentiable in y. Note that the differentiability of the estimated distribution function can make the asymptotic analysis much easier for the nonparametric estimators of CVaR and CES (see later).
Although Yu and Jones (1998) showed that the double kernel local linear estimator has some attractive properties, such as no boundary effects, design adaptation, and mathematical efficiency (see, e.g., Fan and Gijbels, 1996), it has the disadvantage of producing conditional distribution function estimators that are not constrained either to lie between zero and one or to be monotone increasing, which is not good for estimating CVaR if the inverting method is used. In both these respects, the NW method is superior, despite its rather large bias and boundary effects. The properties of positivity and monotonicity are particularly advantageous if the method of inverting the conditional distribution estimator is applied to produce the estimator of a conditional quantile or CVaR. To overcome these difficulties, Hall, Wolff, and Yao (1999) and Cai (2002) proposed the WNW estimator based on an empirical likelihood principle, which is designed to possess the superior properties of local linear methods, such as bias reduction and no boundary effects, and to preserve the property that the NW estimator is always a distribution function, although it might require more computational effort since it requires estimating and optimizing additional weights aimed at the bias correction. Cai (2002) discussed the asymptotic properties of the WNW estimator at both interior and boundary points for mixing time series under some regularity assumptions and showed that the WNW estimator performs better than its competitors; see Cai (2002) for details. Recently, Cosma, Scaillet and von Sachs (2006) proposed a shape preserving estimation method for cumulative distribution functions and probability density functions, using the wavelet methodology for multivariate dependent data, and then used it to estimate a conditional quantile or CVaR.

The WNW estimator of the conditional distribution F(y | x) of Yt given Xt = x is defined by

F̂c1(y | x) = Σ_{t=1}^n Wc,t(x, h) I(Yt ≤ y),   (8.5)

where I(·) denotes the indicator function and the weights {Wc,t(x, h)} are given by

Wc,t(x, h) = pt(x) Wh(x − Xt) / Σ_{t=1}^n pt(x) Wh(x − Xt),   (8.6)

and {pt(x)} is chosen as pt(x) = n^(−1) {1 + λ (Xt − x) Wh(x − Xt)}^(−1) ≥ 0, with λ, a function of the data and x, uniquely defined by maximizing the logarithm of the empirical likelihood

Ln(λ) = − Σ_{t=1}^n log{1 + λ (Xt − x) Wh(x − Xt)},

so that the weights satisfy the discrete moment conditions

Σ_{t=1}^n Wc,t(x, h) (Xt − x)^j = 1 if j = 0, and 0 otherwise,   (8.7)

for 0 ≤ j ≤ 1; see Cai (2002) for details on this aspect. In implementation, Cai (2002) recommended using the Newton-Raphson scheme to find the root of the equation L′n(λ) = 0. Note that 0 ≤ F̂c1(y | x) ≤ 1 and it is monotone in y, but F̂c1(y | x) is not continuous in y and, of course, not differentiable in y either. Note that, in the regression setting, Cai (2001) provided a comparison of the local linear estimator and the WNW estimator and discussed the asymptotic minimax efficiency of the WNW estimator.

To accommodate all the nice properties (monotonicity, continuity, differentiability, and lying between zero and one) and the attractive asymptotic properties (design adaptation, avoidance of boundary effects, and mathematical efficiency; see Cai (2002) for detailed discussions) of both estimators F̂ll(y | x) and F̂c1(y | x) under a unified framework, we propose the following nonparametric estimators of the conditional density function f(y | x) and the conditional distribution function F(y | x), termed weighted double kernel local linear estimators:

f̂c(y | x) = Σ_{t=1}^n Wc,t(x, h) Kh0(y − Yt)  and  F̂c(y | x) = ∫_{−∞}^y f̂c(u | x) du = Σ_{t=1}^n Wc,t(x, h) Gh0(y − Yt).   (8.8)
Note that if pt(x) in (8.6) is constant for all t, or λ = 0, then f̂c(y | x) becomes the classical NW type double kernel estimator used by Scaillet (2005); however, Scaillet (2005) adopted a single bandwidth for smoothing in both the y and x directions. Clearly, f̂c(y | x) is a probability density function, so that F̂c(y | x) is a cumulative distribution function (monotone, 0 ≤ F̂c(y | x) ≤ 1, F̂c(−∞ | x) = 0, and F̂c(∞ | x) = 1). Also, F̂c(y | x) is continuous and differentiable in y. Further, as expected, it will be shown that, like F̂c1(y | x), F̂c(y | x) has the attractive properties such as no boundary effects, design adaptation, and mathematical efficiency.

Estimation of Conditional VaR and ES

We now are ready to formulate the nonparametric estimators for νp(x) and μp(x). To this end, from (8.8), νp(x) is estimated by inverting the estimated conditional survival distribution
Ŝc(y | x) = 1 − F̂c(y | x); the resulting estimator is denoted by ν̂p(x) and defined as ν̂p(x) = Ŝc^(−1)(p | x). Note that ν̂p(x) always exists, since Ŝc(y | x) is a survival function itself. Plugging ν̂p(x) and f̂c(y | x) into (8.1), we obtain the nonparametric estimator of μp(x):

μ̂p(x) = p^(−1) ∫_{ν̂p(x)}^∞ y f̂c(y | x) dy = p^(−1) Σ_{t=1}^n Wc,t(x, h) ∫_{ν̂p(x)}^∞ y Kh0(y − Yt) dy.
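To make the construction concrete, the following rough R sketch (not the authors' code) computes the WNW weights of (8.6)-(8.7) by Newton-Raphson, the smoothed conditional CDF of (8.8) with Gaussian kernels for W and K, and then inverts it numerically for ν̂p(x); the bandwidths are illustrative and no convergence safeguards are included:

wnw.weights=function(X,x0,h) {
  Wt=dnorm((x0-X)/h)/h               # Wh(x0 - Xt)
  u=(X-x0)*Wt
  lambda=0
  for (it in 1:100) {                # Newton-Raphson for L_n'(lambda) = 0
    g=sum(u/(1+lambda*u))
    gp=-sum(u^2/(1+lambda*u)^2)
    step=g/gp
    lambda=lambda-step
    if (abs(step)<1e-10) break
  }
  p=1/(1+lambda*u)                   # proportional to pt(x0)
  w=p*Wt
  w/sum(w)                           # the weights Wc,t(x0, h)
}
cvar.wdkll=function(Y,X,x0,p=0.05,h=0.5,h0=0.2) {
  w=wnw.weights(X,x0,h)
  Fc=function(y) sum(w*pnorm((y-Y)/h0))    # conditional CDF of (8.8)
  uniroot(function(y) 1-Fc(y)-p,
    interval=range(Y)+c(-5*h0,5*h0))$root  # solves S_c(y|x0) = p
}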
8.4.4 Real Examples
Example 1. We now illustrate the proposed methodology by considering a real data set on Dow Jones Industrials (DJI) index returns. We took a sample of 1801 daily prices of the DJI index, from November 3, 1998 to January 3, 2006, and computed the daily returns as 100 times the difference of the log prices. Let Yt be the daily negative log return (log loss) of the DJI and Xt be the first lagged value of Yt. The estimators proposed in this paper are used to estimate the 5% CVaR and CES functions. The estimation results are shown in Figure 8.1: the 5% CVaR estimate in Figure 8.1(a) and the 5% CES estimate in Figure 8.1(b). Both the CVaR and CES estimates exhibit a U-shape, which corresponds to the so-called volatility smile. Therefore, the risk tends to be lower when the lagged log loss of the DJI is close to its empirical average, and larger otherwise. We can also observe that the curves are asymmetric. This may indicate that the DJI is more likely to fall if there was a loss within the last day than if there was a positive return of the same size.

Example 2. We apply the proposed methods to estimate the conditional value-at-risk and expected shortfall of International Business Machines Co. (NYSE: IBM) security returns. The data are daily prices recorded from March 1, 1996 to April 6, 2005. We use the same method to calculate the daily returns as in Example 1. In order to estimate the value-at-risk of a stock return, generally, the information set Xt may contain a market index of corresponding capitalization and type, the industry index, and the lagged values of the stock return. For this example, Yt is the log loss of IBM stock returns and only two variables are chosen as the information set, for the sake of simplicity. Let Xt1 be the first lagged value of Yt and let Xt2 denote the first lagged daily log loss of the Dow Jones Industrials (DJI) index.
Figure 8.1: (a) 5% CVaR estimate for the DJI index. (b) 5% CES estimate for the DJI index.

Our main results from the estimation of the model are summarized in Figure 8.2. The surfaces of the estimators for the IBM returns are given in Figure 8.2(a) for CVaR and in Figure 8.2(b) for CES. For visual convenience, Figures 8.2(c) and (e) depict the estimated CVaR and CES curves (as functions of Xt2) for three different values of Xt1 = (−0.275, 0.025, 0.325), and Figures 8.2(d) and (f) display the estimated CVaR and CES curves (as functions of Xt1) for three different values of Xt2 = (−0.225, 0.025, 0.425). From Figures 8.2(c)-(f), we can observe that most of these curves are U-shaped. This is consistent with the results observed in Example 1. Also, we can see that the three curves in each figure are not parallel. This implies that the effects of the lagged IBM and lagged DJI variables on the risk of IBM are different and complex. To be concrete, let us examine Figure 8.2(d). The three curves are close to each other when the lagged IBM log loss is around 0.2, and far apart otherwise. This implies that the DJI has fewer effects (less information) on CVaR around this value; otherwise, the DJI has more effect when the lagged IBM log loss is far from this value.
8.5 References
Acerbi, C. and D. Tasche (2002). On the coherence of expected shortfall. Journal of Banking and Finance, 26, 1487-1503.
Figure 8.2: (a) 5% CVaR estimates for IBM stock returns. (b) 5% CES estimates for IBM stock returns. (c) 5% CVaR estimates for three different values of the lagged negative IBM returns (−0.275, 0.025, 0.325). (d) 5% CVaR estimates for three different values of the lagged negative DJI returns (−0.225, 0.025, 0.425). (e) 5% CES estimates for three different values of the lagged negative IBM returns (−0.275, 0.025, 0.325). (f) 5% CES estimates for three different values of the lagged negative DJI returns (−0.225, 0.025, 0.425).
Artzner, P., F. Delbaen, J.M. Eber, and D. Heath (1999). Coherent measures of risk. Mathematical Finance, 9, 203-228.
Boente, G. and R. Fraiman (1995). Asymptotic distribution of smoothers based on local means and local medians under dependence. Journal of Multivariate Analysis, 54, 77-90.
Cai, Z. (2001). Weighted Nadaraya-Watson regression estimation. Statistics and Probability Letters, 51, 307-318.
Cai, Z. (2002). Regression quantiles for time series data. Econometric Theory, 18, 169-192.
Cai, Z. (2006). Trending time varying coefficient time series models with serially correlated errors. Journal of Econometrics, 136, xxx-xxx.
Cai, Z. and L. Qian (2000). Local estimation of a biometric function with covariate effects. In Asymptotics in Statistics and Probability (M. Puri, ed.), 47-70.
Cai, Z. and R.C. Tiwari (2000). Application of a local linear autoregressive model to BOD time series. Environmetrics, 11, 341-350.
Cai, Z. and X. Xu (2005). Nonparametric quantile estimations for dynamic smooth coefficient models. Working Paper, Department of Mathematics and Statistics, University of North Carolina at Charlotte. Revised for Journal of the American Statistical Association.
Carrasco, M. and X. Chen (2002). Mixing and moments properties of various GARCH and stochastic volatility models. Econometric Theory, 18, 17-39.
Chaudhuri, P. (1991). Nonparametric estimates of regression quantiles and their local Bahadur representation. The Annals of Statistics, 19, 760-777.
Chen, S.X. (2006). Nonparametric estimation of expected shortfall. Working Paper, Department of Statistics, Iowa State University.
Chen, S.X. and C.Y. Tang (2005). Nonparametric inference of value at risk for dependent financial returns. Journal of Financial Econometrics, 3, 227-255.
Chernozhukov, V. and L. Umanstev (2001). Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics, 26, 271-292.
Cosma, A., O. Scaillet and R. von Sachs (2006). Multivariate wavelet-based shape preserving estimation for dependent observations. Bernoulli, in press.
Crouhy, M., D. Galai and R. Mark (2001). Risk Management. McGraw-Hill, New York.
Duffie, D. and J. Pan (1997). An overview of value at risk. Journal of Derivatives, 4, 7-49.
Duffie, D. and K.J. Singleton (2003). Credit Risk: Pricing, Measurement, and Management. Princeton: Princeton University Press.
Eberlein, E. and U. Keller (1995). Hyperbolic distributions in finance. Bernoulli, 1, 281-299.
Embrechts, P., C. Kluppelberg, and T. Mikosch (1997). Modeling Extremal Events for Finance and Insurance. New York: Springer-Verlag.
Engle, R.F. and S. Manganelli (2004). CAViaR: Conditional autoregressive value at risk by regression quantile. Journal of Business and Economic Statistics, 22, 367-381.
Fan, J. and I. Gijbels (1996). Local Polynomial Modeling and Its Applications. London: Chapman and Hall.
Fan, J. and J. Gu (2003). Semiparametric estimation of value-at-risk. Econometrics Journal, 6, 261-290.
Fan, J., T.-C. Hu and Y.K. Truong (1994). Robust nonparametric function estimation. Scandinavian Journal of Statistics, 21, 433-446.
Fan, J., Q. Yao, and H. Tong (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, 189-206.
Frey, R. and A.J. McNeil (2002). VaR and expected shortfall in portfolios of dependent credit risks: Conceptual and practical insights. Journal of Banking and Finance, 26, 1317-1334.
Hall, P., R.C.L. Wolff, and Q. Yao (1999). Methods for estimating a conditional distribution function. Journal of the American Statistical Association, 94, 154-163.
Hürlimann, W. (2003). A Gaussian exponential approximation to some compound Poisson distributions. ASTIN Bulletin, 33, 41-55.
Ibragimov, I.A. and Yu. V. Linnik (1971). Independent and Stationary Sequences of Random Variables. Groningen, The Netherlands: Walters-Noordhoff.
Jorion, P. (2001). Value at Risk, 2nd Edition. New York: McGraw-Hill.
Jorion, P. (2003). Financial Risk Manager Handbook, 2nd Edition. New York: John Wiley.
Koenker, R. and G.W. Bassett (1978). Regression quantiles. Econometrica, 46, 33-50.
Lejeune, M.G. and P. Sarda (1988). Quantile regression: A nonparametric approach. Computational Statistics and Data Analysis, 6, 229-281.
Masry, E. and J. Fan (1997). Local polynomial estimation of regression functions for mixing processes. The Scandinavian Journal of Statistics, 24, 165-179.
McNeil, A. (1997). Estimating the tails of loss severity distributions using extreme value theory. ASTIN Bulletin, 27, 117-137.
Morgan, J.P. (1996). RiskMetrics - Technical Documents, 4th Edition, New York.
Oakes, D. and T. Dasu (1990). A note on residual life. Biometrika, 77, 409-410.
118
Ptscher, B.M. and I.R. Prucha (1997). Dynamic Nonlinear Econometric Models: Asympo totic Theory. Berlin: Springer-Verlag. Rockafellar, R. and S. Uryasev (2000). Optimization of conditional value-at-risk. Journal of Risk, 2, 21-41. Roussas, G.G. (1969). Nonparametric estimation of the transition distribution function of a Markov process. The Annals of Mathematical Statistics, 40, 1386-1400. Roussas, G.G. (1991). Estimation of transition distribution function and its quantiles in Markov processes: Strong consistency and asymptotic normality. In G.G. Roussas (ed.), Nonparametric Functional Estimation and related Topics, pp. 443-462. Amsterdam: Kluwer Academic. Samanta, M. (1989). Nonparametric estimation of conditional quantiles. Statistics and Probability Letters, 7, 407-412. Scaillet, O. (2004). Nonparametric estimation and sensitivity analysis of expected shortfall. Mathematical Finance, 14, 115-129. Scaillet, O. (2005). Nonparametric estimation of conditional expected shortfall. Revue Assurances et Gestion des Risques/Insurance and Risk Management Journal, 74, 639660. Troung, Y.K. (1989). Asymptotic properties of kernel estimators based on local median. The Annals of Statistics, 17, 606-617. Troung, Y.K. and C.J. Stone (1992) Nonparametric function estimation involving time series. The Annals of Statistics 20, 77-97. Tsay, R.S. (2005). Analysis of Financial Time Series, 2th Edition. John Wiley & Sons, New York. Yu, K. and M.C. Jones (1997). A comparison of local constant and local linear regression quantile estimation. Computational Statistics and Data Analysis, 25, 159-166. Yu, K. and M.C. Jones (1998). Local linear quantile regression. Journal of the American Statistical Association, 93, 228-237.
Chapter 9
Long Memory Models and Structural Changes
9.1 Long Memory Models
9.1.1 Fractionally Differenced Process
We have discussed that for a stationary time series the ACF decays exponentially to zero as the lag increases. Yet for a unit root nonstationary time series, it can be shown that the sample ACF converges to 1 for all fixed lags as the sample size increases; see Chan and Wei (1988) and Tiao and Tsay (1983). There exist some time series whose ACF decays slowly to zero at a polynomial rate as the lag increases. These processes are referred to as long memory or long range dependent time series. One such example is the fractionally differenced process defined by
$$(1 - L)^d x_t = w_t, \qquad |d| < 0.5, \qquad (9.1)$$
where $\{w_t\}$ is a white noise series and $d$ is called the long memory parameter; $H = d + 1/2$ is called the Hurst parameter, see Hurst (1951). Properties of model (9.1) have been widely studied in the literature (e.g., Beran, 1994). We summarize some of these properties below.
1. If $d < 0.5$, then $x_t$ is a weakly stationary process and has the infinite MA representation
$$x_t = w_t + \sum_{k=1}^{\infty} \psi_k\, w_{t-k}, \qquad \psi_k = \binom{k+d-1}{k}.$$
2. If $d > -0.5$, then $x_t$ is invertible and has the infinite AR representation
$$w_t = x_t + \sum_{k=1}^{\infty} \pi_k\, x_{t-k}, \qquad \pi_k = \binom{k-d-1}{k}.$$
3. For $|d| < 0.5$, the ACF of $x_t$ is
$$\rho_x(h) = \frac{d(1+d)\cdots(h-1+d)}{(1-d)(2-d)\cdots(h-d)}, \qquad h \ge 1;$$
in particular, $\rho_x(h) \approx \dfrac{(-d)!}{(d-1)!}\, h^{2d-1}$ for large $h$.
4. For $|d| < 0.5$, the PACF of $x_t$ is $\phi_{h,h} = d/(h - d)$ for $h \ge 1$.
5. For $|d| < 0.5$, the spectral density function $f_x(\omega)$ of $x_t$, which is the Fourier transform of the ACF $\rho_x(h)$ of $x_t$, that is,
$$f_x(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \rho_x(h) \exp(-i\, h\, \omega),$$
where $i = \sqrt{-1}$, satisfies
$$f_x(\omega) \sim \omega^{-2d} \quad \text{as } \omega \to 0, \qquad (9.2)$$
where $\omega \in [0, 1]$ denotes the frequency.
See the books by Hamilton (1994) and Brockwell and Davis (1991) for details about spectral analysis. The basic idea and properties of the spectral density and its estimation are discussed in the next section. Of particular interest here is the behavior of the ACF of $x_t$ when $d < 0.5$. Property 3 says that $\rho_x(h) \sim h^{2d-1}$, which decays at a polynomial, instead of an exponential, rate.
For this reason, such an $x_t$ process is called a long-memory time series. A special characteristic of the spectral density function in (9.2) is that the spectrum diverges to infinity as $\omega \to 0$. However, the spectral density function of a stationary ARMA process is bounded for all $\omega$. Earlier we used the binomial theorem for non-integer powers:
$$(1 - L)^d = \sum_{k=0}^{\infty} (-1)^k \binom{d}{k} L^k.$$
If the fractionally differenced series $(1 - L)^d x_t$ follows an ARMA($p, q$) model, then $x_t$ is called a fractionally differenced autoregressive moving average (ARFIMA($p, d, q$)) process, which generalizes the ARIMA model by allowing for a non-integer $d$. In practice, if the sample ACF of a time series is not large in magnitude but decays slowly, then the series may have long memory. For more discussions, we refer to the book by Beran (1994). For the pure fractionally differenced model in (9.1), one can estimate $d$ using either a maximum likelihood method in the time domain (by assuming that the distribution is known), the approximate Whittle likelihood (see below), or a regression method with the logged periodogram at the lower frequencies (using (9.2)) in the frequency domain. Finally, long-memory models have attracted some attention in the finance literature, in part because of the work on fractional Brownian motion in continuous time models.
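To make the frequency-domain regression idea concrete, the following is a minimal sketch of a log-periodogram regression estimate of $d$ based on (9.2); the function name gph and the bandwidth choice $m = \lfloor\sqrt{n}\rfloor$ are illustrative assumptions, not part of the methods formally developed in these notes.

# A minimal sketch of a log-periodogram regression estimate of d, using the
# low-frequency behavior in (9.2): log f(omega) is roughly c - 2d log(omega).
gph = function(x, m = floor(sqrt(length(x)))) {
  n = length(x)
  P = (Mod(fft(x - mean(x)))^2/n)[2:(m+1)]   # periodogram at omega_k = k/n
  omega = (1:m)/n
  fit = lm(log(P) ~ log(omega))              # slope is approximately -2d
  as.numeric(-coef(fit)[2]/2)
}

Applied to the absolute return series of Example 9.1 below, such an estimate can be compared with the maximum likelihood estimates reported there.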
9.1.2 Spectral Density
We define the basic statistics used for detecting periodicities in time series and for determining whether different time series are related at certain frequencies. The power spectrum is a measure of the power or variance of a time series at a particular frequency $\omega$; the function that displays the power or variance at each frequency $\omega$, say $f_x(\omega)$, is called the power spectral density. Although the function $f_x(\omega)$ is the Fourier transform of the autocovariance function $\gamma_x(h)$, we shall not make much use of this theoretical result in our generally applied discussion. For the theoretical results, please read the book by Brockwell and Davis (1991, Section 10.3).
The discrete Fourier transform of a sampled time series $\{x_t\}_{t=0}^{n-1}$ is defined as
$$X(\omega_k) = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} x_t \exp\{-2\pi i\, \omega_k\, t\}, \qquad (9.3)$$
where $\omega_k = k/n$, $k = 0, \ldots, n-1$, defines the set of frequencies over which (9.3) is computed. This means that the frequencies over which (9.3) is evaluated are of the form $\omega = 0, 1/n, 2/n, \ldots, (n-1)/n$. The evaluation of (9.3) proceeds using the fast Fourier transform (FFT) and usually assumes that the length of the series, $n$, is some power of 2. If the series length is not a power of 2, the series can be padded by adding zeros so that the extended series corresponds to the next highest power of two. One might want to do this anyway if frequencies of the form $\omega_k = k/n$ are not close enough to the frequencies of interest. Since the values of (9.3) have a real and an imaginary part at every frequency and can be positive or negative, it is conventional to calculate first the squared magnitude of (9.3), which is called the periodogram. The periodogram is defined as
$$P_x(\omega_k) = |X(\omega_k)|^2 \qquad (9.4)$$
and is just the sum of the squares of the sine and cosine transforms of the series. If we consider the power spectrum $f_x(\omega_k)$ as the quantity to estimate, it is approximately true¹ (see the books by Hamilton (1994) and Brockwell and Davis (1991, Section 10.3)) that
$$E[P(\omega_k)] = E\,|X(\omega_k)|^2 \approx f_x(\omega_k), \qquad (9.5)$$
which implies that the periodogram is an approximately unbiased estimator of the power spectrum at frequency $\omega_k$. One may also show that the approximate covariance between two frequencies $\omega_k$ and $\omega_l$ is zero for $k \ne l$ when both frequencies are multiples of $1/n$. One can also show that the sine and cosine parts of (9.3) are uncorrelated and approximately normally distributed with equal variances $f_x(\omega_k)/2$. This implies that the squared standardized real and imaginary components have chi-squared distributions with 1 degree of freedom each. The periodogram is the sum of two such squared variables and must then have a chi-squared distribution with 2 degrees of freedom. In other words, the approximate distribution of $P(\omega_k)$ is exponential with parameter $\lambda_k = f(\omega_k)$. The Whittle likelihood is based on the approximate exponential distribution of $\{P(\omega_k)\}_{k=0}^{n-1}$.
¹For Ph.D. students, I encourage you to read the related books to understand the details of this aspect, which is important for your future research.
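To illustrate, a minimal sketch of the Whittle criterion for the pure fractional model follows; the crude spectral shape $f(\omega) \propto \omega^{-2d}$ taken from (9.2) and the profiled-out scale parameter are simplifying assumptions made only for this sketch.

# Negative Whittle log-likelihood implied by the approximate exponential
# distribution of the periodogram ordinates, with the spectral density
# modeled as a constant times omega^(-2d); the constant is profiled out.
whittle.neglik = function(d, x) {
  n = length(x)
  k = 1:floor((n-1)/2)
  P = (Mod(fft(x - mean(x)))^2/n)[k+1]   # periodogram ordinates
  g = (k/n)^(-2*d)                       # crude spectral shape near 0
  s = mean(P/g)                          # profiled scale parameter
  sum(log(s*g) + P/(s*g))
}
# d.hat = optimize(whittle.neglik, c(0, 0.49), x = x)$minimum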
It should be noted that trends introduce apparent long-range periodicities which inflate the spectrum at low frequencies (small values of $\omega_k$), since the periodogram treats the trend as part of one very long cycle. This behavior will obscure possibly important components at higher frequencies. For this reason (as with the auto- and cross-correlations) it is important to work in (9.3) with the transform of the mean-adjusted $(x_t - \bar{x})$ or detrended $(x_t - \hat{a} - \hat{b}\, t)$ series. A further modification that sometimes improves the approximation is tapering, in which one works with $a_t (x_t - \bar{x})$, where $a_t$ is a function, often a cosine bell, that is maximum at the midpoint of the series and decays away at the extremes. Tapering makes a difference in engineering applications where there are large variations in the value of the spectral density $f_x(\omega)$. As a final comment, we note that the fast Fourier transform algorithms for computing (9.3) generally work best when the sample length is some power of two. For this reason, it is conventional to extend the series by setting $x_t = 0$ for $t = n, \ldots, n' - 1$, where $n'$ is the next highest power of two. If this is done, the frequency ordinates $\omega_k = k/n$ are replaced by $\omega_k = k/n'$ and we proceed as before.
Theorem 10.3.2 in Brockwell and Davis (1991, p. 347) shows that $P(\omega_k)$ is not a consistent estimator of $f_x(\omega)$. Since, for large $n$, the periodogram ordinates are approximately uncorrelated, with variances changing only slightly over small frequency intervals, we might hope to construct a consistent estimator of $f_x(\omega)$ by averaging the periodogram ordinates in a small neighborhood of $\omega$; this is called a smoothing estimator. Such a smoothing estimator reduces the variability of the raw periodogram estimator. This procedure leads to a spectral estimator of the form
$$\hat{f}_x(\omega_k) = \frac{1}{2L+1} \sum_{l=-L}^{L} P_x(\omega_k + l/n) = \sum_{l=-L}^{L} w_l\, P_x(\omega_k + l/n), \qquad (9.6)$$
where the periodogram is smoothed over $2L + 1$ frequencies. The width of the interval over which the frequency average is taken is called the bandwidth. Since there are $2L + 1$ frequencies of width $1/n$, the bandwidth $B$ in this case is approximately $B = (2L+1)/n$. The smoothing procedure improves the quality of the estimator of the spectrum since it is now the average of $2L + 1$ random variables, each having a chi-squared distribution with 2 degrees of freedom. The distribution of the smoothed estimator then has $df = 2(2L + 1)$ degrees of freedom. If the series is adjusted to the next highest power of 2, say $n'$, the adjusted degrees of freedom for the estimator will be $df = 2(2L + 1)n/n'$ and the new bandwidth will be $B = (2L + 1)/n'$. Clearly, in (9.6), $w_l = 1/(2L + 1)$, which is called the Daniell window (the corresponding
spectral window is given by the Daniell kernel). One popular approach is to take the weights to be $w_l = W(\omega_l)/n$ for a window function $W(\cdot)$. If $W(x) = r_n^{-1} \sin^2(r_n x/2)/\sin^2(x/2)$ with $r_n = n/(2L)$, it is the well-known Bartlett or triangular window (the corresponding spectral window is given by the Fejér kernel). If $W(x) = \sin((r_n + 0.5)\, x)/\sin(x/2)$, it is the rectangular window (the corresponding spectral window is given by the Dirichlet kernel). See Brockwell and Davis (1991, Section 10.4) for more discussions.
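In R, smoothing of this kind is performed by the function spec.pgram() discussed in the next subsection; a short sketch follows, where the simulated AR(1) series is made up purely for illustration.

# Smoothing the raw periodogram over 2L+1 ordinates as in (9.6).
set.seed(1)
x = arima.sim(list(ar = 0.6), n = 1024)
raw = spec.pgram(x, taper = 0, plot = FALSE)             # raw periodogram
sm  = spec.pgram(x, spans = 9, taper = 0, plot = FALSE)  # modified Daniell
                                                         # smoother of width 9,
                                                         # roughly L = 4

Increasing spans widens the bandwidth $B = (2L+1)/n$ and lowers the variance of the estimator, at the cost of more smoothing bias.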
9.1.3 Applications
The usage of the function fracdiff() is

fracdiff(x, nar = 0, nma = 0, ar = rep(NA, max(nar, 1)),
         ma = rep(NA, max(nma, 1)), dtol = NULL, drange = c(0, 0.5),
         h, M = 100)

This function can be used to compute the maximum likelihood estimators of the parameters of a fractionally differenced ARIMA(p, d, q) model, together (if possible) with their estimated covariance and correlation matrices and standard errors, as well as the value of the maximized likelihood. The likelihood is approximated using the fast and accurate method of Haslett and Raftery (1989). To generate simulated long-memory time series data from the fractional ARIMA(p, d, q) model, we can use the function fracdiff.sim(), whose usage is

fracdiff.sim(n, ar = NULL, ma = NULL, d, rand.gen = rnorm,
             innov = rand.gen(n+q, ...), n.start = NA,
             allow.0.nstart = FALSE, ..., mu = 0.)

An alternative way to simulate a long memory time series is to use the function arima.sim(). The manual for the package fracdiff can be downloaded from the web site at http://cran.cnr.berkeley.edu/doc/packages/fracdiff.pdf
The function spec.pgram() in R calculates the periodogram using a fast Fourier transform, and optionally smooths the result with a series of modified Daniell smoothers (moving averages giving half weight to the end values). The usage of this function is

spec.pgram(x, spans = NULL, kernel, taper = 0.1, pad = 0,
           fast = TRUE, demean = FALSE, detrend = TRUE,
           plot = TRUE, na.action = na.fail, ...)
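As a quick check that these functions behave as described, one can simulate a pure fractional noise, re-estimate d, and inspect the smoothed log periodogram; the values n = 2000 and d = 0.3 are arbitrary illustrative choices.

library(fracdiff)
set.seed(123)
sim = fracdiff.sim(n = 2000, d = 0.3)          # simulate ARFIMA(0, 0.3, 0)
fit = fracdiff(sim$series, nar = 0, nma = 0)   # re-estimate d
print(fit$d)                                   # should be close to 0.3
sp = spec.pgram(sim$series, spans = c(5,7), demean = TRUE, plot = FALSE)
plot(log(sp$freq), log(sp$spec))               # slope near -2d at low frequencies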
We can also use the function spectrum() to estimate the spectral density of a time series; its usage is

spectrum(x, ..., method = c("pgram", "ar"))

Finally, it is worth pointing out that there is a package called longmemo for long-memory processes, which can be downloaded from http://cran.cnr.berkeley.edu/doc/packages/longmemo.pdf. This package also provides a simple periodogram estimate through the function per() and other functions, such as llplot() and lxplot(), for making graphs of the spectral density. See the manual for details.
Example 9.1: As an illustration, Figure 9.1 shows the sample ACFs of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes from July 3, 1962 to December 31, 1997, and the sample partial autocorrelation functions of the absolute series of daily simple returns for the CRSP value-weighted (left middle panel) and equal-weighted (right middle panel) indexes. The ACFs are relatively small in magnitude but decay very slowly; they appear to be significant at the 5% level even after 300 lags. Only the first few lags of the PACFs lie outside the confidence interval, and the rest are basically within the confidence interval. For more information about the behavior of the sample ACF of absolute return series, see Ding, Granger, and Engle (1993). To estimate the long memory parameter $d$, we can use the function fracdiff() in the package fracdiff in R; the results are $\hat{d} = 0.1867$ for the absolute returns of the value-weighted index and $\hat{d} = 0.2732$ for the absolute returns of the equal-weighted index. To support our conclusion above, we plot the log smoothed spectral density estimates of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes. They show clearly that both log spectral densities decay like a logarithmic function of the frequency, which supports spectral density behavior of the form (9.2).
9.2 Structural Changes
The reading materials are the papers by Ding, Granger, and Engle (1993), Krämer, Sibbertsen and Kleiber (2002), Zeileis, Leisch, Hornik, and Kleiber (2002), and Sibbertsen (2004a).
Figure 9.1: Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. Sample partial autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left middle panel) and equal-weighted (right middle panel) indexes. The log smoothed spectral density estimation of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.
9.2.1 Long Memory versus Structural Change
It is a well known stylized fact that many financial time series, such as squares or absolute values of returns, volatilities, and even returns themselves, behave as if they had long memory; see Ding, Granger and Engle (1993) and Sibbertsen (2004b). On the other hand, it is also well known that long memory is easily confused with structural change, in the sense that the slow decay of empirical autocorrelations which is typical for a time series with long memory is also produced when a short-memory time series exhibits structural breaks. Therefore it is of considerable theoretical and empirical interest to discriminate between these sources of
slowly decaying empirical autocorrelations.
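A small simulated illustration of this confusion follows; the AR(1) pieces and the size of the mean shift are arbitrary choices made only for this sketch.

# A short-memory AR(1) series with a single mean shift can mimic long memory:
set.seed(42)
x = c(arima.sim(list(ar = 0.3), n = 500),
      1 + arima.sim(list(ar = 0.3), n = 500))   # level shift at t = 501
acf(x, lag.max = 200)   # sample ACF decays slowly despite short memory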
A structural break is another type of nonstationarity, which arises when the population regression function changes over the sample period. This may occur because of changes in economic policy, changes in the structure of the economy or industry, or events that change the dynamics of specific industries or firm-related quantities such as inventories, sales, and production. If such changes, called breaks, occur, then regression models that neglect them lead to misleading inference or forecasting. Breaks may result from a discrete change (or changes) in the population regression coefficients at distinct dates or from a gradual evolution of the coefficients over a longer period of time. Discrete breaks may be a result of some major changes in economic policy or in the economy (oil shocks), while gradual breaks, in which population parameters evolve slowly over time, may be a result of the slow evolution of economic policy. The former might be characterized by an indicator function and the latter can be described by a smooth transition function. If a break occurs in the population parameters during the sample, then the OLS estimates over the full sample will estimate a relationship that holds only on average. The question now is how to test for breaks.
9.2.2 Testing for Structural Breaks
Tests for breaks in the regression parameters depend on whether the break date is known or not. If the date of the hypothesized break in the coefficients is known, then the null hypothesis of no break can be tested using a dummy or indicator variable. For example, consider the following model:
$$y_t = \begin{cases} \beta_0 + \beta_1\, y_{t-1} + \delta_1\, x_t + u_t, & \text{if } t \le \tau,\\ (\beta_0 + \gamma_0) + (\beta_1 + \gamma_1)\, y_{t-1} + (\delta_1 + \gamma_2)\, x_t + u_t, & \text{if } t > \tau, \end{cases}$$
where $\tau$ denotes the hypothesized break date. Under the null hypothesis of no break, $H_0: \gamma_0 = \gamma_1 = \gamma_2 = 0$, and the hypothesis of a break can be tested using the F-statistic. This is the Chow test, proposed by Chow (1960), for a break at a known break date. Indeed, the above structural break model can be regarded as a special case of the following trending time series model:
$$y_t = \beta_0(t) + \beta_1(t)\, y_{t-1} + \delta_1(t)\, x_t + u_t.$$
For more discussions on this trending model, see Cai (2007).
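A hedged sketch of the dummy-variable version of the Chow test above follows; the function name chow.test and the variable names are illustrative, and in practice one would use a package such as strucchange (Section 9.3) instead.

# Chow test at a known break date tau via interaction dummies:
# the restricted model imposes gamma_0 = gamma_1 = gamma_2 = 0.
chow.test = function(y, x, tau) {
  n = length(y)
  ylag = c(NA, y[-n])                 # first observation is lost to the lag
  D = as.numeric(seq_len(n) > tau)    # post-break indicator
  unrestricted = lm(y ~ ylag + x + D + D:ylag + D:x)
  restricted   = lm(y ~ ylag + x)
  anova(restricted, unrestricted)     # F-test of no break
}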
If there is a distinct break in the regression function, the date at which the largest Chow statistic occurs is an estimator of the break date. If there are more variables or more lags, this test can be extended by constructing binary interaction variables for all the dependent variables. This approach can be modified to check for a break in a subset of the coefficients. The break date is unknown in most applications, but you may suspect that a break occurred sometime between two dates, $\tau_0$ and $\tau_1$. The Chow test can be modified to handle this by testing for a break at all possible dates $\tau$ between $\tau_0$ and $\tau_1$, and then using the largest of the resulting F-statistics to test for a break at an unknown date. This modified test is often called the Quandt likelihood ratio (QLR) statistic, or the supWald or supF statistic:
$$\text{supF} = \max\{F(\tau_0), F(\tau_0 + 1), \ldots, F(\tau_1)\}.$$
Since the supF statistic is the largest of many F-statistics, its distribution is not the same as that of an individual F-statistic. The critical values for the supF statistic must be obtained from a special distribution. This distribution depends on the number of restrictions being tested, $m$, on $\tau_0$ and $\tau_1$, and on the subsample over which the F-statistics are computed, expressed as a fraction of the total sample size. Other types of F-tests are the average F and exponential F statistics, given by
$$\text{aveF} = \frac{1}{\tau_1 - \tau_0 + 1} \sum_{j=\tau_0}^{\tau_1} F_j \qquad \text{and} \qquad \text{expF} = \log\left\{\frac{1}{\tau_1 - \tau_0 + 1} \sum_{j=\tau_0}^{\tau_1} \exp(F_j/2)\right\}.$$
For details on modified F-tests, see the papers by Hansen (1992) and Andrews (1993). For the large-sample approximation to the distribution of the supF statistic to be a good one, the subsample endpoints $\tau_0$ and $\tau_1$ cannot be too close to the ends of the sample. That is why the supF statistic is computed over a trimmed subset of the sample. A popular choice is to use 15% trimming, that is, to set $\tau_0 = 0.15T$ and $\tau_1 = 0.85T$. With 15% trimming, the F-statistic is computed for break dates in the central 70% of the sample. Table 9.1 presents the critical values for the supF statistic computed with 15% trimming. This table is from Stock and Watson (2003), and you should check the book for a complete table.
Table 9.1: Critical Values of the QLR Statistic with 15% Trimming

Number of restrictions (m)    10%     5%      1%
 1                            7.12    8.68    12.16
 2                            5.00    5.86     7.78
 3                            4.09    4.71     6.02
 4                            3.59    4.09     5.12
 5                            3.26    3.66     4.53
 6                            3.02    3.37     4.12
 7                            2.84    3.15     3.82
 8                            2.69    2.98     3.57
 9                            2.58    2.84     3.38
10                            2.48    2.71     3.23
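These statistics are implemented in the strucchange package used in Section 9.3; a brief sketch with 15% trimming follows, where the constant-mean model for the Nile data of Example 9.2 is used only as an illustration.

library(strucchange)
fs = Fstats(Nile ~ 1, from = 0.15, to = 0.85)  # F statistics, 15% trimming
sctest(fs, type = "supF")
sctest(fs, type = "aveF")
sctest(fs, type = "expF")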
The supF test can detect a single break, multiple discrete breaks, and a slow evolution of the regression parameters. Three classes of structural change tests (or tests for parameter instability), which have received much attention in both the statistics and econometrics communities but have been developed in rather loosely connected lines of research, are unified by embedding them into the framework of generalized M-fluctuation tests (Zeileis and Hornik, 2003). These classes are tests based on maximum likelihood scores (including the Nyblom-Hansen test), on F statistics (supF, aveF, expF tests), and on OLS residuals (OLS-based CUSUM and MOSUM tests; see Chu, Hornik and Kuan (1995)); the latter is a special case of the so-called empirical fluctuation process, termed efp in Zeileis, Leisch, Hornik, and Kleiber (2002) and Zeileis and Hornik (2003). Zeileis (2005) showed that representatives from these classes are special cases of the generalized M-fluctuation tests, based on the same functional central limit theorem but employing different functionals for capturing excessive fluctuations. After embedding these tests into the same framework, and thus understanding the relationship between these procedures for testing in historical samples, it is shown how the tests can also be extended to a monitoring situation. This is achieved by establishing a general M-fluctuation monitoring procedure and then applying the different functionals corresponding to monitoring with ML scores, F statistics, and OLS residuals. In particular, an extension of the supF test to a monitoring scenario is suggested and illustrated on a real-world data set. In R, there are two packages, strucchange, developed by Zeileis, Leisch, Hornik, and
Kleiber (2002), and segmented, which provide several methods for testing for breaks. They can be downloaded at http://cran.cnr.berkeley.edu/doc/packages/strucchange.pdf and http://cran.cnr.berkeley.edu/doc/packages/segmented.pdf, respectively.
Example 9.2: We use the data set built into R for the minimal water flow of the Nile River, used when planning the Aswan Dam; see Hurst (1951): yearly data from 1871 to 1970 with 100 observations, under the data name Nile. Alternatively, we might use the data set built into the package longmemo in R: yearly data with 663 observations from 622 to 1284, under the data name NileMin. Here we use R to analyze the data set Nile. As a result, the p-value for testing for a break is very small (see the computer output), so that $H_0$ is rejected. The details are summarized in Figures 9.2(a)-(b). It is clear from Figure 9.2(b) that there is one breakpoint for the Nile River data: the annual flows drop in 1898 because the first Aswan dam was built. To test the null hypothesis that the annual flow remains constant over the years, we also compute the OLS-based CUSUM process and plot it with standard linear and alternative boundaries in Figures 9.2(c)-(d).
Figure 9.2: Break testing results for the Nile River data: (a) Plot of F-statistics. (b) The scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
Example 9.3: We build a time series model for the real oil price (quarterly) listed in the tenth column of the file ex9-3.csv, ranging from the first quarter of 1959 to the third quarter of 2002. Before we build such a time series model, we want to see whether there is any structural change in the oil price. As a result, the p-value for testing for a break is very small (see the computer output), so that $H_0$ is rejected. The details are summarized in Figure 9.3. It is clear from Figure 9.3(b) that there is one breakpoint for the oil price. We also compute the OLS-based CUSUM process and plot it with standard linear and alternative boundaries in Figures 9.3(c)-(d).
Figure 9.3: Break testing results for the oil price data: (a) Plot of F-statistics. (b) Scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
If we consider the quarterly price level (CPI) listed in the eighth column of the file ex9-3.csv (1959.1-2002.3), we want to see whether there is any structural change in the quarterly consumer price index. The details are summarized in Figure 9.4. It is clear from Figure 9.4(b) that there would be one breakpoint for the consumer price index. We also compute the OLS-based CUSUM process and plot it with standard linear and alternative boundaries in Figures 9.4(c)-(d). Please think about the conclusion for the CPI! Do you believe this result? If not, what happened?
Figure 9.4: Break testing results for the consumer price index data: (a) Plot of F-statistics. (b) Scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
Sometimes, you may suspect that a series either has a unit root or is a trend stationary process with a structural break at some unknown period of time, and you would want to test the null hypothesis of a unit root against the alternative of a trend stationary process with a structural break. This is exactly the hypothesis testing procedure proposed by Zivot and Andrews (1992). In this testing procedure, the null hypothesis is a unit root process without any structural breaks, and the alternative hypothesis is a trend stationary process with a possible structural change occurring at an unknown point in time. Zivot and Andrews (1992) suggested estimating the following regression:
$$x_t = \begin{cases} \mu + \beta\, t + \alpha\, x_{t-1} + \sum_{i=1}^{k} c_i\, \Delta x_{t-i} + e_t, & \text{if } t \le T_B,\\ (\mu + \theta) + [\beta\, t + \gamma\,(t - T_B)] + \alpha\, x_{t-1} + \sum_{i=1}^{k} c_i\, \Delta x_{t-i} + e_t, & \text{if } t > T_B, \end{cases} \qquad (9.7)$$
where $\lambda = T_B/T$ is the break fraction. Model (9.7) is estimated by OLS with the break point ranging over the sample, and the t-statistic for testing $\alpha = 1$ is computed; the minimum t-statistic is reported. The critical values at the 1%, 5% and 10% levels are -5.34, -4.80 and -4.58, respectively. The appropriate number of lags in differences, $k$, is estimated for each value of $\lambda$. Please read the papers by Zivot and Andrews (1992) and Sadorsky (1999) for more details about this method and empirical applications.
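In R, this test is implemented, for example, by the function ur.za() in the urca package (a package not used elsewhere in these notes); a brief sketch follows, where op stands for the oil price series of Example 9.3 and the lag order 4 is an arbitrary choice.

library(urca)
# Zivot-Andrews test: unit root null vs. trend stationarity with one break;
# model = "intercept", "trend", or "both" chooses which part may break.
za = ur.za(op, model = "both", lag = 4)
summary(za)   # reports the minimum t-statistic and the estimated break point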
9.2.3 Trends versus Long Memory
A more general problem than distinguishing long memory from structural breaks is, in a way, the question of whether general trends in the data can cause the Hurst effect. The paper by Bhattacharya et al. (1983) was the first to deal with this problem. They found that adding a deterministic trend to a short memory process can cause spurious long memory, so the problem of modeling long memory time series becomes more complicated. For this issue, we will follow Section 5 of Zeileis (2004a). Note that there is a lot of ongoing research (theoretical and empirical) in this area.
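A quick simulated illustration of the point made by Bhattacharya et al. (1983) follows; the trend slope and sample size are arbitrary choices made only for this sketch.

# Adding a small deterministic trend to white noise produces spurious
# long-memory symptoms: a slowly decaying sample ACF and a positive d-hat.
library(fracdiff)
set.seed(7)
x = 0.002*(1:1000) + rnorm(1000)
acf(x, lag.max = 200)
print(fracdiff(x)$d)   # typically well above 0 although the noise is iid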
9.3 Computer Codes
#####################
# This is Example 9.1
#####################
z1<-matrix(scan("c:/res-teach/xiamen12-06/data/ex9-1.txt"), byrow=T,ncol=5)
vw=abs(z1[,3])
n_vw=length(vw)
ew=abs(z1[,4])
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.1.eps", horizontal=F,width=6,height=6)
par(mfrow=c(3,2),mex=0.4,bg="light green")
acf(vw, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="")
text(200,0.38,"ACF for value-weighted index")
acf(ew, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="")
text(200,0.38,"ACF for equal-weighted index")
pacf(vw, ylab="",xlab="",ylim=c(-0.1,0.3),lag=400,main="")
text(200,0.28,"PACF for value-weighted index")
pacf(ew, ylab="",xlab="",ylim=c(-0.1,0.3),lag=400,main="")
text(200,0.28,"PACF for equal-weighted index")
library(fracdiff)
d1=fracdiff(vw,nar=0,nma=0)   # pure fractional model for each series
d2=fracdiff(ew,nar=0,nma=0)
print(c(d1$d,d2$d))
m1=round(log(n_vw)/log(2)+0.5)
pad1=1-n_vw/2^m1
vw_spec=spec.pgram(vw,spans=c(5,7),demean=T,detrend=T,pad=pad1,plot=F)
ew_spec=spec.pgram(ew,spans=c(5,7),demean=T,detrend=T,pad=pad1,plot=F)
vw_x=vw_spec$freq[1:1000]
vw_y=vw_spec$spec[1:1000]
ew_x=ew_spec$freq[1:1000]
ew_y=ew_spec$spec[1:1000]
scatter.smooth(vw_x,log(vw_y),span=1/15,ylab="",xlab="",col=6,cex=0.7, main="")
text(0.03,-7,"Log Smoothed Spectral Density of VW")
scatter.smooth(ew_x,log(ew_y),span=1/15,ylab="",xlab="",col=7,cex=0.7, main="")
text(0.03,-7,"Log Smoothed Spectral Density of EW")
dev.off()
##########################
# This is for Example 9.2
##########################
graphics.off()
library(strucchange)
library(longmemo)   # not used
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.2.eps", horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light blue")
#if(! "package:stats" %in% search()) library(ts)
## Nile data with one breakpoint: the annual flows drop in 1898
## because the first Aswan dam was built
data(Nile)
## test whether the annual flow remains constant over the years
fs.nile=Fstats(Nile ~ 1)
plot(fs.nile)
print(sctest(fs.nile))
plot(Nile)
lines(breakpoints(fs.nile))
## test the null hypothesis that the annual flow remains constant
## over the years
## compute OLS-based CUSUM process and plot
## with standard and alternative boundaries
ocus.nile=efp(Nile ~ 1, type = "OLS-CUSUM")
plot(ocus.nile)
plot(ocus.nile, alpha = 0.01, alt.boundary = TRUE)
## calculate corresponding test statistic
print(sctest(ocus.nile))
dev.off()
#########################
# This is for Example 9.3
#########################
y=read.csv("c:/res-teach/xiamen12-06/data/ex9-3.csv",header=T,skip=1)
op=y[,10]   # oil price
op=ts(op)
fs.op=Fstats(op ~ 1)   # no lags and covariates
#plot(op,type="l")
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.3.eps", horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
plot(fs.op)
print(sctest(fs.op))
## visualize the breakpoint implied by the argmax of the F statistics
plot(op,type="l")
lines(breakpoints(fs.op))
ocus.op=efp(op ~ 1, type = "OLS-CUSUM")
plot(ocus.op)
plot(ocus.op, alpha = 0.01, alt.boundary = TRUE)
## calculate corresponding test statistic
print(sctest(ocus.op))
dev.off()
cpi=y[,8]
cpi=ts(cpi)
fs.cpi=Fstats(cpi ~ 1)
print(sctest(fs.cpi))
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.4.eps", horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light yellow")
#plot(cpi,type="l")
plot(fs.cpi)
## visualize the breakpoint implied by the argmax of the F statistics
plot(cpi,type="l")
lines(breakpoints(fs.cpi))
ocus.cpi=efp(cpi ~ 1, type = "OLS-CUSUM")
plot(ocus.cpi)
plot(ocus.cpi, alpha = 0.01, alt.boundary = TRUE)
## calculate corresponding test statistic
print(sctest(ocus.cpi))
dev.off()
9.4 References
Andrews, D.W.K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61, 821-856.
Beran, J. (1994). Statistics for Long-Memory Processes. London: Chapman and Hall.
Bhattacharya, R.N., V.K. Gupta and W. Waymire (1983). The Hurst effect under trends. Journal of Applied Probability, 20, 649-662.
Brockwell, P.J. and R.A. Davis (1991). Time Series: Theory and Methods. New York: Springer.
Chan, N.H. and C.Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics, 16, 367-401.
Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591-605.
Chu, C.S.J., K. Hornik and C.M. Kuan (1995). MOSUM tests for parameter constancy. Biometrika, 82, 603-617.
Ding, Z., C.W.J. Granger and R.F. Engle (1993). A long memory property of stock returns and a new model. Journal of Empirical Finance, 1, 83-106.
Granger, C.W.J. (1966). The typical spectral shape of an economic variable. Econometrica, 34, 150-161.
Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Hansen, B.E. (1992). Testing for parameter instability in linear models. Journal of Policy Modeling, 14, 517-533.
Haslett, J. and A.E. Raftery (1989). Space-time modelling with long-memory dependence: Assessing Ireland's wind power resource (with discussion). Applied Statistics, 38, 1-50.
Hsu, C.-C. and C.-M. Kuan (2000). Long memory or structural changes: Testing method and empirical examination. Working paper, Department of Economics, National Central University, Taiwan.
Hurst, H.E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770-799.
Krämer, W., P. Sibbertsen and C. Kleiber (2002). Long memory versus structural change in financial time series. Allgemeines Statistisches Archiv, 86, 83-96.
Nyblom, J. (1989). Testing for the constancy of parameters over time. Journal of the American Statistical Association, 84, 223-230.
Sadorsky, P. (1999). Oil price shocks and stock market activity. Energy Economics, 21, 449-469.
Sibbertsen, P. (2004a). Long memory versus structural breaks: An overview. Statistical Papers, 45, 465-515.
Sibbertsen, P. (2004b). Long memory in volatilities of German stock returns. Empirical Economics, 29, 477-488.
Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.
Tiao, G.C. and R.S. Tsay (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Annals of Statistics, 11, 856-871.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. New York: John Wiley & Sons.
Zeileis, A. (2005). A unified approach to structural change tests based on ML scores, F statistics, and OLS residuals. Econometric Reviews, 24, 445-466.
Zeileis, A. and K. Hornik (2003). Generalized M-fluctuation tests for parameter instability. Report 80, SFB Adaptive Information Systems and Modelling in Economics and Management Science. URL http://www.wu-wien.ac.at/am/reports.htm#80.
Zeileis, A., F. Leisch, K. Hornik and C. Kleiber (2002). strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7, Issue 2, 1-38.
Zivot, E. and D.W.K. Andrews (1992). Further evidence on the great crash, the oil price shock and the unit root hypothesis. Journal of Business and Economic Statistics, 10, 251-270.