
Time Series: A Data Analysis Approach Using R
CHAPMAN & HALL/CRC

Texts in Statistical Science Series

Joseph K. Blitzstein, Harvard University, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada

Recently Published Titles

Extending the Linear Model with R
Generalized Linear, Mixed Effects and Nonparametric Regression Models, Second Edition
J.J. Faraway

Modeling and Analysis of Stochastic Systems, Third Edition
V.G. Kulkarni

Pragmatics of Uncertainty
J.B. Kadane

Stochastic Processes
From Applications to Theory
P. Del Moral and S. Penev

Modern Data Science with R
B.S. Baumer, D.T. Kaplan, and N.J. Horton

Generalized Additive Models
An Introduction with R, Second Edition
S. Wood

Design of Experiments
An Introduction Based on Linear Models
Max Morris

Introduction to Statistical Methods for Financial Models
T.A. Severini

Statistical Regression and Classification
From Linear Models to Machine Learning
Norman Matloff

Introduction to Functional Data Analysis
Piotr Kokoszka and Matthew Reimherr

Stochastic Processes
An Introduction, Third Edition
P.W. Jones and P. Smith

Theory of Stochastic Objects
Probability, Stochastic Processes and Inference
Athanasios Christou Micheas

Linear Models and the Relevant Distributions and Matrix Algebra
David A. Harville

An Introduction to Generalized Linear Models, Fourth Edition
Annette J. Dobson and Adrian G. Barnett

Graphics for Statistics and Data Analysis with R
Kevin J. Keen

Statistics in Engineering, Second Edition
With Examples in MATLAB and R
Andrew Metcalfe, David A. Green, Tony Greenfield, Mahayaudin Mansor, Andrew Smith, and Jonathan Tuke

Introduction to Probability, Second Edition
Joseph K. Blitzstein and Jessica Hwang

A Computational Approach to Statistical Learning
Taylor Arnold, Michael Kane, and Bryan W. Lewis

Theory of Spatial Statistics
A Concise Introduction
M.N.M. van Lieshout

Bayesian Statistical Methods
Brian J. Reich and Sujit K. Ghosh

Time Series
A Data Analysis Approach Using R
Robert H. Shumway and David S. Stoffer
For more information about this series, please visit: https://www.crcpress.com/go/textsseries

Time Series: A Data Analysis Approach Using R

Robert H. Shumway
David S. Stoffer
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2019 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20190416

International Standard Book Number-13: 978-0-367-22109-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reason-
able efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organiza-
tion that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Shumway, Robert H., author. | Stoffer, David S., author.


Title: Time series : a data analysis approach using R / Robert Shumway, David
Stoffer.
Description: Boca Raton : CRC Press, Taylor & Francis Group, 2019. | Includes
bibliographical references and index.
Identifiers: LCCN 2019018441 | ISBN 9780367221096 (hardback : alk. paper)
Subjects: LCSH: Time-series analysis--Textbooks. | Time-series analysis--Data
processing. | R (Computer program language)
Classification: LCC QA280 .S5845 2019 | DDC 519.5/502855133--dc23
LC record available at https://lccn.loc.gov/2019018441

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents

Preface xi

1 Time Series Elements 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Time Series Data . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Time Series Models . . . . . . . . . . . . . . . . . . . . . . . . . 9
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Correlation and Stationary Time Series 17


2.1 Measuring Dependence . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Estimation of Correlation . . . . . . . . . . . . . . . . . . . . . . 27
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3 Time Series Regression and EDA 37


3.1 Ordinary Least Squares for Time Series . . . . . . . . . . . . . . 37
3.2 Exploratory Data Analysis . . . . . . . . . . . . . . . . . . . . . 47
3.3 Smoothing Time Series . . . . . . . . . . . . . . . . . . . . . . . 58
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4 ARMA Models 67
4.1 Autoregressive Moving Average Models . . . . . . . . . . . . . . 67
4.2 Correlation Functions . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5 ARIMA Models 99
5.1 Integrated Models . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Building ARIMA Models . . . . . . . . . . . . . . . . . . . . . 104
5.3 Seasonal ARIMA Models . . . . . . . . . . . . . . . . . . . . . . 111
5.4 Regression with Autocorrelated Errors * . . . . . . . . . . . . . 122
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

6 Spectral Analysis and Filtering 129
6.1 Periodicity and Cyclical Behavior . . . . . . . . . . . . . . . . . 129
6.2 The Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . 137
6.3 Linear Filters * . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7 Spectral Estimation 149


7.1 Periodogram and Discrete Fourier Transform . . . . . . . . . . . 149
7.2 Nonparametric Spectral Estimation . . . . . . . . . . . . . . . . . 153
7.3 Parametric Spectral Estimation . . . . . . . . . . . . . . . . . . . 165
7.4 Coherence and Cross-Spectra * . . . . . . . . . . . . . . . . . . . 168
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

8 Additional Topics * 175


8.1 GARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.2 Unit Root Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.3 Long Memory and Fractional Differencing . . . . . . . . . . . . 185
8.4 State Space Models . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.5 Cross-Correlation Analysis and Prewhitening . . . . . . . . . . . 194
8.6 Bootstrapping Autoregressive Models . . . . . . . . . . . . . . . 196
8.7 Threshold Autoregressive Models . . . . . . . . . . . . . . . . . 201
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Appendix A R Supplement 209


A.1 Installing R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
A.2 Packages and ASTSA . . . . . . . . . . . . . . . . . . . . . . . . 209
A.3 Getting Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
A.4 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
A.5 Regression and Time Series Primer . . . . . . . . . . . . . . . . . 217
A.6 Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

Appendix B Probability and Statistics Primer 225


B.1 Distributions and Densities . . . . . . . . . . . . . . . . . . . . . 225
B.2 Expectation, Mean, and Variance . . . . . . . . . . . . . . . . . . 225
B.3 Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . 227
B.4 Joint and Conditional Distributions . . . . . . . . . . . . . . . . . 227

Appendix C Complex Number Primer 229


C.1 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 229
C.2 Modulus and Argument . . . . . . . . . . . . . . . . . . . . . . . 231
C.3 The Complex Exponential Function . . . . . . . . . . . . . . . . 231
C.4 Other Useful Properties . . . . . . . . . . . . . . . . . . . . . . . 233
C.5 Some Trigonometric Identities . . . . . . . . . . . . . . . . . . . 234
Appendix D Additional Time Domain Theory 235
D.1 MLE for an AR(1) . . . . . . . . . . . . . . . . . . . . . . . . . 235
D.2 Causality and Invertibility . . . . . . . . . . . . . . . . . . . . . 237
D.3 ARCH Model Theory . . . . . . . . . . . . . . . . . . . . . . . . 241

Hints for Selected Exercises 245

References 253

Index 257
Preface

The goal of this book is to develop an appreciation for the richness and versatility
of modern time series analysis as a tool for analyzing data. A useful feature of
the presentation is the inclusion of nontrivial data sets illustrating the richness of
potential applications in medicine and in the biological, physical, and social sciences.
We include data analysis in both the text examples and in the problem sets.
The text can be used for a one semester/quarter introductory time series course
where the prerequisites are an understanding of linear regression and basic calculus-
based probability skills (primarily expectation). We assume general math skills at
the high school level (trigonometry, complex numbers, polynomials, calculus, and so
on).
All of the numerical examples use the R statistical package (R Core Team, 2018).
We do not assume the reader has previously used R, so Appendix A has an extensive
presentation of everything that will be needed to get started. In addition, there are
several simple exercises in the appendix that may help first-time users get more
comfortable with the software. We typically require students to do the R exercises as
the first homework assignment and we have found this requirement to be successful.
Various topics are explained using linear regression analogies, and some estima-
tion procedures require techniques used in nonlinear regression. Consequently, the
reader should have a solid knowledge of linear regression analysis, including multiple
regression and weighted least squares. Some of this material is reviewed in Chapter 3
and Chapter 4.
A calculus-based introductory course on probability is an essential prerequisite.
The basics are covered briefly in Appendix B. It is assumed that students are familiar
with most of the content of that appendix and that it can serve as a refresher.
For readers who are a bit rusty on high school math skills, there are a number of
free books that are available on the internet (search on Wikibooks K-12 Mathematics).
For the chapters on spectral analysis (Chapter 6 and 7), a minimal knowledge of
complex numbers is needed, and we provide this material in Appendix C.
There are a few starred (*) items throughout the text. These sections and examples
are starred because the material covered in the section or example is not needed to
move on to subsequent sections or examples. It does not necessarily mean that the
material is more difficult than others; it simply means that the section or example
may be covered at a later time or skipped entirely without disrupting the continuity.
Chapter 8 is starred because the sections of that chapter are independent special
topics that may be covered (or skipped) in any order. In a one-semester course, we
can usually cover Chapter 1 – Chapter 7 and at least one topic from Chapter 8.
Some homework problems have “hints” in the back of the book. The hints vary
in detail: some are nearly complete solutions, while others are small pieces of advice
or code to help start a problem.
The text is informally separated into four parts. The first part, Chapter 1 –
Chapter 3, is a general introduction to the fundamentals, the language, and the
methods of time series analysis. The second part, Chapter 4 – Chapter 5, presents
ARIMA modeling. Some technical details have been moved to Appendix D because,
while the material is not essential, we like to explain the ideas to students who know
mathematical statistics. For example, MLE is covered in Appendix D, but in the main
part of the text, it is only mentioned in passing as being related to unconditional least
squares. The third part, Chapter 6 – Chapter 7, covers spectral analysis and filtering.
We usually spend a small amount of class time going over the material on complex
numbers in Appendix C before covering spectral analysis. In particular, we make sure
that students see Section C.1 – Section C.3. The fourth part of the text consists of the
special topics covered in Chapter 8. Most students want to learn GARCH models, so
if we can only cover one section of that chapter, we choose Section 8.1.
Finally, we mention the similarities and differences between this text and Shumway
and Stoffer (2017), which is a graduate-level text. There are obvious similarities
because the authors are the same and we use the same R package, astsa, and con-
sequently the data sets in that package. The package has been updated for this text
and contains new and updated data sets and some updated scripts. We assume astsa
version 1.8.6 or later has been installed; see Section A.2. The mathematics level of
this text is more suited to undergraduate students and non-majors. In this text, the
chapters are short and a topic may be advanced over multiple chapters. Relative to the
coverage, there are more data analysis examples in this text. Each numerical example
has output and complete R code included, even if the code is mundane like setting up
the margins of a graphic or defining colors with the appearance of transparency. We
will maintain a website for the text at www.stat.pitt.edu/stoffer/tsda. A solutions manual
is available for instructors who adopt the book at www.crcpress.com.

Davis, CA Robert H. Shumway
Pittsburgh, PA David S. Stoffer
Chapter 1

Time Series Elements

1.1 Introduction

The analysis of data observed at different time points leads to unique problems that
are not covered by classical statistics. The dependence introduced by sampling the
data over time restricts the applicability of many conventional statistical methods that
require random samples. The analysis of such data is commonly referred to as time
series analysis.
To provide a statistical setting for describing the elements of time series data,
the data are represented as a collection of random variables indexed according to
the order they are obtained in time. For example, if we collect data on daily high
temperatures in your city, we may consider the time series as a sequence of random
variables, x1 , x2 , x3 , . . . , where the random variable x1 denotes the high temperature
on day one, the variable x2 denotes the value for the second day, x3 denotes the
value for the third day, and so on. In general, a collection of random variables, { xt },
indexed by t is referred to as a stochastic process. In this text, t will typically be
discrete and vary over the integers t = 0, ±1, ±2, . . . or some subset of the integers,
or a similar index like months of a year.
Historically, time series methods were applied to problems in the physical and
environmental sciences. This fact accounts for the engineering nomenclature that
permeates the language of time series analysis. The first step in an investigation
of time series data involves careful scrutiny of the recorded data plotted over time.
Before looking more closely at the particular statistical methods, we mention that
two separate, but not mutually exclusive, approaches to time series analysis exist,
commonly identified as the time domain approach (Chapter 4 and 5) and the frequency
domain approach (Chapter 6 and 7).

1.2 Time Series Data

The following examples illustrate some of the common kinds of time series data as
well as some of the statistical questions that might be asked about such data.

Figure 1.1 Johnson & Johnson quarterly earnings per share, 1960-I to 1980-IV (top). The same data logged (bottom).

Example 1.1. Johnson & Johnson Quarterly Earnings


Figure 1.1 shows quarterly earnings per share (QEPS) for the U.S. company Johnson
& Johnson and the data transformed by taking logs. There are 84 quarters (21 years)
measured from the first quarter of 1960 to the last quarter of 1980. Modeling such
series begins by observing the primary patterns in the time history. In this case, note
the increasing underlying trend and variability, and a somewhat regular oscillation
superimposed on the trend that seems to repeat over quarters. Methods for analyzing
data such as these are explored in Chapter 3 (see Problem 3.1) using regression
techniques.
If we consider the data as being generated as a small percentage change each year,
say rt (which can be negative), we might write xt = (1 + rt ) xt−4 , where xt is the
QEPS for quarter t. If we log the data, then log( xt ) = log(1 + rt ) + log( xt−4 ),
implying a linear growth rate; i.e., this quarter’s value is the same as last year’s plus a
small amount, log(1 + rt ). This attribute of the data is displayed by the bottom plot
of Figure 1.1.
The R code to plot the data for this example is:¹
library(astsa) # we leave this line off subsequent examples
par(mfrow=2:1)
tsplot(jj, ylab="QEPS", type="o", col=4, main="Johnson & Johnson Quarterly Earnings")
tsplot(log(jj), ylab="log(QEPS)", type="o", col=4)
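As a small side calculation (a sketch, not from the text), the yearly growth rates rt can be recovered directly from the logged relation, since log(xt) − log(xt−4) = log(1 + rt):
r = exp(diff(log(jj), lag=4)) - 1 # yearly growth rate at each quarter
tsplot(r, ylab="yearly growth rate", col=4)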

¹We assume astsa version 1.8.6 or later has been installed; see Section A.2.
Figure 1.2 Yearly average global land surface and ocean surface temperature deviations (1880–2017) in °C.

Example 1.2. Global Warming and Climate Change


Two global temperature records are shown in Figure 1.2. The data are (1) annual
temperature anomalies averaged over the Earth’s land area, and (2) sea surface tem-
perature anomalies averaged over the part of the ocean that is free of ice at all times
(open ocean). The time period is 1880 to 2017 and the values are deviations (°C) from
the 1951–1980 average, updated from Hansen et al. (2006). The upward trend in both
series during the latter part of the twentieth century has been used as an argument
for the climate change hypothesis. Note that the trend is not linear, with periods of
leveling off and then sharp upward trends. It should be obvious that fitting a simple
linear regression of either series (xt) on time (t), say xt = α + βt + et, would
not yield an accurate description of the trend. Most climate scientists agree the main
cause of the current global warming trend is human expansion of the greenhouse
effect; see https://climate.nasa.gov/causes/. The R code for this example is:
culer = c(rgb(.85,.30,.12,.6), rgb(.12,.65,.85,.6))
tsplot(gtemp_land, col=culer[1], lwd=2, type="o", pch=20, ylab="Temperature Deviations", main="Global Warming")
lines(gtemp_ocean, col=culer[2], lwd=2, type="o", pch=20)
legend("topleft", col=culer, lty=1, lwd=2, pch=20, legend=c("Land Surface", "Sea Surface"), bg="white")

Example 1.3. Dow Jones Industrial Average
As an example of financial time series data, Figure 1.3 shows the trading day closings
and returns (or percent change) of the Dow Jones Industrial Average (DJIA) from
2006 to 2016. If xt is the value of the DJIA closing on day t, then the return is

rt = ( xt − xt−1 )/xt−1 .
Figure 1.3 Dow Jones Industrial Average (DJIA) trading day closings (top) and returns (bottom) from April 20, 2006 to April 20, 2016.

This means that 1 + rt = xt /xt−1 and

log(1 + rt ) = log( xt /xt−1 ) = log( xt ) − log( xt−1 ) ,

just as in Example 1.1. Noting the expansion

log(1 + r) = r − r²/2 + r³/3 − · · · ,  −1 < r ≤ 1,

we see that if r is very small, the higher-order terms will be negligible. Consequently,
because for financial data, xt /xt−1 ≈ 1, we have

log(1 + rt ) ≈ rt .
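A small numerical check of this approximation (a sketch using base R): for returns of a few percent, log(1 + r) and r differ only in the third decimal place.
r = c(-.05, -.01, .01, .05) # returns of a few percent
round(cbind(r, logr = log1p(r), error = r - log1p(r)), 5)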

Note the financial crisis of 2008 in Figure 1.3. The data shown are typical of
return data. The mean of the series appears to be stable with an average return of
approximately zero; however, the volatility (or variability) of the data exhibits clustering;
that is, highly volatile periods tend to be clustered together. A problem in the analysis
of these types of financial data is to forecast the volatility of future returns. Models
have been developed to handle these problems; see Chapter 8. The data set is an xts
data file, so it must be loaded.
Figure 1.4 US GDP growth rate calculated using logs (–◦–) and actual values (+).

library(xts)
djia_return = diff(log(djia$Close))[-1]
par(mfrow=2:1)
plot(djia$Close, col=4)
plot(djia_return, col=4)
You can see a comparison of rt and log(1 + rt ) in Figure 1.4, which shows the
seasonally adjusted quarterly growth rate, rt , of US GDP compared to the version
obtained by calculating the difference of the logged data.
tsplot(diff(log(gdp)), type="o", col=4, ylab="GDP Growth") # diff-log
points(diff(gdp)/lag(gdp,-1), pch=3, col=2) # actual return
It turns out that many time series behave like this, so that logging the data and
then taking successive differences is a standard data transformation in time series
analysis. ♦
Example 1.4. El Niño – Southern Oscillation (ENSO)
The Southern Oscillation Index (SOI) measures changes in air pressure related to sea
surface temperatures in the central Pacific Ocean. The central Pacific warms every
three to seven years due to the ENSO effect, which has been blamed for various global
extreme weather events. During El Niño, pressure over the eastern and western Pacific
reverses, causing the trade winds to diminish and leading to an eastward movement
of warm water along the equator. As a result, the surface waters of the central and
eastern Pacific warm with far-reaching consequences to weather patterns.
Figure 1.5 shows monthly values of the Southern Oscillation Index (SOI) and
associated Recruitment (an index of the number of new fish). Both series are for
a period of 453 months ranging over the years 1950–1987. They both exhibit an
obvious annual cycle (hot in the summer, cold in the winter), and, though difficult to
see, a slower frequency of three to seven years. The study of the kinds of cycles and
their strengths is the subject of Chapter 6 and 7.

Figure 1.5 Monthly SOI and Recruitment (estimated new fish), 1950–1987.

The two series are also related; it is
easy to imagine that fish population size is dependent on the ocean temperature.
The following R code will reproduce Figure 1.5:
par(mfrow = c(2,1))
tsplot(soi, ylab="", xlab="", main="Southern Oscillation Index", col=4)
text(1970, .91, "COOL", col="cyan4")
text(1970,-.91, "WARM", col="darkmagenta")
tsplot(rec, ylab="", main="Recruitment", col=4)

Example 1.5. Predator–Prey Interactions
While it is clear that predators influence the numbers of their prey, prey affect the
number of predators because when prey become scarce, predators may die of star-
vation or fail to reproduce. Such relationships are often modeled by the Lotka–
Volterra equations, which are a pair of simple nonlinear differential equations (e.g.,
see Edelstein-Keshet, 2005, Ch. 6).
One of the classic studies of predator–prey interactions is the snowshoe hare and
lynx pelts purchased by the Hudson’s Bay Company of Canada. While this is an
indirect measure of predation, the assumption is that there is a direct relationship
between the number of pelts collected and the number of hare and lynx in the wild.
These predator–prey interactions often lead to cyclical patterns of predator and prey
abundance seen in Figure 1.6. Notice that the lynx and hare population sizes are
asymmetric in that they tend to increase slowly and decrease quickly.

Figure 1.6 Time series of the predator–prey interactions between the snowshoe hare and lynx pelts purchased by the Hudson’s Bay Company of Canada. It is assumed there is a direct relationship between the number of pelts collected and the number of hare and lynx in the wild.

The lynx prey varies from small rodents to deer, with the snowshoe hare being
its overwhelmingly favored prey. In fact, lynx are so closely tied to the snowshoe
hare that its population rises and falls with that of the hare, even though other food
sources may be abundant. In this case, it seems reasonable to model the size of the
lynx population in terms of the snowshoe population. This idea is explored further in
Example 5.17.
Figure 1.6 may be reproduced as follows.
culer = c(rgb(.85,.30,.12,.6), rgb(.12,.67,.86,.6))
tsplot(Hare, col = culer[1], lwd=2, type="o", pch=0,
ylab=expression(Number~~~(""%*% 1000)))
lines(Lynx, col=culer[2], lwd=2, type="o", pch=2)
legend("topright", col=culer, lty=1, lwd=2, pch=c(0,2),
legend=c("Hare", "Lynx"), bty="n")

Example 1.6. fMRI Imaging
Often, time series are observed under varying experimental conditions or treatment
configurations. Such a set of series is shown in Figure 1.7, where data are collected
from various locations in the brain via functional magnetic resonance imaging (fMRI).
In fMRI, subjects are put into an MRI scanner and a stimulus is applied for a
period of time, and then stopped. This on-off application of a stimulus is repeated
and recorded by measuring the blood oxygenation-level dependent (bold) signal
intensity, which measures areas of activation in the brain. The bold contrast results
from changing regional blood concentrations of oxy- and deoxy- hemoglobin.
The data displayed in Figure 1.7 are from an experiment that used fMRI to
examine the effects of general anesthesia on pain perception by comparing results
from anesthetized volunteers while a supramaximal shock stimulus was applied. This
stimulus was used to simulate surgical incision without inflicting tissue damage.

Figure 1.7 fMRI data from two locations in the cortex, the thalamus, and the cerebellum; n = 128 points, one observation taken every 2 seconds. The boxed line represents the presence or absence of the stimulus.

In this example, the stimulus was applied for 32 seconds and then stopped for 32 seconds,
so that the signal period is 64 seconds. The sampling rate was one observation every
2 seconds for 256 seconds (n = 128).
Notice that the periodicities appear strongly in the motor cortex series but seem to
be missing in the thalamus and perhaps in the cerebellum. In this case, it is of interest
to statistically determine if the areas in the thalamus and cerebellum are actually
responding to the stimulus. Use the following R commands for the graphic:
par(mfrow=c(3,1))
culer = c(rgb(.12,.67,.85,.7), rgb(.67,.12,.85,.7))
u = rep(c(rep(.6,16), rep(-.6,16)), 4) # stimulus signal
tsplot(fmri1[,4], ylab="BOLD", xlab="", main="Cortex", col=culer[1],
ylim=c(-.6,.6), lwd=2)
lines(fmri1[,5], col=culer[2], lwd=2)
lines(u, type="s")
tsplot(fmri1[,6], ylab="BOLD", xlab="", main="Thalamus", col=culer[1],
ylim=c(-.6,.6), lwd=2)
lines(fmri1[,7], col=culer[2], lwd=2)
lines(u, type="s")
tsplot(fmri1[,8], ylab="BOLD", xlab="", main="Cerebellum",
col=culer[1], ylim=c(-.6,.6), lwd=2)
lines(fmri1[,9], col=culer[2], lwd=2)
lines(u, type="s")
mtext("Time (1 pt = 2 sec)", side=1, line=1.75)

1.3 Time Series Models

The primary objective of time series analysis is to develop mathematical models that
provide plausible descriptions for sample data, like that encountered in the previous
section.
The fundamental visual characteristic distinguishing the different series shown in
Example 1.1 – Example 1.6 is their differing degrees of smoothness. A parsimonious
explanation for this smoothness is that adjacent points in time are correlated, so
the value of the series at time t, say, xt , depends in some way on the past values
xt−1 , xt−2 , . . .. This idea expresses a fundamental way in which we might think
about generating realistic looking time series.
Example 1.7. White Noise
A simple kind of generated series might be a collection of uncorrelated random
variables, wt, with mean 0 and finite variance σw². The time series generated from
uncorrelated variables is used as a model for noise in engineering applications where it
is called white noise; we shall sometimes denote this process as wt ∼ wn(0, σw²). The
designation white originates from the analogy with white light (details in Chapter 6).
A special version of white noise that we use is when the variables are independent
and identically distributed normals, written wt ∼ iid N(0, σw²).
The upper panel of Figure 1.8 shows a collection of 500 independent standard
normal random variables (σw² = 1), plotted in the order in which they were drawn. The
resulting series bears a resemblance to portions of the DJIA returns in Figure 1.3. ♦
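As a minimal sketch of what uncorrelated means in practice, we can simulate a white noise series and inspect its sample autocorrelations (estimation of correlation is the subject of Chapter 2); the values at nonzero lags should be near zero:
set.seed(1)
w = rnorm(500) # wt ~ iid N(0,1)
acf(w, 20)     # sample ACF; spikes stay within about ±2/sqrt(500) of zero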
If the stochastic behavior of all time series could be explained in terms of the
white noise model, classical statistical methods would suffice. Two ways of intro-
ducing serial correlation and more smoothness into time series models are given in
Example 1.8 and Example 1.9.
Example 1.8. Moving Averages, Smoothing and Filtering
We might replace the white noise series wt by a moving average that smoothes the
series. For example, consider replacing wt in Example 1.7 by an average of its current
value and its immediate two neighbors in the past. That is, let

vt = (wt−1 + wt + wt+1)/3, (1.1)

which leads to the series shown in the lower panel of Figure 1.8. This series is much
smoother than the white noise series and has a smaller variance due to averaging.
It should also be apparent that averaging removes some of the high frequency (fast
oscillations) behavior of the noise. We begin to notice a similarity to some of the
non-cyclic fMRI series in Figure 1.7.

Figure 1.8 Gaussian white noise series (top) and three-point moving average of the Gaussian white noise series (bottom).
A linear combination of values in a time series such as in (1.1) is referred to,
generically, as a filtered series; hence the command filter. To reproduce Figure 1.8:
par(mfrow=2:1)
w = rnorm(500) # 500 N(0,1) variates
v = filter(w, sides=2, filter=rep(1/3,3)) # moving average
tsplot(w, col=4, main="white noise")
tsplot(v, ylim=c(-3,3), col=4, main="moving average")
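Continuing the code above, a quick check (a sketch) that filter() computes the average in (1.1) at an interior time point; the two endpoints of v are NA because a neighbor is missing there:
t0 = 100
c(v[t0], mean(w[(t0-1):(t0+1)])) # the two values agree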

The SOI and Recruitment series in Figure 1.5, as well as some of the fMRI series
in Figure 1.7, differ from the moving average series because they are dominated
by an oscillatory behavior. A number of methods exist for generating series with
this quasi-periodic behavior; we illustrate a popular one based on the autoregressive
model considered in Chapter 4.
Example 1.9. Autoregressions
Suppose we consider the white noise series wt of Example 1.7 as input and calculate
the output using the second-order equation

xt = 1.5xt−1 − .75xt−2 + wt (1.2)

successively for t = 1, 2, . . . , 250. The resulting output series is shown in Figure 1.9.
Figure 1.9 Autoregressive series generated from model (1.2).

Equation (1.2) represents a regression or prediction of the current value xt of a
time series as a function of the past two values of the series, and, hence, the term
autoregression is suggested for this model. A problem with startup values exists here
because (1.2) also depends on the initial conditions x0 and x−1 , but for now we set
them to zero. We can then generate data recursively by substituting into (1.2). That
is, given w1 , w2 , . . . , w250 , we could set x−1 = x0 = 0 and then start at t = 1:

x1 = 1.5x0 − .75x−1 + w1 = w1
x2 = 1.5x1 − .75x0 + w2 = 1.5w1 + w2
x3 = 1.5x2 − .75x1 + w3
x4 = 1.5x3 − .75x2 + w4

and so on. We note the approximate periodic behavior of the series, which is similar
to that displayed by the SOI and Recruitment in Figure 1.5 and some fMRI series
in Figure 1.7. This particular model is chosen so that the data have pseudo-cyclic
behavior of about 1 cycle every 12 points; thus 250 observations should contain
about 20 cycles. This autoregressive model and its generalizations can be used as an
underlying model for many observed series and will be studied in detail in Chapter 4.
One way to simulate and plot data from the model (1.2) in R is to use the following
commands. The initial conditions are set equal to zero so we let the filter run an extra
50 values to avoid startup problems.
set.seed(90210)
w = rnorm(250 + 50) # 50 extra to avoid startup problems
x = filter(w, filter=c(1.5,-.75), method="recursive")[-(1:50)]
tsplot(x, main="autoregression", col=4)
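The recursion can also be coded as an explicit loop (a sketch); with the same seed it reproduces the filter() output above exactly:
set.seed(90210)
w  = rnorm(300)                 # same noise as above (250 + 50)
x2 = numeric(300)
x2[1] = w[1]                    # x1 = w1 (zero initial conditions)
x2[2] = 1.5*x2[1] + w[2]        # x2 = 1.5 x1 + w2
for (t in 3:300) x2[t] = 1.5*x2[t-1] - .75*x2[t-2] + w[t]
all.equal(c(x), x2[-(1:50)])    # TRUE: matches the filter() result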

Example 1.10. Random Walk with Drift
A model for analyzing a trend such as seen in the global temperature data in Figure 1.2,
is the random walk with drift model given by

xt = δ + xt−1 + wt (1.3)
for t = 1, 2, . . ., with initial condition x0 = 0, and where wt is white noise.

Figure 1.10 Random walk, σw = 1, with drift δ = .3 (upper jagged line), without drift, δ = 0 (lower jagged line), and dashed lines showing the drifts.

The constant δ is called the drift, and when δ = 0, the model is called simply a random
walk because the value of the time series at time t is the value of the series at time
t − 1 plus a completely random movement determined by wt . Note that we may
rewrite (1.3) as a cumulative sum of white noise variates. That is,
xt = δt + ∑_{j=1}^{t} wj (1.4)

for t = 1, 2, . . .; either use induction, or plug (1.4) into (1.3) to verify this statement.
Figure 1.10 shows 200 observations generated from the model with δ = 0 and .3,
and with standard normal noise. For comparison, we also superimposed the straight
lines δt on the graph. To reproduce Figure 1.10 in R use the following code (notice
the use of multiple commands per line using a semicolon).
set.seed(314159265) # so you can reproduce the results
w = rnorm(200); x = cumsum(w) # random walk
wd = w +.3; xd = cumsum(wd) # random walk with drift
tsplot(xd, ylim=c(-2,80), main="random walk", ylab="", col=4)
abline(a=0, b=.3, lty=2, col=4) # plot drift
lines(x, col="darkred")
abline(h=0, col="darkred", lty=2)
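A quick check (a sketch) of the representation (1.4) using the series just generated: the drifted walk equals δt plus the cumulated noise.
all.equal(xd, .3*(1:200) + cumsum(w)) # TRUE, since xd = cumsum(w + .3)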

Example 1.11. Signal Plus Noise
Many realistic models for generating time series assume an underlying signal with
some consistent periodic variation contaminated by noise. For example, it is easy to
detect the regular cycle fMRI series displayed on the top of Figure 1.7. Consider the
model
xt = 2 cos(2π(t + 15)/50) + wt (1.5)

for t = 1, 2, . . . , 500, where the first term is regarded as the signal, shown in the
upper panel of Figure 1.11.

Figure 1.11 Cosine wave with period 50 points (top panel) compared with the cosine wave contaminated with additive white Gaussian noise, σw = 1 (middle panel) and σw = 5 (bottom panel); see (1.5).

We note that a sinusoidal waveform can be written as

A cos(2πωt + φ), (1.6)

where A is the amplitude, ω is the frequency of oscillation, and φ is a phase shift. In
(1.5), A = 2, ω = 1/50 (one cycle every 50 time points), and φ = .6π.
An additive noise term was taken to be white noise with σw = 1 (middle panel)
and σw = 5 (bottom panel), drawn from a normal distribution. Adding the two
together obscures the signal, as shown in the lower panels of Figure 1.11. The degree
to which the signal is obscured depends on the amplitude of the signal relative to the
size of σw . The ratio of the amplitude of the signal to σw (or some function of the
ratio) is sometimes called the signal-to-noise ratio (SNR); the larger the SNR, the
easier it is to detect the signal. Note that the signal is easily discernible in the middle
panel, whereas the signal is obscured in the bottom panel. Typically, we will not
observe the signal but the signal obscured by noise.
To reproduce Figure 1.11 in R, use the following commands:
t = 1:500
cs = 2*cos(2*pi*(t+15)/50) # signal
w = rnorm(500) # noise
par(mfrow=c(3,1))
tsplot(cs, col=4, main=expression(2*cos(2*pi*(t+15)/50)))
tsplot(cs+w, col=4, main=expression(2*cos(2*pi*(t+15)/50+N(0,1))))
tsplot(cs+5*w,col=4, main=expression(2*cos(2*pi*(t+15)/50)+N(0,5^2)))
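As a rough numerical version of the SNR (a sketch; here we take signal power, A²/2 for a sinusoid with amplitude A, over the noise variance):
c(SNR_mid = var(cs)/1^2, SNR_low = var(cs)/5^2) # about 2 and 0.08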

Problems
1.1.
(a) Generate n = 100 observations from the autoregression

xt = −.9xt−2 + wt

with σw = 1, using the method described in Example 1.9. Next, apply the moving
average filter
vt = ( xt + xt−1 + xt−2 + xt−3 )/4
to xt , the data you generated. Now plot xt as a line and superimpose vt as a
dashed line.
(b) Repeat (a) but with
xt = 2 cos(2πt/4) + wt ,
where wt ∼ iid N(0, 1).
(c) Repeat (a) but where xt is the log of the Johnson & Johnson data discussed in
Example 1.1.
(d) What is seasonal adjustment (you can do an internet search)?
(e) State your conclusions (in other words, what did you learn from this exercise).
1.2. There are a number of seismic recordings from earthquakes and from mining
explosions in astsa. All of the data are in the dataframe eqexp, but two specific
recordings are in EQ5 and EXP6, the fifth earthquake and the sixth explosion, respec-
tively. The data represent two phases or arrivals along the surface, denoted by P
(t = 1, . . . , 1024) and S (t = 1025, . . . , 2048), at a seismic recording station. The
recording instruments are in Scandinavia and monitor a Russian nuclear testing site.
The general problem of interest is in distinguishing between these waveforms in order
to maintain a comprehensive nuclear test ban treaty.
To compare the earthquake and explosion signals,
(a) Plot the two series separately in a multifigure plot with two rows and one column.
(b) Plot the two series on the same graph using different colors or different line types.
(c) In what way are the earthquake and explosion series different?
1.3. In this problem, we explore the difference between random walk and moving
average models.
(a) Generate and (multifigure) plot nine series that are random walks (see Exam-
ple 1.10) of length n = 500 without drift (δ = 0) and σw = 1.
(b) Generate and (multifigure) plot nine series of length n = 500 that are moving
averages of the form (1.1) discussed in Example 1.8.
(c) Comment on the differences between the results of part (a) and part (b).
1.4. The data in gdp are the seasonally adjusted quarterly U.S. GDP from 1947-I to
2018-III. The growth rate is shown in Figure 1.4.
(a) Plot the data and compare it to one of the models discussed in Section 1.3.
(b) Reproduce Figure 1.4 using your colors and plot characters (pch) of your own
choice. Then, comment on the difference between the two methods of calculating
growth rate.
(c) Which of the models discussed in Section 1.3 best describe the behavior of the
growth in U.S. GDP?
References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans-
actions on Automatic Control, 19(6):716–723.
Blackman, R. and Tukey, J. (1959). The measurement of power spectra, from the
point of view of communications engineering. Dover, pages 185–282.
Bloomfield, P. (2004). Fourier Analysis of Time Series: An Introduction. John
Wiley & Sons.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J.
Econometrics, 31:307–327.
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). ARCH models. Handbook of
Econometrics, 4:2959–3038.
Box, G. and Jenkins, G. (1970). Time Series Analysis, Forecasting, and Control.
Holden–Day.
Brockwell, P. J. and Davis, R. A. (2013). Time Series: Theory and Methods.
Springer Science & Business Media.
Chan, N. H. (2002). Time Series Applications to Finance. John Wiley & Sons, Inc.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scat-
terplots. Journal of the American Statistical Association, 74(368):829–836.
Cochrane, D. and Orcutt, G. H. (1949). Application of least squares regression to
relationships containing auto-correlated error terms. Journal of the American
Statistical Association, 44(245):32–61.
Cooley, J. W. and Tukey, J. W. (1965). An algorithm for the machine calculation of
complex Fourier series. Mathematics of Computation, 19(90):297–301.
Durbin, J. (1960). The fitting of time-series models. Revue de l’Institut International
de Statistique, pages 233–244.
Edelstein-Keshet, L. (2005). Mathematical Models in Biology. Society for Industrial
and Applied Mathematics, Philadelphia.
Efron, B. and Tibshirani, R. J. (1994). An Introduction to the Bootstrap. CRC Press.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of
the variance of United Kingdom inflation. Econometrica, 50:987–1007.
Granger, C. W. and Joyeux, R. (1980). An introduction to long-memory time series
models and fractional differencing. Journal of Time Series Analysis, 1(1):15–
29.
Grenander, U. and Rosenblatt, M. (2008). Statistical Analysis of Stationary Time
Series, volume 320. American Mathematical Soc.
Hansen, J. and Lebedeff, S. (1987). Global trends of measured surface air tempera-
ture. Journal of Geophysical Research: Atmospheres, 92(D11):13345–13372.
Hansen, J., Sato, M., Ruedy, R., Lo, K., Lea, D. W., and Medina-Elizade, M. (2006).
Global temperature change. Proceedings of the National Academy of Sciences,
103(39):14288–14293.
Hosking, J. R. (1981). Fractional differencing. Biometrika, 68(1):165–176.
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Trans. Amer. Soc.
Civil Eng., 116:770–799.
Hurvich, C. M. and Tsai, C.-L. (1989). Regression and time series model selection
in small samples. Biometrika, 76(2):297–307.
Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis.
Prentice Hall.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems.
Journal of Basic Engineering, 82(1):35–45.
Kalman, R. E. and Bucy, R. S. (1961). New results in linear filtering and prediction
theory. Journal of Basic Engineering, 83(1):95–108.
Kitchin, J. (1923). Cycles and trends in economic factors. The Review of Economic
Statistics, pages 10–16.
Levinson, N. (1947). A heuristic exposition of Wiener’s mathematical theory of
prediction and filtering. Journal of Mathematics and Physics, 26(1-4):110–119.
McLeod, A. I. and Hipel, K. W. (1978). Preservation of the rescaled adjusted
range: 1. A reassessment of the Hurst phenomenon. Water Resources Research,
14(3):491–508.
McQuarrie, A. D. and Tsai, C.-L. (1998). Regression and Time Series Model
Selection. World Scientific.
Parzen, E. (1983). Autoregressive Spectral Estimation. Handbook of Statistics,
3:221–247.
R Core Team (2018). R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria.
Schuster, A. (1898). On the investigation of hidden periodicities with application to a
supposed 26 day period of meteorological phenomena. Terrestrial Magnetism,
3(1):13–41.
Schuster, A. (1906). II. On the periodicities of sunspots. Phil. Trans. R. Soc. Lond.
A, 206(402-412):69–100.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics,
6(2):461–464.
Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. Monographs
on Statistics and Applied Probability, 65:1–68.
Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product.
ASQ Quality Press.
Shumway, R., Azari, A., and Pawitan, Y. (1988). Modeling mortality fluctuations
in Los Angeles as functions of pollution and weather effects. Environmental
Research, 45(2):224–241.
Shumway, R. and Stoffer, D. (2017). Time Series Analysis and Its Applications:
With R Examples. Springer, New York, 4th edition.
Shumway, R. H. and Verosub, K. L. (1992). State space modeling of paleoclimatic
time series. In Proc. 5th Int. Meeting Stat. Climatol, pages 22–26.
Sugiura, N. (1978). Further analysts of the data by Akaike’s information criterion and
the finite corrections: Further analysts of the data by Akaike’s. Communications
in Statistics-Theory and Methods, 7(1):13–26.
Tong, H. (1983). Threshold Models in Non-linear Time Series Analysis. Springer-
Verlag, New York.
Tsay, R. S. (2005). Analysis of Financial Time Series, volume 543. John Wiley &
Sons.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages.
Management Science, 6(3):324–342.
Wold, H. (1954). Causality and econometrics. Econometrica: Journal of the
Econometric Society, pages 162–177.
