Lajos Horváth
Advisors:
P. Bickel, P. Diggle, S. Fienberg, U. Gather,
I. Olkin, S. Zeger
ISSN 0172-7397
ISBN 978-1-4614-3654-6 ISBN 978-1-4614-3655-3 (eBook)
DOI 10.1007/978-1-4614-3655-3
Springer New York Heidelberg Dordrecht London
Over the last two decades, functional data analysis has established itself as an impor-
tant and dynamic area of statistics. It offers effective new tools and has stimulated
new methodological and theoretical developments. The field has become very broad,
with many specialized directions of research. This book focuses on inferential meth-
ods grounded in the Hilbert space formalism. It is primarily concerned with statisti-
cal hypothesis tests in various functional data analytic settings. Special attention is
devoted to methods based on the functional principal components and model speci-
fication tests, including change point tests. While most procedures presented in this
book are carefully justified by asymptotic arguments, the emphasis is on practically
applicable methodology, rather than theoretical insights into the structure of the rel-
evant statistical models. The methodology we present is motivated by questions and
data arising in several fields, most notably space physics and finance, but we solve
important general problems of inference for functional data, and so the methodol-
ogy explained in this book has a much broader applicability. Detailed derivations
are presented, so readers will be able to modify and extend the specific procedures
we describe to other inferential problems.
The book can be read at two levels. Readers interested primarily in methodology
will find detailed descriptions of the testing algorithms, together with the assessment
of their performance by means of simulation studies, followed by, typically,
two examples of application to real data. Researchers interested in mathematical
foundations of these procedures will find carefully developed asymptotic theory.
We provide both the introduction to the requisite Hilbert space theory, which many
graduate students or advanced undergraduate students may find useful, and some
novel asymptotic arguments which may be of interest to advanced researchers. A
more detailed description of the contents is given below.
As noted above, functional data analysis has become a broad research area, and
this book does not aim at giving a comprehensive account of all recent develop-
ments. It focuses on the construction of test statistics and the relevant asymptotic
viii Preface
theory, with emphasis on models for dependent functional data. Many areas of func-
tional data analysis that have seen a rapid development over the last decade are not
covered. These include dynamical systems, sparse longitudinal data and nonparametric
methods. The collection of Ferraty and Romain (2011) covers many of the
topics which are the focus of this book, including functional regression models and
the functional principal components, but it also contains excellent review papers on
important topics not discussed in this book, including resampling, curve registration,
classification, analysis of sparse data, with special emphasis on nonparametric meth-
ods and mathematical theory. The books of Ferraty and Vieu (2006) and Ramsay et
al. (2009) are complementary to our work, as they cover, respectively, nonparamet-
ric methods and computational issues. The monograph of Ramsay and Silverman
(2005) offers an excellent and accessible introduction to many topics mentioned
above, while Bosq (2000) and Bosq and Blanke (2007) study mathematical founda-
tions. Our list of references is comprehensive, but it is no longer possible to refer
even to a majority of important and influential papers on functional data analysis.
We cite only the papers most closely related to the research presented in this book.
Chapters 1, 2 and 3 introduce the prerequisites, and should be read before any other
chapters. Readers not interested in the asymptotic theory may merely go over Chapter 2
to become familiar with the concepts and definitions central to the whole book.
The remaining chapters can essentially be read independently of each other. There
are some connections between them, but appropriate references can be followed
only if desired. Many chapters end with bibliographical notes that direct the reader
to related research papers. The book consists of three parts. Part I is concerned with
independent and identically distributed functions, i.e. a functional random sample.
Part II studies the functional regression model. Part III focuses on functional data
structures that exhibit dependence, in time or in space.
Chapter 1 sets the stage for the remainder of the book by discussing several exam-
ples of functional data and their analyses. Some of the data sets introduced in Chap-
ter 1 are revisited in the following chapters. Section 1.5 provides a brief introduction
to software packages and the fundamental ideas used in the numerical implemen-
tation of the procedures discussed in the book. Part I begins with Chapter 2 which
introduces the central mathematical ideas of the book, the covariance operator and
its eigenfunctions. Chapter 3 follows with the definition of the functional principal
components, which are the most important ingredient of the methodology we study.
Chapters 4, 5 and 6 focus, respectively, on functional counterparts of the multi-
variate canonical correlation analysis, the two sample problem and the change point
problem. Part I concludes with Chapter 7 which discusses a test designed to verify if
a sample of functional data can be viewed as a collection of independent identically
distributed functions. Part II begins with Chapter 8 which offers an overview of the
various functional linear models and of the related inference. The remaining three
Acknowledgements
To a large extent, this book is an account of our research conducted with superb
collaborators: Alexander Aue, István Berkes, Siegfried Hörmann, Marie Hušková,
Jan Sojka, Josef Steinebach and Lie Zhu. Our graduate students: Devin Diderick-
sen, Stefan Fremdt, Robertas Gabrys, Sasha Gromenko, Inga Maslova, Ron Reeder,
Matt Reimherr, Greg Rice and Xi Zhang have been coauthors on almost all papers in
which the research reported in this book was initially published. They prepared
all figures and tables, and performed the numerical calculations. We are grateful to
many researchers for stimulating comments and discussions, in particular, to John
Aston, Pedro Delicado, Diogo Dubart Norinho, Pedro Galeano, Peter Hall, Mevin
Hooten, Claudia Kirch, Ping Ma, Hans–Georg Müller, Debashis Paul, Dimitris Poli-
tis, Mohsen Pourahmadi, James Ramsay, Philip Reiss, Xiaofeng Shao, John Stevens,
James Stock, Jane–Ling Wang and Fang Yao. We acknowledge stimulating research
environments at the University of Utah and Utah State University, and the generous
funding from the National Science Foundation.
We thank Springer editors John Kimmel and Marc Strauss, and the editorial assistants
Hannah Bracken and Kathryn Schell, for the advice they offered at the various
stages of the preparation of this book. Our special thanks go to Rajiv Monsurate
of Springer who prepared the final LaTeX document. The following publishers
kindly granted licences to reproduce figures: John Wiley and Sons for Figure 1.4,
Oxford University Press for Figure 1.9, and Statistica Sinica for Figure 8.1.
Lajos Horváth
Piotr Kokoszka
Contents
References
Index
Chapter 1
Functional data structures
The data that motivated the research presented in this book are of the form $X_n(t),\ t \in [a, b]$,
where $[a, b]$ is an interval on the line. Each observation is thus a curve. Such
curves can arise in many ways. Figure 1.1 shows a reading of a magnetometer over
a period of one week. A magnetometer is an instrument that measures the three
components of the magnetic field at a location where it is placed. There are over
100 magnetic observatories located on the surface of the Earth, and most of them
have digital magnetometers. These magnetometers record the strength and direction
of the field every five seconds, but the magnetic field exists at any moment of time,
so it is natural to think of a magnetogram as an approximation to a continuous
record. The raw magnetometer data are cleaned and reported as averages over one
minute intervals. Such averages were used to produce Figure 1.1. Thus $7 \times 24 \times 60 = 10{,}080$
values (of one component of the field) were used to draw Figure 1.1. The
dotted vertical lines separate days in Universal Time (UT). It is natural to view
a curve defined over one UT day as a single observation because one of the main
sources influencing the shape of the record is the daily rotation of the Earth. When an
observatory faces the Sun, it records the magnetic field generated by wind currents
flowing in the ionosphere which are driven mostly by solar heating. Thus, Figure
1.1 shows seven consecutive functional observations.
Many important examples of data that can be naturally treated as functional come
from financial records. Figure 1.2 shows two consecutive weeks of Microsoft stock
prices in one minute resolution. In contrast to the magnetic field, the price of an
asset exists only when the asset is traded. A great deal of financial research has
been done using the closing daily price, i.e. the price in the last transaction of a
trading day. However, many assets are traded so frequently that one can practically
Fig. 1.1 The horizontal component of the magnetic field measured in one minute resolution at
Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT. (Horizontal axis: time in minutes.)
1.1 Examples of functional data 3
think of a price curve that is defined at any moment of time. The Microsoft stock
is traded several hundred times per minute. The values used to draw the graph in
Figure 1.2 are the closing prices in one-minute intervals. It is natural to choose
one trading day as the underlying time interval. If we do so, Figure 1.2 shows 10
consecutive functional observations. From these functional observations, various
statistics can be computed. For example, the top panels of Figure 1.3 show the mean
functions for the two weeks computed as $\hat\mu(t) = 5^{-1} \sum_{i=1}^{5} X_i(t)$, where $X_i(t)$ is
the price at time $t$ on the $i$th day of the week. We see that the mean functions have
roughly the same shape (even though they have different ranges), and we may ask
if it is reasonable to assume that after adjusting for the ranges, the differences in
these curves can be explained by chance, or these curves are really different. This
is clearly a setting for a statistical hypothesis test which requires the usual steps
of model building and inference. Most chapters of this book focus on inferential
procedures in models for functional data. The bottom panels of Figure 1.3 show the
five curves $X_i(t) - \hat\mu(t)$ for each week. We will often work with functional data
centered in this way, and will exhibit the curves using graphs like those in the
bottom panels of Figure 1.3.
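The computation of the mean function and of the centered curves can be sketched in base R as follows. This is only an illustration: the price matrix is synthetic (a random walk standing in for real prices), and the names `t_grid`, `X`, `mu_hat` and `X_centered` are assumptions made for this sketch.

```r
# Synthetic stand-in for five daily price curves observed on a common grid
set.seed(1)
t_grid <- seq(10, 16, length.out = 61)     # hypothetical grid of trading hours
X <- sapply(1:5, function(i) 23.5 + cumsum(rnorm(length(t_grid), sd = 0.02)))
# X is a 61 x 5 matrix: rows index time points, columns index days

mu_hat <- rowMeans(X)        # pointwise mean function: (1/5) * sum_i X_i(t)
X_centered <- X - mu_hat     # subtracts mu_hat(t) from every column (day)
```

Because the mean is taken pointwise, the centered curves average to zero at every grid point.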
Functional data arise not only from finely spaced measurements. For example,
when measurements on human subjects are made, it is often difficult to ensure that
they are made at the same time in the life of a subject, and there may be different
numbers of measurements for different subjects. Typical examples are growth
curves, i.e. $X_n(t)$ is the height of subject $n$ at time $t$ after birth. Even though every
Fig. 1.2 Microsoft stock prices in one-minute resolution, May 1-5, 8-12, 2006. (Vertical axis: closing price; horizontal axis: trading days, M through F, for each of the two weeks.)
Fig. 1.3 (a) Mean function of Microsoft stock prices, May 1-5, 2006; (b) Mean function of
Microsoft stock prices, May 8-12, 2006; (c) Centered prices of Microsoft stock, May 1-5, 2006;
(d) Centered prices of Microsoft stock, May 8-12, 2006. (Vertical axes: mean closing price in
panels (a) and (b), centered closing price in panels (c) and (d); horizontal axes run from 10 to 16.)
individual has a height at any time t, it is measured only relatively rarely. Thus
it has been necessary to develop methods of estimating growth curves from such
sparse unequally spaced data, in which smoothing and regularization play a crucial
role. Examples and methodology of this type are discussed in the monographs of
Ramsay and Silverman (2002, 2005).
It is often useful to treat as functional data measurements that are neither sparse
nor dense. Figure 1.4 shows the concentration of nitrogen oxide pollutants, referred
to as NOx, measured at Barcelona’s neighborhood of Poblenou. The NOx concentration
is measured every hour, so we have only 24 measurements per day. It is nevertheless
informative to treat these data as a collection of daily curves because the
pattern of pollution becomes immediately apparent. The pollution peaks in morn-
ing hours, declines in the afternoon, and then increases again in the evening. This
Fig. 1.4 Hourly levels of NOx pollutants measured in Poblenou, Spain. Each curve represents one
day. (Horizontal axis: hours; vertical axis: NOx level in mg/m³.)
pattern is easy to explain because the monitoring station is in a city center, and road
traffic is a major source of NOx pollution. Broadly speaking, for functional data the
information contained in the shape of the curves matters a great deal. The above
data set was studied by Febrero et al. (2008); Jones and Rice (1992) study ozone
levels in Upland, California.
The usefulness of the functional approach has been recognized in many other
fields of science. Borggaard and Thodberg (1992) provide interesting applications of
the functional principal component analysis to chemistry. A spectrum is a sampling
In this section, based on the work of Febrero et al. (2008), we show how the fun-
damental statistical concepts of the center of a distribution and of an outlier can be
defined in the functional context. A center of a sample of scalars can be defined by
the median, the mean, the trimmed mean, or other similar measures. The definition
of an outlier is less clear, but for relatively small samples even visual inspection
may reveal suspect observations. For a collection of curves, like those shown in
Figure 1.4, it is not clear how to define central curves or outlying curves. The value
of a function at every point t may not be an outlier, but the curve itself may be a
functional outlier. Generally speaking, once incorrectly recorded curves have been
removed, a curve is an outlier if it comes from a population with a different distribution
in a function space than the majority of the curves. An outlier may be far
away from the other curves, or may have a different shape. The concept of depth
of functional data offers a possible framework for identifying central and outlying
observations; those with maximal depth are central, and those with minimal depth
are potential outliers.
The depth of a scalar data point can be defined in many ways, see Zuo and Serfling
(2000). To illustrate, suppose $X_1, X_2, \ldots, X_N$ are scalar observations, and
\[ F_N(x) = \frac{1}{N} \sum_{n=1}^{N} I\{X_n \le x\} \]
is their empirical distribution function. The halfspace depth of the observation $X_i$ is
defined as
\[ HSD_N(X_i) = \min\{F_N(X_i),\ 1 - F_N(X_i)\}. \]
1.2 Detection of abnormal NOx pollution levels 7
If $X_i$ is the median, then $F_N(X_i) = 1/2$, and so $HSD_N(X_i) = 1/2$, the largest
possible depth. If $X_i$ is the largest point, then $F_N(X_i) = 1$, and so $HSD_N(X_i) = 0$,
the least possible depth. Another way of measuring depth is to define
\[ D_N(X_i) = 1 - \left| \frac{1}{2} - F_N(X_i) \right|. \]
For a sample of curves $X_n(t)$, $t \in [a, b]$, the empirical distribution function at a point $t$ is
\[ F_{N,t}(x) = \frac{1}{N} \sum_{n=1}^{N} I\{X_n(t) \le x\}, \]
and we can define a functional depth by integrating one of the univariate depths. For
example, Fraiman and Muniz (2001) define the functional depth of the curve $X_i$ as
\[ FD_N(X_i) = \int_a^b \left[ 1 - \left| \frac{1}{2} - F_{N,t}(X_i(t)) \right| \right] dt. \]
There are also other approaches to defining functional depth; an interested reader is
referred to Febrero et al. (2008) and López-Pintado and Romo (2009).
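The integrated depth of Fraiman and Muniz (2001) can be approximated on a grid as in the following base R sketch. It assumes that the curves are stored as columns of a matrix observed on a common equispaced grid, and it replaces the integral by a Riemann sum; the function name `fm_depth` and the synthetic curves are illustrative assumptions.

```r
# Fraiman-Muniz-type depth for curves stored as a J x N matrix
# (rows = grid points t_j, columns = curves); grid assumed equispaced on [a, b]
fm_depth <- function(X, a = 0, b = 1) {
  N <- ncol(X)
  # F_{N,t}(X_i(t)): proportion of curves with value <= X_i(t) at each grid point
  Fmat <- t(apply(X, 1, function(row) rank(row, ties.method = "max") / N))
  D <- 1 - abs(0.5 - Fmat)      # univariate depth at each (t_j, curve i)
  (b - a) * colMeans(D)         # Riemann approximation of the integral over [a, b]
}

# Synthetic example: 20 vertically shifted sine curves, the last one shifted far upward
set.seed(2)
t_grid <- seq(0, 1, length.out = 50)
X <- sapply(1:20, function(i) sin(2 * pi * t_grid) + rnorm(1, sd = 0.3))
X[, 20] <- X[, 20] + 5
depth <- fm_depth(X)
which.min(depth)                # index of the least deep curve
```

In this example the shifted curve lies above all others at every point, so its depth equals 1/2, the smallest value this measure can take on $[0, 1]$.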
Once a measure of a functional depth, denote it generically by $FD_N$, has been
chosen, we can use the following algorithm to identify outliers:
The critical element of this procedure is determining the value of C which should
be so small that only a small fraction, say 1%, of the curves are classified as outliers,
if there are in fact no outliers. The value of C can then be computed from the sample
using some form of bootstrap; two approaches are described in Febrero et al. (2008).
Step 3 is introduced to avoid masking, which takes place when “large” outliers mask
the presence of other outliers.
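A depth-based trimming loop of this kind might look as follows in base R. This sketch rests on assumptions: it takes the procedure to be (1) compute the depths, (2) flag curves with depth below C, (3) recompute the depths on the reduced sample and repeat until no curve is flagged. The threshold is fixed by hand rather than obtained by bootstrap, and `fm_depth` and `find_outliers` are hypothetical helper names.

```r
# Integrated depth on [0, 1] for a J x N matrix of curves (columns = curves)
fm_depth <- function(X) {
  N <- ncol(X)
  Fmat <- t(apply(X, 1, function(row) rank(row, ties.method = "max") / N))
  colMeans(1 - abs(0.5 - Fmat))
}

# Repeatedly remove curves whose depth falls below the threshold C
find_outliers <- function(X, C) {
  keep <- seq_len(ncol(X))            # indices of curves still in the sample
  outliers <- integer(0)
  repeat {
    d <- fm_depth(X[, keep, drop = FALSE])
    flagged <- keep[d < C]
    # stop when nothing is flagged, or when fewer than two curves would remain
    if (length(flagged) == 0 || length(keep) - length(flagged) < 2) break
    outliers <- c(outliers, flagged)
    keep <- setdiff(keep, flagged)    # depths are recomputed on the reduced sample
  }
  sort(outliers)
}

# Synthetic example: 30 noisy sine curves plus one curve shifted upward
set.seed(3)
J <- 100
t_grid <- seq(0, 1, length.out = J)
X <- sapply(1:30, function(i) sin(2 * pi * t_grid) + rnorm(J, sd = 0.3))
X <- cbind(X, sin(2 * pi * t_grid) + 5 + rnorm(J, sd = 0.3))   # curve 31
out <- find_outliers(X, C = 0.6)      # C chosen by hand for this illustration
```

Recomputing the depths after each removal is what allows curves hidden behind a more extreme outlier to be flagged in a later pass.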
Febrero et al. (2008) applied this procedure with three measures $FD_N$ to the data
shown in Figure 1.4, but split into working and non-working days. The two samples
containing 76 working and 39 nonworking days between February 23 and June 26,
2005 are shown in Figure 1.5, with outliers identified by black lines. For working
Fig. 1.5 Outliers in NOx concentration curves in the samples of working and nonworking days.
(Axes: hours against NOx level; the outlying curves are labeled by date.)
days, these are Friday, March 18, and Friday, April 29. For non-working days, the
outliers are the following Saturdays, March 19 and April 30. These days are the
beginning of long weekend holidays in Spain. This validates the identification of
the NOx curves on these days as outliers, as the traffic pattern can be expected to be
different on holidays.
Febrero et al. (2008) did not attempt to develop an asymptotic justification for
the procedure described in this section. Its performance is assessed by application to
a real data set. Such an approach is common. In this book, however, we focus on sta-
tistical procedures whose asymptotic validity can be established. Resampling proce-
dures for functional data taking values in a general measurable space are reviewed
by McMurry and Politis (2010).
1.3 Prediction of the volume of credit card transactions 9
In this section, based on the work of Laukaitis and Račkauskas (2002), we describe
the prediction of the volume of credit card transactions using the functional autore-
gressive process, which will be studied in detail in Chapter 13.
The data available for this analysis consist of all transactions completed using
credit cards issued by Vilnius Bank, Lithuania. Details of every transaction are documented,
but here we are interested only in predicting the daily pattern of the volume
of transactions. For our exposition, we simplified the analysis of Laukaitis and
Račkauskas (2002), and denote by $D_n(t_i)$ the number of credit card transactions on
day $n$, $n = 1, \ldots, 200$ (03/11/2000 – 10/02/2001), between times $t_{i-1}$ and $t_i$, where
$t_i - t_{i-1} = 8$ min, $i = 1, \ldots, 128$. We thus have $N = 200$ daily curves, which
we view as individual observations. The grid of 8 minutes was chosen for ease of
exposition; Laukaitis and Račkauskas (2002) divide each day into 1024 intervals
of equal length. The transactions are normalized to have time stamps in the interval
$[0, 1]$, which thus corresponds to one day. The leftmost panel of Figure 1.6
shows the $D_n(t_i)$ for two randomly chosen days. The center and right panels show
smoothed functional versions $D_n(t)$ obtained, respectively, with 40 and 80 Fourier
basis functions as follows. Each vector $[D_n(t_1), D_n(t_2), \ldots, D_n(t_{128})]$ is approximated
using sine and cosine functions $B_m(t)$, $t \in [0, 1]$, whose frequencies increase
with $m$. We write this approximation as
\[ D_n(t_i) \approx \sum_{m=1}^{M} c_{nm} B_m(t_i), \quad n = 1, 2, \ldots, N. \]
The trigonometric functions are defined on the whole interval $[0, 1]$, not just at the
points $t_i$, so we can continue to work with truly functional data
\[ Y_n(t) = \sum_{m=1}^{M} c_{nm} B_m(t), \quad n = 1, 2, \ldots, N. \]
In this step, we reduced the number of scalars needed to represent each curve
from 128 to $M$ (40 or 80). If the original data are reported on a scale finer than 8
minutes, the computational gain is even greater. The step of expanding the data with
respect to a fixed basis is, however, often only a preliminary step to further dimension
reduction. The number $M$ is still too large for many matrix operations, and the
choice of the trigonometric basis is somewhat arbitrary; a spline or a wavelet basis
could be used as well. The next step will attempt to construct an “optimal” basis.
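The expansion step can be illustrated with a short base R sketch. It assumes that the coefficients are obtained by ordinary least squares, which the text does not specify; the daily counts are synthetic, and `fourier_basis`, `D_n`, `c_n` and `Y_n` are names introduced only for this illustration.

```r
# Fourier basis on [0, 1]: a constant, then sine/cosine pairs of increasing frequency
fourier_basis <- function(t, M) {
  B <- matrix(1, nrow = length(t), ncol = M)
  for (m in seq_len(M)[-1]) {
    freq <- m %/% 2                       # frequencies 1, 1, 2, 2, ...
    B[, m] <- if (m %% 2 == 0) sin(2 * pi * freq * t) else cos(2 * pi * freq * t)
  }
  B
}

set.seed(4)
t_i <- (1:128) / 128                      # 128 grid points in (0, 1]
D_n <- 100 * exp(-20 * (t_i - 0.5)^2) + rnorm(128, sd = 10)   # synthetic daily counts

M <- 40
B <- fourier_basis(t_i, M)                # 128 x M matrix with entries B_m(t_i)
c_n <- qr.solve(B, D_n)                   # least squares coefficients c_{nm}

# The smooth curve Y_n can now be evaluated at any t in [0, 1], not only on the grid
Y_n <- function(t) drop(fourier_basis(t, length(c_n)) %*% c_n)
```

After this step the curve is represented by the 40 numbers in `c_n` instead of the 128 raw values.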
Before we move on to the next step, we remove the weekly periodicity by computing
the differences $X_n(t) = Y_n(t) - Y_{n-7}(t)$, $n = 8, 9, \ldots, 200$. Figure 1.7
displays the first three weeks of these data. The most important steps of the analysis
are performed on the curves $X_n(t)$, $n = 8, 9, \ldots, 200$, which we view as a
stationary functional time series. Thus, while each $X_n$ is assumed to have the same
distribution in a function space, they are dependent. We assume that each $X_n$ is an
Fig. 1.6 Two functional observations $X_n$ derived from the credit card transactions (left–most
panel) together with smooths obtained by projection on 40 and 80 Fourier basis functions.
element of the space $L^2 = L^2([0, 1])$ of square integrable functions on $[0, 1]$, and
that there is a function $\psi(t, s)$, $t \in [0, 1]$, $s \in [0, 1]$, such that
\[ X_n(t) = \int_0^1 \psi(t, s) X_{n-1}(s)\, ds + \varepsilon_n(t), \]
where the errors $\varepsilon_n$ are iid elements of $L^2$. The above equation extends to the functional
setting the most popular model of time series analysis, the AR(1) model, in
which the scalar observations $X_i$ are assumed to satisfy $X_i = \psi X_{i-1} + \varepsilon_i$, see e.g.
Fig. 1.7 Three weeks of centered time series of $\{X_n(t_i)\}$ derived from credit card transaction
data. The vertical dotted lines separate days. (Horizontal axis: time.)
Chapter 3 of Box et al. (1994). To compute an estimate of the kernel $\psi(t, s)$, the
curves $X_n$ are approximated by an expansion of the form
\[ X_n(t) \approx \sum_{k=1}^{p} \xi_{kn} v_k(t), \]
where the $v_k$ are the functional principal components (FPC’s) discussed in
Chapter 3. Here we note that $p$ is generally much smaller than the number of the
points at which the curves are evaluated (128 in this example) or the number $M$
of basis functions (40 or 80 in this example). The $v_k$ are orthonormal, and form an
“optimal” system for expressing the observations. Laukaitis and Račkauskas (2002)
recommend using $p = 4$ FPC’s. Once an estimator $\hat\psi$ has been constructed, we can
predict $X_n$ via $\hat X_n = \int_0^1 \hat\psi(t, s) X_{n-1}(s)\, ds$ and the transaction volume curves via
\[ \hat Y_{n+1}(t) = Y_{n-6}(t) + \int_0^1 \hat\psi(t, s) [Y_n(s) - Y_{n-7}(s)]\, ds. \]
0
Figure 1.8 shows examples of two curves $Y_n$ ($n = 150$ and $n = 190$) and their
predictions $\hat Y_n$. In general, the predictions tend to underestimate the transaction volume.
This is because even for the scalar AR(1) process, the series of predictions
$\hat X_n = \hat\psi X_{n-1}$ has a smaller range than the observations $X_n = \psi X_{n-1} + \varepsilon_n$. The
problem of prediction of functional time series is studied in detail in Chapter 13.
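The shrinkage of the predictions is easy to see in a small simulation of the scalar AR(1) model. The sketch below uses base R only; the value $\psi = 0.5$, the sample size and the least squares estimator are choices made for this illustration.

```r
set.seed(6)
n <- 2000; psi <- 0.5
eps <- rnorm(n)
x <- numeric(n)
for (i in 2:n) x[i] <- psi * x[i - 1] + eps[i]    # scalar AR(1) recursion

# Least squares estimate of psi and the one-step-ahead predictions
psi_hat <- sum(x[-1] * x[-n]) / sum(x[-n]^2)
x_hat <- psi_hat * x[-n]                          # predictions of x[2], ..., x[n]

c(sd_observed = sd(x[-1]), sd_predicted = sd(x_hat))
```

The predictions are the observations scaled by a factor smaller than one in absolute value, so their spread is correspondingly smaller than that of the series itself.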
This section, based on the work of Leng and Müller (2006), introduces one of many
formulations of the functional linear model. We introduce such models in Chapter
8, and study them in Chapters 9, 10 and 11. Our presentation focuses only on the
central idea and omits many details, which can be found in Leng and Müller (2006)
and Müller and Stadtmüller (2005).
Figure 1.9 shows expression time courses of 90 genes. The expressions are measured
at 18 time points $t_i$ with $t_i - t_{i-1} = 7$ minutes. The genes can be classified as
G1 phase and non–G1 phase. A classification performed using traditional methods
yielded 44 G1 and 46 non–G1 genes. Leng and Müller (2006) proposed a statistical
method of classifying genes based exclusively on their expression trajectories. Their
approach can be summarized as follows.
After rescaling time, each trajectory is viewed as a smooth curve $X_n(t)$, $t \in [0, 1]$,
observed, with some error, at discrete time points $t_i$. It is assumed that the
curves are independent and identically distributed with the mean function $\mu(t) = E X_n(t)$
and the FPC’s $v_k$, so that they admit a representation
\[ X_n(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{kn} v_k(t), \]
with
\[ \xi_{kn} = \int_0^1 (X_n(t) - \mu(t)) v_k(t)\, dt. \]
The unknown curves $X_n$ must be estimated, as outlined below, but the idea is that
the scalars
\[ \eta_n = \alpha + \int_0^1 \beta(t) (X_n(t) - \mu(t))\, dt, \]
1.4 Classification of temporal gene expression data 13
Fig. 1.8 Two credit card transaction volume curves $Y_n$ (solid lines) and their predictions $\hat Y_n$ (dotted
lines)
for some parameters $\alpha$ and $\beta(t)$, $t \in [0, 1]$, can be used to classify the genes as G1
or non–G1. Note that the parameter $\beta$ is a smooth curve. The idea of classification is
that we set a cut–off probability $p_1$, and classify a gene as G1 phase if $h(\eta_n) > p_1$,
where
\[ h(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}. \]
The central issue is thus to approximate the linear predictors $\eta_n$, and this involves
the estimation of the curves $X_n$ and the parameters $\alpha$ and $\beta$.
Fig. 1.9 Temporal gene expression profiles of yeast cell cycle. Dashed lines: G1 phase; Gray solid
lines: non-G1 phases; Black solid line: overall mean curve.
For the curves shown in Figure 1.9, using $p = 5$ is appropriate. Estimation of the
FPC’s $v_k$ involves obtaining a smooth estimate of the covariance surface
i.e.
\[ \eta_n^{(p)}(\alpha, \beta) = \alpha + \sum_{k=1}^{p} \beta_k \hat\xi_{kn}, \]
where
\[ \beta_k = \int_0^1 \beta(t) \hat v_k(t)\, dt, \quad k = 1, 2, \ldots, p. \]
The parameters $\alpha, \beta_1, \ldots, \beta_p$ are estimated using the generalized linear model
\[ Y_n = h\left( \alpha + \sum_{k=1}^{p} \beta_k \hat\xi_{kn} \right) + e_n, \]
1.5 Statistical packages, bases, and functional objects 15
\[ \hat\eta_n = \hat\alpha + \sum_{k=1}^{p} \hat\beta_k \hat\xi_{kn} \]
for any trajectory, and classify the gene as G1 phase if $h(\hat\eta) > p_1$.
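The classification step can be sketched in base R once estimated scores are available. The sketch below is an illustration under assumptions, not the procedure of Leng and Müller (2006): the scores are simulated rather than estimated from trajectories, the generalized linear model is fitted by ordinary logistic regression with `glm`, and the cut-off $p_1 = 0.5$ is chosen arbitrarily.

```r
set.seed(7)
N <- 90; p <- 5
xi_hat <- matrix(rnorm(N * p), N, p)                  # stand-in for estimated scores
h <- function(eta) exp(eta) / (1 + exp(eta))          # logistic function h
eta_true <- drop(0.3 + xi_hat %*% c(1.5, -1, 0.5, 0, 0))
Y <- rbinom(N, size = 1, prob = h(eta_true))          # 1 = G1, 0 = non-G1 (simulated labels)

fit <- glm(Y ~ xi_hat, family = binomial())           # estimates alpha, beta_1, ..., beta_p
eta_hat <- drop(cbind(1, xi_hat) %*% coef(fit))       # estimated linear predictors
p1 <- 0.5                                             # hypothetical cut-off probability
is_G1 <- h(eta_hat) > p1                              # classification rule
```

Lowering or raising `p1` trades off the two types of misclassification; the choice made here is only for illustration.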
Leng and Müller (2006) applied this method to the time courses of 6,178 genes in
the yeast cell cycle, and found that their method compares favorably with an earlier
method. In the training sample of the 90 trajectories, they found 5 genes which their
method classified as non–G1, but the traditional method as G1. They argued that the
traditional method may have classified some of these 5 genes incorrectly.
All procedures described in this book can be implemented in readily available sta-
tistical software without writing additional code in FORTRAN or C++. We have
implemented them using the R package fda. When applied to a single data set,
these procedures are reasonably fast, and never take more than a few minutes on
a single processor laptop or desktop. Some simulations, which require running the
same procedure thousands of times, can however take hours, or days if bootstrap is
involved.
Ramsay et al. (2009) provide a solid introduction to computational issues for
functional data, and numerous examples. Their book not only describes the R package
fda, but also contains many examples implemented in Matlab. Clarkson et al.
(2005) describe the implementation in S+.
Throughout this book we often refer to the choice of a basis and the number of
basis functions. This is an important step in the analysis of functional data, which is
often not addressed in detail in the subsequent chapters, so we explain it here. This
is followed by brief comments on sparsely observed data.
We assume that the collected raw data are already cleaned and organized. Let t
be the one-dimensional argument. Functions of t are observed at discrete sampling
values $t_j$, $j = 1, \ldots, J$, which may or may not be equally spaced. We work with
$N$ functions with indexes $i = 1, \ldots, N$; these are our functional data. These data
are converted to the functional form, i.e. a functional object is created. In order to
do this, we need to specify a basis. A basis is a system of basis functions, a linear
combination of which defines the functional objects. The elements of a basis may
or may not be orthogonal. We express a functional observation $X_i$ as
\[ X_i(t) \approx \sum_{k=1}^{K} c_{ik} \phi_k(t), \]
where the $\phi_k$, $k = 1, \ldots, K$, are the basis functions. One of the advantages of this
approach is that instead of storing all the data points, one stores the coefficients of
the expansion, i.e. the $c_{ik}$. As indicated in Section 1.3, this step thus involves an
initial dimension reduction and some smoothing. It is also critical for all subsequent
computations which are performed on the matrices built from the coefficients $c_{ik}$.
The number $K$ of the basis functions impacts the performance of some procedures,
but others are fairly insensitive to its choice. We discuss this issue in subsequent
chapters on a case by case basis. We generally choose K so that the plotted func-
tional objects resemble original data with some smoothing that eliminates the most
obvious noise. If the performance of a test depends on K, we indicate what values
of K give correct size. The choice of the basis is typically important. We work in
this book with two systems: the Fourier basis and the B–spline basis. The Fourier
basis is usually used for periodic, or nearly periodic, data. Fourier series are useful
for expanding functions with no strong local features and a roughly constant curva-
ture. They are inappropriate for data with discontinuities in the function itself or in
low order derivatives. The B-spline basis is typically used for non-periodic locally
smooth data. Spline coefficients are fast to compute and B–splines form a very flex-
ible system, so a good approximation can be achieved with a relatively small K.
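What the expansion in a B-spline basis amounts to can be seen without the fda package, using the `splines` package that ships with R. The sketch below is an illustration only: the record is synthetic, the coefficients are computed by ordinary least squares, and the knots are those chosen by default by `bs`, which need not agree with the knots used by fda.

```r
library(splines)   # part of the standard R distribution

set.seed(8)
J <- 1440
t_j <- seq_len(J)                                     # minutes within a day
x <- 50 * sin(2 * pi * t_j / J) + rnorm(J, sd = 5)    # synthetic noisy record

K <- 49
# Plain J x K matrix holding the values of the K cubic B-spline basis functions at t_j
Phi <- matrix(bs(t_j, df = K, degree = 3, intercept = TRUE), nrow = J)
c_k <- qr.solve(Phi, x)                               # least squares coefficients
x_smooth <- drop(Phi %*% c_k)                         # smoothed record on the grid
```

The 1440 raw values are thus replaced by 49 coefficients, and the residuals `x - x_smooth` consist mostly of the noise that was added.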
In R, bases are created with calls like:
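library(fda)  # provides the basis constructors used below
minutebasis <- create.fourier.basis(rangeval = c(0, 1440), nbasis = 49)  # Fourier basis
minutebasis <- create.bspline.basis(rangeval = c(0, 1440), nbasis = 49)  # alternatively, a B-spline basis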
The parameter rangeval is a vector containing the initial and final values of
the argument t. The bases created above will be used for magnetometer data, which
consist of 1440 data points per day. These data are in one minute resolution, and
there are 1440 minutes in a day. The argument nbasis is the number of basis
functions.
Once a basis is created, the data are converted into functional objects. This is
needed to reduce the computational burden; only the coefficients $c_{ik}$ are used after
this conversion. In our example, the reduction is from 1440 to 49 numbers. In order
to convert raw data into a functional object the function data2fd is used. The code
below produces Figure 1.10. The data are the daily records of the magnetic intensity
stored in the matrix data.
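library(fda)
# Observation times: one value per minute of the day
minutetime <- seq(from = 1, to = 1440, by = 1)
minutebasis <- create.bspline.basis(rangeval = c(0, 1440), nbasis = 69)
# Convert the raw data matrix into a functional data object
# (data2fd is the older name; newer fda releases may provide it as Data2fd)
data.fd <- data2fd(data, minutetime, basisobj = minutebasis)
plot.fd(data.fd, col = "black")
title("Functional data, March -- April, 2001")
# Overlay the mean function as a thick line
mean.function <- mean.fd(data.fd)
lines(mean.function, lwd = 7)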
The fda package contains a variety of display functions and summary statis-
tics such as plot.fd, mean.fd, var.fd, sd.fd, center.fd, etc. All these
functions use functional objects as input.
The data we work with in this book are available at very densely spaced (typ-
ically equispaced) and numerous points tj (often over a thousand per curve). We
Fig. 1.10 31 magnetic intensity functions with the mean function (thick line). (Horizontal axis: hours.)
In this section we follow closely the exposition in Bosq (2000). Good references on
Hilbert spaces are Riesz and Sz.-Nagy (1990), Akhiezer and Glazman (1993) and
Debnath and Mikusinski (2005). An in-depth theory of operators in a Hilbert space
is developed in Gohberg et al. (1990), where the proofs of all results stated in this
section can be found.
We consider a separable Hilbert space $H$ with inner product $\langle \cdot, \cdot \rangle$ which
generates the norm $\|\cdot\|$, and denote by $\mathcal{L}$ the space of bounded (continuous) linear
operators on $H$ with the norm
$$\|\Psi\|_{\mathcal{L}} = \sup\{\|\Psi(x)\| : \|x\| \le 1\}.$$
The $\lambda_j$ may be assumed positive because one can replace $f_j$ by $-f_j$, if needed.
The existence of representation (2.1) is equivalent to the condition: $\Psi$ maps every
bounded set into a compact set. Another equivalent condition is the following: the
convergence $\langle y, x_n \rangle \to \langle y, x \rangle$ for every $y \in H$ implies that $\|\Psi(x_n) - \Psi(x)\| \to 0$.
$$\|\Psi\|_{\mathcal{L}} \le \|\Psi\|_{\mathcal{S}}. \tag{2.3}$$
and positive-definite if
$$\langle \Psi(x), x \rangle \ge 0, \quad x \in H.$$
(An operator with the last property is sometimes called positive semidefinite, and
the term positive-definite is used when $\langle \Psi(x), x \rangle > 0$ for $x \ne 0$.)
A symmetric positive-definite Hilbert–Schmidt operator $\Psi$ admits the decomposition
$$\Psi(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j \rangle v_j, \quad x \in H, \tag{2.4}$$
The space $L^2 = L^2([0,1])$ is the set of measurable real-valued functions $x$ defined
on $[0,1]$ satisfying $\int_0^1 x^2(t)\,dt < \infty$. The space $L^2$ is a separable Hilbert space with
the inner product
$$\langle x, y \rangle = \int x(t) y(t)\,dt.$$
An integral sign without the limits of integration is meant to denote the integral
over the whole interval $[0,1]$. If $x, y \in L^2$, the equality $x = y$ always means
$\int [x(t) - y(t)]^2\,dt = 0$.
2.3 Random elements in L2 and the covariance operator 23
with the real kernel $\psi(\cdot,\cdot)$. Such operators are Hilbert–Schmidt if and only if
$$\iint \psi^2(t,s)\,dt\,ds < \infty,$$
in which case
$$\|\Psi\|_{\mathcal{S}}^2 = \iint \psi^2(t,s)\,dt\,ds. \tag{2.5}$$
If $\psi(s,t) = \psi(t,s)$ and $\iint \psi(t,s) x(t) x(s)\,dt\,ds \ge 0$, the integral operator $\Psi$ is
symmetric and positive-definite, and it follows from (2.4) that
$$\psi(t,s) = \sum_{j=1}^{\infty} \lambda_j v_j(t) v_j(s) \quad \text{in } L^2([0,1] \times [0,1]). \tag{2.6}$$
If $\psi$ is continuous, the above expansion holds for all $s, t \in [0,1]$, and the series
converges uniformly. This result is known as Mercer’s theorem, see e.g. Riesz and
Sz.-Nagy (1990).
$$C(y) = E[\langle X, y \rangle X], \quad y \in L^2.$$
The eigenfunctions $v_j$ are orthogonal, and they can be normalized to have unit norm,
so that $\{v_j\}$ forms a basis in $L^2$. Consequently, by Parseval’s equality,
$$\sum_{j=1}^{\infty} \lambda_j = \sum_{j=1}^{\infty} E\left[\langle X, v_j \rangle^2\right] = E\|X\|^2 < \infty. \tag{2.7}$$
One can show that, in fact, $C \in \mathcal{L}(L^2)$ is a covariance operator if and only if it is
symmetric positive-definite and its eigenvalues satisfy $\sum_{j=1}^{\infty} \lambda_j < \infty$.
To give a specific example of a bounded, symmetric, positive-definite operator
which is not a covariance operator, consider an arbitrary orthonormal basis
$\{e_j, j \ge 1\}$, so that every $x \in L^2$ can be expanded as $x = \sum_j \langle x, e_j \rangle e_j$. Define
$$\Psi(x) = \sum_j \langle x, e_j \rangle j^{-1} e_j.$$
Since
$$\langle \Psi(x), x \rangle = \sum_j \langle x, e_j \rangle^2 j^{-1} \ge 0,$$
Assumption 2.1. The observations $X_1, X_2, \ldots, X_N$ are iid in $L^2$, and have the same
distribution as $X$, which is assumed to be square integrable.
$$\hat\mu(t) = N^{-1} \sum_{i=1}^{N} X_i(t),$$
$$\hat c(t,s) = N^{-1} \sum_{i=1}^{N} \left(X_i(t) - \hat\mu(t)\right)\left(X_i(s) - \hat\mu(s)\right),$$
$$\hat C(x) = N^{-1} \sum_{i=1}^{N} \langle X_i - \hat\mu, x \rangle (X_i - \hat\mu), \quad x \in L^2.$$
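When the curves are observed on a common grid, these estimators reduce to matrix operations. The following base R sketch is an illustration only: the data are synthetic, the integral defining $\hat C(x)$ is replaced by a Riemann sum (an assumption of this sketch), and the names mu_hat, c_hat and C_hat are hypothetical.

```r
set.seed(2)
N <- 50
tgrid <- seq(0, 1, length.out = 101)
# Rows of X are synthetic curves X_i evaluated on tgrid
X <- t(replicate(N, rnorm(1) * sin(2 * pi * tgrid) +
                    rnorm(1) * cos(2 * pi * tgrid) + tgrid))

mu_hat <- colMeans(X)             # sample mean function on the grid
Xc     <- sweep(X, 2, mu_hat)     # centered curves X_i - mu_hat
c_hat  <- crossprod(Xc) / N       # sample covariance function, divisor N

# Sample covariance operator applied to a function x given on the grid:
# integral of c_hat(t, s) x(s) ds, approximated by a Riemann sum
h <- tgrid[2] - tgrid[1]
C_hat <- function(x) as.vector(c_hat %*% x) * h

# The divisor N (rather than N - 1) makes c_hat a rescaled version of cov(X)
all.equal(c_hat, cov(X) * (N - 1) / N)
```

The last line makes the divisor explicit: c_hat differs from the usual unbiased covariance matrix only by the factor (N - 1)/N.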
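Theorem 2.3. If Assumption 2.1 holds, then $E\hat\mu = \mu$ and $E\|\hat\mu - \mu\|^2 = O(N^{-1})$.
Proof. For every $i$, for almost all $t \in [0,1]$, $EX_i(t) = \mu(t)$, so it follows that
$E\hat\mu = \mu$ in $L^2$. Observe that
$$E\|\hat\mu - \mu\|^2 = N^{-2} \sum_{i,j=1}^{N} E\left[\langle X_i - \mu, X_j - \mu \rangle\right] = N^{-2} \sum_{i=1}^{N} E\|X_i - \mu\|^2 = N^{-1} E\|X - \mu\|^2. \qquad \square$$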
In the proof we used the following lemma which follows from conditioning on
X2 and the definition of expectation in a Hilbert space, see Section 2.3.
2.4 Estimation of mean and covariance functions 27
$$E[\hat c(t,s)] = \frac{N-1}{N}\, c(t,s) \quad (\text{in } L^2([0,1] \times [0,1])).$$
The bias of $\hat c$ is asymptotically negligible, and is introduced by the estimation of
the mean function $\mu$. Replacing $\mu$ by $\hat\mu$ in general has a negligible effect, and in
theoretical work, it is convenient to assume that $\mu$ is known and equal to zero. This
simplifies many formulas. When applying such results to real data, it is important to
remember to first subtract the sample mean function $\hat\mu$ from functional observations.
From now on, except when explicitly stated, we thus assume that the observations
have mean zero. We therefore have
$$\hat c(t,s) = N^{-1} \sum_{i=1}^{N} X_i(t) X_i(s); \qquad \hat C(x) = N^{-1} \sum_{i=1}^{N} \langle X_i, x \rangle X_i$$
and
$$\hat C(x)(t) = \int \hat c(t,s) x(s)\,ds, \quad x \in L^2. \tag{2.9}$$
We will see in Theorem 2.4 that $E\|X\|^4 < \infty$ implies $E\|\hat C\|_{\mathcal{S}}^2 < \infty$, where $\|\cdot\|_{\mathcal{S}}$ is
the Hilbert–Schmidt norm. By (2.5) and (2.9), this implies that with probability one
$\hat c(\cdot,\cdot) \in L^2([0,1] \times [0,1])$ because then $E \iint \hat c^2(t,s)\,dt\,ds < \infty$. The assumption
$E\|X\|^4 < \infty$ is however only a sufficient condition. A direct verification shows that
if for each $1 \le n \le N$, $X_n(\cdot) \in L^2([0,1])$ a.s., then $\hat c(\cdot,\cdot) \in L^2([0,1] \times [0,1])$ a.s.
28 2 Hilbert space model for functional data
$$\|\langle X, \cdot \rangle X\|_{\mathcal{S}}^2 = \sum_{j=1}^{\infty} \|\langle X, e_j \rangle X\|^2 = \|X\|^2 \sum_{j=1}^{\infty} |\langle X, e_j \rangle|^2 = \|X\|^4. \qquad \square$$
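$$\begin{aligned}
\|\hat C - C\|_{\mathcal{S}}^2
&= \sum_{j=1}^{\infty} \left\| \frac{1}{N} \sum_{n=1}^{N} \langle X_n, e_j \rangle X_n - E[\langle X, e_j \rangle X] \right\|^2 \\
&= \sum_{j=1}^{\infty} \left\langle \frac{1}{N} \sum_{n=1}^{N} \left\{ \langle X_n, e_j \rangle X_n - E[\langle X_n, e_j \rangle X_n] \right\},\
\frac{1}{N} \sum_{m=1}^{N} \left\{ \langle X_m, e_j \rangle X_m - E[\langle X_m, e_j \rangle X_m] \right\} \right\rangle \\
&= \frac{1}{N^2} \sum_{j=1}^{\infty} \sum_{n=1}^{N} \sum_{m=1}^{N}
\left\langle \langle X_n, e_j \rangle X_n - E[\langle X_n, e_j \rangle X_n],\
\langle X_m, e_j \rangle X_m - E[\langle X_m, e_j \rangle X_m] \right\rangle .
\end{aligned}$$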
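Therefore,
$$\begin{aligned}
E\|\hat C - C\|_{\mathcal{S}}^2
&= \frac{1}{N^2} \sum_{j=1}^{\infty} \sum_{n=1}^{N} E\left\| \langle X_n, e_j \rangle X_n - E[\langle X_n, e_j \rangle X_n] \right\|^2 \\
&= \frac{1}{N} \sum_{j=1}^{\infty} E\left\| \langle X, e_j \rangle X - E[\langle X, e_j \rangle X] \right\|^2 \\
&\le \frac{1}{N} \sum_{j=1}^{\infty} E\left\| \langle X, e_j \rangle X \right\|^2 \\
&= N^{-1} E\left[ \|X\|^2 \sum_{j=1}^{\infty} |\langle X, e_j \rangle|^2 \right] \\
&= N^{-1} E\|X\|^4 .
\end{aligned}$$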
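Let $k = k_N$ be a sequence of integers satisfying $1 \le k_N \le N$ and
$$\lim_{N \to \infty} \frac{k_N}{N} = \theta, \quad \text{for some } 0 \le \theta \le 1. \tag{2.10}$$
Define
$$\hat c_N(t,s) = \frac{1}{N} \left( \sum_{1 \le i \le k} X_i(t) X_i(s) + \sum_{k < i \le N} Y_i(t) Y_i(s) \right)$$
and
$$c_\theta(t,s) = \theta E[X(t)X(s)] + (1 - \theta) E[Y(t)Y(s)].$$
$$E\|\hat C_N - C_\theta\|_{\mathcal{S}}^2 \to 0.$$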
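Hence
$$\begin{aligned}
&\left\| \hat C_N - \left[ \frac{k}{N} E[X(t)X(s)] + \frac{N-k}{N} E[Y(t)Y(s)] \right] \right\|_{\mathcal{S}}^2 \\
&\quad \le 2 \left\{ \frac{k^2}{N^2} \left\| \frac{1}{k} \sum_{1 \le i \le k} X_i(t) X_i(s) - E[X(t)X(s)] \right\|_{\mathcal{S}}^2
+ \left( \frac{N-k}{N} \right)^2 \left\| \frac{1}{N-k} \sum_{k < i \le N} Y_i(t) Y_i(s) - E[Y(t)Y(s)] \right\|_{\mathcal{S}}^2 \right\}.
\end{aligned}$$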
2.5 Estimation of the eigenvalues and the eigenfunctions 31
on account of (2.10). $\square$
We often must estimate the eigenvalues and eigenfunctions of $C$, but the interpretation
of these quantities as parameters, and their estimation, must be approached with
care. The eigenvalues must be identifiable, so we must assume that $\lambda_1 > \lambda_2 > \cdots$.
In practice, we can estimate only the $p$ largest eigenvalues, and assume that
$\lambda_1 > \lambda_2 > \cdots > \lambda_p > \lambda_{p+1}$, which implies that the first $p$ eigenvalues are nonzero. The
eigenfunctions $v_j$ are defined by $C v_j = \lambda_j v_j$, so if $v_j$ is an eigenfunction, then so
is $a v_j$, for any nonzero scalar $a$ (by definition, eigenfunctions are nonzero). The $v_j$
are typically normalized, so that $\|v_j\| = 1$, but this does not determine the sign of
$v_j$. Thus if $\hat v_j$ is an estimate computed from the data, we can only hope that $\hat c_j \hat v_j$ is
close to $v_j$, where
$$\hat c_j = \mathrm{sign}(\langle \hat v_j, v_j \rangle).$$
Note that $\hat c_j$ cannot be computed from the data, so it must be ensured that the
statistics we want to work with do not depend on $\hat c_j$.
With these preliminaries in mind, we define the estimated eigenelements by
$$\int \hat c(t,s) \hat v_j(s)\,ds = \hat\lambda_j \hat v_j(t), \quad j = 1, 2, \ldots, N. \tag{2.11}$$
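On a grid, the integral equation (2.11) becomes a matrix eigenproblem. The sketch below is a minimal base R illustration under stated assumptions: the curves are synthetic with known eigenfunctions (so that the sign $\hat c_j$ can be computed, which is impossible with real data), and integrals are approximated by Riemann sums with step h.

```r
set.seed(3)
N <- 200
tgrid <- seq(0, 1, length.out = 201)
h <- tgrid[2] - tgrid[1]
v1 <- sqrt(2) * sin(2 * pi * tgrid)   # true eigenfunctions, unit L2 norm
v2 <- sqrt(2) * cos(2 * pi * tgrid)
# Mean-zero synthetic curves with population eigenvalues 4 and 1
X <- outer(rnorm(N, sd = 2), v1) + outer(rnorm(N, sd = 1), v2)

c_hat <- crossprod(X) / N                      # sample covariance function on the grid
eig   <- eigen(c_hat * h, symmetric = TRUE)    # discretized version of (2.11)
lambda_hat <- eig$values[1:2]
v_hat      <- eig$vectors[, 1:2] / sqrt(h)     # rescaled to unit L2 norm

# Sign alignment; possible here only because v1 and v2 are known
c1 <- sign(sum(v_hat[, 1] * v1) * h)
c2 <- sign(sum(v_hat[, 2] * v2) * h)
err1 <- sqrt(sum((c1 * v_hat[, 1] - v1)^2) * h)   # L2 distance to v1
err2 <- sqrt(sum((c2 * v_hat[, 2] - v2)^2) * h)   # L2 distance to v2
```

Without the factors c1 and c2 the distances could be close to 2 rather than close to 0, which is why statistics built from the estimated eigenfunctions should not depend on their signs.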
We will often use the following result established in Dauxois et al. (1982) and
Bosq (2000). Its proof is presented in Section 2.7.
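We also define
$$\hat c_{j,\theta} = \mathrm{sign}(\langle \hat v_j, v_{j,\theta} \rangle)$$
and state the following theorem.
Theorem 2.8. Suppose Assumption 2.2 and condition (2.10) hold, and
$$\lambda_{1,\theta} > \lambda_{2,\theta} > \cdots > \lambda_{p,\theta} > \lambda_{p+1,\theta}.$$
Then, for each $1 \le j \le p$,
$$E\left[ \|\hat c_{j,\theta} \hat v_j - v_{j,\theta}\|^2 \right] \to 0 \quad \text{and} \quad E|\hat\lambda_j - \lambda_{j,\theta}|^2 \to 0.$$
Proof. The result follows from Theorem 2.6 and Lemmas 2.2 and 2.3 because the
kernel $c_\theta(\cdot,\cdot)$ is symmetric. $\square$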
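Theorem 2.9. If Assumption 2.1 holds with $EX(t) = 0$ and $E\|X\|^4 < \infty$, then
$Z_N(t,s)$ converges weakly in $L^2([0,1] \times [0,1])$ to a Gaussian process $\Gamma(t,s)$ with
$E\Gamma(t,s) = 0$ and
Proof. Writing
$$Z_N(t,s) = N^{-1/2} \sum_{n=1}^{N} [X_n(t) X_n(s) - c(t,s)],$$
because
$$\begin{aligned}
E \iint (X(t)X(s))^2\,dt\,ds &= E \int X^2(t)\,dt \int X^2(s)\,ds \\
&\le \left\{ E\left( \int X^2(t)\,dt \right)^2 E\left( \int X^2(s)\,ds \right)^2 \right\}^{1/2} = E\|X\|^4 < \infty. \qquad \square
\end{aligned}$$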
We note that if the $X_n$ are strongly mixing random functions, then the functions
$X_n(t) X_n(s)$ are also strongly mixing with the same rate. Hence, assuming some
moment conditions, for example those in Theorem 2.17 of Bosq (2000), the weak
convergence of the sequence $Z_N$ can also be established in the dependent case.
Since $Z_N(\cdot,\cdot)$ converges weakly in the space $L^2([0,1] \times [0,1])$, the asymptotic
normality of $\hat\lambda_j - \lambda_j$ and $\hat c_j \hat v_j - v_j$ is an immediate consequence of Theorem 2.10
which follows from the results of Hall and Hosseini-Nasab (2006), see also Hall and
Hosseini-Nasab (2007). First we state the required conditions.
Assumption 2.3. The random function $X \in L^2$ which has the same distribution as
the $X_n$ satisfies the following conditions:
C1: For all $C > 0$, $\sup_{0 \le t \le 1} E|X(t)|^C < \infty$;
C2: there is $\epsilon > 0$ such that for all $C > 0$
Theorem 2.10. If Assumptions 2.1 and 2.3 and condition (2.12) hold, then for
$1 \le j \le p$,
$$N^{1/2}(\hat\lambda_j - \lambda_j) = \iint Z_N(t,s) v_j(t) v_j(s)\,dt\,ds + o_P(1)$$
and
$$\sup_{0 \le t \le 1} \left| N^{1/2}(\hat c_j \hat v_j(t) - v_j(t)) - \hat T_j(t) \right| = o_P(1),$$
where
$$\hat T_j(t) = \sum_{k \ne j} (\lambda_j - \lambda_k)^{-1} v_k(t) \iint Z_N(t,s) v_j(t) v_k(s)\,dt\,ds.$$
The proof of Theorem 2.7 is based on Lemmas 2.2 and 2.3. These lemmas have
wider applicability, and will be used in subsequent chapters. They state that if the
operators are close, then their eigenvalues and eigenfunctions (adjusted for the sign)
are also close.
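This closeness can be checked numerically in a finite-dimensional setting. The sketch below is an illustration, not part of the proofs: it uses symmetric matrices in place of compact operators, the spectral norm in place of $\|\cdot\|_{\mathcal{L}}$, and the bounds $|\gamma_j - \lambda_j| \le \|K - C\|_{\mathcal{L}}$ and $\|u_j - c_j v_j\| \le 2\sqrt{2}\,\alpha_j^{-1} \|K - C\|_{\mathcal{L}}$ stated in Lemmas 2.2 and 2.3 below. All object names are hypothetical.

```r
set.seed(4)
p <- 5
Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))      # random orthonormal basis
C <- Q %*% diag(c(5, 4, 3, 2, 1)) %*% t(Q)     # symmetric, distinct eigenvalues
E <- matrix(rnorm(p * p, sd = 0.05), p, p)
K <- C + (E + t(E)) / 2                        # small symmetric perturbation of C

eC <- eigen(C, symmetric = TRUE)
eK <- eigen(K, symmetric = TRUE)
op_norm <- norm(K - C, type = "2")             # spectral (operator) norm

# Eigenvalue bound: |gamma_j - lambda_j| <= ||K - C||
eig_gap <- abs(eK$values - eC$values)

# Eigenvector bound for j = 1, ..., 4, with alpha_j the gap to neighbouring eigenvalues
lambda <- eC$values
alpha <- sapply(1:4, function(j) {
  if (j == 1) { lambda[1] - lambda[2] } else {
    min(lambda[j - 1] - lambda[j], lambda[j] - lambda[j + 1])
  }
})
vec_err <- sapply(1:4, function(j) {
  cj <- sign(sum(eK$vectors[, j] * eC$vectors[, j]))   # sign adjustment
  sqrt(sum((eK$vectors[, j] - cj * eC$vectors[, j])^2))
})
bound <- 2 * sqrt(2) * op_norm / alpha

rbind(vec_err = vec_err, bound = bound)
```

In this example the eigenvector errors are typically far below the bound; the bound deteriorates only when two eigenvalues are nearly equal, i.e. when some $\alpha_j$ is small.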
Consider two compact operators $C, K \in \mathcal{L}$ with singular value decompositions
$$C(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j \rangle f_j, \qquad K(x) = \sum_{j=1}^{\infty} \gamma_j \langle x, u_j \rangle g_j. \tag{2.14}$$
The following lemma is proven in Section VI.1 of Gohberg et al. (1990), see their
Corollary 1.6 on p. 99.
Lemma 2.2. Suppose $C, K \in \mathcal{L}$ are two compact operators with singular value
decompositions (2.14). Then, for each $j \ge 1$, $|\gamma_j - \lambda_j| \le \|K - C\|_{\mathcal{L}}$.
We now tighten the conditions on the operator $C$ by assuming that it is symmetric
and $C(v_j) = \lambda_j v_j$, i.e. $f_j = v_j$ in (2.14). Notice that any covariance operator $C$
satisfies these conditions. We also define
$$v_j' = c_j v_j, \qquad c_j = \mathrm{sign}(\langle u_j, v_j \rangle).$$
Lemma 2.3. Suppose $C, K \in \mathcal{L}$ are two compact operators with singular value
decompositions (2.14). If $C$ is symmetric, $f_j = v_j$ in (2.14), and its eigenvalues
satisfy (2.12), then
$$\|u_j - v_j'\| \le \frac{2\sqrt{2}}{\alpha_j} \|K - C\|_{\mathcal{L}}, \quad 1 \le j \le p,$$
where $\alpha_1 = \lambda_1 - \lambda_2$ and $\alpha_j = \min(\lambda_{j-1} - \lambda_j, \lambda_j - \lambda_{j+1})$, $2 \le j \le p$.
Proof. For a fixed $1 \le j \le p$, introduce the following quantities
$$D_j = \|C(u_j) - \lambda_j u_j\|, \qquad S_j = \sum_{k \ne j} \langle u_j, v_k \rangle^2.$$
and
$$D_j \le 2\|K - C\|_{\mathcal{L}}. \tag{2.17}$$
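Verification of (2.15): By the Parseval identity
$$\|u_j - v_j'\|^2 = \sum_{k=1}^{\infty} \left( \langle u_j, v_k \rangle - c_j \langle v_j, v_k \rangle \right)^2 = (\langle u_j, v_j \rangle - c_j)^2 + S_j.$$
If $c_j = 0$, then (2.15) clearly holds, since in this case $\|u_j - v_j'\| = \|u_j\| = 1$
and $S_j = \|u_j\|^2 - \langle u_j, v_j \rangle^2 = 1$.
If $|c_j| = 1$, then $(\langle u_j, v_j \rangle - c_j)^2 = (1 - |\langle u_j, v_j \rangle|)^2$, and using the identity
$\sum_k \langle u_j, v_k \rangle^2 = 1$, we obtain
$$(1 - |\langle u_j, v_j \rangle|)^2 = \sum_{k=1}^{\infty} \langle u_j, v_k \rangle^2 - 2|\langle u_j, v_j \rangle| + \langle u_j, v_j \rangle^2.$$
Thus, if $|c_j| = 1$,
$$(\langle u_j, v_j \rangle - c_j)^2 = S_j + 2\left( \langle u_j, v_j \rangle^2 - |\langle u_j, v_j \rangle| \right) \le S_j.$$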
In the R package fda, and in most other numerical implementations, the curves
$X_i$ are smoothed before the estimates $\hat\mu(t)$ and $\hat c(t,s)$ introduced in Section 2.4 are
This chapter introduces one of the most fundamental concepts of FDA, that of the
functional principal components (FPC’s). FPC’s allow us to reduce the dimension of
infinite-dimensional functional data to a small finite dimension in an optimal way.
In Sections 3.1 and 3.2, we introduce the FPC’s from two angles, as coordinates
maximizing variability, and as an optimal orthonormal basis. In Section 3.3, we
identify the FPC’s with the eigenfunctions of the covariance operator, and show
how its eigenvalues decompose the variance of the functional data. We conclude
with Section 3.4 which explains how to compute the FPC’s in the R package fda.
In this section we present some preliminary results which are fundamental for the
remainder of this chapter. To motivate, we begin with a vector case, and then move
on to the space L2 .
We first state the following well–known result, see e.g. Chapter 6 of Leon (2006).
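$$\mathbf{U}^T \mathbf{U} = \mathbf{I} \quad \text{and} \quad \mathbf{A} \mathbf{u}_j = \lambda_j \mathbf{u}_j.$$
Moreover,
$$\mathbf{U}^T \mathbf{A} \mathbf{U} = \boldsymbol{\Lambda} = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_p].$$
$\lambda_1 > \lambda_2 > \cdots > \lambda_p$. We want to find a unit length vector $\mathbf{x}$ such that $\mathbf{x}^T \mathbf{A} \mathbf{x}$ is
maximum. By the spectral decomposition, $\mathbf{x}^T \mathbf{A} \mathbf{x} = \mathbf{y}^T \boldsymbol{\Lambda} \mathbf{y}$, where $\mathbf{y} = \mathbf{U}^T \mathbf{x}$. Since
$\mathbf{U}$ is orthonormal, $\|\mathbf{y}\| = \|\mathbf{x}\|$, so it is enough to find a unit length vector $\mathbf{y}$ such
that $\mathbf{y}^T \boldsymbol{\Lambda} \mathbf{y}$ is maximum, and then set $\mathbf{x} = \mathbf{U}\mathbf{y}$. Since $\mathbf{y}^T \boldsymbol{\Lambda} \mathbf{y} = \sum_{j=1}^{p} \lambda_j y_j^2$, clearly
$\mathbf{y} = [1, 0, \ldots, 0]^T$, and $\mathbf{x} = \mathbf{u}_1$ with the maximum being $\lambda_1$.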
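The vector case is easy to verify numerically. The following base R sketch is an illustration with a synthetic matrix: it builds a symmetric $\mathbf{A}$ with eigenvalues 4 > 3 > 2 > 1 and compares the quadratic form at the leading eigenvector with its values at many random unit vectors. The helper quad is hypothetical.

```r
set.seed(5)
p <- 4
Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))
A <- Q %*% diag(c(4, 3, 2, 1)) %*% t(Q)    # symmetric, eigenvalues 4 > 3 > 2 > 1

eA <- eigen(A, symmetric = TRUE)
u1 <- eA$vectors[, 1]                      # leading eigenvector, unit length
quad <- function(x) as.numeric(t(x) %*% A %*% x)

# Quadratic form at 1000 random unit vectors
rand_vals <- replicate(1000, {
  x <- rnorm(p)
  quad(x / sqrt(sum(x^2)))
})

c(at_u1 = quad(u1), largest_random = max(rand_vals))
```

The value at u1 equals the largest eigenvalue, and no random unit vector exceeds it.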
The above ideas can be easily extended to a separable Hilbert space, where they
become even more transparent. Suppose $\Psi$ is a symmetric positive-definite Hilbert–Schmidt
operator in $L^2$. We have seen in Section 2.4 that the covariance operator $C$
and its sample counterpart $\hat C$ are in this class, provided $E\|X\|^4 < \infty$. The operator
$\Psi$ then admits the spectral decomposition (2.4), and the problem of maximizing
$\langle \Psi(x), x \rangle$ subject to $\|x\| = 1$ becomes trivial because
$$\langle \Psi(x), x \rangle = \left\langle \sum_{j=1}^{\infty} \lambda_j \langle x, v_j \rangle v_j, x \right\rangle = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j \rangle^2.$$
By Parseval’s equality, we must maximize the above, subject to $\sum_{j=1}^{\infty} \langle x, v_j \rangle^2 = 1$.
To ensure uniqueness, suppose $\lambda_1 > \lambda_2 > \cdots$, so we take $\langle x, v_1 \rangle^2 = 1$ and
$\langle x, v_j \rangle = 0$ for $j > 1$. Thus, $\langle \Psi(x), x \rangle$ is maximized at $v_1$ (or $-v_1$), and the maximum
is $\lambda_1$. Suppose now that we want to maximize $\langle \Psi(x), x \rangle$ subject not only to
the condition $\|x\| = 1$, but also to $\langle x, v_1 \rangle = 0$. Thus we want to find another unit
norm function which is orthogonal to the function found in the first step. Such a
function clearly satisfies $\langle \Psi(x), x \rangle = \sum_{j=2}^{\infty} \lambda_j \langle x, v_j \rangle^2$ and $\sum_{j=2}^{\infty} \langle x, v_j \rangle^2 = 1$,
so $x = v_2$, and the maximum now is $\lambda_2$. Repeating this procedure, we arrive at the
following theorem.
Theorem 3.2. Suppose $\Psi$ is a symmetric, positive definite Hilbert–Schmidt operator
with eigenfunctions $v_j$ and eigenvalues $\lambda_j$ satisfying (2.12). Then,
$$\sup\left\{ \langle \Psi(x), x \rangle : \|x\| = 1,\ \langle x, v_j \rangle = 0,\ 1 \le j \le i-1,\ i < p \right\} = \lambda_i,$$
and the supremum is reached if $x = v_i$. The maximizing function is unique up to a
sign.
The approach developed in the previous section can be applied to the following
important problem. Suppose we observe functions $x_1, x_2, \ldots, x_N$. In this section it
is not necessary to view these functions as random, but we can think of them as the
observed realizations of random functions in $L^2$. Fix an integer $p < N$. We think
of $p$ as being much smaller than $N$, typically a single digit number. We want to find
an orthonormal basis $u_1, u_2, \ldots, u_p$ such that
$$\hat S^2 = \sum_{i=1}^{N} \left\| x_i - \sum_{k=1}^{p} \langle x_i, u_k \rangle u_k \right\|^2$$
is minimum.
3.3 Functional principal components 39
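For $p = 1$ and a function $u$ with $\|u\| = 1$,
$$\sum_{i=1}^{N} \|x_i - \langle x_i, u \rangle u\|^2 = \sum_{i=1}^{N} \|x_i\|^2 - \sum_{i=1}^{N} \langle x_i, u \rangle^2,$$
so $u$ minimizes the left-hand side if and only if it maximizes
$\sum_{i=1}^{N} \langle x_i, u \rangle^2 = N \langle \hat C(u), u \rangle$. By Theorem 3.2, we conclude that $u = \hat v_1$.
The general case is treated analogously. Since
$$\hat S^2 = \sum_{i=1}^{N} \|x_i\|^2 - \sum_{i=1}^{N} \sum_{k=1}^{p} \langle x_i, u_k \rangle^2,$$
we need to maximize
$$\begin{aligned}
\sum_{k=1}^{p} \sum_{i=1}^{N} \langle x_i, u_k \rangle^2 &= N \sum_{k=1}^{p} \langle \hat C(u_k), u_k \rangle \\
&= N \left( \sum_{j=1}^{\infty} \hat\lambda_j \langle u_1, \hat v_j \rangle^2 + \sum_{j=1}^{\infty} \hat\lambda_j \langle u_2, \hat v_j \rangle^2 + \cdots + \sum_{j=1}^{\infty} \hat\lambda_j \langle u_p, \hat v_j \rangle^2 \right).
\end{aligned}$$
By Theorem 3.2, the sum cannot exceed $N \sum_{k=1}^{p} \hat\lambda_k$, and this maximum is attained if
$u_1 = \hat v_1, u_2 = \hat v_2, \ldots, u_p = \hat v_p$.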
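The optimality of the EFPC basis can be illustrated on discretized curves. The base R sketch below rests on several assumptions: the data are synthetic and mean zero, inner products are approximated by Riemann sums with step h, and the competing basis is simply a random orthonormal system. The function S2 is a hypothetical helper computing the criterion $\hat S^2$.

```r
set.seed(6)
N <- 100; m <- 51; p <- 2
tgrid <- seq(0, 1, length.out = m)
h <- tgrid[2] - tgrid[1]
x <- outer(rnorm(N, sd = 3), sin(pi * tgrid)) +
     outer(rnorm(N, sd = 2), sin(2 * pi * tgrid)) +
     outer(rnorm(N, sd = 1), sin(3 * pi * tgrid))

# Criterion S^2 for an m x p matrix U whose columns are orthonormal in the
# discretized inner product <f, g> = sum(f * g) * h
S2 <- function(U) {
  scores <- x %*% U * h              # inner products <x_i, u_k>
  resid  <- x - scores %*% t(U)      # x_i minus its projection
  sum(resid^2) * h
}

eig   <- eigen(crossprod(x) / N * h, symmetric = TRUE)
V_hat <- eig$vectors[, 1:p] / sqrt(h)    # first p EFPC's, unit L2 norm

# Competing orthonormal system: random directions, orthonormalized by QR
U_rand <- qr.Q(qr(matrix(rnorm(m * p), m, p))) / sqrt(h)

c(S2_efpc = S2(V_hat), S2_random = S2(U_rand),
  N_times_tail = N * sum(eig$values[-(1:p)]))
```

The first and third entries agree: with the EFPC basis, the criterion equals N times the sum of the eigenvalues that were left out.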
$$\frac{1}{N} \sum_{i=1}^{N} \langle X_i, x \rangle^2 = \langle \hat C(x), x \rangle$$
can be viewed as the sample variance of the data “in the direction” of the function
$x$. If we are interested in finding the function $x$ which is “most correlated” with the
variability of the data (away from the mean if the data are not centered), we must
thus find $x$ which maximizes $\langle \hat C(x), x \rangle$. Clearly, we must impose a restriction on
the norm of $x$, so if we require that $\|x\| = 1$, we see from Theorem 3.2 that $x = \hat v_1$,
the first EFPC. Next, we want to find a second direction, orthogonal to $\hat v_1$, which is
“most correlated” with the variability of the data. By Theorem 3.2, this direction is
$\hat v_2$. Observe that since the $\hat v_j$, $j = 1, \ldots, N$, form a basis in $\mathbb{R}^N$,
$$\frac{1}{N} \sum_{i=1}^{N} \|X_i\|^2 = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \langle X_i, \hat v_j \rangle^2 = \sum_{j=1}^{N} \frac{1}{N} \sum_{i=1}^{N} \langle X_i, \hat v_j \rangle^2 = \sum_{j=1}^{N} \hat\lambda_j.$$
Thus, we may say that the variance in the direction $\hat v_j$ is $\hat\lambda_j$, or that $\hat v_j$ explains
the fraction of the total sample variance equal to $\hat\lambda_j / (\sum_{k=1}^{N} \hat\lambda_k)$. We also have the
corresponding population analysis of variance:
$$E\|X\|^2 = \sum_{j=1}^{\infty} E\left[\langle X, v_j \rangle^2\right] = \sum_{j=1}^{\infty} \langle C v_j, v_j \rangle = \sum_{j=1}^{\infty} \lambda_j.$$
We now present an example that describes how functional data with specified
FPC’s can be generated. Set
$$X_n(t) = \sum_{j=1}^{p} a_j Z_{jn} e_j(t), \tag{3.1}$$
where $a_j$ are real numbers, for every $n$, the $Z_{jn}$ are iid mean zero random variables
with unit variance, and the $e_j$ are orthogonal functions with unit norm. To compute
the covariance operator, we do not have to specify the dependence between the
sequences $\{Z_{jn}, j \ge 1\}$. This is needed to claim the convergence of the EFPC’s to
Therefore,
$$C(x)(t) = \sum_{j=1}^{p} a_j^2 \left( \int e_j(s) x(s)\,ds \right) e_j(t).$$
It follows that the FPC’s of the $X_n$ are the $e_j$, and the eigenvalues are $\lambda_j = a_j^2$.
Methods of functional data analysis which use EFPC’s assume that the observations
are well approximated by an expansion like (3.1) with a small $p$ and relatively
smooth functions $e_j$.
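A short simulation shows expansion (3.1) at work. The base R sketch below makes several illustrative assumptions: the $e_j$ are three Fourier functions, the $Z_{jn}$ are independent standard normal both across $n$ and across $j$, and the eigenvalues are estimated from the discretized sample covariance function with a Riemann-sum approximation.

```r
set.seed(7)
N <- 500
tgrid <- seq(0, 1, length.out = 201)
h <- tgrid[2] - tgrid[1]
a <- c(3, 2, 1)                                   # so lambda_j = a_j^2 = 9, 4, 1
e <- cbind(sqrt(2) * sin(2 * pi * tgrid),         # orthonormal functions e_j
           sqrt(2) * cos(2 * pi * tgrid),
           sqrt(2) * sin(4 * pi * tgrid))
Z <- matrix(rnorm(N * 3), N, 3)                   # iid, mean zero, unit variance
X <- Z %*% diag(a) %*% t(e)                       # row n holds X_n on tgrid

lambda_hat <- eigen(crossprod(X) / N * h, symmetric = TRUE)$values[1:3]
rbind(theoretical = a^2, estimated = round(lambda_hat, 2))
```

The estimated eigenvalues fluctuate around $a_j^2$, and all remaining eigenvalues are numerically zero because the simulated curves lie in a three-dimensional space.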
In most applications, it is important to determine a value of $p$ such that the actual
data can be replaced by the approximation $\sum_{j=1}^{p} \langle \hat v_j, X_n \rangle \hat v_j$. A popular method
is the scree plot. This is a graphical method proposed, in a different context, by
Cattell (1966). To apply it, one plots the successive eigenvalues $\hat\lambda_j$ against $j$ (see
Figure 9.6). The method suggests finding the $j$ where the decrease of the eigenvalues
appears to level off. This point is used as the selected value of $p$. To the right of
it, one finds only the “factorial scree” (“scree” is a geological term referring to the
debris which collects on the lower part of a rocky slope). The method that works best
for the applications discussed in this book is the CPV method defined as follows. The
cumulative percentage of total variance (CPV) explained by the first $p$ EFPC’s is
$$\mathrm{CPV}(p) = \frac{\sum_{k=1}^{p} \hat\lambda_k}{\sum_{k=1}^{N} \hat\lambda_k}.$$
We choose $p$ for which $\mathrm{CPV}(p)$ exceeds a desired level; 85% is the recommended
value. Other methods, known as pseudo-AIC and cross-validation, have also been
proposed. All these methods are described and implemented in the MATLAB package
PACE developed at the University of California at Davis.
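The CPV rule takes only a few lines of R. In the sketch below the eigenvalues are made-up illustrative numbers, and choose_p is a hypothetical helper that returns the smallest $p$ whose CPV exceeds the chosen level.

```r
# Illustrative estimated eigenvalues in decreasing order (not from real data)
lambda_hat <- c(50, 25, 12, 6, 4, 2, 1)

cpv <- cumsum(lambda_hat) / sum(lambda_hat)    # CPV(1), CPV(2), ...

# Smallest p with CPV(p) above the desired level (85% by default)
choose_p <- function(cpv, level = 0.85) which(cpv > level)[1]

p <- choose_p(cpv)
p   # 3, since CPV(2) = 0.75 and CPV(3) = 0.87
```

Plotting lambda_hat against its index would give the corresponding scree plot for the same eigenvalues.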
This section has merely set out the fundamental definitions and properties. Inter-
pretation and estimation of the functional principal components has been a subject
of extensive research, in which concepts of smoothing and regularization play a
major role, see Chapters 8, 9, 10 of Ramsay and Silverman (2005).
42 3 Functional principal components
The R function pca.fd computes the EFPC’s $\hat v_j$, the corresponding eigenvalues
$\hat\lambda_j$, and the scores $\langle X_i - \bar X_N, \hat v_j \rangle$. Its argument must be a functional object, see
Section 1.5. A typical call is
par(mfrow=c(1,3))
plot(pca$scores[,1], pca$scores[,2], xlab="1st PC scores",
ylab="2nd PC scores")
plot(pca$scores[,1], pca$scores[,3], xlab="1st PC scores",
ylab="3rd PC scores")
plot(pca$scores[,2], pca$scores[,3], xlab="2nd PC scores",
ylab="3rd PC scores")
Fig. 3.1 Scatter plots of the scores of the magnetic intensity data. [Three panels omitted; axes: 1st, 2nd and 3rd PC scores.]
3.5 Bibliographical notes 43
Canonical correlation analysis (CCA) is one of the most important tools of multi-
variate statistical analysis. Its extension to the functional context is not trivial, and
in many ways illustrates the differences between multivariate and functional data.
One of the most influential contributions has been made by Leurgans et al. (1993)
who showed that smoothing is necessary in order to define the functional canonical
correlations meaningfully.
This chapter is organized as follows. Section 4.1 reviews multivariate population
and sample canonical correlation analysis (CCA). In Section 4.2, we explain how
functional population CCA should be defined, but postpone the difficult question of
its existence to Section 4.6. First, in Section 4.3, we discuss two ways in which its
sample version has been defined, and then, in Section 4.4, we show the usefulness of
the functional sample CCA by applying it to the analysis of space physics data. After
this numerical example, we return, in Section 4.6, to the theoretical question of the
existence of the population functional CCA. Section 4.6 uses some properties of the
square root of the covariance operator, so we first review the relevant concepts in
Section 4.5.
In this section we review the definition and some properties of the multivariate CCA.
Proofs of the results stated in this section are presented e.g. in Johnson and Wichern
(2002).
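Suppose $\mathbf{X}$ and $\mathbf{Y}$ are two random vectors, respectively, in $\mathbb{R}^p$ and $\mathbb{R}^q$. For
deterministic vectors $\mathbf{a} \in \mathbb{R}^p$ and $\mathbf{b} \in \mathbb{R}^q$ define the random variables
$$A = \mathbf{a}^T \mathbf{X}, \qquad B = \mathbf{b}^T \mathbf{Y}.$$
We want to find $\mathbf{a}$ and $\mathbf{b}$ which maximize
$$\mathrm{Corr}(A, B) = \frac{\mathrm{Cov}(A, B)}{\sqrt{\mathrm{Var}[A]\,\mathrm{Var}[B]}}. \tag{4.1}$$
Clearly, if $\mathbf{a}$ and $\mathbf{b}$ maximize (4.1), then so do $c\mathbf{a}$ and $d\mathbf{b}$ for any $c, d > 0$. Therefore,
we impose a normalizing condition
$$\mathrm{Var}[A] = \mathrm{Var}[B] = 1. \tag{4.2}$$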
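If such $\mathbf{a}$ and $\mathbf{b}$ exist, we denote them $\mathbf{a}_1$ and $\mathbf{b}_1$, and set $A_1 = \mathbf{a}_1^T \mathbf{X}$, $B_1 = \mathbf{b}_1^T \mathbf{Y}$.
We call $(A_1, B_1)$ the first pair of canonical variables and
$$\rho_1 = \mathrm{Cov}(A_1, B_1) = \max\left\{ \mathrm{Cov}(\mathbf{a}^T \mathbf{X}, \mathbf{b}^T \mathbf{Y}) : \mathrm{Var}[\mathbf{a}^T \mathbf{X}] = \mathrm{Var}[\mathbf{b}^T \mathbf{Y}] = 1 \right\} \tag{4.3}$$
the first canonical correlation.
Once $\mathbf{a}_1$ and $\mathbf{b}_1$ have been found, we want to find another pair $(\mathbf{a}, \mathbf{b})$ which
maximizes (4.1) subject to (4.2), but also satisfies
If such $\mathbf{a}$ and $\mathbf{b}$ exist, we denote them $\mathbf{a}_2$ and $\mathbf{b}_2$ and call $A_2 = \mathbf{a}_2^T \mathbf{X}$, $B_2 = \mathbf{b}_2^T \mathbf{Y}$
the second pair of canonical variables and the resulting value $\rho_2$ of (4.1) the second
canonical correlation. Notice that $\rho_2 \le \rho_1$ because $\rho_2$ is a maximum over a smaller
subspace (condition (4.4) is added).
We can continue in this way to find $k$th canonical components $(\rho_k, \mathbf{a}_k, \mathbf{b}_k, A_k, B_k)$
by requiring that the pair $(A_k, B_k)$ maximizes (4.1) subject to (4.2) and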
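Assume that $\mathbf{C}_{11}$ and $\mathbf{C}_{22}$ are nonsingular and introduce the correlation matrices
$$\mathbf{R} = \mathbf{C}_{11}^{-1/2} \mathbf{C}_{12} \mathbf{C}_{22}^{-1/2}, \qquad \mathbf{R}^T = \mathbf{C}_{22}^{-1/2} \mathbf{C}_{21} \mathbf{C}_{11}^{-1/2}.$$
Setting $m = \min(p, q)$, it can be shown that the first $m$ eigenvalues of the matrices
$$\mathbf{M}_X = \mathbf{R}\mathbf{R}^T = \mathbf{C}_{11}^{-1/2} \mathbf{C}_{12} \mathbf{C}_{22}^{-1} \mathbf{C}_{21} \mathbf{C}_{11}^{-1/2}$$
and
$$\mathbf{M}_Y = \mathbf{R}^T \mathbf{R} = \mathbf{C}_{22}^{-1/2} \mathbf{C}_{21} \mathbf{C}_{11}^{-1} \mathbf{C}_{12} \mathbf{C}_{22}^{-1/2}$$
are the same and positive:
$$\rho_1^2 \ge \rho_2^2 \ge \cdots \ge \rho_m^2 > 0.$$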
4.1 Multivariate canonical components 47
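$$\mathbf{M}_X \mathbf{e}_k = \rho_k^2 \mathbf{e}_k, \qquad \mathbf{M}_Y \mathbf{f}_k = \rho_k^2 \mathbf{f}_k, \qquad k = 1, 2, \ldots, m.$$
Then
$$\mathbf{a}_k = \mathbf{C}_{11}^{-1/2} \mathbf{e}_k, \qquad \mathbf{b}_k = \mathbf{C}_{22}^{-1/2} \mathbf{f}_k$$
are the weights of the $k$th pair of canonical variables, and $\rho_k$ is the $k$th canonical
correlation. It is easy to check that the vectors $\mathbf{e}_k$ and $\mathbf{f}_k$ have unit norm and are
related via
$$\mathbf{e}_k = \rho_k^{-1} \mathbf{R} \mathbf{f}_k, \qquad \mathbf{f}_k = \rho_k^{-1} \mathbf{R}^T \mathbf{e}_k.$$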
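The eigen-relations above translate directly into a computation. The base R sketch below is an illustration on synthetic data: it replaces the population matrices by sample covariance matrices, obtains the $\rho_k$ as the singular values of $\mathbf{R}$ (whose left and right singular vectors are the $\mathbf{e}_k$ and $\mathbf{f}_k$), and compares them with the output of the function cancor from the stats package. The helper inv_sqrt is hypothetical.

```r
set.seed(8)
N <- 300
X <- matrix(rnorm(N * 3), N, 3)                          # p = 3
Y <- cbind(X[, 1] + rnorm(N), X[, 2] + 2 * rnorm(N))     # q = 2

C11 <- cov(X); C22 <- cov(Y); C12 <- cov(X, Y)

# Inverse square root of a symmetric positive-definite matrix
inv_sqrt <- function(M) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}

R   <- inv_sqrt(C11) %*% C12 %*% inv_sqrt(C22)
sv  <- svd(R)
rho <- sv$d                           # canonical correlations rho_1 >= rho_2
a1  <- inv_sqrt(C11) %*% sv$u[, 1]    # weights of the first canonical pair
b1  <- inv_sqrt(C22) %*% sv$v[, 1]

cbind(from_R = rho, from_cancor = cancor(X, Y)$cor)
```

The two columns agree up to rounding, and the first canonical variables X %*% a1 and Y %*% b1 have unit sample variance and sample correlation rho[1].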
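For the development in the subsequent sections, it is convenient to summarize
the above using the inner product notation. Observe that
$$\mathrm{Cov}(\mathbf{a}^T \mathbf{X}, \mathbf{b}^T \mathbf{Y}) = E\left[\mathbf{a}^T \mathbf{X} \mathbf{b}^T \mathbf{Y}\right] = E\left[\mathbf{a}^T \mathbf{X} \mathbf{Y}^T \mathbf{b}\right] = \mathbf{a}^T E\left[\mathbf{X} \mathbf{Y}^T\right] \mathbf{b} = \langle \mathbf{a}, \mathbf{C}_{12} \mathbf{b} \rangle$$
and
$$\mathrm{Var}\left[\mathbf{a}^T \mathbf{X}\right] = \langle \mathbf{a}, \mathbf{C}_{11} \mathbf{a} \rangle, \qquad \mathrm{Var}\left[\mathbf{b}^T \mathbf{Y}\right] = \langle \mathbf{b}, \mathbf{C}_{22} \mathbf{b} \rangle.$$
Thus
$$\rho_k = \langle \mathbf{a}_k, \mathbf{C}_{12} \mathbf{b}_k \rangle = \max\left\{ \langle \mathbf{a}, \mathbf{C}_{12} \mathbf{b} \rangle : \mathbf{a} \in \mathbb{R}^p,\ \mathbf{b} \in \mathbb{R}^q,\ \langle \mathbf{a}, \mathbf{C}_{11} \mathbf{a} \rangle = 1,\ \langle \mathbf{b}, \mathbf{C}_{22} \mathbf{b} \rangle = 1 \right\} \tag{4.6}$$
and
$$\hat{\mathbf{B}} = [\hat{\mathbf{b}}^T \mathbf{y}_1, \hat{\mathbf{b}}^T \mathbf{y}_2, \ldots, \hat{\mathbf{b}}^T \mathbf{y}_N]^T$$
is maximum, provided the vectors $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$ have unit sample variance. Once the
weight vectors $\hat{\mathbf{a}}$ and $\hat{\mathbf{b}}$ have been found, they are denoted $\hat{\mathbf{a}}_1$ and $\hat{\mathbf{b}}_1$, and the
corresponding first pair of sample canonical variates by $\hat{\mathbf{A}}_1$ and $\hat{\mathbf{B}}_1$. We then search for
another pair $(\hat{\mathbf{a}}_2, \hat{\mathbf{b}}_2)$ such that the sample correlation between analogously defined
$\hat{\mathbf{A}}_2$ and $\hat{\mathbf{B}}_2$ is maximum subject to the conditions of unit sample variances and the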
48 4 Canonical correlation analysis
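lack of sample correlation with the $\hat{\mathbf{A}}_1$ and $\hat{\mathbf{B}}_1$. Conditions for the existence of sample
multivariate canonical components are fully analogous to those stated for the
population CCA. We define the matrices $\hat{\mathbf{C}}_{ij}$, $i, j = 1, 2$, by analogy with the
definition of the matrices $\mathbf{C}_{ij}$. For example, the $p \times q$ matrix $\hat{\mathbf{C}}_{12}$ is defined as
$$\hat{\mathbf{C}}_{12} = \frac{1}{N-1} \sum_{j=1}^{N} \left[ \mathbf{x}_j - \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i \right] \left[ \mathbf{y}_j - \frac{1}{N} \sum_{i=1}^{N} \mathbf{y}_i \right]^T.$$
If the matrices $\hat{\mathbf{C}}_{11}$ and $\hat{\mathbf{C}}_{22}$ are nonsingular, then the sample multivariate canonical
components $(\hat\rho_k, \hat{\mathbf{a}}_k, \hat{\mathbf{b}}_k, \hat{\mathbf{A}}_k, \hat{\mathbf{B}}_k)$ exist for $k \le \min(p, q)$, and are calculated by
replacing the matrices $\mathbf{C}_{ij}$ by the $\hat{\mathbf{C}}_{ij}$.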
We now define the functional canonical components (FCC) by analogy to the multivariate
setting. Their existence will be investigated in Section 4.6. We work with two
$L^2$ spaces $H_1 = L^2(T_1)$ and $H_2 = L^2(T_2)$, where $T_1$ and $T_2$ are, possibly different,
subsets of a Euclidean space. We consider square integrable random functions
$X \in H_1$, $Y \in H_2$ and, to simplify the notation and some formulas, we continue to
assume that they have mean zero. The canonical components are determined solely
by the covariance structure and do not depend on the means. Thus, we define the
covariance functions
$$\begin{aligned}
c_{11}(t,s) &= E[X(t)X(s)], \\
c_{12}(t,s) &= E[X(t)Y(s)], \\
c_{21}(t,s) &= E[Y(t)X(s)], \\
c_{22}(t,s) &= E[Y(t)Y(s)].
\end{aligned}$$
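Next, we define the operators
$$C_{11}: H_1 \to H_1, \quad C_{12}: H_2 \to H_1, \quad C_{21}: H_1 \to H_2, \quad C_{22}: H_2 \to H_2$$
via
$$\begin{aligned}
C_{11}(x)(t) &= \int_{T_1} c_{11}(t,s) x(s)\,ds = E[\langle X, x \rangle X(t)], \\
C_{12}(y)(t) &= \int_{T_2} c_{12}(t,s) y(s)\,ds = E[\langle Y, y \rangle X(t)], \\
C_{21}(x)(t) &= \int_{T_1} c_{21}(t,s) x(s)\,ds = E[\langle X, x \rangle Y(t)], \\
C_{22}(y)(t) &= \int_{T_2} c_{22}(t,s) y(s)\,ds = E[\langle Y, y \rangle Y(t)].
\end{aligned}$$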
4.3 Sample functional canonical components 49
The operators $C_{11}$ and $C_{22}$ are just the covariance operators introduced in Chapter
2, so they are symmetric, positive-definite and Hilbert–Schmidt. It is easy to extend
the definition of a Hilbert–Schmidt operator to the space $\mathcal{L}(H_2, H_1)$ of bounded
operators from $H_2$ to $H_1$, see Section 4.5. It is then seen that $C_{12}$ is Hilbert–Schmidt
because by the Cauchy–Schwarz inequality
$$\int_{T_1} \int_{T_2} c_{12}^2(t,s)\,dt\,ds \le E\|X\|^2 E\|Y\|^2.$$
and
$$\hat{\mathbf{B}} = [\langle b, y_1 \rangle, \langle b, y_2 \rangle, \ldots, \langle b, y_N \rangle]^T. \tag{4.14}$$
[Figure omitted: two panels of curves; vertical axes: values; horizontal axes: time, 0.0 to 1.0.]
to this storm. We explain toward the end of this section how the curves in Figure 4.2
were obtained.
Several ways of defining sample FCC’s have been put forward. All of them
involve some form of smoothing and/or dimension reduction. We describe here a
method recommended by He et al. (2004), which is closely related to the theory
introduced in Section 4.6. Then we turn to the method of Leurgans et al. (1993),
which emphasizes smoothing the weight functions a and b. It is implemented in the
R package fda.
Denote by $\hat\lambda_i$ and $\hat v_i$ the eigenvalues and the eigenfunctions of the sample covariance
operator of the functions $x_i$, and define analogously $\hat\gamma_j$ and $\hat u_j$ for the $y_j$.
Determine the numbers $p$ and $q$ such that $\sum_{i \le p} \hat\lambda_i$ and $\sum_{j \le q} \hat\gamma_j$ explain the
required proportion of the variance, see Section 3.3. Methods of selecting $p$ and
$q$ which involve cross-validation are described in Section 2.5 of He et al. (2004).
Next, compute the scores
$$\hat\xi_{in} = \langle \hat v_i, X_n \rangle, \quad i = 1, 2, \ldots, p; \qquad \hat\zeta_{jn} = \langle \hat u_j, Y_n \rangle, \quad j = 1, 2, \ldots, q.$$
Fig. 4.2 Weight functions for first canonical correlations of the curves displayed in Figure 4.1. Top
row with penalty, bottom row without penalty. [Panels (a) to (d) omitted; vertical axes: HON and KAK weight functions; horizontal axes: time as a proportion of a day, 0.0 to 1.0.]
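$$\hat X_n = \sum_{i=1}^{p} \hat\xi_{in} \hat v_i, \qquad \hat Y_n = \sum_{j=1}^{q} \hat\zeta_{jn} \hat u_j.$$
The curves $\hat X_n$ and $\hat Y_n$ are smoothed versions of the original observations $X_n$ and
$Y_n$, while the vectors
$$\hat{\boldsymbol\xi}_n = [\langle X_n, \hat v_1 \rangle, \langle X_n, \hat v_2 \rangle, \ldots, \langle X_n, \hat v_p \rangle]^T,$$
$$\hat{\boldsymbol\zeta}_n = [\langle Y_n, \hat u_1 \rangle, \langle Y_n, \hat u_2 \rangle, \ldots, \langle Y_n, \hat u_q \rangle]^T$$
allow us to reduce the problem to the multivariate sample CCA described in
Section 4.1. The collection of pairs
$$(\hat{\boldsymbol\xi}_1, \hat{\boldsymbol\zeta}_1), (\hat{\boldsymbol\xi}_2, \hat{\boldsymbol\zeta}_2), \ldots, (\hat{\boldsymbol\xi}_N, \hat{\boldsymbol\zeta}_N)$$
now plays the role of the multivariate sample (4.8), and allows us to find the multivariate
sample canonical components $(\hat\rho_k, \hat{\mathbf{a}}_k, \hat{\mathbf{b}}_k, \hat{\mathbf{A}}_k, \hat{\mathbf{B}}_k)$. In analogy to (4.25), we
define the functional canonical components as $(\hat\rho_k, \hat a_k, \hat b_k, \hat{\mathbf{A}}_k, \hat{\mathbf{B}}_k)$, where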
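He et al. (2004) recommend an additional smoothing step. After the $\hat v_i$ and the
$\hat u_j$ have been computed, they can be smoothed in some way, for example using
polynomial smoothing. Denote the smoothed FPC’s by $\tilde v_i$ and $\tilde u_j$, and construct the
vectors
$$\tilde{\boldsymbol\xi}_n = [\langle X_n, \tilde v_1 \rangle, \langle X_n, \tilde v_2 \rangle, \ldots, \langle X_n, \tilde v_p \rangle]^T,$$
$$\tilde{\boldsymbol\zeta}_n = [\langle Y_n, \tilde u_1 \rangle, \langle Y_n, \tilde u_2 \rangle, \ldots, \langle Y_n, \tilde u_q \rangle]^T.$$
$$C_N(\hat{\mathbf{A}}, \hat{\mathbf{A}}) + \lambda \|a''\|^2 = 1, \quad \text{and} \quad C_N(\hat{\mathbf{B}}, \hat{\mathbf{B}}) + \lambda \|b''\|^2 = 1.$$
The number $\lambda > 0$ is a smoothing parameter which penalizes for using functions
$a, b$ which are highly irregular. It can be chosen by cross-validation, or subjectively,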
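The reduction of functional CCA to multivariate CCA on FPC scores can be sketched in base R. Everything below is an illustrative assumption rather than the procedure of He et al. (2004) itself: the two samples of curves are synthetic and share one latent score, inner products are Riemann sums on a grid, no extra smoothing of the EFPC’s is applied, and fpc_scores is a hypothetical helper.

```r
set.seed(9)
N <- 200
tgrid <- seq(0, 1, length.out = 101)
h <- tgrid[2] - tgrid[1]
s1 <- rnorm(N); s2 <- rnorm(N)                      # latent scores; s1 is shared
X <- outer(2 * s1, sin(2 * pi * tgrid)) + outer(s2, cos(2 * pi * tgrid))
Y <- outer(2 * (s1 + 0.5 * rnorm(N)), sin(pi * tgrid)) +
     outer(rnorm(N), sin(3 * pi * tgrid))

# Scores of the curves in the rows of M on their first k EFPC's
fpc_scores <- function(M, k) {
  Mc <- sweep(M, 2, colMeans(M))                    # center the curves
  V  <- eigen(crossprod(Mc) / nrow(M) * h,
              symmetric = TRUE)$vectors[, 1:k, drop = FALSE]
  Mc %*% (V / sqrt(h)) * h                          # discretized inner products
}

xi   <- fpc_scores(X, 2)     # N x p matrix of scores for the X curves
zeta <- fpc_scores(Y, 2)     # N x q matrix of scores for the Y curves

# Multivariate sample CCA applied to the score vectors
rho_hat <- cancor(xi, zeta)$cor
rho_hat
```

In this design the first sample canonical correlation is large because both samples load on the shared score s1, while the second is small because the remaining components are independent.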
Create a functional object that smooths the data using specified roughness
penalty:
par(mfrow=c(2,2))
plot.fd(cca.smoothed$weight1[1], main="(a)",
ylab="HON weight function")
plot.fd(cca.smoothed$weight2[1], main="(b)",
ylab="KAK weight function")
plot.fd(cca.unsmoothed$weight1[1], main="(c)",
ylab="HON weight function",
xlab="Time (proportion of a day)")
plot.fd(cca.unsmoothed$weight2[1], main="(d)",
ylab="KAK weight function",
xlab="Time (proportion of a day)")
Canonical correlations are often used to see which samples of functions are most
strongly associated; an application of this type is presented in Section 4.4. In such
applications, provided clear-cut differences exist, any reasonable choice of $\lambda$, or
several values of $\lambda$, will lead to informative comparisons. The same is true for
choosing the orders $p$ and $q$. In physical applications, these orders can be chosen to
restrict the analysis to meaningful principal components.
This section is based on the work of Maslova et al. (2009) who proposed a new
method of computing an index of magnetic storm activity. Magnetic storms belong to
the most important phenomena in near Earth space due to the energy involved and
their impact on the operation of satellite based telecommunication and navigation
systems.
The data are magnetometer observations; an example is shown in Figure 1.1.
When a magnetic storm occurs, the H-component drops for a period of 2-3 days at
observatories close to the magnetic equator, reflecting a strong magnetic field gen-
erated by a magnetospheric ring current that forms during storms. Figure 4.3 shows
a magnetometer record at Honolulu during a storm. Similar curves are observed at
other equatorial observatories, but each of them looks different, mostly because at
a given universal time, different observatories may be at local day- or nighttime, or
dawn or dusk. The position of an observatory relative to the sun has a noticeable
impact on the shape of the magnetogram. The change in the shape of the mag-
netometer records due to the daily rotation of the Earth is called the Sq (Solar
quiet) variation. An important direction of research in space physics, going back
to Sugiura (1964), has been concerned with developing an index curve that would
measure the strength of a magnetic storm globally, as different storm signatures
are observed at different observatories. Computing a global index involves averag-
ing over several equatorial observatories after removing the Sq variation from each
record. The technical details are quite complex, and there is no universal agree-
ment in the space physics community on the best way to construct a good global
index.
Maslova et al. (2009) proposed a new method of removing the Sq variation from
each record. Without discussing the technicalities, the idea is that the component
that is removed changes from day to day. For an older method, WISA, this com-
ponent was constant over the period of 2-4 weeks. The new method was proposed
in two variants, referred to below as 1) “with” and 2) “without centering”. There
are thus three methods to compare. To perform the comparison, the Sq variation is
removed by every method from the record at every observatory. What is left should
reflect the effect of a global ring current, not the location of the station relative to
the Sun. Thus if the removal is successful, the remainders, called preindices, at the
pairs of stations should be highly correlated.
We present only a small part of the validation study reported by Maslova et al.
(2009). We consider four observatories, known as the “Dst Observatories”, which
are listed in Table 4.1. These four observatories yield six pairs listed in Table 4.2. For
each pair, we compute the sample FCC’s as described in Section 4.3. The smoothing
parameter $\lambda$ is chosen so that all correlations fall into a relatively large subinterval
of [0,1], so that a visual comparison is facilitated. We are not interested so much in
the values of the sample FCC’s as in their order for the three methods. The results
are shown in Figure 4.4. High sample FCC’s indicate that the preindices obtained
by one of the methods are good because they measure the same field generated by
a global ring current. Figure 4.4 shows that the preindices constructed with the new
method always have higher sample FCC’s than those obtained with an older WISA
method. The new method with centering is generally better than the new method
without centering. A more detailed analysis confirms these assertions.
[Figure 4.3: time series plot; vertical axis Intensity (nT), horizontal axis UT (hours).]
Fig. 4.3 H-component of the magnetogram recorded at Honolulu Mar 29 – Apr 3 (thin line)
together with the global index developed by Maslova et al. (2009) (thick line). The dashed lines
separate UT days. The drop reflects a magnetic storm.
4.5 Square root of the covariance operator
In this section we review certain properties of the covariance operator C which will
be used in Section 4.6.
Table 4.2 Pairs of four Dst stations (first set) used to compare methodologies.
Combination # Stations
1 HON & KAK
2 HON & SJG
3 HON & HER
4 KAK & SJG
5 KAK & HER
6 SJG & HER
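The six combinations in Table 4.2 are simply all unordered pairs of the four stations. As a minimal sketch (the variable names are illustrative and not part of the original study), they can be enumerated as follows:

```python
from itertools import combinations

# Station codes as they appear in Table 4.2.
stations = ["HON", "KAK", "SJG", "HER"]

# All unordered pairs, in the order itertools generates them.
pairs = list(combinations(stations, 2))

for number, (first, second) in enumerate(pairs, start=1):
    print(number, first, "&", second)
```

With the stations listed in this order, the enumeration follows the numbering of the combinations used in the table.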
[Figure 4.4: four panels; vertical axis Canonical correlation, horizontal axis Combination (1 to 6).]
Fig. 4.4 Canonical correlations for the new method (star), new method without centering (cross)
and WISA (circle), applied to all combinations of four Dst stations (see Table 4.2).
in which the $\lambda_j$ are nonnegative, the $v_j$ form a basis and satisfy $C(v_j) = \lambda_j v_j$.
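and so
$$\sum_{j=1}^{\infty} \lambda_j^{-1} \langle y, v_j \rangle^2
= \sum_{j=1}^{\infty} \lambda_j^{-1} \Big\langle \sum_{i=1}^{\infty} \lambda_i^{1/2} \langle x, v_i \rangle v_i, v_j \Big\rangle^2
= \sum_{j=1}^{\infty} \lambda_j^{-1} \langle \lambda_j^{1/2} x, v_j \rangle^2
= \sum_{j=1}^{\infty} \langle x, v_j \rangle^2 = \|x\|^2 < \infty.$$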
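Conversely, if $\sum_{j=1}^{\infty} \lambda_j^{-1} \langle y, v_j \rangle^2 < \infty$, then
$x = \sum_{j=1}^{\infty} \lambda_j^{-1/2} \langle y, v_j \rangle v_j$ is a well–defined
element of $L^2$, and a direct verification shows that $C^{1/2}(x) = y$. The
inverse of $C^{1/2}$ is thus defined by
$$C^{-1/2}(y) = \sum_{j=1}^{\infty} \lambda_j^{-1/2} \langle y, v_j \rangle v_j, \quad y \in \mathcal{R}(C^{1/2}). \tag{4.18}$$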
Notice that under assumption (4.16) each $v_k$ is in $\mathcal{R}(C^{1/2})$, and since $\mathcal{R}(C^{1/2})$ is
a linear subspace, so are all finite linear combinations of the $v_k$. However, in contrast
to the Euclidean space $R^p$, these finite linear combinations do not fill the whole of
$L^2$. Define, for example, $y = \sum_{k=1}^{\infty} \lambda_k^{1/2} v_k$. Since $\sum_{k=1}^{\infty} \lambda_k < \infty$, $y \in L^2$.
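A quick numerical check of this example, as a sketch under an assumed eigenvalue sequence $\lambda_k = k^{-2}$ (chosen only for illustration; the text does not fix the $\lambda_k$). The coefficients of $y$ are $\langle y, v_k \rangle = \lambda_k^{1/2}$, so the partial sums of $\sum_k \langle y, v_k \rangle^2$ stay bounded, whereas every term of $\sum_k \lambda_k^{-1} \langle y, v_k \rangle^2$, the sum used above to characterize the range of $C^{1/2}$, equals one.

```python
def partial_sums(n_terms):
    """Return partial sums of <y,v_k>^2 and of <y,v_k>^2 / lambda_k for k = 1..n_terms."""
    norm_sq = 0.0      # partial sum for ||y||^2
    range_sum = 0.0    # partial sum for the range criterion
    for k in range(1, n_terms + 1):
        lam = k ** -2.0        # assumed eigenvalue lambda_k (illustrative choice)
        coef = lam ** 0.5      # <y, v_k> = lambda_k^{1/2}
        norm_sq += coef ** 2
        range_sum += coef ** 2 / lam
    return norm_sq, range_sum

for n in (10, 100, 1000):
    print(n, partial_sums(n))
```

The first partial sum levels off while the second grows like the number of terms, which is consistent with $y$ being in $L^2$ but failing the summability condition for membership in $\mathcal{R}(C^{1/2})$.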
Hilbert–Schmidt. The sum does not depend on the choice of bases, and its square
root is the Hilbert–Schmidt norm. The space of Hilbert–Schmidt operators in
$\mathcal{L}(H_1, H_2)$ is denoted $\mathcal{S}(H_1, H_2)$. If $A_1 \in \mathcal{S}(H_1, H_2)$ and $A_2 \in \mathcal{S}(H_2, H_3)$, then
$A_2 A_1 \in \mathcal{S}(H_1, H_3)$, and $\|A_2 A_1\|_{\mathcal{S}} \le \|A_1\|_{\mathcal{S}} \|A_2\|_{\mathcal{S}}$.
If $A$ is an integral operator of the form
$$A(x)(t) = \int_{T_1} a(t, s) x(s)\, ds, \quad x \in H_1,$$
then $A$ is Hilbert–Schmidt if and only if $\int_{T_2} \int_{T_1} a^2(t, s)\, dt\, ds < \infty$. In that case,
$$\int_{T_2} \int_{T_1} a^2(t, s)\, dt\, ds = \sum_{i,j=1}^{\infty} a_{ji}^2 < \infty.$$
Recall the notation introduced in Sections 4.1 and 4.2. The central message of this
section is that the whole spaces H1 and H2 are too large to define a functional
CCA. It is possible only on smaller subspaces. In practice, this is reflected by the
required smoothing in the sample FCCA, as discussed in Section 4.3. It is difficult
to capture this idea in a theoretical framework. We present the approach of He et al.
(2003) who propose to restrict the spaces H1 and H2 by imposing conditions on the
magnitude of the eigenvalues of C11 and C22 and their interactions, see Assumption
4.1. A more general framework for functional CCA was developed by Eubank and
Hsing (2008). Cupidon et al. (2007) is also relevant.
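We would like to construct operator analogs of the matrices $M_X$ and $M_Y$ defined
in Section 4.1. Define $\mathcal{R}_1 = \mathcal{R}(C_{11}^{1/2}) \subset H_1$ and $\mathcal{R}_2 = \mathcal{R}(C_{22}^{1/2}) \subset H_2$. We need
the following condition
$$C_{12}(H_2) \subset \mathcal{R}_1, \quad C_{21}(H_1) \subset \mathcal{R}_2. \tag{4.19}$$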
4.6 Existence of the functional canonical components 59
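and so we obtain
$$\sum_{i=1}^{\infty} \lambda_i^{-1} \langle x, v_i \rangle^2 \le \|y\|^2 \sum_{i,j=1}^{\infty} r_{ji}^2 \gamma_j.$$
Thus, a sufficient condition for $C_{12}(H_2) \subset \mathcal{R}_1$ is $\sum_{i,j=1}^{\infty} r_{ji}^2 \gamma_j < \infty$ and, analogously,
a sufficient condition for $C_{21}(H_1) \subset \mathcal{R}_2$ is $\sum_{i,j=1}^{\infty} r_{ji}^2 \lambda_i < \infty$. Since
$\gamma_j \le \gamma_1$, $\lambda_i \le \lambda_1$, both these conditions are implied by (4.22). □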
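By (4.24),
$$C_{12} C_{22}^{-1/2}(u_j) = \gamma_j^{-1/2} C_{12}(u_j)
= \gamma_j^{-1/2} \sum_{i,k} E[\zeta_i \xi_k] \langle u_i, u_j \rangle v_k
= \gamma_j^{-1/2} \sum_{k} E[\xi_k \zeta_j] v_k.$$
Consequently,
$$R(u_j) = \gamma_j^{-1/2} \sum_{k} E[\xi_k \zeta_j] C_{11}^{-1/2}(v_k)
= \gamma_j^{-1/2} \sum_{k} E[\xi_k \zeta_j] \lambda_k^{-1/2} v_k = \sum_{k} r_{jk} v_k.$$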
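$$M_Y = R^* R : \mathcal{R}_2 \to \mathcal{R}_2.$$
$$\sum_{i,j=1}^{\infty} \lambda_i^{-1} r_{ji}^2 < \infty \quad \text{and} \quad \sum_{i,j=1}^{\infty} \gamma_j^{-1} r_{ji}^2 < \infty.$$
$$\begin{aligned}
\sum_{i=1}^{\infty} \lambda_i^{-1} \langle R(f_k), v_i \rangle^2
&= \sum_{i=1}^{\infty} \lambda_i^{-1} \Big\langle \sum_{j=1}^{\infty} \langle f_k, u_j \rangle R(u_j), v_i \Big\rangle^2 \\
&= \sum_{i=1}^{\infty} \lambda_i^{-1} \Big( \sum_{j=1}^{\infty} \langle f_k, u_j \rangle r_{ji} \Big)^2 \\
&\le \sum_{i=1}^{\infty} \lambda_i^{-1} \sum_{j=1}^{\infty} \langle f_k, u_j \rangle^2 \sum_{j=1}^{\infty} r_{ji}^2 \\
&\le \|f_k\|^2 \sum_{i,j=1}^{\infty} \lambda_i^{-1} r_{ji}^2.
\end{aligned}$$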
Thus, $\langle u_j, S(v_i) \rangle = \langle u_j, R^*(v_i) \rangle$, and so we conclude that $S = R^*$. Finally define
$$M_X = R R^* : \mathcal{R}_1 \to \mathcal{R}_1$$
and observe that $M_X(\rho_k^{-1} R(f_k)) = \rho_k^2 (\rho_k^{-1} R(f_k))$, and that $\|e_k\|^2 = 1$. We summarize
these calculations in the following proposition:
To lighten the notation, we verify part (ii) for $k = 2$. For functions $a$ and $b$
satisfying the assumptions of part (ii) define $x = C_{11}^{1/2}(a)$, $y = C_{22}^{1/2}(b)$. Then
$$\langle a, C_{12}(b) \rangle = \big\langle C_{11}^{-1/2}(x), C_{12} C_{22}^{-1/2}(y) \big\rangle = \langle x, R(y) \rangle \le \|x\| \|R(y)\|.$$
Assumption 4.1 states that in order for the FCC’s to exist, the correlations of the
scores $\xi_i$ and $\zeta_j$ must tend to zero very fast. This is trivially the case if $r_{ji} = 0$ if $i > p$
or $j > q$, for some integers $p$ and $q$. We thus conclude this section by considering
the case when the random functions $X$ and $Y$ admit the finite expansions
$$X = \sum_{i=1}^{p} \xi_i v_i, \quad Y = \sum_{j=1}^{q} \zeta_j u_j,$$
$$\boldsymbol{\xi} = [\xi_1, \ldots, \xi_p]^T, \quad \boldsymbol{\zeta} = [\zeta_1, \ldots, \zeta_q]^T.$$
$$\mathcal{R}_1 = \mathrm{sp}\{v_1, \ldots, v_p\}, \quad \mathcal{R}_2 = \mathrm{sp}\{u_1, \ldots, u_q\}.$$
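Consider the operators $C_{ij}$, $i, j = 1, 2$, defined in Section 4.2, but with their
domains restricted to the appropriate subspaces $\mathcal{R}_i$, $i = 1, 2$. For example, if
$y = \sum_{j=1}^{q} y_j u_j$, $y_j = \langle y, u_j \rangle$, then
$$\begin{aligned}
C_{12}(y) = E[\langle Y, y \rangle X]
&= E\Big[ \Big\langle \sum_{j=1}^{q} \zeta_j u_j, \sum_{j'=1}^{q} y_{j'} u_{j'} \Big\rangle \sum_{i=1}^{p} \xi_i v_i \Big] \\
&= E\Big[ \sum_{j=1}^{q} \zeta_j y_j \sum_{i=1}^{p} \xi_i v_i \Big]
= \sum_{i=1}^{p} \Big( \sum_{j=1}^{q} E[\xi_i \zeta_j] y_j \Big) v_i.
\end{aligned}$$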
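Thus, the $i$th coefficient of $C_{12}(y)$ in the basis $\{v_1, \ldots, v_p\}$ coincides with the $i$th
component of $\mathbf{C}_{12}\mathbf{y}$, where $\mathbf{C}_{12} = E[\boldsymbol{\xi}\boldsymbol{\zeta}^T]$ and $\mathbf{y} = [y_1, \ldots, y_q]^T$. If the matrices
$\mathbf{C}_{11} = E[\boldsymbol{\xi}\boldsymbol{\xi}^T]$ and $\mathbf{C}_{22} = E[\boldsymbol{\zeta}\boldsymbol{\zeta}^T]$ are nonsingular, then the canonical components
$(\rho_k, \mathbf{a}_k, \mathbf{b}_k, A_k, B_k)$ of the random vectors $\boldsymbol{\xi}$ and $\boldsymbol{\zeta}$ are defined as in Section 4.1.
Direct verification then shows that $(\rho_k, a_k, b_k, A_k, B_k)$ are the FCC’s of $X$ and $Y$,
where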
Due to possibly different FPC’s structures, working with two functional samples
may be difficult. An important contribution has been made by Benko et al. (2009)
who developed bootstrap procedures for testing the equality of mean functions, the
FPC’s, and the eigenspaces spanned by them. In this chapter, we present asymp-
totic procedures for testing the equality of the means and the covariance operators
in two independent samples. Section 5.1 focuses on testing the equality of mean
functions. It shows that instead of statistics which have chi–square limits, those that
converge to weighted sums of squares of independent standard normals can also be
used. In other chapters we focus on statistics converging to chi–square distributions,
but analogous versions converging to weighted sums of normals can be readily con-
structed.
In Section 5.1 we present the procedures for testing the equality of the mean
functions, together with the theorems that justify them asymptotically; the proofs of
these theorems are presented in Section 5.3. Finite sample performance is examined
in Section 5.2. The theory presented in Section 5.4 is based on the work of Panaretos
et al. (2010), which contains some further extensions and numerical applications.
and
$$X_i^*(t) = \mu^*(t) + \varepsilon_i^*(t), \quad 1 \le i \le M. \tag{5.2}$$
$$H_0 : \mu = \mu^* \ \text{ in } L^2$$
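$$\begin{aligned}
\hat z_{N,M}(t, s) &= \frac{M}{M+N} \frac{1}{N} \sum_{i=1}^{N} (X_i(t) - \bar X_N(t))(X_i(s) - \bar X_N(s)) \\
&\quad + \frac{N}{M+N} \frac{1}{M} \sum_{i=1}^{M} (X_i^*(t) - \bar X_M^*(t))(X_i^*(s) - \bar X_M^*(s)).
\end{aligned}$$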
The sum $\sum_{k=1}^{d} \hat\tau_k N_k^2$ provides an approximation to the limit in (5.7) if $d$ is large
enough. The choice of $d$ is discussed in Section 5.2.
The asymptotic consistency of Method I follows from the following result.
then $U_{N,M} \xrightarrow{P} \infty$.
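The corresponding eigenfunctions are $\varphi_1, \ldots, \varphi_{d+1}$. We want to project the observations
onto the space spanned by $\varphi_1, \ldots, \varphi_d$. Since these functions are unknown,
we are using the corresponding eigenfunctions of $\hat Z_{N,M}$, denoted by $\hat\varphi_1, \ldots, \hat\varphi_d$.
Now we project $\bar X_N - \bar X_M^*$ into the linear space spanned by $\hat\varphi_1, \ldots, \hat\varphi_d$. Let
$$\hat a_i = \langle \bar X_N - \bar X_M^*, \hat\varphi_i \rangle, \quad 1 \le i \le d,$$
$$Q(i, j) = (1 - \theta) E\langle X_1, \varphi_i \rangle \langle X_1, \varphi_j \rangle + \theta E\langle X_1^*, \varphi_i \rangle \langle X_1^*, \varphi_j \rangle,$$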
68 5 Two sample inference for the mean and covariance functions
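$$T^{(1)}_{N,M} = \frac{NM}{N+M} \sum_{k=1}^{d} \hat a_k^2 / \hat\tau_k$$
and
$$T^{(2)}_{N,M} = \frac{NM}{N+M} \sum_{k=1}^{d} \hat a_k^2.$$
Theorem 5.3. If $H_0$ and (5.3)–(5.5), (5.6) and (5.9) hold, then
$$T^{(1)}_{N,M} \xrightarrow{d} \chi^2(d) \tag{5.11}$$
and
$$T^{(2)}_{N,M} \xrightarrow{d} \sum_{k=1}^{d} \tau_k N_k^2, \tag{5.12}$$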
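Once the projections $\hat a_k$ and the eigenvalue estimates in the denominator of $T^{(1)}_{N,M}$ are available, both statistics are elementary sums. The sketch below assumes these quantities have already been computed elsewhere (for instance with a functional data package); the function names and the numerical inputs are hypothetical. The second helper approximates a P–value for $T^{(2)}_{N,M}$ by Monte Carlo draws from the weighted sum of squared standard normals in (5.12), with the unknown weights replaced by their estimates.

```python
import random

def two_sample_mean_statistics(a_hat, tau_hat, n, m):
    """Return (T1, T2) from projections a_hat[k] and eigenvalue estimates tau_hat[k]."""
    scale = n * m / (n + m)
    t1 = scale * sum(a * a / tau for a, tau in zip(a_hat, tau_hat))
    t2 = scale * sum(a * a for a in a_hat)
    return t1, t2

def weighted_chi2_pvalue(t2, tau_hat, reps=20000, seed=0):
    """Monte Carlo estimate of P(sum_k tau_k N_k^2 > t2) for iid standard normal N_k."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        draw = sum(tau * rng.gauss(0.0, 1.0) ** 2 for tau in tau_hat)
        if draw > t2:
            exceed += 1
    return exceed / reps

# Hypothetical inputs, for illustration only.
t1, t2 = two_sample_mean_statistics([0.1, 0.2], [0.5, 0.25], n=50, m=50)
print(t1, t2, weighted_chi2_pvalue(t2, [0.5, 0.25]))
```

The statistic $T^{(1)}_{N,M}$ would instead be compared with a chi–square quantile with $d$ degrees of freedom, which is why it is the easier of the two to apply.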
We present the results of a small simulation study aimed at comparing the testing
procedures introduced in Section 5.1. Since Method I essentially reduces to the
statistic $T^{(2)}_{N,M}$ of Method II, we compare the statistics $T^{(1)}_{N,M}$ and $T^{(2)}_{N,M}$.
We consider sample sizes $N = 50$ and $N = 100$ and $M = N$ as well as
$M = 2N$. To compare the sizes, we set $\mu(t) = \mu^*(t) = 0$. Under the alternative,
we set $\mu(t) = 0$ and $\mu^*(t) = at(1 - t)$. The power is then a function of the parameter
$a$. We consider two settings for the errors: 1) Both the $\varepsilon_i$ and the $\varepsilon_i^*$ are Brownian
bridges; 2) The $\varepsilon_i$ are Brownian bridges, and the $\varepsilon_i^*$ are Brownian motions. To compute
the test statistics we converted the Gaussian processes simulated as increments
into functional objects using 49 Fourier basis functions. We then computed the test
statistics with $d = 5$.
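The data generating step of this design can be sketched as follows. This is a minimal illustration, not the code used for the study: the grid size of 100 points and the helper names are assumptions, and the conversion to functional objects with a Fourier basis is omitted.

```python
import random

def brownian_motion(n_steps, rng):
    """Values W(j/n_steps), j = 0..n_steps, built from N(0, 1/n_steps) increments."""
    path = [0.0]
    sd = (1.0 / n_steps) ** 0.5
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def brownian_bridge(n_steps, rng):
    """B(t) = W(t) - t W(1) evaluated on the same grid."""
    w = brownian_motion(n_steps, rng)
    return [w[j] - (j / n_steps) * w[-1] for j in range(n_steps + 1)]

def shifted_bridge(n_steps, a, rng):
    """Brownian bridge plus the mean function a t (1 - t) used under the alternative."""
    b = brownian_bridge(n_steps, rng)
    return [b[j] + a * (j / n_steps) * (1 - j / n_steps) for j in range(n_steps + 1)]

rng = random.Random(1)
sample = [brownian_bridge(100, rng) for _ in range(50)]            # mean zero
sample_star = [shifted_bridge(100, 0.8, rng) for _ in range(50)]   # mean 0.8 t (1 - t)
```

Replacing `brownian_bridge` by `brownian_motion` in the second sample gives the setting in which the two samples have different error distributions.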
The tests have size very close to the nominal size, almost always within one percent.
We did not detect any systematic differences in size between the two tests.
The tests have remarkably good power. To illustrate, Figure 5.1 shows fifty trajectories
of the Brownian bridge in the left panel and fifty independent trajectories of the
Brownian bridge plus $\mu^*(t) = at(1 - t)$ with $a = 0.8$ in the right panel. Except for
one function in the right panel which goes visibly above the other functions, both
sets look very similar, and it would be difficult to tell by eye that they have different
mean functions. Yet, based on one thousand replications, $T^{(1)}_{50,50}$ rejects the null
hypothesis of equal means with probability 0.91 and $T^{(2)}_{50,50}$ with probability 0.98,
at the nominal size $\alpha = 5\%$. For such relatively small sample sizes, the tests suffer from
an elevated probability of type I error. For $\alpha = 5\%$, the empirical size is 6.6% for
$T^{(1)}_{50,50}$ and 7.3% for $T^{(2)}_{50,50}$. When both $M$ and $N$ exceed 100, the empirical sizes are
within 1% of the nominal sizes. Typical results are shown in Table 5.1. The power
is higher if the distributions of the errors are the same in both samples.
[Figure 5.1: two panels of trajectories plotted over the interval from 0.0 to 1.0.]
Fig. 5.1 Fifty trajectories of the Brownian bridge (left) and fifty independent trajectories of the
Brownian bridge plus $\mu^*(t) = 0.8t(1 - t)$ (right). The tests can detect the different means with
probability higher than 90%.
Table 5.1 Size (a = 0.0) and power (a > 0), in percent, of the tests based on $T^{(1)}_{100,200}$ and
$T^{(2)}_{100,200}$, abbreviated T(1) and T(2) in the column headers. The sample with N = 100 has
Brownian bridge errors, the one with M = 200 has Brownian motion errors.
          α = .01          α = .05          α = .10
  a     T(1)    T(2)     T(1)    T(2)     T(1)    T(2)
 0.0     1.3     1.4      5.6     5.7     10.8    10.8
 0.1     1.7     1.7      6.8     6.4     12.2    12.1
 0.2     2.6     3.0      9.8    11.0     16.5    18.2
 0.3     4.8     6.1     15.8    18.5     24.7    27.3
 0.4     9.8    10.7     23.6    27.7     35.6    39.0
 0.5    18.5    19.5     37.8    41.5     50.1    54.6
 0.6    29.3    29.7     52.3    55.0     64.3    67.2
 0.7    42.7    43.1     65.5    67.5     75.6    78.1
 0.8    59.2    57.7     79.3    80.3     87.1    88.1
 0.9    73.7    71.0     88.7    89.0     93.7    94.7
 1.0    85.3    82.0     94.4    94.8     97.2    97.7
 1.1    92.8    89.8     97.8    97.3     98.9    98.7
 1.2    96.6    94.7     99.3    99.2     99.9    99.7
 1.3    98.9    97.7     99.8    99.7     99.9    99.9
 1.4    99.5    99.3    100.0    99.9    100.0   100.0
 1.5    99.8    99.8    100.0   100.0    100.0   100.0
 1.6   100.0   100.0    100.0   100.0    100.0   100.0
Table 5.2 P–values (in percent) of the tests based on statistics $T^{(1)}_{N,M}$ and $T^{(2)}_{N,M}$ applied to medfly
data.
  d       1     2     3     4     5      6     7     8     9
 T(1)    1.0   2.2   3.0   5.7  10.3   15.3   3.2   2.7   5.0
 T(2)    1.0   1.0   1.0   1.1   1.1    1.0   1.0   1.1   1.1
[Figure 5.2: vertical axis eggs, horizontal axis day.]
Fig. 5.2 Estimated mean functions for the medfly data: short lived (solid line); long lived (dashed
line).
to use. Both tests reject the equality of the mean functions, even though the sample
means, shown in Figure 5.2, are not far apart. The P–values for the statistic $T^{(2)}$ are
much more stable, equal to about 1%, no matter the value of $d$. The behavior of the
test based on $T^{(1)}$ is more erratic. This indicates that while the test based on $T^{(1)}$ is
easier to apply because it uses standard chi–square critical values, the test based on
$T^{(2)}$ may be more reliable.
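Proof of Theorem 5.1. By Theorem 2.1, there are two independent Gaussian
processes $\Gamma_1$ and $\Gamma_2$ with zero means and covariances $C$ and $C^*$ such that
$$\Big( N^{-1/2} \sum_{1 \le i \le N} (X_i - \mu),\ M^{-1/2} \sum_{1 \le i \le M} (X_i^* - \mu^*) \Big)$$
converges weakly in $L^2$ to $(\Gamma_1, \Gamma_2)$. This proves (5.7) with $\Gamma = (1 - \theta)^{1/2} \Gamma_1 + \theta^{1/2} \Gamma_2$. □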
Proof of Theorem 5.2. It follows from the proof of Theorem 5.1 that
$$U_{N,M} = \frac{NM}{N+M} \int_0^1 (\mu(t) - \mu^*(t))^2\, dt + O_P(1),$$
Proof of Theorem 5.3. The central limit theorem for sums of independent and identically
distributed random vectors in $R^d$ yields
$$\Big( \frac{NM}{N+M} \Big)^{1/2} \big[ \langle \bar X_N - \bar X_M^*, \varphi_1 \rangle, \ldots, \langle \bar X_N - \bar X_M^*, \varphi_d \rangle \big]^T \xrightarrow{d} N_d(0, Q), \tag{5.13}$$
where $N_d(0, Q)$ is a $d$-variate normal random vector with mean 0 and covariance
matrix $Q$. Since
$$\int_0^1 \int_0^1 (\hat Z_{N,M}(t, s) - Z(t, s))^2\, dt\, ds = o_P(1),$$
and
$$\max_{1 \le i \le d} \|\hat\varphi_i - \hat c_i \varphi_i\| = o_P(1), \tag{5.15}$$
where $\hat c_1, \ldots, \hat c_d$ are random signs. We showed in the proof of Theorem 5.1 that
$$N \int_0^1 (\bar X_N(t) - \mu(t))^2\, dt = O_P(1)$$
and
$$M \int_0^1 (\bar X_M^*(t) - \mu^*(t))^2\, dt = O_P(1),$$
so by (5.15) we have
$$\max_{1 \le i \le d} \Big( \frac{NM}{N+M} \Big)^{1/2} \big| \langle \bar X_N - \bar X_M^*, \hat\varphi_i - \hat c_i \varphi_i \rangle \big| = o_P(1).$$
Now the results in Theorem 5.3 follow from (5.13), (5.14) and from the observation
that neither $T^{(1)}_{N,M}$ nor $T^{(2)}_{N,M}$ depends on the random signs $\hat c_i$. □
Proof of Theorem 5.4. Following the proof of Theorem 5.3 one can easily verify
that
$$\Big( \frac{NM}{N+M} \Big)^{1/2} \hat a_i = \Big( \frac{NM}{N+M} \Big)^{1/2} \langle \mu - \mu^*, \varphi_i \rangle + O_P(1), \quad 1 \le i \le d.$$
Since (5.15) also holds, both parts of Theorem 5.4 are proven. □
5.4 Equality of covariance operators
$$H_0 : C = C^* \quad \text{versus} \quad H_A : C \ne C^*.$$
In Theorem 5.5, we will assume that $X$ and $X^*$ are Gaussian elements of $L^2$. This
means that the equality of the covariances implies the equality in distribution. Thus,
under the additional assumption of normality, $H_0$ states that the $X_i$ have the same
distribution as the $X_j^*$.
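Denote by $\hat C$ and $\hat C^*$ the empirical counterparts of $C$ and $C^*$, and by $\hat R$ the
empirical covariance operator of the pooled data, i.e.
$$\hat R(x) = \frac{1}{N+M} \Big\{ \sum_{i=1}^{N} \langle X_i, x \rangle X_i + \sum_{j=1}^{M} \langle X_j^*, x \rangle X_j^* \Big\}
= \hat\theta \hat C(x) + (1 - \hat\theta) \hat C^*(x), \quad x \in L^2,$$
where
$$\hat\theta = \frac{N}{N+M}.$$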
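The operator $\hat R$ has $N + M$ eigenfunctions, which are denoted $\hat\varphi_k$. We also set
$$\hat\lambda_k = \frac{1}{N} \sum_{n=1}^{N} \langle X_n, \hat\varphi_k \rangle^2, \quad
\hat\lambda_k^* = \frac{1}{M} \sum_{m=1}^{M} \langle X_m^*, \hat\varphi_k \rangle^2.$$
Note that the $\hat\lambda_k$ and the $\hat\lambda_k^*$ are not the eigenvalues of the operators $\hat C$ and $\hat C^*$,
but rather the sample variances of the coefficients of $X$ and $X^*$ with respect to the
orthonormal system $\{\hat\varphi_k, 1 \le k \le N + M\}$ formed by the eigenfunctions of the
operator $\hat R$.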
The test statistic is defined by
$$\hat T = \frac{N+M}{2}\, \hat\theta (1 - \hat\theta) \sum_{i,j=1}^{p}
\frac{\langle (\hat C - \hat C^*) \hat\varphi_i, \hat\varphi_j \rangle^2}
{(\hat\theta \hat\lambda_i + (1 - \hat\theta) \hat\lambda_i^*)(\hat\theta \hat\lambda_j + (1 - \hat\theta) \hat\lambda_j^*)}.$$
Theorem 5.5. Suppose $X$ and $X^*$ are Gaussian elements of $L^2$ such that $E\|X\|^4 < \infty$
and $E\|X^*\|^4 < \infty$. Suppose also that $\hat\theta \to \theta \in (0, 1)$, as $N \to \infty$. Then
$$\hat T \xrightarrow{d} \chi^2_{p(p+1)/2}, \quad N, M \to \infty,$$
where $\chi^2_{p(p+1)/2}$ denotes a chi-square random variable with $p(p + 1)/2$ degrees
of freedom.
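Expressed through the coefficients of the observations with respect to the eigenfunctions of $\hat R$, the statistic $\hat T$ involves only averages of products of scores. The sketch below assumes that the scores $\langle X_n, \hat\varphi_k \rangle$ and $\langle X_m^*, \hat\varphi_k \rangle$ have already been computed and stored row by row; the function names are illustrative. It uses $\hat C(x) = N^{-1} \sum_i \langle X_i, x \rangle X_i$ (and similarly for $\hat C^*$), which is the form implied by the decomposition of $\hat R$ given above.

```python
def covariance_test_statistic(scores_x, scores_y, p):
    """T-hat from scores <X_n, phi_k> (rows of scores_x) and <X*_m, phi_k> (rows of scores_y)."""
    n, m = len(scores_x), len(scores_y)
    theta = n / (n + m)

    def second_moment(scores, i, j):
        # Empirical <C phi_i, phi_j>: average of products of the i-th and j-th scores.
        return sum(row[i] * row[j] for row in scores) / len(scores)

    lam = [second_moment(scores_x, k, k) for k in range(p)]
    lam_star = [second_moment(scores_y, k, k) for k in range(p)]
    pooled = [theta * lam[k] + (1 - theta) * lam_star[k] for k in range(p)]

    total = 0.0
    for i in range(p):
        for j in range(p):
            diff = second_moment(scores_x, i, j) - second_moment(scores_y, i, j)
            total += diff ** 2 / (pooled[i] * pooled[j])
    return (n + m) / 2 * theta * (1 - theta) * total

def degrees_of_freedom(p):
    """Degrees of freedom of the chi-square limit in Theorem 5.5."""
    return p * (p + 1) // 2
```

The value returned would be compared with a quantile of the chi–square distribution with `degrees_of_freedom(p)` degrees of freedom; identical samples give a statistic of zero.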
Proof. Introduce the random operators
$$C_i(x) = \langle X_i, x \rangle X_i, \quad C_j^*(x) = \langle X_j^*, x \rangle X_j^*, \quad x \in L^2.$$
The $C_i$ form a sequence of iid elements of the Hilbert space $\mathcal{S}$ of the Hilbert–Schmidt
operators acting on $L^2$, and the same is true for the $C_j^*$. Under $H_0$, the $C_i$ and the
$C_j^*$ have the same mean $C$. They also have the same covariance operator, which is
an operator acting on $\mathcal{S}$ given by
The assumption of Gaussianity and $C = C^*$ imply that the $X_i$ and the $X_j^*$ have the
same distribution, so
$$E\big[ \langle X_i, e_n \rangle^2 \langle X_i, \Psi(e_n) \rangle X_i \big] = E\big[ \langle X_j^*, e_n \rangle^2 \langle X_j^*, \Psi(e_n) \rangle X_j^* \big].$$
We want to apply the CLT in the Hilbert space $\mathcal{S}$ to the operators $C_i$. By Theorem
2.1, we must verify that $E\|C_i\|_{\mathcal{S}}^2 < \infty$. This holds because, by Parseval’s equality,
$$E\|C_i\|_{\mathcal{S}}^2 = E\Big[ \sum_{n=1}^{\infty} \| \langle X_i, e_n \rangle X_i \|^2 \Big]
= E\Big[ \|X_i\|^2 \sum_{n=1}^{\infty} | \langle X_i, e_n \rangle |^2 \Big] = E\|X_i\|^4.$$
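We therefore obtain,
$$N^{1/2}(\hat C - C) \xrightarrow{d} Z_1, \quad M^{1/2}(\hat C^* - C) \xrightarrow{d} Z_2, \tag{5.17}$$
so that
$$\hat T = \sum_{i,j=1}^{p} \frac{W_{N,M}^2(i, j)}
{2 (\hat\theta \hat\lambda_i + (1 - \hat\theta) \hat\lambda_i^*)(\hat\theta \hat\lambda_j + (1 - \hat\theta) \hat\lambda_j^*)} \tag{5.18}$$
$$Z \overset{d}{=} \sqrt{2} \sum_{i=1}^{\infty} \lambda_i \xi_{ii} V_{ii} + \sum_{i<j} \sqrt{\lambda_i \lambda_j}\, \xi_{ij} (V_{ij} + V_{ji}), \tag{5.20}$$
and so
$$\langle Z(v_k), v_n \rangle =
\begin{cases}
\sqrt{2}\, \lambda_k \xi_{kk} & \text{if } k = n, \\
\sqrt{\lambda_k \lambda_n}\, \xi_{kn} & \text{if } k < n, \\
\sqrt{\lambda_k \lambda_n}\, \xi_{nk} & \text{if } k > n.
\end{cases} \tag{5.21}$$
Using (5.19) and (5.21), we see that
$$\hat T \xrightarrow{d} \sum_{k=1}^{p} \xi_{kk}^2 + \sum_{k<n \le p} \frac{\xi_{kn}^2 + \xi_{nk}^2}{2}
\overset{d}{=} \sum_{k=1}^{p} \xi_{kk}^2 + \sum_{k<n \le p} \xi_{kn}^2 \overset{d}{=} \chi^2_{p(p+1)/2}. \quad □$$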
where the $\lambda_i$ are the eigenvalues of $C$ and $\{\xi_{ni}, i \ge 1\}$ are independent sequences
of iid standard normal random variables.
Direct verification then shows that
$$C_n = \sum_{i,j=1}^{\infty} \sqrt{\lambda_i \lambda_j}\, \xi_{ni} \xi_{nj} V_{ij}, \quad C = \sum_{i=1}^{\infty} \lambda_i V_{ii}.$$
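The expected value $E[\xi_{ni} \xi_{nj} \xi_{n\ell} \xi_{nk}]$ is zero unless there are two pairs of equal
indices, or all indices are equal. We therefore have
$$\begin{aligned}
G(\Psi) &= \sum_{i \ne j} \lambda_i \lambda_j \big\{ \langle V_{ii}, \Psi \rangle_{\mathcal{S}} V_{jj} + \langle V_{ij}, \Psi \rangle_{\mathcal{S}} V_{ij} + \langle V_{ij}, \Psi \rangle_{\mathcal{S}} V_{ji} \big\} \\
&\quad + 3 \sum_{i} \lambda_i^2 \langle V_{ii}, \Psi \rangle_{\mathcal{S}} V_{ii} - \sum_{i} \lambda_i^2 \langle V_{ii}, \Psi \rangle_{\mathcal{S}} V_{ii} - \sum_{i \ne j} \lambda_i \lambda_j \langle V_{ii}, \Psi \rangle_{\mathcal{S}} V_{jj} \\
&= 2 \sum_{i} \lambda_i^2 \langle V_{ii}, \Psi \rangle_{\mathcal{S}} V_{ii} + \sum_{i \ne j} \lambda_i \lambda_j \big\{ \langle V_{ij}, \Psi \rangle_{\mathcal{S}} V_{ij} + \langle V_{ij}, \Psi \rangle_{\mathcal{S}} V_{ji} \big\}.
\end{aligned}$$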
5.5 Bibliographical notes
In Section 5.1 we assume that the observations in each sample are independent. If
the functions are obtained from a time record, for example daily or annual curves,
then the assumption of independence need not hold. Horváth et al. (2011) extend the
methodology and theory of Section 5.1 to dependent errors $\varepsilon_i$ and $\varepsilon_i^*$. Instead of the
covariance kernel $z_{N,M}(t, s)$, a kernel corresponding to suitably defined long–run
covariances must be used. The dependence is quantified by the notion of $L^p$–$m$–approximability
introduced in Chapter 16. Gromenko and Kokoszka (2011) develop
a test for the equality of the mean functions of the curves from two disjoint spatial
regions. They emphasize computational issues arising in small sample sizes of spatially
dependent curves.
The two sample problem for the covariance operators if the assumption of normality
is violated is studied by Fremdt et al. (2011) and Kraus and
Panaretos (2011). Boente et al. (2011) develop a bootstrap test for the equality
of covariance operators.
The two sample problem when the equality of the whole distributions is tested is
studied by Hall and Keilegom (2007) who emphasize the role of smoothing in two
sample problems for functional data.
Laukaitis and Račkauskas (2005) consider the model $X_{g,i}(t) = \mu_g(t) + \varepsilon_{g,i}(t)$,
$g = 1, 2, \ldots, G$, with innovations $\varepsilon_{g,i}$ and group means $\mu_g$, and test
$H_0 : \mu_1(t) = \cdots = \mu_G(t)$. Other related contributions are Cuevas et al. (2004),
Delicado (2007) and Ferraty et al. (2007).
Chapter 6
Detection of changes in the mean function
In this chapter, we present a methodology for the detection of changes in the mean of
functional observations. At its core is a significance test for testing the null hypothe-
sis of a constant functional mean against the alternative of a changing mean. We also
show how to locate the change points if the null hypothesis is rejected. Our method-
ology is readily implemented using the R package fda. The null distribution of
the test statistic is asymptotically pivotal with a well-known asymptotic distribution
going back to the work of Kiefer (1959).
In Section 6.1, we provide some background and motivation. After formulating
the assumptions in Section 6.2, we describe the test procedure in Section 6.3. The
finite sample performance is investigated in Section 6.4, which also contains an
illustrative application to the detection of changes in mean patterns of annual temperatures.
The proofs of the theorems of Section 6.3 are presented in Section 6.5.
6.1 Introduction
Throughout the book we typically assume that the observations $X_i$ have mean zero.
This is clearly not true in applications, so a suitable assumption is that $X_i = \mu + Y_i$,
where $EY_i = 0$. These equalities are in the space $L^2$, which, in particular, means
that $EX_i(t) = \mu(t)$ for almost all $t \in [0, 1]$. The various procedures discussed in
this book refer then to the mean adjusted variables $X_i - \mu$ which are estimated by
$X_i - \bar X$. In particular, the FPC’s $v_k$ are those of $X - \mu$, and we have the following
$L^2$ expansion
$$X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{ki} v_k(t), \quad 1 \le i \le N.$$
The FPC’s $v_k$ and their eigenvalues are then estimated using the sample covariance
operator
$$\hat C(x) = N^{-1} \sum_{i=1}^{N} \langle X_i - \bar X_N, x \rangle (X_i - \bar X_N), \quad x \in L^2.$$
The above approach is however not valid if the observations Xi do not have the same
mean. If the data are collected sequentially, like annual temperature curves, intra-
day price curves, or daily magnetometer curves, then it is possible that the mean
function changes over time. The simplest type of change is that the mean func-
tion changes abruptly from one deterministic curve to another. Such an assump-
tion is clearly a convenient idealization. However, as has been shown for scalar
observations, procedures aimed at detecting such simple “jump changes” also
have power to detect more complex changes. The model for an abrupt change is
$X_i = \mu_1 + Y_i$, $1 \le i \le k^*$, $X_i = \mu_2 + Y_i$, $k^* < i \le N$, where $k^*$ is an unknown
change point. Assuming $k^*/N \to \theta$, a simple verification shows that then $\hat C$ is
close to $C_Y + \theta(1 - \theta) \langle \delta, \cdot \rangle \delta$, where $\delta = \mu_1 - \mu_2$. The eigenfunctions of $\hat C$
will then no longer estimate the eigenfunctions of $C_Y$, the covariance operator of
the $Y_i$. In general, if the mean function changes, inference based on the FPC’s will
no longer be valid.
It is important to distinguish between a change point problem and the problem
of testing for the equality of means discussed in Chapter 5. In the latter setting,
it is known which population or group each observation belongs to. In the change
point setting, we do not have any partition of the data into several sets with possibly
different means. The change can occur at any point, and we want to test if it occurs
or not, and if it does, to estimate the point of change.
In this chapter, we assume that the observations are independent. This assumption
is often approximately satisfied, and allows us to focus on the aspect of the methodology
directly related to change point detection. The case of dependent observations
is considered in Chapter 16. We note that even if the mean zero $Y_i$ are independent,
but the mean changes, a test of independence, like the one studied in Chapter 7,
will show that the $X_i = \mu_i + Y_i$ are dependent. This phenomenon is well–known
for scalar observations, and is referred to as spurious dependence.
Change point methodology is often applied to time series of average annual tem-
peratures at specific locations, or to series derived from such data, like the land
surface or marine global temperature series described and used in several examples
in Shumway and Stoffer (2006). Figure 6.1 shows the series of average annual temperatures
in Central England from 1780 to 2007. Longer records of temperature in
England, reaching into the 1600’s, are available, but we focus on the more recent period
because starting from the late 1700’s daily temperatures have been recorded, and the
annual curves can be viewed as functional observations. One such curve is shown
in Figure 6.2. Detecting a change point in mean in a series like the one shown
in Figure 6.1 should not be taken literally to mean that the mean actually abruptly
changes from one value to another in a specific year. It means that the assumption of
a constant mean value for the whole series is not acceptable. The estimated change
point then shows a rough break point after which the temperatures are higher on
average. Different model formulations are obviously possible. One can postulate a
straight line regression model for the annual mean and test for changes in slope.
In this chapter, we focus on the change in mean problem in the functional setting.
In this case, the mean is a function, and the change can be not only in the average
level of this function, but also in its shape. For the daily temperature data, a change in
shape may mean, for example, that while the overall annual average stays the same,
summers may become warmer and winters colder. In Section 6.4 we show that in
the functional setting more subtle changes can be detected than in the multivariate
setting which studies average monthly temperatures. The difference between the
multivariate and the functional data is illustrated in Figure 6.2.
[Figure 6.1: vertical axis Degrees Celsius.]
[Figure 6.2: legend Monthly averages; horizontal axis Time.]
Fig. 6.2 Daily temperatures in 1916 with monthly averages and functional object obtained by
smoothing with B-splines.
6.2 Notation and assumptions
Note that under $H_0$, we do not specify the value of the common mean.
The test we construct has a particularly good power against the alternative in
which the data can be divided into several consecutive segments, and the mean is
constant within each segment, but changes from segment to segment. The simplest
case of only two segments (one change point) is specified in Assumption 6.4.
Under the null hypothesis, we can represent each functional observation as
The following assumption specifies conditions on $\mu(\cdot)$ and the errors $Y_i(\cdot)$ needed
to establish the asymptotic distribution of the test statistic.
Assumption 6.1. The mean $\mu(\cdot)$ is in $L^2$. The errors $Y_i(\cdot)$ are iid mean zero random
elements of $L^2$ which satisfy
$$E\|Y_i\|^2 = \int E Y_i^2(t)\, dt < \infty. \tag{6.2}$$
is square integrable, i.e. is in $L^2([0, 1] \times [0, 1])$. Consequently, it implies the following
expansions:
$$c(t, s) = \sum_{1 \le k < \infty} \lambda_k v_k(t) v_k(s) \tag{6.4}$$
and
$$Y_i(t) = \sum_{1 \le \ell < \infty} \eta_{\ell,i} v_\ell(t). \tag{6.5}$$
The $v_k$ are eigenfunctions of the covariance operator with kernel (6.3). The
sequences $\{\eta_{\ell,i}, \ell = 1, 2, \ldots\}$ are independent, and within each sequence the $\eta_{\ell,i}$
are uncorrelated with mean zero and variance $\lambda_\ell$. The infinite sum in (6.5) converges
in $L^2$ with probability one.
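Recall that the estimated eigenelements are defined by
$$\int \hat c(t, s) \hat v_\ell(s)\, ds = \hat\lambda_\ell \hat v_\ell(t), \quad \ell = 1, 2, \ldots, \tag{6.6}$$
where
$$\hat c(t, s) = \frac{1}{N} \sum_{1 \le i \le N} \big( X_i(t) - \bar X_N(t) \big) \big( X_i(s) - \bar X_N(s) \big)$$
and
$$\bar X_N(t) = \frac{1}{N} \sum_{1 \le i \le N} X_i(t).$$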
To control the distance between the estimated and the population eigenelements,
we need the following assumptions:
We establish the consistency of the test under the alternative of one change point
formalized in Assumption 6.4. A similar argument can be developed if there are
several change points, but the technical complications then obscure the main idea
explained in Sections 6.3 and 6.5 (in particular the functions (6.9) and (6.16) would
need to be modified). The more general case is studied empirically in Section 6.4.
in which the $Y_i$ satisfy Assumption 6.1, the mean functions $\mu_1$ and $\mu_2$ are in $L^2(T)$,
and
$$k^* = [n\theta] \quad \text{for some } 0 < \theta < 1.$$
We will see in the proof of Theorem 6.2 that under Assumption 6.4 the sample
covariances of the functional observations converge to the function
This is a symmetric, square integrable function, and it is easy to see that for any
$x, y \in L^2$,
$$\iint \tilde c(t, s) x(t) x(s)\, dt\, ds \ge 0,$$
6.3 Detection procedure
$$\hat\mu_k(t) = \frac{1}{k} \sum_{1 \le i \le k} X_i(t), \quad \tilde\mu_k(t) = \frac{1}{N - k} \sum_{k < i \le N} X_i(t).$$
If the mean is constant, the difference $\Delta_k(t) = \hat\mu_k(t) - \tilde\mu_k(t)$ is small for all
$1 \le k < N$ and all $t \in [0, 1]$. However, $\Delta_k(t)$ can become large due to chance
variability if $k$ is close to 1 or to $N$. It is therefore usual to work with the sequence
$$P_k(t) = \sum_{1 \le i \le k} X_i(t) - \frac{k}{N} \sum_{1 \le i \le N} X_i(t) = \frac{k(N - k)}{N} \big[ \hat\mu_k(t) - \tilde\mu_k(t) \big]$$
in which the variability at the end points is attenuated by a parabolic weight function.
If the mean changes, the difference $P_k(t)$ is large for some values of $k$ and of $t$.
Since the observations are in an infinite dimensional domain, we work with the
projections of the functions $P_k(\cdot)$ on the principal components of the data. These
projections can be expressed in terms of scores which can be easily computed using
the R package fda.
Consider thus the scores corresponding to the largest $d$ eigenvalues:
$$\hat\eta_{\ell,i} = \int [X_i(t) - \bar X_N(t)] \hat v_\ell(t)\, dt, \quad i = 1, 2, \ldots, N, \quad \ell = 1, 2, \ldots, d.$$
Observe that the value of $P_k(t)$ does not change if the $X_i(t)$ are replaced by $X_i(t) - \bar X_N(t)$.
Consequently, setting $k = [Nx]$, $x \in (0, 1)$, we obtain
$$\int \Big\{ \sum_{1 \le i \le Nx} X_i(t) - \frac{[Nx]}{N} \sum_{1 \le i \le N} X_i(t) \Big\} \hat v_\ell(t)\, dt
= \sum_{1 \le i \le Nx} \hat\eta_{\ell,i} - \frac{[Nx]}{N} \sum_{1 \le i \le N} \hat\eta_{\ell,i}. \tag{6.11}$$
Identity (6.11) shows that scores can be used for testing the constancy of the mean
function.
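The two facts used above, namely that $P_k(t)$ equals the weighted difference of the means before and after $k$, and that it is unchanged when the curves are centered by the grand mean, are easy to confirm numerically. The following sketch represents each curve by its values on a common grid; the toy data and the function names are hypothetical.

```python
def cusum(curves, k):
    """P_k(t) on the grid: sum of the first k curves minus (k/N) times the sum of all curves."""
    n = len(curves)
    grid = range(len(curves[0]))
    return [sum(c[t] for c in curves[:k]) - k / n * sum(c[t] for c in curves) for t in grid]

def weighted_mean_difference(curves, k):
    """k (N - k) / N times the difference of the mean curves before and after k."""
    n = len(curves)
    grid = range(len(curves[0]))
    before = [sum(c[t] for c in curves[:k]) / k for t in grid]
    after = [sum(c[t] for c in curves[k:]) / (n - k) for t in grid]
    return [k * (n - k) / n * (b - a) for b, a in zip(before, after)]

# Hypothetical toy data: six curves on a three-point grid, with a mean shift after the third.
curves = [[0.1, 0.0, -0.1], [0.0, 0.2, 0.1], [-0.1, 0.1, 0.0],
          [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1]]
grand_mean = [sum(c[t] for c in curves) / len(curves) for t in range(3)]
centered = [[c[t] - grand_mean[t] for t in range(3)] for c in curves]

for k in range(1, len(curves)):
    print(k, [round(v, 3) for v in cusum(curves, k)])
```

In this toy example the entries of $P_k$ are largest in absolute value near the index at which the mean shifts, which is the behavior the detection procedure exploits.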
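The following theorem can be used to derive a number of test statistics. To state
it, introduce the statistic
$$T_N(x) = \frac{1}{N} \sum_{\ell=1}^{d} \hat\lambda_\ell^{-1} \Big( \sum_{1 \le i \le Nx} \hat\eta_{\ell,i} - x \sum_{1 \le i \le N} \hat\eta_{\ell,i} \Big)^2 \tag{6.12}$$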
was derived by Kiefer (1959). Denoting by $c_d(\alpha)$ its $(1 - \alpha)$th quantile, the test
rejects $H_0$ if $S_{N,d} > c_d(\alpha)$. The critical values $c_d(\alpha)$ are presented in Table 6.1.
A multivariate analog of statistic (6.13) considered in Horváth et al. (1999) is
$$M_{N,d} = \frac{1}{N^2} \sum_{k=1}^{N} \Big( \frac{k}{N} \frac{N - k}{N} \Big)^2 \Delta(k) \hat D_d^{-1} \Delta^T(k), \tag{6.15}$$
where $\Delta(k)$ is the difference of the mean vectors (of dimension $d$) computed from
the first $k$ and the last $N - k$ data vectors, and $\hat D_d$ is the $d \times d$ matrix of estimated
residual vectors. If $d$ is large, the inverse of $\hat D_d$ is unstable. In statistic (6.13), this
inverse is “replaced” by inverses of the $d$ largest eigenvalues $\hat\lambda_\ell$, and the whole
statistic is properly “diagonalized” so that only the most important variability of the
data is considered, while the high dimensional noise is ignored.
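We now turn to the behavior of the test under the alternative. We will show that
it is consistent, i.e. $S_{N,d} \xrightarrow{P} \infty$. In fact, we can obtain the rate of divergence: under
$H_A$, $S_{N,d}$ grows linearly with $N$. We formulate these results under the assumption
of one change point.
Under Assumption 6.4, for $1 \le k \le d$, introduce the functions
$$g_k(x) =
\begin{cases}
x(1 - \theta) \displaystyle\int (\mu_1(t) - \mu_2(t)) w_k(t)\, dt, & 0 < x \le \theta, \\
\theta(1 - x) \displaystyle\int (\mu_1(t) - \mu_2(t)) w_k(t)\, dt, & \theta < x < 1.
\end{cases} \tag{6.16}$$
where
$$g(x) = [g_1(x), \ldots, g_d(x)]^T \quad \text{and} \quad
\Sigma^* = \begin{bmatrix}
1/\gamma_1 & 0 & \cdots & 0 \\
0 & 1/\gamma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/\gamma_d
\end{bmatrix}.$$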
To estimate the change point, we plot the function $T_N(x)$ in (6.12) against
$0 \le x \le 1$, and estimate $\theta$ by the value of $x$ which maximizes $T_N(x)$. The intuition
behind this estimator is clear from (6.12) and (6.11). To ensure uniqueness, we
formally define this estimator as
$$\hat\theta_N = \inf \Big\{ x : T_N(x) = \sup_{0 \le y \le 1} T_N(y) \Big\}. \tag{6.18}$$
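A minimal sketch of this estimator, assuming the scores and eigenvalue estimates are already available (for instance from a functional principal component analysis carried out elsewhere). The statistic is evaluated only at the grid points $x = k/N$, which is a simplification made here; the layout `scores[l][i]` and the function names are illustrative choices.

```python
def t_n_on_grid(scores, eigenvalues):
    """Values T_N(k/N), k = 0..N, of (6.12); scores[l][i] is the i-th score on component l."""
    n = len(scores[0])
    values = []
    for k in range(n + 1):
        total = 0.0
        for comp, lam in zip(scores, eigenvalues):
            # Partial sum of the first k scores minus x times the full sum, with x = k/N.
            partial = sum(comp[:k]) - (k / n) * sum(comp)
            total += partial ** 2 / lam
        values.append(total / n)
    return values

def change_point_estimate(scores, eigenvalues):
    """Smallest grid point k/N at which T_N attains its maximum over the grid."""
    values = t_n_on_grid(scores, eigenvalues)
    return values.index(max(values)) / len(scores[0])

# Hypothetical scores on one component, with a level shift after the third observation.
toy_scores = [[-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]]
print(change_point_estimate(toy_scores, [1.0]))
```

Because `list.index` returns the first position of the maximum, ties are resolved in favor of the smallest maximizer, in the spirit of the infimum in (6.18).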
with
$$A = \sum_{1 \le \ell \le d} \frac{1}{\gamma_\ell} \Big( \int \delta(t) w_\ell(t)\, dt \Big)^2. \tag{6.19}$$
Under the assumptions of Corollary 6.1, $A > 0$, and it is easy to verify that $A(x)$
has then a unique maximum at $x = \theta$. □
In this section, we report the results of a simulation study that examines the finite
sample performance of the test. Recall that the test rejects if $S_{N,d}$ of (6.13) exceeds
the $(1 - \alpha)$th quantile of $K_d$ of (6.14). For $d \le 5$, these quantiles were computed
by Kiefer (1959) using a series expansion of the CDF of $K_d$. Horváth et al. (1999)
used these expansions to find the critical values for $d = 12$ and noticed that the
critical values obtained by simulating $K_d$ by discretizing the integral are slightly
different, but actually lead to more accurate tests. To cover a fuller range of the $d$
values, Table 6.1 gives simulated critical values for $d = 1, \ldots, 30$, computed by
discretizing the integral over 1,000 points and running 100,000 replications.
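The simulation of critical values can be sketched as follows. The display (6.14) is not reproduced in this excerpt, so the code relies on the assumption that $K_d$ is the sum of $d$ independent integrals of squared Brownian bridges over $[0, 1]$, which is the form of the limit studied by Kiefer (1959). The grid size and number of replications are deliberately much smaller than the 1,000 points and 100,000 replications quoted above, so the resulting quantiles are rough; all names are illustrative.

```python
import random

def simulate_kd(d, grid_points, rng):
    """One draw of the sum over d components of the integral of B^2, via a Riemann sum."""
    total = 0.0
    sd = (1.0 / grid_points) ** 0.5
    for _ in range(d):
        w = [0.0]
        for _ in range(grid_points):
            w.append(w[-1] + rng.gauss(0.0, sd))
        # Brownian bridge on the grid; the average of its squares approximates the integral.
        total += sum((w[j] - j / grid_points * w[-1]) ** 2
                     for j in range(1, grid_points + 1)) / grid_points
    return total

def critical_value(d, alpha, grid_points=200, reps=2000, seed=0):
    """Empirical (1 - alpha) quantile of the simulated draws."""
    rng = random.Random(seed)
    draws = sorted(simulate_kd(d, grid_points, rng) for _ in range(reps))
    return draws[min(reps - 1, int((1 - alpha) * reps))]

print(critical_value(1, 0.05))
```

Increasing `grid_points` and `reps` toward the values used for Table 6.1 reduces the discretization and Monte Carlo errors at the cost of a longer running time.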
The simulation study consists of two parts. First we use standard Gaussian processes
as the errors $Y_i$ and a number of rather arbitrary mean functions $\mu$. This part
assesses the test in some generic cases analogous to assuming a normal distribution
of scalar observations. In the second part, we use mean functions and errors derived
from monthly temperature data. No assumptions on the marginal distribution of the
$Y_i$’s or the shape of the $\mu$’s are made. This part assesses the test in a specific, practically
relevant setting.
Gaussian processes. To investigate the empirical size, without loss of generality,
$\mu(t)$ was chosen to be equal to zero and two different cases of $Y_i(t)$ were considered,
namely the trajectories of the standard Brownian motion (BM), and the Brownian
bridge (BB). These processes were generated by transforming cumulative sums of
independent normal variables computed on a grid of $10^3$ equispaced points in $[0, 1]$.
Following Ramsay and Silverman (2005) (Chapter 3), discrete trajectories were
converted to functional observations (functional objects in R) using B-spline and
Fourier bases and various numbers of basis functions. No systematic dependence
either on the type of the basis or on the number of basis functions was found. The
results reported in this section were obtained using the B-spline basis with 800 basis
functions. We used a wide spectrum of $N$ and $d$, but to conserve space, we present
the results for $N = 50, 100, 150, 200, 300, 500$ and $d = 1, 2, 3, 4$. All empirical
rejection rates are based on 1,000 replications.
Table 6.2 shows the empirical sizes based on critical values reported in Table 6.1.
The empirical sizes are fairly stable. Except for a very few cases of small sample
sizes, the deviations from the nominal significance levels do not exceed two standard
errors computed using the normal approximation $\sqrt{p(1-p)/R}$, where $p$ is a
nominal level and $R$ the number of repetitions. Table 6.2 shows that for these Gaussian
processes, the empirical size does not depend appreciably either on $N$ or on $d$.
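To make the two-standard-error criterion concrete, the following short calculation evaluates $\sqrt{p(1-p)/R}$ for the three nominal levels and $R = 1000$ replications. The helper name `two_se_band` is ours and is used only for illustration.

```python
import math

def two_se_band(p, reps):
    """Return (lower, upper), in percent, of the band p +/- 2*sqrt(p(1-p)/reps)."""
    se = math.sqrt(p * (1.0 - p) / reps)
    return 100 * (p - 2 * se), 100 * (p + 2 * se)

for p in (0.10, 0.05, 0.01):
    lower, upper = two_se_band(p, 1000)
    print(f"nominal {100 * p:.0f}%: [{lower:.2f}%, {upper:.2f}%]")
```

With $R = 1000$ the bands are approximately 8.10 to 11.90 percent, 3.62 to 6.38 percent, and 0.37 to 1.63 percent, so an entry of Table 6.2 can be checked against them directly.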
In the power study, several cases that violate the null were considered. We
report the power for $k^* = [N/2]$. Several other values of $k^*$ were also considered,
and only a small loss of power was observed for $N/4 < k^* \le 3N/4$.
A few different mean functions $\mu$ before and after the change were used, namely
$\mu_i(t) = 0,\ t,\ t^2,\ \sqrt{t},\ e^t,\ \sin(t),\ \cos(t)$, $i = 1, 2$, for instance
$\mu_1(t) = t$ and $\mu_2(t) = \cos(t)$, etc.
Table 6.2 Empirical size (in percent) of the test using the B-spline basis.
Process         d = 1               d = 2               d = 3               d = 4
           10%    5%    1%     10%    5%    1%     10%    5%    1%     10%    5%    1%
N = 50
BM        10.3   4.6   0.1     9.9   4.8   0.7     8.4   3.3   0.6     9.7   4.8   0.8
BB        11.2   5.5   0.8    10.6   4.9   1.1     8.4   4.0   0.9     8.5   4.3   1.2
N = 100
BM        12.2   5.6   1.3     9.8   5.6   0.9     9.3   4.6   0.9     9.0   5.4   0.9
BB        12.4   5.7   0.7    10.2   4.2   0.6     9.9   4.6   1.0     8.3   4.1   0.8
N = 150
BM        10.8   5.7   1.3     9.7   4.6   1.2    11.8   6.2   0.8    10.8   5.3   1.1
BB        10.5   5.0   1.2     9.8   4.4   1.1    10.4   6.2   0.7    10.5   5.1   1.2
N = 200
BM         9.7   5.4   0.8     9.2   4.3   0.7     9.3   5.8   1.3    10.8   5.5   0.9
BB         9.2   5.1   0.8    10.8   5.6   1.2    10.0   5.2   1.0     9.6   5.2   1.0
N = 300
BM        10.3   5.2   1.5    11.1   6.1   0.6    10.1   4.5   0.6     9.9   5.5   0.7
BB        10.4   5.6   1.1     9.4   4.8   0.9     9.9   4.1   0.8    10.5   5.3   1.3
N = 500
BM        11.6   6.3   1.3    10.6   6.9   1.5    10.9   5.7   1.4     9.0   4.4   0.6
BB        11.7   5.1   1.3     9.7   5.8   1.4    10.3   5.3   1.1    10.0   5.4   1.1
90 6 Detection of changes in the mean function
Table 6.3 Empirical power (in percent) of the test using B-spline basis. Change point at
$k^* = [N/2]$.
Process                    d = 1                d = 2                d = 3
                      10%    5%    1%      10%    5%    1%      10%    5%    1%
N = 50
BM; BM + sin(t)      81.5  70.8  43.7     72.6  60.0  33.2     67.7  54.9  27.3
BM; BM + t           88.4  78.0  54.1     84.7  74.0  45.4     77.5  64.3  36.0
BB; BB + sin(t)      99.8  99.4  97.4     100   100   99.9     100   100   100
BB; BB + t           99.9  99.8  98.9     100   100   99.9     100   100   100
N = 100
BM; BM + sin(t)      97.4  95.3  86.3     96.4  91.0  76.5     93.5  88.0  68.7
BM; BM + t           99.0  97.5  91.2     98.7  97.1  87.6     97.5  94.9  83.8
BB; BB + sin(t)      100   100   100      100   100   100      100   100   100
BB; BB + t           100   100   100      100   100   100      100   100   100
N = 150
BM; BM + sin(t)      99.9  99.5  96.6     99.6  98.6  95.1     98.9  97.4  90.3
BM; BM + t           100   99.8  98.7     99.8  99.7  98.8     99.9  99.7  97.8
BB; BB + sin(t)      100   100   100      100   100   100      100   100   100
BB; BB + t           100   100   100      100   100   100      100   100   100
N = 200
BM; BM + sin(t)      100   99.9  99.1     100   99.8  99.0     99.9  99.7  98.2
BM; BM + t           100   100   100      100   100   99.9     100   100   99.3
BB; BB + sin(t)      100   100   100      100   100   100      100   100   100
BB; BB + t           100   100   100      100   100   100      100   100   100
Table 6.3 presents selected results of the power study. It shows that the test has
overall good power. For small samples, $N \le 100$, in cases where the BB was used
the power is slightly higher than for those with the BM. Nonetheless, for $N \ge 150$
the power approaches 100% for both processes and all choices of other parameters.
The power decreases as the number of principal components $d$ increases. This can
be explained as follows: the critical values of $S_{N,d}$ increase with $d$, but the change
point is mainly captured by a few initial leading principal components explaining
the major part of the variance.
Analysis of central England temperatures. The goal of this section is twofold:
to investigate the performance of the test in a real world setting, and to demonstrate
the advantages of the functional approach for high-dimensional data. The data
consist of 228 years (1780 to 2007) of average daily temperatures in central England.
They were published by the British Atmospheric Data Centre, and compiled by
Nick Humphreys at the University of Utah. The original data can thus be viewed
as 228 curves with 365 measurements on each curve. These data were converted to
functional objects in R using 12 B-spline basis functions. Multivariate observations
were obtained as in Horváth et al. (1999) by computing monthly averages, resulting
in 228 vectors of dimension $d = 12$. (We could not even compute statistic (6.15)
for vectors of dimension 365 because R reported that $\hat D$ was singular.) These two
procedures are illustrated in Figure 6.2. Even though we used 12 B-splines and 12
averages, the resulting data look quite different, especially in spring and fall, when
the temperatures change most rapidly. Gregorian months form a somewhat arbitrary
fixed partition of the data, while the splines adapt to their shapes, which differ from
year to year.
To compute statistic (6.13), we used $d = 8$ eigenfunctions which explain 84% of
variability. If the test indicates a change, we estimate it by the estimator $\hat\theta_N$ (6.18).
This divides the data set into two subsets. The procedure is then repeated for each
subset until periods of constant mean functions are obtained. We proceed in exactly
the same manner using statistic (6.15). We refer to these procedures, respectively,
as FDA and MDA approaches. The resulting segmentations are shown in Tables 6.4
and 6.5.
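The recursive segmentation just described can be sketched in a few lines. The sketch is generic: `test` is a hypothetical callable that, for an index range, returns a reject decision and an estimated change point. To keep the example self-contained, a toy scalar CUSUM rule with an arbitrary threshold stands in for the functional test based on (6.13) and (6.18); it is not the test studied in this chapter.

```python
import numpy as np

def segment(start, end, test, min_len=2):
    """Recursively split [start, end) and return the sorted change points.

    test(start, end) must return (reject, cp), where cp is the first index
    of the new regime and start < cp < end.
    """
    if end - start < min_len:
        return []
    reject, cp = test(start, end)
    if not reject:
        return []
    # Repeat the procedure on both sub-segments
    return segment(start, cp, test, min_len) + [cp] + segment(cp, end, test, min_len)

def make_toy_test(data, threshold):
    """Toy scalar CUSUM rule (an assumption for illustration only)."""
    data = np.asarray(data, dtype=float)
    def test(start, end):
        x = data[start:end]
        n = x.size
        k = np.arange(1, n)                          # split sizes 1..n-1
        cusum = np.cumsum(x)[:-1] - k / n * x.sum()
        stat = np.abs(cusum) / np.sqrt(n)
        j = int(np.argmax(stat))
        return bool(stat[j] > threshold), start + j + 1
    return test

# Noise-free toy series with mean changes at indices 20 and 40
toy = [0.0] * 20 + [5.0] * 20 + [1.0] * 20
change_points = segment(0, len(toy), make_toy_test(toy, threshold=1.0))
```

In the temperature application the indices would correspond to years, and each call to `test` would recompute the eigenfunctions on the current sub-segment.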
The functional approach identified two more change points, 1850 and 1992, which
roughly correspond to the industrial revolution and the advent of rapid global warming.
The multivariate approach "almost" identified these change points, with the
P-values in iterations 4 and 5 being just above the significance level of 5%. This
may indicate that the functional method has better power, perhaps due to its greater
flexibility in capturing the shape of the data. This conjecture is investigated below.
Figure 6.3 shows average temperatures in the last four segments, and clearly
illustrates the warming trend.
Table 6.4 Segmentation procedure of the data into periods with constant mean function.
Iteration   Segment       Decision   S_{N,d} / M_{N,d}   P-value   Estimated change point
England temperatures (d = 8) (FDA approach)
1           1780 - 2007   Reject     8.020593            0.00000   1926
2           1780 - 1925   Reject     3.252796            0.00088   1808
3           1780 - 1807   Accept     0.888690            0.87404   -
4           1808 - 1925   Reject     2.351132            0.02322   1850
5           1808 - 1849   Accept     0.890845            0.87242   -
6           1850 - 1925   Accept     1.364934            0.41087   -
7           1926 - 2007   Reject     2.311151            0.02643   1993
8           1926 - 1992   Accept     0.927639            0.84289   -
9           1993 - 2007   Accept     1.626515            0.21655   -
England temperatures (d = 12) (MDA approach)
1           1780 - 2007   Reject     7.971031            0.00000   1926
2           1780 - 1925   Reject     3.576543            0.00764   1815
3           1780 - 1814   Accept     1.534223            0.81790   -
4           1815 - 1925   Accept     2.813596            0.07171   -
5           1926 - 2007   Accept     2.744801            0.08662   -
Table 6.5 Summary and comparison of segmentation. Beginning and end of data period in bold.
Approach Change points
FDA 1780 1808 1850 1926 1993 2007
MDA 1780 1815 1926 2007
[Figure 6.3: average temperature curves for the segments 1808-1849, 1850-1925,
1926-1991 and 1992-2007. Vertical axis: degrees Celsius; horizontal axis: time.]
The analysis presented above assumes a simple functional change point model
for the daily temperatures. Obviously, one cannot realistically believe that the mean
curves change abruptly in one year; this is merely a modeling assumption useful in
identifying patterns of change in mean temperature curves. Well-established alternative
modeling approaches have been used to study the variability of temperatures.
For example, Hosking (1984) fitted a fractionally differenced ARMA(1,1) model
to the series of annual average temperatures in central England in 1659–1976. It is
generally very difficult to determine on purely statistical grounds if a change-point
or a long-range dependent model is more suitable for any particular finite length
record, see Berkes et al. (2006) and Jach and Kokoszka (2008) for recent methodology,
discussion and references. It is often more useful to choose a modeling methodology
which depends on specific goals, and this is the approach we use. One way of
checking the approximate adequacy of our model is to check if the residuals obtained
after subtracting the mean in each segment are approximately independent and
identically distributed. This can be done by applying the test developed by Gabrys and
Kokoszka (2007), which is a functional analog of the well-known test of Hosking
(1980) and Li and McLeod (1981) (see also Hosking (1981, 1989)). The P-value of
8% indicates the acceptance of the hypothesis that the residuals are iid.
Keeping these caveats in mind, we use the partitions obtained above to generate
realistic synthetic data with and without change-points. We use them to evaluate and
compare the size and power properties of the FDA and MDA tests, and to validate
our findings. We compute the residuals of every observation in a constant mean
segment by subtracting the average of the segment, i.e. $\hat Y_{is} = X_{is} - \hat\mu_s$, where
$s = 1, \ldots, S$ denotes the segment, and $i = 1, \ldots, I_s$ indexes observations in the
$s$th segment. The $\hat Y_{is}$ are functional residuals, and their average in each segment is
clearly the zero function.
To assess the empirical size, we simulate "temperature-like" data by considering
two cases. Case I: for every constant mean segment $s$, we produce synthetic
observations by adding to its mean function $\hat\mu_s$ errors drawn from the empirical
distribution of the residuals of that segment, i.e. synthetic (bootstrap) observations in
the $s$th segment are generated via $X^*_{is} = \hat\mu_s + \hat Y^*_{is}$, where the asterisk
indicates that $\hat Y^*_{is}$ is obtained by drawing with replacement from
$\{\hat Y_{is},\ i = 1, \ldots, I_s\}$. Case II: We
compute residuals in each segment and pool them together. We use this larger set of
residuals to create new observations by adding to the average of a segment the errors
drawn with replacement from that pool of residuals. For each segment, we generate
1000 of these bootstrap sequences. Table 6.6 shows the resulting empirical sizes.
As the sample size increases, the FDA rejection rates approach nominal sizes, while
the MDA test is much more conservative. For the 1993–2007 segment, the size is
not reported because the matrix $\hat D$ was (numerically) singular for most bootstrap
replications.
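The two resampling schemes can be sketched as follows, assuming the curves are stored as rows of an array evaluated on a common grid. The function names, the toy Gaussian data, and the segment boundaries are hypothetical and serve only to show the mechanics of Case I (own-segment residuals) and Case II (pooled residuals).

```python
import numpy as np

def segment_residuals(curves, boundaries):
    """Return per-segment mean curves and residual curves.

    curves: (N, m) array; boundaries: segment start indices followed by N.
    """
    means, resids = [], []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        seg = curves[a:b]
        mu_hat = seg.mean(axis=0)          # segment mean function
        means.append(mu_hat)
        resids.append(seg - mu_hat)        # residuals average to zero
    return means, resids

def bootstrap_segment(mu_hat, source, n_obs, rng):
    """Add residuals drawn with replacement from `source` to the mean."""
    idx = rng.integers(0, source.shape[0], size=n_obs)
    return mu_hat + source[idx]

rng = np.random.default_rng(0)
curves = rng.normal(size=(30, 12))         # toy data: 30 curves, 12 grid points
boundaries = [0, 10, 30]                   # two hypothetical segments
means, resids = segment_residuals(curves, boundaries)

# Case I: resample the residuals of the segment itself
case1 = bootstrap_segment(means[0], resids[0], 10, rng)
# Case II: resample from the residuals pooled over all segments
pooled = np.vstack(resids)
case2 = bootstrap_segment(means[0], pooled, 10, rng)
```

Repeating the last two steps many times and applying the test to each synthetic sample gives empirical rejection rates of the kind reported in Table 6.6.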
We next investigate the power. Three cases are considered. Case I: For each segment,
we produce synthetic observations using the bootstrap procedure and sampling
residuals from the corresponding period. This means that the errors in each
segment come from possibly different distributions. Case II: We pool together two,
three, four, or five sets of residuals (depending on how many constant mean segments
we consider) and sample from that pool to produce new observations. This
means that the errors in each segment come from the same distribution. Case III:
We slightly modify Case II by combining all residuals from all segments into one
population and use it to produce new observations. In both Case II and Case III,
the theoretical assumptions of Section 6.2 are satisfied, cf. Assumption 6.4, i.e. the
means change, but the errors come from the same population. Table 6.7 shows the
power of the test for the FDA approach and Table 6.8 presents the results for the discrete
MDA method. As seen in Table 6.7, the differences between the three cases are of the
order of the chance error. Table 6.7 shows that the test has excellent power, even in
small samples, both for single and multiple change points. As for the Gaussian processes,
power is slightly higher if there is a change point around the middle of the
sample. Comparing Tables 6.7 and 6.8, it is seen that the FDA approach dominates
the MDA approach. There are a handful of cases, indicated in the tables, in which MDA
performed better, but their frequency and the size of the difference suggest that this
may be attributable to the chance error.

Table 6.6 Empirical size of the test for models derived from the temperature data.
Segment            Number of          Case I                Case II
                   functions     10%    5%    1%       10%    5%    1%
FDA approach (d = 8)
1780 - 1807 (1)       28         8.0   3.0   0.1       7.6   2.5   0.2
1808 - 1849 (2)       42         9.5   3.9   0.4       9.7   4.1   0.4
1850 - 1925 (3)       76        10.0   4.7   0.7      10.2   4.3   0.7
1926 - 1992 (4)       66         8.8   3.7   0.8       9.2   4.1   1.0
1993 - 2007 (5)       16         3.8   0.3   0.0       3.3   0.1   0.0
MDA approach (d = 12)
1780 - 1807 (1)       28         3.0   0.5   0.0       2.8   0.4   0.0
1808 - 1849 (2)       42         5.3   2.3   0.1       5.4   1.3   0.0
1850 - 1925 (3)       76         6.9   1.9   0.0       9.1   4.2   0.6
1926 - 1992 (4)       66         7.9   3.3   0.5       7.4   2.7   0.2
1993 - 2007 (5)       16          -     -     -        0.0   0.0   0.0
A key element of the proofs is bound (6.31), which follows from a functional central
limit theorem in a Hilbert space, see e.g. Kuelbs (1973). A result of this type is
needed because the observations $X_i(\cdot)$ are elements of a Hilbert space, and to detect
a change point, we must monitor the growth of the partial sums $\sum_{1 \le i \le Nx} X_i(t)$,
which are a function of $0 < x < 1$. Lemma 6.2 is particularly noteworthy because
it shows that the eigenvalues and the eigenfunctions also converge under the alternative.
For the sake of completeness we provide a new and simple proof of the result
of Kuelbs (1973) in a form most suitable for the application to the proofs of
Theorems 6.1 and 6.2. We start with an $L^2$ version of the Kolmogorov inequality. Let
$\{Z_i(t),\ 0 \le t \le 1,\ 1 \le i \le N\}$ be independent identically distributed random
functions with values in $L^2[0, 1]$ satisfying
$$
E Z_1(t) = 0 \quad\text{and}\quad \int E Z_1^2(t)\,dt < \infty. \tag{6.20}
$$
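$$
\lambda\, P\left\{ \max_{1 \le k \le N} \int \Big( \sum_{i=1}^{k} Z_i(t) \Big)^2 dt \ge \lambda \right\}
\le E \int \Big( \sum_{i=1}^{N} Z_i(t) \Big)^2 dt \qquad (\lambda > 0). \tag{6.21}
$$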
Table 6.7 Empirical power of the test for change-point models derived from temperature data (FDA approach).
Segment           Sample   Change             Case I            Case II           Case III
                  size     point(s)       10%   5%   1%     10%   5%   1%     10%   5%   1%
England (d = 8) (FDA approach)
1 , 2 70 .41 85.6 76.8 49.7 86.4 76.9 46.3 87.0 75.7 45.3
1 , 3 104 .28 86.2 75.8 47.4 88.6 78.8 50.6 93.1 83.3 58.1
1 , 4 94 .31 100 100 98.7 100 100 99.3 99.8 99.7 96.3
1 , 5 44 .66 100 99.9 93.4 100 99.8 92.7 99.8 99.6 92.2
2 , 3 118 .36 87.9 78.5 52.8 88.0 78.9 52.1 88.6 79.6 54.0
2 , 4 108 .40 99.7 99.0 95.6 100 99.6 96.7 100 99.3 95.7
2 , 5 58 .74 99.2 97.8 86.3 99.4 98.6 85.8 99.6 98.7 86.6
3 , 4 142 .54 99.9 99.5 99.1 100 100 98.9 99.6 99.1 96.6
3 , 5 92 .84 99.1 96.7 82.9 99.4 97.4 84.4 98.9 95.4 79.6
4 , 5 82 .82 93.0 85.0 58.8 94.0 86.3 57.0 77.9 64.9 32.6
1 , 2 , 3 146 .20 .49 99.1 97.9 89.6 99.2 97.0 89.9 99.3 98.5 94.2
1 , 2 , 4 136 .21 .52 100 100 100 100 100 100 100 100 100
1 , 2 , 5 86 .34 .83 100 100 99.7 99.9 99.9 99.2 100 100 99.7
2 , 3 , 4 184 .23 .65 100 100 99.9 100 100 99.9 100 99.9 99.9
2 , 3 , 5 134 .32 .89 100 99.3 96.4 99.9 99.8 97.4 100 99.7 97.7
3 , 4 , 5 158 .49 .91 100 100 100 100 100 100 100 100 100
1 , 2 , 3 , 4 212 .14 .33 .69 100 100 100 100 100 100 100 100 100
1 , 2 , 3 , 5 162 .18 .44 .91 100 100 99.9 100 100 99.9 100 100 100
2 , 3 , 4 , 5 200 .22 .60 .93 100 100 100 100 100 100 100 100 100
1 , 2 , 3 , 4 , 5 228 .13 .31 .64 .93 100 100 100 100 100 100 100 100 100
Table 6.8 Empirical power of the test for change-point models derived from temperature data (MDA approach).
Segment           Sample   Change             Case I            Case II           Case III
                  size     point(s)       10%   5%   1%     10%   5%   1%     10%   5%   1%
England (d = 12) (MDA approach)
1 , 2 70 .41 82.9 70.2 38.2 85.2 73.4 39.3 76.2 59.6 26.8
1 , 3 104 .28 79.7 63.9 32.6 79.4 64.8 30.5 81.1 67.4 35.1
1 , 4 94 .31 100 99.4 95.8 99.9 99.0 96.0 99.3 96.9 82.0
1 , 5 44 .66 98.4 93.8 54.5 99.0 93.0 55.8 98.5 91.8 49.0
2 , 3 118 .36 88.3 75.9 46.8 86.7 75.6 43.5 82.3 70.7 41.7
2 , 4 108 .40 97.3 93.3 77.5 97.8 95.6 78.1 98.3 95.7 80.7
2 , 5 58 .74 93.9 85.5 50.4 94.7 85.2 48.3 96.3 90.9 57.9
3 , 4 142 .54 100 100 98.5 100 99.8 99.0 99.5 98.9 94.6
3 , 5 92 .84 98.2 93.9 71.2 99.1 94.2 71.3 96.7 90.2 58.2
4 , 5 82 .82 78.4 63.1 28.0 79.4 63.4 26.4 60.9 44.1 15.7
1 , 2 , 3 146 .20 .49 97.5 93.2 76.9 97.7 93.1 77.9 97.4 94.9 80.2
1 , 2 , 4 136 .21 .52 100 100 100 100 100 99.9 100 100 99.9
1 , 2 , 5 86 .34 .83 100 99.8 96.2 99.9 99.7 95.7 100 99.8 97.4
2 , 3 , 4 184 .23 .65 100 100 99.1 100 99.9 98.7 100 100 99.5
2 , 3 , 5 134 .32 .89 99.8 99.4 93.7 99.6 99.3 93.8 99.7 98.6 92.1
3 , 4 , 5 158 .49 .91 100 100 100 100 100 100 100 100 100
1 , 2 , 3 , 4 212 .14 .33 .69 100 100 99.9 100 100 100 100 100 100
1 , 2 , 3 , 5 162 .18 .44 .91 100 100 99.1 100 99.9 99.1 100 100 98.9
2 , 3 , 4 , 5 200 .22 .60 .93 100 100 100 100 100 100 100 100 100
1 , 2 , 3 , 4 , 5 228 .13 .31 .64 .93 100 100 100 100 100 100 100 100 100
6.5 Theorem for functional observations and proofs of Theorems 6.1 and 6.2 97
R Pk 2
and therefore f i D1 Zi .t/ dt; Fk g is a non-negative submartingale. Hence by
Doob’s maximal inequality (cf. Hall and Heyde (1980), p. 14), we have that
8 !2 9
< Z X k =
P max Zi .t/ dt dt
:1kN ;
i D1
8 !2 2 39
<Z X N Z X 2 =
E Zi .t/ dtI 4 max Zi .t/ dt > 5
: 1kN ;
i D1 1i k
8 ! 9
<Z X N 2
=
E Zi .t/ dt ;
: ;
i D1
The following result was obtained by Kuelbs (1973), and now we present a proof
based on projections.
$$
Y_{i:M}(t) = \sum_{\ell=1}^{M} \langle Y_i, v_\ell \rangle\, v_\ell(t)
$$
and
$$
\bar Y_{i:M}(t) = Y_i(t) - Y_{i:M}(t) = \sum_{\ell=M+1}^{\infty} \langle Y_i, v_\ell \rangle\, v_\ell(t).
$$
Z !2
X
k
P max N 1=2
YNi:M .t/ dt
1kN
i D1
Z !2
1 X k
E N 1=2
YNi:M .t/ dt
i D1
Z
1
D .E YN1:M .t//2 dt
1
1 X
D ` ;
`DM C1
proving (6.23).
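It is easy to see that
$$
N^{-1/2} \sum_{i=1}^{k} Y_{i:M}(t) = \sum_{\ell=1}^{M} \left( N^{-1/2} \sum_{i=1}^{k} \langle Y_i, v_\ell \rangle \right) v_\ell(t).
$$
Using the continuity of the Wiener process we conclude that for all $M \ge 1$
$$
\max_{1 \le \ell \le M}\ \sup_{0 \le x \le 1} \big| W_{\ell,N}([Nx]/N) - W_{\ell,N}(x) \big| = o_P(1). \tag{6.25}
$$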
$= o_P(1).$
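(Note that the expected value above does not depend on $N$.) By the orthonormality
of the $v_\ell$'s we have that
$$
E \sup_{0 \le x \le 1} \int \Big( \sum_{\ell=M+1}^{\infty} \lambda_\ell^{1/2} W_{\ell,N}(x)\, v_\ell(t) \Big)^2 dt
= E \sup_{0 \le x \le 1} \sum_{\ell=M+1}^{\infty} \lambda_\ell W_{\ell,N}^2(x)
\le \Big( E \sup_{0 \le x \le 1} W_{1,N}^2(x) \Big) \sum_{\ell=M+1}^{\infty} \lambda_\ell
\to 0 \quad (M \to \infty),
$$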
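Since the $Y_i$ are iid functions with mean zero, the $\boldsymbol\beta_i$ are iid mean zero vectors in
$R^d$. A simple calculation using the orthonormality of the $v_k$ shows that each $\boldsymbol\beta_i$ has
a diagonal covariance matrix
$$
\Sigma_d =
\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_d
\end{bmatrix}.
$$
$$
N^{-1/2} \sum_{1 \le i \le Nx} \boldsymbol\beta_i \stackrel{d}{\to} \boldsymbol\Delta_d(x) \quad (0 \le x \le 1), \tag{6.27}
$$
where the convergence is in the Skorokhod space $D^d[0, 1]$. The process
$\{\boldsymbol\Delta_d(x),\ 0 \le x \le 1\}$ takes values in $R^d$, has zero mean and covariance matrix $\Sigma_d$.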
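Convergence (6.27) implies in turn that
$$
\frac{1}{N} \Big[ \sum_{1 \le i \le Nx} \boldsymbol\beta_i - x \sum_{1 \le i \le N} \boldsymbol\beta_i \Big]^T
\Sigma_d^{-1}
\Big[ \sum_{1 \le i \le Nx} \boldsymbol\beta_i - x \sum_{1 \le i \le N} \boldsymbol\beta_i \Big]
\stackrel{d}{\to} \sum_{1 \le i \le d} B_i^2(x)
\tag{6.28}
$$
in the Skorokhod space $D[0, 1]$.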
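The matrix $\Sigma_d$ is estimated by $\hat\Sigma_d$. By (6.7) and Assumption 6.2,
$\hat\Sigma_d^{-1} \stackrel{P}{\to} \Sigma_d^{-1}$, so (6.28) yields
$$
\frac{1}{N} \Big[ \sum_{1 \le i \le Nx} \boldsymbol\beta_i - x \sum_{1 \le i \le N} \boldsymbol\beta_i \Big]^T
\hat\Sigma_d^{-1}
\Big[ \sum_{1 \le i \le Nx} \boldsymbol\beta_i - x \sum_{1 \le i \le N} \boldsymbol\beta_i \Big]
\stackrel{d}{\to} \sum_{1 \le i \le d} B_i^2(x).
\tag{6.29}
$$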
Note that
0 1
X X X X
ˇk;i x
ˇk;i D cOk @ ˇk;i x ˇk;i A :
1i N x 1i N 1i N x 1i N
2 3T 2 3
1 X X X X d X
4 ˇ i x ˇ i 5 ḃd1 4 ˇ i x ˇ i 5 ! Bi2 .x/:
N
1i N x 1i N 1i N x 1i N 1i d
(6.30)
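We now turn to the effect of replacing the $\beta_{i,k}$ by $\tilde\beta_{i,k}$. Observe that
$$
\begin{aligned}
&\sup_{0<x<1} \Big| N^{-1/2} \sum_{1 \le i \le Nx} \beta_{i,k} - N^{-1/2} \sum_{1 \le i \le Nx} \tilde\beta_{i,k} \Big| \\
&\quad = \sup_{0<x<1} \Big| \int \Big( N^{-1/2} \sum_{1 \le i \le Nx} Y_i(t) \Big) \big( \hat c_k v_k(t) - \hat v_k(t) \big)\,dt \Big| \\
&\quad \le \sup_{0<x<1} \Big[ \int \Big( N^{-1/2} \sum_{1 \le i \le Nx} Y_i(t) \Big)^2 dt \Big]^{1/2}
\Big[ \int \big( \hat c_k v_k(t) - \hat v_k(t) \big)^2 dt \Big]^{1/2}.
\end{aligned}
$$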
Proof of Theorem 6.2. Theorem 6.2 follows from relation (6.36) and Lemma 6.3.
To establish them, we need the following Lemma.
1 X
N
cON .t; s/ D Yi .t/ YNN .t/ Yi .s/ YNN .s/
N
i D1
k k
C 1 .t/.s/ C rN .t; s/;
N N
where
X
k 1
rN .t; s/ D 1 Yi .t/ YNN .t/ .s/ C Yi .s/ YNN .s/ .t/
N N
1i k
k 1 X
C Yi .t/ YNN .t/ .s/ C Yi .s/ YNN .s/ .t/ :
N N
k <i N
RR P
Using the law of large numbers (Theorem 2.2), we obtain
$\iint r_N^2(t, s)\,dt\,ds \stackrel{P}{\to} 0$ and, by Theorem 2.6,
$$
\iint \big[ \hat c_N(t, s) - \tilde c_N(t, s) \big]^2 dt\,ds \stackrel{P}{\to} 0. \tag{6.35}
$$
Hence Lemmas 2.2 and 2.3 imply, respectively, (6.33) and (6.34). □
$$
\hat\Sigma_d^{-1} \stackrel{P}{\to} \Sigma^*. \tag{6.36}
$$
Proof. Denote
2 3
1 4 X O X
gO k .x/ D k;i x Ok;i 5 ; x 2 Œ0; 1;
N
1i N x 1i N
and
Z Z Z
Ok;i D Yi .t/vO k .t/dt C 2 .t/vO k .t/dt XN N .t/vO k .t/dt; if k < i N:
Most inferential tools of functional data analysis rely on the assumption of iid
functional observations. In designed experiments this assumption can be ensured, but
for observational data, especially those derived from time series, it requires verification.
In this chapter, based on the paper of Gabrys and Kokoszka (2007), we describe a
simple portmanteau test of independence for functional observations whose idea is
as follows. The functional observations $X_n(t)$, $n = 1, 2, \ldots, N$, are approximated
by the first $p$ terms of the principal component expansion
$$
X_n(t) \approx \sum_{k=1}^{p} X_{kn} v_k(t), \quad n = 1, 2, \ldots, N, \tag{7.1}
$$
where the $X_{kn}$ are the scores. For the sake of an intuitive argument, assume
first that the population FPC's $v_k(t)$ are known. Testing the iid assumption for
the curves $X_n(\cdot)$ reduces then to testing this assumption for the random vectors
$[X_{1n}, \ldots, X_{pn}]^T$. For this purpose, the method proposed by Chitturi (1976) can
be used: we find multivariate analogs of correlations and an analog of the "sum
of squares" which has a $\chi^2$ asymptotic distribution. In reality, the $v_k(t)$ must be
replaced by the EFPC's. This transition is delicate in the problem of testing for
independence because the EFPC's depend on all observations.
The test studied in this chapter is analogous to the Ljung–Box test which is
extensively used in time series analysis. It essentially tests if the curves are uncorrelated.
As is common in time series analysis, evidence against independence can be found
if the test is applied to some transformations of the functional observations, for
example to the curves $X_n^2(t)$.
We note that Székely et al. (2007) and Székely and Rizzo (2009) proposed a
test of independence of two variables $X$ and $Y$, which can be of arbitrary dimension,
and so can also be elements of a Hilbert space. Their test is based on a measure
of dependence, known as the "correlation of distances", which is derived from
differences of characteristic functions. To apply such a test, a sample of iid pairs
$(X_i, Y_i)$, $i = 1, 2, \ldots, N$, is required, and so a different inferential problem is solved
than that studied in this chapter.
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 105
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_7,
© Springer Science+Business Media New York 2012
106 7 Portmanteau test of independence
The chapter is organized as follows. In Section 7.1, we formulate the test proce-
dure together with mathematical assumptions and theorems establishing its asymp-
totic validity. The proofs of the theorems of Section 7.1 are presented in Section 7.4.
Sections 7.2 and 7.3 are devoted, correspondingly, to the study of the finite sample
performance of the test and its application to two types of functional data: credit
card sales and geomagnetic records. Section 7.5 contains some lemmas on Hilbert
space valued random elements which are used in Section 7.4, and may be useful in
other similar contexts. In Section 7.6, we develop the required theory for random
matrices.
We observe mean zero random functions $\{X_n(t),\ t \in [0, 1],\ n = 1, 2, \ldots, N\}$ and
want to test
versus
$$
\hat X_n(t) = \sum_{k=1}^{p} \hat X_{kn} \hat v_k(t),
$$
where
$$
\hat X_{kn} = \int X_n(t) \hat v_k(t)\,dt = \int \hat X_n(t) \hat v_k(t)\,dt. \tag{7.2}
$$
In practice, the number p must be selected so that the first p EFPC’s explain a large
fraction of the sample variance, see Section 3.3.
To establish the null distribution of the test statistic, we require the following
assumption:
Assumption 7.1. The observations $X_1, X_2, \ldots, X_N$ are iid in $L^2$, have mean zero,
and satisfy
$$
E\|X_n\|^4 = E \left[ \int X_n^2(t)\,dt \right]^2 < \infty. \tag{7.3}
$$
$$
\hat{\mathbf X}_n = [\hat X_{1n}, \hat X_{2n}, \ldots, \hat X_{pn}]^T \tag{7.4}
$$
7.1 Test procedure 107
where
$$
X_{kn} = \int X_n(t) v_k(t)\,dt. \tag{7.6}
$$
Under $H_0$, the $\mathbf X_n$ are iid mean zero random vectors in $R^p$ for which we denote
$$
c_h(k, l) = \frac{1}{N} \sum_{n=1}^{N-h} X_{kn} X_{l,n+h}, \quad 0 \le h < N.
$$
Notice that we do not use the "hat" in the definition of the above sample covariances
because they cannot be computed from the data. When we work with vectors
(7.4), rather than (7.5), we use the "hat".
Denote by $r_{f,h}(i, j)$ and $r_{b,h}(i, j)$ the $(i, j)$ entries of $C_0^{-1} C_h$ and $C_h C_0^{-1}$,
respectively, and introduce the random variable
$$
Q_N = N \sum_{h=1}^{H} \sum_{i,j=1}^{p} r_{f,h}(i, j)\, r_{b,h}(i, j). \tag{7.7}
$$
Theorem 7.1. If Assumption 7.1 holds, then $\hat Q_N \stackrel{d}{\to} \chi^2_{p^2 H}$ (chi-square
distribution with $p^2 H$ degrees of freedom).
$$
\hat Q_N = N \sum_{h=1}^{H} \sum_{i,j=1}^{p} \hat c_h^2(i, j)\, \hat\lambda_i^{-1} \hat\lambda_j^{-1}. \tag{7.8}
$$
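Formula (7.8) is straightforward to evaluate once the scores are available. The sketch below assumes the scores $\hat X_{kn}$ are supplied as the rows of an array (one row per curve, one column per EFPC) and that $\hat\lambda_i$ equals the sample second moment of the $i$th score, as it does for centered EFPC scores. The function name `portmanteau_q` and the simulated iid scores are illustrative only.

```python
import numpy as np

def portmanteau_q(scores, max_lag):
    """Evaluate Q_hat_N of (7.8) from an (N, p) array of scores."""
    scores = np.asarray(scores, dtype=float)
    n, p = scores.shape
    # lambda_hat_i taken as the lag-0 sample covariance c_0(i, i)
    lam = (scores ** 2).sum(axis=0) / n
    q = 0.0
    for h in range(1, max_lag + 1):
        # c_h(i, j) = N^{-1} * sum_{n=1}^{N-h} X_{in} X_{j,n+h}
        c_h = scores[:-h].T @ scores[h:] / n
        q += np.sum(c_h ** 2 / np.outer(lam, lam))
    return n * q

rng = np.random.default_rng(0)
iid_scores = rng.normal(size=(500, 3))     # toy scores, p = 3
q_hat = portmanteau_q(iid_scores, max_lag=2)
degrees_of_freedom = 3 ** 2 * 2            # p^2 * H, as in Theorem 7.1
```

The value `q_hat` would then be compared with a quantile of the chi-square distribution with `degrees_of_freedom` degrees of freedom. Because only products $\hat c_h^2(i,j)$ enter, flipping the sign of any score column leaves the result unchanged.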
Lemma 7.1 also shows that $\hat Q_N$ does not depend on the sign of the EFPC's $\hat v_k$.
Suppressing the "hat", set $v_k' = c_k v_k$, where $c_k^2 = 1$. Using the "prime" to denote
quantities computed with $v_k'$ rather than $v_k$, observe that
$$
c_h'(k, l) = \frac{1}{N} \sum_{n=1}^{N-h} \langle X_n, c_k v_k \rangle \langle X_{n+h}, c_l v_l \rangle = c_k c_l\, c_h(k, l).
$$
with iid mean zero innovations $\varepsilon_n \in L^2$. We assume that $\{X_n\}$ is a stationary
solution to equation (7.9), which exists under mild assumptions on $\Psi$, see Chapter 13.
Introduce the $p \times p$ matrix with entries
$$
\psi_{lk} = \langle v_l, \Psi(v_k) \rangle, \quad l, k = 1, 2, \ldots, p,
$$
In this section we investigate the finite sample properties of the test using some
generic models and sample sizes typical of applications discussed in Section 7.3.
To investigate the empirical size, we generated independent trajectories of the
standard Brownian motion (BM) on $[0, 1]$ and the standard Brownian bridge (BB).
This was done by transforming cumulative sums of independent normal variables
computed on a grid of $m$ equispaced points in $[0, 1]$. We used values of $m$ ranging
from 10 to 1440, and found that the empirical size basically does not depend on $m$
(the tables of this section use $m = 100$).
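The generation of trajectories from cumulative sums can be sketched as follows. The function name `brownian_paths` and the use of the right endpoints $j/m$ as grid points are our choices; the bridge is obtained from the motion through the standard transformation $B(t) = W(t) - tW(1)$.

```python
import numpy as np

def brownian_paths(n_curves, m, rng, bridge=False):
    """Simulate Brownian motions (or bridges) on m equispaced points in (0, 1]."""
    dt = 1.0 / m
    t = np.arange(1, m + 1) * dt
    # Cumulative sums of independent N(0, dt) increments approximate W(t)
    w = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_curves, m)), axis=1)
    if bridge:
        w = w - t * w[:, -1:]          # tie the path down at t = 1
    return t, w

rng = np.random.default_rng(0)
t, bm = brownian_paths(2000, 100, rng)
_, bb = brownian_paths(2000, 100, rng, bridge=True)
```

Each row of `bm` or `bb` is one discretized curve; in the study such rows would next be converted to functional objects through a basis expansion.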
7.2 Finite sample performance 109
In our study, the innovations $\varepsilon_n$ in (7.10) are either standard BM's or BB's. We
used two kernel functions: the Gaussian kernel
$$
\psi(t, s) = C \exp\left\{ \frac{t^2 + s^2}{2} \right\}, \quad (t, s) \in [0, 1]^2,
$$
and the Wiener kernel
$$
\psi(t, s) = C \min(s, t), \quad (t, s) \in [0, 1]^2.
$$
Table 7.1 Empirical size (in percent) of the test using Fourier basis. The simulated observations
are Brownian bridges.
Lag p=3 p=4 p=5
10% 5% 1% 10% 5% 1% 10% 5% 1%
N=50
1 7.7 2.5 0.6 7.4 2.8 0.3 7.9 3.5 0.4
3 6.8 2.5 0.3 6.7 3.3 0.6 4.9 2.0 0.3
5 4.9 2.0 0.0 3.6 1.4 0.2 4.0 1.7 0.2
N=100
1 9.0 5.1 0.4 8.9 3.9 0.6 10.0 3.9 0.8
3 8.1 3.5 0.6 8.3 4.0 0.9 7.5 3.2 0.4
5 8.8 3.6 0.6 6.6 2.7 0.3 6.7 2.4 0.3
N=300
1 9.8 4.6 1.2 9.4 4.0 0.9 10.1 4.7 0.6
3 9.3 4.8 1.0 9.1 4.7 0.9 10.0 5.4 0.8
5 7.2 3.7 1.0 8.2 3.8 0.7 10.6 5.5 1.2
Table 7.2 Empirical power (in percent) of the test using Fourier basis. The observations follow
the AR(1) model (7.10) with Gaussian kernel with $\|\Psi\|_S = 0.5$ and iid standard Brownian
motion innovations.
Lag p=3 p=4 p=5
10% 5% 1% 10% 5% 1% 10% 5% 1%
N=50
1 44.7 33.8 17.7 41.9 29.4 12.6 38.5 26.1 9.2
3 35.2 27.0 13.3 34.0 24.7 10.8 33.2 21.6 8.7
5 26.7 20.0 11.0 24.4 15.8 8.1 21.5 14.3 6.0
N=100
1 71.2 64.2 51.4 74.4 66.5 48.1 77.7 68.0 46.1
3 67.9 61.0 44.9 67.5 58.6 42.8 68.4 56.9 38.1
5 62.3 54.6 38.6 59.0 49.9 32.3 55.1 45.5 27.9
N=300
1 98.7 98.2 96.7 99.2 98.9 97.2 99.8 99.5 98.5
3 97.6 97.1 95.5 99.0 98.4 96.8 99.2 98.3 96.6
5 96.8 95.9 92.8 98.1 97.0 93.8 98.4 97.3 94.4
The constants $C$ were chosen so that $\|\Psi\|_S = 0.3, 0.5, 0.7$. We used both Fourier
and B-spline bases.
The power against this alternative is expected to increase rapidly with $N$, as the
test statistic is proportional to $N$. This is clearly seen in Table 7.2. The power also
increases with $\|\Psi\|_S$; for $\|\Psi\|_S = 0.7$ and the Gaussian kernel, it is practically
100% for $N = 100$ and all choices of other parameters.
There are two less trivial observations. First, the power is highest for lag $H = 1$.
This is because for the AR(1) process the "correlation" between $X_n$ and $X_{n-1}$ is
largest at this lag. By increasing the maximum lag $H$, the value of $\hat Q_N$ generally
increases, but the critical value increases too (degrees of freedom increase by $p^2$ for
a unit increase in $H$). Second, the power depends on how the action of the operator $\Psi$
is "aligned" with the eigenvectors $v_k$. If the inner products $\langle v_i, \Psi v_k \rangle$ are large in
absolute value, the power is high. Thus, with all other parameters being the same,
the power in Table 7.3 is greater than in Table 7.2 because of the different covariance
structure of the Brownian bridge and the Brownian motion. In all cases, the power
for the Wiener kernel is slightly lower than for the Gaussian kernel.
In this section, we apply our test to two data sets which we have encountered
in earlier chapters. The first data set consists of the number of transactions with
credit cards issued by Vilnius Bank, Lithuania. The second consists of daily geomagnetic
variation records.
7.3 Application to credit card transactions and diurnal geomagnetic variation 111
Table 7.3 Empirical power (in percent) of the test using Fourier basis. The observations follow
the AR(1) model (7.10) with Gaussian kernel with $\|\Psi\|_S = 0.5$ and iid standard Brownian
bridge innovations.
Lag p=3 p=4 p=5
10% 5% 1% 10% 5% 1% 10% 5% 1%
N=50
1 98.3 97.0 92.1 98.4 96.4 87.6 99.1 97.3 87.6
3 95.2 90.3 77.4 92.1 86.2 69.6 89.9 85.1 63.2
5 86.9 80.2 61.7 78.5 71.7 51.4 75.2 63.9 40.4
N=100
1 100 100 100 100 100 100 100 100 100
3 100 100 99.9 100 100 99.7 100 99.9 99.8
5 99.9 99.3 98.7 99.9 99.8 98.6 99.7 99.5 97.8
Table 7.4 P-values for the functional AR(1) residuals of the credit card data Xn .
Lag, H p=1 p=2 p=3 p=4 p=5 p=6 p=7
BF=40
1 69.54 22.03 13.60 46.29 80.35 96.70 99.20
2 35.57 38.28 7.75 47.16 64.92 95.00 99.04
3 54.44 53.63 25.28 52.61 71.33 86.84 94.93
BF=80
1 57.42 18.35 53.30 89.90 88.33 95.40 99.19
2 35.97 23.25 23.83 45.07 55.79 46.39 70.65
3 36.16 36.02 26.79 30.21 56.81 34.51 47.00
[Figure 7.1: two panels plotted against time.]
Fig. 7.1 Top: horizontal intensity (nT) measured at Honolulu 30/3/2001 - 13/4/2001 with the
straight lines connecting first and last measurements in each day. Bottom: the same after
subtracting the lines.
7.4 Proofs of the results of Section 7.1 113
Table 7.5 P-values (in percent) for the magnetometer data split by season.
Lag Feb, Mar, Apr, May Jun, Jul, Aug, Sep
H p=4 p=5 p=4 p=5
1 13.44 6.51 1.03 1.23
3 3.37 2.99 31.72 42.59
a given day by a line, and subtracted this line from the data. After centering over
the period under study, we obtained the mean zero functional observations we work
with. The analysis was conducted using Fourier base functions.
Testing one year of magnetometer data with lags $H = 1, 2, 3$ and different numbers
of principal components $p = 3, 4, 5$ yields P-values very close to zero. This
indicates that while principal component analysis, advocated by Xu and Kamide
(2004), may be a useful exploratory tool to study daily variation over the whole
year, one must be careful when using any inferential tools based on it, as they typically
require independent and identically distributed observations (a simple random
sample), see e.g. Section 5.2 of Seber (1984). We also applied the test to smaller
subsets of data roughly corresponding to boreal Spring and Summer. The P-values,
reported in Table 7.5, show that the transformed data can to a reasonable approximation
be viewed as a functional simple random sample, at least with respect to the
second order properties. The discrepancy in the outcome of the test when applied to
the whole year and to a season is probably due to the annual change of the position
of the Honolulu observatory relative to the Sun, whose energy drives the convective
currents mainly responsible for the daily variation.
The two examples discussed in this section show that our test can detect depar-
tures from the assumption of independence (credit card data) or from the assumption
of identical distribution (magnetometer data), and confirm both assumptions when
they are expected to hold. In our examples, the results of the test do not depend
much on the choice of the smoothing basis.
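$$\sum_{i,j=1}^{p} \hat r_{f,h}(i,j)\,\hat r_{b,h}(i,j) = \mathrm{tr}\left\{ \hat C_h^T \hat C_0^{-1} \hat C_h \hat C_0^{-1} \right\} \tag{7.12}$$
and that
$$\sum_{i,j=1}^{p} \hat c_h^2(i,j)\,\hat\lambda_i^{-1}\hat\lambda_j^{-1} = \mathrm{tr}\left\{ \hat C_h^T \hat\Lambda^{-1} \hat C_h \hat\Lambda^{-1} \right\},$$
so it suffices to verify that $\hat C_0 = \hat\Lambda$.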
114 7 Portmanteau test of independence
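Assuming that the sample mean function has been subtracted from the data, we
have $\hat X_{in} = \int X_n(t)\hat v_i(t)\,dt$. Therefore the $(i,j)$ entry of $\hat C_0$ is
$$\begin{aligned}
\hat c_0(i,j) &= N^{-1}\sum_{n=1}^{N} \hat X_{in}\hat X_{jn} \\
&= N^{-1}\sum_{n=1}^{N} \int X_n(t)\hat v_i(t)\,dt \int X_n(s)\hat v_j(s)\,ds \\
&= \int \hat v_j(s)\left( N^{-1}\sum_{n=1}^{N} \int X_n(t)\hat v_i(t)\,dt\, X_n(s) \right) ds \\
&= \int \hat v_j(s)\,\hat C(\hat v_i)(s)\,ds \\
&= \int \hat v_j(s)\,\hat\lambda_i \hat v_i(s)\,ds = \hat\lambda_i \delta_{ij}. \qquad \square
\end{aligned}$$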
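The identity $\hat C_0 = \hat\Lambda$ derived above can be checked numerically. The following is a minimal sketch, assuming curves observed on an equispaced grid of $[0,1]$ and integrals approximated by Riemann sums; the simulated sample and all variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, p = 200, 101, 4
t = np.linspace(0.0, 1.0, T)
dt = 1.0 / T  # Riemann-sum weight (assumption: equispaced grid)

# Hypothetical sample: random combinations of a few sine functions.
basis = np.array([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(6)])
X = (rng.standard_normal((N, 6)) / (1.0 + np.arange(6))) @ basis
X = X - X.mean(axis=0)  # subtract the sample mean function

# Discretized sample covariance operator.
cov_op = X.T @ X / N * dt
eigvals, eigvecs = np.linalg.eigh(cov_op)
order = np.argsort(eigvals)[::-1][:p]
lam_hat = eigvals[order]                   # hat lambda_1, ..., hat lambda_p
v_hat = eigvecs[:, order].T / np.sqrt(dt)  # rows satisfy sum(v**2) * dt = 1

scores = X @ v_hat.T * dt        # hat X_{in} = integral of X_n(t) hat v_i(t) dt
C0_hat = scores.T @ scores / N   # p x p matrix with entries hat c_0(i, j)
```

In this discretized setting `C0_hat` agrees with `np.diag(lam_hat)` up to rounding error, which mirrors the last line of the derivation.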
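Proof of Theorem 7.1. By Theorem 7.6, it is enough to show that $\hat Q_N - Q_N \xrightarrow{P} 0$.
Recall from Section 7.1 that the value of $\hat Q_N$ does not change if we replace $\hat v_k$ by
$v_{kN} = \hat c_k \hat v_k$, where $\hat c_k$ is defined in Section 2.5. In the following, we replace $\hat v_k$ by
$v_{kN}$.
By (7.7), relation $\hat Q_N - Q_N \xrightarrow{P} 0$ will follow if we show that
$$\hat C_0 - C_0 \xrightarrow{P} 0 \tag{7.13}$$
and
$$N^{1/2}(\hat C_h - C_h) \xrightarrow{P} 0, \quad h \ge 1. \tag{7.14}$$
Recall that
$$c_h(k,l) = \frac{1}{N}\sum_{n=1}^{N-h} X_{kn}X_{l,n+h}; \qquad \hat c_h(k,l) = \frac{1}{N}\sum_{n=1}^{N-h} \hat X_{kn}\hat X_{l,n+h}.$$
We will first show that $N^{1/2}M_1 \xrightarrow{P} 0$. Observe that
$$\begin{aligned}
N^{1/2}M_1 &= N^{-1/2}\sum_{n=1}^{N-h} \langle X_n, v_k - v_{kN}\rangle \langle X_{n+h}, v_l\rangle \\
&= \left\langle N^{-1/2}\sum_{n=1}^{N-h} \langle X_{n+h}, v_l\rangle X_n,\; v_k - v_{kN} \right\rangle \\
&= \langle S_N, Y_N\rangle,
\end{aligned}$$
where
$$S_N := N^{-1/2}\sum_{n=1}^{N-h} \langle X_{n+h}, v_l\rangle X_n; \qquad Y_N = v_k - v_{kN}.$$
To show that $N^{1/2}M_1 \xrightarrow{P} 0$, it thus remains to verify that $E\|S_N\|^2$ is bounded.
Notice that
$$\begin{aligned}
E\|S_N\|^2 &= N^{-1} E\left\| \sum_{n=1}^{N-h} \langle X_{n+h}, v_l\rangle X_n \right\|^2 \\
&= N^{-1}\sum_{m,n=1}^{N-h} E\left[ \langle X_{m+h}, v_l\rangle \langle X_{n+h}, v_l\rangle \langle X_m, X_n\rangle \right] \\
&= N^{-1}\sum_{n=1}^{N-h} E\left[\langle X_{n+h}, v_l\rangle^2\right] E\|X_n\|^2 \\
&\le \left( E\|X_n\|^2 \right)^2.
\end{aligned}$$
To show that $N^{1/2}M_2 \xrightarrow{P} 0$, decompose $M_2$ as $M_2 = M_{21} + M_{22}$, where
$$M_{21} = \frac{1}{N}\sum_{n=1}^{N-h} \langle X_n, v_k\rangle \langle X_{n+h}, v_l - v_{lN}\rangle;$$
$$M_{22} = \frac{1}{N}\sum_{n=1}^{N-h} \langle X_n, v_{kN} - v_k\rangle \langle X_{n+h}, v_l - v_{lN}\rangle.$$
By the argument developed for $M_1$, $N^{1/2}M_{21} \xrightarrow{P} 0$, so we must show $N^{1/2}M_{22} \xrightarrow{P} 0$.
This follows from Lemma 7.4. $\square$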
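$$c_1(k,l) \xrightarrow{P} \psi_{lk}\lambda_k \tag{7.15}$$
and
$$\hat c_1(k,l) - c_1(k,l) \xrightarrow{P} 0. \tag{7.16}$$
$$\hat C_h - C_h \xrightarrow{P} 0, \tag{7.17}$$
$$\begin{aligned}
c_1(k,l) &= N^{-1}\sum_{n=1}^{N-1} \langle v_k, X_n\rangle \langle v_l, X_{n+1}\rangle \\
&= N^{-1}\sum_{n=1}^{N-1} \langle v_k, X_n\rangle \langle v_l, \Psi(X_n)\rangle + N^{-1}\sum_{n=1}^{N-1} \langle v_k, X_n\rangle \langle v_l, \varepsilon_{n+1}\rangle \\
&\xrightarrow{P} E\left[\langle v_k, X_n\rangle \langle v_l, \Psi(X_n)\rangle\right] = \sum_{j=1}^{\infty} \psi_{lj} E\left[\langle v_k, X_n\rangle \langle v_j, X_n\rangle\right] \\
&= \sum_{j=1}^{\infty} \psi_{lj} \langle C(v_k), v_j\rangle = \sum_{j=1}^{\infty} \psi_{lj}\lambda_k \langle v_k, v_j\rangle \\
&= \psi_{lk}\lambda_k.
\end{aligned}$$
To prove (7.17), we use the notation introduced in the proof of Theorem 7.1. We
must show that $M_1 \xrightarrow{P} 0$ and $M_2 \xrightarrow{P} 0$. We will display the argument only for $M_1$.
Observe that
$$M_1 = \left\langle N^{-1}\sum_{n=1}^{N-h} \langle X_{n+h}, v_l\rangle X_n,\; v_k - v_{kN} \right\rangle.$$
By Theorem 13.2, $\|v_k - v_{kN}\| \xrightarrow{P} 0$. Since
$$E\left\| N^{-1}\sum_{n=1}^{N-h} \langle X_{n+h}, v_l\rangle X_n \right\| \le E\|\langle X_{n+h}, v_l\rangle X_n\| \le E\|X_n\|^2,$$
it follows that $M_1 \xrightarrow{P} 0$. $\square$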
7.5 Auxiliary lemmas for H -valued random elements 117
$$= \left( E\|X_0\|^2 \right)^2 \frac{N-h}{N^2}. \qquad \square$$
Lemma 7.3. Suppose $\{U_N\}$ and $\{V_N\}$ are random sequences in a Hilbert space
such that $\|U_N\| \xrightarrow{P} 0$ and $\|V_N\| = O_P(1)$, i.e. $\lim_{C\to\infty} \limsup_{N\to\infty} P(\|V_N\| > C) = 0$. Then
$$\langle U_N, V_N\rangle \xrightarrow{P} 0.$$
Proof. The Lemma follows from the corresponding property of real random
sequences and the inequality $|\langle U_N, V_N\rangle| \le \|U_N\|\,\|V_N\|$. $\square$
Then, for $h \ge 1$,
$$N^{-1/2}\sum_{n=1}^{N-h} \langle X_n, Y_N\rangle \langle X_{n+h}, Z_N\rangle \xrightarrow{P} 0.$$
$$N^{-1/2}\sum_{n=1}^{N-h} \langle X_n, Y_N\rangle \langle X_{n+h}, Z_N\rangle = \left\langle C_{N,h}(Y_N),\, N^{1/2}Z_N \right\rangle,$$
with the operator $C_{N,h}$ defined in (7.18). Since $P(N^{1/2}\|Z_N\| > C) \le C^{-2} N E\|Z_N\|^2$,
$N^{1/2}\|Z_N\| = O_P(1)$. Thus, by Lemma 7.3, it remains to verify
that $C_{N,h}(Y_N) \xrightarrow{P} 0$. Since the Hilbert–Schmidt norm is not less than the uniform
operator norm $\|\cdot\|_L$, see Section 2.1, we obtain from Lemma 7.2:
If $v \ne 0$, then
$$N^{-1/2}\sum_{t=1}^{N} Y_t \xrightarrow{d} N(0, v)$$
and
$$v = \lim_{N\to\infty} N\,\mathrm{Var}\left[ \frac{1}{N}\sum_{t=1}^{N} Y_t \right].$$
We first find the asymptotic distribution of $C_0$. We will show that $N^{1/2}(C_0 - V)$
tends to a matrix $Z_0$ whose entries $Z_0(k,l)$ are jointly Gaussian mean zero.
Observe that
$$\sum_{k,l=1}^{p} a_{kl}\left[c_0(k,l) - v(k,l)\right] = \frac{1}{N}\sum_{t=1}^{N} Y_t,$$
where
$$Y_t = \sum_{k,l=1}^{p} a_{kl}\left(X_{kt}X_{lt} - v(k,l)\right).$$
$$\begin{aligned}
&= \sum_{k,l,i,j=1}^{p} a_{kl}a_{ij}\, E\left[ (X_{kt}X_{lt} - v(k,l))(X_{it}X_{jt} - v(i,j)) \right] \\
&= \sum_{k,l,i,j=1}^{p} a_{kl}a_{ij}\left[\eta(k,l,i,j) - v(i,j)v(k,l)\right],
\end{aligned}$$
where
$$\eta(k,l,i,j) = E\left[X_{kt}X_{lt}X_{it}X_{jt}\right]. \tag{7.22}$$
$$N^{1/2}\sum_{k,l=1}^{p} a_{kl}\left[c_0(k,l) - v(k,l)\right] \xrightarrow{d} N\left(0, \sum_{k,l,i,j=1}^{p} a_{kl}a_{ij}\left[\eta(k,l,i,j) - v(i,j)v(k,l)\right]\right). \tag{7.23}$$
$$\sum_{k,l=1}^{p} a_{kl}c_h(k,l) = \frac{1}{N}\sum_{t=1}^{N-h} Y_t,$$
where
$$Y_t = \sum_{k,l=1}^{p} a_{kl}X_{kt}X_{l,t+h}.$$
The $Y_t$ are identically distributed with mean zero and are $h$–dependent. Observe that
$$EY_t^2 = \sum_{k,l,i,j=1}^{p} a_{kl}a_{ij}\,v(k,i)v(l,j)$$
$$N^{1/2}\sum_{k,l=1}^{p} a_{kl}c_h(k,l) \xrightarrow{d} N\left(0, \sum_{k,l,i,j=1}^{p} a_{kl}a_{ij}\,v(k,i)v(l,j)\right)$$
which is equivalent to
$$N^{1/2}C_h \xrightarrow{d} Z_h, \tag{7.26}$$
where $Z_h$ is a random matrix with jointly Gaussian mean zero entries
$Z_h(k,l)$, $k, l = 1, \ldots, p$, with
Theorem 7.4. If the $X_t$ are iid with finite fourth moment, then
$$N^{1/2}\left[C_0 - V, C_1, \ldots, C_H\right] \xrightarrow{d} \left[Z_0, Z_1, \ldots, Z_H\right],$$
$$= \sum_{k,l,i,j=1}^{p} \left[ a_{0kl}a_{0ij}\left(\eta(k,l,i,j) - v(i,j)v(k,l)\right) + a_{1kl}a_{1ij}\,v(k,i)v(l,j) \right]. \tag{7.29}$$
We must thus show that the left–hand side of (7.28) converges to a normal random
variable with mean zero and variance (7.29). Observe that the left–hand side of
(7.28) is equal to $N^{-1/2}\sum_{t=1}^{N} Y_t$, where
$$Y_t = \sum_{k,l=1}^{p} \left[ a_{0kl}\left(X_{kt}X_{lt} - v(k,l)\right) + a_{1kl}X_{kt}X_{l,t+1} \right].$$
The $Y_t$ are identically distributed with mean zero and are 1–dependent. Direct
verification shows that the variance of $Y_t$ is equal to (7.29) and autocovariances of the
$Y_t$ at positive lags vanish. Convergence (7.28) follows therefore from Theorem 7.3. $\square$
We now want to find the asymptotic distribution of $C_0^{-1}$. We first state a proposition
which is a matrix version of the delta method and essentially follows from
Proposition 6.4.3 in Brockwell and Davis (1991) by writing the matrices as column
vectors, e.g. we write a $2 \times 2$ matrix with entries $a_{ij}$ as $[a_{11}, a_{12}, a_{21}, a_{22}]^T$.
Proposition 7.1. Suppose $A_N$ is a sequence of $p \times q$ matrices such that for some
deterministic matrix $\mu$ of the same dimension
$$c_N^{-1}(A_N - \mu) \xrightarrow{d} Z \quad (c_N \to 0), \tag{7.30}$$
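$$c_N^{-1}\left(g(A_N) - g(\mu)\right) \xrightarrow{d} \nabla g(\mu)(Z), \tag{7.31}$$
$$\left[\nabla g(\mu)(Z)\right](i,j) = \sum_{k=1}^{p}\sum_{l=1}^{q} \frac{\partial g_{ij}(\mu)}{\partial z(k,l)}\, Z(k,l). \tag{7.32}$$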
Consider the function $g(A) = A^{-1}$ from $\mathbb{R}^p \times \mathbb{R}^p$ into itself. The derivative of
this function at an invertible matrix $V$, $\nabla g(V)$, is the linear operator
see e.g. Noble (1969), p. 24, Exercise 1.50. We want to identify the partial
derivatives $\partial g_{ij}(V)/\partial z(k,l)$ appearing in (7.32). Let $u(k,l)$ be the $(k,l)$–entry
of $V^{-1}$. Direct verification shows that the $(i,j)$–entry of $V^{-1}ZV^{-1}$
is $\sum_{k,l=1}^{p} u(i,k)u(l,j)z(k,l)$, so
$$\frac{\partial g_{ij}(V)}{\partial z(k,l)} = -u(i,k)u(l,j). \tag{7.34}$$
From (7.24) and Proposition 7.1, we thus obtain
$$N^{1/2}\left(C_0^{-1} - V^{-1}\right) \xrightarrow{d} Y_0, \tag{7.35}$$
$$Y_0(i,j) = -\sum_{k,l=1}^{p} u(i,k)u(l,j)Z_0(k,l). \tag{7.36}$$
$(2p) \times p$ matrix and apply Proposition 7.1. By Theorem 7.4, $N^{1/2}\left[C_0 - V, C_h\right]^T \xrightarrow{d} \left[Z_0, Z_h\right]^T$.
Consider the function $g(A_1, A_2) = A_1^{-1}A_2$. By Proposition 7.1, with
$\mu = [V, 0]^T$,
$$N^{1/2}C_0^{-1}C_h \xrightarrow{d} \nabla g(\mu)\left([Z_0, Z_h]^T\right).$$
We must find the explicit form of $\nabla g(\mu)$. The map $[A_1, A_2]^T \mapsto A_1A_2$ has derivative
$$[H_1, H_2]^T \mapsto H_1A_2 + A_1H_2,$$
see e.g. Noble (1969), p. 24, Exercise 1.50. Combining this with (7.33), we obtain
Using the same technique as in the proof of Theorem 7.4, relation (7.37) can be
extended to the following theorem:
Theorem 7.5. If the $X_t$ are iid with finite fourth moment, then
$$N^{1/2}C_0^{-1}\left[C_1, \ldots, C_H\right] \xrightarrow{d} V^{-1}\left[Z_1, \ldots, Z_H\right], \tag{7.38}$$
$$Q_N = N\sum_{h=1}^{H}\sum_{i,j=1}^{p} r_{f,h}(i,j)\,r_{b,h}(i,j). \tag{7.39}$$
Theorem 7.6. If the $X_t$ are iid with finite fourth moment, then $Q_N \xrightarrow{d} \chi^2_{p^2H}$.
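To make the statistic (7.39) concrete, the following minimal sketch computes $Q_N$ from an $N \times p$ array of mean zero scores, taking $r_{f,h}(i,j)$ and $r_{b,h}(i,j)$ to be the entries of $C_0^{-1}C_h$ and $C_hC_0^{-1}$. The function name `portmanteau_statistic` and the array layout are assumptions made for illustration, not part of the original text.

```python
import numpy as np

def portmanteau_statistic(scores, H):
    """Return (Q_N, degrees of freedom) for an (N, p) array of mean zero scores.

    Hypothetical helper: row n holds the p projections of observation n.
    """
    scores = np.asarray(scores, dtype=float)
    N, p = scores.shape
    C0 = scores.T @ scores / N
    C0_inv = np.linalg.inv(C0)
    Q = 0.0
    for h in range(1, H + 1):
        # c_h(k, l) = N^{-1} * sum_{n=1}^{N-h} X_{kn} X_{l,n+h}
        Ch = scores[:-h].T @ scores[h:] / N
        r_f = C0_inv @ Ch   # entries r_{f,h}(i, j)
        r_b = Ch @ C0_inv   # entries r_{b,h}(i, j)
        Q += N * np.sum(r_f * r_b)
    return Q, p * p * H
```

The returned degrees of freedom, $p^2H$, are those of the limiting chi-square distribution in Theorem 7.6; a P-value would be obtained by comparing `Q` with that distribution.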
and that convergence (7.38) and (7.40) are joint. Since the matrices
$\left[C_0^{-1}C_h, C_hC_0^{-1}\right]$ are asymptotically independent, it suffices to verify that
$$N\sum_{i,j=1}^{p} r_{f,h}(i,j)\,r_{b,h}(i,j) \xrightarrow{d} \chi^2_{p^2}. \tag{7.41}$$
To lighten the notation, in the remainder of the proof we suppress the index $h$
(the limit distributions do not depend on $h$).
Denote by $\rho_f(i,j)$ and $\rho_b(i,j)$, respectively, the entries of matrices $V^{-1}Z$ and
$ZV^{-1}$. By (7.38) and (7.40), it suffices to show that
$$\sum_{i,j=1}^{p} \rho_f(i,j)\rho_b(i,j) \stackrel{d}{=} \chi^2_{p^2}. \tag{7.42}$$
$$\tilde Z^T (U \otimes U)\tilde Z \stackrel{d}{=} \chi^2_{p^2}. \tag{7.43}$$
It remains to show that the LHS of (7.42) is equal to the LHS of (7.43). The entry
$Z(i,k)$ of the vector $\tilde Z^T$ multiplies the row $u(i,\cdot)u(k,\cdot)$ of $U \otimes U$; the entry $Z(j,l)$
of $\tilde Z$ multiplies the column $u(\cdot,j)u(\cdot,l)$. Consequently,
$$\begin{aligned}
\tilde Z^T (U \otimes U)\tilde Z &= \sum_{i,j,k,l=1}^{p} u(i,j)u(k,l)Z(i,k)Z(j,l) \\
&= \sum_{i,l=1}^{p} \sum_{j=1}^{p} u(i,j)Z(j,l) \sum_{k=1}^{p} Z(i,k)u(k,l) \\
&= \sum_{i,l=1}^{p} \rho_f(i,l)\rho_b(i,l),
\end{aligned}$$
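The index bookkeeping in this last chain of equalities is easy to get wrong, so a numerical check may help. The sketch below is only an illustration under stated assumptions: $U = V^{-1}$ for a randomly generated positive definite $V$, $Z$ is an arbitrary $p \times p$ matrix, and $\tilde Z$ stacks the rows of $Z$ (row-major vectorization); all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
A = rng.standard_normal((p, p))
V = A @ A.T + p * np.eye(p)     # a positive definite matrix (assumption)
U = np.linalg.inv(V)            # u(k, l) is the (k, l) entry of V^{-1}
Z = rng.standard_normal((p, p))

# Row-major stacking: Z(i, k) sits at position i * p + k of z_tilde.
z_tilde = Z.reshape(-1)

# np.kron(U, U)[i * p + k, j * p + l] equals u(i, j) * u(k, l).
quad_form = z_tilde @ np.kron(U, U) @ z_tilde

rho_f = U @ Z                   # entries of V^{-1} Z
rho_b = Z @ U                   # entries of Z V^{-1}
entrywise_sum = np.sum(rho_f * rho_b)
```

Under these conventions `quad_form` and `entrywise_sum` coincide up to rounding error, in line with the displayed identity.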
In this chapter we review some important ideas related to the functional linear
model. Like its multivariate counterpart, this model has been developed in various
directions, and has been found to be extremely useful in a broad range of applica-
tions. The relevant research is very rich and multifaceted, and we do not aim at a full
review of the very extensive literature on this subject. Our objective in this chapter
is to explain briefly the general ideas and point to some recent advances. Some additional
references are given in Section 8.7. Our choice of topics is partially motivated
by the methodology presented in Chapters 9, 10 and 11. Practically all inferential
tools for the functional linear model have been developed under the assumption
that the regressor/response pairs, $(X_i, Y_i)$, are independent. They must therefore be
applied with care to functional data obtained over time or space.
8.1 Introduction
The linear regression is perhaps the most useful and widely used statistical model.
The simplest linear model is the familiar straight line regression
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, N,$$
in which all random variables are scalars, and the regressors $x_i$ are typically
assumed to be known scalars. In a functional linear model, some of these quantities
are curves, and analogs of the coefficients $\beta_0$ and $\beta_1$ must then be appropriately
defined.
To provide a motivating example, we start with a problem studied in Chiou et al.
(2004) and based on an experiment reported in Carey et al. (2002) in which 1200
female medflies were fed one of 12 dietary doses ranging from full diet to 30% of
full diet. For each medfly, the count of eggs laid every day was recorded, and so the
egg-laying trajectories were obtained. Some of those are shown in Figure 8.1. As
expected, the total count of eggs increases with the dietary dose, but a biological
question of interest is whether this increase is due to a systematic increase at all
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 127
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_8,
© Springer Science+Business Media New York 2012
128 8 Functional linear models
Fig. 8.1 Smoothed egg-laying trajectories of twenty randomly selected med flies at dose levels
100%, 75% and 50%. Source: Chiou et al. (2004).
ages, or whether the different diet levels lead to different patterns of egg-laying. For
example, on a richer diet, flies could start laying eggs earlier, and continue to lay
them well into a mature age, or produce a lot more eggs at the prime reproductive
age. To study this question, it is convenient to consider a linear model in which the
dose levels are scalar regressors and the egg–laying curves are functional responses.
We can distinguish three cases, in which either the responses or the regressors,
or both are curves. We assume for simplicity that the responses and the regressors
have mean zero. In all formulations, we assume that the errors $\varepsilon_i$ are independent of
the explanatory variables (regressors) $X_i$.
The fully functional model:
$$Y_i(t) = \int \psi(t,s)X_i(s)\,ds + \varepsilon_i(t). \tag{8.1}$$
In this model, the responses $Y_i$ are curves, and so are the regressors $X_i$. It is
further studied in Section 8.3 and Chapters 9, 10 and 11.
The scalar response model:
$$Y_i = \int \psi(s)X_i(s)\,ds + \varepsilon_i, \tag{8.2}$$
in which the regressors are curves, but the responses are scalars. The properties
and extensions of this model are reviewed in Section 8.4.
The functional response model:
$$Y_i(t) = \psi(t)x_i + \varepsilon_i(t), \tag{8.3}$$
in which the responses are curves, but the regressors are known scalars. Extensions
of this model are described in Section 8.5.
Models (8.1), (8.2) and (8.3) are just prototypes intended to illustrate the general
idea. The main issue is that the functions are infinite dimensional objects which
8.2 Standard linear model and normal equations 129
The standard linear model, see e.g. Chapter 3 of Seber and Lee (2003), takes the
form
$$Y = X\beta + \varepsilon,$$
where
$Y$ is the $N \times 1$ vector of responses;
$X$ is the $N \times p$ regression matrix, typically assumed to be of rank $p$;
$\beta$ is the $p \times 1$ parameter vector;
$\varepsilon$ is the $N \times 1$ vector of mean zero errors.
The least squares estimator of $\beta$ minimizes the Euclidean norm of the difference
$Y - X\beta$. Set $\mu = X\beta$, and denote by $\hat\mu$ the projection of $Y$ on the subspace $L_X \subset \mathbb{R}^N$
spanned by the columns of $X$. Thus $\hat\mu$ is the unique vector minimizing the length of
$Y - \mu$ over $\mu \in L_X$. The vector $Y - \hat\mu$ is orthogonal to $L_X$, so $X^T(Y - \hat\mu) = 0$,
i.e. $X^T\hat\mu = X^TY$. If $X$ is of rank $p$, there is a unique $\hat\beta$ such that $\hat\mu = X\hat\beta$, in which
case $\hat\beta$ satisfies the normal equations
$$X^TX\beta = X^TY.$$
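The projection argument above can be illustrated numerically. The following is a minimal sketch on simulated data (the design, coefficients and noise level are arbitrary assumptions): it solves the normal equations and leaves the residual $Y - \hat\mu$ for inspection.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 50, 3
X = rng.standard_normal((N, p))       # regression matrix, of rank p here
beta = np.array([1.0, -2.0, 0.5])     # hypothetical true parameter vector
Y = X @ beta + 0.1 * rng.standard_normal(N)

# Solve the normal equations X^T X beta = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

mu_hat = X @ beta_hat                 # projection of Y on the column space of X
residual = Y - mu_hat                 # orthogonal to every column of X
```

In practice `np.linalg.lstsq` is usually preferred for numerical stability; the explicit solve is used here only because it mirrors the normal equations.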
and we think of
$$y_i, \quad x_i = [x_{i1}, x_{i2}, \ldots, x_{ip}]^T, \quad \varepsilon_i$$
$$y, \quad x = [x_1, x_2, \ldots, x_p]^T, \quad \varepsilon.$$
$$y = x^T\beta + \varepsilon. \tag{8.5}$$
In Equation (8.1), the value $\psi(t,s)$ reflects the effect of the explanatory function $X_i$
at time $s$ on the response function $Y_i$ at time $t$. To develop an estimation procedure
analogous to the least squares estimation described in Section 8.2, and implemented
in the R package fda, equation (8.1) is rewritten in the following form:
$$Y(t) = \int X(s)\beta(s,t)\,ds + \varepsilon(t), \tag{8.6}$$
where
$$\beta(s,t) = \psi(t,s),$$
and where
Suppose $\{\eta_k, k \ge 1\}$ and $\{\theta_\ell, \ell \ge 1\}$ are some bases, for example Fourier and
spline, which need not be orthonormal. The functions $\eta_k$ are suitable for expanding
the functions $X_i$ and the $\theta_\ell$ for expanding the $Y_i$. The idea of the estimation of the
kernel $\beta(\cdot,\cdot)$ is to consider estimates of the form
$$\beta^*(s,t) = \sum_{k=1}^{K}\sum_{\ell=1}^{L} b_{k\ell}\,\eta_k(s)\theta_\ell(t),$$
in which $K$ and $L$ are relatively small numbers which are used as smoothing parameters;
the smaller $K$ and $L$, the smoother the estimate of $\beta(\cdot,\cdot)$. A least squares estimator
is then obtained by finding $b_{k\ell}$ which minimize the residual sum of squares:
$$\sum_{i=1}^{N} \left\| Y_i - \int X_i(s)\beta^*(s,\cdot)\,ds \right\|^2$$
Consistency properties of this approach are not fully understood, but it gives useful
estimates, which can be computed analogously to the standard vector case.
8.3 The fully functional model 131
$$B = [b_{k\ell},\ 1 \le k \le K,\ 1 \le \ell \le L].$$
To obtain an analog of the normal equations of Section 8.2, multiply by $Z^T$ and
ignore the error terms. This gives an approximate identity
$$Z^TZBJ = Z^T\int Y(t)\theta^T(t)\,dt, \tag{8.8}$$
where the $v_j$ are the FPC’s of $X$ and the $u_j$ the FPC’s of $Y$, see Section 3.3, and
$$\xi_i = \langle X, v_i\rangle, \qquad \zeta_j = \langle Y, u_j\rangle.$$
Lemma 8.1. Suppose $X, Y, \varepsilon \in L^2$ are mean zero, $\varepsilon$ is independent of $X$, and the
following linear equation holds
$$Y(t) = \int \psi(t,s)X(s)\,ds + \varepsilon(t), \tag{8.10}$$
Then
$$\psi(t,s) = \sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty} \frac{E[\xi_\ell\zeta_k]}{E[\xi_\ell^2]}\, u_k(t)v_\ell(s),$$
Proof. Since $\{v_i\}$ and $\{u_j\}$ are bases in $L^2$, the functions $\{v_i(s)u_j(t),\ 0 \le s, t \le 1\}$
form a basis in $L^2([0,1] \times [0,1])$, so $\psi(t,s)$ admits a unique representation
$$\psi(t,s) = \sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty} \psi_{k\ell}\, u_k(t)v_\ell(s), \tag{8.12}$$
Inserting (8.9) and (8.12) into (8.10), and using the orthonormality of the $v_i$, we
obtain
$$\sum_{j=1}^{\infty} \zeta_j u_j(t) = \sum_{k=1}^{\infty}\sum_{i=1}^{\infty} \psi_{ki}\,\xi_i u_k(t) + \varepsilon(t).$$
It can be conversely assumed that (8.10) and (8.15) hold, and then the implication
that $\psi$ satisfies (8.11) will follow. This is the approach adopted by Yao et al. (2005b).
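$$\hat\psi_{KL}(t,s) = \sum_{k=1}^{K}\sum_{\ell=1}^{L} \hat\lambda_\ell^{-1}\hat\sigma_{\ell k}\,\hat u_k(t)\hat v_\ell(s),$$
$$\hat\sigma_{\ell k} = \frac{1}{N}\sum_{i=1}^{N} \langle X_i, \hat v_\ell\rangle \langle Y_i, \hat u_k\rangle. \tag{8.16}$$
It is clear that for this estimator, $\hat\psi_{KL}(t,s)$ does not depend on the signs of the $\hat v_\ell$
and $\hat u_k$.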
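The estimator $\hat\psi_{KL}$ can be sketched in a few lines for densely observed curves. The code below is an illustration only, assuming curves stored as rows of arrays on a common equispaced grid with Riemann-sum weight `dt`; the helper names `efpc` and `kernel_estimate` are hypothetical and do not come from the original text or from any package.

```python
import numpy as np

def efpc(curves, n_comp, dt):
    """Leading eigenvalues, eigenfunctions (as rows) and scores of centered curves."""
    centered = curves - curves.mean(axis=0)
    cov_op = centered.T @ centered / len(centered) * dt
    vals, vecs = np.linalg.eigh(cov_op)
    idx = np.argsort(vals)[::-1][:n_comp]
    funcs = vecs[:, idx].T / np.sqrt(dt)   # normalized: sum(f**2) * dt = 1
    scores = centered @ funcs.T * dt       # inner products <curve, eigenfunction>
    return vals[idx], funcs, scores

def kernel_estimate(X, Y, K, L, dt):
    """Grid values psi_hat[t, s] of the estimator built from (8.16)."""
    lam, v, xi = efpc(X, L, dt)      # hat lambda_l, hat v_l, <X_i, hat v_l>
    _, u, zeta = efpc(Y, K, dt)      # hat u_k, <Y_i, hat u_k>
    sigma = xi.T @ zeta / len(X)     # sigma[l, k] = hat sigma_{lk}
    coef = sigma / lam[:, None]      # hat lambda_l^{-1} hat sigma_{lk}
    # psi_hat[t, s] = sum over k, l of coef[l, k] * u_k(t) * v_l(s)
    return u.T @ coef.T @ v
```

Flipping the sign of an eigenfunction flips the sign of the corresponding scores as well, so each product in the double sum is unchanged, which is the sign invariance noted above.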
If the curves $X_n, Y_n$, $n = 1, 2, \ldots, N$, are observed at sparse, irregular times,
and are subject to measurement error, Yao et al. (2005b) propose the following
procedure to calculate $\hat\sigma_{\ell k}$. In the notation of Section 4.2, observe that
$$c_{21}(t,s) = E[X(s)Y(t)] = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} E[\xi_i\zeta_j]\, v_i(s)u_j(t),$$
and so
$$E[\xi_\ell\zeta_k] = \iint c_{21}(t,s)\,v_\ell(s)u_k(t)\,ds\,dt.$$
The surface $c_{21}(\cdot,\cdot)$ is estimated by two–dimensional scatter plot smoothing, and
the resulting estimate is denoted by $\hat c_{21}(\cdot,\cdot)$. We then set
$$\hat\sigma_{\ell k} = \iint \hat c_{21}(t,s)\,\hat v_\ell(s)\hat u_k(t)\,ds\,dt. \tag{8.17}$$
To establish the consistency of $\hat\psi_{KL}(t,s)$, we must assume that $K$ and $L$ are
functions of the sample size $N$. Then, under regularity conditions,
$$\iint \left[ \hat\psi_{KL}(t,s) - \psi(t,s) \right]^2 dt\,ds \xrightarrow{P} 0, \quad (K, L \to \infty),$$
$$E[Y|X] = \sum_{\ell=1}^{\infty} E[\zeta_\ell|X]\,u_\ell = \sum_{\ell=1}^{\infty}\sum_{i=1}^{\infty} \psi_{\ell i}\,\xi_i u_\ell.$$
where the functions $f_{\ell i}$ are assumed to be smooth. Assuming that the scores $\xi_i$ are
independent, they develop a model fitting approach which is easy to implement.
$g$ on $[0,1]$. An estimator that involves both restricting the number of basis functions
and a smoothness penalty is obtained by minimizing
$$\sum_{i=1}^{N} \left( Y_i - \int g(s)X_i(s)\,ds \right)^2 + \lambda\int \left[ g^{(m)}(s) \right]^2 ds.$$
An estimate of $\psi(s)$ is $g(s) = \sum_{k=1}^{K} b_k\phi_k(s)$. Using the orthogonality of the
functions in the Fourier basis and their derivatives, we see that this reduces to finding
$b_1, b_2, \ldots, b_K$ which minimize
$$\sum_{i=1}^{N} \left[ Y_i - \sum_{k=1}^{K} \langle X_i, \phi_k\rangle b_k \right]^2 + \lambda\sum_{k=1}^{K} b_k^2 \int \left[ \phi_k^{(m)}(s) \right]^2 ds.$$
An asymptotic setting for the estimation with large $K$ and the roughness penalty
only can be obtained informally by setting $K = \infty$, and formally by assuming that
the potential estimates $g$ are in a subspace of $L^2$ of sufficiently smooth and periodic
functions. We therefore introduce the following definition:
Definition 8.1. The space $W^m_{2,\mathrm{per}} \subset L^2$ consists of $m$ times differentiable functions,
such that $g^{(m)} \in L^2$, and for $0 \le \nu \le m-1$, $g^{(\nu)}$ is absolutely continuous, and
$g^{(\nu)}(0) = g^{(\nu)}(1)$.
The space $W^m_{2,\mathrm{per}}$ is an example of a Sobolev space, i.e. a space in which integrability
conditions are imposed not only on functions but also on their derivatives.
In order to develop a rigorous theory involving smoothing of functional data by
a roughness penalty, it is necessary to work with such spaces. In this setting, the
estimator $\hat\psi_{\infty,\lambda}$ is defined as the function $g \in W^m_{2,\mathrm{per}}$ which minimizes
$$\sum_{i=1}^{N} \left( Y_i - \int g(s)X_i(s)\,ds \right)^2 + \lambda\int \left[ g^{(m)}(s) \right]^2 ds.$$
provided the smoothing parameter $\lambda$ tends to zero with $N$, but not too fast, see
Theorem 5 of Li and Hsing (2007) for the details.
For a general basis $\{\phi_k\}$, e.g. a nonorthogonal spline basis, an estimator of the
coefficient function $\psi(s)$ of the form $\sum_{k=1}^{K} b_k\phi_k(s)$ is obtained by minimizing
$$\sum_{i=1}^{N} \left[ Y_i - \sum_{k=1}^{K} \langle X_i, \phi_k\rangle b_k \right]^2 + \lambda\int \left[ \sum_{k=1}^{K} b_k\phi_k^{(m)}(s) \right]^2 ds, \tag{8.18}$$
with $m = 2$ being the typical choice. In this approach, the emphasis is on choosing
an appropriate smoothing parameter $\lambda$, while the number $K$ of basis functions is assumed to
be large.
An alternative approach regularizes the estimates of $\psi$ by projecting the regressors
onto the $p$ leading EFPC’s (those corresponding to the largest eigenvalues), i.e.
by using the approximation $X_n \approx \sum_{i=1}^{p} \langle X_n, \hat v_i\rangle \hat v_i$, in which $p$ is a small number.
The coefficient function $\psi(s)$ is then estimated by $\sum_{i=1}^{p} \hat\psi_i\hat v_i(s)$, with the $\hat\psi_i$ being
the values of the $\psi_i$ which minimize
$$\sum_{n=1}^{N} \left[ Y_n - \sum_{i=1}^{p} \langle X_n, \hat v_i\rangle \psi_i \right]^2. \tag{8.19}$$
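The minimization in (8.19) is an ordinary least squares problem in the scores, which the following minimal sketch makes explicit. It assumes regressor curves stored as rows on a common equispaced grid with Riemann-sum weight `dt`; the function name `fpc_regression` is hypothetical, and the centering step stands in for the mean zero assumption made in the text.

```python
import numpy as np

def fpc_regression(X, Y, p, dt):
    """Estimate the coefficient function on the grid from the p leading EFPC's."""
    Xc = X - X.mean(axis=0)          # the text assumes mean zero regressors
    Yc = Y - Y.mean()                # and mean zero responses
    cov_op = Xc.T @ Xc / len(Xc) * dt
    vals, vecs = np.linalg.eigh(cov_op)
    idx = np.argsort(vals)[::-1][:p]
    v_hat = vecs[:, idx].T / np.sqrt(dt)    # rows: hat v_1, ..., hat v_p
    scores = Xc @ v_hat.T * dt              # <X_n, hat v_i>
    # Least squares fit of the responses on the scores, as in (8.19).
    coef, *_ = np.linalg.lstsq(scores, Yc, rcond=None)
    return coef @ v_hat              # sum over i of coef_i * hat v_i(s)
```

Because the sample scores are uncorrelated, each coefficient could also be computed separately as a ratio of a sample covariance to an eigenvalue; the joint least squares call is used here only for brevity.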
Reiss and Ogden (2007) proposed several hybrid methods which combine the
above two approaches, i.e. projecting onto the EFPC’s of the regressors and using
the roughness penalty. We describe only one of them, called FPCR$_R$ by the authors,
which appears to be most effective. The acronym FPCR stands for Functional Principal
Component Regression; the subscript R indicates that a roughness penalty
is applied to the regression rather than the components, the latter method being
denoted FPCR$_C$. We focus on the general idea; the precise formulas and the computational
details are given in Reiss and Ogden (2007).
As in (8.18), we seek coefficients $b_k$ such that we can obtain a good and informative
approximation
$$Y_i \approx \sum_{k=1}^{K} \tilde X_{ik}b_k, \quad \text{where } \tilde X_{ik} = \langle X_i, \phi_k\rangle.$$
This brings us to the framework of the standard linear model of Section 8.2. Denote
by
$$\tilde v_j = [\tilde v_{j1}, \tilde v_{j2}, \ldots, \tilde v_{jK}]^T, \quad 1 \le j \le K,$$
the multivariate principal components of the vectors
$$\tilde X_i = [\tilde X_{i1}, \tilde X_{i2}, \ldots, \tilde X_{iK}]^T, \quad 1 \le i \le N.$$
The $\tilde v_j$ are the normalized eigenvectors of the sample covariance matrix of the $\tilde X_i$;
they coincide with the vectors $u_j$ of Theorem 3.1, see also Chapter 8 of Johnson and
Wichern (2002) for further details. The coefficient vector $b = [b_1, b_2, \ldots, b_K]^T$ is
projected on the first $p$ $\tilde v_j$ (those corresponding to the largest eigenvalues), which
yields
$$b_k \approx \sum_{j=1}^{p} \beta_j\tilde v_{jk}, \quad 1 \le k \le K.$$
The $\beta_j$ are estimated by minimizing
$$\sum_{n=1}^{N} \left| Y_n - \sum_{j=1}^{p}\sum_{k=1}^{K} \beta_j\tilde X_{nk}\tilde v_{jk} \right|^2 + \lambda\int \left[ \sum_{j=1}^{p}\sum_{k=1}^{K} \beta_j\tilde v_{jk}\phi_k^{(m)}(s) \right]^2 ds.$$
8.4 The scalar response model 137
Selection of $p$ and $\lambda$ is discussed in Reiss and Ogden (2007) and Reiss and Ogden
(2009a). Denoting the resulting estimates by $\hat\beta_j$, the estimate of $\psi(s)$ is then
$$\hat\psi(s) = \sum_{k=1}^{K} b_k\phi_k(s) = \sum_{j=1}^{p}\sum_{k=1}^{K} \hat\beta_j\tilde v_{jk}\phi_k(s).$$
They are estimated by an iterative procedure. For a fixed , r./ is estimated by local
linear smoothing. For a fixed r./, is estimated by weighted least squares. These
steps are repeated one after another until the differences in the estimates become
negligible, i.e. until convergence is achieved. Local linear smoothing assumes that
if ´ is close to ´k , then r.´/ a0k C a1k .´ ´k /. Thus, in a neighborhood of ´k ,
(8.20) becomes
X
p
˝ ˛
Yi D fa0k C a1k .´i ´k /g ˛j ej ; Xi C ˇ´i C "i :
j D1
To estimate a0k and a1k , we fix (starting with a reasonable initial value) and
minimize
2 32
X
N X
p
˝ ˛
Rk D wi .k/ 4Yi fa0k C a1k .´i ´k /g ˛j ej ; Xi C ˇ´i 5 ; (8.21)
i D1 j D1
where the weights $w_i(k)$ decrease as $|z_i - z_k|$ increases. These weights are obtained
as
$$w_i(k) = \left[ \sum_{\ell=1}^{N} K\left( \frac{|z_\ell - z_k|}{h} \right) \right]^{-1} K\left( \frac{|z_i - z_k|}{h} \right),$$
Model (8.3) is too simple for most applications. A useful extension is to consider
more than one parameter function, which leads to the specification
$$Y_i(t) = \sum_{j=1}^{L} x_{ij}\psi_j(t) + \varepsilon_i(t), \quad i = 1, 2, \ldots, N,$$
which, analogously to (8.6), can be written as $Y(t) = X\psi(t) + \varepsilon(t)$. The parameter
$\psi(t) = [\psi_1(t), \ldots, \psi_L(t)]^T$ can be estimated by using a version of the procedure
described in Section 8.3. Chapter 13 of Ramsay and Silverman (2005) contains two
interesting applications of this model.
Chiou et al. (2004) propose a model in which the intercept function depends
on the regressor. Their formulation is suitable for experiments in which multiple
responses are available for every level of the explanatory variable, like the med fly
data described in Section 8.1, where there are almost 100 responses for every diet
level. To introduce that model set .t/ D EY .t/ and denote by .x/ the value of
which minimizes Z
fEŒY .t/jX D x .t/ g2 dt:
If we assume that Yi .t/ D .t/ .xi / C "i .t/; we obtain the multiplicative model
of Chiou et al. (2003) which can be easily estimated by using the sample analogs of
the expectations occurring above and some smoothing. To improve the predictions
of the functions Yi , Chiou et al. (2004) propose the model
X
K
Yi .t/ D .t/ .xi / C ˛k .xi / k .xi ; t/ C "i .t/; (8.22)
kD1
in which the k .x; / are the FPC’s of the functions R.x; t/ D Yi .t/ .t/ .x/.
To estimate this model, the k .xi ; t/ are estimated by the EFPC’s of the residuals
O
R.x; O O .x/, and the link functions ˛k by the general methods
t/ D Yi .t/ .t/
developed in Chiou and Müller (1998).
In this section we discuss several diagnostic methods for functional regression mod-
els. We first review the relevant ideas in the standard setting of Section 8.2. Our
objective is to verify if model (8.5) is appropriate.
An elementary approach is to plot the responses yi against the regressors xij for
j D 1; 2; : : : ; p. If model (8.5) is correct, all these scatter plots should approxi-
mately follow a line with some spread around it, and have roughly the shape of an
ellipse.
There are many possible departures from the model (8.5); an in–depth study is
presented in Chapter 10 of Seber and Lee (2003). Here we focus only on two important
cases. By (8.5), the conditional expectation $E[y|x] = x^T\beta$ is a linear function
of $x$. If, in fact, $E[y|x] = \mu(x)$ is not a linear function, then model (8.5) is not appropriate.
Also if (8.5) holds, then $\mathrm{Var}[y|x] = \mathrm{Var}[\varepsilon]$ is constant. If $\mathrm{Var}[y|x] = w(x)$,
where $w(\cdot)$ is not a constant function, then model (8.5) is not appropriate either.
Focusing first on the conditional expectation $E[y|x]$, suppose the data follow the
model
$$y = \beta_1g(x_1) + \beta_2x_2 + \ldots + \beta_px_p + \varepsilon,$$
where $g(\cdot)$ is a nonlinear function. This relation can be rewritten as
$$y = \beta_1x_1 + \beta_2x_2 + \ldots + \beta_px_p + \left( \beta_1(g(x_1) - x_1) + \varepsilon \right),$$
i.e. as a linear regression in which the error terms have a mean which depends on
the value of $x_1$ in a nonlinear manner. Estimating this regression by the least squares
method, we obtain the fit
$$y_i = x_{i1}\hat\beta_1 + x_{i2}\hat\beta_2 + \ldots + x_{ip}\hat\beta_p + \hat e_i.$$
For example, if the model is $y_i = \beta g(x_i) + \varepsilon_i$, but we think it is $y_i = \beta x_i + \varepsilon_i$,
the least squares estimate is $\hat\beta = \left(\sum x_i^2\right)^{-1}\sum y_ix_i$. The residual then is
$$\hat e_i = y_i - \hat\beta x_i = \varepsilon_i + \beta g(x_i) - \hat\beta x_i.$$
Thus, if $g(\cdot)$ is nonlinear, the plot of the $\hat e_i$ versus the $x_i$ will reveal this nonlinearity.
It can be hoped that if $y$ depends in a nonlinear manner on some coordinates
$x_j$, $j = 1, 2, \ldots, p$, then this nonlinearity will be revealed by one of the plots of
the $\hat e_i$ against the $x_{ij}$. If the $\varepsilon_i$ in (8.4) do not have a constant variance, then the
residuals $\hat\varepsilon_i = y_i - (x_{i1}\hat\beta_1 + x_{i2}\hat\beta_2 + \ldots + x_{ip}\hat\beta_p)$ should exhibit uneven spread with
respect to some variable. If $\mathrm{Var}[\varepsilon_i]$ depends in some manner on the $x_i$, then the plots
of the $\hat\varepsilon_i$ against the $x_{ij}$ should reveal it.
If model (8.5) is correct, it is useful to check how well the data are described by
it. A commonly used measure is the coefficient of determination defined as
$$\hat R^2 = \frac{\sum_{i=1}^{N} (\hat y_i - \bar y_N)^2}{\sum_{i=1}^{N} (y_i - \bar y_N)^2},$$
where $\bar y_N = N^{-1}\sum_{i=1}^{N} y_i$ and $\hat y_i = x_i^T\hat\beta$. It measures the proportion of the total
sample variance of the responses explained by the model. The population coefficient
of determination is defined as
$$R^2 = \frac{\mathrm{Var}[E[y|x]]}{\mathrm{Var}[y]}. \tag{8.23}$$
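As a small illustration of the sample coefficient of determination, the sketch below fits the model by least squares and evaluates the ratio of sums of squares. The function name `r_squared` is hypothetical, and the design matrix is assumed to contain a column of ones, in which case the value lies between 0 and 1.

```python
import numpy as np

def r_squared(X, y):
    """Sample coefficient of determination for the least squares fit of y on X."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta_hat             # fitted values x_i^T beta_hat
    y_bar = y.mean()
    return np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)
```

For instance, with `X = np.column_stack([np.ones(N), x])` the function returns the squared sample correlation between `x` and `y`.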
We now discuss how these approaches can be adapted to functional linear
models.
8.6 Evaluating the goodness–of–fit 141
Scatter plot analysis. The informal graphical methods can be extended to the functional
setting as follows. Consider, for example, the fully functional model (8.1).
Then, by (8.14),
$$\zeta_\ell = \sum_{j=1}^{p} \psi_{\ell j}\xi_j + \eta_\ell(p), \tag{8.24}$$
where
$$\eta_\ell(p) = \sum_{j=p+1}^{\infty} \psi_{\ell j}\xi_j + \langle u_\ell, \varepsilon\rangle.$$
Equation (8.24) resembles (8.5) with the response $\zeta_\ell$ and the regressors $\xi_j$, but the
errors $\eta_\ell(p)$ are no longer independent of the regressors. Nevertheless, in light of
(8.13), the sum $\sum_{j=p+1}^{\infty} \psi_{\ell j}\xi_j$ can be expected to be small, so we may hope that
if model (8.1) is appropriate, then the scatter plots of the $\hat\zeta_{i\ell}$ against the $\hat\xi_{ij}$,
$i = 1, 2, \ldots, N$, will approximately follow a line. Recall that
$$\hat\zeta_{i\ell} = \int Y_i(t)\hat u_\ell(t)\,dt, \qquad \hat\xi_{ij} = \int X_i(s)\hat v_j(s)\,ds$$
are, respectively, the scores of the $Y_i$ and the $X_i$ in (8.1). When the dependence is
not linear, these plots exhibit different patterns. For example, if
$$Y_i(t) = H_2(X_i(t)) + \varepsilon_i(t),$$
where $H_2(x) = x^2 - 1$, the scatterplot of the first FPC clearly shows a quadratic
trend, see Figure 9.4. In applications, we consider only the first few values of $\ell$ and
$j$, see Chiou and Müller (2007) for examples.
As in the standard regression model, one can also work with the residuals
$$\hat\varepsilon_i(t) = Y_i(t) - \int \hat\psi(t,s)X_i(s)\,ds, \quad i = 1, 2, \ldots, N,$$
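Proof. By (8.9),
$$\begin{aligned}
\mathrm{Var}[Y(t)] &= E\left[ \left( \sum_{j=1}^{\infty} \zeta_ju_j(t) \right)^2 \right] \\
&= \sum_{j=1}^{\infty}\sum_{j'=1}^{\infty} E[\zeta_j\zeta_{j'}]\,u_j(t)u_{j'}(t) \\
&= \sum_{j=1}^{\infty} \gamma_ju_j^2(t).
\end{aligned}$$
Since $E[Y(t)|X] = \int \psi(t,s)X(s)\,ds$, we obtain
$$\begin{aligned}
\mathrm{Var}[E[Y(t)|X]] &= E\left[ \left( \int \psi(t,s)X(s)\,ds \right)^2 \right] \\
&= E\iint \psi(t,s)\psi(t,s')X(s)X(s')\,ds\,ds'.
\end{aligned}$$
Observe that
$$E\left[ \int v_\ell(s)X(s)\,ds \int v_{\ell'}(s')X(s')\,ds' \right] = E\left[ \langle v_\ell, X\rangle \langle v_{\ell'}, X\rangle \right]$$
Therefore
$$\mathrm{Var}[E[Y(t)|X]] = \sum_{\ell=1}^{\infty}\sum_{k=1}^{\infty}\sum_{k'=1}^{\infty} \frac{E[\xi_\ell\zeta_k]}{\lambda_\ell}\,\frac{E[\xi_\ell\zeta_{k'}]}{\lambda_\ell}\,\lambda_\ell\,u_k(t)u_{k'}(t),$$
and so we obtain
$$\frac{\mathrm{Var}[E[Y(t)|X]]}{\mathrm{Var}[Y(t)]} = \frac{\sum_{\ell=1}^{\infty}\sum_{k=1}^{\infty}\sum_{k'=1}^{\infty} E[\xi_\ell\zeta_k]E[\xi_\ell\zeta_{k'}]\,\lambda_\ell^{-1}u_k(t)u_{k'}(t)}{\sum_{j=1}^{\infty} \gamma_ju_j^2(t)}.$$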
The coefficient $R^2(t)$ (8.25) quantifies the degree to which the functional linear
model explains the variability of the response curves at a fixed point $t$. To define a
global measure of the degree of linear association, we can either integrate $R^2(t)$ or
integrate the numerator and the denominator separately, to obtain
$$\tilde R^2 = \int R^2(t)\,dt$$
and
$$R^2 = \frac{\int \mathrm{Var}[E[Y(t)|X]]\,dt}{\int \mathrm{Var}[Y(t)]\,dt} = \frac{\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} \left(E[\xi_m\zeta_k]\right)^2\lambda_m^{-1}}{\sum_{j=1}^{\infty} \gamma_j}.$$
A closed form formula for $\tilde R^2$ is not available. Both $\tilde R^2$ and $R^2$ are between 0 and
1. (If the function $Y$ is defined on an interval $[a,b]$ rather than $[0,1]$, then we define
$\tilde R^2 = (b-a)^{-1}\int_a^b R^2(t)\,dt$.)
Sample analogs of $R^2(t)$, $\tilde R^2$ and $R^2$ are defined by replacing the population
eigenfunctions and eigenvalues by their sample counterparts, and truncating the infinite
sums, for example,
$$\hat R^2 = \frac{\sum_{m=1}^{M}\sum_{k=1}^{K} \hat\sigma_{mk}^2\,\hat\lambda_m^{-1}}{\sum_{j=1}^{J} \hat\gamma_j}.$$
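A sample version of the global coefficient can be sketched as follows, assuming densely observed curves stored as rows on a common equispaced grid with Riemann-sum weight `dt`. The names `efpc_scores` and `functional_r2` are hypothetical; here $\hat\sigma_{mk}$ is the sample covariance between the $m$th score of $X$ and the $k$th score of $Y$, and $\hat\lambda_m$, $\hat\gamma_j$ are the sample eigenvalues of the two covariance operators.

```python
import numpy as np

def efpc_scores(curves, n_comp, dt):
    """Leading sample eigenvalues and the corresponding scores of centered curves."""
    centered = curves - curves.mean(axis=0)
    cov_op = centered.T @ centered / len(centered) * dt
    vals, vecs = np.linalg.eigh(cov_op)
    idx = np.argsort(vals)[::-1][:n_comp]
    funcs = vecs[:, idx].T / np.sqrt(dt)
    return vals[idx], centered @ funcs.T * dt

def functional_r2(X, Y, M, K, J, dt):
    """Truncated sample analog of the global coefficient of determination."""
    lam, xi = efpc_scores(X, M, dt)             # hat lambda_m and scores of X
    gam, zeta = efpc_scores(Y, max(K, J), dt)   # hat gamma_j and scores of Y
    sigma = xi.T @ zeta[:, :K] / len(X)         # sigma[m, k] = hat sigma_{mk}
    return np.sum(sigma ** 2 / lam[:, None]) / np.sum(gam[:J])
```

With $J \ge K$ the returned value cannot exceed 1, because the variance of each response score explained by the regressor scores is at most its total sample variance.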
The functional linear model is introduced in its various forms in Chapters 12–17 of
Ramsay and Silverman (2005). Additional case studies are described in Chapters
8, 9 and 12 of Ramsay and Silverman (2002). Examples of R code are discussed
in Chapters 9 and 10 of Ramsay et al. (2009), which also give a quick application
oriented introduction.
An important application of the functional linear model is the prediction of the
response Y given a new observation of the explanatory variable X . This can be
done without postulating a linear relationship. Such nonparametric approaches are
discussed in Chapters 5, 6 and 7 of Ferraty and Vieu (2006). The general idea is to
nonparametrically estimate a function $r$ in a relation $Y_i \approx r(X_i)$. Such methods
have been developed for scalar responses. Model (8.20) can be viewed as a hybrid
nonparametric/linear model. The book of Shi and Choi (2011) studies Bayesian
methods for Gaussian functional regression.
Another important application of FDA is in classification problems; an example
is given in Section 1.4. In addition to a gene’s temporal expression profile, other
factors or covariates may be important. Motivated by such settings, Ma and Zhong
(2008) consider what can be called a functional nonparametric mixed effect model
of the form $Y_i(t) = \mu(X_i(t)) + Z_i(t) b_i + \varepsilon_i(t)$, where $\mu$ is a smooth function,
$b_i$ is a mean zero column random vector of dimension $m$ with a covariance matrix
$B$, and $Z_i(t) = [Z_{i1}(t), Z_{i2}(t), \ldots, Z_{im}(t)]$ is a design matrix. The estimation of
$\mu$ is formulated using the notion of the reproducing kernel Hilbert space (RKHS),
which is necessary to accommodate smoothness properties of the estimates, a point
not addressed in this book. To explain briefly, note that smoothness connects the
values of a function evaluated at neighboring points. In the space $L^2$, the value
$x(t)$ at any given $t \in [0, 1]$ is not relevant, and the functional $L^2 \ni x \mapsto x(t)$ is
not continuous. It can be defined as a continuous functional on a smaller space of
functions in $L^2$ with square integrable second derivatives, similar to the Sobolev
space defined in Section 8.4. It is an example of a RKHS with a suitably defined
inner product $\langle \cdot, \cdot \rangle_{RKHS}$. On that space, by Riesz’ representation theorem,
$x(t) = \langle x, R_t \rangle_{RKHS}$ for some element $R_t$ of the RKHS. The value $R_t(s)$ of the
function $R_t$ at point $s \in [0, 1]$ is typically denoted $R(t, s)$, and the function
$R(\cdot, \cdot)$ is called the reproducing kernel. An interested reader is referred to Gu (2002).
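As a concrete illustration of the reproducing property (a standard textbook example, chosen here for simplicity and not the space used by Ma and Zhong (2008)), consider absolutely continuous functions $x$ on $[0, 1]$ with $x(0) = 0$ and square integrable first derivative, with the assumed inner product $\langle x, y \rangle = \int_0^1 x'(s) y'(s)\, ds$. Taking $R_t(s) = \min(t, s)$, whose derivative in $s$ equals 1 for $s < t$ and 0 for $s > t$, one obtains:

```latex
\langle x, R_t \rangle
  = \int_0^1 x'(s)\, \frac{\partial}{\partial s} \min(t, s)\, ds
  = \int_0^t x'(s)\, ds
  = x(t) - x(0)
  = x(t).
```

In this example the reproducing kernel is $R(t, s) = \min(t, s)$, so evaluation at a point is an inner product with a fixed element of the space, and is therefore continuous.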
A central issue for functional data is dimension reduction appropriate for a given
problem. Li and Hsing (2010) assume a general model
$Y_i = f(\langle \beta_1, X_i \rangle, \ldots, \langle \beta_K, X_i \rangle, \varepsilon_i)$,
in which the responses $Y_i$ are scalars, and the predictors $X_i$ are functions;
$f$ is an arbitrary function and $\beta_1, \ldots, \beta_K$ are linearly independent functions.
The functions $f$ and $\beta_k$ are unknown, and $K$ is also unknown. The problem of
interest is to test for and estimate the value of $K$, which is called the dimension of
the effective dimension reduction space.
We now list several other references related to the issues discussed in this chapter.
Cuevas et al. (2002) discuss the functional model in which the explanatory variables
are fixed rather than random functions; we focus in this book on the latter case.
Malfait and Ramsay (2003) emphasize that in many situations the general model (8.1)
is inappropriate because the response $Y_i(t)$ can depend only on the values of $X_i(s)$
for $s \le t$. McKeague and Sen (2010) study a scalar response impact point model
$Y = \alpha + \beta X(\theta) + \varepsilon$ in which $Y$ depends on the function $X$ only through an
unknown point $\theta \in (0, 1)$. In an application, $\theta$ corresponds to a gene location that
impacts the response $Y$. Febrero-Bande et al. (2010) study the detection of
influential data points in a functional model with scalar responses. Chiou et al. (2004)
discuss functional response models and give interesting data examples. Cardot et
al. (2003b) discuss estimation with splines, while Cardot et al. (2003c) present an
interesting application to predicting land use from remote sensing data. Cai and Hall
(2006) study theoretical foundations of prediction in the scalar linear model. Reiss
and Ogden (2010) introduce a linear model with images as explanatory variables.
Chapter 9
Test for lack of effect in the functional linear model
In this chapter, we study the fully functional linear model (8.1) and test the nullity
of the operator $\Psi$, i.e.
$$H_0: \Psi = 0 \quad \text{versus} \quad H_A: \Psi \neq 0.$$
We thus test the null hypothesis that the curves $X_n$ have no effect on the curves $Y_n$.
This is analogous to testing $H_0: \beta_1 = 0$ in the straight line regression,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. In the functional setting, the slope corresponds to a
linear operator which transforms functions into functions. Just as in the case of
straight line regression, the nullity of $\Psi$ does not mean that there is no dependence
between the curves $X_n$ and $Y_n$, but that if there is a dependence, it cannot be
described by a functional linear model.
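A scalar analogue makes the last point concrete. The following is only an illustrative example under the assumption of a standard normal regressor: if $x \sim N(0, 1)$ and $y = x^2 - 1$, then $y$ is a deterministic function of $x$, yet the regression slope vanishes because $E[x] = E[x^3] = 0$ and $\operatorname{Var}(x) = 1$:

```latex
\beta_1 = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
        = E\left[ x (x^2 - 1) \right]
        = E[x^3] - E[x]
        = 0 .
```

A test of $H_0: \beta_1 = 0$ would therefore, correctly, find no linear effect even though $x$ and $y$ are strongly dependent.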
The usual $t$-test for the slope of the regression line is equivalent to an $F$-test. The
$F$-test is a standard tool for testing the significance of the coefficients in the scalar
linear model $y_i = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$. The $F$-test, valid for
normal $\varepsilon_i$, is asymptotically equivalent to a $\chi^2$-test, see e.g. Chapter 4 of Seber and
Lee (2003). The test we propose is a $\chi^2$-test in which projections on the EFPC’s play
the role of the regressors. We impose only moment conditions on the distribution of
the regressor and error curves.
This chapter is organized as follows. In Section 9.1 we provide some background
and motivation for the test procedure described in Section 9.2. Its finite sample per-
formance is assessed in Section 9.3, followed by a detailed application to magne-
tometer data in Section 9.4. The asymptotic results and their extensions are stated
in Section 9.5, with the proofs presented in Section 9.6.
The test procedure described in this chapter was motivated by a question of space
physics. The most important magnetospheric phenomena observed at high latitudes,
i.e. in the polar regions, are the substorms, which manifest themselves in
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 147
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_9,
© Springer Science+Business Media New York 2012
[Figure 9.1: six panels in three rows, High-latitude (CMO), Mid-latitude (BOU) and Low-latitude (HON); horizontal axes run from 0 to 1400.]
Fig. 9.1 Horizontal intensities of the magnetic field measured at high-, mid- and low-latitude
stations (College, Alaska; Boulder, Colorado; Honolulu, Hawaii) during a substorm (left column)
and a quiet day (right column). The top left panel shows a typical signature of a substorm. Note
the different vertical scales for high-latitude records. Each graph is a record over one day, which
we view as a single functional observation.
a spectacular manner as the Northern Lights, the aurora borealis. There has been
some debate whether the currents flowing in mid and low magnetic latitudes are
“correlated” with the substorms. All magnetospheric currents are observed indirectly
through continuous records measured by terrestrial magnetometers. Examples of
such records, cut into one day pieces, are shown in Figure 9.1. The left top panel
shows a day with a substorm, the right top panel a day without a substorm.
Comparing the bottom left and right panels, little difference can be found. Some
difference, at least in the range, can be seen in the middle panels. There is thus a
need for a quantitative statistical tool for testing if the substorms have any (linear)
effect on lower latitude records. This problem is described in detail in Section 9.4.
Testing the null hypothesis of no effect exhibits new features in the functional
setting due to the fact that the data are infinite dimensional, and every dimension
reduction technique restricts the domain of $\Psi$, and so leads to a loss of information
about $\Psi$. These issues are addressed in different contexts in Cuevas et al. (2002)
and Cardot et al. (2003). The testing procedure we propose is similar to that
developed in Cardot et al. (2003) who consider scalar responses $Y_n$. It turns out that
the more symmetric fully functional formulation actually leads to a somewhat simpler
test statistic which can be readily computed using the principal components
decompositions of the $Y_n$ and the $X_n$. Our test statistic has a $\chi^2$ limiting distribution
which is a good approximation for sample sizes around 50. The research presented
in this chapter is based on the papers Kokoszka et al. (2008) and Maslova et al.
(2010b).
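We assume that the response variables $Y_n$, the explanatory variables $X_n$ and the
errors $\varepsilon_n$ are zero mean random elements of the Hilbert space $L^2$. Denoting by $X$
($Y$) a random function with the same distribution as each $X_n$ ($Y_n$), we introduce the
operators
$$C(x) = E[\langle X, x \rangle X], \quad \Gamma(x) = E[\langle Y, x \rangle Y], \quad \Delta(x) = E[\langle X, x \rangle Y],$$
and denote their empirical counterparts by $\hat C$, $\hat\Gamma$, $\hat\Delta$, e.g.
$$\hat C(x) = \frac{1}{N} \sum_{n=1}^{N} \langle X_n, x \rangle X_n.$$
The eigenelements of $C$ and $\Gamma$ satisfy
$$C(v_k) = \lambda_k v_k, \quad \Gamma(u_j) = \gamma_j u_j.$$
$$\Delta(v_k) \approx \hat\Delta(v_k) = \frac{1}{N} \sum_{n=1}^{N} \langle X_n, v_k \rangle Y_n.$$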
where $\langle X_n, \hat v_k \rangle$ is the $k$th score of the $X_n$, and $\langle Y_n, \hat u_j \rangle$ is the $j$th score of the $Y_n$.
These scores and the eigenvalues $\hat\lambda_k$ and $\hat\gamma_j$ are output of functions available in the
R package fda.
4. If $\hat T_N(p, q) > \chi^2_{pq}(\alpha)$, reject the null hypothesis of no linear effect. The
critical value $\chi^2_{pq}(\alpha)$ is the $(1 - \alpha)$th quantile of the chi-squared distribution with $pq$
degrees of freedom.
In this section, we present the results of a small simulation study intended to evaluate
the empirical size and power of the test in standard Gaussian settings.
We used $R = 1000$ replications of samples of processes $\varepsilon_n$, $X_n$ and $Y_n$,
$n = 1, 2, \ldots, N$. In order to evaluate the empirical size, we generated samples of pairs
$(\varepsilon_n, Y_n)$ with independent components. To find the empirical power, we generated
samples of pairs $(\varepsilon_n, X_n)$ with independent components, and calculated $Y_n$
according to (8.1). As $\varepsilon_n$, $X_n$ and $Y_n$, we used Brownian bridge and motion processes in
various combinations. The computations were performed using the R package fda.
We used both Fourier and spline bases.
Since the Brownian bridge and motion have very regular Karhunen-Loève
decompositions, see e.g. Bosq (2000), p. 26, it is not surprising that the size and
power of the test do not depend appreciably on $p$ and $q$. Figures 9.2 and 9.3
illustrate this point. The horizontal axes represent various combinations of $p$ and $q$; 1
stands for $p = 1$ and $q = 1$; 2 for $p = 1$, $q = 2$; 3 for $p = 1$, $q = 3$, etc. All
combinations of $p \le 4$, $q \le 4$ were considered in the size study and $p \le 6$, $q \le 6$
in the power study. The results for Brownian bridges and motions and Fourier and
spline bases are practically the same. For this reason, we present the results only
in cases when all processes are Brownian bridges, and the analysis was performed
with the Fourier basis.
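The regularity of the Karhunen-Loève decomposition can be seen directly for the Brownian bridge, whose covariance has eigenvalues $1/(j\pi)^2$ and eigenfunctions $\sqrt{2}\sin(j\pi t)$. The sketch below is an illustration only and is not the simulation code used in the study: it generates an approximate Brownian bridge path from a truncated expansion, and the function names and the truncation level `n_terms` are assumptions made for the example.

```python
import math
import random


def brownian_bridge_kl(grid, n_terms=50, rng=random):
    """Approximate Brownian bridge path from a truncated Karhunen-Loeve expansion.

    B(t) ~ sum_{j=1}^{J} Z_j * sqrt(2) * sin(j*pi*t) / (j*pi), Z_j iid N(0, 1).
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return [
        sum(
            z[j - 1] * math.sqrt(2.0) * math.sin(j * math.pi * t) / (j * math.pi)
            for j in range(1, n_terms + 1)
        )
        for t in grid
    ]


def kl_variance(t, n_terms=50):
    """Variance of the truncated expansion at t; it tends to t * (1 - t)."""
    return sum(
        2.0 * math.sin(j * math.pi * t) ** 2 / (j * math.pi) ** 2
        for j in range(1, n_terms + 1)
    )
```

Because the eigenvalues decay like $j^{-2}$, the first few components already carry most of the variance, which is consistent with the weak dependence of size and power on $p$ and $q$ reported above.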
Naturally, the bigger the sample size, the closer the empirical size of the test is to
the nominal size. Nevertheless, there is little or no improvement in the size of the
test beyond $N$ = 40 to 80. These values can therefore be considered sufficient
to obtain reasonable size, with the test being slightly conservative for $N = 40$.
Fig. 9.2 Empirical size of the test for $\alpha = 1\%, 5\%, 10\%$ (indicated by dotted lines) for different
combinations of $p$ and $q$. Here $\varepsilon_n$ and $Y_n$, $n = 1, 2, \ldots, N$, are two independent Brownian
bridges.
$$\psi(s, t) = C \exp\{(t^2 + s^2)/2\}, \quad t \in [0, 1],\ s \in [0, 1] \qquad (9.4)$$
with constants $C$ such that $\|\Psi\| < 1$, i.e. $|C| < 1$ (the norm in this section is the
Hilbert–Schmidt norm). Panels (a) and (b) of Figure 9.3 present power when the
dependence between $X_n$ and $Y_n$ is quite strong, $\|\Psi\| = 0.75$. For $N = 80$, the
power is practically 100% if $\|\Psi\| = 0.75$. The right column of Figure 9.3 shows the
power of the test when $\|\Psi\| = 0.5$. In this case power increases more slowly with $N$.
Fig. 9.3 Empirical power of the test for different combinations of principal components and
different sample sizes $N$. Here $X_n$ and $\varepsilon_n$ are Brownian bridges. In panels (a), (b) $\|\Psi\| = 0.75$; in
panels (c), (d) $\|\Psi\| = 0.5$.
plane and pointing toward the magnetic North. It most directly reflects the variation
of the M–I currents we wish to study. The M–I currents form a complex interac-
tive system which at present is only partially understood, see Kamide et al. (1998)
and Daglis et al. (2003). The magnetometer records contain intertwined signatures
of many currents, and an effort has been under way to deconvolute the signatures
of various currents. So far this has been done by preprocessing records from every
individual station, and then combining the filtered signals from stations at the same
magnetic latitude (e.g. equatorial stations, or auroral stations), see Jach et al. (2006)
for a recent example of such an approach. Better understanding of the M–I sys-
tem can however be obtained only by modeling interactions between the various
currents.
It is believed, see e.g. Rostoker (2000), that the auroral currents may have an
indirect impact on the equatorial and mid-latitude currents. The question of interest
is whether the auroral geomagnetic activity reflected in the high–latitude curves has
an effect on the processes in the equatorial belt reflected by the mid– and
low–latitude curves. This question is of particular interest for days during which a
high–latitude activity known as a substorm occurs. Its most spectacular manifestation is
the Northern Lights caused by high–energy electrojets flowing for a few hours in
the auroral belt. The top left panel of Figure 9.1 shows a signature of a substorm.
It is believed that there is energy transfer between the auroral electrojets and lower
latitude currents, but the direct physical mechanisms which might be responsible for
this interaction are a matter of debate. The question can be cast into the setting of
the functional linear model (8.1) in which the $X_n$ are centered high–latitude records
and $Y_n$ are centered mid– or low–latitude records. This postulates an approximate
statistical model for the data and allows us to test the null hypothesis $\Psi = 0$. If the
null is true, we conclude that the high–latitude curves $X_n$ have no linear effect on
the lower latitude curves. If the null is rejected, this indicates the existence of an
effect, which can be approximately linear (in the functional sense).
Detailed description of the data. We analyze one-minute averages of the horizontal
intensity of the magnetic field from four sets of stations given in Table 9.1. Only
one high–latitude station is used because substorms last for a few hours at night local
time, and we want to study their effect as the longitudinal distance increases. The
mid– and low–latitude observatories are roughly aligned along the same longitude. The
functional data consist of daily curves in UT (Universal Time), with 1440
observations per curve. Figure 9.1 provides examples of such curves.
Several types of data sets are analyzed. The first set consists of all days with
substorms from January until August, 2001 (set A). Then, the same analysis is
performed on the so called medium strength substorms (defined by the dynamic range
of 400-700 nT) during the same period (set B). Substorms often occur during much
larger disturbances known as geomagnetic storms. In order to eliminate possible
confounding effects of storms, we removed all days $n$ such that a storm was taking
place on days $n - 1$, $n$, or $n + 1$ (set A*). We also removed such days from
the list of medium strength substorms (set B*). To eliminate the possibility of
confounding by next day’s storm, we also considered only isolated substorm days, i.e.
substorm days followed by at least two days without any substorms (set I). Finally,
to provide an additional validation of our findings, we select the substorms that took
place during three month periods: January – March (A1), March – May (A2), and
June – August (A3). The main reason for performing a separate analysis on medium
strength substorms is that very strong substorms can be viewed as outliers and may
distort the overall pattern of dependence. They are also typically generated by a
different physical mechanism than medium strength substorms: the strong substorms
are connected to the instability associated with the release of energy stored in the
magnetosphere, and the medium substorms are associated with the direct pushing of
the enhanced solar wind. Comparing the substorms over three month periods guards
against the violation of the assumption of iid observations. Due to the annual
rotation of the Earth, the locations of the stations relative to the Sun change over time.
Hence, the substorms that happened a long time apart might follow different
statistical distributions. There were 101 substorm days from January until August
2001, 81 of which did not have any storms around; 41 substorms were of
medium strength, 35 of them remaining after removing the ones close to
the storms; 43 isolated substorms occurred during 2001. We observed 40 substorm
days from January until March, 42 from March until May, and 42 from June
until August.
Details of test application and interpretation. In order to perform the test, the
minute-by-minute data were converted into functional objects in R using a B–spline
basis with 149 basis functions. The number of basis functions is not crucial, the
only requirement being that the smoothed curves should look almost identical to the
original, while some noise can be eliminated.
In order to ensure that the test gives reliable results, the approximate validity of
the functional linear model must be checked. For this purpose, a technique
introduced by Chiou and Müller (2007), which relies on a visual examination of scatter
plots of scores, can be used. If the model is valid, score plots are roughly
football–shaped. When the dependence is not linear, these plots exhibit different patterns. The
number of plots is $pq$, where $p$ and $q$ are as in Section 9.2. They show the
interaction of the $k$th PC of the $X_n$ ($k = 1, \ldots, p$) and the $j$th PC of the $Y_n$ ($j = 1, \ldots, q$). To
illustrate this technique, consider a non-linear model: $Y_n(t) = H_2(X_n(t)) + \varepsilon_n(t)$,
where $H_2(x) = x^2 - 1$ is the Hermite polynomial of rank 2. For this model, the
[Figure 9.4: two by two grid of scatter plots; horizontal axes: 1st and 2nd observed FPC Score (X); vertical axes: 1st and 2nd observed FPC Score (Y).]
Fig. 9.4 Functional predictor-response plots of FPC scores of response functions versus FPC
scores of predictor functions for $Y_n(t) = H_2(X_n(t)) + \varepsilon_n(t)$, where $H_2(x) = x^2 - 1$,
$n = 1, \ldots, 40$.
plot in the top left corner of Figure 9.4 exhibits a quadratic trend. For the functional
linear model to be valid all these plots should be “pattern–free”. Figure 9.5 shows
examples of these plots for magnetometer data. We used CMO medium strength
substorm records as $X$, and THY with no lag as $Y$. These scatter plots indicate
a linear relationship with some outliers. Since we do not require Gaussianity, only
finite fourth moments, these outliers need not invalidate our conclusions. In case of
other pairs of functional data, the score plots look similar. We conclude that a linear
model is approximately appropriate for our application.
We now describe how to choose the most important FPC’s that will be used in
the test. One of the ways to pick them is to use the scree test, which is a graphi-
cal method first proposed by Cattell (1966). To apply the scree method one plots the
[Figure 9.5: two by two grid of scatter plots; horizontal axes: 1st and 2nd observed FPC Score (X); vertical axes: 1st and 2nd observed FPC Score (Y).]
Fig. 9.5 Functional predictor-response plots of FPC scores of response functions versus FPC of
explanatory functions for magnetometer data (CMO vs THY0)
successive eigenvalues, see Figure 9.6, and finds the place where the smooth decrease
of eigenvalues appears to level off. To the right of this point one finds only
“factorial scree” (“scree” is a geological term referring to the debris which collects on the
lower part of a rocky slope). Table 9.2 provides the number of most important
principal components and the corresponding percentage of total variability explained by them
for all substorms that occurred from January until August, 2001. For other data sets
under consideration the general pattern is similar. One can also see from Figure 9.7
that each subsequent component picks up variation that declines in smoothness. For
example, the 10th principal components resemble random noise and explain a small
percentage of variability, which is why they are not included in the analysis.
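The percentages of explained variability reported alongside the scree test are simple functions of the eigenvalues. The following sketch is only a numerical companion to the graphical method (the scree test itself relies on visual inspection); the function names are hypothetical and the eigenvalues are assumed to be sorted in decreasing order.

```python
def cumulative_variance_explained(eigenvalues):
    """Cumulative percentage of total variability for the leading components."""
    total = sum(eigenvalues)
    percentages = []
    running = 0.0
    for value in eigenvalues:
        running += value
        percentages.append(100.0 * running / total)
    return percentages


def components_needed(eigenvalues, target_percent):
    """Smallest number of leading components reaching the target percentage."""
    cumulative = cumulative_variance_explained(eigenvalues)
    for count, percent in enumerate(cumulative, start=1):
        if percent >= target_percent:
            return count
    return len(eigenvalues)
```

For instance, with eigenvalues 4, 3, 2, 1 the first three components account for 90% of the total, so `components_needed` returns 3 for a target of 85%.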
When applying the test to magnetometer data, in most cases there is a clear
rejection or acceptance for all combinations of the most important principal components.
In those cases, we can either reject (“1”) or fail to reject (“0”) the null hypothesis
[Figure 9.6: two panels, CMO and HON; horizontal axis: PC; vertical axis: Eigenvalues.]
Fig. 9.6 Eigenvalues for different principal components of the substorm days that occurred from
March until May, 2001, from the College (CMO) and Honolulu (HON) stations. The black diamond
denotes the number of most important principal components selected by the scree test.
Table 9.2 Number of principal components retained by the scree test, and percentage of total
variability explained, during substorm days that occurred from January until August, 2001.
Station  PC  %       Station  PC  %       Station  PC  %       Station  PC  %
CMO      10  82.52
BOU0     5   91.36   FRD0     4   90.83   THY0     5   92.17   MMB0     4   92.30
BOU1     4   86.40   FRD1     4   89.55   THY1     5   89.49   MMB1     4   91.01
BOU2     4   91.17   FRD2     4   92.32   THY2     4   91.57   MMB2     4   94.59
BOU3     4   91.74   FRD3     4   92.68   THY3     4   91.51   MMB3     4   95.60
HON0     4   96.56   SJG0     5   97.08   HER0     4   95.07   KAK0     4   94.33
HON1     3   94.91   SJG1     4   94.57   HER1     4   94.31   KAK1     4   93.80
HON2     4   97.44   SJG2     3   92.73   HER2     4   95.89   KAK2     4   96.39
HON3     4   97.79   SJG3     4   96.42   HER3     4   95.53   KAK3     3   94.66
with a reasonable confidence. We use the nominal 95% confidence level in this
section. However, there are some cases when it is not clear what conclusion to draw.
We denote such cases “1?” (inclined toward rejecting the null hypothesis), “0?”
(inclined toward failing to reject the null), and “1?0?” (inconclusive). Figure 9.8 gives
examples of such cases. We plot rejection regions up to the number of important
principal components. Grey areas mean that we reject $H_0$, white that we fail to reject $H_0$.
The conclusion is clear when all, or almost all, rectangles are of the same color.
We can then conclude that $X$ has an effect on $Y$ (all grey) or there is no effect (all
white). The left panel of Figure 9.8 gives an example when it is not clear what to
conclude. However, based on our previous experience we are most likely to reject the
[Figure 9.7: panels of EFPC curves; horizontal axes run from 0 to 1400.]
Fig. 9.7 EFPC’s of the substorm days that occurred from January until August, 2001, from
College (CMO) and Honolulu (HON) stations.
[Figure 9.8: three panels titled 1?, 0? and 1?0?; horizontal axis: p; vertical axis: q.]
Fig. 9.8 Examples of rejection/acceptance plots at 5% level which are difficult to interpret. Grey
area – reject $H_0$, white – fail to reject $H_0$.
null hypothesis. In the case shown in the middle panel, the conclusion is also not
clear, but we lean toward accepting the independence of X and Y . Finally, the right
panel presents an example where it is rather unclear what to conclude.
Results and conclusions. We now discuss the results of the application of the test.
We consider high-latitude records from College, Alaska, (CMO) as X , and let Y
be the observations from all eight mid- and low-latitude stations during the same
UT time as the CMO data. We also analyze responses one, two and three days after
substorms were recorded at the CMO station. Such a setting should allow us to see
if there is a longitudinal effect of substorms; how long this effect, if any, lasts; and
what the global influence of a substorm is.
Column A in Table 9.3 presents the test results for all the substorms that occurred
from January to August. We see that the effect of substorms observed at CMO is
statistically significant at all mid- and low-latitude stations at the same UT (e.g.
BOU0, HON0). This is true for one-day lag data as well (e.g. BOU1, HON1), but
for the lag of two days the results are inconclusive. We conclude that the effect
of substorms observed at CMO persists for about 48 hours, at all longitudes and
latitudes. In the column labeled A* we provide the test results for the set of the
substorms where none of the events occurred close to storms. As one can see, the
results are similar to the ones in column A. This means that the observed effect is
not attributable to an impact of storms on high-latitude currents. We also analyzed
the effects of isolated substorms, i.e. there were at least 2 quiet days after such
substorms (see column I in Table 9.3). As one can see, there is significant linear
dependence between records observed at high latitude and mid-, low-latitude during
substorm days, as well as the next day. This means that the next day effect cannot
be attributed to the confounding effect of substorms on consecutive days. Next, we
analyze the effect of medium strength substorms. Table 9.3, column B, presents
the test results. We can see that the medium strength substorm effect is weaker
than in case of all substorms. The effect of medium strength substorms appears
significant on the same day, but on the following days is absent. It fades out faster for
further longitudes. We draw the same conclusion from column B* which includes
test results on the medium strength substorms that were not affected by the storm
activity. Table 9.4 gives the results for the three sets of substorms in three month
periods. In column A1 the results for the substorm days from January to March,
2001 are presented. The conclusions are similar to the ones we got for all substorms
from January until August (see Table 9.3, column A). The dependence seems to last
for two days. We come to the same conclusion dealing with the other two sets of the
substorms, the ones that occurred in Spring and Summer 2001 (see columns A2 and
A3 of Table 9.4), the second day dependence being weaker in summer. This agrees
with the earlier analysis, as there are fewer strong substorms in summer months.
We conclude that there is a pattern that suggests that there is a dependence
between high- and mid-, and high- and low-latitude records with no and one day
lag. There is no significant dependence for data with two- and three-day lags.
We conclude this section with a discussion of the physical meaning of our find-
ings. The ground magnetic effects of a localized auroral current system in the
Table 9.3 Results of the test for all substorm days (A), substorm days excluding days around the day with a storm (A*); medium strength substorms (B), medium
strength substorms excluding storm days (B*) that occurred from January to August, 2001; (I) isolated substorms that occurred from January to December, 2001.
Mid-latitude
Station  A     A*    I     B     B*
BOU0     1     1     1     1?    1?
BOU1     1     1     1     0     1?
BOU2     1?0?  1?0?  0     0     0
BOU3     0     1?0?  0     0?    1?0?
FRD0     1     1     1     1?    1?0?
FRD1     1     1     1     0     0?
FRD2     0?    0?    0     0     0
FRD3     0     0?    0     0?    1?0?
Low-latitude
Station  A     A*    I     B     B*
HON0     1     1     1     1?    1?
HON1     1     1     1     1?0?  1?
HON2     1?0?  0?    0     0     0
HON3     0?    0?    0     0?    1?0?
SJG0     1     1     1     1?    1?
SJG1     1     1     1     0     0?
SJG2     0?    0     0     0     0
SJG3     0     0     0     0?    1?0?
HER0     1     1     1     0?    1?
HER1     1     1     1     0?    0?
HER2     1?0?  1?0?  0     0     0?
HER3     0     0?    0     0?    1?0?
KAK0     1     1     1     1?    1?
KAK1     1     1     1     1?    1?
KAK2     1?    1?    1?0?  0     0
KAK3     0     0?    0     0?    1?0?
Table 9.4 Results of the test for substorm days that occurred in 2001 from January to March (A1),
March to May (A2), June to August (A3).
Mid-latitude
Station  A1    A2    A3
BOU0     1     1     1
BOU1     1     1     1?0?
BOU2     0     0     0
BOU3     0     0     0
FRD0     1     1     1
FRD1     1     1     1?0?
FRD2     0     0     0
FRD3     0     0     0
THY0     1     1     1
THY1     1     1     1
THY2     0     0     0
THY3     0     0     0
MMB0     1     1     1
MMB1     1     1     1?0?
MMB2     1?0?  0     0?
MMB3     0?    0     0
Low-latitude
Station  A1    A2    A3
HON0     1     1     1
HON1     1     1     1?0?
HON2     0?    0     0
HON3     0     0     0
SJG0     1     1     1
SJG1     1     1     1?
SJG2     0     0     0
SJG3     0     0     0
HER0     1     1     1
HER1     1     1     1
HER2     0     0     0
HER3     0     0     0
KAK0     1     1     1
KAK1     1     1     1
KAK2     1?0?  0     0
KAK3     0?    0     0
ionosphere normally become insignificant for a location at the Earth’s surface
400-500 km away from the center of the current system. Therefore the substorm auroral
currents in the ionosphere would not be expected to have significant direct effects
on the H–component measurements at mid–latitudes and most certainly not at low
(equatorial) latitudes. The influence is likely not directly from the auroral
electrojets, but from the full current circuit in the M-I system that drives the auroral
electrojets during substorms. Conceptually, this would not be entirely unexpected.
However, what is unexpected is that on the subsequent day, after a 24 hour lag, the mid– and
low–latitude field is still affected by the prior day’s substorm activity defined by
high–latitude magnetic fields. The result is dependent on the strength of the substorms,
i.e. only the effect of strong substorms extends to low latitudes on the second day.
The interpretation of this result is not readily apparent. These statistical findings
may imply some physical connections between the substorm electrodynamics and
the physical processes in other regions of the M-I system that we are not aware of
at the present time.
Assumption 9.1. The triples $(Y_n, X_n, \varepsilon_n)$ form a sequence of independent
identically distributed random elements such that $\varepsilon_n$ is independent of $X_n$ and
The next assumption extends condition (2.12) to both response and explanatory
variables.
Assumption 9.2. The eigenvalues of the operators $C$ and $\Gamma$ satisfy, for some $p > 0$
and $q > 0$,
$$\lambda_1 > \lambda_2 > \cdots > \lambda_p > \lambda_{p+1}, \qquad \gamma_1 > \gamma_2 > \cdots > \gamma_q > \gamma_{q+1}. \qquad (9.7)$$
Under these assumptions, we can quantify the behavior of the test under $H_0$ and
$H_A$.
Theorem 9.1. Under $H_0$ and Assumptions 9.1 and 9.2, $\hat T_N(p, q) \xrightarrow{d} \chi^2_{pq}$ as $N \to \infty$.
Theorem 9.2. If Assumptions 9.1 and 9.2 hold, and $\langle \Psi(v_k), u_j \rangle \neq 0$ for some $k \le p$
and $j \le q$, then $\hat T_N(p, q) \xrightarrow{P} \infty$ as $N \to \infty$.
Jiofack and Nkiet (2010) showed that statistic (9.3) can be used outside the
context of the functional linear model to test the independence of two functional
samples. The null hypothesis is then formulated in the following assumption.
Assumption 9.3. The pairs $(Y_n, X_n)$ form a sequence of mean zero independent
identically distributed random elements such that $Y_n$ is independent of $X_n$ and
$$E\|X_n\|^4 < \infty, \quad E\|Y_n\|^4 < \infty.$$
They proved the following result together with an analog of Theorem 9.2.
Theorem 9.3. If Assumptions 9.2 and 9.3 hold, then $\hat T_N(p, q) \xrightarrow{d} \chi^2_{pq}$ as $N \to \infty$.
Jiofack and Nkiet (2010) also showed that statistic (9.3) can be used to test a
broader null hypothesis which corresponds to a lack of correlation rather than
independence. In the functional context, zero correlation can be defined by the condition
$\Delta = 0$, which means that for any $x, y \in L^2$, $E[\langle X, x \rangle \langle Y, y \rangle] = 0$. To formulate
the null hypothesis in this context, we introduce the following assumption.
Assumption 9.4. The pairs $(Y_n, X_n)$ form a sequence of mean zero independent
identically distributed random elements such that $E\|X_n\|^4 < \infty$, $E\|Y_n\|^4 < \infty$.
The operator $\Delta$ is equal to zero.
If only $\Delta = 0$ is assumed, rather than the independence of the two samples, then
the statistic $\hat T_N(p, q)$ no longer converges to the chi–squared distribution. This is
essentially because without assuming independence, the fourth order moments
$$E[\langle X, v_i \rangle \langle X, v_k \rangle \langle Y, u_j \rangle \langle Y, u_\ell \rangle]$$
where $\otimes$ denotes the Kronecker product. For the properties of the Kronecker product
we refer to Chapter 4 of Horn and Johnson (1991). For example, if $p = q = 2$, then
$$H = \begin{bmatrix}
\lambda_1 \gamma_1 & 0 & 0 & 0 \\
0 & \lambda_1 \gamma_2 & 0 & 0 \\
0 & 0 & \lambda_2 \gamma_1 & 0 \\
0 & 0 & 0 & \lambda_2 \gamma_2
\end{bmatrix}.$$
With this notation in place, we state the following result.
Theorem 9.4. If Assumptions 9.2 and 9.4 hold, then $\hat T_N(p, q) \xrightarrow{d} G^T H^{-1} G$, where
$G$ is a mean zero Gaussian vector in $R^{pq}$ with covariance matrix $K$, and $H$ is
defined by (9.9).
Note that if $X$ is independent of $Y$, then $K = H$, and so $G^T H^{-1} G \overset{d}{=} \chi^2_{pq}$, see
e.g. Theorem 2.9 of Seber and Lee (2003). Jiofack and Nkiet (2010) explain how to
implement the test for a general matrix $K$.
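The structure of $H$ in the $p = q = 2$ example can be checked numerically. The sketch below assumes that $H$ is the Kronecker product of the diagonal matrices holding the eigenvalues of $C$ and $\Gamma$ (written `lam` and `gam` in the code), which is consistent with the example matrix; the helper names are hypothetical and the numerical values are arbitrary.

```python
def diag(values):
    """Square matrix (list of rows) with the given values on the diagonal."""
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)] for i in range(size)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_entry * b_entry for a_entry in row_a for b_entry in row_b]
        for row_a in a
        for row_b in b
    ]


def h_matrix(lam, gam):
    """Assumed form of H: Kronecker product of diag(lam) and diag(gam)."""
    return kron(diag(lam), diag(gam))
```

With `lam = [3.0, 2.0]` and `gam = [5.0, 1.0]` the diagonal of `h_matrix(lam, gam)` is 15, 3, 10, 2, i.e. the products of the first, then the second, eigenvalue of $C$ with each eigenvalue of $\Gamma$, matching the ordering of the diagonal in the example.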
The test statistic (9.3) does not change if we replace $\hat v_k$ by $\hat c_k \hat v_k$ and $\hat u_j$ by $\hat c_j \hat u_j$,
see Section 2.5. To lighten the notation in this section, we therefore write $\hat v_k$ in place
of $\hat c_k \hat v_k$ and $\hat u_j$ in place of $\hat c_j \hat u_j$.
Proof of Theorem 9.1. Theorem 9.1 follows from Corollary 9.1, which is arrived
at through a series of simple lemmas. Lemma 9.1 shows that the $\chi^2$ limit holds
for the population eigenelements. The remaining lemmas show that the differences
between the empirical and population eigenelements have asymptotically negligible
effect.
9.6 Proofs of Theorems 9.1 and 9.2 165
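$$\sqrt{N}\left\{ \left\langle \hat\Delta(v_k), u_j \right\rangle,\ 1 \le j \le q,\ 1 \le k \le p \right\} \xrightarrow{d} \left\{ \eta_{kj} \sqrt{\lambda_k \gamma_j},\ 1 \le j \le q,\ 1 \le k \le p \right\}, \tag{9.10}$$
with $\eta_{kj} \sim N(0,1)$. Moreover, $\eta_{kj}$ and $\eta_{k'j'}$ are independent if $(k,j) \ne (k',j')$.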
Proof. Under $H_0$, $\sqrt{N} \langle \hat\Delta(v_k), u_j \rangle = N^{-1/2} \sum_{n=1}^N \langle X_n, v_k\rangle \langle \varepsilon_n, u_j\rangle$. The summands have mean zero and variance $\lambda_k \gamma_j$, so (9.10) follows.
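To verify that $\eta_{kj}$ and $\eta_{k'j'}$ are independent if $(k,j) \ne (k',j')$, it suffices to
show that $\sqrt{N}\langle \hat\Delta(v_k), u_j\rangle$ and $\sqrt{N}\langle \hat\Delta(v_{k'}), u_{j'}\rangle$ are uncorrelated. Observe that
$$\begin{aligned}
E\Big[ \sqrt{N}\langle \hat\Delta(v_k), u_j\rangle \, \sqrt{N}\langle \hat\Delta(v_{k'}), u_{j'}\rangle \Big]
&= \frac{1}{N} E\Big[ \sum_{n=1}^N \langle X_n, v_k\rangle \langle \varepsilon_n, u_j\rangle \sum_{n'=1}^N \langle X_{n'}, v_{k'}\rangle \langle \varepsilon_{n'}, u_{j'}\rangle \Big] \\
&= \frac{1}{N} \sum_{n,n'=1}^N E\big[\langle X_n, v_k\rangle \langle X_{n'}, v_{k'}\rangle\big] \, E\big[\langle \varepsilon_n, u_j\rangle \langle \varepsilon_{n'}, u_{j'}\rangle\big] \\
&= \frac{1}{N} \sum_{n=1}^N E\big[\langle X_n, v_k\rangle \langle X_n, v_{k'}\rangle\big] \, E\big[\langle \varepsilon_n, u_j\rangle \langle \varepsilon_n, u_{j'}\rangle\big] \\
&= \langle C(v_k), v_{k'}\rangle \langle \Gamma(u_j), u_{j'}\rangle = \lambda_k \delta_{kk'} \gamma_j \delta_{jj'}. \qquad \square
\end{aligned}$$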
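Recall that the Hilbert–Schmidt norm of a Hilbert–Schmidt operator $S$ is defined
by $\|S\|_{\mathcal S}^2 = \sum_{j=1}^\infty \|S(e_j)\|^2$, where $\{e_1, e_2, \ldots\}$ is any orthonormal basis, and that

$$E\|\hat\Delta\|_{\mathcal S}^2 = N^{-1} E\|X\|^2 E\|\varepsilon_1\|^2.$$

$$\|\hat\Delta(e_j)\|^2 = N^{-2} \sum_{n,n'=1}^N \langle X_n, e_j\rangle \langle X_{n'}, e_j\rangle \langle Y_n, Y_{n'}\rangle.$$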
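Therefore, under $H_0$,
$$\begin{aligned}
E\|\hat\Delta\|_{\mathcal S}^2
&= N^{-2} \sum_{j=1}^\infty \sum_{n,n'=1}^N E\big[ \langle X_n, e_j\rangle \langle X_{n'}, e_j\rangle \langle \varepsilon_n, \varepsilon_{n'}\rangle \big] \\
&= N^{-2} \sum_{j=1}^\infty \sum_{n=1}^N E\langle X_n, e_j\rangle^2 \, E\|\varepsilon_n\|^2 \\
&= N^{-1} E\|\varepsilon_1\|^2 \sum_{j=1}^\infty E\langle X, e_j\rangle^2 = N^{-1} E\|\varepsilon_1\|^2 E\|X\|^2. \qquad \square
\end{aligned}$$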
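$$\xrightarrow{d} \left\{ \eta_{kj} \sqrt{\lambda_k \gamma_j},\ 1 \le j \le q,\ 1 \le k \le p \right\} \tag{9.11}$$
and
$$\sqrt{N} \left\langle \hat\Delta(\hat v_k - v_k), \hat u_j \right\rangle \xrightarrow{P} 0. \tag{9.14}$$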
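To verify (9.13), note that by (2.13), $\sqrt{N}(\hat u_j - u_j) = O_P(1)$, and by Lemma
9.2, $E\|\hat\Delta v_k\| \le E\|\hat\Delta\|_{\mathcal S} = O(N^{-1/2})$. Thus (9.13) follows from Lemma 7.3.
To use the same argument for (9.14) (with (2.13)), we note that
$$\sqrt{N} \left\langle \hat\Delta(\hat v_k - v_k), \hat u_j \right\rangle = \sqrt{N} \left\langle \hat v_k - v_k, \tilde\Delta(\hat u_j) \right\rangle,$$
where $\tilde\Delta(x) = N^{-1} \sum_{n=1}^N \langle Y_n, x\rangle X_n$. Lemma 9.2 shows that under $H_0$,
$E\|\tilde\Delta\|_{\mathcal S} = E\|\hat\Delta\|_{\mathcal S}$. $\square$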
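By (2.13), $\hat\lambda_k \xrightarrow{P} \lambda_k$ and $\hat\gamma_j \xrightarrow{P} \gamma_j$, so we obtain
Corollary 9.1. Under the assumptions of Theorem 9.1,
$$\sqrt{N}\left\{ \hat\lambda_k^{-1/2} \hat\gamma_j^{-1/2} \left\langle \hat\Delta(\hat v_k), \hat u_j \right\rangle,\ 1 \le j \le q,\ 1 \le k \le p \right\} \xrightarrow{d} \left\{ \eta_{kj},\ 1 \le j \le q,\ 1 \le k \le p \right\}, \tag{9.15}$$
with $\eta_{kj}$ equal to those in Lemma 9.1.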
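$$\hat S_N(p,q) = \sum_{k=1}^p \sum_{j=1}^q \hat\lambda_k^{-1} \hat\gamma_j^{-1} \left\langle \hat\Delta(\hat v_k), \hat u_j \right\rangle^2.$$
By Lemma 9.6 and (2.13), $\hat S_N(p,q) \xrightarrow{P} S(p,q) > 0$. Hence $\hat T_N(p,q) = N \hat S_N(p,q) \xrightarrow{P} \infty$.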
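The quantity $N \hat S_N(p,q)$ above can be computed directly from discretized curves. The following is a minimal sketch under stated assumptions: the curves are observed on a common equispaced grid with spacing `dt`, inner products are approximated by Riemann sums, the empirical cross operator is taken to be $v \mapsto N^{-1}\sum_n \langle X_n, v\rangle Y_n$ (as in the proof of Lemma 9.5), and the function name is hypothetical. It is not the authors' implementation.

```python
import numpy as np

def lack_of_effect_statistic(X, Y, p, q, dt):
    """N * sum_{k<=p, j<=q} <D v_k, u_j>^2 / (lam_k * gam_j) for curves on a grid.

    X, Y: arrays of shape (N, T) holding curve values on a grid with spacing dt.
    """
    N = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Discretized empirical covariance operators (kernel matrix times dt).
    lam, V = np.linalg.eigh(Xc.T @ Xc / N * dt)
    gam, U = np.linalg.eigh(Yc.T @ Yc / N * dt)
    # Keep the leading p and q eigenpairs; rescale eigenvectors to unit L2 norm.
    lam, V = lam[::-1][:p], V[:, ::-1][:, :p] / np.sqrt(dt)
    gam, U = gam[::-1][:q], U[:, ::-1][:, :q] / np.sqrt(dt)
    scores_x = Xc @ V * dt          # <X_n, v_k>, shape (N, p)
    scores_y = Yc @ U * dt          # <Y_n, u_j>, shape (N, q)
    D = scores_x.T @ scores_y / N   # D[k, j] = <D_hat v_k, u_j>
    return float(N * np.sum(D ** 2 / np.outer(lam, gam)))

# Toy usage with independent Brownian motions (simulated, illustrative only).
rng = np.random.default_rng(1)
N, T = 200, 50
dt = 1.0 / T
X = np.cumsum(rng.standard_normal((N, T)), axis=1) * np.sqrt(dt)
Y = np.cumsum(rng.standard_normal((N, T)), axis=1) * np.sqrt(dt)
print(lack_of_effect_statistic(X, Y, p=2, q=2, dt=dt))
```

The statistic depends on the eigenfunctions only through squared inner products, so the arbitrary signs returned by the eigen-decomposition do not matter.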
To establish Lemma 9.6, it is convenient to split the argument into two simple
lemmas: Lemma 9.4 and Lemma 9.5.
b EkY k2 :
Lemma 9.4. If Yn ; n 1; are identically distributed, then Ekk
X
N X
N
b N 1
kuk j hYn ; ui jkYn k N 1 kYn k2 :
nD1 nD1
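Proof. The result follows from the Law of Large Numbers after noting that
$$\left\langle \hat\Delta(v), u \right\rangle = \frac{1}{N} \sum_{n=1}^N \langle X_n, v\rangle \langle Y_n, u\rangle$$
and
$$E\left[ \langle X_n, v\rangle \langle Y_n, u\rangle \right] = E\left[ \langle \langle X_n, v\rangle Y_n, u\rangle \right] = \langle \Delta(v), u\rangle. \qquad \square$$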
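Lemma 9.6. If Assumptions 9.1 and 9.2 hold, then $\langle \hat\Delta(\hat v_k), \hat u_j\rangle \xrightarrow{P} \langle \Delta(v_k), u_j\rangle$, $j \le q$, $k \le p$.
and
$$\left\langle \hat\Delta(\hat v_k) - \hat\Delta(v_k), \hat u_j \right\rangle \xrightarrow{P} 0.$$
These relations follow from Lemma 7.3, relations (2.13) and Lemma 9.4. $\square$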
Chapter 10
Two sample inference for regression kernels
In Chapter 5, we studied two sample procedures for the mean function and the
covariance operator. This chapter is devoted to testing the equality of the regres-
sion operators in two functional linear models. We are concerned with the following
problem: We observe two samples: sample 1: $(X_i, Y_i)$, $1 \le i \le N$, and sample
2: $(X_j^*, Y_j^*)$, $1 \le j \le M$. The explanatory variables $X_i$ and $X_j^*$ are functions,
whereas the responses $Y_i$ and $Y_j^*$ can be either functions or scalars (the $Y_i$ and $Y_j^*$
are either both functions, or both scalars). We model the dependence of the $Y_i$ ($Y_j^*$)
on the $X_i$ ($X_j^*$) by the functional regression models
$$Y_i = \Psi X_i + \varepsilon_i, \qquad Y_j^* = \Psi^* X_j^* + \varepsilon_j^*,$$
where $\Psi$ and $\Psi^*$ are linear operators whose domain is a function space, and which
take values either in the same function space or in the real line. We wish to test if
the operators $\Psi$ and $\Psi^*$ are equal.
In Section 10.1, we provide motivation and background for the methodology
developed in this chapter. The testing procedures are derived in Sections 10.2 and
10.3, respectively, for scalar and functional responses. As with the usual two sample
tests for the equality of means, we make a distinction between the simpler case of
“equal variances” and the more complex case of “unequal variances”. We thus have
four testing procedures, which are summarized in Section 10.4. A reader interested
only in the description of the test can start with Section 10.4, and refer to Sections 10.2
and 10.3 for further details, as needed. Section 10.5 presents the results of a small
simulation study. Applications to medfly and magnetometer data are presented in
Section 10.6. Asymptotic results and their proofs are collected in Section 10.7. This
chapter is based on the paper of Horváth et al. (2009).
We begin this section with a motivating example, which is continued in Section 10.6.
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 169
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_10,
© Springer Science+Business Media New York 2012
170 10 Two sample inference for regression kernels
Fig. 10.1 Ten randomly selected smoothed egg-laying curves of short-lived medflies (left panel),
and ten such curves for long-lived medflies (right panel).
10.1 Motivation and introduction 171
Section 9.4, in which the responses are graphed against the scores of the initial func-
tional principal components. All graphs show nice elliptical shapes. In Section 10.6,
we apply the test derived in Section 10.2 to check if it is reasonable to assume that
$\psi = \psi^*$.
Like classical two sample procedures in various forms, the tests of this chapter
are likely to be applicable to a wide range of problems, where estimating two signif-
icantly different functional linear regressions on subsamples of a larger sample may
reveal additional features. In our setting, the role of regression parameter vectors (or
matrices) is played by integral operators acting on a function space. The complexity
of the test statistics increases as we move from scalar to functional responses and
relax assumptions on the covariance structure of the regressors. Even in the multi-
variate setting, except for comparing mean responses, the problem of comparing the
regression coefficients for two models based on two different samples is not trivial,
and we could not find a ready reference for it.
In the remainder of this chapter, we do not deal with the errors caused by replacing
the FPC’s by the EFPC’s: the test statistics do not depend on the signs of the
EFPC’s, and the $O_P(N^{-1/2})$ distances can be handled by the application of Theorem 2.7.
The formulas appearing in this chapter are rather complex, and developing
arguments analogous to those in Section 5.4 and other chapters would take up too
much space and obscure the main ideas. To obtain computable test statistics, we
also neglect terms arising from EFPC’s with small eigenvalues. These terms are not
asymptotically negligible, but they are practically negligible, as established by the
simulations presented at the end of Section 10.3.
In the remainder of this chapter, we assume that the mean functions and the
means of the responses have been subtracted, and so we consider the scalar response
model
$$Y_i = \int_0^1 \psi(s) X_i(s)\,ds + \varepsilon_i \tag{10.1}$$
against
$$H_A: \|\psi - \psi^*\| \ne 0,$$
where the norm is in $L^2([0,1])$ for model (10.1) and in $L^2([0,1] \times [0,1])$ for
model (10.2).
In this Section, and Section 10.3, we refer to theorems stated and proven in Sec-
tion 10.7. To understand the procedures, it is however not necessary to study these
results, which provide asymptotic justification for the claims we make. To develop
a meaningful asymptotic theory, and to ensure that the tests perform well in finite
samples, we assume that the two sample sizes are comparable. Asymptotically,
we postulate that there exists a constant $0 < \theta < \infty$ such that
$$N/M \to \theta, \qquad N \to \infty. \tag{10.3}$$
Suppose $v_i$, $i \ge 1$, form a basis in $L^2([0,1])$. Since we now deal with two samples,
we may choose two different bases or one common basis. This choice will also
depend on what we assume about the variances of the regressors and the errors in
the two samples. To focus attention, it is initially convenient to think that the $v_i$ are
the FPC’s of the regressors $X_i$.
Since $\psi \in L^2([0,1])$, we can expand it as $\psi(s) = \sum_{i=1}^\infty \mu_i v_i(s)$, where
$\mu_i = \langle \psi, v_i\rangle$. Consequently, the response variables can be expressed as $Y_i = \sum_{k=1}^\infty \mu_k \langle X_i, v_k\rangle + \varepsilon_i$. We truncate the above expansion at $1 \le p < \infty$, and
combine the error made by the truncation with the $\varepsilon_i$. The response is thus given by
$$Y_i = \sum_{k=1}^p \mu_k \langle X_i, v_k\rangle + \varepsilon_i', \qquad \varepsilon_i' = \varepsilon_i + \sum_{k=p+1}^\infty \mu_k \langle X_i, v_k\rangle. \tag{10.4}$$
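$$\mathbf{Y} = \mathbf{X}\boldsymbol{\mu} + \boldsymbol{\varepsilon}', \tag{10.5}$$
and
$$\mathrm{var}(\varepsilon_1) = \mathrm{var}(\varepsilon_1^*). \tag{10.8}$$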
10.2 Derivation of the test for scalar responses 173
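The common covariance operator of the $X_i$ and the $X_j^*$ is denoted by $C$ and its
eigenelements by $v_i, \lambda_i$, as in Section 2.5. Under these assumptions, we introduce
the random variable
$$\Lambda_p = N(1+\theta)^{-1} (\hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^*)^T \Sigma_p^{-1} (\hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^*), \tag{10.9}$$
$i \ne j$.
By Theorem 10.2, $\Lambda_p \xrightarrow{d} \chi^2(p)$, as $N \to \infty$; i.e. $\Lambda_p$ defined by (10.9) converges
to a chi-square random variable with $p$ degrees of freedom. We therefore
propose the following test statistic when the covariances are equal
$$\hat\Lambda_p = N(1 + N/M)^{-1} (\hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^*)^T (\hat\Sigma_p)^{-1} (\hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^*), \tag{10.12}$$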
and where $\hat\sigma$ is the residual standard deviation from the estimated regression model
defined analogously to (10.5), but with both samples pooled together. Thus, the
estimates $\hat\sigma, \hat\lambda_1, \ldots, \hat\lambda_p$ are all computed using the pooled sample.
In many applications, the covariance kernels $c(s,t)$ and $c^*(s,t)$ are not necessarily
equal. Since the two kernels have different eigenfunctions, we now consider an
arbitrary basis $\{w_i\}$ of $L^2([0,1])$. Good choices for the $w_i$ are discussed in Section
10.4. The kernels $\psi$ and $\psi^*$ are expanded as
$$\psi(s) = \sum_{i=1}^\infty \mu_i w_i(s), \qquad \psi^*(s) = \sum_{j=1}^\infty \mu_j^* w_j(s), \tag{10.13}$$
and so
$$Y_i = \sum_{k=1}^\infty \mu_k \langle X_i, w_k\rangle + \varepsilon_i, \qquad Y_j^* = \sum_{k=1}^\infty \mu_k^* \langle X_j^*, w_k\rangle + \varepsilon_j^*.$$
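$$\mathbf{Y} = \mathbf{X}\boldsymbol{\mu} + \boldsymbol{\varepsilon}', \qquad \mathbf{Y}^* = \mathbf{X}^*\boldsymbol{\mu}^* + \boldsymbol{\varepsilon}^{*\prime},$$
with all terms analogously defined with respect to our new basis. While this appears
similar to our prior calculations, we are expanding with respect to an arbitrary basis
which means that $\mathbf{X}$ and $\boldsymbol{\varepsilon}'$ are now potentially correlated. The least squares estimators
take, however, the same form
$$\hat{\boldsymbol{\mu}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}, \qquad \hat{\boldsymbol{\mu}}^* = (\mathbf{X}^{*T}\mathbf{X}^*)^{-1}\mathbf{X}^{*T}\mathbf{Y}^*.$$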
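The least squares step on projected regressors can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: curves are observed on a midpoint grid of $[0,1]$, inner products are approximated by Riemann sums, the basis is taken to be the sine functions $\sqrt{2}\sin(k\pi t)$, and the helper names and all numerical values are hypothetical. The simulated kernel has three nonzero coefficients, but only $p = 2$ are estimated, so the third term is absorbed into the error as in (10.4).

```python
import numpy as np

def projection_scores(curves, basis, dt):
    """Matrix with entries <X_i, w_k>, approximated by Riemann sums."""
    return curves @ basis.T * dt

def least_squares_coefficients(curves, responses, basis, dt):
    """Least squares estimate (X^T X)^{-1} X^T Y on the projected regressors."""
    X = projection_scores(curves, basis, dt)
    mu_hat, *_ = np.linalg.lstsq(X, responses, rcond=None)
    return mu_hat

def simulate_scalar_model(n, mu, score_sd, noise_sd, grid, rng):
    """Toy scalar-response model: Y_i = <psi, X_i> + eps_i with psi = sum_k mu_k w_k."""
    K = len(mu)
    basis = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, K + 1), np.pi * grid))
    scores = rng.standard_normal((n, K)) * score_sd
    curves = scores @ basis                      # X_i = sum_k score_ik w_k
    responses = scores @ mu + noise_sd * rng.standard_normal(n)
    return curves, responses, basis

rng = np.random.default_rng(0)
T = 100
grid = (np.arange(T) + 0.5) / T                  # midpoint grid on [0, 1]
mu = np.array([2.0, -1.0, 0.5])                  # assumed true coefficients
score_sd = np.array([1.0, 0.5, 0.25])
curves, y, basis = simulate_scalar_model(500, mu, score_sd, 0.5, grid, rng)

# Estimate only the first p = 2 coefficients; the third is part of the error term.
mu_hat = least_squares_coefficients(curves, y, basis[:2], 1.0 / T)
print(mu_hat)
```

In this toy setting the basis functions coincide with the principal components of the simulated regressors, so the truncated term is uncorrelated with the retained scores; with an arbitrary basis that is no longer guaranteed, which is the source of the bias term discussed next.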
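Thus we can once again compare $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\mu}}^*$ to test the null hypothesis. To analyze
the asymptotic behavior of these estimates we consider the relation
$$\mathbf{X}^T \boldsymbol{\varepsilon}' = \mathbf{A} + \mathbf{B} + N\mathbf{m},$$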
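where
$$\mathbf{A} = \begin{bmatrix} \sum_{i=1}^N \varepsilon_i \langle X_i, w_1\rangle \\ \vdots \\ \sum_{i=1}^N \varepsilon_i \langle X_i, w_p\rangle \end{bmatrix},
\qquad
\mathbf{B} = \begin{bmatrix} \sum_{i=1}^N \sum_{k=p+1}^\infty \mu_k \big( \langle X_i, w_1\rangle \langle X_i, w_k\rangle - E[\langle X_1, w_1\rangle \langle X_1, w_k\rangle] \big) \\ \vdots \\ \sum_{i=1}^N \sum_{k=p+1}^\infty \mu_k \big( \langle X_i, w_p\rangle \langle X_i, w_k\rangle - E[\langle X_1, w_p\rangle \langle X_1, w_k\rangle] \big) \end{bmatrix}$$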
have mean zero and are uncorrelated since the error terms are independent of the
explanatory functions. The term m represents the bias introduced by using an arbi-
trary basis which is given by
$$\mathbf{m} = \begin{bmatrix} \sum_{k=p+1}^\infty \mu_k E[\langle X_1, w_1\rangle \langle X_1, w_k\rangle] \\ \vdots \\ \sum_{k=p+1}^\infty \mu_k E[\langle X_1, w_p\rangle \langle X_1, w_k\rangle] \end{bmatrix}.$$
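Clearly $\mathbf{A}$ and $\mathbf{B}$ are sums of iid random vectors with means zero and finite covariance
matrices due to Assumptions 10.1 and 10.2. Thus by the multivariate central
limit theorem $N^{-1/2}(\mathbf{A}\ \mathbf{B})^T$ is asymptotically normal. We have by the strong law
of large numbers that
$$N^{-1} \sum_{i=1}^N \langle X_i, w_j\rangle \langle X_i, w_k\rangle \xrightarrow{a.s.} E\langle X_1, w_j\rangle \langle X_1, w_k\rangle,$$
$$E \sum_{k=1}^N \varepsilon_k \langle w_i, X_k\rangle \, \varepsilon_k \langle w_j, X_k\rangle = N \sigma^2 E\langle w_i, X_1\rangle \langle w_j, X_1\rangle,$$
and therefore
$$\mathrm{cov}(\mathbf{A}) = N \sigma^2 \Sigma_1.$$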
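Turning to $\mathbf{B}$, the $(i,j)$ entry of its covariance matrix is given by
$$N \sum_{k=p+1}^\infty \sum_{r=p+1}^\infty \mu_k \mu_r E\Big\{ \big( \langle X_1, w_i\rangle \langle X_1, w_k\rangle - E[\langle X_1, w_i\rangle \langle X_1, w_k\rangle] \big) \big( \langle X_1, w_j\rangle \langle X_1, w_r\rangle - E[\langle X_1, w_j\rangle \langle X_1, w_r\rangle] \big) \Big\}.$$
where $\mathbf{C} = \sigma^2 \Sigma_1^{-1} + \Sigma_1^{-1} \Sigma_2 \Sigma_1^{-1}$.
with all terms analogously defined. Using (10.3), we therefore conclude that
$$N^{1/2} \Big( \hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^* - \big( N(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{m} - M(\mathbf{X}^{*T}\mathbf{X}^*)^{-1}\mathbf{m}^* \big) \Big) \xrightarrow{d} N(0, \mathbf{C} + \theta \mathbf{C}^*).$$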
where $\hat\sigma^2$ and $\hat\sigma^{*2}$ are residual standard deviations from the regression models for
the first and second sample respectively. The matrix $\Sigma_1$ is now estimated with
$$\hat\Sigma_1 = N^{-1} \mathbf{X}^T\mathbf{X},$$
with $\hat\Sigma_1^*$ defined analogously.
The distributions of statistics (10.12) and (10.14) are approximated by the chi-
square distribution with p degrees of freedom. If p is large (in terms of the percent-
age of variance explained), then all neglected terms are close to 0.
Turning to model (10.2), we note that now it is also necessary to choose bases to
project the $Y_i$ and the $Y_j^*$ onto. We can then use the results developed in the scalar
case.
We first focus on the case of equal variances defined by assumptions (10.7) and,
in place of (10.8), by
$$\hat\psi(t,s) = \sum_{k=1}^p \sum_{l=1}^r \hat\mu_{k,l}\, \hat u_l(t) \hat v_k(s),$$
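As in the scalar case, we combine any errors made by our approximations with the
error of the model, so we also introduce the matrix
$$\boldsymbol{\varepsilon}'(i,j) = \langle \varepsilon_i, u_j\rangle + \sum_{k=p+1}^\infty \mathbf{X}(i,k)\boldsymbol{\mu}(k,j), \qquad i = 1, \ldots, N,\ j = 1, \ldots, r,$$
which implies
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\mu} + \boldsymbol{\varepsilon}'. \tag{10.16}$$
The corresponding least squares estimator $\hat{\boldsymbol{\mu}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ consistently estimates
the matrix $\boldsymbol{\mu}$. This follows immediately by applying Theorem 10.1 to each
column of $\hat{\boldsymbol{\mu}}$. Asymptotic normality follows from Theorem 10.4.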
Since $\boldsymbol{\mu}$ is now a matrix, the task of constructing a quadratic form leading to a
test statistic is somewhat painful notationally. We start by writing $\boldsymbol{\mu}$ as a column
vector of length $pr$:
$$\boldsymbol{\mu}_v^T = \mathrm{vec}(\boldsymbol{\mu})^T = \big( \mu(1,1), \mu(2,1), \ldots, \mu(p,1), \mu(1,2), \ldots, \mu(p-1,r), \mu(p,r) \big).$$
where Ȯ " is the pooled sample covariance matrix of the residuals and O D
diag.O 1 ; O 2 ; : : : ; O p /; with the O i being the eigenvalues of the empirical covariance
operator of the pooled Xi and Xj .
We finally turn to the most complex case of different covariances for the explanatory
functions. We now expand both the explanatory and response functions with
respect to two arbitrary, potentially different, bases in $L^2([0,1])$, $\{u_i\}$ and $\{w_j\}$,
respectively:
$$\psi(t,s) = \sum_{i=1}^\infty \sum_{j=1}^\infty \mu_{ji}\, u_i(t) w_j(s), \qquad \psi^*(t,s) = \sum_{i=1}^\infty \sum_{j=1}^\infty \mu_{ji}^*\, u_i(t) w_j(s).$$
where the residual covariance matrices $\Sigma_\varepsilon$ and $\Sigma_\varepsilon^*$ are computed for each sample
separately. The estimate $\hat\Sigma_1$ is given by $N^{-1}\mathbf{X}^T\mathbf{X}$, and $\hat\Sigma_1^*$ is defined analogously.
The distribution of statistics (10.17) and (10.18) is approximated by the chi-
square distribution with pr degrees of freedom. Selection of p and r is discussed in
Section 10.4. If $p$ and/or $r$ are large, the normalized $\chi^2$ distribution can be approximated
by a normal distribution, as in Cardot et al. (2003), who studied a single
scalar response model and tested $\psi = 0$. In our case, due to the complexity of the
problem, the rigorous derivation of the normal convergence with $p = p_n$ depending
on the sample size would be far more tedious, so it is not pursued. To perform a
test, a finite p (and r) must be chosen no matter what approximation is used, and
as illustrated in Section 10.6 large p (and r) do not necessarily lead to meaningful
results.
In order to apply the tests, we must first verify if a linear functional model approxi-
mates the dependence structure of the data reasonably well. This can be done using
10.4 Summary of the testing procedures 179
the techniques of Chiou and Müller (2007) described in Section 9.4. The assump-
tions of independence and identical distribution of the regressor curves can be ver-
ified using the test of Chapter 7. Checking the independence of the errors is more
complicated because they are not observable; it is studied in Chapter 11. Before
applying the tests, the regressors and the responses must be centered, so that their
sample means are zero.
Next, the values of p and r must be chosen. In applications in which the FPC’s
have a clear interpretation, these values can be chosen so that the action of the
operators on specific subspaces spanned by the FPC’s of interest is compared. In
the absence of such an interpretation, several data driven approaches are available.
When the covariances are approximately equal, typically $p$ is chosen so large
that $\sum_{k=1}^p \hat\lambda_k$ exceeds a required percentage of the variance of the $X_i$ (defined as
$(N+M)^{-1}\big( \sum_{i=1}^N \int X_i^2(t)\,dt + \sum_{j=1}^M \int X_j^{*2}(t)\,dt \big)$ for the centered functions). We
choose r analogously for the response functions. When the covariances cannot be
assumed equal then we propose, as one possibility, a pooling technique to choose p
and r. Pooling the explanatory functions we have
$$(N+M)^{-1} \left( \sum_{i=1}^N X_i(s) X_i(t) + \sum_{j=1}^M X_j^*(s) X_j^*(t) \right) \xrightarrow{a.s.} (1 + 1/\theta)^{-1} c(s,t) + (1+\theta)^{-1} c^*(s,t).$$
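The pooling step can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' procedure: curves are stored on a common equispaced grid with spacing `dt`, both samples are assumed to be already centered, the 85% threshold is an arbitrary default, and the function name is hypothetical. It forms the pooled empirical covariance, computes its eigenvalues, and returns the smallest $p$ whose leading eigenvalues reach the requested fraction of the total variance.

```python
import numpy as np

def choose_p(sample1, sample2, dt, threshold=0.85):
    """Smallest p such that the p leading eigenvalues of the pooled covariance
    reach `threshold` of the total variance. Inputs: centered curves, shape (n, T)."""
    pooled = np.vstack([sample1, sample2])                 # (N + M) x T
    cov_operator = pooled.T @ pooled / pooled.shape[0] * dt
    eigenvalues = np.clip(np.linalg.eigvalsh(cov_operator)[::-1], 0.0, None)
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    p = int(np.searchsorted(explained, threshold)) + 1
    return min(p, eigenvalues.size), eigenvalues

# Toy usage: Brownian motions pooled with Brownian bridges (simulated, illustrative).
rng = np.random.default_rng(0)
T = 64
dt = 1.0 / T
t = np.arange(1, T + 1) / T
bm = np.cumsum(rng.standard_normal((100, T)), axis=1) * np.sqrt(dt)
w = np.cumsum(rng.standard_normal((100, T)), axis=1) * np.sqrt(dt)
bb = w - t * w[:, -1:]
p, eig = choose_p(bm - bm.mean(axis=0), bb - bb.mean(axis=0), dt)
print(p, eig[:3])
```

The sum of all eigenvalues equals the average squared $L^2$ norm of the pooled curves, which matches the definition of total variance given above for the equal covariance case.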
Response     Covariances   Test statistic   DF
Scalar       Equal         (10.12)          p
Scalar       Different     (10.14)          p
Functional   Equal         (10.17)          pr
Functional   Different     (10.18)          pr
The term “equal covariances” refers to assumptions (10.7), (10.8) in the scalar
case, and (10.7), (10.15) in the functional case.
Before turning to data examples, we present the results of a small simulation study.
We evaluate the performance of the test based on the most general statistic (10.18).
The test performs even better in the equal variances case (provided the simulated
data have equal variances). We consider the fully functional linear model with inte-
gral kernels of the form
$$\psi(s,t) = c \min\{s,t\}, \qquad \psi^*(s,t) = c^* \min\{s,t\},$$
where $c$ and $c^*$ are constants. We set $N = M = 100$, and use 5 EFPC’s for the
regressor variables, and 3 EFPC’s for the responses. The results are based on 100
replications.
We use standard Brownian motions as error terms, and consider the regressors of
the following four types:
(A) Standard Brownian motions in both samples (Gaussian processes, equal
covariances).
(B) For the first sample the explanatory functions are standard Brownian motions
and for the second sample they are Brownian bridges (Gaussian processes, different
covariances).
(C) For both sets of explanatory functions we use
$$X(t) = n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor} \frac{T_i}{\sqrt{\mathrm{var}(T_i)}},$$
where $\{T_i\}$ are iid $t$-distributed random variables with 6 degrees of freedom and
$n = 200$ (heavy-tailed distribution, equal covariances).
(D) The first set of explanatory functions are defined as in (C). For the second set
we consider
$$X^*(t) = X(t)[X(1) - X(t)],$$

Table 10.1 Empirical rejection rates for the test based on the most general statistic (10.18). From
top to bottom, scenarios A, B, C, D described in the text.
       α \ (c, c*)   (0,0)   (1,1)   (1,0)   (1.5,0)   (2,0)
(A)    0.10          0.14    0.08    0.50    0.90      0.98
       0.05          0.09    0.03    0.40    0.81      0.98
       0.01          0.03    0.00    0.18    0.63      0.92
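Regressors of types (A) to (C) can be simulated along the following lines. This is a sketch under stated assumptions: Brownian motion and the Brownian bridge are approximated by scaled partial sums on an equispaced grid, the function names and grid sizes are hypothetical, and the variance of a $t$ variable with `df` degrees of freedom is taken as `df / (df - 2)`. Scenario (D) is left out because its defining formula is only partly legible in the text.

```python
import numpy as np

def brownian_motions(n_curves, n_steps, rng):
    """Scenario (A): Brownian motions evaluated at k / n_steps, k = 1, ..., n_steps."""
    increments = rng.standard_normal((n_curves, n_steps)) / np.sqrt(n_steps)
    return np.cumsum(increments, axis=1)

def brownian_bridges(n_curves, n_steps, rng):
    """Second sample of scenario (B): B(t) = W(t) - t W(1)."""
    w = brownian_motions(n_curves, n_steps, rng)
    t = np.arange(1, n_steps + 1) / n_steps
    return w - t * w[:, -1:]

def t_partial_sums(n_curves, rng, n=200, df=6):
    """Scenario (C): n^{-1/2} times partial sums of iid t(df) variables,
    each divided by its standard deviation sqrt(df / (df - 2))."""
    steps = rng.standard_t(df, size=(n_curves, n)) / np.sqrt(df / (df - 2.0))
    return np.cumsum(steps, axis=1) / np.sqrt(n)

rng = np.random.default_rng(0)
sample_a = brownian_motions(100, 200, rng)
sample_b = brownian_bridges(100, 200, rng)
sample_c = t_partial_sums(100, rng)
print(sample_a.shape, sample_b.shape, sample_c.shape)
```

Each row is one curve; the last column of a simulated bridge is zero by construction, and the partial-sum curves in (C) have variance close to $t$ at time $t$, the same covariance structure as Brownian motion.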
We now illustrate the application of the test on two examples. The first example
is motivated by the work presented in Carey et al. (2002), Chiou et al. (2004),
Müller and Stadtmüller (2005), Chiou and Müller (2007), among others, and studies
egg-laying curves of Mediterranean fruit flies (medflies). The second example is an
application to the measurements of the magnetic field generated by near Earth space
currents.
Egg-laying curves of Mediterranean fruit flies (continued). We applied the test
of Section 10.2 (without assuming equal variances) to the medfly data introduced in
Section 10.1. Table 10.2 shows the P-values for the five initial FPC’s ($p \le 5$). The
P-values for larger $p$ do not exceed half a percent. We cannot reject $H_0: \psi = \psi^*$ if
we use the test with $p = 1$, but if $p > 1$, we reject $H_0$. To understand this result, we
must turn to formula (10.13). The test compares estimates of $\mu_i$ to those of $\mu_i^*$ for
$i \le p$. Acceptance of $H_0$ for $p = 1$ means that the curves $\mu_1 w_1(s)$ and $\mu_1^* w_1(s)$ are
not significantly different. Their estimates are shown in the left panel of Figure 10.2.
The functions $w_i$ were computed by pooling all explanatory curves, as explained in
Section 10.4. The estimated coefficients are $\hat\mu_1 = 49.64$, $\hat\mu_1^* = 46.60$. By contrast,
the estimates $\hat\mu_2 = 15.45$, $\hat\mu_2^* = 29.88$ are very different, and consequently the
curves $\hat\mu_2 w_2(s)$ and $\hat\mu_2^* w_2(s)$ shown in the right panel of Figure 10.2 look different.
Table 10.2 The values of statistic (10.14) and the corresponding P-values for several values of $p$.
p    Statistic   P-value
1     1.3323     0.2484
2    11.3411     0.0034
3    10.6097     0.0140
4    23.8950     0.0001
5    33.1144     0.0000
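The P-values in Table 10.2 come from referring the statistic to the chi-square distribution with $p$ degrees of freedom. The sketch below recomputes such tail probabilities using only the Python standard library. The function name `chi2_sf` is hypothetical, and the recursion $Q(a+1, y) = Q(a, y) + y^a e^{-y}/\Gamma(a+1)$ for the regularized upper incomplete gamma function is a standard identity, used here instead of a statistics package.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df >= 1 degrees of freedom exceeds x)."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    # Step up from a to df / 2 with Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Statistic values copied from Table 10.2.
table = {1: 1.3323, 2: 11.3411, 3: 10.6097, 4: 23.8950, 5: 33.1144}
for p, stat in table.items():
    print(p, stat, f"{chi2_sf(stat, p):.4f}")
```

Each printed tail probability can be compared with the corresponding P-value column of the table.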
The approximations to $\psi$ and $\psi^*$ which use $p = 2$ FPC’s are thus sufficient to detect
the difference. They are shown in Figure 10.3.
Comparing the estimates $\hat\mu_2$ and $\hat\mu_2^*$ or the curves in Figures 10.2 and 10.3 gives
a strong hint that the kernels $\psi$ and $\psi^*$ cannot be assumed equal. Our tests allow us
to attach statistical significance statements to these conclusions.
Data from terrestrial magnetic observatories. We now apply our methodology
to magnetometer data. A comprehensive case study is not our goal; rather, we would
like to illustrate the steps outlined in Section 10.4 in a practically relevant setting.
Broader space physics issues related to this example are explained in Kamide et al.
10.6 Application to medfly and magnetometer data 183
Fig. 10.4 Observations for sample A: left panel CMO (X), right panel HON (Y ).
(1998), while Chapters 9, 10, 13 of Kivelson and Russell (1997) provide a detailed
background.
A sample of 40 functional regressors and corresponding responses is shown in
Figure 10.4. Each curve in Figure 10.4 shows one minute averages in a UT (Uni-
versal Time) day of the component of the magnetic field lying in the Earth’s tangent
plane pointing toward the magnetic North. We thus have 1440 data points per curve.
Splitting the magnetometer data into days and treating the daily curves as functional
observations is natural because of the daily rotation of the Earth. The curves Xi
reflect ionospheric magnetic activity in the polar region known as substorms, which
are spectacularly manifested as the northern lights (aurora borealis). The curves Yi
reflect magnetospheric activity in the magnetic equatorial region in the same UT
day. We consider three samples: A, B, C. Each of them consists of about 40 pairs of
curves. All measurements were recorded in 2001, the Xi at College (CMO), Alaska;
the Yi at Honolulu (HON), Hawaii. Sample A contains substorms which took place
in January through March, B in April–June, C in July–September. Using the graph-
ical goodness–of–fit test of Chiou and Müller (2007), see Section 8.6, and the test
of Chapter 7, Kokoszka et al. (2008) verified that the fully functional linear model
is a reasonable approximation and that the functional observations can be assumed
to be uncorrelated. Moreover, on physical grounds, the data can be assumed to be
approximately independent because the M–I system resets itself after each rotation
of the Earth, and the effect of larger disturbances of solar origin decays within about
two days.
Intuitively, we would expect rejections of the null for all three pairs: A–B, B–
C, and A–C, as the position of the axis of the Earth relative to the Sun shifts with
each season, and substorms are influenced by the solar wind. This is indeed the
Table 10.3 P–values for testing the equality of regression operators in samples A and B.
p=r 1 2 3 4 5 6 7 8 9 10
1 0.344 0.608 0.231 0.280 0.349 0.380 0.372 0.391 0.351 0.257
2 0.147 0.259 0.274 0.416 0.565 0.422 0.373 0.345 0.339 0.310
3 0.204 0.378 0.399 0.621 0.762 0.592 0.582 0.621 0.654 0.478
4 0.120 0.305 0.299 0.567 0.716 0.619 0.654 0.307 0.315 0.158
5 0.440 0.668 0.555 0.741 0.861 0.730 0.792 0.515 0.453 0.223
6 0.582 0.891 0.798 0.793 0.883 0.554 0.567 0.605 0.218 0.106
7 0.689 0.962 0.950 0.911 0.954 0.749 0.792 0.783 0.566 0.427
8 0.965 0.968 0.972 0.952 0.958 0.815 0.755 0.582 0.432 0.257
9 0.981 0.804 0.962 0.980 0.972 0.821 0.837 0.753 0.722 0.456
10 0.727 0.585 0.903 0.973 0.986 0.972 0.973 0.941 0.935 0.626
11 0.911 0.880 0.991 0.999 0.999 0.998 0.998 0.994 0.995 0.990
12 0.856 0.860 0.989 0.997 0.959 0.962 0.940 0.930 0.845 0.889
13 0.667 0.856 0.982 0.988 0.939 0.950 0.889 0.845 0.784 0.844
14 0.395 0.457 0.798 0.418 0.314 0.445 0.240 0.240 0.201 0.282
15 0.398 0.481 0.847 0.414 0.321 0.456 0.276 0.255 0.170 0.113
case for tests in cases B–C and A–C, for which the P–values are very small: for
B–C the largest P-value is 0.034, and for A–C 0.007 (for $p \le 15$, $r \le 10$). The
results for testing samples A and B presented in Table 10.3 indicate the acceptance
of $H_0$. In retrospect, this conclusion is supported by the observation, well–known in
the space–physics community, that M–I disturbances tend to be weaker in summer
months. Our test thus shows that it is reasonable to assume that the effect of substorms
on low–latitude currents is approximately the same in the first and second quarters
of 2001, but changes in the third quarter (possibly due to weaker substorms).
We now list the assumptions under which the tests presented in this chapter are valid
and present selected asymptotic results. They focus on the simplest case of scalar
responses and equal variances, only Theorem 10.4 pertains to functional responses,
and is stated for illustration. The asymptotic techniques used in the scalar equal vari-
ances case can be extended to the other cases, but the notation becomes more com-
plex, as explained in Section 10.3. The results presented here do not follow from
the existing multivariate theory because the regression errors are not independent
and include projections on the “left over” FPC’s $v_{p+1}, v_{p+2}, \ldots$, $u_{r+1}, u_{r+2}, \ldots$,
etc. Theorems 10.2 and 10.4 are of particular interest, as they state the exact asymptotic
distribution of the LSE’s in a multivariate regression obtained by projecting a
functional regression.
We state the assumptions on the sample $(X_i, Y_i)$, $1 \le i \le N$. The assumptions
on $(X_i^*, Y_i^*)$, $1 \le i \le M$, are the same. The two samples are assumed independent.
10.7 Asymptotic theory 185
Assumption 10.1. The observations $\{X_n\}$ are iid mean zero random functions in
$L^2([0,1])$ satisfying
$$E\|X_n\|^4 = E\left[ \int X_n^2(t)\,dt \right]^2 < \infty.$$
For the linear model with scalar responses, we formulate the following assump-
tion.
Assumption 10.2. The scalar responses $Y_i$ satisfy
$$Y_i = \int \psi(s) X_i(s)\,ds + \varepsilon_i,$$
with iid mean zero errors $\varepsilon_i$ satisfying $E\varepsilon_i^4 < \infty$, and $\psi \in L^2([0,1])$. The errors $\varepsilon_i$
and the regressors $X_i$ are independent.

and $\psi \in L^2([0,1] \times [0,1])$. The errors $\varepsilon_i$ and the regressors $X_i$ are independent.
Since the following simple lemma is used repeatedly in the proofs, it is stated
first for ease of reference.
Lemma 10.1. Suppose $X$ is a mean zero random element of $L^2$ satisfying $E\|X\|^2 < \infty$. Then
$$E\left[ \langle v_i, X\rangle \langle v_j, X\rangle \right] = \lambda_i \delta_{ij},$$
where $\delta_{ij}$ is Kronecker’s delta.
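Theorem 10.1. Suppose Assumptions 10.1, 10.2 and condition (2.12) hold. Then,
$$\hat{\boldsymbol{\mu}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \xrightarrow{a.s.} \boldsymbol{\mu}, \qquad \text{as } N \to \infty,$$
where $\boldsymbol{\mu}^T = (\mu_1, \ldots, \mu_p)$ and $\xrightarrow{a.s.}$ refers to almost sure convergence.
Proof. To analyze the behavior of $\hat{\boldsymbol{\mu}}$, let us start by considering
$$(\mathbf{X}^T\mathbf{X})(i,j) = \sum_{k=1}^N \langle v_i, X_k\rangle \langle v_j, X_k\rangle.$$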
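$$(\mathbf{X}^T\mathbf{Y})(i) = \sum_{j=1}^N \langle v_i, X_j\rangle Y_j = \sum_{j=1}^N \langle v_i, X_j\rangle \sum_{k=1}^p \mu_k \langle v_k, X_j\rangle + \sum_{j=1}^N \varepsilon_j' \langle v_i, X_j\rangle.$$
Applying again the strong law of large numbers and Lemma 10.1, we obtain,
as $N \to \infty$,
$$N^{-1} \sum_{j=1}^N \langle v_i, X_j\rangle \sum_{k=1}^p \mu_k \langle v_k, X_j\rangle \xrightarrow{a.s.} E \sum_{k=1}^p \mu_k \langle v_i, X_1\rangle \langle v_k, X_1\rangle = \lambda_i \mu_i.$$
Lastly, we will show that, as $N \to \infty$, $N^{-1} \sum_{j=1}^N \varepsilon_j' \langle v_i, X_j\rangle \xrightarrow{a.s.} 0$. Recalling the
definition of $\varepsilon_i'$, (10.4), we have
$$N^{-1} \sum_{j=1}^N \varepsilon_j' \langle v_i, X_j\rangle = N^{-1} \sum_{j=1}^N \varepsilon_j \langle v_i, X_j\rangle + N^{-1} \sum_{j=1}^N \sum_{k=p+1}^\infty \mu_k \langle v_k, X_j\rangle \langle v_i, X_j\rangle.$$
Since $\{\varepsilon_i\}$ and $\{X_i\}$ are independent, by the strong law of large numbers and
Assumption 10.2
$$N^{-1} \sum_{j=1}^N \varepsilon_j \langle v_i, X_j\rangle \xrightarrow{a.s.} 0.$$

$$N^{-1} \sum_{j=1}^N \sum_{k=p+1}^\infty \mu_k \langle v_k, X_j\rangle \langle v_i, X_j\rangle \xrightarrow{a.s.} E \sum_{k=p+1}^\infty \mu_k \langle v_k, X_1\rangle \langle v_i, X_1\rangle = 0. \qquad \square$$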
Theorem 10.2. Suppose Assumptions 10.1, 10.2 and condition (2.12) hold. Then,
as $N \to \infty$,
$$\sqrt{N}(\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}) \xrightarrow{d} N(0, \Sigma_p),$$
where $N(0, \Sigma_p)$ is a multivariate normal random vector with mean 0 and covariance
matrix $\Sigma_p$ defined by (10.10) and (10.11).
By Assumption 10.2 the above is a summation of iid random variables. Since each
coordinate of $N^{-1/2}\mathbf{X}^T\boldsymbol{\varepsilon}'$ is given by such a sum, Assumption 10.2 implies that
$\mathbf{X}^T\boldsymbol{\varepsilon}'$ can be expressed as a sum of iid random vectors. We can apply the multivariate
central limit theorem to obtain the claimed multivariate normal limiting distribution
if we can show that each entry of the covariance matrix is finite. Therefore we spend
the rest of the proof deriving the form for $\Sigma_p$ and showing that its entries are finite.
Using the definition of $\varepsilon_i'$, we obtain
$$N^{-1/2} \sum_{j=1}^N \langle v_i, X_j\rangle \varepsilon_j' = N^{-1/2} \left( \sum_{j=1}^N \langle v_i, X_j\rangle \varepsilon_j + \sum_{j=1}^N \langle v_i, X_j\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_j\rangle \right).$$
Because the $\{X_j\}$ are independent, both sums (with respect to $j$) are sums of independent
and identically distributed random variables. Furthermore, since $\{\varepsilon_j\}$ are
independent of all other terms, we also have that the two sums above are uncorrelated.
Therefore it follows that
$$\mathrm{var}\left( N^{-1/2} \left( \sum_{j=1}^N \langle v_i, X_j\rangle \varepsilon_j + \sum_{j=1}^N \langle v_i, X_j\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_j\rangle \right) \right)
= \mathrm{var}\left( \langle v_i, X_1\rangle \varepsilon_1 \right) + \mathrm{var}\left( \langle v_i, X_1\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_1\rangle \right). \tag{10.20}$$
Considering the first term of (10.20), we have by the independence of $X_1$ and $\varepsilon_1$
and Lemma 10.1 that
$$\mathrm{var}(\langle v_i, X_1\rangle \varepsilon_1) = \sigma^2 \lambda_i < \infty.$$
Turning to the second term of (10.20), we have by Lemma 10.1
$$\mathrm{var}\left[ \langle v_i, X_1\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_1\rangle \right] = E\left[ \left( \langle v_i, X_1\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_1\rangle \right)^2 \right].$$
Next we examine the joint behavior of the coordinates. Combining (10.21) with
the Cauchy-Schwarz inequality we have
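Therefore to finish the proof we need only derive the form for the off diagonal terms
of $\Sigma_p$. Using (10.19), Assumption 10.2, and Lemma 10.1, it is easy to verify that
for $i \ne j$
$$\begin{aligned}
\mathrm{cov}\big( (\mathbf{X}^T\boldsymbol{\varepsilon}')(i), (\mathbf{X}^T\boldsymbol{\varepsilon}')(j) \big)
&= E\big( (\mathbf{X}^T\boldsymbol{\varepsilon}')(i)\, (\mathbf{X}^T\boldsymbol{\varepsilon}')(j) \big) \\
&= E\left( \sum_{q=1}^N \langle v_i, X_q\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_q\rangle \sum_{s=1}^N \langle v_j, X_s\rangle \sum_{k=p+1}^\infty \mu_k \langle v_k, X_s\rangle \right) \\
&= \sum_{q=1}^N E\left( \langle v_i, X_q\rangle \langle v_j, X_q\rangle \left( \sum_{k=p+1}^\infty \mu_k \langle v_k, X_q\rangle \right)^2 \right) \\
&= N E\left( \langle v_i, X_1\rangle \langle v_j, X_1\rangle \left( \sum_{k=p+1}^\infty \mu_k \langle v_k, X_1\rangle \right)^2 \right).
\end{aligned}$$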
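Theorem 10.3. Suppose Assumptions 10.1, 10.2 and conditions (2.12), (10.3),
(10.7), (10.8) hold. Suppose further that $p$ is so large that $\boldsymbol{\mu} \ne \boldsymbol{\mu}^*$. Then $\Lambda_p \xrightarrow{P} \infty$,
as $N \to \infty$.
Proof. We start by expanding $\Lambda_p$ as
$$\begin{aligned}
\Lambda_p &= N(1+\theta)^{-1} (\hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^*)^T \Sigma_p^{-1} (\hat{\boldsymbol{\mu}} - \hat{\boldsymbol{\mu}}^*) \\
&= N(1+\theta)^{-1} (\hat{\boldsymbol{\mu}} - \boldsymbol{\mu} - \hat{\boldsymbol{\mu}}^* + \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\hat{\boldsymbol{\mu}} - \boldsymbol{\mu} - \hat{\boldsymbol{\mu}}^* + \boldsymbol{\mu}^*) \\
&\quad + N(1+\theta)^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*) \\
&\quad + 2N(1+\theta)^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\hat{\boldsymbol{\mu}} - \boldsymbol{\mu} - \hat{\boldsymbol{\mu}}^* + \boldsymbol{\mu}^*).
\end{aligned}$$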
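Therefore we need only consider each term above. From Theorem 10.2 it follows
that
$$N(1+\theta)^{-1} (\hat{\boldsymbol{\mu}} - \boldsymbol{\mu} - \hat{\boldsymbol{\mu}}^* + \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\hat{\boldsymbol{\mu}} - \boldsymbol{\mu} - \hat{\boldsymbol{\mu}}^* + \boldsymbol{\mu}^*) = O_P(1)$$
and
$$2N(1+\theta)^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\hat{\boldsymbol{\mu}} - \boldsymbol{\mu} - \hat{\boldsymbol{\mu}}^* + \boldsymbol{\mu}^*) = O_P(\sqrt{N}).$$
The last term we need to consider is
$$N(1+\theta)^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*).$$
Since $\Sigma_p^{-1}$ is positive definite it follows that
$$(\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*) > 0,$$
and we have
$$N(1+\theta)^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*) \to \infty.$$
Furthermore when we divide the above by $\sqrt{N}$ we get
$$N^{1/2}(1+\theta)^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*)^T \Sigma_p^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}^*) \to \infty.$$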
Theorem 10.4. Suppose that Assumptions 10.1, 10.3 and conditions (2.12), (10.7),
(10.3) and (10.15) hold. Then for each fixed $p \ge 1$ and $r \ge 1$, we have
d 1 1 1
N 1=2 .
O v v / ! N 0; ˙ " ˝ C E 1 ˝ . 2 /
and ˝ ˛
2 .i; q/ D hvi ; X1 i vq ; X1 : (10.23)
In this chapter, we consider two tests for error correlation in the fully functional
linear model, which we call Methods I and II. They complement the tools described
in Section 8.6 and the graphical goodness of fit checks used in Chapter 9. To construct
the test statistics, finite dimensional residuals are computed in two different
ways, and then their autocorrelations are suitably defined. From these autocorrelation
matrices, two quadratic forms are constructed whose limiting distributions
are chi–squared with known numbers of degrees of freedom (different for the two
forms). The test statistics can be relatively easily computed using the R package
fda.
The remainder of the chapter is organized as follows. Section 11.2 develops the
setting for the least squares estimation needed to define the residuals used in Method I.
After these preliminaries, both tests are described in Section 11.3. Their finite sample
performance is evaluated in Section 11.4 through a simulation study, and further
examined in Section 11.5 by applying both methods to magnetometer and financial
data. The asymptotic justification is presented in Section 11.6. This chapter is
based on the work of Gabrys et al. (2010).
For any statistical model, it is important to evaluate its suitability for particular data.
For the functional linear model, the methodology of Chiou and Müller (2007), which
we use in data examples in Chapters 9 and 10, is very useful. It is equally impor-
tant to verify model assumptions. An important assumption on the model errors
in all functional linear models of Chapter 8 is that these errors are independent
and identically distributed. In this chapter, we study two tests aimed at detecting
serial correlation in the error functions "n .t/ in the fully functional model (8.1).
The methodology of Chiou and Müller (2007) was not designed to detect error cor-
relation, and can leave it undetected. Figure 11.1 shows diagnostic plots of Chiou
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 191
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_11,
© Springer Science+Business Media New York 2012
192 11 Tests for error correlation in the functional linear model
[Figure 11.1: scatter plots of the 1st and 2nd residual FPC scores against the 1st and 2nd fitted FPC scores.]
Fig. 11.1 Diagnostic plots of Chiou and Müller (2007) for a synthetic data set simulated according
to model (8.1) in which the errors $\varepsilon_n$ follow the functional autoregressive model of Chapter 13.
and Müller (2007) obtained for synthetic data that follow a functional linear model
with highly correlated errors. These plots exhibit almost ideal football shapes. It is
equally easy to construct examples in which our methodology fails to detect departures
from model (8.1), but the graphs of Chiou and Müller (2007) immediately
show it. The simplest such example is given by $Y_n(t) = X_n^2(t) + \varepsilon_n(t)$ with iid
$\varepsilon_n$, see Figure 9.4. Thus, the methods we study in this chapter are complementary
tools designed to test the validity of specification (8.1) with iid errors against the
alternative of correlation in the errors.
As in the multivariate regression, error correlation affects various variance estimates,
and, consequently, confidence regions and distributions of test statistics.
In particular, prediction based on least squares estimation is no longer optimal.
To illustrate these issues, it is enough to consider the scalar regression model
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, 2, \ldots, N$, with fixed values $x_i$. We focus on inference
11.1 Motivation and background 193
$$\hat\beta_1 = \frac{\sum_{i=1}^{N} (x_i - \bar x_N)(y_i - \bar y_N)}{\sum_{i=1}^{N} (x_i - \bar x_N)^2}.$$
By default, software packages estimate the standard error of $\hat\beta_1$ by the square root
of the estimated variance
$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\hat\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x_N)^2}. \qquad (11.1)$$
If the $\varepsilon_i$ are uncorrelated, all off–diagonal terms in (11.3) can be neglected, and
we arrive at the estimator (11.1). However, if the $\varepsilon_i$ are correlated, the off–diagonal
terms in (11.3) contribute to the variance of $\hat\beta_1$.
To show how large the bias in the estimation of $\mathrm{Var}[\hat\beta_1]$ via (11.1) can be, we
consider the following setting:
$$y_i = 2x_i + \varepsilon_i, \quad x_i = i/N, \quad i = 1, 2, \ldots, N, \quad N = 100,$$
where the errors $\varepsilon_i$ follow an AR(1) process
$$\varepsilon_i = \varphi\varepsilon_{i-1} + w_i, \quad w_i \sim \text{iid } N(0, 1).$$
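The size of this effect can be explored numerically. The following Python sketch is not part
of the original study: the function name, the number of replications, the burn-in length and
the seed are illustrative assumptions. It simulates the regression with AR(1) errors described
above and returns the Monte Carlo variance of the least squares slope together with the
average of the default estimate (11.1).

```python
import numpy as np


def slope_variance_bias(phi, n_obs=100, n_rep=2000, burn_in=200, seed=0):
    """Return (Monte Carlo variance of the LS slope, mean of estimate (11.1)).

    All tuning values (n_rep, burn_in, seed) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(1, n_obs + 1) / n_obs           # x_i = i/N
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)

    # AR(1) errors for all replications at once; the burn-in period reduces
    # the influence of the zero starting value.
    w = rng.standard_normal((n_rep, n_obs + burn_in))
    eps = np.zeros_like(w)
    for t in range(1, n_obs + burn_in):
        eps[:, t] = phi * eps[:, t - 1] + w[:, t]
    eps = eps[:, burn_in:]

    y = 2.0 * x + eps                              # y_i = 2 x_i + eps_i
    yc = y - y.mean(axis=1, keepdims=True)
    beta1 = yc @ xc / sxx                          # least squares slopes
    resid = yc - beta1[:, None] * xc               # residuals of each fit
    sigma2 = np.sum(resid ** 2, axis=1) / (n_obs - 2)
    naive_var = sigma2 / sxx                       # default estimate (11.1)

    return beta1.var(ddof=1), naive_var.mean()
```

For positive values of `phi` the first returned value can be expected to exceed the second,
which is the underestimation discussed above; for `phi = 0` the two should roughly agree.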
This section explains the three steps, discussed in Section 11.1, involved in the
construction of the residuals in the setting of model (8.1). The idea is that the
curves are represented by their coordinates with respect to the FPC’s of the $X_n$,
e.g. $Y_{nk} = \langle Y_n, v_k\rangle$ is the projection of the $n$th response onto the $k$th largest FPC.
A formal linear model for these coordinates is constructed and estimated by least
squares. This formal model does not however satisfy the usual assumptions due to
the effect of the projection of infinite dimensional curves on a finite dimensional
subspace.
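A minimal numerical sketch of the projection step is given below. It assumes that the curves
are observed on a common equispaced grid in $[0, 1]$ and approximates the FPC’s by an
eigendecomposition of the discretized sample covariance; this is a simplification chosen for
illustration, not the basis expansion approach of the R package fda. The function name and
its arguments are hypothetical.

```python
import numpy as np


def empirical_fpc_scores(curves, p):
    """Approximate the first p FPC's and the scores <X_n, v_k>.

    curves : array of shape (N, K), row n holds X_n on an equispaced grid
             of K points in [0, 1] (an assumption of this sketch).
    Returns (eigenvalues, eigenfunctions of shape (p, K), scores of shape (N, p)).
    """
    curves = np.asarray(curves, dtype=float)
    N, K = curves.shape
    dt = 1.0 / K                                   # quadrature weight
    Xc = curves - curves.mean(axis=0)              # center the sample
    cov = Xc.T @ Xc / N                            # c(t, s) on the grid
    evals, evecs = np.linalg.eigh(cov * dt)        # discretized covariance operator
    order = np.argsort(evals)[::-1][:p]            # p largest eigenvalues
    lam = evals[order]
    v = evecs[:, order].T / np.sqrt(dt)            # unit norm in L^2([0, 1])
    scores = Xc @ v.T * dt                         # Riemann sums for <X_n, v_k>
    return lam, v, scores
```

With this normalization the sample second moment of the $k$th score column equals the $k$th
returned eigenvalue, which gives a quick sanity check of the discretization.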
Since the $v_k$ form a basis in $L^2([0, 1])$, the products $v_i(t)v_j(s)$ form a basis in
$L^2([0, 1] \times [0, 1])$. Thus, if $\psi(\cdot, \cdot)$ is a Hilbert–Schmidt kernel, then
$$\psi(t, s) = \sum_{i,j=1}^{\infty} \psi_{ij} v_i(t) v_j(s), \qquad (11.4)$$
where $\psi_{ij} = \iint \psi(t, s) v_i(t) v_j(s)\,dt\,ds$. Therefore,
$$\int \psi(t, s) X_n(s)\,ds = \sum_{i,j=1}^{\infty} \psi_{ij} v_i(t) \langle X_n, v_j\rangle.$$
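where
$$Y_{nk} = \langle Y_n, v_k\rangle, \quad \xi_{nj} = \langle X_n, v_j\rangle, \quad e_{nk} = \langle \varepsilon_n, v_k\rangle,$$
and where
$$\eta_{nk} = \sum_{j=p+1}^{\infty} \psi_{kj} \langle X_n, v_j\rangle.$$
we rewrite (11.5) as
$$\mathbf{Y}_n = \mathbf{Z}_n \boldsymbol{\psi} + \boldsymbol{\delta}_n, \quad n = 1, 2, \ldots, N,$$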
In Section 11.7, we will use Proposition 11.1, which can be conveniently stated
here because we have just introduced the required notation. It holds under the
assumptions listed in Section 11.6, and the following additional assumption.
11.3 Description of the test procedures 197
Assumption 11.1. The coefficients $\psi_{ij}$ of the kernel $\psi(\cdot, \cdot)$ satisfy $\sum_{i,j=1}^{\infty} |\psi_{ij}| < \infty$.
Proposition 11.1. If Assumptions (A1)–(A5) and 11.1 hold, then $\tilde\psi^{\wedge} - \tilde\psi = O_P(N^{-1/2})$.
The proof of Proposition 11.1 is fairly technical and is developed in Aue et al.
(2010).
We consider two test statistics, (11.14) and (11.17), which arise from two different
ways of defining finite dimensional vectors of residuals. Method I builds on the
ideas presented in Section 11.2; the residuals are derived using the estimator $\tilde\psi^{\wedge}$
obtained by projecting both the $Y_n$ and the $X_n$ on the $\hat v_i$, the functional principal
components of the regressors. Method II uses two projections; the $X_n$ are projected
on the $\hat v_i$, but the $Y_n$ are projected on the $\hat u_i$. Motivated by Lemma 8.1, in Method
II, we approximate $\psi(\cdot, \cdot)$ by
$$\hat\psi_{pq}(t, s) = \sum_{j=1}^{q} \sum_{i=1}^{p} \hat\lambda_i^{-1} \hat\sigma_{ij} \hat u_j(t) \hat v_i(s), \qquad \hat\sigma_{ij} = N^{-1} \sum_{n=1}^{N} \langle X_n, \hat v_i\rangle \langle Y_n, \hat u_j\rangle. \qquad (11.10)$$
Method I emphasizes the role of the regressors $X_n$, and is, in a very loose sense,
analogous to the plot of the residuals against the independent variable in a straight
line regression. Method II emphasizes the role of the responses, and is somewhat
analogous to the plot of the residuals against the fitted values. Both statistics have
the form $\sum_{h=1}^{H} \hat r_h^T \hat\Sigma^{-1} \hat r_h$, where $\hat r_h$ are vectorized covariance matrices of appropriately
constructed residuals, and $\hat\Sigma$ is a suitably constructed matrix which approximates
the covariance matrix of the $\hat r_h$, which are asymptotically iid. As in all
procedures of this type, the P-values are computed for a range of values of $H$, typically
$H \le 5$ or $H \le 10$. The main difficulty lies in deriving explicit formulas for
the $\hat r_h$ and $\hat\Sigma$ and showing that the test statistics converge to the $\chi^2$ distribution.
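We continue to use the notation
$$C(v_k) = \lambda_k v_k, \quad X_n = \sum_{i=1}^{\infty} \xi_{ni} v_i, \quad \xi_{ni} = \langle v_i, X_n\rangle;$$
$$\Gamma(u_k) = \gamma_k u_k, \quad Y_n = \sum_{j=1}^{\infty} \zeta_{nj} u_j, \quad \zeta_{nj} = \langle u_j, Y_n\rangle,$$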
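and the residuals are $\mathbf{R}_n = \hat{\mathbf{Y}}_n - \tilde\Psi_p^{\wedge} \hat{\mathbf{X}}_n$. For $0 \le h < N$, define the sample autocovariance
matrices of these residuals as
$$\mathbf{V}_h = N^{-1} \sum_{n=1}^{N-h} \mathbf{R}_n \mathbf{R}_{n+h}^T. \qquad (11.12)$$
and
$$\hat{\mathbf{M}} = \hat{\mathbf{M}}_0 \otimes \hat{\mathbf{M}}_0. \qquad (11.13)$$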
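With this notation in place, we can define the test statistic
$$Q_N^{\wedge} = N \sum_{h=1}^{H} [\mathrm{vec}(\mathbf{V}_h)]^T \hat{\mathbf{M}}^{-1} \mathrm{vec}(\mathbf{V}_h). \qquad (11.14)$$
Properties of the Kronecker product, $\otimes$, give simplified formulae for $Q_N^{\wedge}$. Since
$\hat{\mathbf{M}}^{-1} = \hat{\mathbf{M}}_0^{-1} \otimes \hat{\mathbf{M}}_0^{-1}$ (see Horn and Johnson (1991) p. 244), Problem 25 on p. 252
of Horn and Johnson (1991) yields
$$Q_N^{\wedge} = N \sum_{h=1}^{H} \mathrm{tr}\left[ \hat{\mathbf{M}}_0^{-1} \mathbf{V}_h^T \hat{\mathbf{M}}_0^{-1} \mathbf{V}_h \right].$$
$$Q_N^{\wedge} = N \sum_{h=1}^{H} \sum_{i,j=1}^{p} \hat m_{f,h}(i, j)\, \hat m_{b,h}(i, j).$$
The null hypothesis is rejected if $Q_N^{\wedge}$ exceeds an upper quantile of the chi–square
distribution with $p^2 H$ degrees of freedom, see Theorem 11.2.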
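The equivalence of the quadratic form (11.14) and its trace version can be checked numerically.
In the sketch below, written in Python with NumPy, the function names are illustrative, the
residual vectors are stored as rows of an array, and the matrix passed as `M0` plays the role
of $\hat{\mathbf{M}}_0$; taking it equal to the lag zero sample autocovariance in the usage lines
is an assumption made only for the sake of the example.

```python
import numpy as np


def autocov(R, h):
    """V_h = N^{-1} sum_{n=1}^{N-h} R_n R_{n+h}^T, residuals stored as rows of R."""
    N = R.shape[0]
    return R[: N - h].T @ R[h:] / N


def q_vec_form(R, M0, H):
    """N * sum_h vec(V_h)^T (M0 kron M0)^{-1} vec(V_h), the form of (11.14)."""
    N = R.shape[0]
    M_inv = np.linalg.inv(np.kron(M0, M0))
    total = 0.0
    for h in range(1, H + 1):
        v = autocov(R, h).flatten(order="F")       # column-stacking vec operator
        total += v @ M_inv @ v
    return N * total


def q_trace_form(R, M0, H):
    """N * sum_h tr(M0^{-1} V_h^T M0^{-1} V_h), the simplified form."""
    N = R.shape[0]
    M0_inv = np.linalg.inv(M0)
    total = 0.0
    for h in range(1, H + 1):
        V = autocov(R, h)
        total += np.trace(M0_inv @ V.T @ M0_inv @ V)
    return N * total


# Illustrative usage with artificial iid "residuals" (not data from the text).
rng = np.random.default_rng(0)
R_demo = rng.standard_normal((200, 3))
M0_demo = autocov(R_demo, 0)                       # assumed stand-in for M0-hat
print(q_vec_form(R_demo, M0_demo, 3), q_trace_form(R_demo, M0_demo, 3))
```

The trace form avoids building and inverting the $p^2 \times p^2$ Kronecker product, which is the
practical reason for preferring it.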
where $\Psi$ is the Hilbert–Schmidt operator with kernel $\psi(\cdot, \cdot)$. To define the residuals,
we replace the infinite sums in (11.15) by finite sums, the unobservable $u_j, v_i$ with
the $\hat u_j, \hat v_i$, and $\Psi$ with the estimator $\hat\Psi_{pq}$ with kernel (11.10). This leads to the
equation
$$\sum_{j=1}^{q} \hat\zeta_{nj} \hat u_j = \sum_{i=1}^{p} \hat\xi_{ni} \hat\Psi_{pq}(\hat v_i) + \hat z_n,$$
where, similarly as in Section 11.2, $\hat z_n$ contains the $\varepsilon_n$, the effect of replacing the
infinite sums with finite ones, and the effect of the estimation of the eigenfunctions.
Method II is based on the residuals defined by
$$\hat z_n = \hat z_n(p, q) = \sum_{j=1}^{q} \hat\zeta_{nj} \hat u_j - \sum_{i=1}^{p} \hat\xi_{ni} \hat\Psi_{pq}(\hat v_i). \qquad (11.16)$$
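Since $\hat\Psi_{pq}(\hat v_i) = \sum_{j=1}^{q} \hat\lambda_i^{-1} \hat\sigma_{ij} \hat u_j(t)$, we see that
$$\hat z_n = \sum_{j=1}^{q} \left( \hat\zeta_{nj} - \sum_{i=1}^{p} \hat\xi_{ni} \hat\lambda_i^{-1} \hat\sigma_{ij} \right) \hat u_j(t).$$
Next define
$$\hat Z_{nj} := \langle \hat u_j, \hat z_n\rangle = \hat\zeta_{nj} - \sum_{i=1}^{p} \hat\xi_{ni} \hat\lambda_i^{-1} \hat\sigma_{ij},$$
and denote by $\hat{\mathbf{C}}_h$ the $q \times q$ autocovariance matrix with entries
$$\hat c_h(k, \ell) = \frac{1}{N} \sum_{n=1}^{N-h} \left( \hat Z_{nk} - \hat\mu_Z(k) \right) \left( \hat Z_{n+h,\ell} - \hat\mu_Z(\ell) \right),$$
where $\hat\mu_Z(k) = N^{-1} \sum_{n=1}^{N} \hat Z_{nk}$. Finally denote by $\hat r_{f,h}(i, j)$ and $\hat r_{b,h}(i, j)$ the
$(i, j)$ entries, respectively, of $\hat{\mathbf{C}}_0^{-1} \hat{\mathbf{C}}_h$ and $\hat{\mathbf{C}}_h \hat{\mathbf{C}}_0^{-1}$.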
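The null hypothesis is rejected if the statistic
$$\hat Q_N = N \sum_{h=1}^{H} \sum_{i,j=1}^{q} \hat r_{f,h}(i, j)\, \hat r_{b,h}(i, j) \qquad (11.17)$$
$$\hat Q_N = N \sum_{h=1}^{H} \mathrm{tr}\left[ \hat{\mathbf{C}}_0^{-1} \hat{\mathbf{C}}_h^T \hat{\mathbf{C}}_0^{-1} \hat{\mathbf{C}}_h \right]$$
and
$$\hat Q_N = N \sum_{h=1}^{H} [\mathrm{vec}(\hat{\mathbf{C}}_h)]^T [\hat{\mathbf{C}}_0 \otimes \hat{\mathbf{C}}_0]^{-1} [\mathrm{vec}(\hat{\mathbf{C}}_h)].$$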
Both methods require the selection of $p$ and $q$ (Method I, only of $p$). We recommend
the popular method based on the cumulative percentage of total variability
(CPV) calculated as
$$CPV(p) = \frac{\sum_{k=1}^{p} \hat\lambda_k}{\sum_{k=1}^{\infty} \hat\lambda_k},$$
with a corresponding formula for $q$. The numbers of eigenfunctions, $p$ and $q$, are
chosen as the smallest numbers such that $CPV(p) \ge 0.85$ and $CPV(q) \ge 0.85$.
Other ways of selecting $p$ (and $q$) are discussed in Section 3.3.
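The CPV rule is easy to express in code. The short Python sketch below uses a hypothetical
function name and assumes that the estimated eigenvalues are supplied in decreasing order and
that the infinite sum in the denominator is replaced by the sum of all available sample
eigenvalues.

```python
import numpy as np


def select_by_cpv(eigenvalues, threshold=0.85):
    """Smallest p with CPV(p) >= threshold (eigenvalues in decreasing order)."""
    lam = np.asarray(eigenvalues, dtype=float)
    cpv = np.cumsum(lam) / np.sum(lam)
    # Small tolerance guards against rounding when CPV(p) equals the threshold.
    return int(np.argmax(cpv >= threshold - 1e-12)) + 1


# Eigenvalues 5, 3, 1, 0.5, 0.5 give CPV values 0.5, 0.8, 0.9, 0.95, 1.0.
print(select_by_cpv([5.0, 3.0, 1.0, 0.5, 0.5]))
```

The same function applied to the eigenvalues of the responses gives $q$.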
As $p$ and $q$ increase, the normalized statistics $Q_N^{\wedge}$ and $\hat Q_N$ converge to the standard
normal distribution. The normal approximation works very well even for small
$p$ or $q$ (in the range 3–5 if $N \le 100$) because the number of the degrees of freedom
increases like $p^2$ or $q^2$. For Method I, which turns out to be conservative in small
samples, the normal approximation brings the size closer to the nominal size. It also
improves the power of Method I by up to 10%.
Finally, we note that the methods of this chapter are suitable for testing the correlation
of errors in model (8.1), but not in its special case known as the historical
functional model of Malfait and Ramsay (2003). The latter is model (8.1) with
$\psi(t, s) = \beta(s, t) I_H(s, t)$, where $\beta(\cdot, \cdot)$ is an arbitrary Hilbert–Schmidt kernel and
$I_H(\cdot, \cdot)$ is the indicator function of the set $H = \{(s, t) : 0 \le s \le t \le 1\}$. This
model requires that $Y_n(t)$ depend only on the values of $X_n(s)$ for $s \le t$, i.e. it
postulates temporal causality within the pairs of curves. Our approach cannot be
readily extended to test for error correlation in the historical model because it uses
series expansions of a general kernel $\psi(t, s)$, and the restriction that the kernel vanishes
in the complement of $H$ does not translate to any obvious restrictions on the
coefficients of these expansions.
In this section we report the results of a simulation study performed to assess the
empirical size and power of the proposed tests (Method I and Method II) for small
to moderate sample sizes. The sample size $N$ ranges from 50 to 500. Both independent
and dependent regressor functions $X_i$ are considered. The simulation runs have
1,000 replications each. We used the R package fda.
11.4 A simulation study 201
To model the $\varepsilon_i$ under $H_0$, independent trajectories of the Brownian bridge (BB)
and the Brownian motion (BM) are generated by transforming cumulative sums
of independent normal random variables computed on a grid of 1,000 equispaced
points in $[0, 1]$. In order to evaluate the effect of non-Gaussian errors on the finite
sample performance, we also simulated $t_5$ and uniform BB and BM ($BB_{t_5}$, $BB_U$,
$BM_{t_5}$ and $BM_U$) by generating $t_5$ and uniform, instead of normal, increments. We
also generate errors as
$$\varepsilon_n(t) = \sum_{j=1}^{5} \vartheta_{nj}\, j^{-1/2} \sin(j\pi t),$$
with the iid $\vartheta_{nj}$ distributed according to the normal, $t_5$ and uniform distributions.
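The generation of BM and BB trajectories from cumulative sums can be sketched as follows.
The Python code is only an illustration: the function names are hypothetical, and the way the
$t_5$ and uniform increments are scaled (they are not standardized here, and the uniform
increments are drawn from $(-1, 1)$) is an assumption, since the text does not specify it.

```python
import numpy as np


def brownian_motion(n_curves, n_grid=1000, rng=None, increments="normal"):
    """Approximate BM paths on the grid t_k = k / n_grid, k = 1, ..., n_grid."""
    rng = np.random.default_rng() if rng is None else rng
    size = (n_curves, n_grid)
    if increments == "normal":
        z = rng.standard_normal(size)
    elif increments == "t5":
        z = rng.standard_t(5, size=size)           # heavier tails, not standardized
    elif increments == "uniform":
        z = rng.uniform(-1.0, 1.0, size=size)      # assumed range, not standardized
    else:
        raise ValueError("unknown increment type: " + str(increments))
    # Scaled cumulative sums; with normal increments Var W(t_k) = t_k.
    return np.cumsum(z, axis=1) / np.sqrt(n_grid)


def brownian_bridge(n_curves, n_grid=1000, rng=None, increments="normal"):
    """Bridge obtained by the transformation B(t) = W(t) - t W(1)."""
    w = brownian_motion(n_curves, n_grid, rng, increments)
    t = np.arange(1, n_grid + 1) / n_grid
    return w - t * w[:, [-1]]
```

Each row of the returned array is one trajectory; such arrays could then be converted to
functional objects, for example by a basis expansion.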
We report simulation results obtained by converting the curves into functional
objects using B-splines with 20 basis functions. We also performed the simulations
using the Fourier basis, and found that the results are not significantly different.
To determine the number of principal components ($p$ for $X_n$ and $q$ for $Y_n$),
the cumulative percentage of total variability (CPV) is used as described in Section
11.3.
Three different kernel functions in (8.1) are considered: the Gaussian kernel
$\psi(t, s) = \exp\{(t^2 + s^2)/2\}$; the Wiener kernel $\psi(t, s) = \min(t, s)$; and the
Parabolic kernel $\psi(t, s) = -4[(t + 1/2)^2 + (s + 1/2)^2] + 2$. The regressors $X_i$ in
(8.1) are either iid BB or BM, or follow the functional autoregressive FAR(1) model
studied in detail in Chapter 13. To simulate the FAR(1) $X_n$ we used the kernels of
the three types above, but multiplied by a constant $K$, so that their Hilbert–Schmidt
norm is 0.5. Thus, the dependent regressors follow the model
$$X_n(t) = K \int \psi_X(t, s) X_{n-1}(s)\,ds + \alpha_n(t),$$
where the $\alpha_n$ are iid BB, BM, $BB_{t_5}$, $BB_U$, $BM_{t_5}$ or $BM_U$.
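A discretized version of this FAR(1) recursion is sketched below in Python. The function
names, the midpoint grid, the Riemann sum approximation of the integrals and the burn-in
period are assumptions made for illustration; the usage lines employ the Wiener kernel
$\min(t, s)$ with BM innovations.

```python
import numpy as np


def scaled_kernel(kernel, grid, dt, target_norm=0.5):
    """Evaluate kernel(t, s) on the grid and rescale it so that the
    Riemann-sum approximation of its Hilbert-Schmidt norm equals target_norm."""
    T, S = np.meshgrid(grid, grid, indexing="ij")
    psi = kernel(T, S)
    hs_norm = np.sqrt(np.sum(psi ** 2) * dt * dt)
    return psi * (target_norm / hs_norm)


def simulate_far1(psi, innovations, dt, burn_in=50):
    """X_n(t) = int psi(t, s) X_{n-1}(s) ds + alpha_n(t) on a grid.

    innovations : array of shape (burn_in + N, K), one innovation curve per row.
    The first burn_in curves are discarded (an assumption of this sketch).
    """
    X = np.zeros_like(innovations, dtype=float)
    X[0] = innovations[0]
    for n in range(1, innovations.shape[0]):
        X[n] = psi @ X[n - 1] * dt + innovations[n]
    return X[burn_in:]


# Illustrative usage: Wiener kernel and BM innovations on 100 grid points.
K_grid = 100
dt = 1.0 / K_grid
grid = (np.arange(K_grid) + 0.5) * dt
psi_X = scaled_kernel(np.minimum, grid, dt)
rng = np.random.default_rng(0)
alpha = np.cumsum(rng.standard_normal((150, K_grid)), axis=1) * np.sqrt(dt)
X_far = simulate_far1(psi_X, alpha, dt, burn_in=50)   # 100 dependent curves
```

The same two functions could be reused for the ARH(1) error curves by replacing the
innovations and the kernel.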
We present here only a small selection of the results of our numerical experi-
ments, and state general conclusions based on all simulations.
Starting with the empirical size, Tables 11.2 and 11.3 show that Method I is more
conservative and slightly underestimates the nominal levels while Method II tends to
overestimate them. The empirical sizes do not depend on whether the BB or the BM
is used, nor whether regressors are iid or dependent, nor on the shape of the kernel.
These sizes do not deteriorate if errors are not Gaussian either. The empirical size
of both methods is thus robust to the form of the kernel, to moderate dependence in
the regressors, and to departures from normality in the errors.
For the power simulations, we consider model (8.1) with the Gaussian kernel and
$\varepsilon_n \sim ARH(1)$, i.e.
$$\varepsilon_n(t) = K \int \psi_\varepsilon(t, s) \varepsilon_{n-1}(s)\,ds + u_n(t),$$
where $\psi_\varepsilon(t, s)$ is Gaussian, Wiener or Parabolic and $K$ is chosen so that the Hilbert–Schmidt
norm of the above ARH(1) operator is 0.5 and the $u_n(t)$ are iid BB, BM,
$BB_{t_5}$, $BB_U$, $BM_{t_5}$ or $BM_U$.
Table 11.2 Empirical size (in percent) for independent predictors: $X = BB$, $\varepsilon = BB$, $\psi =$ Gaussian,
Wiener and Parabolic, $p = 3$.
Method I Method II
Sample Gaussian Wiener Parabolic Gaussian Wiener Parabolic
size 10% 5% 10% 5% 10% 5% 10% 5% 10% 5% 10% 5%
$H = 1$
50 6.7 2.5 5.8 3.2 7.4 3.7 7.9 3.7 7.8 3.3 8.2 3.6
100 7.4 3.7 9.5 4.4 8.9 3.8 10.6 5.2 9.9 4.2 9.8 4.7
200 9.8 4.6 8.9 4.2 9.0 4.1 8.9 4.4 10.0 4.0 9.6 4.0
300 9.3 4.8 10.0 5.1 8.1 3.5 8.7 4.4 8.8 4.7 10.3 5.5
500 8.8 5.2 9.8 5.3 9.6 4.9 8.8 4.2 8.9 4.3 8.7 4.0
$H = 3$
50 4.3 2.5 5.6 2.1 6.0 3.4 10.7 5.3 8.9 4.7 9.0 4.2
100 7.6 3.7 6.9 3.6 6.4 3.3 9.9 4.5 10.2 4.0 10.1 4.9
200 8.7 4.6 6.4 3.2 8.0 3.3 9.6 4.8 10.1 5.1 9.6 5.0
300 7.6 3.5 9.5 4.2 9.5 4.8 11.0 5.1 8.9 4.0 8.1 4.6
500 9.8 4.6 9.1 3.9 9.2 4.9 11.1 6.8 9.1 4.4 10.0 5.1
$H = 5$
50 2.6 0.9 3.5 1.1 4.1 1.4 10.4 5.7 11.2 5.7 10.0 5.1
100 6.5 3.7 5.9 3.0 4.8 1.9 11.3 5.3 10.5 5.2 8.9 4.6
200 8.5 4.4 7.5 3.7 7.4 3.3 11.3 5.7 9.7 4.5 9.7 4.4
300 7.6 4.0 9.9 4.7 7.6 2.8 9.4 4.9 9.8 5.1 10.6 5.5
500 10.1 4.6 9.8 4.4 7.9 3.6 12.1 6.8 9.7 4.7 10.4 5.8
Typical power results are shown in Table 11.4. Just as for size, power is not
affected by the dependence of the regressors. As expected from the results for the
empirical size, power is uniformly higher for Method II, but this difference is visible
only for $N < 200$ (in our numerical experiments). The power is highest for $H = 1$,
especially for smaller samples, because the errors follow the ARH(1) process.
We now illustrate the application of the tests on functional data sets arising in space
physics and finance.
Application to Magnetometer data. We continue the study of the association
between the auroral (high latitude) electrical currents and the currents flowing at
mid– and low latitudes. This problem was introduced in Section 9.4. Maslova et
al. (2010b) provide extensive references to the relevant space physics literature. The
problem was cast into the setting of the functional linear model (8.1) in which the Xn
are centered high-latitude records and Yn are centered mid- or low-latitude magne-
tometer records. We consider two settings 1) consecutive days, 2) non-consecutive
11.5 Application to space physics and high–frequency financial data 203
Table 11.3 Empirical size (in percent) for dependent predictors: $X \sim ARH(1)$ with the BB innovations and
$\psi_X =$ Gaussian, Wiener and Parabolic, $\psi =$ Gaussian, $\varepsilon = BB$, $p = 3$.
Method I Method II
Sample Gaussian Wiener Parabolic Gaussian Wiener Parabolic
size 10% 5% 10% 5% 10% 5% 10% 5% 10% 5% 10% 5%
$H = 1$
50 8.4 3.9 5.9 2.1 7.3 2.9 9.2 4.6 7.2 2.7 8.6 3.8
100 8.9 4.4 8.8 3.7 8.4 3.7 10.4 4.6 10.2 4.9 9.9 4.8
200 10.2 4.7 9.7 4.6 10.1 4.7 9.5 4.8 8.9 4.0 9.8 5.2
300 9.2 4.9 8.9 4.4 8.6 4.6 10.1 4.1 8.5 3.4 12.0 5.3
500 10.5 5.2 9.3 4.5 9.0 4.7 9.0 4.2 9.5 4.8 11.5 5.6
$H = 3$
50 4.4 2.2 5.3 2.9 5.5 2.8 8.1 4.1 10.7 4.5 10.1 4.0
100 6.6 3.1 6.0 2.7 7.0 2.9 10.7 5.4 9.1 4.9 9.9 4.5
200 7.8 3.1 8.5 4.1 8.9 3.9 11.9 6.2 8.5 4.0 7.7 2.9
300 8.2 4.8 8.6 3.9 9.4 4.8 11.9 5.2 8.8 4.4 9.3 5.2
500 11.4 5.3 10.3 5.7 9.1 4.3 10.6 5.4 9.9 5.1 9.9 4.9
$H = 5$
50 4.2 1.8 3.2 1.5 4.0 1.9 9.9 5.2 11.1 6.6 11.9 6.7
100 7.2 3.2 4.9 2.4 5.2 2.1 10.5 5.5 10.2 5.5 11.2 6.0
200 7.6 2.8 8.1 3.7 8.8 4.4 11.4 4.6 10.3 4.6 11.6 7.3
300 8.3 4.2 8.3 3.4 7.3 3.9 10.7 5.5 9.3 5.2 9.7 4.7
500 10.7 5.8 10.4 4.9 7.9 4.2 9.0 4.1 9.2 4.0 10.4 5.3
Table 11.4 Method I: Empirical power (in percent) for dependent predictor functions: $X \sim ARH(1)$ and
$\varepsilon \sim ARH(1)$ with the BB innovations, $\psi_\varepsilon = \psi_X =$ Gaussian, Wiener and Parabolic,
$\psi =$ Gaussian, $p = 3$.
Sample Gaussian Wiener Parabolic
size 10% 5% 10% 5% 10% 5%
$H = 1$
50 79.2 68.6 68.5 54.0 62.3 47.3
100 99.9 99.6 98.6 96.7 97.7 96.0
200 100 100 100 100 100 100
300 100 100 100 100 100 100
500 100 100 100 100 100 100
$H = 3$
50 53.8 40.7 45.4 32.8 40.0 29.0
100 98.0 95.7 93.6 89.5 87.5 81.3
200 100 100 100 99.9 100 99.8
300 100 100 100 100 100 100
500 100 100 100 100 100 100
$H = 5$
50 41.2 27.9 31.7 20.8 25.4 15.6
100 95.1 90.3 84.4 74.9 78.2 68.1
200 100 100 100 99.8 99.9 99.3
300 100 100 100 100 100 100
500 100 100 100 100 100 100
effect (the structure of the M-I system in the northern hemisphere changes with sea-
son). We conclude that it is not appropriate to use model (8.1) with iid errors to
study the interaction of high– and low latitude currents when the data are derived
from consecutive days.
Setting 2 (substorm days): We now focus on two samples studied in Maslova
et al. (2010b). They are derived from 37 days on which isolated substorms were
recorded at College, Alaska (CMO). A substorm is classified as isolated
if it is followed by 2 quiet days. There were only 37 isolated substorms in
2001; data for 10 such days are shown in Figure 11.3. The first sample consists of
37 pairs $(X_n, Y_n)$, where $X_n$ is the curve of the $n$th isolated substorm recorded at CMO,
and $Y_n$ is the curve recorded on the same UT day at Honolulu, Hawaii (HON). The
second sample is constructed in the same way, except that $Y_n$ is the curve recorded at
Boulder, Colorado (BOU). The Boulder observatory is located in geomagnetic midlatitude,
i.e. roughly half way between the magnetic north pole and the magnetic
equator. Honolulu is located very close to the magnetic equator.
The p-values for both methods and the two samples are listed in Table 11.5.
For Honolulu, both tests indicate the suitability of model (8.1) with iid errors. For
Boulder, the picture is less clear. The acceptance by Method I may be due to the
small sample size ($N = 37$). The simulations in Section 11.4 show that for $N = 50$
this method has the power of about 50% at the nominal level of 5%. On the
[Figure 11.2: two time series panels with vertical axes nT (CMO) and nT (HON) and time in minutes on the horizontal axes.]
Fig. 11.2 Magnetometer data on 10 consecutive days (separated by vertical dashed lines) recorded
at College, Alaska (CMO) and Honolulu, Hawaii (HON).
other hand, Method II has the tendency to overreject. The sample with the Boulder
records as responses confirms the general behavior of the two methods observed
in Section 11.4, and emphasizes that it is useful to apply both of them to obtain
more reliable conclusions. From the space physics perspective, midlatitude records
are very difficult to interpret because they combine features of high latitude events
(exceptionally strong auroras have been seen as far south as Virginia) and those of
low latitude and field aligned currents.
We also applied the tests to samples in which the regressors are curves on days
on which different types of substorms (according to a space physics classification)
[Figure 11.3: three time series panels with vertical axes nT (CMO), nT (HON) and nT (BOU); the horizontal axes run from 1 to 14401.]
Fig. 11.3 Magnetometer data on 10 chronologically arranged isolated substorm days recorded at
College, Alaska (CMO), Honolulu, Hawaii (HON) and Boulder, Colorado (BOU).
occurred. The broad conclusion remains that for substorm days, the errors in model
(8.1) can be assumed iid if the period under consideration is not longer than a few
months. For longer periods, seasonal trends apparently cause differences in distri-
bution (possibly also of the Xn ).
Application to intraday returns. Perhaps the best known application of linear
regression to financial data is the celebrated Capital Asset Pricing Model (CAPM),
see e.g. Chapter 5 of Campbell et al. (1997). In its simplest form, it is defined by
$$r_n = \alpha + \beta r_n^{(I)} + \varepsilon_n,$$
where
$$r_n = 100(\ln P_n - \ln P_{n-1}) \approx 100\,\frac{P_n - P_{n-1}}{P_{n-1}}$$
is the return, in percent, over a unit of time on a specific asset, e.g. a stock of a
corporation, and $r_n^{(I)}$ is the analogously defined return on a relevant market index.
The unit of time can be a day, a month or a year.
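The closeness of the logarithmic return to the percentage price change can be checked with a
few lines of Python. The function names and the prices used below are hypothetical and serve
only to illustrate the arithmetic.

```python
import math


def log_return_percent(p_now, p_prev):
    """100 (ln P_n - ln P_{n-1})."""
    return 100.0 * (math.log(p_now) - math.log(p_prev))


def simple_return_percent(p_now, p_prev):
    """100 (P_n - P_{n-1}) / P_{n-1}."""
    return 100.0 * (p_now - p_prev) / p_prev


# Hypothetical prices: a move from 100 to 101.
print(log_return_percent(101.0, 100.0), simple_return_percent(101.0, 100.0))
```

For small relative price changes the two numbers are close; the gap widens as the change
grows.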
In this section we work with intra–daily price data, which are known to have
properties quite different from those of daily or monthly closing prices, see e.g.
Chapter 5 of Tsay (2005); Guillaume et al. (1997) and Andersen and Bollerslev
(1997a, 1997b) also offer interesting perspectives. For these data, $P_n(t_j)$ is the price
on day $n$ at tick $t_j$ (time of trade); we do not discuss issues related to the bid–ask
spread, which are not relevant to what follows. For such data, it is not appropriate to
define returns by looking at price movements between the ticks because that would
lead to very noisy trajectories for which the methods based on the FPC’s are not
appropriate (Johnstone and Lu (2009) explain why principal components cannot be
meaningfully estimated for noisy data). Instead, we adopt the following definition.
Figure 11.4 shows intra-day cumulative returns on 10 consecutive days for the
Standard & Poor’s 100 index and the Exxon Mobil corporation. These returns have
an appearance amenable to smoothing via FPC’s.
We propose an extension of the CAPM to such returns by postulating that
$$r_n(t) = \alpha(t) + \int \beta(t, s) r_n^{(I)}(s)\,ds + \varepsilon_n(t), \quad t \in [0, 1], \qquad (11.18)$$
where the interval $[0, 1]$ is the rescaled trading period (in our examples, 9:30 to 16:00
EST). We refer to model (11.18) as the functional CAPM (FCAPM). As far as we
know, this model has not been considered in the financial literature, but just as for
the classical CAPM, it is designed to evaluate the extent to which intraday market
returns determine the intraday returns on a specific asset. It is not our goal in this
example to systematically estimate the parameters in (11.18) and compare them for
various assets and markets; we merely want to use the methods developed in this
chapter to see if this model can be assumed to hold for some well–known assets. With
this goal in mind, we considered FCAPM for S&P 100 and its major component, the
Exxon Mobil Corporation (currently it contributes 6.78% to this index). The price
processes over the period of about 8 years are shown in Figure 11.5. The functional
observations are however not these processes, but the cumulative intra–daily returns,
examples of which are shown in Figure 11.4.
After some initial data cleaning and preprocessing steps, we could compute the
p-values for any period within the time stretch shown in Figure 11.5. The p-values
for calendar years (the sample size $N$ is about 250) are reported in Table
11.6. In this example, both methods lead to the same conclusions, which match the
well–known macroeconomic background. The tests do not indicate departures from
[Figure 11.4: two time series panels with vertical axes SP returns and XOM returns and time in minutes on the horizontal axes.]
Fig. 11.4 Intra-day cumulative returns on 10 consecutive days for the Standard & Poor’s 100 index
(SP) and the Exxon–Mobil corporation (XOM).
the FCAPM model, except in 2002, the year between the September 11 attacks and the
invasion of Iraq, and in 2006 and 2007, the years preceding the collapse of 2008
in which oil prices were growing at a much faster rate than the rest of the
economy.
In the above examples we tested the correlation of errors in model (8.1), but not
in the historical functional linear model defined at the end of Section 11.3. This
is justified because the magnetometer data are obtained at locations with different
[Figure 11.5: time series panels of share prices; the vertical axis of the recoverable panel is labeled SP prices.]
Fig. 11.5 Share prices of the Standard & Poor’s 100 index (SP) and the Exxon–Mobil corporation
(XOM). Dashed lines separate years.
Table 11.6 P–values, in percent, for the FCAPM (11.18) in which the regressors are the intra–
daily cumulative returns on the Standard & Poor’s 100 index, and the responses are such returns
on the Exxon–Mobil stock.
Year Method I Method II
2000 46.30 55.65
2001 43.23 56.25
2002 0.72 0.59
2003 22.99 27.19
2004 83.05 68.52
2005 21.45 23.67
2006 2.91 3.04
2007 0.78 0.72
local times, and for space physics applications the dependence between the shapes
of the daily curves is of importance. Temporal causality for financial data is often
not assumed, as asset values reflect both historical returns and expectations of future
market conditions.
The exact asymptotic $\chi^2$ distributions are obtained only under Assumption 11.2
which, in particular, requires that the $X_n$ be iid. Under Assumptions (A1)–(A5), these
$\chi^2$ distributions provide only approximations to the true limit distributions. The
approximations are however very good, as the simulations in Section 11.4 show;
size and power for dependent $X_n$ are the same as for iid $X_n$, within the standard
error. Thus, to understand the asymptotic properties of the tests, we first consider
their behavior under Assumption 11.2. We begin the presentation of the asymptotic
theory by stating the required assumptions.
Assumption 11.2. The errors $\varepsilon_n$ are independent identically distributed mean zero
elements of $L^2$ satisfying $E\|\varepsilon_n\|^4 < \infty$. The regressors $X_n$ are independent identically
distributed mean zero elements of $L^2$ satisfying $E\|X_n\|^4 < \infty$. The sequences
$\{X_n\}$ and $\{\varepsilon_n\}$ are independent.
For data collected sequentially over time, the regressors $X_n$ need not be independent.
We formalize the notion of dependence in functional observations using the
notion of $L^4$–$m$–approximability studied in detail in Chapter 16. For ease of reference,
we repeat some conditions contained in Assumption 11.2; the weak dependence
of the $\{X_n\}$ is quantified in Conditions (A2) and (A5).
(A1) The $\varepsilon_n$ are independent, identically distributed with $E\varepsilon_n = 0$ and $E\|\varepsilon_n\|^4 < \infty$.
(A2) Each $X_n$ admits the representation
where
$$X_n^{(k)} = g(\alpha_n, \alpha_{n-1}, \ldots, \alpha_{n-k+1}, \alpha_{n-k}^{(k)}, \alpha_{n-k-1}^{(k)}, \ldots),$$
where
$$\varepsilon_n^{(k)} = h(u_n, u_{n-1}, \ldots, u_{n-k+1}, u_{n-k}^{(k)}, u_{n-k-1}^{(k)}, \ldots),$$
for all $i, j$. The two methods introduced in Section 11.3 detect the alternatives with
$e_i = v_i$ (Method I) and $e_i = u_i$ (Method II). These methods test for correlation up
to lag $H$, and use the FPC’s $v_i$, $i \le p$, and $u_i$, $i \le q$.
With this background, we can state the null and alternative hypotheses as follows.
$H_0$: Model (8.1) holds together with Assumptions (A1)–(A5).
The key assumption is (A1), i.e. the independence of the $\varepsilon_n$.
$H_{A,I}$: Model (8.1) holds together with Assumptions (A2), (A4), (A5), (B1)–(B4),
and $E[\langle \varepsilon_0, v_i\rangle \langle \varepsilon_h, v_j\rangle] \ne 0$ for some $1 \le h \le H$ and $1 \le i, j \le p$.
$H_{A,II}$: Model (8.1) holds together with Assumptions (A2), (A4), (A5), (B1)–(B4),
and $E[\langle \varepsilon_0, u_i\rangle \langle \varepsilon_h, u_j\rangle] \ne 0$ for some $1 \le h \le H$ and $1 \le i, j \le q$.
Note that the $u_i$ are well defined under the alternative, because (A2), (A4), (A5)
and (B1)–(B4) imply that the $Y_n$ form a stationary sequence.
For ease of reference, we state the following Theorem, which follows immedi-
ately from Theorem 16.2.
Theorem 11.1. If assumptions (A2), (A4), (A5) and (2.12) hold, then relations
(2.13) hold.
Theorem 11.2. Suppose Assumptions 11.2 and 11.1 and condition (2.12) hold. Then
the statistic $Q_N^{\wedge}$ converges in distribution to the $\chi^2$–distribution with $p^2 H$ degrees of freedom.
Theorem 11.3. Suppose Assumption 11.2 and condition (2.12) hold. Then statistic
(11.17) converges in distribution to a chi–squared random variable with $q^2 H$
degrees of freedom.
where
$$e_h(s, t) = E[X_0(s) X_h(t)].$$
where the $p^2$–dimensional vectors $Z_h$ are iid normal, and coincide with the limits
of $N^{1/2}\,\mathrm{vec}(\mathbf{V}_h)$, if the $X_n$ are independent.
For any $r > 0$, the terms $R_{N,p}(h)$ satisfy
$$\lim_{p \to \infty} \limsup_{N \to \infty} P\{\|R_{N,p}(h)\| > r\} = 0. \qquad (11.19)$$
they all involve coefficients $\psi_{jk}$ with at least one index greater than $p$ multiplied by
factors of order $O_P(N^{-1/2})$. In (11.19), the limit of $p$ increasing to infinity should
not be interpreted literally, but again merely indicates that $p$ is so large that the first
$p$ FPC’s $v_k$ explain a large percentage of variance of the $X_n$.
Our last theorem states conditions under which the test is consistent. The interpretation
of the limit as $p \to \infty$ is the same as above. Theorem 11.5 states that for
such $p$ and sufficiently large $N$ the test will reject with large probability if $\varepsilon_n$ and
$\varepsilon_{n+h}$ are correlated in the subspace spanned by $\{v_i, 1 \le i \le p\}$.
Theorem 11.5. Suppose Assumptions (B1)–(B4), (A2), (A4), (A5), Assumption 11.1
and condition (2.12) hold. Then, for all $R > 0$,
$$\lim_{p \to \infty} \liminf_{N \to \infty} P\{Q_N^{\wedge} > R\} = 1,$$
provided $E[\langle \varepsilon_0, v_i\rangle \langle \varepsilon_h, v_j\rangle] \ne 0$ for some $1 \le h \le H$ and $1 \le i, j \le p$.
To illustrate the arguments, we present in Section 11.7 the proof of Theorem 11.2.
The proof of Theorem 11.3 follows the general outline of the proof of Theorem 7.1.
The proof of Theorem 11.4 is very long, but the general idea is like that used in the
proof of Theorem 11.2. Similarly, the proof of Theorem 11.5 is a modification and
extension of the proof of Theorem 11.2.
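$$\mathbf{Y}_n = \Psi_p \mathbf{X}_n + \boldsymbol{\delta}_n, \qquad (11.20)$$
where
$$\Psi_p = \begin{bmatrix} \psi_{11} & \psi_{12} & \cdots & \psi_{1p} \\ \psi_{21} & \psi_{22} & \cdots & \psi_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \psi_{p1} & \psi_{p2} & \cdots & \psi_{pp} \end{bmatrix}.$$
The vectors $\mathbf{Y}_n$, $\mathbf{X}_n$, $\boldsymbol{\delta}_n$ are defined in Section 11.2 as the projections on the FPC’s
$v_1, v_2, \ldots, v_p$. Proposition 11.2 establishes an analog of (11.20) if these FPC’s are
replaced by the EFPC’s $\hat v_1, \hat v_2, \ldots, \hat v_p$. This replacement introduces additional
terms generically denoted with the letter $\gamma$. First we prove Lemma 11.1, which leads
to a decomposition analogous to (11.5).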
Lemma 11.1. If relation (16.39) holds with a Hilbert–Schmidt kernel $\psi(\cdot, \cdot)$, then
$$Y_n(t) = \int \left( \sum_{i,j=1}^{p} \hat c_i \psi_{ij} \hat c_j \hat v_i(t) \hat v_j(s) \right) X_n(s)\,ds + \delta_n(t),$$
where
$$\delta_n(t) = \varepsilon_n(t) + \eta_n(t) + \gamma_n(t).$$
The terms $\eta_n(t)$ and $\gamma_n(t)$ are defined as follows:
$$\eta_n(t) = \eta_{n1}(t) + \eta_{n2}(t);$$
$$\eta_{n1}(t) = \int \left( \sum_{i=p+1}^{\infty} \sum_{j=1}^{\infty} \psi_{ij} v_i(t) v_j(s) \right) X_n(s)\,ds,$$
$$\eta_{n2}(t) = \int \left( \sum_{i=1}^{p} \sum_{j=p+1}^{\infty} \psi_{ij} v_i(t) v_j(s) \right) X_n(s)\,ds.$$
where $\eta_n(t) = \eta_{n1}(t) + \eta_{n2}(t)$. Thus model (16.39) can be written as
$$Y_n(t) = \int \left( \sum_{i,j=1}^{p} \psi_{ij} v_i(t) v_j(s) \right) X_n(s)\,ds + \eta_n(t) + \varepsilon_n(t).$$
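To take into account the effect of the estimation of the $v_k$, we will use the decomposition
$$\begin{aligned} \psi_{ij} v_i(t) v_j(s) &= \hat c_i \psi_{ij} \hat c_j (\hat c_i v_i(t))(\hat c_j v_j(s)) \\ &= \hat c_i \psi_{ij} \hat c_j \hat v_i(t) \hat v_j(s) \\ &\quad + \hat c_i \psi_{ij} \hat c_j [\hat c_i v_i(t) - \hat v_i(t)] \hat c_j v_j(s) \\ &\quad + \hat c_i \psi_{ij} \hat c_j \hat v_i(t) [\hat c_j v_j(s) - \hat v_j(s)], \end{aligned}$$
which allows us to rewrite (16.39) as
$$Y_n(t) = \int \left( \sum_{i,j=1}^{p} \hat c_i \psi_{ij} \hat c_j \hat v_i(t) \hat v_j(s) \right) X_n(s)\,ds + \delta_n(t),$$
where $\delta_n(t) = \varepsilon_n(t) + \eta_n(t) + \gamma_n(t)$ and $\gamma_n(t) = \gamma_{n1}(t) + \gamma_{n2}(t)$. $\square$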
11.7 Proof of Theorem 11.2 215
can be replaced by the “errors” $\hat{\boldsymbol{\Delta}}_n$. The essential element of the proof is the relation
$\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge} = O_P(N^{-1/2})$ stated in Proposition 11.1.
Lemma 11.2. Suppose Assumptions 11.2 and 11.1 and condition (2.12) hold. Then,
for any fixed $h > 0$,
\[ \Big\| \mathbf{V}_h - N^{-1} \sum_{n=1}^{N-h} \hat{\boldsymbol{\Delta}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T \Big\| = O_P(N^{-1}). \]
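\[ \mathbf{V}_h = N^{-1} \sum_{n=1}^{N-h} \big[ (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge}) \hat{\mathbf{X}}_n + \hat{\boldsymbol{\Delta}}_n \big] \big[ (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge}) \hat{\mathbf{X}}_{n+h} + \hat{\boldsymbol{\Delta}}_{n+h} \big]^T. \]
Denoting $\hat{\mathbf{C}}_h = N^{-1} \sum_{n=1}^{N-h} \hat{\mathbf{X}}_n \hat{\mathbf{X}}_{n+h}^T$, we thus obtain
\begin{align*}
\mathbf{V}_h &= (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge}) \hat{\mathbf{C}}_h (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge})^T + (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge}) N^{-1} \sum_{n=1}^{N-h} \hat{\mathbf{X}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T \\
&\quad + N^{-1} \sum_{n=1}^{N-h} \hat{\boldsymbol{\Delta}}_n \hat{\mathbf{X}}_{n+h}^T (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge})^T + N^{-1} \sum_{n=1}^{N-h} \hat{\boldsymbol{\Delta}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T.
\end{align*}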
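To deal with the remaining three terms, we use the decomposition of Lemma
11.1. It is enough to bound the coordinates of each of the resulting terms. Since
$\Delta_n = \varepsilon_n + \eta_{n1} + \eta_{n2} + \gamma_{n1} + \gamma_{n2}$, we need to establish bounds for $2 \times 5 = 10$
terms, but these bounds fall into only a few categories, so we will only deal with some
typical cases.
Starting with the decomposition of $\hat{\mathbf{X}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T$, observe that
\[ N^{-1/2} \sum_{n=1}^{N-h} \langle X_n, \hat v_i \rangle \langle \varepsilon_{n+h}, \hat v_j \rangle = \iint \Big( N^{-1/2} \sum_{n=1}^{N-h} X_n(t) \varepsilon_{n+h}(s) \Big) \hat v_i(t) \hat v_j(s)\, dt\, ds. \]
The terms $X_n(t) \varepsilon_{n+h}(s)$ are iid elements of the Hilbert space $L^2([0,1] \times [0,1])$, so
by the CLT in a Hilbert space, see Chapter 2,
\[ \iint \Big( N^{-1/2} \sum_{n=1}^{N-h} X_n(t) \varepsilon_{n+h}(s) \Big)^2 dt\, ds = O_P(1). \]
Since the $\hat v_j$ have unit norm, $\iint (\hat v_i(t) \hat v_j(s))^2\, dt\, ds = 1$. It therefore follows from
the Cauchy–Schwarz inequality that
\[ \sum_{n=1}^{N-h} \langle X_n, \hat v_i \rangle \langle \varepsilon_{n+h}, \hat v_j \rangle = O_P(N^{1/2}). \]
Thus, the $\varepsilon_n$ contribute to $(\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge}) N^{-1} \sum_{n=1}^{N-h} \hat{\mathbf{X}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T$ a term of the order
$O_P(N^{-1/2} N^{-1} N^{1/2}) = O_P(N^{-1})$, as required.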
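We now turn to the contribution of the $\eta_{n,1}$. As above, we have
\begin{align*}
N^{-1/2} \sum_{n=1}^{N-h} \langle X_n, \hat v_i \rangle \langle \eta_{n+h,1}, \hat v_j \rangle
&= \iint \Big( N^{-1/2} \sum_{n=1}^{N-h} X_n(t) \eta_{n+h,1}(s) \Big) \hat v_i(t) \hat v_j(s)\, dt\, ds \\
&= \iint \Big( N^{-1/2} \sum_{n=1}^{N-h} X_n(t) \int \Big( \sum_{k=p+1}^{\infty} \sum_{\ell=1}^{\infty} \psi_{k\ell} v_k(s) v_\ell(u) \Big) X_{n+h}(u)\, du \Big) \hat v_i(t) \hat v_j(s)\, dt\, ds \\
&= \int \Big( \iint N_h(t,u) R_p(t,u)\, dt\, du \Big) v_k(s) \hat v_j(s)\, ds,
\end{align*}
where
\[ N_h(t,u) = N^{-1/2} \sum_{n=1}^{N-h} X_n(t) X_{n+h}(u) \]
and
\[ R_p(t,u) = \sum_{\ell=1}^{\infty} \sum_{k=p+1}^{\infty} \psi_{k\ell} v_\ell(u) \hat v_i(t). \]
By the CLT for $m$–dependent elements in a Hilbert space (which follows e.g. from Theorem 2.17
of Bosq (2000)), $N_h(\cdot, \cdot)$ is $O_P(1)$ in $L^2([0,1] \times [0,1])$, so
\[ \iint N_h^2(t,u)\, dt\, du = O_P(1). \]
\[ \sum_{n=1}^{N-h} \langle X_n, \hat v_i \rangle \langle \eta_{n+h,1}, \hat v_j \rangle = O_P(N^{1/2}), \]
and this again implies that the $\eta_{n1}$ make a contribution of the same order as the $\varepsilon_n$.
The same argument applies to the $\eta_{n2}$.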
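We now turn to the contribution of the $\gamma_{n1}$; the same argument applies to the $\gamma_{n2}$.
Observe that, similarly as for the $\eta_{n1}$,
\begin{align*}
N^{-1/2} \sum_{n=1}^{N-h} \langle X_n, \hat v_i \rangle \langle \gamma_{n+h,1}, \hat v_j \rangle
&= \iint \Big( N^{-1/2} \sum_{n=1}^{N-h} X_n(t) \gamma_{n+h,1}(s) \Big) \hat v_i(t) \hat v_j(s)\, dt\, ds \\
&= \int \Big[ \iint N_h(t,u) \sum_{k,\ell=1}^{p} \hat c_k \psi_{k\ell} v_\ell(u) \hat v_i(t)\, dt\, du \Big] [\hat c_k v_k(s) - \hat v_k(s)] \hat v_j(s)\, ds. \tag{11.22}
\end{align*}
Clearly,
\[ \iint \Big( \sum_{k,\ell=1}^{p} \hat c_k \psi_{k\ell} v_\ell(u) \hat v_i(t) \Big)^2 dt\, du = O_P(1). \]
By Theorem 2.7,
\[ \Big\{ \int [\hat c_k v_k(s) - \hat v_k(s)]^2\, ds \Big\}^{1/2} = O_P(N^{-1/2}). \tag{11.23} \]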
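We thus obtain
\[ \sum_{n=1}^{N-h} \langle X_n, \hat v_i \rangle \langle \gamma_{n+h,1}, \hat v_j \rangle = O_P(1), \tag{11.24} \]
\[ (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge}) N^{-1} \sum_{n=1}^{N-h} \hat{\mathbf{X}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T = O_P(N^{-1}). \]
The term $N^{-1} \sum_{n=1}^{N-h} \hat{\boldsymbol{\Delta}}_n \hat{\mathbf{X}}_{n+h}^T (\widetilde{\boldsymbol{\Psi}}_p - \widetilde{\boldsymbol{\Psi}}_p^{\wedge})^T$ can be dealt with in a fully analogous
way. □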
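\[ \hat{\boldsymbol{\Delta}}_n = \hat{\boldsymbol{\varepsilon}}_n + \hat{\boldsymbol{\eta}}_n + \hat{\boldsymbol{\gamma}}_n, \]
with the coordinates obtained by projecting the functions $\varepsilon_n, \eta_n, \gamma_n$ onto the EFPC's
$\hat v_j$. For example,
\[ \hat{\boldsymbol{\eta}}_n = [\langle \eta_n, \hat v_1 \rangle, \langle \eta_n, \hat v_2 \rangle, \ldots, \langle \eta_n, \hat v_p \rangle]^T. \]
Lemma 11.3 shows that the vectors $\hat{\boldsymbol{\gamma}}_n$ do not contribute to the asymptotic distribution
of the $\mathbf{V}_h$. This is essentially due to the fact that by Theorem 2.7, the
difference between $\hat v_j$ and $\hat c_j v_j$ is of the order $O_P(N^{-1/2})$. For the same reason, in
the definition of $\hat{\boldsymbol{\varepsilon}}_n$ and $\hat{\boldsymbol{\eta}}_n$, the $\hat v_j$ can be replaced by the $\hat c_j v_j$, as stated in Lemma
11.4. Lemma 11.4 can be proven in a similar way as Lemma 11.3, so we present
only the proof of Lemma 11.3.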
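Lemma 11.3. Suppose Assumptions 11.2 and 11.1 and condition (2.12) hold. Then,
for any fixed $h > 0$,
\[ \Big\| \mathbf{V}_h - N^{-1} \sum_{n=1}^{N-h} [\hat{\boldsymbol{\varepsilon}}_n + \hat{\boldsymbol{\eta}}_n] [\hat{\boldsymbol{\varepsilon}}_{n+h} + \hat{\boldsymbol{\eta}}_{n+h}]^T \Big\| = O_P(N^{-1}). \]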
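Lemma 11.4. Suppose Assumptions 11.2 and 11.1 and condition (2.12) hold. Then,
for any fixed $h > 0$,
\[ \Big\| \mathbf{V}_h - N^{-1} \sum_{n=1}^{N-h} [\tilde{\boldsymbol{\varepsilon}}_n + \tilde{\boldsymbol{\eta}}_n] [\tilde{\boldsymbol{\varepsilon}}_{n+h} + \tilde{\boldsymbol{\eta}}_{n+h}]^T \Big\| = O_P(N^{-1}), \]
where
\[ \tilde{\boldsymbol{\varepsilon}}_n = [\hat c_1 \langle \varepsilon_n, v_1 \rangle, \hat c_2 \langle \varepsilon_n, v_2 \rangle, \ldots, \hat c_p \langle \varepsilon_n, v_p \rangle]^T \]
and
\[ \tilde{\boldsymbol{\eta}}_n = [\hat c_1 \langle \eta_n, v_1 \rangle, \hat c_2 \langle \eta_n, v_2 \rangle, \ldots, \hat c_p \langle \eta_n, v_p \rangle]^T. \]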
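Proof of Lemma 11.3. In light of Lemma 11.2, we must show that the norm of the
difference between
\[ N^{-1} \sum_{n=1}^{N-h} [\hat{\boldsymbol{\varepsilon}}_n + \hat{\boldsymbol{\eta}}_n] [\hat{\boldsymbol{\varepsilon}}_{n+h} + \hat{\boldsymbol{\eta}}_{n+h}]^T \]
and
\[ N^{-1} \sum_{n=1}^{N-h} [\hat{\boldsymbol{\varepsilon}}_n + \hat{\boldsymbol{\eta}}_n + \hat{\boldsymbol{\gamma}}_n] [\hat{\boldsymbol{\varepsilon}}_{n+h} + \hat{\boldsymbol{\eta}}_{n+h} + \hat{\boldsymbol{\gamma}}_{n+h}]^T \]
is $O_P(N^{-1})$.
Writing $\hat{\boldsymbol{\eta}}_n = \hat{\boldsymbol{\eta}}_{n1} + \hat{\boldsymbol{\eta}}_{n2}$ and $\hat{\boldsymbol{\gamma}}_n = \hat{\boldsymbol{\gamma}}_{n1} + \hat{\boldsymbol{\gamma}}_{n2}$, we see that this difference consists
of 20 terms which involve multiplication by $\hat{\boldsymbol{\gamma}}_{n1}$ or $\hat{\boldsymbol{\gamma}}_{n2}$. For example, analogously
to (11.22), the term involving $\varepsilon_n$ and $\gamma_{n+h,1}$ has coordinates
\begin{align*}
& N^{-1} \sum_{n=1}^{N-h} \langle \varepsilon_n, \hat v_i \rangle \langle \gamma_{n+h,1}, \hat v_j \rangle \\
&\quad = N^{-1/2} \int \Big[ \iint N_{\varepsilon,h}(t,u) \sum_{k,\ell=1}^{p} \hat c_k \psi_{k\ell} v_\ell(u) \hat v_i(t)\, dt\, du \Big] [\hat c_k v_k(s) - \hat v_k(s)] \hat v_j(s)\, ds,
\end{align*}
where
\[ N_{\varepsilon,h}(t,u) = N^{-1/2} \sum_{n=1}^{N-h} \varepsilon_n(t) X_{n+h}(u). \]
\[ N^{-1} \sum_{n=1}^{N-h} \langle \varepsilon_n, \hat v_i \rangle \langle \gamma_{n+h,1}, \hat v_j \rangle = O_P(N^{-1}). \]
The other terms can be bounded using similar arguments. The key point is that
by (11.23), all these terms are $N^{1/2}$ times smaller than the other terms appearing in
the decomposition of $N^{-1} \sum_{n=1}^{N-h} \hat{\boldsymbol{\Delta}}_n \hat{\boldsymbol{\Delta}}_{n+h}^T$. □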
\[ R_{N,h}^{(1)} = N^{-1/2} \sum_{n=1}^{N} \varepsilon_n(t) \varepsilon_{n+h}(s), \]
\[ R_{N,h}^{(2)} = N^{-1/2} \sum_{n=1}^{N} \varepsilon_n(t) X_{n+h}(s), \]
\[ R_{N,h}^{(3)} = N^{-1/2} \sum_{n=1}^{N} \varepsilon_{n+h}(t) X_n(s), \]
\[ R_{N,h}^{(4)} = N^{-1/2} \sum_{n=1}^{N} X_n(t) X_{n+h}(s). \]
Lemma 11.5, which follows directly from the CLT in the space $L^2([0,1] \times [0,1])$
and the calculation of the covariances, summarizes the asymptotic behavior of the
processes $R_{N,h}^{(i)}$.
Lemma 11.5. Suppose Assumptions 11.2 and 11.1 and condition (2.12) hold. Then
\[ \big\{ R_{N,h}^{(i)},\ 1 \le i \le 4,\ 1 \le h \le H \big\} \xrightarrow{d} \big\{ \Gamma_h^{(i)},\ 1 \le i \le 4,\ 1 \le h \le H \big\}, \]
where the $\Gamma_h^{(i)}$ are $L^2([0,1] \times [0,1])$–valued jointly Gaussian processes such that the
processes $\big\{ \Gamma_h^{(i)},\ 1 \le i \le 4 \big\}$ are independent and identically distributed.
then
\[ N^{1/2} \{ \mathbf{V}_h,\ 1 \le h \le H \} \xrightarrow{d} \{ \mathbf{T}_h,\ 1 \le h \le H \}, \]
where the $\mathbf{T}_h$, $1 \le h \le H$, are independent identically distributed normal random
matrices. Their covariances can be computed using Lemma 11.1. After lengthy but
straightforward calculations, the following lemma is established.
Lemma 11.6. Suppose Assumptions 11.2 and 11.1 and condition (2.12) hold. If
(11.25) holds, then for any fixed $h > 0$,
where
\[ a(k, \ell; k', \ell') = r_2(k, k') r_2(\ell, \ell') + r_2(k, k') r_1(\ell, \ell') + r_2(\ell, \ell') r_1(k, k') + r_1(k, k') r_1(\ell, \ell'), \]
with
\[ r_1(\ell, \ell') = \sum_{j=p+1}^{\infty} \lambda_j \psi_{\ell j} \psi_{\ell' j} \]
and
\[ r_2(k, k') = \iint E[\varepsilon_1(t) \varepsilon_1(s)] v_k(t) v_{k'}(s)\, dt\, ds. \]
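\begin{align*}
\varepsilon_{nk}^{\wedge} &= \langle Y_n, \hat v_k \rangle - \sum_{j=1}^{p} \tilde\psi_{kj}^{\wedge} \langle X_n, \hat v_j \rangle \\
&\approx \hat c_k \langle Y_n, v_k \rangle - \sum_{j=1}^{p} \hat c_k \psi_{kj} \hat c_j\, \hat c_j \langle X_n, v_j \rangle \\
&= \hat c_k \Big( \langle Y_n, v_k \rangle - \sum_{j=1}^{p} \psi_{kj} \langle X_n, v_j \rangle \Big) \\
&= \hat c_k \Big( \langle \varepsilon_n, v_k \rangle + \sum_{j=p+1}^{\infty} \psi_{kj} \langle X_n, v_j \rangle \Big).
\end{align*}
\[ = r_1(k, k') + r_2(k, k'). \]
Therefore, defining
\[ \hat a(k, k', \ell, \ell') = \Big( \frac{1}{N} \sum_{n=1}^{N} \varepsilon_{nk}^{\wedge} \varepsilon_{nk'}^{\wedge} \Big) \Big( \frac{1}{N} \sum_{n=1}^{N} \varepsilon_{n\ell}^{\wedge} \varepsilon_{n\ell'}^{\wedge} \Big), \]
we see that
\[ \hat a(k, k', \ell, \ell') \approx \hat c_k \hat c_{k'} \hat c_\ell \hat c_{\ell'}\, a(k, k', \ell, \ell'). \tag{11.26} \]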
\[ \mathbf{M} = [\, \mathbf{A}(i,j),\ 1 \le i, j \le p \,], \]
where
\[ \mathbf{A}(i,j) = [\, a(\ell, i, k, j),\ 1 \le \ell, k \le p \,]. \]
By (11.26), an estimator of $\mathbf{M}$ is
\[ \widehat{\mathbf{M}} = \big[\, \widehat{\mathbf{M}}(i,j),\ 1 \le i, j \le p \,\big], \]
where
\[ \widehat{\mathbf{M}}(i,j) = [\, \hat a(\ell, i, k, j),\ 1 \le \ell, k \le p \,]. \]
Direct verification shows that $\widehat{\mathbf{M}}$ can be written in the form (11.13), which is convenient
for coding.
As seen from (11.26), it cannot be guaranteed that the matrix $\widehat{\mathbf{M}}$ will be close
to the matrix $\mathbf{M}$ because of the unknown signs $\hat c_i$. However, as will be seen in the
proof of Theorem 11.2, statistic (11.14) does not depend on these signs.
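\[ = [\widehat{\mathbf{C}} \otimes \widehat{\mathbf{C}}]\, \mathrm{vec} \Big( N^{-1} \sum_{n=1}^{N-h} [\boldsymbol{\varepsilon}_n + \boldsymbol{\eta}_n] [\boldsymbol{\varepsilon}_{n+h} + \boldsymbol{\eta}_{n+h}]^T \Big) + o_P(1), \]
and where
\[ \boldsymbol{\varepsilon}_n = [\langle \varepsilon_n, v_1 \rangle, \langle \varepsilon_n, v_2 \rangle, \ldots, \langle \varepsilon_n, v_p \rangle]^T; \]
\[ \boldsymbol{\eta}_n = [\langle \eta_n, v_1 \rangle, \langle \eta_n, v_2 \rangle, \ldots, \langle \eta_n, v_p \rangle]^T. \]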
11.8 Bibliographical notes 223
In particular, we see that the asymptotic distribution of $Q_N^{\wedge}$ does not depend on the
signs $\hat c_1, \hat c_2, \ldots, \hat c_p$ (the same argument shows that $Q_N^{\wedge}$ itself does not depend on
these signs), so we may assume that they are all equal to 1. The claim then follows
from Lemmas 11.5 and 11.6. □
There are relatively few papers dealing with goodness-of-fit testing in the functional
linear model, see Section 8.6. We have often used the methodology of Chiou and
Müller (2007), who emphasize the role of the functional residuals $\hat\varepsilon_i(t) = \hat Y_i(t) - Y_i(t)$,
where the $Y_i(t)$ are the response curves and the $\hat Y_i(t)$ are the fitted curves,
and propose a number of graphical tools, akin to the usual residual plots. They also
propose a test statistic based on Cook’s distance, Cook (1977) or Cook and Weisberg
(1982), whose null distribution can be computed by randomizing a binning scheme.
In the context of scalar data, Cochrane and Orcutt (1949) drew attention to the
presence of serial correlation in the errors of models for economic time series, and
investigated the effect of this correlation by means of simulations. Their paper is
one of the first contributions advocating the use of simulation to study the behavior
of statistical procedures. In the absence of a computer, they used tables of uniformly
distributed random integers from 1 to 99 to construct a large number of tables similar
to Table 11.1, but for more complex regression and dependence settings.
Tests for serial correlation in the standard scalar linear regression were developed
by Durbin and Watson (1950, 1951, 1971), see also Chatfield (1998) and Section
10.4.4 of Seber and Lee (2003). Their statistics are functions of sample autocorre-
lations of the residuals, but their asymptotic distributions depend on the distribu-
tion of the regressors, and so various additional steps and rough approximations are
required, see Thiel and Nagar (1961) and Thiel (1965), among others. To overcome
these difficulties, Schmoyer (1994) proposed permutation tests based on quadratic
forms of the residuals.
Textbook treatments addressing correlation in regression errors are available in
Chapters 9 and 10 of Seber and Lee (2003); a good summary is given in Section
5.5 of Shumway and Stoffer (2006). The general idea is that when dependence in
errors is detected, it must be modeled, and inference must be suitably adjusted. The
relevant research is very extensive, so we mention only the influential papers of
Sacks and Ylvisaker (1966) and Rao and Griliches (1969). Opsomer et al. (2001)
and Xiao et al. (2003) consider a nonparametric regression $Y_t = m(X_t) + \varepsilon_t$.
Several other variants of the Functional CAPM and their predictive power are
examined in Kokoszka and Zhang (2011).
As briefly discussed in Chapter 8, there are many possible departures from the
specification of a scalar regression model. In addition to error autocorrelation, one
may test the specification of the error distribution function or the parametric form
of the regression function. Koul (2002) provides an exhaustive theoretical treatment
of such issues.
Chapter 12
A test of significance in functional quadratic
regression
where $X_n^c(t) = X_n(t) - E(X_n(t))$ is the centered predictor process. If $h(s,t) = 0$,
then $\mu = E(Y_n)$ and (12.1) reduces to the functional linear model
\[ Y_n = \mu + \int k(t) X_n^c(t)\, dt + \varepsilon_n. \tag{12.2} \]
To test the significance of the quadratic term in (12.1), we test the null hypothesis
\[ H_0 : h(s,t) = 0, \tag{12.3} \]
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 225
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_12,
© Springer Science+Business Media New York 2012
226 12 A test of significance in functional quadratic regression
and
\[ k(t) = \sum_{i=1}^{\infty} b_i v_i(t). \tag{12.7} \]
We estimate the mean, $\mu_X(t)$, of the predictor process and the associated covariance
function, $C(t,s)$, with the corresponding empiricals
\[ \bar X_N(t) = \frac{1}{N} \sum_{n=1}^{N} X_n(t) \]
and
\[ \hat C(t,s) = \frac{1}{N} \sum_{n=1}^{N} \big( X_n(t) - \bar X_N(t) \big) \big( X_n(s) - \bar X_N(s) \big). \]
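where
\begin{align*}
\varepsilon_n^{**} &= \varepsilon_n + \sum_{i=p+1}^{\infty} b_i \langle X_n^c, v_i \rangle + \sum_{i=p+1}^{\infty} \sum_{j=i}^{\infty} (2 - 1\{i = j\}) a_{i,j} \langle X_n^c, v_i \rangle \langle X_n^c, v_j \rangle \\
&\quad + \sum_{i=1}^{p} \sum_{j=p+1}^{\infty} 2 a_{i,j} \langle X_n^c, v_i \rangle \langle X_n^c, v_j \rangle + \sum_{i=1}^{p} b_i \langle X_n^c, v_i - \hat c_i \hat v_i \rangle \\
&\quad + \sum_{i=1}^{p} b_i \langle \bar X_N - \mu_X, \hat c_i \hat v_i \rangle - \sum_{i=1}^{p} \sum_{j=i}^{p} (2 - 1\{i = j\}) a_{i,j} \\
&\qquad \times \big( \langle X_n - \bar X_N, \hat c_i \hat v_i \rangle \langle X_n - \bar X_N, \hat c_j \hat v_j \rangle - \langle X_n^c, v_i \rangle \langle X_n^c, v_j \rangle \big).
\end{align*}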
12.1 Testing procedure 227
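Then
\[ \mathbf{Y} = \hat{\mathbf{Z}} \begin{bmatrix} \tilde{\mathbf{A}} \\ \tilde{\mathbf{B}} \\ \mu \end{bmatrix} + \boldsymbol{\varepsilon}^{**}, \tag{12.9} \]
where
\[ \mathbf{Y} = [Y_1, Y_2, \ldots, Y_N]^T, \]
\[ \tilde{\mathbf{A}} = \mathrm{vech}\big( \{ \hat c_i \hat c_j a_{i,j} (2 - 1\{i = j\}),\ 1 \le i \le j \le p \}^T \big), \]
\[ \tilde{\mathbf{B}} = [\hat c_1 b_1, \hat c_2 b_2, \ldots, \hat c_p b_p]^T, \]
\[ \boldsymbol{\varepsilon}^{**} = [\varepsilon_1^{**}, \varepsilon_2^{**}, \ldots, \varepsilon_N^{**}]^T, \]
and
\[ \hat{\mathbf{Z}} = \begin{bmatrix} \hat{\mathbf{D}}_1^T & \hat{\mathbf{F}}_1^T & 1 \\ \hat{\mathbf{D}}_2^T & \hat{\mathbf{F}}_2^T & 1 \\ \vdots & \vdots & \vdots \\ \hat{\mathbf{D}}_N^T & \hat{\mathbf{F}}_N^T & 1 \end{bmatrix} \]
with
\[ \hat{\mathbf{D}}_n = \mathrm{vech}\big( \{ \langle \hat v_i, X_n - \bar X_N \rangle \langle \hat v_j, X_n - \bar X_N \rangle,\ 1 \le i \le j \le p \}^T \big), \]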
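We estimate $\tilde{\mathbf{A}}$, $\tilde{\mathbf{B}}$, and $\mu$ using the least squares estimator:
\[ \begin{bmatrix} \hat{\mathbf{A}} \\ \hat{\mathbf{B}} \\ \hat\mu \end{bmatrix} = \big( \hat{\mathbf{Z}}^T \hat{\mathbf{Z}} \big)^{-1} \hat{\mathbf{Z}}^T \mathbf{Y}. \tag{12.10} \]
To represent elements of $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$, we will use the notation that
$\hat{\mathbf{A}} = \mathrm{vech}(\{ \hat a_{i,j} (2 - 1\{i = j\}),\ 1 \le i \le j \le p \}^T)$ and $\hat{\mathbf{B}} = [\hat b_1, \hat b_2, \ldots, \hat b_p]^T$.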
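We expect, under $H_0$, that $\hat{\mathbf{A}}$ will be close to zero since $\tilde{\mathbf{A}}$ is zero. If $H_0$ is not
correct, we expect the magnitude of $\hat{\mathbf{A}}$ to be relatively large. Let
\[ \hat{\mathbf{G}} = \frac{1}{N} \sum_{n=1}^{N} \hat{\mathbf{D}}_n \hat{\mathbf{D}}_n^T, \]
\[ \hat{\mathbf{M}} = \frac{1}{N} \sum_{n=1}^{N} \hat{\mathbf{D}}_n, \]
and
\[ \hat\tau^2 = \frac{1}{N} \sum_{n=1}^{N} \hat\varepsilon_n^2, \]
where
\[ \hat\varepsilon_n = Y_n - \hat\mu - \sum_{i=1}^{p} \hat b_i \langle X_n - \bar X_N, \hat v_i \rangle - \sum_{i=1}^{p} \sum_{j=i}^{p} (2 - 1\{i = j\}) \hat a_{i,j} \langle X_n - \bar X_N, \hat v_i \rangle \langle X_n - \bar X_N, \hat v_j \rangle. \]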
Assumption 12.2.
\[ E \Big( \int X_n^2(t)\, dt \Big)^4 < \infty. \]
and
Assumption 12.4. The sequences $\{\varepsilon_n\}$ and $\{X_n(t)\}$ are independent.
Assumption 12.5.
\[ \lambda_1 > \lambda_2 > \cdots > \lambda_{p+1}. \]
The last condition is standard in functional data analysis. It implies that the
eigenfunctions $v_1, v_2, \ldots, v_p$ are unique up to a sign.
where $\varphi_\ell$ are orthonormal functions. Assumption 12.1 can be replaced with the
requirement that $\xi_{n,1}, \xi_{n,2}, \ldots, \xi_{n,p}$ are independent with $E\xi_{n,\ell} = 0$ and $E\xi_{n,\ell}^3 = 0$
for all $1 \le \ell \le p$.
Our last result provides a simple condition for the consistency of the test based
on $U_N$. Let $\mathbf{A} = \mathrm{vech}(\{ a_{i,j} (2 - 1\{i = j\}),\ 1 \le i \le j \le p \}^T)$, i.e. the first
$r = p(p+1)/2$ coefficients in the expansion of $h$ in (12.6).
Theorem 12.2. If (12.4), (12.5), Assumptions 12.1–12.5 are satisfied and $\mathbf{A} \ne \mathbf{0}$,
then we have that
\[ U_N \xrightarrow{P} \infty. \]
The condition $\mathbf{A} \ne \mathbf{0}$ means that $h$ is not the 0 function in the space spanned by
the functions $v_i(t) v_j(s)$, $1 \le i, j \le p$.
In this section we apply our test to the Tecator data set available at
http://lib.stat.cmu.edu/datasets/tecator. This data set is studied in Ferraty
and Vieu (2006). The Tecator Infratec food and feed analyzer was used on 240 samples
of finely chopped pure meat with different fat contents. For each sample of meat, a
100 channel spectrum of absorbances was recorded. These absorbances can be thought of
as a discrete approximation to the continuous record, $X_n(t)$. Also, for each sample of
meat, the fat content, $Y_n$, was measured by analytic chemistry.
The absorbance curve measured from the $n$th meat sample is given by $X_n(t) = \log_{10}(I_0 / I)$,
where $t$ is the wavelength of the light, $I_0$ is the intensity of the light
before passing through the meat sample, and $I$ is the intensity of the light after it
passes through the meat sample. The Tecator Infratec food and feed analyzer measured
absorbance at 100 different wavelengths between 850 and 1050 nanometers.
This gives the values of $X_n(t)$ on a discrete grid from which we can use cubic
splines to interpolate the values anywhere within the interval.
Yao and Müller (2010) proposed using a functional quadratic model to predict
the fat content, $Y_n$, of a meat sample based on its absorbance spectrum, $X_n(t)$.
We are interested in determining whether the quadratic term in (12.1) is needed
by testing its significance for this data set. From the data, we calculate $U_{240}$. The
p-value is then $P(\chi^2(r) > U_{240})$. The test statistic and hence the p-value are
influenced by the number of principal components that we choose to keep. If we select $p$
according to the advice of Ramsay and Silverman (2005), we will keep only $p = 1$
principal component because this explains more than 85% of the variation between
absorbance curves in the sample. Table 12.1 gives p-values obtained using $p = 1$, 2,
and 3 principal components, which strongly supports that the quadratic regression
provides a better model for the Tecator data.
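The p-value computation described above can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names `chi2_sf` and `quadratic_test_pvalue` are hypothetical, the statistic $U_N$ is taken as already computed, and the chi-square tail is evaluated with the standard closed-form expressions for integer degrees of freedom, with $r = p(p+1)/2$ as in the text.

```python
import math

def chi2_sf(x, df):
    """Survival function P(chi2(df) > x) for integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{k=0}^{m-1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{k=0}^{m-1} y^(k+1/2) / Gamma(k+3/2)
    total = math.erfc(math.sqrt(y))
    for k in range(df // 2):
        total += math.exp(-y) * y ** (k + 0.5) / math.gamma(k + 1.5)
    return total

def quadratic_test_pvalue(u_stat, p):
    """p-value P(chi2(r) > U_N) with r = p(p+1)/2 degrees of freedom."""
    r = p * (p + 1) // 2
    return chi2_sf(u_stat, r)
```

For instance, `quadratic_test_pvalue(u, 2)` uses $r = 3$ degrees of freedom, which makes explicit how the reference distribution changes with the number of retained principal components.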
Table 12.1 p-values (in %) obtained by applying our testing procedure to the Tecator data set with
p = 1, 2, and 3 principal components.
p          1       2       3
p-value    1.25    13.15   0.00
Since Theorem 12.1 assumes that the $X_n$’s are Gaussian, we now check the plausibility
of this assumption for the absorbance spectra. If the $X_n$’s are Gaussian
processes, then the projections $\langle X_n, v_i \rangle$ would be normally distributed. Using the
Shapiro–Wilk test for normality, we conclude that the first projection, $\langle X_n, \hat v_1 \rangle$, is
not normally distributed (p-value $= 3.15 \times 10^{-7}$). The Box–Cox family of transformations
is commonly employed to transform data to be more like the realizations of
a normal random variable:
\[ f(X(t)) = \frac{(X(t) + \omega_2)^{\omega_1} - 1}{\omega_1}. \tag{12.11} \]
We apply the Box–Cox transformation with $\omega_1 = .0204$ and $\omega_2 = 1.6539$.
We can now verify the plausibility of the Gaussian assumption for the transformed
spectra by testing the first projection (p-value $= 0.38$). If we now apply our test of
the significance of the quadratic term for the transformed data, we get a p-value
which is essentially zero using $p = 1$, 2, or 3 principal components. This strongly
supports that the quadratic regression provides a better model for the transformed
Tecator data.
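As a small illustration of transformation (12.11), the sketch below applies the shifted Box–Cox map pointwise to a discretized curve. The function names are hypothetical, and the logarithmic branch for $\omega_1 = 0$ is the usual limiting convention for Box–Cox transformations rather than something stated in (12.11).

```python
import math

def box_cox_shifted(x, omega1, omega2):
    """Shifted Box-Cox value ((x + omega2)^omega1 - 1) / omega1, cf. (12.11)."""
    shifted = x + omega2
    if shifted <= 0:
        raise ValueError("x + omega2 must be positive")
    if omega1 == 0:
        # Conventional limit of the family as omega1 -> 0 (an assumption here).
        return math.log(shifted)
    return (shifted ** omega1 - 1.0) / omega1

def transform_curve(values, omega1, omega2):
    """Apply the transformation to every grid value of one curve."""
    return [box_cox_shifted(v, omega1, omega2) for v in values]
```

Each absorbance curve observed on its wavelength grid would be passed through `transform_curve` before the projections are recomputed and tested for normality.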
1 X
N
1
D p "n G MMT On M :
D
N nD1
1 X
N
1 D 1
p "n G MMT O n M !
D N 0; 2 G MMT ;
N nD1
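where
\begin{align*}
\tau^2 = \mathrm{Var} \Big( & \varepsilon_n + \sum_{i=p+1}^{\infty} b_i \langle X_n^c, v_i \rangle + \sum_{i=p+1}^{\infty} \sum_{j=i}^{\infty} (2 - 1\{i = j\}) a_{i,j} \langle X_n^c, v_i \rangle \langle X_n^c, v_j \rangle \\
& + \sum_{i=1}^{p} \sum_{j=p+1}^{\infty} 2 a_{i,j} \langle X_n^c, v_i \rangle \langle X_n^c, v_j \rangle \Big).
\end{align*}
and
\[ \hat{\mathbf{M}} \hat{\mathbf{M}}^T - \mathbf{M} \mathbf{M}^T = o_P(1), \]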
This chapter studies the functional autoregressive (FAR) process which has found
many applications. The theory of autoregressive and more general linear processes
in Hilbert and Banach spaces is developed in the monograph of Bosq (2000), on
which Sections 13.1 and 13.2 are based. We present only a few selected results
which provide an introduction to the central ideas, and are needed in the sequel.
Section 13.3 is devoted to prediction by means of the FAR process; some theoretical
background is given in Section 13.5.
We say that a sequence $\{X_n,\ -\infty < n < \infty\}$ of mean zero elements of $L^2$
follows a functional AR(1) model if
\[ X_n = \Psi(X_{n-1}) + \varepsilon_n, \tag{13.1} \]
where $\Psi \in \mathcal{L}$ and $\{\varepsilon_n,\ -\infty < n < \infty\}$ is a sequence of iid mean zero errors in $L^2$
satisfying $E\|\varepsilon_n\|^2 < \infty$.
The above definition defines a somewhat narrower class of processes than that
considered by Bosq (2000), who does not assume that the $\varepsilon_n$ are iid, but rather that
they are uncorrelated in an appropriate Hilbert space sense, see his Definitions 3.1
and 3.2. The theory of estimation for the process (13.1) is however developed only
under the assumption that the errors are iid.
To lighten the notation, we set in this chapter $\|\cdot\|_{\mathcal{L}} = \|\cdot\|$.
13.1 Existence
A scalar AR(1) process $\{X_n,\ -\infty < n < \infty\}$ is said to be causal if it admits the
expansion
\[ X_n = \sum_{j=0}^{\infty} c_j \varepsilon_{n-j}. \]
The $X_n$ depends then only on the present and past errors, but not on the future
ones. If $|\psi| < 1$, scalar AR(1) equations have a unique solution of this form, in
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 235
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_13,
© Springer Science+Business Media New York 2012
236 13 Functional autoregressive model
Lemma 13.1. For any $\Psi \in \mathcal{L}$, the following two conditions are equivalent:
C0: There exists an integer $j_0$ such that $\|\Psi^{j_0}\| < 1$.
C1: There exist $a > 0$ and $0 < b < 1$ such that for every $j \ge 0$, $\|\Psi^j\| \le a b^j$.
Proof. Since C1 clearly implies C0, we must only show that C0 implies C1.
Write $j = j_0 q + r$ for some $q \ge 0$ and $0 \le r < j_0$. Therefore,
\[ \|\Psi^j\| = \|\Psi^{j_0 q} \Psi^r\| \le \|\Psi^{j_0}\|^q \|\Psi^r\|. \]
If $\|\Psi^{j_0}\| = 0$, then C1 holds for any $a > 0$ and $0 < b < 1$, so we assume in the
following that $\|\Psi^{j_0}\| > 0$. Since $q > j/j_0 - 1$ and $\|\Psi^{j_0}\| < 1$, we obtain
\[ \|\Psi^j\| \le \|\Psi^{j_0}\|^{j/j_0 - 1} \|\Psi^r\| \le \big( \|\Psi^{j_0}\|^{1/j_0} \big)^j \|\Psi^{j_0}\|^{-1} \max_{0 \le r < j_0} \|\Psi^r\|, \]
Note that condition C0 is weaker than the condition $\|\Psi\| < 1$; in the scalar case
these two conditions are clearly equivalent. Nevertheless, C1 is a sufficiently strong
condition to ensure the convergence of the series $\sum_j \Psi^j(\varepsilon_{n-j})$, and the existence of
a stationary causal solution to functional AR(1) equations, as stated in the following
theorem.
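A finite-dimensional toy example may help to see why C0 is weaker than $\|\Psi\| < 1$. The sketch below is an illustration only: the $2 \times 2$ matrix `PSI` is a hypothetical stand-in for the operator $\Psi$, with operator norm 2 but $\|\Psi^2\| = 0.2$, so C0 holds with $j_0 = 2$. The constants $a$ and $b$ are taken from the final bound in the proof of Lemma 13.1.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(a):
    """Operator norm of a 2x2 matrix: sqrt of the largest eigenvalue of a^T a."""
    s11 = a[0][0] ** 2 + a[1][0] ** 2
    s22 = a[0][1] ** 2 + a[1][1] ** 2
    s12 = a[0][0] * a[0][1] + a[1][0] * a[1][1]
    tr, det = s11 + s22, s11 * s22 - s12 ** 2
    disc = max(tr * tr / 4.0 - det, 0.0)
    return math.sqrt(tr / 2.0 + math.sqrt(disc))

def power_norms(psi, jmax):
    """Return [||psi^0||, ||psi^1||, ..., ||psi^jmax||]."""
    norms, power = [], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(jmax + 1):
        norms.append(op_norm(power))
        power = matmul(power, psi)
    return norms

PSI = [[0.0, 2.0], [0.1, 0.0]]   # hypothetical example: ||PSI|| = 2 > 1
norms = power_norms(PSI, 8)
j0 = 2                            # ||PSI^2|| = 0.2 < 1, so C0 holds
b = norms[j0] ** (1.0 / j0)       # b = ||PSI^j0||^(1/j0)
a = max(norms[:j0]) / norms[j0]   # a = ||PSI^j0||^(-1) * max_{r < j0} ||PSI^r||
```

With these values the geometric bound $\|\Psi^j\| \le a b^j$ of C1 can be checked directly against the entries of `norms`.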
Proof. To establish the existence of the limit of the infinite series, we work with
the space of square integrable random functions in $L^2 = L^2([0,1])$, see Section 2.3.
If the random functions are defined on a probability space $\Omega$, then
we work with $L^2(\Omega, L^2([0,1]))$, which is a Hilbert space with the inner product
$E\langle X, Y \rangle$, $X, Y \in L^2([0,1])$. Thus, to show that the sequence
$X_n^{(m)} = \sum_{j=0}^{m} \Psi^j(\varepsilon_{n-j})$ has a limit in $L^2(\Omega, L^2([0,1]))$, it suffices to check that it is a
Cauchy sequence in $m$ for every fixed $n$.
Note that $E\Psi^j(\varepsilon_{n-j}) = 0$ because the expectation commutes with bounded operators.
Therefore, by Lemma 13.1,
\[ E \Big\| \sum_{j=m}^{m'} \Psi^j(\varepsilon_{n-j}) \Big\|^2 \le \Big( \sum_{j=m}^{m'} \|\Psi^j\|^2 \Big) E\|\varepsilon_0\|^2 \le E\|\varepsilon_0\|^2 a^2 \sum_{j=m}^{m'} b^{2j}. \]
Thus $X_n^{(m)}$ converges in $L^2(\Omega, L^2([0,1]))$.
To show the a.s. convergence, it is enough to verify that
\[ \sum_{j=0}^{\infty} \|\Psi^j(\varepsilon_{n-j})\| < \infty \quad \text{a.s.} \]
and so $\sum_{j=0}^{\infty} \|\Psi^j\| \|\varepsilon_{n-j}\| < \infty$ a.s.
The series (13.2) is clearly strictly stationary, and it satisfies equation (13.1).
Suppose $\{X_n'\}$ is another strictly stationary causal sequence satisfying (13.1). Then,
iterating (13.1), we obtain, for any $m \ge 1$,
\[ X_n' = \sum_{j=0}^{m} \Psi^j(\varepsilon_{n-j}) + \Psi^{m+1}(X_{n-m-1}'). \]
Therefore,
which satisfies
\[ \iint \psi^2(t,s)\, dt\, ds < 1. \tag{13.4} \]
Recall from Section 2.2 that the left–hand side of (13.4) is equal to $\|\Psi\|_S^2$. Since
$\|\Psi\| \le \|\Psi\|_S$, we see that (13.4) implies condition C0 of Lemma 13.1 with $j_0 = 1$.
13.2 Estimation
This section is devoted to the estimation of the autoregressive operator $\Psi$, but first
we state Theorem 13.2 on the convergence of the EFPC’s and the corresponding
eigenvalues, which states that the expected distances between the population and
the sample eigenelements are $O(N^{-1/2})$, just as for independent functional observations.
Theorem 13.2 follows from Example 16.1, Theorem 16.2 and Lemma 13.1.
and denote with superscript $T$ the adjoint operator. Then, $C_1^T = C\Psi^T$ because, by
a direct verification, $C_1^T = E[\langle X_n, x \rangle X_{n-1}]$, i.e.
\[ C_1 = \Psi C. \tag{13.5} \]
The above identity is analogous to the scalar case, so we would like to obtain an
estimate of $\Psi$ by using a finite sample version of the relation $\Psi = C_1 C^{-1}$. The
operator $C$ does not however have a bounded inverse on the whole of $H$. To see
it, recall that $C$ admits representation (2.4), which implies that $C^{-1}(C(x)) = x$,
where
\[ C^{-1}(y) = \sum_{j=1}^{\infty} \lambda_j^{-1} \langle y, v_j \rangle v_j. \]
The operator $C^{-1}$ is defined if all $\lambda_j$ are positive. (If $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > \lambda_{p+1} = 0$,
then $\{X_n\}$ is in the space spanned by $\{v_1, \ldots, v_p\}$. On this subspace, we
can define $C^{-1}$ by $C^{-1}(y) = \sum_{j=1}^{p} \lambda_j^{-1} \langle y, v_j \rangle v_j$.) Since $\|C^{-1}(v_n)\| = \lambda_n^{-1} \to \infty$,
as $n \to \infty$, it is unbounded. This makes it difficult to estimate the bounded
operator $\Psi$ using the relation $\Psi = C_1 C^{-1}$. A practical solution is to use only the
first $p$ most important EFPC’s $\hat v_j$, and to define
\[ \widehat{IC}_p(x) = \sum_{j=1}^{p} \hat\lambda_j^{-1} \langle x, \hat v_j \rangle \hat v_j. \]
\[ = \frac{1}{N-1} \sum_{k=1}^{N-1} \sum_{j=1}^{p} \hat\lambda_j^{-1} \langle x, \hat v_j \rangle \langle X_k, \hat v_j \rangle X_{k+1}. \]
All quantities at the right–hand side of (13.7) are available as output of the R
function pca.fd, so this estimator is very easy to compute. Kokoszka and Zhang
(2010) conducted a number of numerical experiments to determine how close the
estimated surface $\hat\psi_p(t,s)$ is to the surface $\psi(t,s)$ used to simulate an FAR(1) process.
Broadly speaking, for $N \le 100$, the discrepancies are very large, both in
magnitude and in shape. This is illustrated in Figure 13.1, which shows the Gaussian
kernel $\psi(t,s) = \alpha \exp\{-(t^2 + s^2)/2\}$, with $\alpha$ chosen so that the Hilbert–Schmidt
norm of $\psi$ is $1/2$, and three estimates which use $p = 2, 3, 4$. The innovations
$\varepsilon_n$ were generated as Brownian bridges. Such discrepancies are observed
for other kernels and other innovation processes as well. Moreover, by any reasonable
measure of a distance between two surfaces, the distance between $\psi$ and $\hat\psi_p$
increases as $p$ increases. This is counterintuitive because by using more EFPC’s $\hat v_j$,
we would expect the approximation (13.7) to improve. For the FAR(1) used to produce
Figure 13.1, the sums $\sum_{j=1}^{p} \hat\lambda_j$ explain, respectively, 74, 83 and 87 percent of
the variance for $p = 2, 3$ and 4, but (for the series length $N = 100$), the absolute
deviation distances between $\psi$ and $\hat\psi_p$ are 0.40, 0.44 and 0.55. The same pattern is
observed for the RMSE distance $\|\hat\psi - \psi\|_S$ and the relative absolute distance. As $N$
increases, these distances decrease, but their tendency to increase with $p$ remains.
This problem is partially due to the fact that for many FAR(1) models, the estimated
eigenvalues $\hat\lambda_j$ are very small, except $\hat\lambda_1$ and $\hat\lambda_2$, and so a small error in their estimation
translates to a large error in the reciprocals $\hat\lambda_j^{-1}$ appearing in (13.7). Kokoszka
and Zhang (2010) show that this problem can be alleviated to some extent by adding
a positive baseline to the $\hat\lambda_j$. However, as we will see in Section 13.3, precise estimation
of the kernel $\psi$ is not necessary to obtain satisfactory predictions.
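The sensitivity of the reciprocals to small eigenvalues can be seen in a short numerical sketch. This is an illustration under assumptions, not a reproduction of the cited experiments: the eigenvalue sequence $1/(j\pi)^2$ (the eigenvalues of the Brownian bridge covariance) is used purely as an example of a fast-decaying spectrum, and the same hypothetical estimation error `delta` is applied to every eigenvalue.

```python
import math

def example_eigenvalue(j):
    """Illustrative fast-decaying spectrum: 1 / (j * pi)^2 (an assumption)."""
    return 1.0 / (j * math.pi) ** 2

def reciprocal_error(lam, delta):
    """Absolute error in 1/lambda when lambda is estimated as lambda + delta."""
    return abs(1.0 / (lam + delta) - 1.0 / lam)

delta = 0.005  # hypothetical estimation error, identical for every eigenvalue
errors = [reciprocal_error(example_eigenvalue(j), delta) for j in range(1, 5)]
```

The entries of `errors` grow quickly with $j$, which mirrors the observation that including more EFPC's can make the estimated kernel worse rather than better.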
13.3 Prediction
In this section, we discuss finite sample properties of forecasts with the FAR(1)
model. Besse et al. (2000) compare several prediction methods for functional time
series by application to real geophysical data. Their conclusion is that the method
which we call below Estimated Kernel performs better than the “non–functional”
methods rooted in classical time series analysis. A different approach to prediction
of functional data was proposed by Antoniadis et al. (2006). In this section,
we mostly report the findings of Didericksen et al. (2011), whose simulation study
includes a new method proposed by Kargin and Onatski (2008), which we call below
Predictive Factors, and which seeks to replace the FPC’s by directions which are
most relevant for predictions.
Fig. 13.1 The kernel surface $\psi(t,s)$ (top left) and its estimates $\hat\psi_p(t,s)$ for $p = 2, 3, 4$.
We begin by describing the prediction methods we compare. This is followed by
the discussion of their finite sample properties.
Estimated Kernel (EK). This method uses estimator (13.7). The predictions are
calculated as
\[ \hat X_{n+1}(t) = \int \hat\psi_p(t,s) X_n(s)\, ds = \sum_{k=1}^{p} \Big( \sum_{\ell=1}^{p} \hat\psi_{k\ell} \langle X_n, \hat v_\ell \rangle \Big) \hat v_k(t), \tag{13.8} \]
where
\[ \hat\psi_{ji} = \hat\lambda_i^{-1} (N-1)^{-1} \sum_{n=1}^{N-1} \langle X_n, \hat v_i \rangle \langle X_{n+1}, \hat v_j \rangle. \tag{13.9} \]
There are several variants of this method which depend on where and what kind
of smoothing is applied. In our implementation, all curves are converted to func-
tional objects in R using 99 Fourier basis functions. The same minimal smoothing
is used for the Predictive Factors method.
Predictive Factors (PF). Estimator (13.7) is not directly justified by the problem
of prediction; it is based on FPC’s, which may focus on the features of the data
that are not relevant to prediction. An approach known as predictive factors may
(potentially) be better suited for forecasting. It finds directions most relevant to
prediction, rather than explaining the variability, as the FPC’s do. Roughly speaking, it
focuses on the optimal expansion of $\Psi(X_n)$, which is, theoretically, the best predictor
of $X_{n+1}$, rather than the optimal expansion of $X_n$. Since $\Psi$ is unknown, Kargin
and Onatski (2008) developed a way of approximating such an expansion in finite
samples. We describe this approach in Section 13.4. Its practical implementation
depends on choosing an integer $k$ and a positive number $\alpha$. We used $k = p$ (the
same as the number of the EFPC’s), and $\alpha = 0.75$, as recommended by Kargin and
Onatski (2008).
We selected five prediction methods for comparison, two of which do not use
the autoregressive structure. To obtain further insights, we also included the errors
obtained by assuming perfect knowledge of the operator $\Psi$. For ease of reference,
we now describe these methods, and introduce some convenient notation.
MP (Mean Prediction) We set $\hat X_{n+1}(t) = 0$. Since the simulated curves have mean
zero at every $t$, this corresponds to using the mean function as a predictor. This
predictor is optimal if the data are uncorrelated.
NP (Naive Prediction) We set $\hat X_{n+1} = X_n$. This method does not attempt to
model temporal dependence. It is included to see how much can be gained
by utilizing the autoregressive structure of the data.
EX (Exact) We set $\hat X_{n+1} = \Psi(X_n)$. This is not really a prediction method because
the autoregressive operator $\Psi$ is unknown. It is included to see if poor predictions
might be due to poor estimation of $\Psi$ (cf. Section 13.2).
EK (Estimated Kernel) This method is described above.
EKI (Estimated Kernel Improved) This is method EK, but the $\hat\lambda_i$ in (13.9) are
replaced by $\hat\lambda_i + \hat b$, as described in Section 13.2.
PF (Predictive Factors) This method is introduced above and described in detail
in Section 13.4.
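Methods EK and EKI can be sketched in terms of the EFPC scores alone. The code below is a minimal illustration, assuming the scores $\langle X_n, \hat v_i \rangle$ and the eigenvalues $\hat\lambda_i$ have already been computed (for example by a functional PCA routine); the function names and the `baseline` argument, which plays the role of $\hat b$, are hypothetical.

```python
def estimate_psi(scores, eigenvalues, baseline=0.0):
    """Coefficients psi_hat[j][i] as in (13.9).

    scores[n][i] holds <X_n, v_hat_i> for n = 0..N-1 (0-based), i = 0..p-1.
    baseline = 0 gives method EK; a positive baseline mimics method EKI.
    """
    n_obs, p = len(scores), len(eigenvalues)
    psi = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for i in range(p):
            lagged = sum(scores[n][i] * scores[n + 1][j] for n in range(n_obs - 1))
            psi[j][i] = lagged / ((n_obs - 1) * (eigenvalues[i] + baseline))
    return psi

def predict_scores(psi, last_scores):
    """Coefficients of v_hat_k in the predictor (13.8) of the next curve."""
    p = len(last_scores)
    return [sum(psi[k][l] * last_scores[l] for l in range(p)) for k in range(p)]
```

The predicted curve is then the linear combination of the EFPC's $\hat v_k(t)$ with the coefficients returned by `predict_scores`, evaluated on whatever grid the $\hat v_k$ are stored on.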
Didericksen et al. (2011) studied the errors $E_n$ and $R_n$, $N - 50 < n < N$, defined
by
\[ E_n = \sqrt{ \int_0^1 \big( X_n(t) - \hat X_n(t) \big)^2 dt } \quad \text{and} \quad R_n = \int_0^1 \big| X_n(t) - \hat X_n(t) \big|\, dt, \]
for $N = 50, 100, 200$, and $\|\Psi\|_S = 0.5, 0.8$. They considered several kernels and
innovation processes, including smooth errors obtained as the sum of two trigonometric
functions, irregular errors generated as Brownian bridges, and intermediate errors.
Examples of boxplots are shown in Figures 13.2 and 13.3. In addition to boxplots,
Didericksen et al. (2011) reported the averages of the $E_n$ and $R_n$, $N - 50 < n < N$,
and the standard errors of these averages, which allow one to assess if the differences in
the performance of the predictors are statistically significant. Their conclusions can
be summarized as follows:
1. Taking the autoregressive structure into account reduces prediction errors, but, in
some settings, this reduction is not statistically significant relative to method
MP, especially if $\|\Psi\| = 0.5$. Generally, if $\|\Psi\| = 0.8$, using the autoregressive
structure significantly and visibly improves the predictions.
2. None of the methods EX, EK, EKI uniformly dominates the others. In most cases
method EK is the best, or at least as good as the others.
3. In some cases, method PF performs visibly worse than the other methods, but
always better than NP.
4. Using the improved estimation described in Section 13.2 does not generally
reduce prediction errors.
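For curves observed on a grid, the error measures $E_n$ and $R_n$ defined above can be approximated by numerical integration. The sketch below uses the trapezoidal rule, which is an assumption made for illustration (the source does not say how the integrals were evaluated); the function names are hypothetical.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of a function sampled on grid."""
    return sum(0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
               for k in range(len(grid) - 1))

def prediction_errors(observed, predicted, grid):
    """Return (E_n, R_n): the L2 and L1 distances between curve and prediction."""
    diff = [x - y for x, y in zip(observed, predicted)]
    e_n = math.sqrt(trapezoid([d * d for d in diff], grid))
    r_n = trapezoid([abs(d) for d in diff], grid)
    return e_n, r_n
```

Applying `prediction_errors` to each of the last 50 curves and their predictions gives the samples of $E_n$ and $R_n$ that are summarized by the boxplots and averages.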
Fig. 13.2 Boxplots of the prediction errors $E_n$ (left) and $R_n$ (right) for methods MP, NP, EX, EK, EKI, PF; Brownian bridge innovations,
$\psi(t,s) = Ct$, $N = 100$, $p = 3$, $\|\Psi\| = 0.5$.
Fig. 13.3 Boxplots of the prediction errors $E_n$ (left) and $R_n$ (right) for methods MP, NP, EX, EK, EKI, PF; Brownian bridge innovations,
$\psi(t,s) = Ct$, $N = 100$, $p = 3$, $\|\Psi\| = 0.8$.
Didericksen et al. (2011) also applied all prediction methods to mean corrected
precipitation data studied in Besse et al. (2000). For this data set, the averages of
the $E_n$ and the $R_n$ are not significantly different between the first five methods;
method PF performs significantly worse than the others. We should point out that
method PF depends on the choice of the parameters $\alpha$ and $k$. It is possible that
its performance can be improved by better tuning these parameters. On the other
hand, our simulations show that method EK essentially reaches the limit of what
is possible; it is comparable to the theoretically perfect method EX. While taking
into account the autoregressive structure of the observations does reduce prediction
errors, many prediction errors are comparable to those of the trivial MP method.
To analyze this observation further, we present in Figure 13.4 six consecutive
trajectories of a FAR(1) process with $\|\Psi\| = 0.5$, and Brownian bridge innovations,
together with EK predictions. Predictions obtained with other nontrivial methods
look similar. We see that the predictions look much smoother than the observations,
and their range is much smaller. If the innovations $\varepsilon_n$ are smooth, the observations
are also smooth, but the predicted curves have a visibly smaller range than
the observations. This is also true for smooth real data, as shown in Figure 13.5.
Fig. 13.4 Six consecutive trajectories of the FAR(1) process with Gaussian kernel, $\|\Psi\| = 0.5$,
and Brownian bridge innovations. Dashed lines show EK predictions with $p = 3$.
The smoothness of the predicted curves follows from representation (13.8), which
shows that each predictor is a linear combination of a few EFPC’s, which are smooth
curves themselves. The smaller range of the predictors is not peculiar to functional
data, but is enhanced in the functional setting. For a mean zero scalar AR(1)
process $X_n = \psi X_{n-1} + \varepsilon_n$, we have $\mathrm{Var}(X_n) = \psi^2 \mathrm{Var}(X_{n-1}) + \mathrm{Var}(\varepsilon_n)$, so the
variance of the predictor $\hat\psi X_{n-1}$ is about $\psi^{-2}$ times smaller than the variance of $X_n$.
In the functional setting, the variance of $\hat X_n(t)$ is close to $\mathrm{Var}\left[\int \psi(t,s) X_n(s)\,ds\right]$. If
the kernel $\psi$ admits the decomposition $\psi(t,s) = \psi_1(t)\psi_2(s)$, as all the kernels we
use do, then
$$\mathrm{Var}\left[\hat X_n(t)\right] \approx \psi_1^2(t)\, \mathrm{Var}\left[\int_0^1 \psi_2(s) X_{n-1}(s)\,ds\right].$$
If the function $\psi_1$ is small for some values of $t \in [0,1]$, it will automatically drive
down the predictions. If $\psi_2$ is small for some $s \in [0,1]$, it will reduce the integral
$\int_0^1 \psi_2(s) X_{n-1}(s)\,ds$. The estimated kernels do not admit a factorization of this type,
but are always weighted sums of products of orthonormal functions (the EFPC’s
$\hat v_k$). A conclusion of this discussion is that the predicted curves will in general look
Fig. 13.5 Six consecutive trajectories (1989–1994) of centered Pacific precipitation curves (solid)
with their EK predictions (dashed).
smoother and “smaller” than the data. This somewhat disappointing performance
is, however, not due to poor prediction methods, but to a natural limit of the predictive
power of the FAR(1) model; the curves $\Psi(X_n)$ share the general properties of the
curves $\hat\Psi(X_n)$, no matter how $\Psi$ is estimated.
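The range shrinkage described for the scalar AR(1) case can be checked numerically. The following is a minimal sketch, assuming a hypothetical coefficient `psi = 0.5`, standard normal innovations, and a predictor that uses the true coefficient in place of an estimate; none of these choices come from the simulation study above.

```python
import numpy as np

rng = np.random.default_rng(0)
psi = 0.5          # hypothetical scalar autoregressive coefficient
n_obs = 200_000
burn_in = 1_000    # discard the start-up phase so the series is close to stationary

eps = rng.standard_normal(n_obs + burn_in)
x = np.zeros(n_obs + burn_in)
for n in range(1, n_obs + burn_in):
    x[n] = psi * x[n - 1] + eps[n]
x = x[burn_in:]

pred = psi * x[:-1]                       # one-step predictor psi * X_{n-1}
var_ratio = pred.var() / x[1:].var()      # should be close to psi**2
print(var_ratio, psi ** 2)
```

With a coefficient of 0.5 the predictor carries only about a quarter of the variance of the observations, which is the scalar analogue of the "smaller" predicted curves.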
To solve this problem, Kargin and Onatski (2008) introduce the polar decomposition
of $\Psi C^{1/2}$, see Section 13.5,
To lighten the notation, suppose that $k$ is smaller than the rank of $\Phi$, and denote by
$\sigma_1^2 > \cdots > \sigma_k^2$ the largest $k$ eigenvalues of $\Phi$, and by $x_1, \ldots, x_k$ the corresponding
eigenfunctions.
where $\Psi_k$ is defined by
$$\Psi_k(y) = \sum_{i=1}^{k} \sigma_i^{-1} \left\langle y, \Psi^T \Psi C^{1/2}(x_i) \right\rangle U(x_i). \qquad (13.10)$$
with the trace tr defined in (13.19). Equality (13.11) follows from (2.7), which states
that $E\|Y\|^2$ is equal to the sum of the eigenvalues of $C_Y$. This sum is clearly equal
to $\mathrm{tr}(C_Y)$ (take the eigenfunctions of $C_Y$ as the $e_n$ in (13.19)). Setting $Y = (\Psi - A)(X_n)$,
it is easy to see that $C_Y = (\Psi - A) C (\Psi - A)^T$, and so by (13.11),
The problem of finding $R_k$ can be solved using the results of Section 13.5. To apply
them, we notice that $\Psi C^{1/2} \in \mathcal{S}$. This is because the operator $C^{1/2}$ is Hilbert–Schmidt
as $C$ is trace class, so $\Psi C^{1/2} \in \mathcal{S}$, see Section 13.5. By (13.24), $R_k$ is
given by
$$R_k(y) = \sum_{i=1}^{k} \sigma_i \langle y, x_i \rangle U(x_i), \quad y \in L^2, \qquad (13.14)$$
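$$\begin{aligned}
\Psi_k C^{1/2}(x) &= \sum_{i=1}^{k} \sigma_i^{-1} \left\langle C^{1/2}(x), \Psi^T \Psi C^{1/2}(x_i) \right\rangle U(x_i) \\
&= \sum_{i=1}^{k} \sigma_i^{-1} \left\langle x, C^{1/2} \Psi^T \Psi C^{1/2}(x_i) \right\rangle U(x_i) \\
&= \sum_{i=1}^{k} \sigma_i^{-1} \langle x, \Phi(x_i) \rangle U(x_i) \\
&= \sum_{i=1}^{k} \sigma_i^{-1} \left\langle x, \sigma_i^2 x_i \right\rangle U(x_i) = R_k(x). \qquad \square
\end{aligned}$$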
where the $\sigma_i^2$ are the eigenvalues of $\Phi = C^{1/2} \Psi^T \Psi C^{1/2}$ in decreasing order.
$$\Psi_k(y) = \sum_{i=1}^{k} \langle y, b_i \rangle C_1(b_i), \qquad b_i = C^{-1/2}(x_i), \qquad (13.15)$$
Proof. We first verify that $x_i$ is in the range of $C^{1/2}$, so that $C^{-1/2}(x_i)$ is well-defined,
see Section 4.5. Since the $x_i$ are the eigenfunctions of $\Phi$, we have
$$\sigma_i^2 x_i = \Phi(x_i) = C^{1/2}\left(\Psi^T \Psi C^{1/2}(x_i)\right);$$
$$\Psi_k(y) = \sum_{i=1}^{k} \sigma_i \langle y, b_i \rangle U(x_i), \qquad (13.17)$$
Consequently,
$$U(x_i) = \sigma_i^{-1} C_1 C^{-1/2}(x_i) = \sigma_i^{-1} C_1(b_i).$$
The random sequences $\{\langle X_n, b_i \rangle,\ -\infty < n < \infty\}$ are called the predictive
factors. The functions $C_1(b_i)$ are called the predictive loadings. The loadings
$C_1(b_i)$, $i = 1, 2, \ldots, k$, are the “directions” in $L^2$ most relevant for prediction. Since
the $x_i$ are defined only up to a sign, the same is true for the predictive factors and
loadings. However, the operator $\Psi_k$ (13.15) is uniquely defined.
To implement the prediction strategy suggested by Theorem 13.3 and Lemma
13.2, we need to estimate the eigenfunctions $x_i$ and the eigenvalues $\sigma_i^2$ of
$\Phi = C^{1/2} \Psi^T \Psi C^{1/2} = C^{-1/2} C_1^T C_1 C^{-1/2}$, cf. (13.5), and then approximate the
$b_i = C^{-1/2}(x_i)$. As in the problem of the estimation of $\Psi$, the difficulty arises
from the fact that $C^{-1/2}$ is not a bounded operator. This introduces an instability to
the estimation of the eigenfunctions and eigenvalues of $\hat C^{-1/2} \hat C_1^T \hat C_1 \hat C^{-1/2}$, and it
cannot be ensured that these estimates converge to their population counterparts.
To deal with these difficulties, Kargin and Onatski (2008) propose the following
approach. To facilitate the inversion, introduce
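where $\alpha$ is a small positive number and $I$ is the identity operator. Denote by
$\hat\sigma_{\alpha,1}^2 \ge \cdots \ge \hat\sigma_{\alpha,k}^2$ the largest $k$ eigenvalues of $\hat\Phi_\alpha$, and by
$\hat x_{\alpha,1}, \ldots, \hat x_{\alpha,k}$ the corresponding eigenfunctions.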
Finding a bound on the prediction error requires a long technical argument. Kargin
and Onatski (2008) established the following result.
Theorem 13.4. Suppose the assumptions of Theorem 13.2 hold, and $\alpha$ and $k$ are functions
of the sample size $N$ such that
$$N^{1/6} \alpha \to A > 0 \quad \text{and} \quad N \ge N^{-1/4} k \ge K > 0,$$
for some constants $A$ and $K$. Then
$$E\|X_{n+1} - \hat\Psi_{\alpha,k}(X_n)\|^2 = O\left(N^{-1/6} \log^2(N)\right).$$
Proof. The claim follows from Theorem 4 of Kargin and Onatski (2008). $\square$
13.5 The trace class and the polar and singular decompositions
where we used the fact that $A^{1/2}$ is symmetric, see Section 4.5.
The trace has the following properties:
Properties (13.20) and (13.21) are trivial, (13.22) follows from a simple verification
which uses the fact that the trace can be computed using $\{e_n\}$ or $\{U(e_n)\}$.
$$A = U (A^T A)^{1/2}$$
$$S = U \Phi^{1/2}, \qquad \Phi = S^T S.$$
Set $r = \mathrm{rank}(\Phi)$, which is the dimension of the range of $\Phi$, and may be infinity.
Denote by $\sigma_i^2$ the eigenvalues of $\Phi$, and by $x_i$ the corresponding eigenfunctions,
$i = 1, 2, \ldots, r$. One can show that $\mathrm{rank}(\Phi^{1/2}) = \mathrm{rank}(\Phi)$ and that the $x_i$ are
the eigenvectors of $\Phi^{1/2}$ with eigenvalues $\sigma_i$.
The singular value decomposition of $S$ is
$$S(y) = \sum_{i=1}^{r} \sigma_i \langle y, x_i \rangle U(x_i), \quad y \in L^2.$$
$$S_k(y) = \sum_{i=1}^{k \wedge r} \sigma_i \langle y, x_i \rangle U(x_i), \quad y \in L^2, \qquad (13.24)$$
and satisfies
$$\|S - S_k\|_{\mathcal{S}}^2 = \sum_{i=k+1}^{\infty} \sigma_i^2. \qquad (13.25)$$
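Relations (13.24) and (13.25) have an exact finite-dimensional analogue that is easy to verify. The sketch below assumes a random $6 \times 6$ matrix as a stand-in for the operator $S$ and uses the Frobenius norm in place of the Hilbert–Schmidt norm; in the output of `numpy.linalg.svd`, the rows of `Vt` play the role of the $x_i$ and the columns of `U` play the role of the $U(x_i)$.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((6, 6))        # illustrative stand-in for the operator S
U, sigma, Vt = np.linalg.svd(S)        # S = U diag(sigma) Vt, sigma in decreasing order

k = 2
S_k = (U[:, :k] * sigma[:k]) @ Vt[:k, :]        # truncated expansion, cf. (13.24)
err_sq = np.linalg.norm(S - S_k, "fro") ** 2    # squared Hilbert-Schmidt (Frobenius) error
tail = np.sum(sigma[k:] ** 2)                   # sum of the discarded sigma_i^2, cf. (13.25)

# The sigma_i^2 are the eigenvalues of Phi = S^T S
phi_eigs = np.linalg.eigvalsh(S.T @ S)[::-1]
print(err_sq, tail)
```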
Chapter 14
Change point detection in the functional
autoregressive process
In this chapter, we develop a change point test for the FAR(1) model introduced
in Chapter 13. The importance of change point testing was discussed in Chapter 6.
Failure to take change points into account leads to spurious inference. This chapter
is based on the work of Horváth et al. (2010). Zhang et al. (2011) proposed a self–
normalized statistic to solve the problem discussed in this chapter. Self–normalized
statistics are discussed in Section 16.6.
The remainder of this chapter is organized as follows. The testing problem and
the assumptions are stated in Section 14.1. The testing procedure is described and
heuristically justified in Section 14.2. Its application and finite sample performance
are examined in Section 14.3. Asymptotic justification is presented in Section 14.4,
with the proofs developed in Sections 14.5 and 14.6.
14.1 Introduction
The problem can be stated as follows. We observe the random functions
$\{X_n(t),\ t \in [0,1],\ n = 1, 2, \ldots, N\}$ and assume that they follow the model
$$H_A: \text{ there is } 1 \le k^* < N: \ \Psi_1 = \cdots = \Psi_{k^*} \ne \Psi_{k^*+1} = \cdots = \Psi_N.$$
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 253
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_14,
© Springer Science+Business Media New York 2012
theorem for ergodic sequences. But in the functional setting, this replacement introduces
asymptotically nonnegligible terms, see Section 14.6, which cancel because
of the special form of the test statistic. To show that the remaining terms due to the
estimation of the FPC’s are asymptotically negligible, we develop a new technique
which involves the truncation at lag $O(\log N)$ of the moving average representation
of the FAR(1) process (Lemma 14.3), a blocking technique that utilizes this
truncation (Lemma 14.4) and Mensov’s inequality (Lemma 14.8).
The following assumption formalizes the structure of the observations under the
null hypothesis.
Assumption 14.1. The functional observations $X_n \in L^2$ satisfy
Assumption 14.1 ensures that (14.5) has a unique strictly stationary solution
$\{X_n(t),\ t \in [0,1]\}$ with finite fourth moment in $L^2$ such that $\varepsilon_{n+1}$ is independent
of $X_n, X_{n-1}, \ldots$, see Chapter 13.
If Assumption 14.1 holds, we can define the covariance operator
$$C(x) = E[\langle X_n, x \rangle X_n], \quad x \in L^2,$$
and its eigenfunctions $v_j$ and eigenvalues $\lambda_j$. Since the $X_n$ are assumed to have
mean zero, it is convenient to work with the sample covariance operator defined by
$$\hat C(x) = \frac{1}{N} \sum_{n=1}^{N} \langle X_n, x \rangle X_n, \quad x \in L^2.$$
In this section, we describe the idea of the test and explain its practical application.
The requisite asymptotic theory is presented in Section 14.4.
The idea is to check if the action of $\Psi$ on the span of the $p$ most important principal
components of the observations $X_1, X_2, \ldots, X_N$ changes at some unknown
time point $k$. If there is no change in the autoregressive operator $\Psi$, the functions
$\Psi v_j$, $j = 1, 2, \ldots, p$, remain constant. Since $\Psi v_j = \sum_{\ell} \langle \Psi v_j, v_\ell \rangle v_\ell$, this is the
case, to a good approximation, if the coefficients $\langle \Psi v_j, v_\ell \rangle$, $j, \ell \le p$, remain constant.
Direct verification shows that under $H_0$, $\langle \Psi v_j, v_\ell \rangle = \lambda_j^{-1} \langle R v_j, v_\ell \rangle$, where
$$Rx = E[\langle X_n, x \rangle X_{n+1}], \quad x \in L^2,$$
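where
$$X_{ij} = \langle X_i, v_j \rangle = \int X_i(t) v_j(t)\,dt, \qquad \hat X_{ij} = \langle X_i, \hat v_j \rangle = \int X_i(t) \hat v_j(t)\,dt.$$
$$\mathbf{R}_k = \frac{1}{k} \sum_{2 \le i \le k} \mathbf{X}_{i-1} \mathbf{X}_i^T, \qquad \mathbf{R}^*_{N-k} = \frac{1}{N-k} \sum_{k < i \le N} \mathbf{X}_{i-1} \mathbf{X}_i^T;$$
$$\hat{\mathbf{R}}_k = \frac{1}{k} \sum_{2 \le i \le k} \hat{\mathbf{X}}_{i-1} \hat{\mathbf{X}}_i^T, \qquad \hat{\mathbf{R}}^*_{N-k} = \frac{1}{N-k} \sum_{k < i \le N} \hat{\mathbf{X}}_{i-1} \hat{\mathbf{X}}_i^T.$$
$$\mathbf{R}_k(j, \ell) = \frac{1}{k} \sum_{2 \le i \le k} \langle X_{i-1}, v_j \rangle \langle X_i, v_\ell \rangle \xrightarrow{a.s.} E[\langle X_{n-1}, v_j \rangle \langle X_n, v_\ell \rangle] = \langle R v_j, v_\ell \rangle.$$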
Thus the matrices $\mathbf{R}_k$ and $\mathbf{R}^*_{N-k}$ approximate the matrix $[\langle R v_j, v_\ell \rangle,\ j, \ell = 1, 2, \ldots, p]$
based, correspondingly, on the observations before and after time $k$,
and so it is appealing to base the test on their difference. The matrices $\mathbf{R}_k$ and
$\mathbf{R}^*_{N-k}$ cannot, however, be computed from the data because the population principal
components $v_j$ are unknown. Thus, we must replace them by their empirical
counterparts $\hat{\mathbf{R}}_k$ and $\hat{\mathbf{R}}^*_{N-k}$. Relation (2.13) means that $\hat c_j \hat v_j$ is close to $v_j$.
Consequently, the $(j, \ell)$ entry of $\hat{\mathbf{R}}_k$ must be multiplied by $\hat c_j \hat c_\ell$ in order to approximate
the $(j, \ell)$ entry of $\mathbf{R}_k$. The random signs $\hat c_j$ and $\hat c_\ell$ are unknown, so a test statistic
must be constructed in such a way that they do not appear in it. This is not a mere
technical point; changing just a few observations can flip the curves $\hat v_j$, and sometimes
the sign changes in another estimation run, even if the data do not change. Another
important point is that using the EFPC’s $\hat v_j$ introduces a bias. Roughly speaking,
details are presented in Section 14.6,
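$$\mathbf{Y}_i = [Y_i(1,1), \ldots, Y_i(1,p), Y_i(2,1), \ldots, Y_i(2,p), \ldots, Y_i(p,1), \ldots, Y_i(p,p)]^T;$$
$$\hat{\mathbf{Y}}_i = [\hat Y_i(1,1), \ldots, \hat Y_i(1,p), \hat Y_i(2,1), \ldots, \hat Y_i(2,p), \ldots, \hat Y_i(p,1), \ldots, \hat Y_i(p,p)]^T.$$
Define further
$$\mathbf{Z}_k = \sum_{2 \le i \le k} \mathbf{Y}_i, \qquad \mathbf{Z}^*_{N-k} = \sum_{k < i \le N} \mathbf{Y}_i;$$
$$\hat{\mathbf{Z}}_k = \sum_{2 \le i \le k} \hat{\mathbf{Y}}_i, \qquad \hat{\mathbf{Z}}^*_{N-k} = \sum_{k < i \le N} \hat{\mathbf{Y}}_i.$$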
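Since the $X_i$ follow a functional AR(1) model, the vectors $\mathbf{Y}_i$ form a weakly dependent
stationary sequence, and so, as $k \to \infty$,
$$\sqrt{k} \left[ \frac{1}{k} \sum_{2 \le i \le k} \mathbf{Y}_i - E\mathbf{Y}_k \right] \xrightarrow{d} N(\mathbf{0}, \mathbf{D}), \qquad (14.7)$$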
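Relation (14.7), and the corresponding relation for the sum over $k < i \le N$, can be
rewritten as
$$\mathbf{Z}_k - kE\mathbf{Y}_N \approx N(\mathbf{0}, k\mathbf{D}), \qquad \mathbf{Z}^*_{N-k} - (N-k)E\mathbf{Y}_N \approx N(\mathbf{0}, (N-k)\mathbf{D}).$$
Denoting by $\{\mathbf{W}_D(t),\ t \ge 0\}$ a $p^2$-dimensional Brownian motion with covariance
matrix $\mathbf{D}$, we have, in fact,
$$\mathbf{Z}_k - kE\mathbf{Y}_N \approx \mathbf{W}_D(k), \qquad \mathbf{Z}^*_{N-k} - (N-k)E\mathbf{Y}_N \approx \mathbf{W}_D(N) - \mathbf{W}_D(k). \qquad (14.9)$$
By (14.9), under $H_0$ we have
$$\begin{aligned}
\frac{1}{k} \mathbf{Z}_k - \frac{1}{N-k} \mathbf{Z}^*_{N-k} &\approx \frac{1}{k} \mathbf{W}_D(k) - \frac{1}{N-k} \left( \mathbf{W}_D(N) - \mathbf{W}_D(k) \right) \\
&= \frac{1}{k(N-k)} \left[ N \mathbf{W}_D(k) - k \mathbf{W}_D(k) - k \mathbf{W}_D(N) + k \mathbf{W}_D(k) \right] \\
&= \frac{N}{k(N-k)} \left[ \mathbf{W}_D(k) - \frac{k}{N} \mathbf{W}_D(N) \right].
\end{aligned}$$
Denote
$$\mathbf{U}_N(k) = \frac{k(N-k)}{N} \left[ \frac{1}{k} \mathbf{Z}_k - \frac{1}{N-k} \mathbf{Z}^*_{N-k} \right]. \qquad (14.10)$$
The above calculation shows that
$$\mathbf{U}_N(k) \approx \mathbf{W}_D(k) - \frac{k}{N} \mathbf{W}_D(N).$$
Comparing covariances, we see that
$$\frac{1}{N} \left[ \mathbf{W}_D(k) - \frac{k}{N} \mathbf{W}_D(N) \right]^T \mathbf{D}^{-1} \left[ \mathbf{W}_D(k) - \frac{k}{N} \mathbf{W}_D(N) \right], \quad 1 \le k \le N,$$
has the same distribution as
$$\sum_{1 \le m \le p^2} B_m^2(k/N), \quad 1 \le k \le N, \qquad (14.11)$$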
where the $B_m(\cdot)$ are independent Brownian bridges on $[0,1]$. Consequently, any
functional of
$$G_N(k) = \frac{1}{N} \mathbf{U}_N(k)^T \mathbf{D}^{-1} \mathbf{U}_N(k), \quad 1 \le k \le N, \qquad (14.12)$$
can be approximated by the corresponding functional of (14.11).
Asymptotic theory for functionals of the process $\sum_{1 \le m \le d} B_m^2(u)$, $u \in [0,1]$,
including weighted sums and maximally selected statistics, is well-known, see e.g.
Csörgő and Horváth (1993, 1997), and goes back to Kiefer (1959). A Cramér–von Mises
type functional $K_d := \int_0^1 \sum_{1 \le m \le d} B_m^2(u)\,du$ leads to tests with good finite
sample properties, and so we focus on it in the following, but clearly other functionals
can be used as well, see e.g. Horváth et al. (1999) for more examples.
To implement the test, we need to estimate the matrix D (14.8). The estima-
tion of the long run covariance matrix is one of the most extensively studied topics
in time series analysis and econometrics, see e.g. Andrews (1991), Andrews and
Monahan (1992) and Robinson (1998) for recent approaches and references. Any
reasonable method can be used, but for concreteness, we focus on the popular and
simple Bartlett estimator, and explain how to adapt it to the change point problem.
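Denote by
$$\hat\gamma_h(k) = \frac{1}{k} \sum_{1 \le i \le k-h} \left( \hat{\mathbf{Y}}_i - \frac{1}{k} \sum_{1 \le i \le k} \hat{\mathbf{Y}}_i \right) \left( \hat{\mathbf{Y}}_{i+h} - \frac{1}{k} \sum_{1 \le i \le k} \hat{\mathbf{Y}}_i \right)^T$$
and
$$\hat\gamma^*_h(N-k) = \frac{1}{N-k} \sum_{k < i \le N-h} \left( \hat{\mathbf{Y}}_i - \frac{1}{N-k} \sum_{k < i \le N-h} \hat{\mathbf{Y}}_i \right) \left( \hat{\mathbf{Y}}_{i+h} - \frac{1}{N-k} \sum_{k < i \le N-h} \hat{\mathbf{Y}}_i \right)^T$$
and
$$\hat{\mathbf{D}}^*_{N-k} = \sum_{|h| \le q} \left( 1 - \frac{|h|}{q+1} \right) \hat\gamma^*_h(N-k). \qquad (14.14)$$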
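where
$$\hat{\mathbf{U}}_N(k) = \frac{k(N-k)}{N} \left[ \frac{1}{k} \hat{\mathbf{Z}}_k - \frac{1}{N-k} \hat{\mathbf{Z}}^*_{N-k} \right]. \qquad (14.16)$$
Using the weighted sum of the estimators $\hat{\mathbf{D}}_k$ and $\hat{\mathbf{D}}^*_{N-k}$ in (14.15) has been shown
in different settings to lead to better power than using just $\hat{\mathbf{D}}_N$, see Antoch et al.
(1997) and Hušková et al. (2007).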
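For readers who want to experiment with the Bartlett estimator, here is a minimal sketch of the full-sample version. The function name `bartlett_lrv` is hypothetical, the rows of `Y` stand for the vectors $\hat{\mathbf{Y}}_i$, and the autocovariance at lag $-h$ is taken to be the transpose of the one at lag $h$, which is the usual convention and an assumption here. It does not implement the weighted estimator of (14.15).

```python
import numpy as np

def bartlett_lrv(Y, q):
    """Bartlett long run covariance estimate from the rows of Y (n x d), bandwidth q."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                    # center at the sample mean
    D = Yc.T @ Yc / n                          # lag-0 autocovariance
    for h in range(1, q + 1):
        gamma_h = Yc[:-h].T @ Yc[h:] / n       # (1/n) sum_i (Y_i - mean)(Y_{i+h} - mean)^T
        D += (1.0 - h / (q + 1.0)) * (gamma_h + gamma_h.T)   # Bartlett weight, lags +h and -h
    return D
```

Applying the same function to the first $k$ rows and to the remaining rows would give stand-ins for the two segment estimators.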
Defining the critical value $c(\alpha, d)$ by $P(K_d > c(\alpha, d)) = \alpha$, and
$$\hat I_N = \frac{1}{N} \sum_{k=1}^{N} \hat G_N(k), \qquad (14.17)$$
the test rejects if $\hat I_N > c(\alpha, p^2)$. The critical values $c(\alpha, d)$ can be computed using
an analytic formula derived by Kiefer (1959), but the simulated critical values in
Table 6.1 give better results in finite samples.
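The computation of the statistic can be sketched in a few lines. This is an illustration under simplifying assumptions, not the procedure of the text: the function names are hypothetical, a single matrix `D` replaces the weighted estimator of (14.15), and the partial sums start at $i = 1$. It relies on the algebraic identity $\mathbf{U}_N(k) = \mathbf{Z}_k - (k/N)(\mathbf{Z}_k + \mathbf{Z}^*_{N-k})$, which follows directly from (14.10).

```python
import numpy as np

def cusum_process(Y):
    """Rows are U_N(k), k = 1, ..., N, for the vectors Y_i stored as rows of Y."""
    Y = np.asarray(Y, dtype=float)
    N = Y.shape[0]
    Z = np.cumsum(Y, axis=0)                   # Z[k-1] = Y_1 + ... + Y_k
    k = np.arange(1, N + 1)[:, None]
    return Z - (k / N) * Z[-1]                 # Z_k - (k/N) * (total sum)

def cusum_statistic(Y, D):
    """(1/N) sum_k G_N(k) with G_N(k) = U_N(k)^T D^{-1} U_N(k) / N."""
    U = cusum_process(Y)
    N = U.shape[0]
    G = np.einsum("ki,ij,kj->k", U, np.linalg.inv(D), U) / N
    return G.mean()
```

The returned value would then be compared with the critical value $c(\alpha, p^2)$.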
It is possible to develop a rigorous theory for the behavior of the test under the
alternative, but the analysis becomes even more technical and would take up space.
We therefore outline only the essential arguments which explain why and when the
test is consistent.
First we introduce the following notation: Let $k^* = [n\theta]$, $0 < \theta < 1$, be the time
of change. The kernel changes from $\psi$ to $\psi^*$, which satisfies $\iint (\psi^*(s,t))^2\,ds\,dt < \infty$.
Following the proof of Theorem 2.6, one can show that as $N \to \infty$,
$$\iint \left( \hat C_N(x,y) - C_N(x,y) \right)^2 dx\,dy \xrightarrow{P} 0,$$
where
$$\hat C_N(x,y) = \frac{1}{N} \sum_{1 \le i \le n} X_i(x) X_i(y)$$
and
$$C_N(x,y) = \theta E[X_0(x) X_0(y)] + (1 - \theta) \lim_{N \to \infty} E[X_N(x) X_N(y)].$$
where $R^*(t,s) = \lim_{N \to \infty} E X_N(t) X_{N+1}(s)$. This means that we have consistency
if for at least one $(j, \ell)$
$$\iint R(t,s) \bar v_j(t) \bar v_\ell(s)\,dt\,ds \ne \iint R^*(t,s) \bar v_j(t) \bar v_\ell(s)\,dt\,ds,$$
i.e. if $R$ and $R^*$ are different on the space spanned by $\{\bar v_j(t) \bar v_\ell(s),\ 1 \le j, \ell \le p\}$.
We conclude this section with a summary of the practical implementation of the
test procedure:
1) Find $p$ so large that $\sum_{j=1}^{p} \hat\lambda_j / \sum_{j=1}^{N} \hat\lambda_j > 0.8$, but not greater than 5.
2) Compute $\hat I_N$ (14.17).
3) Choose a significance level $\alpha$ and find the critical value $c(\alpha, d)$ with $d = p^2$
from Table 6.1.
4) Reject $H_0$ if $\hat I_N > c(\alpha, p^2)$.
In step 1), $p$ cannot be too large because it is then difficult to estimate $\mathbf{D}$. In step
2) good results are also obtained if in (14.15) $\frac{k}{N} \hat{\mathbf{D}}_k + \left( 1 - \frac{k}{N} \right) \hat{\mathbf{D}}^*_{N-k}$ is replaced
by $\hat{\mathbf{D}}_N$; the computations are then much faster.
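Step 1) can be sketched as follows. The function `choose_p` and its arguments are hypothetical, and the reading of the rule (take the smallest $p$ whose cumulative share of variance exceeds 0.8, then cap it at 5) is an interpretation of the text rather than a quoted algorithm.

```python
import numpy as np

def choose_p(eigenvalues, threshold=0.8, p_max=5):
    """Smallest p whose cumulative variance share exceeds threshold, capped at p_max."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # decreasing order
    cum_share = np.cumsum(lam) / lam.sum()
    p = int(np.argmax(cum_share > threshold)) + 1               # first index where the share is exceeded
    return min(p, p_max)

print(choose_p([6.0, 3.0, 1.0]))      # shares 0.6, 0.9, 1.0
print(choose_p([1.0] * 10))           # flat spectrum, so the cap applies
```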
In this section we report the results of a small simulation study that examined
the finite sample performance of our test. Calculations were performed using the
R package fda. We used the functional time series Xn of differenced counts of
credit card transactions described in Section 7.3. The first three weeks (21 func-
tional observations) are shown in Figure 1.7. The whole data set contains N D 200
curves. Applied to these data, our test does not reject the null hypothesis, indicating
that a functional AR(1) model is appropriate for all 200 Xn . This is in agreement
with the conclusions of Laukaitis and Račkauskas (2002) and of Section 7.3. The
long run covariance matrix was estimated using the code of Hansen (1995) (with some
modifications).
In the following, we use the curves $X_n$ to generate functional AR(1) processes
which will allow us to assess the finite sample performance of our test in a realistic
setting. To do it, we estimate the kernel $\psi(\cdot, \cdot)$ using the function linmod, see
Malfait and Ramsay (2003) (we omit the details of regularization). Then, residual
functions are computed as $\hat\varepsilon_n(t) = X_{n+1}(t) - \hat\Psi X_{n+1}(t)$, $n = 1, \ldots, 193$. Drawing
these residuals with replacement, we can simulate functional AR(1) series of
any length via
$$Z_m(t) = \int \hat\psi(t,s) Z_{m-1}(s)\,ds + \varepsilon_m(t), \quad m = 1, 2, \ldots, N,$$
where the $\varepsilon_m(\cdot)$ are the bootstrap draws of the $\hat\varepsilon_n(\cdot)$. If we change the kernel $\psi(\cdot, \cdot)$
at some point, we can assess the power of the test. To remove the initialization
effect, the first 100 simulated functional observations (the “burn-in”) were removed. The
empirical rejection rates reported below are based on one thousand replications.
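The residual bootstrap recursion can be sketched on a grid. The function `simulate_far1` is hypothetical: it assumes the estimated kernel and the residual curves are already available as arrays on a common equispaced grid of $[0,1]$, approximates the integral by a Riemann sum, and starts the recursion from the zero curve. It is not the R code used for the reported results.

```python
import numpy as np

def simulate_far1(kernel, residuals, n_curves, burn_in=100, seed=0):
    """kernel: (T, T) values psi(t_a, s_b); residuals: (n_res, T) residual curves."""
    rng = np.random.default_rng(seed)
    T = kernel.shape[0]
    ds = 1.0 / T                                   # Riemann-sum weight for the integral over s
    draws = rng.integers(0, residuals.shape[0], size=n_curves + burn_in)
    z = np.zeros(T)                                # assumed starting curve
    out = np.empty((n_curves + burn_in, T))
    for m, idx in enumerate(draws):
        z = (kernel @ z) * ds + residuals[idx]     # Z_m = int psi(., s) Z_{m-1}(s) ds + eps_m
        out[m] = z
    return out[burn_in:]                           # drop the burn-in curves
```

Multiplying `kernel` by a constant `c` from some index onward would mimic the change used in the power study.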
Table 14.1 shows empirical sizes for several values of $p$ and $N$. The test becomes
conservative as $p$ increases. This is because the critical values increase in proportion
to $p^2$, but only the first few principal components explain most of the variance.
The same phenomenon was observed in Chapter 7. To save space, we report the
empirical power only for $p = 2$ and $p = 3$; for $p = 4$ the power is about 30% lower
than for $p = 3$. We introduced a change at half length by multiplying $\hat\psi(\cdot, \cdot)$ by
$c = 0.1, 0.3, 0.6$; sample realizations for $N = 200$ are shown in Figure 14.1. The
change is not readily seen by eye, especially for $c = 0.6$. For $c = 0.1$, the second
half of the series looks more like white noise, and the power is correspondingly very
close to 100%, and so is not reported. Table 14.2 shows that the power increases with
the sample size $N$, and is satisfactory for $N = 200$, supporting the claim that the
functional AR(1) model is suitable for the whole credit card transactions record.
We now turn to the application of the change point test to the data set consist-
ing of Eurodollar futures contract prices studied by Kargin and Onatski (2008). The
seller of a Eurodollar futures contract takes on an obligation to deliver a 3 month
deposit of one million US dollars to a bank account outside the United States i
months from today. The price the buyer is willing to pay for this contract depends
on the prevailing interest rate. These contracts are traded at the Chicago Mercan-
tile Exchange, and provide a way to lock in an interest rate. They are liquid assets
responsive to Federal Reserve policy, inflation, and economic indicators.
The data we study consist of 114 points per day; point i corresponds to the price
of a contract with closing date i months from today. We work with centered data,
i.e. the mean function has been subtracted from all observations. Examples of these
centered functions are shown in Figure 14.2; the middle panel reflects a change
in expectations of future interest rates following the September 11, 2001 terrorist
attacks.
[Figure 14.1: sample realizations for $N = 200$; panels $c = 0.1$, $c = 0.3$, $c = 0.6$]
The test rejects the null hypothesis of a constant operator $\Psi$ for some periods and
accepts for others. Figure 14.3 shows a period of 50 days for which the null hypothesis
is accepted. Even though the prices of the contract fluctuate, these fluctuations can
be modeled by assuming a single FAR(1) model. By contrast, the curves shown in
Figure 14.4 cannot be assumed to follow an FAR(1) model, according to the change
point test. No single change point is apparent, but the range of the data increases in
Table 14.2 Empirical power (in percent) for a change occurring at $k^* = N/2$, and $\hat\Psi$ changing
to $c\hat\Psi$ for $c = 0.3$ (in parentheses $c = 0.6$).
           p = 2                                     p = 3
           10%          5%           1%              10%          5%           1%
N = 50     46.1 (30.9)  28.3 (16.5)   6.3 (1.7)      23.1 (15.9)  10.4 (5.6)    0.3 (0.1)
N = 100    82.5 (58.1)  67.7 (44.3)  33.5 (16.7)     64.4 (46.9)  46.9 (28.8)  18.2 (7.8)
N = 200    98.7 (91.6)  95.8 (81.6)  82.3 (52.8)     96.3 (82.8)  90.4 (67.4)  65.6 (34.9)
Fig. 14.2 Eurodollar futures curves over three disjoint 10 day long periods.
In order to develop an asymptotic theory, we must verify that the test statistic does
not change if the principal components $\hat v_j$ are replaced by $\hat c_j \hat v_j$, as only the latter
converge to the population principal components $v_j$. For this purpose, it is convenient
to introduce a $p \times p$ diagonal matrix $\mathbf{C}_p$ and a $p^2 \times p^2$ diagonal matrix $\mathbf{M}$
defined by
$$\mathbf{C}_p = \begin{bmatrix} \hat c_1 & & & \\ & \hat c_2 & & \\ & & \ddots & \\ & & & \hat c_p \end{bmatrix}, \qquad \mathbf{M} = \mathbf{C}_p \otimes \mathbf{C}_p,$$
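where $\otimes$ denotes the Kronecker product, see e.g. Graham (1981). For example, if
$p = 2$,
$$\mathbf{M} = \begin{bmatrix} \hat c_1 \hat c_1 & & & \\ & \hat c_1 \hat c_2 & & \\ & & \hat c_2 \hat c_1 & \\ & & & \hat c_2 \hat c_2 \end{bmatrix}.$$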
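The role of $\mathbf{M}$ can be illustrated numerically for $p = 2$. In the sketch below the signs in `c_hat`, the matrix `D` and the vector `u` are arbitrary illustrative values. Because the signs are $\pm 1$, $\mathbf{M}^2$ is the identity, so a quadratic form of the type $\mathbf{u}^T \mathbf{D}^{-1} \mathbf{u}$ is unchanged when $\mathbf{u}$ is replaced by $\mathbf{M}\mathbf{u}$ and $\mathbf{D}$ by $\mathbf{M}\mathbf{D}\mathbf{M}$.

```python
import numpy as np

c_hat = np.array([1.0, -1.0])          # hypothetical signs c_1, c_2 for p = 2
C_p = np.diag(c_hat)
M = np.kron(C_p, C_p)                  # diagonal entries c_1 c_1, c_1 c_2, c_2 c_1, c_2 c_2

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
D = A @ A.T + np.eye(4)                # arbitrary positive definite matrix
u = rng.standard_normal(4)

q_plain = u @ np.linalg.solve(D, u)                        # u^T D^{-1} u
q_flipped = (M @ u) @ np.linalg.solve(M @ D @ M, M @ u)    # same form after sign flips
print(np.diag(M), q_plain, q_flipped)
```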
[Figure 14.5: four panels of P-values plotted against segment index]
Fig. 14.5 P-values for consecutive segments. The continuous line indicates the five percent threshold.
Recall also the definition of the Bartlett estimators (14.13) and (14.14), and introduce
the following assumption on the rate of growth of the bandwidth $q = q(N)$:
Assumption 14.2. Suppose $q(N)$ is nondecreasing and satisfies
$$\sup_{k \ge 0} \frac{q(2^{k+1})}{q(2^k)} < \infty \qquad (14.18)$$
and
$$q(N) \to \infty \quad \text{and} \quad q(N)(\log N)^4 = O(N). \qquad (14.19)$$
The following theorem shows that the test procedure described in Section 14.2
has asymptotically correct size.
Theorem 14.1. Suppose Assumptions 14.1, 14.2 and condition (2.12) hold. Then
$$\hat Q_N(u) \to \sum_{1 \le m \le p^2} B_m^2(u) \quad \text{in } D([0,1]),$$
Propositions 14.1 and 14.2 are proven, respectively, in Sections 14.5 and 14.6.
Using them, it is easy to prove Theorem 14.1.
Relation (14.21) is stated as Proposition 14.2. To prove (14.22), we use Theorem A.1
and Remark A.1 of Berkes et al. (2006) which imply that under Assumption 14.2,
b k and D
D b N k converge almost surely to D. Recall that if a sequence n converges
P
to zero a.s., then max1nN jn j ! 0, as N ! 1. Therefore, sup1<u<1 kuD b ŒN u
P P
b N ŒN u .1 u/Dk ! 0, and so
uDk ! 0 and sup1<u<1 k.1 u/D
b
b ŒN u C .1 u/D
sup kuD
P
N ŒN u Dk ! 0:
1<u<1
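$$Q_N(u) \to [\mathbf{W}_D(u) - u\mathbf{W}_D(1)]^T \mathbf{D}^{-1} [\mathbf{W}_D(u) - u\mathbf{W}_D(1)] \quad \text{in } D([0,1])$$
To reduce the notational burden, we focus on just one component, i.e. we want to
show that
$$N^{-1/2} \sum_{2 \le i \le [Nu]} [Y_i(j, \ell) - EY_i(j, \ell)] \xrightarrow{d} W_{D(i,j)}(u). \qquad (14.24)$$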
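of Billingsley (1999); we must verify that the sequence $\{Y_i(j, \ell)\}$ is stationary and
ergodic and that
$$\sum_{i=1}^{\infty} |\mathrm{Cov}(Y_0(j, \ell), Y_i(j, \ell))| < \infty. \qquad (14.25)$$
Relation (14.25) is established in Lemma 14.1. Ergodicity follows from the representation
$$\begin{aligned}
Y_i(j, \ell) &= \langle X_{i-1}, v_j \rangle \left[ \langle \Psi X_{i-1}, v_\ell \rangle + \langle \varepsilon_i, v_\ell \rangle \right] \\
&= \langle X_{i-1}, v_j \rangle \langle X_{i-1}, \Psi^T v_\ell \rangle + \langle X_{i-1}, v_j \rangle \langle \varepsilon_i, \Psi^T v_\ell \rangle
\end{aligned}$$
and Theorem 13.1 (moving average representation of $X_k$) and Theorem 36.4
of Billingsley (1995) (a function of shifts of an iid sequence forms an ergodic
sequence). $\square$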
Now we establish (14.25).
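Lemma 14.1. Under Assumption 14.1, the $Y_i(j, \ell)$ defined by (14.6) satisfy
$$\sum_{1 \le i < \infty} |\mathrm{Cov}(Y_1(j, \ell), Y_i(m, n))| < \infty.$$
Proof. Since
$$Y_i(j, \ell) = \langle X_{i-1}, v_j \rangle \langle X_{i-1}, \Psi^T v_\ell \rangle + \langle X_{i-1}, v_j \rangle \langle \varepsilon_i, \Psi^T v_\ell \rangle,$$
$$\mathrm{Cov}(Y_1(j, \ell), Y_i(m, n)) = C_1(i) + C_2(i) + C_3(i) + C_4(i),$$
where
$$\begin{aligned}
C_1(i) &= \mathrm{Cov}\left( \langle X_0, v_j \rangle \langle X_0, \Psi^T v_\ell \rangle,\ \langle X_{i-1}, v_m \rangle \langle X_{i-1}, \Psi^T v_n \rangle \right); \\
C_2(i) &= \mathrm{Cov}\left( \langle X_0, v_j \rangle \langle X_0, \Psi^T v_\ell \rangle,\ \langle X_{i-1}, v_m \rangle \langle \varepsilon_i, \Psi^T v_n \rangle \right); \\
C_3(i) &= \mathrm{Cov}\left( \langle X_0, v_j \rangle \langle \varepsilon_1, \Psi^T v_\ell \rangle,\ \langle X_{i-1}, v_m \rangle \langle X_{i-1}, \Psi^T v_n \rangle \right); \\
C_4(i) &= \mathrm{Cov}\left( \langle X_0, v_j \rangle \langle \varepsilon_1, \Psi^T v_\ell \rangle,\ \langle X_{i-1}, v_m \rangle \langle \varepsilon_i, \Psi^T v_n \rangle \right).
\end{aligned}$$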
It is easy to see that $C_2(i) = C_4(i) = 0$ for $i > 1$, so it remains to find
absolutely summable bounds on $C_1(i)$ and $C_3(i)$. We focus on the term $C_1(i)$, the
argument for $C_3(i)$ being similar. Consider arbitrary $x, y, u, v \in L^2([0,1])$. Since
$X_k = \Psi^k X_0 + \sum_{j=0}^{k-1} \Psi^j \varepsilon_{k-j}$,
Therefore
n 2
o
jC1 .i /j k k2.i 1/ EkX0 k4 C EkX0 k2 kvj k kv` k k T vm k k T vn k
2k k2i EkX0 k4 : t
u
where
k.N k/ 1 h i
cOj cO` ZO k .j; `/ Zk .j; `/ k R.j;
O `/
N k
1 h i
cOj cO` ZO N
k .j; `/ Z
N k .j; `/ .N k/ O `/ :
R.j;
N k
and
h i P
N 1=2 max cOj cO` ZO N
O
k .j; `/ ZN k .j; `/ .N k/R.j; `/ ! 0: (14.28)
2kN
Since the above two relations are verified in the same way, we will show only the
verification of (14.27).
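Observe that
$$Z_k(j, \ell) = \iint \sum_{2 \le i \le k} X_{i-1}(t) X_i(s) v_j(t) v_\ell(s)\,dt\,ds$$
and
$$\hat c_j \hat c_\ell \hat Z_k(j, \ell) = \iint \sum_{2 \le i \le k} X_{i-1}(t) X_i(s)\, \hat c_j \hat v_j(t)\, \hat c_\ell \hat v_\ell(s)\,dt\,ds.$$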
Therefore
Since
$$\iint \left| \sum_{1 \le i \le k} [X_{i-1}(t) X_i(s) - r(t,s)] \right| |\hat u(t,s)|\,dt\,ds
\le \left( \iint \left| \sum_{1 \le i \le k} [X_{i-1}(t) X_i(s) - r(t,s)] \right|^2 dt\,ds \right)^{1/2} \left( \iint |\hat u(t,s)|^2\,dt\,ds \right)^{1/2},$$
Proof. Since
ˇ ˇ
ˇvj .t/v` .s/ cOj vO j .t/cO` vO ` .s/ˇ2 2v 2 .t/Œv` .s/ cOj vO ` .s/2
j
where the series converges in the $L^2$ norm and almost surely. For $c > 0$ to be
determined later, introduce the truncated series
$$X_{k,N} = \sum_{j=0}^{c \log N} \Psi^j \varepsilon_{k-j}. \qquad (14.31)$$
where
$$r_N(t,s) = E[X_{i-1,N}(t) X_{i,N}(s)]. \qquad (14.32)$$
Introduce also the functions
and
$$U_{i,N}(t,s) = X_{i-1,N}(t) X_{i,N}(s). \qquad (14.34)$$
To prove the lemma, it suffices to show that
$$N^{-1} E \max_{2 \le k \le N} \left\| \sum_{2 \le i \le k} V_{i,N} \right\| \to 0, \qquad (14.35)$$
$$N^{-1} E \max_{2 \le k \le N} \left\| \sum_{2 \le i \le k} [U_{i,N} - r_N] \right\| \to 0 \qquad (14.36)$$
and
$$\|r_N - r\| \to 0. \qquad (14.37)$$
In (14.35), (14.36), (14.37), the norm is taken in the space $L^2([0,1]^2)$.
By Lemma 14.4,
ˇˇ ˇˇ
ˇˇ ˇˇ
ˇˇ X ˇˇ
ˇ
E max ˇˇˇ Vi;N ˇˇˇˇ KN 2 ;
2kN ˇˇ ˇˇ
2i k
for some K and any > 0, provided c is sufficiently large, so (14.35) follows.
Relations (14.36) and (14.37) follow, respectively, from Lemmas 14.5 and 14.6. u
t
Lemma 14.4. For c > 0 define Xk;N D Xk;N;c by (14.31). Consider the function
Vi;N .t; s/ defined by (14.33). Then for any > 0, there is c so large that
ZZ 1=2
EkVi;N k D E 2
Vi;N .t; s/dt ds KN
Lemma 14.5. The functions $U_{i,N} \in L^2([0,1]^2)$ defined by (14.34) satisfy
$$E \max_{2 \le k \le N} \left\| \sum_{2 \le i \le k} [U_{i,N} - EU_{i,N}] \right\| \le K N^{1/2} (\log N)^{3/2},$$
Proof. Set
Ui;N .t; s/ D Ui;N .t; s/ EUi;N .t; s/:
Let $m = c \log N$ and assume without loss of generality that $m$ is an integer. We will
work with the decomposition
$$\sum_{1 \le i \le k} U_{i,N} = S_1(k) + S_2(k) + \ldots + S_m(k).$$
The idea is that $S_1(k)$ is the sum of (available) $U_{1,N}, U_{1+m,N}, \ldots$, $S_2(k)$ of
$U_{2,N}, U_{2+m,N}, \ldots$, etc. Formally, for $1 \le k \le N$ and $1 \le j \le m$, define
$$S_j(k) = \begin{cases}
\displaystyle \sum_{\ell=1}^{[k/m]} U_{(\ell-1)m+j,N} + U_{m[k/m]+j,N}, & \text{if } k/m \text{ is not an integer} \\[2ex]
\displaystyle \sum_{\ell=1}^{k/m} U_{(\ell-1)m+j,N}, & \text{if } k/m \text{ is an integer.}
\end{cases} \qquad (14.38)$$
By (14.31) and (14.34), for any fixed $j$, $S_j(k)$ is a sum of independent identically
distributed random functions in $L^2([0,1]^2)$. Since
$\left\| \sum_{1 \le i \le k} U_{i,N} \right\| \le \sum_{j=1}^{m} \|S_j(k)\|$,
$$\left\| \sum_{1 \le i \le k} U_{i,N} \right\|^2 \le m \sum_{j=1}^{m} \|S_j(k)\|^2. \qquad (14.39)$$
By (14.39) and Lemma 14.7, we obtain $E \left\| \sum_{1 \le i \le k} U_{i,N} \right\|^2 \le C m k$, where $C$ is
a constant which does not depend on $N$. Since $U_{i,N}$ is a stationary sequence, this
bound implies that for all $K < L$,
$$E \left\| \sum_{K \le i \le L} U_{i,N} \right\|^2 \le C m (L - K). \qquad (14.40)$$
Relation (14.40) together with the Mensov inequality (Lemma 14.8) imply that
$$E \max_{1 \le k \le N} \left\| \sum_{1 \le i \le k} U_{i,N} \right\|^2 \le C m (\log N)^2 N. \qquad (14.41)$$
1i k
Lemma 14.6. Recall the functions r.t; s/ D EŒXi 1 .t/Xi .s/ and rN .t; s/ (14.32).
Then ZZ
kr rN k2 D jr.t; s/ rN .t; s/j2 dt ds D O.N 2rc /;
X
m
D E j "i 1j .t/ j C1 "i 1j .t/ :
j D0
The following two lemmas are used in the proof of Lemma 14.5.
Lemma 14.7. The functions Sj .k/ 2 L2 .Œ0; 12 / defined by (14.38) satisfy
X
m X
m X
m X
m
D k kj1 k kj2 k kj3 k kj4
j1 D0 j2 D0 j3 D0 j4 D0
Proof. The proof is practically the same as for real-valued random variables $\xi_i$, see
Móricz (1976), and so is omitted. $\square$
Chapter 15
Determining the order of the functional
autoregressive model
This chapter is concerned with determining the order $p$ in the FAR($p$) model
$$Z_i = \sum_{j=1}^{p} \Phi_j(Z_{i-j}) + \varepsilon_i. \qquad (15.1)$$
$$(x \otimes y)(t,s) = x(s) y(t), \quad x, y \in L^2,$$
which are elements of the space $L^2([0,1] \times [0,1])$. The inner product in the latter
space will also be denoted by $\langle \cdot, \cdot \rangle$, as it will always be clear from the context what
space the product is in.
We observe a sample of curves $Z_1(t), Z_2(t), \ldots, Z_N(t)$, $t \in [0,1]$. We assume
that these curves are a part-realization of an infinite sequence $\{Z_j\}$ which satisfies
(15.1) and the following assumptions.
The operator
$$\Phi' = \begin{bmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I & 0 & \cdots & 0 & 0 \\
0 & I & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I & 0
\end{bmatrix} \qquad (15.3)$$
$$\|\Phi'\|_{\mathcal{L}} < 1, \qquad (15.4)$$
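A finite-dimensional analogue of the operator matrix (15.3) can be assembled as follows. The function `companion` and the two $2 \times 2$ coefficient matrices are illustrative assumptions. The sketch reports the spectral radius of the block matrix, which is the quantity usually inspected for matrix autoregressions; how this relates to the operator norm condition (15.4) in the functional setting is not addressed here.

```python
import numpy as np

def companion(phis):
    """Block companion matrix built from d x d matrices Phi_1, ..., Phi_p."""
    p = len(phis)
    d = phis[0].shape[0]
    top = np.hstack(phis)                                  # first block row: Phi_1 ... Phi_p
    if p == 1:
        return top
    # lower block rows: identity blocks on the sub-diagonal, zeros elsewhere
    bottom = np.hstack([np.eye(d * (p - 1)), np.zeros((d * (p - 1), d))])
    return np.vstack([top, bottom])

Phi1 = 0.5 * np.eye(2)       # hypothetical coefficient matrices for p = 2
Phi2 = 0.2 * np.eye(2)
Phi_prime = companion([Phi1, Phi2])
spectral_radius = max(abs(np.linalg.eigvals(Phi_prime)))
print(Phi_prime.shape, spectral_radius)
```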
Assumption 15.2. The $\varepsilon_i \in L^2$ in (15.1) are independent and identically distributed.
Condition (15.4) and Assumption 15.2 imply that the $Z_i$ form a stationary and
ergodic sequence in $L^2$ such that $\varepsilon_i$ is independent of $Z_{i-1}, Z_{i-2}, \ldots$, see Section
5.1 of Bosq (2000). For ease of reference, we state the following definition.
$$Q_p(z) = z^p I - \sum_{j=1}^{p} z^{p-j} \Phi_j$$
Denoting by $I_j$ the indicator function of the interval $((j-1)/p, j/p]$, we obtain
$$\sum_{j=1}^{p} [\Phi_j(Z_{i-j})](t) = \int_0^1 \sum_{j=1}^{p} I_j(x) \phi_j(t, xp - (j-1)) Z_{i-j}(xp - (j-1))\, p\,dx.$$
Next we define
$$X_i(s) = \sum_{j=1}^{p} Z_{i-j}(sp - (j-1)) I_j(s) \qquad (15.5)$$
and
$$\psi(t,s) = p \sum_{j=1}^{p} \phi_j(t, sp - (j-1)) I_j(s). \qquad (15.6)$$
Setting $Y_i = Z_i$, we have
$$Y_i = \Psi(X_i) + \varepsilon_i, \qquad (15.7)$$
where $\Psi$ is an integral Hilbert–Schmidt operator with the kernel $\psi$, i.e.
$$Y_i(t) = \int \psi(t,s) X_i(s)\,ds + \varepsilon_i(t). \qquad (15.8)$$
Thus, if we can estimate $\Psi$, then we can estimate each of the $\Phi_j$. The FAR($p-1$)
model will be rejected in favor of FAR($p$) if the resulting estimate $\hat\Phi_p$ is large in
a sense established in Section 15.2. We now turn to the estimation of the operator $\Psi$.
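On a grid, the construction (15.5) amounts to concatenating the $p$ lagged curves. The sketch below assumes each curve $Z_i$ is stored as a row of `T` grid values; the function name `stack_lags` is hypothetical. The $j$-th segment of a stacked regressor, which corresponds to the interval $((j-1)/p, j/p]$, holds the curve $Z_{i-j}$.

```python
import numpy as np

def stack_lags(Z, p):
    """Z: (N, T) curves on a grid of [0, 1]. Returns responses Y and stacked regressors X."""
    N, T = Z.shape
    # Row for time i concatenates Z_{i-1}, Z_{i-2}, ..., Z_{i-p}
    X = np.hstack([Z[p - j : N - j] for j in range(1, p + 1)])   # shape (N - p, p * T)
    Y = Z[p:]                                                     # shape (N - p, T)
    return Y, X
```

The rows of `Y` and `X` can then be used as the pairs $(Y_i, X_i)$ of the functional linear model (15.7).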
Let $\{\hat v_k,\ 1 \le k \le N\}$ be an orthonormal basis of $L^2$ (for each $N$), constructed
from the eigenfunctions of the covariance operator
$$\hat C_X(t,s) = \frac{1}{N} \sum_{i=1}^{N} (X_i(t) - \bar X_N(t))(X_i(s) - \bar X_N(s)),$$
For ease of reference, we list the dimensions of the matrices introduced above:
$$\mathbf{Y}\ (N \times q_y), \qquad \mathbf{X}\ (N \times q_x), \qquad \boldsymbol{\Psi}\ (q_x \times q_y).$$
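Using these matrices, we now reduce model (15.7) to a finite dimensional linear
model. The precision of this finite dimensional approximation will be reflected in
the structure of its random errors. Observe that
$${} = \sum_{k=1}^{q_x} \mathbf{X}(i,k) \boldsymbol{\Psi}(k,j) + \langle \varepsilon_i, \hat u_j \rangle + \sum_{k=q_x+1}^{\infty} \langle X_i, \hat v_k \rangle \langle \psi, \hat v_k \otimes \hat u_j \rangle.$$
$$\mathbf{Y} = \mathbf{X} \boldsymbol{\Psi} + \boldsymbol{\varepsilon}',$$
The $N \times q_y$ matrix $\boldsymbol{\varepsilon}'$ has absorbed the error we made in projecting onto a finite
dimensional space, and is given by
$$\boldsymbol{\varepsilon}'(i,j) = \langle \varepsilon_i, \hat u_j \rangle + \sum_{l > q_x} \langle X_i, \hat v_l \rangle \langle \psi, \hat v_l \otimes \hat u_j \rangle.$$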
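Once the model has been reduced to matrix form, the coefficient matrix can be estimated by ordinary least squares. The sketch below uses simulated matrices with hypothetical dimensions and noise level, and it ignores the projection error absorbed into $\boldsymbol{\varepsilon}'$; whether least squares is the estimator used later in the chapter is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
N, q_x, q_y = 500, 4, 3                           # hypothetical dimensions
Psi = rng.standard_normal((q_x, q_y))             # "true" coefficient matrix for the illustration
X = rng.standard_normal((N, q_x))                 # scores of the regressors, N x q_x
Y = X @ Psi + 0.1 * rng.standard_normal((N, q_y)) # responses, N x q_y

# Least squares solution of Y = X Psi + error
Psi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.abs(Psi_hat - Psi).max())
```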
is close to zero. The key element is the range of the argument $x$ of $\hat v_k$, which reflects the part of $\psi$ whose nullity we want to test. Based on the above representation, we want to find linear combinations of the $\hat\psi(k,j)$ which make the sum (15.12) small. Clearly, we do not want to test if all $\hat\psi(k,j)$ are small because that would mean that the whole kernel $\psi$, and so all of the $\varphi_j$, $1 \le j \le p$, vanish. For further discussion, it is convenient to set
$$\hat v_{k,p}(s) = \hat v_k\!\left(\frac{s+p-1}{p}\right), \qquad 0 \le s \le 1,$$
15.2 Order determination procedure 283
so that
$$\hat\varphi_p(t,s) = \frac{1}{p}\sum_{k \le q_x,\ j \le q_y} \hat\psi(k,j)\,\hat v_{k,p}(s)\,\hat u_j(t), \qquad 0 \le s, t \le 1.$$
The idea behind the construction of the test statistic is to replace the $\hat v_{k,p}$ by a smaller set of functions that optimally describe the space spanned by them, and so, in a sense, by the $\hat v_k(x)$, $x \ge (p-1)/p$. In other words, we test the nullity of $\varphi_p$ only in the most significant orthogonal directions of the $\hat v_{k,p}$. We orthogonalize them as
$$\hat w_{k,p}(s) = \sum_{i=1}^{q_x} \hat\alpha_{i,k}\,\hat v_{i,p}(s)$$
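$$\hat V \hat\alpha_k = \hat\lambda_k \hat\alpha_k, \qquad 1 \le k \le q_x, \tag{15.14}$$
where
$$\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_{q_x}.$$
A direct verification shows that
$$\langle \hat w_{k,p}, \hat w_{k',p}\rangle = \hat\lambda_k\,\delta_{k,k'},$$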
What happens for both simulated and real data is that a few $\hat w_{k,p}$ have norms close to $p$, and the remaining norms are significantly smaller. An approximate upper bound of $p$ holds because
$$\|\hat w_{k,p}\| \le \sum_{i=1}^{q_x} |\hat\alpha_{i,k}|\,\|\hat v_{i,p}\| \le p\,\|\hat\alpha_k\|_1,$$
Since $\int_0^1 \hat v_k^2(x)\,dx = 1$, $\|\hat v_{k,p}\|$ will generally not be very close to $p$, unless most of the mass of $\hat v_k$ is concentrated on the interval $[(p-1)/p,\ 1]$.
We thus want to determine if the coefficients
$$\langle \hat\varphi_p, \hat w_{k,p} \otimes \hat u_j\rangle, \qquad k = 1, \ldots, q_\star, \quad j = 1, \ldots, q_y$$
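The above calculation shows that the coefficients $\langle \hat\varphi_p, \hat w_{k,p} \otimes \hat u_j\rangle$ are small if the matrices $\hat\lambda_k \hat\alpha_k^T \hat{\boldsymbol{\Psi}}$ have small entries. As explained above, $\hat\lambda_k = \|\hat w_{k,p}\|^2 \ge 0.9p$, so we reject $H_p$ if the entries of the matrices $\hat\alpha_k^T \hat{\boldsymbol{\Psi}}$ are collectively large. To derive a test statistic, consider the following matrices (with their dimensions in parentheses)
$$\hat A_\star = [\hat\alpha_1, \ldots, \hat\alpha_{q_\star}]\ \ (q_x \times q_\star), \qquad \hat A_\star^T \hat{\boldsymbol{\Psi}}\ \ (q_\star \times q_y). \tag{15.15}$$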
We want to construct a quadratic form which is large when some entries of $\hat A_\star^T \hat{\boldsymbol{\Psi}}$ are large, and which has an approximately parameter free distribution. We will exploit the approximation $Z^T (\mathrm{Var}\,Z)^{-1} Z \stackrel{d}{\to} \chi^2_{\dim(Z)}$, which holds for an asymptotically normal vector $Z$. To this end, we form the column vector $\mathrm{vec}(\hat A_\star^T \hat{\boldsymbol{\Psi}})$ by stacking the columns of $\hat A_\star^T \hat{\boldsymbol{\Psi}}$, a process known as vectorization. Using the property,
where $I(q_y)$ is the $q_y \times q_y$ identity matrix. We use the above identity to determine the approximate covariance matrix of $\mathrm{vec}(\hat A_\star^T \hat{\boldsymbol{\Psi}})$. Applying the formula $\mathrm{Var}(QZ) = Q\,\mathrm{Var}(Z)\,Q^T$, and treating the matrix $\hat A_\star$ as deterministic, we obtain
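$$\mathrm{Var}\left[\mathrm{vec}(\hat A_\star^T \hat{\boldsymbol{\Psi}})\right] \approx \left(I(q_y) \otimes \hat A_\star^T\right) \mathrm{Var}(\mathrm{vec}(\hat{\boldsymbol{\Psi}})) \left(I(q_y) \otimes \hat A_\star\right),$$
where we used the property $(A \otimes B)^T = A^T \otimes B^T$. One can show that, see Kokoszka and Reimherr (2011),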
N Var.vec. O // b O
C" ˝ ;
where
O D diagfO 1 ; : : : ; O qx g;
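$$\hat C_\varepsilon = N^{-1}(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\Psi}})^T(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\Psi}}).$$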
Combining these results, we arrive at the test statistic
h iT h i1
O p WD N vec A O? O .I.qy / ˝ A O ? /.b O
C" ˝ /.I.q O
y / ˝ A? /
O ? O /:
vec.A
(15.16)
The statistic (15.16) has an approximately chi–square distribution with $q_y q_\star$ degrees of freedom. In Section 15.3 we evaluate the quality of this approximation. We conclude this section with an algorithmic description of the test procedure.
Test algorithm ($H_{p-1}$ against $H_p$).
1. Subtract the sample mean from the functional observations. Continue to work with the centered data.
2. Construct the regressors $X_i$ according to (15.5), and set $Y_i = Z_i$.
3. Determine $q_y$ such that the first $q_y$ eigenfunctions of the covariance operator $\hat C_Y$ explain between 80 and 90 percent of the variance.
   a. Set $q_y = q_x p$ or
   b. take $q_x$ analogous to $q_y$.
4. Construct the matrices $\mathbf{Y}$ and $\mathbf{X}$ according to (15.9).
5. Calculate the $q_x \times q_y$ matrix $\hat{\boldsymbol{\Psi}}$ according to (15.10).
6. Calculate the $q_x \times q_x$ matrix $\hat V$ according to (15.13), and its eigenvectors $\hat\alpha_k$ and eigenvalues defined in (15.14).
7. Determine $q_\star$ such that the first $q_\star$ eigenvalues of $\hat V$ are greater than $0.9p$. (The procedure is not sensitive to the cut–off value of 0.9; taking 0.5 produced the same conclusions in data examples and simulations.)
8. Construct the matrices $\hat A_\star$ and $\hat A_\star^T \hat{\boldsymbol{\Psi}}$ defined in (15.15) and compute the test statistic defined in (15.16).
9. Compute the P–value using the chi–square density with $q_y q_\star$ degrees of freedom.
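The two selection rules in steps 3 and 7 can be sketched as follows. This is only an illustration under stated assumptions: the eigenvalues are supplied already sorted in decreasing order, the 85 percent target is one arbitrary value inside the 80 to 90 percent range mentioned above, and the function names `choose_q_by_variance` and `choose_q_star` are hypothetical.

```python
def choose_q_by_variance(eigenvalues, target=0.85):
    """Smallest q whose leading eigenvalues explain at least `target`
    of the total variance (eigenvalues assumed sorted, largest first)."""
    total = sum(eigenvalues)
    running = 0.0
    for q, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= target:
            return q
    return len(eigenvalues)


def choose_q_star(v_eigenvalues, p, cutoff=0.9):
    """Number of eigenvalues of the matrix in (15.13) exceeding cutoff * p
    (eigenvalues assumed sorted, largest first)."""
    return sum(1 for lam in v_eigenvalues if lam > cutoff * p)
```

Lowering `cutoff` to 0.5 corresponds to the sensitivity check mentioned in step 7.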
We first evaluate the performance of the test using simulated data, then we turn to
the application to two financial data sets.
Simulated data. The data are generated according to an FAR model; the choice of the autoregressive operators specifies the order. We consider two models
Zi D c1 Zi 1 C c2 Zi 2 C "i ; (15.17)
Table 15.3 P–values for the test applied to credit card data transformed by differencing.

Null Hyp    $p = 0$     $p \le 1$
Alt Hyp     $p \ge 1$   $p \ge 2$
P–Value     0.000       0.427

Table 15.4 P–values for the test applied to credit card data transformed by centering.

Null Hyp    $p = 0$     $p \le 1$   $p \le 2$
Alt Hyp     $p \ge 1$   $p \ge 2$   $p \ge 3$
P–Value     0.000       0.00        0.161
We now apply our multistage test procedure to two financial data sets we have
already introduced in previous chapters: the daily credit card transactions and the
curves of Eurodollar futures prices.
Credit Card Transactions. This data set is introduced in Section 1.3. Recall that we denote by $D_n(t_i)$ the number of credit card transactions in day $n$, $n = 1, \ldots, 200$, between times $t_{i-1}$ and $t_i$, where $t_i - t_{i-1} = 8$ min, $i = 1, \ldots, 128$. We thus have $N = 200$ daily curves. The transactions are normalized to have time stamps in the interval $[0, 1]$, which thus corresponds to one day. Some smoothing is applied to construct the functional objects, as explained in Section 1.3.
The curves thus obtained have non–zero mean and exhibit strong weekly periodicity. By computing the differences $Z_n(t) = Y_n(t) - Y_{n-7}(t)$, $n = 8, 9, \ldots, 200$, we can remove both. We refer to this method of obtaining the $Z_i$ for further analysis as differencing. Another way to remove the weekly periodicity and the mean is to center the observations according to their day of the week. We refer to this method as centering.
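The two transformations can be sketched on discretized curves as follows. This is a minimal illustration, assuming the daily curves are stored as equal-length lists of values in day order and that at least one full week is available; the function names are hypothetical and the smoothing step of Section 1.3 is not reproduced.

```python
def difference_weekly(Y, lag=7):
    """Differencing: Z_n(t) = Y_n(t) - Y_{n-lag}(t) for n > lag (1-based)."""
    return [[a - b for a, b in zip(Y[n], Y[n - lag])]
            for n in range(lag, len(Y))]


def center_by_weekday(Y, period=7):
    """Centering: subtract from each curve the pointwise mean of all
    curves observed on the same day of the week."""
    m = len(Y[0])
    means = []
    for d in range(period):
        group = Y[d::period]  # all curves falling on weekday d
        means.append([sum(c[t] for c in group) / len(group)
                      for t in range(m)])
    return [[y - mu for y, mu in zip(Y[n], means[n % period])]
            for n in range(len(Y))]
```

Differencing shortens the sample by one week, while centering keeps all curves, which is one practical difference between the two approaches.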
The P–values are displayed in Tables 15.3 and 15.4. The stationary process
obtained by differencing can be modeled as FAR(1). This agrees with the con-
clusions we reached in Chapters 7 and 14, where we tested the suitability of the
FAR(1) model using significance tests against error correlations and change points.
Centering by week days leads to a more complex structure, which can be captured
by the FAR(2) model.
Eurodollar Futures. We now turn to the application of our procedure to the data
set consisting of Eurodollar futures contract prices studied in Section 14.3. Recall
that each daily curve consists of 114 points per day; point i corresponds to the price
of a contract with closing date i months from today. We work with centered data,
i.e. the sample mean function has been subtracted from all observations.
The P–values displayed in Table 15.5 indicate that the FAR(1) model is not suit-
able for modelling the whole data set, but the FAR(2) model is acceptable. This
conclusion agrees with the analysis presented in Section 14.3 where a change point
test was applied to these data. We saw that the FAR(1) model is not suitable for the
whole data set, merely for shorter subintervals. The present analysis shows that a
slightly more complex FAR(2) model captures the stochastic structure of the whole
data set.
Chapter 16
Functional time series
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 289
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_16,
© Springer Science+Business Media New York 2012
[Figure 16.1: time series plot; vertical axis: nT (CMO), horizontal axis: min]
Fig. 16.1 Ten consecutive functional observations of a component of the magnetic field recorded
at College, Alaska. The vertical lines separate days. Long negative spikes lasting a few hours
correspond to the aurora borealis.
drawn from them, they must fit into a general, one might say nonparametric, depen-
dence scheme. An example of space physics data is shown in Figure 16.1. Temporal
dependence from day to day can be discerned, but has not been modeled yet.
The Chapter is organized as follows. In Section 16.1, we introduce the depen-
dence condition and illustrate it with several examples. In particular, we show that
the linear functional processes fall into this framework, and present some nonlinear
models that also do. In Section 16.2 we show how the consistency of the estimators
16.1 Approximable functional time series 291
for the eigenvalues and eigenfunctions of the covariance operator extends to dependent functional data. Next, in Sections 16.3 and 16.4, we turn to the estimation of an appropriately defined long run variance matrix for functional data. For most time series procedures, the long run variance plays a role analogous to the variance–covariance matrix for independent observations. Its estimation is therefore of fundamental importance, and has been a subject of research for many decades; Andrews (1991), Anderson (1994) and Hamilton (1994) provide the background and numerous references. In Sections 16.5 and 16.7, we illustrate the application of the results of Sections 16.2 and 16.3 on two problems: change point detection for the functional mean, and the estimation of the kernel in the functional linear model. We show that the detection procedure introduced in Chapter 6 must be modified if the data exhibit dependence, but the kernel estimation procedure is robust to mild dependence. Section 16.5 also contains a small simulation study and a data example. The proofs are collected in the remaining sections. This chapter is partially based on the paper of Hörmann and Kokoszka (2010).
The notion of weak dependence has over the past decades been formalized in many
ways. Perhaps the most popular are various mixing conditions, see Doukhan (1994),
Bradley (2007), but in recent years several other approaches have also been intro-
duced, see Doukhan and Louhichi (1999) and Wu (2005, 2007), among others. In
time series analysis, moment based measures of dependence, most notably autocor-
relations and cumulants, have gained broad acceptance. The measure we consider
below is a moment type quantity, but it is also related to the mixing conditions as it considers $\sigma$–algebras $m$ time units apart, with $m$ tending to infinity.
A most direct relaxation of independence is $m$–dependence. Suppose $\{X_n\}$ is a sequence of random elements taking values in a measurable space $S$. Denote by $\mathcal{F}_k^- = \sigma\{\ldots, X_{k-2}, X_{k-1}, X_k\}$ and $\mathcal{F}_k^+ = \sigma\{X_k, X_{k+1}, X_{k+2}, \ldots\}$ the $\sigma$–algebras generated by the observations up to time $k$ and after time $k$, respectively. Then the sequence $\{X_n\}$ is said to be $m$-dependent if for any $k$, the $\sigma$–algebras $\mathcal{F}_k^-$ and $\mathcal{F}_{k+m}^+$ are independent.
Most time series models are not $m$–dependent. Rather, various measures of dependence decay sufficiently fast, as the distance $m$ between the $\sigma$–algebras $\mathcal{F}_k^-$ and $\mathcal{F}_{k+m}^+$ increases. However, $m$–dependence can be used as a tool to study properties of many nonlinear sequences, see e.g. Berkes and Horváth (2001), Berkes, Horváth and Kokoszka (2003, 2005), Berkes and Horváth (2003a, 2003b), Hörmann (2008), Berkes, Hörmann and Schauer (2008, 2009). The general idea is to approximate $\{X_n,\ n \in \mathbb{Z}\}$ by $m$–dependent processes $\{X_n^{(m)},\ n \in \mathbb{Z}\}$, $m \ge 1$. The goal is to establish that for every $n$ the sequence $\{X_n^{(m)},\ m \ge 1\}$ converges in some sense to $X_n$, if we let $m \to \infty$. If the convergence is fast enough, then one can obtain the limiting behavior of the original process from corresponding results for $m$–dependent sequences. Definition 16.1 formalizes this idea and sets up the necessary
In this chapter we use $H$ to denote the function space $L^2 = L^2([0,1])$ to avoid confusion with the space $L^p$ of scalar random variables.
where the $\varepsilon_i$ are iid elements taking values in a measurable space $S$, and $f$ is a measurable function $f: S^\infty \to H$. Moreover we assume that if $\{\varepsilon_i'\}$ is an independent copy of $\{\varepsilon_i\}$ defined on the same probability space, then letting
we have
$$\sum_{m=1}^{\infty} \nu_p\left(X_n - X_n^{(m)}\right) < \infty. \tag{16.4}$$
$$X_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \ldots, \varepsilon_{n-m+1}, \varepsilon^{(n)}_{n-m}, \varepsilon^{(n)}_{n-m-1}, \ldots). \tag{16.5}$$
We call this method the coupling construction. Since this modification leaves condition (16.4) unchanged, we will assume from now on that the $X_n^{(m)}$ are defined by (16.5). Then, for each $m \ge 1$, the sequences $\{X_n^{(m)},\ n \in \mathbb{Z}\}$ are strictly stationary and $m$–dependent, and each $X_n^{(m)}$ is equal in distribution to $X_n$.
One can also define $X_n^{(m)}$ by
$$X_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \ldots, \varepsilon_{n-m+1}, \varepsilon^{(m)}_{n,n-m}, \varepsilon^{(m)}_{n,n-m-1}, \ldots), \tag{16.6}$$
where $\{\varepsilon^{(m)}_{n,\ell},\ m \ge 1,\ -\infty < n, \ell < \infty\}$ are iid copies of $\varepsilon_0$. We require (16.4), but now $X_n^{(m)}$ defined by (16.6) is used. To establish (16.4) with $X_n^{(m)}$ defined by (16.5) or (16.6) the same arguments are used.
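For a scalar linear process the coupling construction is easy to write down explicitly. The sketch below is an illustration only: it assumes $f$ is a linear filter truncated to finitely many lags, with `coefs[j]` multiplying the innovation at lag `j`, and the function name `coupled_value` is hypothetical. The first `m` innovations are shared with the original process and the remaining ones are taken from an independent copy.

```python
def coupled_value(coefs, eps_past, eps_copy_past, m):
    """Coupled approximation X_n^{(m)} for a truncated linear filter.

    eps_past[j]      : innovation at lag j of the original sequence
    eps_copy_past[j] : innovation at lag j of an independent copy
    Lags 0, ..., m-1 use the original innovations, lags >= m the copy.
    """
    total = 0.0
    for j, c in enumerate(coefs):
        innovation = eps_past[j] if j < m else eps_copy_past[j]
        total += c * innovation
    return total
```

With `m` equal to the number of lags the original value is recovered, and with `m = 0` the result depends on the independent copy only, so that values computed this way at times more than `m` apart share no innovations.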
$L^p$–$m$–approximability is related to $L^p$–approximability studied by Pötscher and Prucha (1997) for scalar– and vector–valued processes. Since our definition applies with an obvious modification to sequences with values in any normed vector space $H$ (in particular, $\mathbb{R}$ or $\mathbb{R}^n$), it can be seen as a generalization of $L^p$–approximability. There are, however, important differences. By definition, $L^p$–approximability only allows for approximations that are, like the truncation construction, measurable with respect to a finite selection of basis vectors $\varepsilon_n, \ldots, \varepsilon_{n-m}$, whereas the coupling construction does not impose this condition. On the other hand, $L^p$–approximability is not based on independence of the innovation process. Instead independence is relaxed to certain mixing conditions.
Finally, we point out that only a straightforward modification is necessary in order to generalize the theory of this paper to non-causal processes $X_n = f(\ldots, \varepsilon_{n+1}, \varepsilon_n, \varepsilon_{n-1}, \ldots)$. At the expense of additional technical assumptions, our framework can also be extended to non-stationary sequences, e.g. those of the form (16.2) where $\{\varepsilon_k\}$ is a sequence of independent, but not necessarily identically distributed, random variables.
We now illustrate the applicability of Definition 16.1 with several examples. Let $\mathcal{L} = \mathcal{L}(H, H)$ be the set of bounded linear operators from $H$ to $H$. Recall that for $A \in \mathcal{L}$ the operator norm is $\|A\|_{\mathcal{L}} = \sup_{\|x\| \le 1} \|Ax\|$.
expansion. The proof of Proposition 16.2 is more involved than those presented in
Examples 16.1 and 16.2, and is not presented. The curves yk .t/ appearing in Exam-
ple 16.3 correspond to appropriately defined intradaily returns.
Example 16.3 (Functional ARCH). Let $\delta \in H$ be a positive function and let $\{\varepsilon_k\}$ be an i.i.d. sequence in $L^4_H$. Further, let $\beta(s,t)$ be a non-negative kernel function in $L^2([0,1]^2, \mathcal{B}^2_{[0,1]}, \lambda^2)$. Then we call the process
where
$$\sigma_k^2(t) = \delta(t) + \int_0^1 \beta(t,s)\,y_{k-1}^2(s)\,ds, \tag{16.9}$$
Then equations (16.8) and (16.9) have a unique strictly stationary and causal solution and the sequence $\{y_k\}$ is $L^p$–$m$–approximable.
Example 16.3 and Proposition 16.2 are taken from Hörmann et al. (2010), where
further properties of the functional ARCH sequence are discussed.
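The volatility recursion (16.9) can be sketched on a grid as follows. This is a rough illustration under several assumptions that are not taken from the text: the curves are discretized at $m$ equally spaced points, the integral is replaced by a Riemann sum, the observation equation is taken to be $y_k(t) = \varepsilon_k(t)\sigma_k(t)$, and independent standard normal values at the grid points serve as a crude stand-in for the innovation curves. The function names are hypothetical.

```python
import math
import random


def sigma2_step(delta, beta, y_prev):
    """Riemann-sum version of (16.9) on a grid of m points:
    sigma_k^2(t) = delta(t) + (1/m) * sum_s beta(t, s) * y_{k-1}(s)^2."""
    m = len(delta)
    return [delta[t] + sum(beta[t][s] * y_prev[s] ** 2 for s in range(m)) / m
            for t in range(m)]


def simulate_farch(delta, beta, n_curves, rng, burn_in=50):
    """Simulate discretized curves, assuming y_k(t) = eps_k(t) * sigma_k(t)."""
    y = [0.0] * len(delta)
    curves = []
    for k in range(burn_in + n_curves):
        sigma2 = sigma2_step(delta, beta, y)
        y = [rng.gauss(0.0, 1.0) * math.sqrt(v) for v in sigma2]
        if k >= burn_in:
            curves.append(y)
    return curves
```

A call such as `simulate_farch(delta, beta, 100, random.Random(0))` returns 100 discretized curves after discarding the burn-in period.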
We conclude this section with a simple but useful lemma which shows that $L^p$–$m$–approximability is unaffected by linear transformations, whereas independence assumptions are needed for product type operations.
Lemma 16.1. Let $\{X_n\}$ and $\{Y_n\}$ be two $L^p$–$m$–approximable sequences in $L^p_H$. Define
$Z_n^{(1)} = A(X_n)$, where $A \in \mathcal{L}$;
$Z_n^{(2)} = X_n + Y_n$;
$Z_n^{(3)} = X_n \circ Y_n$ ($X_n \circ Y_n(t) = X_n(t)Y_n(t)$);
$Z_n^{(4)} = \langle X_n, Y_n\rangle$;
$Z_n^{(5)} = X_n \otimes Y_n$ ($X_n \otimes Y_n(t,s) = X_n(s)Y_n(t)$).
Then $\{Z_n^{(1)}\}$ and $\{Z_n^{(2)}\}$ are $L^p$–$m$–approximable sequences in $L^p_H$. If $X_n$ and $Y_n$ are independent then $\{Z_n^{(4)}\}$ and $\{Z_n^{(5)}\}$ are $L^p$–$m$–approximable sequences in the respective spaces. If $E\sup_{t\in[0,1]} |X_n(t)|^p + E\sup_{t\in[0,1]} |Y_n(t)|^p < \infty$, then $\{Z_n^{(3)}\}$ is $L^p$–$m$–approximable in $L^p_H$.
Proof. The first two relations are immediate. We exemplify the proofs of the remaining claims by focusing on $Z_n = Z_n^{(5)}$. For this we set $Z_m^{(m)} = X_m^{(m)} \otimes Y_m^{(m)}$ and note that $Z_m$ and $Z_m^{(m)}$ are (random) kernel operators, and thus Hilbert-Schmidt operators. Since
$$\|Z_m - Z_m^{(m)}\|_{\mathcal{L}} \le \|Z_m - Z_m^{(m)}\|_{\mathcal{S}} \le \left(\iint \left(X_m(s)Y_m(t) - X_m^{(m)}(s)Y_m^{(m)}(t)\right)^2 dt\,ds\right)^{1/2} \le \sqrt{2}\left(\|X_m\|\,\|Y_m - Y_m^{(m)}\| + \|Y_m^{(m)}\|\,\|X_m - X_m^{(m)}\|\right),$$
In this section we extend the results of Section 2.5 to weakly dependent functional time series and establish a central limit theorem for functional time series.
Let $\{X_n\} \in L^2_H$ be a stationary sequence with covariance operator $C$. We assume that $C$ is an integral operator with kernel $c(t,s) = \mathrm{Cov}(X_1(t), X_1(s))$ whose estimator is
$$\hat c(t,s) = \frac{1}{N}\sum_{n=1}^{N} (X_n(t) - \bar X_N(t))(X_n(s) - \bar X_N(s)). \tag{16.10}$$
The proof of Theorem 16.1 is given in Section 16.8. Let us note that by Lemma 2.2 and Theorem 16.1,
$$N E\left[|\lambda_j - \hat\lambda_j|^2\right] \le N E\|\hat C - C\|^2_{\mathcal{L}} \le N E\|\hat C - C\|^2_{\mathcal{S}} \le U_X.$$
16.3 The long–run variance 297
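Assuming
$$\lambda_1 > \lambda_2 > \cdots > \lambda_d > \lambda_{d+1}, \tag{16.12}$$
Lemma 2.3 and Theorem 16.1 yield for $j \le d$ ($\hat c_j = \mathrm{sign}(\langle \hat v_j, v_j\rangle)$),
$$N E\|\hat c_j \hat v_j - v_j\|^2 \le \left(\frac{2\sqrt{2}}{\alpha_j}\right)^2 N E\|\hat C - C\|^2_{\mathcal{L}} \le \frac{8}{\alpha_j^2}\, N E\|\hat C - C\|^2_{\mathcal{S}} \le \frac{8U_X}{\alpha_j^2},$$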
The concept of the long run variance, while fundamental in time series analysis,
has not been studied for functional data, and not even for scalar approximable
sequences. It is therefore necessary to start with some preliminaries, which lead
to the main results and illustrate the role of the $L^p$–$m$–approximability.
Let $\{X_n\}$ be a scalar (weakly) stationary sequence. Its long run variance is defined as $\sigma^2 = \sum_{j \in \mathbb{Z}} \gamma_j$, where $\gamma_j = \mathrm{Cov}(X_0, X_j)$, provided this series is absolutely convergent. Our first lemma shows that this is the case for $L^2$–$m$–approximable sequences.
Since
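the random variables $X_0$ and $X_j^{(j)}$ are independent, so $\mathrm{Cov}(X_0, X_j^{(j)}) = 0$, and

$$\hat\sigma^2 = \sum_{|j| \le q} \omega_q(j)\,\hat\gamma_j, \qquad \hat\gamma_j = \frac{1}{N}\sum_{i=1}^{N-|j|} (X_i - \bar X_N)(X_{i+|j|} - \bar X_N).$$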
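The kernel estimator above can be sketched in a few lines. The code is an illustration under one stated assumption: it uses the Bartlett weights $\omega_q(j) = 1 - |j|/(q+1)$, which is one common choice and is not prescribed by the text; the function names are hypothetical.

```python
def sample_autocov(x, j):
    """Sample autocovariance at lag |j|, normalized by N as in the display."""
    n = len(x)
    mean = sum(x) / n
    j = abs(j)
    return sum((x[i] - mean) * (x[i + j] - mean) for i in range(n - j)) / n


def bartlett_weight(j, q):
    """Bartlett weights (an assumed choice): 1 - |j| / (q + 1)."""
    return 1.0 - abs(j) / (q + 1.0)


def long_run_variance(x, q):
    """Weighted sum of sample autocovariances over lags -q, ..., q."""
    return sum(bartlett_weight(j, q) * sample_autocov(x, j)
               for j in range(-q, q + 1))
```

Setting `q = 0` returns the ordinary sample variance (with divisor $N$), which is the estimator that ignores dependence.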
Various weights $\omega_q(j)$ have been proposed and their optimality properties studied, see Andrews (1991) and Anderson (1994), among others. In theoretical work, it is typically assumed that the bandwidth $q$ is a deterministic function of the sample size such that $q = q(N) \to \infty$ and $q = o(N^r)$, for some $0 < r \le 1$. We will use the following assumption:
Assumption 16.1. The bandwidth $q = q(N)$ satisfies $q \to \infty$, $q^2/N \to 0$ and the weights satisfy $\omega_q(j) = \omega_q(-j)$ and
$$|\omega_q(j)| \le b \tag{16.15}$$
$$\kappa(h,r,s) = \mathrm{Cov}\big((X_0 - \mu)(X_h - \mu),\ (X_r - \mu)(X_s - \mu)\big) - \gamma_r\gamma_{h-s} - \gamma_s\gamma_{h-r}.$$
Recently, Giraitis et al. (2003) showed that condition (16.17) can be replaced by a weaker condition
$$\sup_h \sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty} |\kappa(h,r,s)| < \infty. \tag{16.18}$$
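To explain the intuition behind conditions (16.19) and (16.20), consider the linear process $X_k = \sum_{j=0}^{\infty} c_j \varepsilon_{k-j}$. For $k \ge 0$,
$$X_k - X_k^{(k)} = \sum_{j=k}^{\infty} c_j \varepsilon_{k-j} - \sum_{j=k}^{\infty} c_j \varepsilon^{(k)}_{k-j}.$$
cannot be expressed only in terms of the errors (16.21), but the errors $\varepsilon_k, \ldots, \varepsilon_1$ should approximately cancel, so that the difference $X_k - X_k^{(k)}$ is small, and very weakly correlated with $X_r^{(r)} - X_{r+\ell}^{(r+\ell)}$.
With this background, we now formulate the following result.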
With this background, we now formulate the following result.
Theorem 16.4 is proven in Section 16.8. The general plan of the proof is the same as that of the proof of Theorem 3.1 of Giraitis et al. (2003), but the verification of the crucial relation (16.49) uses a new approach based on $L^4$–$m$–approximability. The arguments preceding (16.49) show that replacing $\bar X_N$ by $\mu = EX_0$ does not change the limit. We note that the condition $q^2/N \to 0$ we assume is stronger than the condition $q/N \to 0$ assumed by Giraitis et al. (2003). This difference is of little practical consequence, as the optimal bandwidths for the kernels used in practice are typically of the order $O(N^{1/5})$. Finally, we notice that by further strengthening conditions on the behavior of the bandwidth function $q = q(N)$, the convergence in probability in Theorem 16.4 could be replaced by the almost sure convergence, but we do not pursue this research here. The corresponding result under condition (16.18) was established by Berkes et al. (2005); it is also stated without proof as part of Theorem A.1 of Berkes et al. (2006).
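We now turn to the vector case in which the data are of the form
$$\mathbf{X}_n = [X_{1n}, X_{2n}, \ldots, X_{dn}]^T, \qquad n = 1, 2, \ldots, N.$$
Just as in the scalar case, the estimation of the mean by the sample mean does not affect the limit of the kernel long–run variance estimators, so we assume that $EX_{in} = 0$ and define the autocovariances as
$$\hat{\boldsymbol{\gamma}}_r = \begin{cases} N^{-1}\displaystyle\sum_{n=1}^{N-r} \mathbf{X}_n \mathbf{X}_{n+r}^T & \text{if } r \ge 0, \\[2ex] N^{-1}\displaystyle\sum_{n=1}^{N-|r|} \mathbf{X}_{n+|r|} \mathbf{X}_n^T & \text{if } r < 0. \end{cases}$$
$$N^{-2}\sum_{m,n=1}^{N} E[X_{im}X_{jn}] = N^{-1}\sum_{|r|<N}\left(1 - \frac{|r|}{N}\right)\gamma_r(i,j),$$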
We are now able to turn to functional data. Suppose $\{X_n\} \in L^2_H$ is a zero mean sequence and $v_1, v_2, \ldots, v_d$ is any set of orthonormal functions in $H$. Define $X_{in} = \int X_n(t)v_i(t)\,dt$, $\mathbf{X}_n = [X_{1n}, X_{2n}, \ldots, X_{dn}]^T$ and $\boldsymbol{\gamma}_r = \mathrm{Cov}(\mathbf{X}_0, \mathbf{X}_r)$. A direct verification shows that if $\{X_n\}$ is $L^p$–$m$–approximable, then so is the vector sequence $\{\mathbf{X}_n\}$. We thus obtain the following corollary.
The results of Section 16.4 show that the conclusions of parts b) of Theorem 16.5 and Corollary 16.1 hold under $L^2$–$m$–approximability and mild additional assumptions; $L^4$–$m$–approximability and Condition (16.23) are not required.
In Corollary 16.1, the functions $v_1, v_2, \ldots, v_d$ form an arbitrary orthonormal deterministic basis. In many applications, a random basis consisting of the estimated principal components $\hat v_1, \hat v_2, \ldots, \hat v_d$ is used. The scores with respect to this basis are defined by
$$\hat\eta_{\ell i} = \int (X_i(t) - \bar X_N(t))\,\hat v_\ell(t)\,dt, \qquad 1 \le \ell \le d. \tag{16.24}$$
We then have the following proposition, which will be useful in most statistical procedures for functional time series; an application to change point detection is developed in Section 16.5.
Proposition 16.3. Let $\hat C = \mathrm{diag}(\hat c_1, \ldots, \hat c_d)$, with $\hat c_i = \mathrm{sign}(\langle v_i, \hat v_i\rangle)$. Suppose $\{X_n\} \in L^4_H$ is $L^4$–$m$–approximable and that (16.12) holds. Assume further that
1 X
q
WD sup wq .j / < 1 (16.26)
q1 q
j Dq
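and $q^4/N \to 0$. Then
$$|\hat\Sigma(\boldsymbol{\beta}) - \hat\Sigma(\hat C\hat{\boldsymbol{\beta}})| = o_P(1) \quad \text{and} \quad |\hat\Sigma(\hat{\boldsymbol{\eta}}) - \hat\Sigma(\hat{\boldsymbol{\beta}})| = o_P(1). \tag{16.27}$$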
The result of this section, Theorem 16.6, provides an elegant alternative to Theorem 16.5. It is a general consistency result for the kernel estimators of the long–run covariance matrix, which can be used in many problems of inference for vector–valued time series. Since its relevance goes beyond functional data, we restate some assumptions and definitions, to make this section as self–contained as possible. In this section, $\mathbf{X}_\ell = [X_{1\ell}, \ldots, X_{d\ell}]^T$ is a sequence of zero mean $L^2$–$m$–approximable random vectors. For ease of reference, recall that this means that the following assumptions hold:
Assumption 16.3.
$$E\mathbf{X}_\ell = 0 \quad \text{and} \quad E\|\mathbf{X}_\ell\|^2 < \infty.$$
16.4 Estimation of the long–run covariance matrix under weak assumptions 303
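Assumption 16.4.
$$\max_{1 \le j \le d} \sum_{m=1}^{\infty} \left(E\left(X_{j\ell} - X_{j\ell}^{(m)}\right)^2\right)^{1/2} < \infty,$$
where $\{\varepsilon^{(m)}_{\ell,n},\ m \ge 1,\ -\infty < n, \ell < \infty\}$ are iid copies of $\varepsilon_0$.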
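Recall that the long–run variance matrix $\Sigma$ introduced in Section 16.3 is defined by
$$\Sigma = E\mathbf{X}_0\mathbf{X}_0^T + \sum_{\ell=1}^{\infty} E\mathbf{X}_0\mathbf{X}_\ell^T + \sum_{\ell=1}^{\infty} E\mathbf{X}_\ell\mathbf{X}_0^T.$$
Assumptions 16.3 and 16.4 yield that $\Sigma$ is well–defined, and the infinite sums in the definition are (coordinate-wise) absolutely convergent. We consider the estimation of $\Sigma$. The sample autocovariance matrices defined in Section 16.3 can be written as
$$\hat{\boldsymbol{\gamma}}_k = \frac{1}{N}\sum_{\ell=\max(1,\,1-k)}^{\min(N,\,N-k)} \mathbf{X}_\ell\mathbf{X}_{\ell+k}^T$$
$$\hat\Sigma_N = \sum_{k=-(N-1)}^{N-1} K(k/B_N)\,\hat{\boldsymbol{\gamma}}_k. \tag{16.28}$$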
the most commonly used kernels, like the Bartlett (cf. Example 16.4) and Parzen (cf. Example 16.5). Assumption 16.5 has been used in other contexts, for example, Liu and Wu (2010) established consistency results for the estimation of spectral densities under Assumption 16.6. It does not specify the rate at which $B_N$ tends to infinity. We formulate it as a separate assumption, namely,
Assumption 16.6.
$$B_N \to \infty \quad \text{and} \quad B_N/N \to 0.$$
We can now state the following theorem, which is proven in Section 16.9.
If Theorem 16.6 is used in the context of functional data, the vectors $\mathbf{X}_\ell$ are often projections onto the EFPC's $\hat v_1, \ldots, \hat v_d$. In this case, $\hat\Sigma_N$ is close to $\hat C\Sigma\hat C$, with the matrix $\hat C$ as in Proposition 16.3. For more applications of Theorem 16.6, see Horváth and Reeder (2011).
The main advantage of Theorem 16.6 over Theorem 16.4 is that the latter requires $L^4$–$m$–approximability, whereas only $L^2$–$m$–approximability is assumed in Theorem 16.6. This is of practical relevance as some data, most notably those arising in financial applications, may not have fourth moments. Moreover, Theorem 16.6 does not use the cumulant–like condition (16.23), which may be difficult to verify for some model classes. Finally, Theorem 16.6 uses a weaker and more standard assumption $B_N = o(N)$, rather than $B_N = o(N^{1/2})$ needed in Theorem 16.4. This is achieved at the expense of imposing smoothness conditions on the kernel $K$ and its Fourier transform $\hat K$ (Assumption 16.5(iii) can be replaced with the requirement that $K(t)$ decays fast enough as $|t| \to \infty$). For all kernels and bandwidths used in practice, both the conditions on $K$ and the rate $B_N = o(N^{1/2})$ hold, so these differences in assumptions are less important.
We conclude this section with some examples illustrating Assumption 16.5.
This kernel clearly satisfies parts (i)–(iii) of Assumption 16.5. Its Fourier transform
is
O 1 u 2
K.u/ D sin :
u 2
Thus, to verify part (iv), we must check that the function
$$F(t) = \left(\frac{\sin(t)}{t}\right)^2$$
Following the arguments used in Example 16.4, one can verify that $\hat K(u)$ is also integrable and Lipschitz.
In Examples 16.4 and 16.5, the kernel is a scaled version of the convolution of uniform densities on $[-1, 1]$. The Bartlett kernel is the convolution of two, while the Parzen kernel is the convolution of four (this follows immediately from the form of $\hat K(u)$, cf. Example 16.6). Higher order convolutions can be used as well.
The rectangular kernel is not used in practice due to its poor performance in finite samples, which can be theoretically explained by the slowly decaying Fourier transform. To some extent, this is also true of the Bartlett kernel, but it is more often used due to its simplicity. Optimal kernels are generally smoother in the time domain and “more compactly” supported in the frequency domain. In software implementations, these kernels are typically not defined directly through a function $K$, but through the weights $\omega_q(j)$ considered in Section 16.3. For example the modified Daniell kernel is obtained by repeated discrete convolutions of the weights $\omega(-1) = 1/3$, $\omega(0) = 1/3$, $\omega(1) = 1/3$, see e.g. Chapter 4 of Shumway and Stoffer (2006).
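The repeated discrete convolution of the three equal weights mentioned above can be sketched as follows. The helper names `convolve` and `repeated_weights` are hypothetical, and the code only illustrates the convolution step, not any particular software implementation.

```python
def convolve(a, b):
    """Full discrete convolution of two weight sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def repeated_weights(times):
    """Convolve the weights (1/3, 1/3, 1/3) with themselves `times` times
    in total; times=1 returns the base weights. The result is symmetric,
    indexed by lags -times, ..., times, and sums to one."""
    base = [1.0 / 3.0] * 3
    w = base
    for _ in range(times - 1):
        w = convolve(w, base)
    return w
```

Two passes give weights proportional to 1, 2, 3, 2, 1 on the lags $-2, \ldots, 2$, so each additional convolution widens the window and smooths its shape.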
Functional time series are obtained from data collected sequentially over time, and it
is natural to expect that conditions under which observations are made may change.
If this is the case, procedures developed for stationary series will produce spurious
results. In this section, we develop a procedure for the detection of a change in the
mean function of a functional time series. In addition to its practical relevance, the
requisite theory illustrates the application of the results developed in Sections 16.2
and 16.3. The main results of this Section, Theorems 16.7 and 16.8, are proven in
Section 16.10. This Section is an extension of Chapter 6 to dependent curves. We
thus consider testing the null hypothesis
(Note that under $H_0$, we do not specify the value of the common mean.) The test we construct has particularly good power against the alternative in which the data can be divided into several consecutive segments, and the mean is constant within each segment, but changes from segment to segment. The simplest case of only two segments (one change point) is specified in Assumption 16.8. First we note that under the null hypothesis, we can represent each functional observation as
The following assumption specifies conditions on $\mu(\cdot)$ and the errors $Y_i(\cdot)$ needed to establish the convergence of the test statistic under $H_0$.
Assumption 16.7. The mean $\mu$ in (16.29) is in $H$. The error functions $Y_i \in L^4_H$ are $L^4$–$m$–approximable mean zero random elements such that the eigenvalues of their covariance operator satisfy (16.12).
Recall that the $L^4$–$m$–approximability implies that the $Y_i$ are identically distributed with $\nu_4(Y_i) < \infty$. In particular, their covariance function
in which the $Y_i$ satisfy Assumption 16.7, the mean functions $\mu_1$ and $\mu_2$ are in $L^2$ and
$$k^* = [N\theta] \quad \text{for some } 0 < \theta < 1.$$
The general idea of testing is similar to that developed in Chapter 6 for independent observations; the central difficulty is in accommodating the dependence. To define the test statistic, recall that bold symbols denote $d$–dimensional vectors, e.g. $\hat{\boldsymbol{\eta}}_i = [\hat\eta_{1i}, \hat\eta_{2i}, \ldots, \hat\eta_{di}]^T$. Define the partial sums process
$$\mathbf{S}_N(x, \boldsymbol{\xi}) = \sum_{n=1}^{\lfloor Nx \rfloor} \boldsymbol{\xi}_n, \qquad x \in [0, 1],$$
Theorem 16.8. Suppose Assumption 16.8 and condition (16.33) hold. If the vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are not equal, then $T_N(d) \stackrel{P}{\to} \infty$.
The behavior under the alternative of change point tests for dependent functional data is studied by Aston and Kirch (2011a). Their work addresses in detail the orthogonality conditions required for a test to have nontrivial power, and includes epidemic changes in which the mean is $\mu_2$ at $k_1, k_1 + 1, \ldots, k_2$ with $k_1 > 1$ and $k_2 < n$, and $\mu_1$ elsewhere.
We conclude this section with two numerical examples which illustrate the effect of dependence on our change point detection procedure. Example 16.7 uses synthetic data, while Example 16.8 focuses on particulate pollution data. Both show that using statistic (16.31) with $\hat\Sigma(\hat{\boldsymbol{\eta}})$ being the estimate for just the covariance, not the long–run covariance matrix, leads to spurious rejections of $H_0$; a nonexistent change point can be detected with a large probability. An interesting example is presented in Aston and Kirch (2011b), who develop methodology for determining distributions of change points for 3D functional data from multiple subjects. They apply it to a large study on resting state functional magnetic resonance imaging.
We first let $q = 0$, which corresponds to using just the sample covariance of $\{\hat{\boldsymbol{\eta}}_n\}$ in the normalization for the test statistic (16.31) (dependence is ignored). We use 1000 replications and the 5% significance level. The rejection rate is 23.9%, much higher than the nominal level of 5%. In contrast, using an appropriate estimate for the long–run variance, the reliability of the test improves dramatically. Choosing an optimal bandwidth $q$ is a separate problem, which we do not pursue here. Here we adapt the formula $q \approx 1.1447\,(aN)^{1/3}$, $a = \dfrac{4\psi^2}{(1+\psi)^4}$, valid for a scalar AR(1)
16.5 Change point detection 309
process with the autoregressive coefficient $\psi$, Andrews (1991). Using this formula with $\psi = \|\Psi\|_{\mathcal{S}} = 0.6$ results in $q = 4$. This choice gives the empirical rejection rate of 3.7%, much closer to the nominal rate of 5%.
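The bandwidth rule quoted above is simple to evaluate numerically. The sketch below only restates that formula; the function name `ar1_bandwidth` is hypothetical, and rounding to the nearest integer is an assumption about how the value is turned into a bandwidth.

```python
def ar1_bandwidth(psi, n):
    """Bandwidth rule q ~ 1.1447 * (a * N)^(1/3), a = 4 psi^2 / (1 + psi)^4,
    as quoted in the text for a scalar AR(1) coefficient psi."""
    a = 4.0 * psi ** 2 / (1.0 + psi) ** 4
    return 1.1447 * (a * n) ** (1.0 / 3.0)


# Values used in Example 16.8: psi = 0.70 and N = 151.
q_raw = ar1_bandwidth(0.70, 151)
q = round(q_raw)  # rounding to the nearest integer is an assumption
```

With these inputs the raw value is about 3.76, consistent with the choice $q \approx 4$ reported in Example 16.8.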
Example 16.8. This example, which uses pm10 (particulate matter with diameter $< 10\,\mu$m, measured in $\mu$g/m$^3$) data, illustrates a phenomenon similar to that of Example 16.7. For the analysis we use pm10 concentration data measured in the Austrian city of Graz during the winter of 2008/2009 ($N = 151$). The data are given in 30 minute resolution, yielding an intraday frequency of 48 observations. As in Stadtlober et al. (2008) we use a square root transformation to reduce heavy tails. Next we remove possible weekly periodicity by subtracting the corresponding mean vectors obtained from the different weekdays. A time series plot of this new sequence is given in Figure 16.2. The data look relatively stable, although a shift appears to be possible in the center of the time series. It should be emphasized, however, that pm10 data, like many geophysical time series, exhibit a strong, persistent positive autocorrelation structure. These series are stationary over long periods of time, with an appearance of local trends or shifts at various time scales (random self–similar or fractal structure).
The daily measurement vectors are transformed into smooth functional data
using 15 B-spline functions of order 4. The functional principal component analysis
shows that the first three principal components explain 84% of the total variability,
so we use statistic (16.31) with $d = 3$. A look at the acf and pacf of the
first empirical PC scores (Figure 16.3) suggests an AR(1), maybe AR(3), behavior.
The second and third empirical PC scores show no significant autocorrelation structure.
We use the formula given in Example 16.7 with $\psi = 0.70$ (acf at lag 1) and
$N = 151$ and obtain $q \approx 4$. This gives $T_{151}(3) = 0.94$, which is close to the critical
value 1.00 when testing at the 5% significance level, but does not support rejection
Fig. 16.2 Seasonally detrended √pm10 plotted against day, Nov 1, 2008 – Mar 31, 2009.
310 16 Functional time series
Fig. 16.3 Left panel: Sample autocorrelation function of the first empirical PC scores. Right panel:
Sample partial autocorrelation function of the first empirical PC scores.
of the no-change hypothesis. In contrast, using only the sample covariance matrix
in (16.32) gives $T_{151}(3) = 1.89$, and thus a clear and possibly spurious rejection of
the null hypothesis.
We have seen in the previous sections of this chapter that in many inferential problems
related to time series the long run variance plays a fundamental role. In particular,
in Section 16.5 we used it to normalize the test statistic of Chapter 6 to
obtain a limiting null distribution that is parameter free, see Theorem 16.7. It has
been known in econometric research that using the long run variance in this way
can lead to the so–called non–monotonic power. This phenomenon is illustrated in
Figure 16.4, which shows the power of three tests of the null hypothesis of Section 16.5
under the alternative quantified in Assumption 16.8. The mean zero curves
$Y_i$ are Gaussian FAR(1) processes; $k^* = N/2$, $\mu_1 = 0$, and $\mu_2 = \delta f(t)$. Of central
importance is the parameter $\delta$, which quantifies the magnitude of the change.
The test called BGHK is the test of Chapter 6 (it assumes independent $Y_i$), HK
refers to the test of Section 16.5, and SN to the test based on a self–normalized
statistic, which will be introduced later in this section. Focusing on the HK test, we
see that if the change becomes very large, this test loses power; its power is non–monotonic.
A heuristic explanation is that the “denominator” in statistic (16.31), an
estimate of the long run variance matrix based on a data driven procedure discussed later
in this section, becomes very large when $\delta$ increases. This is because the scores
$\hat\eta_{\ell i}$, given by (16.24), are computed without adjusting for a possible change point.
If the change is very large, the sample autocorrelations of the scores decay
16.6 Self–normalized statistics 311
Fig. 16.4 Size-adjusted power (rejection percentage against $\delta$) of the BGHK, SN and HK tests for detecting the change in the mean function; panel BM, $f(t) = t$; $\delta$ measures the magnitude of change; sample size $N = 50$.
very slowly, which causes the data driven procedure to select large bandwidths and
the estimator of the long run variance to behave as it would for a very strongly dependent
sequence. This inflates its value so much that the test loses power. (We note that
if the bandwidth is deterministic, the power is monotonic, but there is no universal
formula for the kernel bandwidth that gives correct size.) A remedy is to adjust the
definition of the scores $\hat\eta_{\ell i}$ to allow for a possible change point. Combined with
the idea of self–normalization, this leads to a test that has monotonic power. Before
proceeding further, we note that the size of change corresponding to, say, $\delta = 2$ is very
large relative to the $Y_i$, and a change point of this magnitude can be detected by eye.
The tests based on self–normalized statistics, however, not only correct the problem
of non–monotonic power but, perhaps more importantly, eliminate the need to select
the bandwidth parameter in the kernel estimators of the long run variance.
The remainder of this section is devoted to the discussion of these issues. It
is based on the work of Shao (2010), Shao and Zhang (2010) and Zhang et al.
(2011). These papers contain references to earlier work and to other applications
of self–normalization. A different approach to change point detection which does
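in the Skorokhod space. The parameter $\sigma^2$ is the long–run variance:
\[
\sigma^2 = \lim_{N\to\infty} N \operatorname{Var}\big[\bar X_N\big] = \sum_{h} \gamma(h).
\]
Set
\[
D_N = N^{-2} \sum_{n=1}^{N} \Big\{ \sum_{j=1}^{n} (X_j - \bar X_N) \Big\}^2 .
\]
\[
N^{-1/2} \sum_{j=1}^{n} (X_j - \bar X_N) = S_N\Big(\frac{n}{N}\Big) - \frac{n}{N}\, S_N(1).
\]
Consequently
\[
D_N = N^{-1} \sum_{n=1}^{N} \Big\{ S_N\Big(\frac{n}{N}\Big) - \frac{n}{N}\, S_N(1) \Big\}^2
\xrightarrow{d} \sigma^2 \int_0^1 \{ W(r) - r W(1) \}^2 \, dr. \tag{16.37}
\]
The convergences in (16.36) and (16.37) are joint, so (16.35) follows.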
The key point is the cancellation of $\sigma^2$ when (16.36) is divided by (16.37). Relation
(16.37) shows that $D_N$ is an inconsistent estimator of $\sigma^2$. The distribution of
the right–hand side of (16.35) can however be simulated, and the critical values can
be obtained with arbitrary precision. Relation (16.35) can be used to construct a
confidence interval for $\mu$ without estimating the long run variance. Such a construction
does not require the selection of a bandwidth parameter in the kernel estimates
of $\sigma^2$.
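The self–normalizer and the simulation of critical values can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the limiting ratio is assumed to have the form $W^2(1) / \int_0^1 \{W(r) - rW(1)\}^2 dr$, which is what dividing the squared limit of a normalized sample mean by the limit in (16.37) would give. The Brownian motion is approximated by scaled Gaussian partial sums on a finite grid.

```python
import numpy as np


def self_normalizer(x):
    """D_N = N^{-2} * sum_{n=1}^{N} ( sum_{j=1}^{n} (X_j - mean) )^2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    partial = np.cumsum(x - x.mean())  # centered partial sums
    return float(np.sum(partial ** 2) / n ** 2)


def simulate_limit_quantiles(probs, n_grid=500, n_rep=5000, seed=0):
    """Monte Carlo quantiles of W(1)^2 / int_0^1 (W(r) - r W(1))^2 dr.

    The form of this ratio is an assumption of the sketch (see lead-in).
    """
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal((n_rep, n_grid)) / np.sqrt(n_grid)
    w = np.cumsum(steps, axis=1)            # W on the grid r = 1/n_grid, ..., 1
    r = np.arange(1, n_grid + 1) / n_grid
    bridge = w - r * w[:, -1:]              # W(r) - r W(1)
    denom = np.mean(bridge ** 2, axis=1)    # Riemann sum for the integral
    ratio = w[:, -1] ** 2 / denom
    return np.quantile(ratio, probs)
```

Increasing `n_grid` and `n_rep` reduces the discretization and Monte Carlo errors, which is the sense in which the critical values can be obtained with arbitrary precision; no bandwidth enters at any point.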
The normalization analogous to $D_N$ is however not suitable for the change point
problem. Simulations reported in Shao and Zhang (2010) show that with such a
normalization the power of change point tests tends to zero as $\delta$ increases. These
authors propose a self–normalization that takes into account the behavior under the
alternative. Their approach was extended to functional data by Zhang et al. (2011).
To explain it, we extend and lighten the notation introduced in Section 16.5. Set
\[
U_N(n_1, n_2) = \sum_{j=n_1}^{n_2} \hat{\boldsymbol\eta}_j
= S_N\Big(\frac{n_2}{N}, \hat{\boldsymbol\eta}\Big) - S_N\Big(\frac{n_1 - 1}{N}, \hat{\boldsymbol\eta}\Big),
\]
with the scores defined by (16.24). Note that the sums $U_N(n_1, n_2)$ depend also on
the number $d$ of the EFPC's to be used. Next, for each $1 \le k \le N$, introduce the
$d \times d$ matrices
\[
D_N(n, k) = \Big[ U_N(1, n) - \frac{n}{k}\, U_N(1, k) \Big]
\Big[ U_N(1, n) - \frac{n}{k}\, U_N(1, k) \Big]^T, \qquad n \le k,
\]
and
\[
D_N(n, k) = \Big[ U_N(n, N) - \frac{N - n + 1}{N - k}\, U_N(k + 1, N) \Big]
\Big[ U_N(n, N) - \frac{N - n + 1}{N - k}\, U_N(k + 1, N) \Big]^T, \qquad n > k.
\]
Using these matrices, we can define the normalizing matrices as
\[
V_N(k) = \frac{1}{N} \Big\{ \sum_{n=1}^{k} D_N(n, k) + \sum_{n=k+1}^{N} D_N(n, k) \Big\}.
\]
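The construction of the normalizing matrices $V_N(k)$ can be sketched in a few lines. The code below is a direct, unoptimized transcription of the definitions of $U_N(n_1, n_2)$, $D_N(n, k)$ and $V_N(k)$; the function names and the layout of `scores` (one row per curve, one column per EFPC) are assumptions made for illustration.

```python
import numpy as np


def partial_sum(scores, n1, n2):
    """U_N(n1, n2): sum of the score vectors with 1-based indices n1, ..., n2."""
    return scores[n1 - 1:n2].sum(axis=0)


def normalizing_matrix(scores, k):
    """V_N(k) for an (N, d) array of scores and a candidate change point 1 <= k <= N."""
    scores = np.asarray(scores, dtype=float)
    N, d = scores.shape
    if not 1 <= k <= N:
        raise ValueError("k must satisfy 1 <= k <= N")
    V = np.zeros((d, d))
    u_left = partial_sum(scores, 1, k)
    for n in range(1, k + 1):            # terms with n <= k
        r = partial_sum(scores, 1, n) - (n / k) * u_left
        V += np.outer(r, r)
    u_right = partial_sum(scores, k + 1, N)
    for n in range(k + 1, N + 1):        # terms with n > k (empty when k = N)
        r = partial_sum(scores, n, N) - ((N - n + 1) / (N - k)) * u_right
        V += np.outer(r, r)
    return V / N
```

A useful property to notice is that $V_N(k)$ does not change when a constant vector is added to all scores after position $k$: each segment is centered separately, which is how the normalization accounts for the behavior under the alternative.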
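\[
\hat\alpha(1) = \Bigg\{ \sum_{\ell=1}^{p} \frac{4 \hat\rho_\ell^2 \hat\sigma_\ell^4}{(1 - \hat\rho_\ell)^6 (1 + \hat\rho_\ell)^2} \Bigg\}
\Bigg\{ \sum_{\ell=1}^{p} \frac{\hat\sigma_\ell^4}{(1 - \hat\rho_\ell)^4} \Bigg\}^{-1}. \tag{16.38}
\]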
Table 16.2 Empirical size (upper panel) and size–adjusted power (lower panel) in percent for the
SN test (i) and the BGHK test (ii) for independent functional data generated as BM or BB. The
size-adjusted power is computed under the alternative with $\mu_2(t) = t$ or $\mu_2(t) = \sin(t)$, and
$k^* = N/2$.

                         d = 1               d = 2               d = 3
                     10%    5%    1%     10%    5%    1%     10%    5%    1%
Empirical size
N = 50
BM           (i)    10.7   5.7   0.7     9.6   3.7   0.7    10.8   5.2   1.4
             (ii)   10.0   5.3   1.2    10.3   5.0   0.8    10.9   5.5   1.0
BB           (i)     7.5   3.8   0.8     8.2   4.6   1.1    10.7   6.0   1.3
             (ii)   10.6   5.4   0.8    10.9   5.1   1.1    10.5   5.2   1.2
N = 100
BM           (i)     9.9   5.1   1.1     9.2   4.3   0.5     9.1   4.6   0.7
             (ii)   10.4   5.4   0.5    10.3   4.5   0.6     9.5   3.8   0.6
BB           (i)    10.0   5.1   1.3     8.4   3.5   0.7     9.9   4.7   0.7
             (ii)    9.6   5.2   0.9     9.3   4.9   0.6     9.1   4.1   0.9
Size-adjusted power
N = 50
BM, t        (i)    77.6  64.5  44.9    71.7  58.4  39.4    67.4  51.7  23.8
             (ii)   89.5  79.8  48.9    83.6  73.7  48.9    77.8  65.4  38.8
BB, t        (i)    99.8  99.4  95.6   100   100    99.6   100   100    99.9
             (ii)  100   100    99.7   100   100   100     100   100   100
BM, sin(t)   (i)    70.0  57.7  38.9    62.1  48.3  29.1    56.0  41.4  17.0
             (ii)   82.1  71.9  39.4    74.4  61.4  36.4    66.9  52.4  28.7
BB, sin(t)   (i)    99.3  98.1  89.7   100    99.6  96.9   100    99.9  99.4
             (ii)   99.9  99.7  97.6   100   100   100     100   100   100
N = 100
BM, t        (i)    96.9  89.9  70.8    92.9  87.4  73.0    90.9  84.0  66.7
             (ii)   99.3  98.4  95.5    99.1  97.9  94.0    98.5  96.8  91.2
BB, t        (i)   100    99.9  99.6   100   100   100     100   100   100
             (ii)  100   100   100     100   100   100     100   100   100
BM, sin(t)   (i)    92.7  84.2  62.7    87.1  78.7  59.2    83.9  73.8  52.1
             (ii)   98.4  95.8  89.6    96.3  93.5  86.6    95.2  90.9  78.0
BB, sin(t)   (i)    99.9  99.7  98.8   100   100   100     100   100   100
             (ii)  100   100   100     100   100   100     100   100   100
Here $\hat\rho_\ell$ is the autoregressive coefficient estimate in the model
$\hat\eta_{n,\ell} = \rho_\ell \hat\eta_{n-1,\ell} + \varepsilon_{n,\ell}$, and $\hat\sigma_\ell^2$ is the estimate of the innovation variance. Table 16.3 reports the empirical
sizes. We see that the size distortion of the BGHK test is very large compared
to the other two tests. This is due to the fact that it is designed only for independent
functional data and is invalid in the temporally–dependent case. For the HK test,
the size distortion is less severe but is sensitive to the choice of $d$. It tends to be
oversized for small $d$ but undersized for large $d$. For the SN test, size distortion is
apparent for $N = 50$, but improves for $N = 100$. The size for the SN test is fairly
robust to the choice of $d$. Based on the results reported in Zhang et al. (2011), the
following comments can be made about the size–adjusted power. First, the BGHK
test delivers the highest power among the three tests, which is largely due to its
Table 16.3 Empirical size in percent of the SN (i), the BGHK (ii) and the HK (iii) test for data
following an FAR(1) process.

                         d = 1               d = 2               d = 3
                     10%    5%    1%     10%    5%    1%     10%    5%    1%
N = 50
Gaussian
BM           (i)    15.2  10.3   3.9    15.2   8.4   2.4    14.5   8.0   2.2
             (ii)   44.1  32.2  16.2    37.5  25.0  12.4    32.7  23.0  11.4
             (iii)  17.7   9.2   0.6    11.1   3.2   0.3     4.9   1.1   0.0
BB           (i)    17.3  10.6   3.1    14.0   7.1   2.5    14.5   8.2   2.3
             (ii)   42.7  32.3  13.8    36.1  25.5  10.0    34.9  23.2   9.6
             (iii)  19.8   8.8   0.2    11.1   2.6   0.0     6.3   1.5   0.0
Wiener
BM           (i)    16.0  10.4   4.0    16.1   9.2   3.0    16.0   9.8   2.9
             (ii)   46.4  33.6  16.6    40.2  26.9  12.6    36.6  25.0  10.2
             (iii)  17.5   8.4   0.5    10.9   2.8   0.1     6.1   0.7   0.0
BB           (i)    17.0  10.4   2.9    13.3   7.3   2.2    15.4   9.7   2.2
             (ii)   42.8  31.0  14.3    37.9  26.7  11.2    36.4  23.9  10.0
             (iii)  19.0   8.9   0.2    11.2   3.3   0.0     6.5   1.8   0.0
N = 100
Gaussian
BM           (i)    13.3   7.8   2.0    11.7   5.7   1.2    11.7   6.1   1.2
             (ii)   51.2  35.9  16.4    39.7  27.9  11.6    34.9  24.1   9.7
             (iii)  15.2   7.4   0.4    11.6   3.9   0.2     7.2   2.1   0.0
BB           (i)    11.6   6.7   1.6    10.9   4.9   1.1    11.5   7.1   1.2
             (ii)   46.7  33.0  13.9    35.9  25.1  10.2    36.4  25.8  11.4
             (iii)  16.1   8.0   1.4    12.4   5.3   0.3    10.0   3.5   0.1
Wiener
BM           (i)    13.7   7.8   2.1    11.7   5.8   1.3    12.9   7.1   1.3
             (ii)   52.2  37.2  17.5    43.8  29.7  12.8    38.3  26.1  11.7
             (iii)  15.3   7.4   0.5    11.6   4.1   0.2     7.5   2.6   0.0
BB           (i)    11.9   6.4   1.9    10.4   5.6   1.2    12.0   7.8   1.3
             (ii)   45.1  32.0  13.5    38.5  27.5  12.8    37.9  27.3  11.9
             (iii)  16.4   7.6   1.5    13.3   5.9   0.5    10.8   3.7   0.2
severe upward size distortion. Second, the power of the SN test is comparable to
that of the HK test for $N = 50$ and BM innovations, but the SN test tends to have
moderate power loss when sample size increases to 100. In the case of the BB
innovations, the SN test is superior to the HK test in power. Overall, the severe size
distortion of the BGHK test under weak dependence suggests its inability to accommodate
dependence and thus it is not recommended for testing for a change point
for dependent functional data. The HK test is able to account for dependence but it is
sensitive to the choice of bandwidth $B_N$ and of $d$. As shown in Figure 16.4, the data
driven bandwidth used in the HK test leads to non–monotonic power. Compared to
the other two tests, the SN test tends to have more accurate size at the expense of
some power loss.
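The quantity $\hat\alpha(1)$ in (16.38), which drives the data driven bandwidth of the HK test, can be computed from the score series as in the following sketch. It assumes that (16.38) is the unit-weight plug-in quantity of Andrews (1991), with one AR(1) model fitted to each coordinate of the (mean zero) scores by least squares; the function names and the estimator of the innovation variance are illustrative choices, not taken from the text.

```python
import numpy as np


def ar1_fit(x):
    """Least-squares AR(1) fit without intercept: returns (rho_hat, sigma2_hat)."""
    x = np.asarray(x, dtype=float)
    lagged, current = x[:-1], x[1:]
    rho = float(lagged @ current / (lagged @ lagged))
    resid = current - rho * lagged
    sigma2 = float(resid @ resid / resid.size)   # innovation variance estimate
    return rho, sigma2


def alpha1(scores):
    """alpha_hat(1) of (16.38) for an (N, p) array of score series."""
    num = 0.0
    den = 0.0
    for column in np.asarray(scores, dtype=float).T:
        rho, s2 = ar1_fit(column)
        num += 4.0 * rho ** 2 * s2 ** 2 / ((1.0 - rho) ** 6 * (1.0 + rho) ** 2)
        den += s2 ** 2 / (1.0 - rho) ** 4
    return num / den
```

For a single score series the innovation variance cancels and the expression reduces to $4\hat\rho^2 / \{(1-\hat\rho)^2 (1+\hat\rho)^2\}$, so strongly autocorrelated scores (for instance, scores contaminated by an unaccounted change in the mean) produce a large $\hat\alpha(1)$ and hence a large bandwidth.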
16.7 Functional linear model with dependent regressors 317
in which both the regressors and the responses are functions. The results of this
section can be easily specialized to the case of scalar responses.
In the existing theory, the Xn in (16.39) are assumed to be independent and iden-
tically distributed. For functional time series the assumption of the independence of
the Xn is often questionable, so it is important to investigate if procedures devel-
oped and theoretically justified for independent regressors can still be used if the
regressors are dependent.
We focus here on the estimation of the kernel $\psi(t, s)$. Our result is motivated by
the work of Yao et al. (2005b) who considered functional regressors and responses
obtained from sparse independent data measured with error. The data that motivate
our work are measurements of physical quantities obtained with negligible
errors or financial transaction data obtained without error. In both cases the data
are available at fine time grids, and the main concern is the presence of temporal
dependence between the curves $X_n$. We therefore merely assume that the sequence
$\{X_n\} \in L^4_H$ is $L^4$–m–approximable, which, as can be easily seen, implies the
$L^4$–m–approximability of $\{Y_n\}$. To formulate additional technical assumptions, we need
to introduce some notation.
We assume that the errors $\varepsilon_n$ are iid and independent of the $X_n$, and denote by $X$
and $Y$ random functions with the same distribution as $X_n$ and $Y_n$, respectively. We
work with their expansions
\[
X(s) = \sum_{i=1}^{\infty} \xi_i v_i(s), \qquad Y(t) = \sum_{j=1}^{\infty} \zeta_j u_j(t),
\]
where the $v_j$ are the FPC's of $X$ and the $u_j$ the FPC's of $Y$, and $\xi_i = \langle X, v_i \rangle$,
$\zeta_j = \langle Y, u_j \rangle$. Indicating with the “hat” the corresponding empirical quantities, an estimator
of $\psi(t, s)$ proposed by Yao et al. (2005b) is
\[
\hat\psi_{KL}(t, s) = \sum_{k=1}^{K} \sum_{\ell=1}^{L} \hat\lambda_\ell^{-1} \hat\sigma_{\ell k}\, \hat u_k(t)\, \hat v_\ell(s),
\]
\[
\hat\sigma_{\ell k} = \frac{1}{N} \sum_{i=1}^{N} \langle X_i, \hat v_\ell \rangle \langle Y_i, \hat u_k \rangle, \tag{16.40}
\]
but any estimator for which Lemma 16.6 of Section 16.11 holds can be used without
affecting the rates.
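For curves observed on a common fine grid, the estimator built from (16.40) can be sketched as below. This is an illustration under several assumptions: the curves are mean zero and sampled at equally spaced points of $[0,1]$ with spacing `dt`, integrals are replaced by Riemann sums, the kernel estimate is formed as $\sum_{k \le K} \sum_{\ell \le L} \hat\lambda_\ell^{-1} \hat\sigma_{\ell k} \hat u_k(t) \hat v_\ell(s)$ with $\hat\lambda_\ell$ the empirical eigenvalues of the regressors, and all function names are hypothetical.

```python
import numpy as np


def efpc(curves, n_comp, dt):
    """Leading empirical eigenvalues and eigenfunctions of an (N, grid) array of curves."""
    n = curves.shape[0]
    cov = curves.T @ curves / n               # covariance kernel on the grid
    vals, vecs = np.linalg.eigh(cov * dt)     # discretized covariance operator
    order = np.argsort(vals)[::-1][:n_comp]
    # rescale so that each eigenfunction has unit L2 norm on [0, 1]
    return vals[order], vecs[:, order].T / np.sqrt(dt)


def estimate_kernel(x, y, K, L, dt):
    """Grid values psi_hat[t, s] of the truncated kernel estimator."""
    lam, v = efpc(x, L, dt)                   # eigenvalues, FPCs of the regressors
    _, u = efpc(y, K, dt)                     # FPCs of the responses
    xi = x @ v.T * dt                         # scores <X_i, v_l>, shape (N, L)
    zeta = y @ u.T * dt                       # scores <Y_i, u_k>, shape (N, K)
    sigma = xi.T @ zeta / x.shape[0]          # sigma_hat[l, k] as in (16.40)
    coef = sigma / lam[:, None]               # lambda_l^{-1} * sigma_hat[l, k]
    return u.T @ coef.T @ v                   # sum over k, l of coef * u_k(t) * v_l(s)
```

The signs of the estimated eigenfunctions are arbitrary, but they cancel in the product $\hat\sigma_{\ell k} \hat u_k(t) \hat v_\ell(s)$, so the returned surface does not depend on them. Nothing in the computation uses independence of the curves, which is why the same estimator can be studied for dependent regressors.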
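Let $\lambda_j$ and $\gamma_j$ be the eigenvalues corresponding to $v_j$ and $u_j$. Define $\alpha_j$ as in
Lemma 2.3 and define $\alpha_j'$ accordingly with $\gamma_j$ instead of $\lambda_j$. Set
Assumption 16.9. (i) We have $\lambda_1 > \lambda_2 > \cdots$ and $\gamma_1 > \gamma_2 > \cdots$.
(ii) We have $K = K(N)$, $L = L(N) \to \infty$ and
\[
\frac{KL}{\lambda_L \min\{h_K, h_L'\}} = o\big(N^{1/2}\big).
\]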
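Remark 16.1. Horváth and Reeder (2011) showed, under more general conditions,
that for fixed $K$ and $L$
\[
\iint \big[ \hat\psi_{KL}(t, s) - \psi_{KL}(t, s) \big]^2 \, dt \, ds = O_P(N^{-1}),
\]
where
\[
\psi_{KL}(t, s) = \sum_{k=1}^{K} \sum_{\ell=1}^{L} \lambda_\ell^{-1} E[\xi_\ell \zeta_k]\, u_k(t)\, v_\ell(s).
\]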
and sparsely observed, whereas Theorem 16.9 admits dependence, but does not deal
with curves measured with error at irregular points.
In this, and the following sections of this chapter, we use the following conventions:
A generic $X$, which is assumed to be equal in distribution to $X_1$, will be used at
some places. Any constants occurring will be denoted by $\kappa_1, \kappa_2, \ldots$ The $\kappa_i$ may
change their values from proof to proof.
This can be shown by using that $X_0$ and $X_k^{(k)}$ are independent. Furthermore it can
be readily verified that
\[
E \langle B_0, B_k \rangle_S = E \langle X_0, X_k \rangle^2 - \sum_{j \ge 1} \lambda_j^2, \qquad k \ge 1. \tag{16.44}
\]
\[
\langle X_0, X_k - X_k' \rangle^2
= \langle X_0, X_k \rangle^2 - \langle X_0, X_k' \rangle^2 - 2 \langle X_0, X_k - X_k' \rangle \langle X_0, X_k' \rangle .
\]
Thus
\[
\langle X_0, X_k \rangle^2 - \langle X_0, X_k' \rangle^2
= \langle X_0, X_k - X_k' \rangle^2 + 2 \langle X_0, X_k - X_k' \rangle \langle X_0, X_k' \rangle . \tag{16.45}
\]
Combining (16.43), (16.44), (16.45) and using the definition of $L^4$–m–approximability
yields the proof of our theorem, with $U_X$ equal to the sum over
$k \ge 1$ of the right hand side of (16.45). □
Proof of Theorem 16.3. The proof is split into two steps. First we show that
$N^{-1/2} \sum_{i=1}^{N} X_i(t)$ is close to $N^{-1/2} \sum_{i=1}^{N} X_i^{(m)}(t)$, if $m$ is sufficiently large. Then
we establish the claim for m-dependent functions, for any $m \ge 1$.
As the first step, we show that
\[
\limsup_{m \to \infty} \limsup_{N \to \infty} E \int \Big[ N^{-1/2} \sum_{i=1}^{N} \big( X_i(t) - X_i^{(m)}(t) \big) \Big]^2 dt = 0, \tag{16.46}
\]
In the proof, we will repeatedly use independence relations which follow from
representation (16.6). First observe that if $j > i$, then $(X_i, X_i^{(m)})$ is independent of
$X_j^{(j-i)}$ because
\[
\sum_{1 \le i < j \le N} E \Big[ \big( X_i(t) - X_i^{(m)}(t) \big) X_j(t) \Big]
= \sum_{1 \le i < j \le N} E \Big[ \big( X_i(t) - X_i^{(m)}(t) \big) \big( X_j(t) - X_j^{(j-i)}(t) \big) \Big].
\]
16.8 Proofs of the results of Sections 16.2 and 16.3 321
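\begin{align*}
& \sum_{1 \le i < j \le N} \int \Big[ E \big( X_i(t) - X_i^{(m)}(t) \big)^2 \Big]^{1/2} \Big[ E \big( X_j(t) - X_j^{(j-i)}(t) \big)^2 \Big]^{1/2} dt \\
& \quad \le \sum_{1 \le i < j \le N} \Big[ \int E \big( X_i(t) - X_i^{(m)}(t) \big)^2 dt \Big]^{1/2} \Big[ \int E \big( X_j(t) - X_j^{(j-i)}(t) \big)^2 dt \Big]^{1/2} \\
& \quad = \sum_{1 \le i < j \le N} \Big[ \int E \big( X_0(t) - X_0^{(m)}(t) \big)^2 dt \Big]^{1/2} \Big[ \int E \big( X_0(t) - X_0^{(j-i)}(t) \big)^2 dt \Big]^{1/2} \\
& \quad \le N \Big[ \int E \big( X_0(t) - X_0^{(m)}(t) \big)^2 dt \Big]^{1/2} \sum_{k \ge 1} \Big[ \int E \big( X_0(t) - X_0^{(k)}(t) \big)^2 dt \Big]^{1/2} \\
& \quad = N \nu_2 \big( X_0 - X_0^{(m)} \big) \sum_{k \ge 1} \nu_2 \big( X_0 - X_0^{(k)} \big).
\end{align*}
Hence, by (16.4),
\[
\limsup_{m \to \infty} \limsup_{N \to \infty} \frac{1}{N} \Big| \int \sum_{1 \le i < j \le N} E \Big[ \big( X_i(t) - X_i^{(m)}(t) \big) X_j(t) \Big] dt \Big| = 0.
\]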
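Next we define
\[
X_i^{(K)}(t) = \sum_{1 \le \ell \le K} \langle X_i, v_\ell \rangle v_\ell(t).
\]
Utilizing
\[
\lim_{K \to \infty} \sum_{\ell \ge K} E \langle X_0, v_\ell \rangle^2 = 0
\]
The sum of the $X_i^{(K)}$'s can be written as
\[
\frac{1}{N^{1/2}} \sum_{1 \le i \le N} X_i^{(K)}(t) = \sum_{1 \le \ell \le K} v_\ell(t) \frac{1}{N^{1/2}} \sum_{1 \le i \le N} \langle X_i, v_\ell \rangle .
\]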
Next, we use the central limit theorem for stationary m-dependent sequences of
random vectors (see Lehmann (1999) and the Cramér–Wold theorems in DasGupta
(2008), pages 9 and 120) and get that
\[
\Big\{ \frac{1}{N^{1/2}} \sum_{1 \le i \le N} \langle X_i, v_\ell \rangle, \ 1 \le \ell \le K \Big\}^T \xrightarrow{d} N_K(0, \Delta_K),
\]
where $N_K(0, \Delta_K)$ is a K-dimensional normal random variable with zero mean and
covariance matrix $\Delta_K = \operatorname{diag}(\lambda_1, \ldots, \lambda_K)$. Thus we proved that for all $K > 1$
\[
N^{-1/2} \sum_{1 \le i \le N} X_i^{(K)}(t) \xrightarrow{d} \sum_{1 \le \ell \le K} \lambda_\ell^{1/2} N_\ell v_\ell(t) \quad \text{in } L^2,
\]
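\[
S_{k,\ell} = \sum_{i=k}^{\ell} (X_i - \mu).
\]
Observe that
\[
\hat\gamma_j - \tilde\gamma_j = \Big( 1 - \frac{|j|}{N} \Big) (\bar X_N - \mu)^2 + \frac{1}{N} (\bar X_N - \mu) \big( S_{1, N-|j|} + S_{|j|+1, N} \big) =: \delta_j .
\]
We therefore have the decomposition
\[
\hat\sigma^2 = \sum_{|j| \le q} \omega_q(j) \tilde\gamma_j + \sum_{|j| \le q} \omega_q(j) \delta_j =: \hat\sigma_1^2 + \hat\sigma_2^2 .
\]
and
\[
\hat\sigma_2^2 \xrightarrow{P} 0. \tag{16.48}
\]
We begin with the verification of the easier relation (16.48). By (16.15),
\[
E |\hat\sigma_2^2| \le b \sum_{|j| \le q} E |\delta_j|
\le b \sum_{|j| \le q} E (\bar X_N - \mu)^2
+ \frac{b}{N} \sum_{|j| \le q} \big[ E (\bar X_N - \mu)^2 \big]^{1/2} \big[ E \big( S_{1, N-|j|} + S_{|j|+1, N} \big)^2 \big]^{1/2} .
\]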
By Lemma 16.2,
\[
E (\bar X_N - \mu)^2 = \frac{1}{N} \sum_{|j| \le N} \Big( 1 - \frac{|j|}{N} \Big) \gamma_j = O(N^{-1}).
\]
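The identity $E(\bar X_N - \mu)^2 = N^{-1} \sum_{|j| \le N} (1 - |j|/N) \gamma_j$ can be checked exactly in a simple case. The sketch below uses an MA(1) sequence $X_t = e_t + \theta e_{t-1}$ with unit innovation variance, chosen purely for illustration (it is not an example from the text), for which $\gamma_0 = 1 + \theta^2$, $\gamma_{\pm 1} = \theta$, and the variance of the sample mean can also be computed directly from the weights of the innovations.

```python
def var_of_mean_formula(gamma, n):
    """(1/N) * sum over |j| < N of (1 - |j|/N) * gamma(j); the |j| = N terms vanish."""
    total = gamma(0)
    for j in range(1, n):
        total += 2.0 * (1.0 - j / n) * gamma(j)
    return total / n


def ma1_autocov(theta):
    """Autocovariance function of X_t = e_t + theta * e_{t-1}, Var(e_t) = 1."""
    def gamma(j):
        j = abs(j)
        if j == 0:
            return 1.0 + theta ** 2
        return theta if j == 1 else 0.0
    return gamma


def ma1_var_of_mean_direct(theta, n):
    """Var of the mean of X_1..X_N, from
    N * mean = theta*e_0 + (1 + theta)*(e_1 + ... + e_{N-1}) + e_N."""
    return (theta ** 2 + (n - 1) * (1.0 + theta) ** 2 + 1.0) / n ** 2
```

Multiplying either expression by $N$ and letting $N$ grow gives $\gamma_0 + 2\gamma_1 = (1 + \theta)^2$, the long run variance of this sequence, which illustrates the $O(N^{-1})$ rate.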
\[
E \hat\sigma_1^2 = \sum_{|j| \le q} \omega_q(j) \frac{N - |j|}{N} \gamma_j \to \sum_{j=-\infty}^{\infty} \gamma_j .
\]
jj jq j D1
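To lighten the notation, without any loss of generality, we assume from now on that
$\mu = 0$, so that
\[
\operatorname{Cov}(\tilde\gamma_k, \tilde\gamma_\ell) = \frac{1}{N^2} \operatorname{Cov} \Big( \sum_{i=1}^{N-|k|} X_i X_{i+|k|}, \ \sum_{j=1}^{N-|\ell|} X_j X_{j+|\ell|} \Big).
\]
Therefore, by stationarity,
\begin{align*}
|\operatorname{Cov}(\tilde\gamma_k, \tilde\gamma_\ell)|
& \le \frac{1}{N^2} \sum_{i,j=1}^{N} \big| \operatorname{Cov} \big( X_i X_{i+|k|}, X_j X_{j+|\ell|} \big) \big| \\
& = \frac{1}{N} \sum_{|r| < N} \Big( 1 - \frac{|r|}{N} \Big) \big| \operatorname{Cov} \big( X_0 X_{|k|}, X_r X_{r+|\ell|} \big) \big| .
\end{align*}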
The last sum can be split into three terms corresponding to $r = 0$, $r < 0$ and $r > 0$.
The contribution to the left–hand side of (16.49) of the term corresponding to
$r = 0$ is
\[
N^{-1} \sum_{|k|, |\ell| \le q} \big| \operatorname{Cov} \big( X_0 X_{|k|}, X_0 X_{|\ell|} \big) \big| = O(q^2 / N).
\]
The terms corresponding to $r < 0$ and $r > 0$ are handled in the same way, so we
focus on the contribution of the summands with $r > 0$, which is
\[
N^{-1} \sum_{|k|, |\ell| \le q} \sum_{r=1}^{N-1} \Big( 1 - \frac{r}{N} \Big) \big| \operatorname{Cov} \big( X_0 X_{|k|}, X_r X_{r+|\ell|} \big) \big| .
\]
and
.rCj`j/
Cov X0 Xjkj ; Xr.r/ XrCj`j
.jkj/ .rCj`j/ .jkj/ .rCj`j/
D Cov X0 Xjkj ; Xr.r/ XrCj`j C Cov X0 .Xjkj Xjkj /; Xr.r/ XrCj`j :
h i h i
.jkj/ .r/ .rCj`j/ .rCj`j/
E X0 Xjkj Xr XrCj`j E X0 Xjkj E Xr.r/ XrCj`j
h i h ih i
.jkj/ .rCj`j/ .jkj/ .rCj`j/
D EŒX0 E Xjkj Xr.r/ XrCj`j EŒX0 E Xjkj Xr.r/ XrCj`j D 0:
We thus obtain
.jkj/ .rCj`j/
Cov X0 Xjkj ; Xr XrCj`j D Cov X0 .Xjkj Xjkj /; Xr.r/ XrCj`j
.rCj`j/
C Cov X0 Xjkj ; Xr XrCj`j Xr.r/ XrCj`j :
X X
N 1 ˇ ˇ
ˇ .rCj`j/ ˇ
N 1 ˇCov X0 Xjkj ; Xr XrCj`j Xr.r/ XrCj`j ˇ ! 0:
jkj;j`jq rD1
This is done using the technique introduced in the proof of Theorem 16.1. By the
Cauchy-Schwarz inequality, the problem reduces to showing that
X X
N 1 n o1=2 2 1=2
1 .rCj`j/
N EŒX02 Xjkj
2
E Xr XrCj`j Xr.r/ XrCj`j ! 0:
jkj;j`jq rD1
1
X X h i4 1=4
N 1 E Xr Xr.r/ ;
jkj;j`jq rD1
Proof of Proposition 16.3. We only prove the left relation in (16.27). The element
O Ǒ / is given by
in the k-th row and `-th column of Ȯ .ˇ/ Ȯ .C
X
q
wq .h/ X
ˇkn ˇ`;nCh cOk ˇOkn cO` ˇO`;nCh
N
hD0 1nN h
(16.50)
X
q
wq .h/ X
C ˇk;nCh ˇ`;n cOk ˇOk;nCh cO` ˇO`;n :
N
hD1 1nN h
For reasons of symmetry it suffices to study (16.50), which can be decomposed into
X
q
wq .h/ X
ˇkn ˇ`;nCh cO` ˇO`;nCh
N
hD0 1nN h
(16.51)
X
q
wq .h/ X
C cO` ˇO`;nCh ˇkn cOk ˇOkn :
N
hD0 1nN h
As both summands above can be treated similarly, we will only treat (16.51). For
any " > 0 we have
0ˇ ˇ 1
ˇX ˇˇ
ˇ q wq .h/ X
P @ˇˇ ˇkn ˇ`;nCh cO` ˇO`;nCh ˇˇ > " A
ˇhD0 N 1nN h ˇ
0ˇ ˇ 1
ˇX ˇˇ " X
ˇ q wq .h/ X q
P @ˇˇ ˇkn ˇ`;nCh cO` ˇO`;nCh ˇˇ > wq .h/A
ˇhD0 N 1nN h ˇ q hD0
0 ˇ ˇ 1
Xq ˇ X ˇ
1 ˇ ˇ "
P @ ˇˇ ˇkn ˇ`;nCh cO` ˇO`;nCh ˇˇ > A : (16.52)
N ˇ ˇ q
hD0 1nN h
have
0ˇ ˇ 1
ˇ X ˇˇ " N
ˇ
P @ˇˇ ˇkn ˇ`;nCh cO` ˇO`;nCh ˇˇ > A
ˇ1nN h ˇ q
!
X
N X
N 2 "2 N 2
P 2
ˇkn ˇ`n cO` ˇO`n >
nD1 nD1
q2
! !
1 X 2
N
1 X
N 2 "2
P ˇ > q˛N CP ˇ`n cO` ˇO`n > 3
N nD1 kn N nD1 q ˛N
!
1 X
2 N
Eˇk1 "2
CP kYn k2 kv` cO` vO ` k2 > 3
q˛N N nD1 q ˛N
!
1 X
N
EkY1 k2
CP kYn k2 > 2EkY1 k2
q˛N N nD1
"2
C P kv` cO` vO ` k2 >
2EkY1 k2 q 3 ˛N
P
N
EkY1 k 2 Var N
1
nD1 kY n k2
2C0 EkY1 k2 q 3 ˛N
C C :
q˛N E kY1 k
2 2 N "2
It can be easily shown that for $U$, $V$ in $L^4_H$
\[
\nu_2 \big( \|U\|^2 - \|V\|^2 \big) \le \nu_4^2 (U - V) + 2 \{ \nu_4(U) + \nu_4(V) \} \, \nu_4 (U - V).
\]
An immediate consequence is that $L^4$–m–approximability of $\{Y_n\}$ implies
$L^2$–m–approximability of the scalar sequence $\{\|Y_n\|^2\}$. A basic result for stationary
sequences gives
\[
\operatorname{Var} \Big( \frac{1}{N} \sum_{n=1}^{N} \|Y_n\|^2 \Big) \le \frac{1}{N} \sum_{h \in \mathbb{Z}} \big| \operatorname{Cov} \big( \|Y_0\|^2, \|Y_h\|^2 \big) \big| ,
\]
where, by Lemma 16.2, the autocovariances are absolutely summable. Hence the
summands in (16.52) are bounded by
\[
C_1 \Big( \frac{1}{q \alpha_N} + \frac{1}{N} + \frac{q^3 \alpha_N}{N \varepsilon^2} \Big),
\]
where the constant $C_1$ depends only on the law of $\{Y_n\}$. The proof of the proposition
follows immediately from our assumptions on $q$ and $\alpha_N$. □
The proof of this result needs some preliminary lemmas, which we establish first.
We can assume without loss of generality that K.u/ D 0 if juj > 1. Let m be a
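\[
\boldsymbol\Sigma_N^{(m)} = \sum_{k=-(N-1)}^{N-1} K(k / B_N) \, \hat{\boldsymbol\Gamma}_k^{(m)},
\]
where
\[
\hat{\boldsymbol\Gamma}_k^{(m)} = \frac{1}{N} \sum_{\ell = \max(1, 1-k)}^{\min(N, N-k)} \mathbf{X}_\ell^{(m)} \big( \mathbf{X}_{\ell+k}^{(m)} \big)^T
\]
are the sample covariances of lag $k$. Since $K$ is symmetric, $K(0) = 1$ and $K(u) = 0$
outside $[-1, 1]$, we have that, for all sufficiently large $N$,
\[
\boldsymbol\Sigma_N^{(m)} = \hat{\boldsymbol\Gamma}_0^{(m)} + \sum_{k=1}^{B_N} K(k / B_N) \, \hat{\boldsymbol\Gamma}_k^{(m)} + \sum_{k=1}^{B_N} K(k / B_N) \big( \hat{\boldsymbol\Gamma}_k^{(m)} \big)^T .
\]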
kD1 kD1
Lemma 16.3. If Assumptions 16.2–16.6 are satisfied, then we have for every $m$,
\[
\boldsymbol\Sigma_N^{(m)} \xrightarrow{P} \boldsymbol\Sigma^{(m)}, \qquad \text{as } N \to \infty.
\]
Proof. Since the sequence X.m/
` is m-dependent we have that
X
m X
m
˙ .m/ D EX1 XT1 C EX1 XT`C1 C EX`C1 XT1 :
`D1 `D1
It follows from the ergodic theorem that for any fixed k and m
O .m/ !
P
EX.m/ .m/ T
k 1 .X1Ck / :
P X
m X
m
! EX1 XT1 C EX1 XT`C1 C EX`C1 XT1 :
`D1 `D1
16.9 Proof of Theorem 16.6 329
X
BN
.m/ P
K.k=BN / O k ! 0 (16.53)
kDmC1
and
X
BN
.m/ P
K.k=BN /. O k /T ! 0: (16.54)
kDmC1
X
BN
.m/
E.m/
N D K.k=BN / O k
kDmC1
1 X .m/ .m/ T
X
BN N k
D K.k=BN / X` X`Ck
N
kDmC1 `D1
N .mC1/
X
D X.m/
`
H.m/
`;N
;
`D1
where
`;BN /
min.NX
K.k=BN / T
H.m/
`;N D X.m/
`Ck :
N
kDmC1
Let
N .mC1/
X
.m/ .m/ .m/
EN .i; j / D Xi ` H`;N .j /; 1 i; j pq;
`D1
where Xi ` and H`;N .j / are the i th and the j th coordinates of the vectors X`;N
.m/ .m/ .m/
.m/
and H`;N , respectively. Next we write
0 12
2 N .mC1/
X
DE@ Xi ` H`;N .j /A
.m/ .m/ .m/
E EN .i; j /
`D1
XX
.m/
D E H`;N .j /Xi.m/
`
Xi.m/ .m/
r Hr;N .j /
1rN .mC1/
1`N .mC1/
.m/ .m/
D E1;N .i; j / C E2;N .i; j /;
where
XX
.m/ .m/
E1;N .i; j / D E H`;N .j /Xi.m/
` X .m/ .m/
ir Hr;N .j / ;
1rN .mC1/
1`N .mC1/
jr`jm
and
XX
.m/ .m/
E2;N .i; j / D E H`;N .j /Xi.m/
` X .m/ .m/
ir Hr;N .j / :
1rN .mC1/
1`N .mC1/
jr`j>m
.m/
E H`;N .j /Xi.m/
`
X .m/ .m/
ir H r;N .j /
8
ˆ .m/ .m/ .m/ .m/
ˆEXi ` E H`;N .j /Xi r Hr;N .j / r > m C `;
ˆ
<
D EXi.m/ E H .m/
.j /X .m/ .m/
H .j / ` > m C r;
ˆ
ˆ
r `;N i` r;N
:̂E H .m/ .j /X .m/ X .m/ H .m/ .j / j` rj m;
`;N i` ir r;N
(
0 j` rj > m;
D .m/ .m/ .m/ .m/
E H`;N .j /Xi ` Xi r Hr;N .j / j` rj m:
Thus we have
.m/
EE2;N .i; j / D 0:
`;BN / min.NX
min.NX `;BN /
K.k=BN / K.v=BN /
.m/ .m/ .m/
E.H`;N .j //2 D E Xj;`Ck Xj;`Cv
vDmC1
N N
kDmC1
(16.55)
M2
`;BN / min.NX
min.NX `;BN /
.m/ .m/
E Xj;`Ck Xj;`Cv
N2 vDmC1
kDmC1
2 X
m ˇ ˇ
M ˇ .m/ ˇ
BN E ˇXj.m/
0 X jr ˇ
N2 rDm
BN
DO :
N2
In the next step we will first use the Cauchy-Schwarz inequality, then the indepen-
.m/
dence of H`;N .j / and Xi.m/
`
.m/
and the independence of Hr;N .j / and Xi.m/
r to get
ˇ ˇ XX ˇ ˇ
ˇ .m/ ˇ ˇ .m/ ˇ
ˇE2;N .i; j /ˇ E ˇH`;N .j /Xi.m/
`
X .m/ .m/
ir Hr;N .j / ˇ
1rN .mC1/
1`N .mC1/
jr`jm
XX 2 1=2 2 1=2
.m/ .m/ .m/ .m/
E H`;N .j /Xi ` E Xi r Hr;N .j /
1rN .mC1/
1`N .mC1/
jr`jm
XX 2 1=2 2 1=2 2 1=2
.m/
E H`;N .j / E Xi.m/
`
E X .m/
ir
1rN .mC1/
1`N .mC1/
jr`jm
2 1=2
.m/
E Hr;N .j /
1=2
! 1=2
!
BN BN
2mNO O.1/O.1/O
N N
BN
DO
N
D o.1/;
where we also used (16.55) and Assumption 16.6. This completes the proof of
Lemma 16.3. □
and
!2
1 X
N
.m/ i kt
lim sup lim sup sup E Xjk e < 1: (16.58)
N !1 m!1 1<t <1 N 1=2
kD1
!2
X
N
.m/
X .m/
E .Xjk Xjk /e i kt D E..Xjk Xjk /e i kt /2
kD1 1kN
X h i
.m/
C2 E .Xjk jk /.Xj ` Xj.m/
` / e i.kC`/t :
1k<`N
It follows from Assumption 16.4 that there is a sequence c1 .m/ ! 0 such that
ˇ ˇ
ˇ ˇ
ˇ X ˇ
ˇ .m/ 2 i 2kt ˇ
ˇ E.X jk X jk / e ˇ Nc1 .m/:
ˇ1kN ˇ
Next we write
X h i
.m/
E .Xjk Xjk /Xj ` e i.kC`/t
1k<`N
X h i
.m/
D E .Xjk Xjk /.Xj ` Xj`k
` / e i.kC`/t ;
1k<`N
.m/
since .Xk ; Xk / and X``k are independent. Using the Cauchy-Schwarz inequality
first, then Assumption 16.4 again, we get that
X ˇ h i ˇ
ˇ .m/ .`k/ ˇ
ˇE .Xjk Xjk /.Xj ` Xj ` / e i.kC`/t ˇ
1k<`N
X h i1=2 h i1=2
.m/ .`k/ 2
E.Xjk Xjk /2 E.Xj ` Xj ` /
1k<`N
h i1=2 X h i1=2
.m/ .k/
N E.Xj1 Xj1 /2 E.Xj1 Xj1 /2
1k<1
D Nc2 .m/
X ˇ h i ˇ
ˇ .m/ .m/ ˇ
ˇE .Xjk Xjk /Xj ` e i.kC`/t ˇ D Nc3 .m/;
1k<`N
X
N X
2 2i kt
D EXjk e C2 EXjk Xj ` e i.kC`/t
kD1 1k<`N
X
N X .`k/
2
D EXj1 e 2i kt C 2 EXjk .Xj ` Xj ` /e i.kC`/t ;
kD1 1k<`N
.`k/ .`k/
since by the independence of Xjk and Xj ` we have that EXjk Xj ` D 0.
Using the Cauchy-Schwarz inequality with Assumption 16.4 we get that
ˇ ˇ
ˇ ˇ
ˇ X ˇ
ˇ EXjk Xj ` Xj `
.`k/ i.kC`/t ˇ
ˇ e ˇ cN
ˇ1k<`N ˇ
with some constant $c$, completing the proof of (16.57). The same arguments can be
used to prove (16.58). □
PN .m/ P N .m/ i kt
Next we define SN .t/ D kD1 Xk e
i kt
and SN .t/ D kD1 Xk e . Let
SN .t/ be the conjugate transpose of SN .t/ and introduce
1
IN .t/ D SN .t/SN .t/
N
1 X X
N N
D Xk e i kt
XT` e i `t
N
kD1 `D1
1 X X i t .k`/
N N
D e Xk XT`
N
`D1 kD1
X
N 1
1 X k/
min.N;N
D e i t k Xk XT`Ck
N
kD1N `Dmax.1;1k/
X
N 1
D e i t k O k :
kD1N
Similarly we define
1 .m/ X
N 1
.m/
I.m/
N .t/ D SN .t/ S.m/
N .t/ D e i t k O k :
N
kD1N
1 ˇˇ ˇ
ˇ
ˇSN .t/.SN .t/ .SN .t// /ˇ
.m/
N
1 ˇˇ ˇ
ˇ
ˇ.SN .t/ SN .t//.SN .t// ˇ :
.m/ .m/
C
N
Now the result follows from Lemma 16.4 via the Cauchy–Schwarz inequality. □
O
Proof ofR Theorem 16.6. Recall that the Fourier transform, K.u/, O
of K is K.u/ D
1
f2g1 1 K.s/e i su ds. Since K and KO are in L1 and both are Lipschitz func-
R1
O
tions, the inversion formula gives K.s/ D 1 K.u/e i su
du: From the relationship
between K and KO and from the fact that K is supported on the interval Œ1; 1, we
obtain:
X
BN
˙N D K.k=BN / O k
kDBN
X
N 1
D K.k=BN / O k
kD1N
X
N 1 Z 1
D O
K.u/e i.k=BN /u
du O k
1
kD1N
Z 1 X
N 1
D O
K.u/ e i.u=BN /k O k du
1
kD1N
Z 1
D O
K.u/IN .u=BN /du:
1
Similarly,
Z 1
˙ .m/ O .m/
N D K.u/IN .u=BN /du:
1
Hence we have
ˇ ˇ ˇZ 1 ˇˇ
ˇ ˇ ˇ
E ˇˇ˙ N ˙ .m/
N ˇ
ˇDEˇ
ˇ
O
K.u/ IN .u=BN / I.m/
N .u=BN / duˇˇ
1
Z 1ˇ ˇ ˇ ˇ
ˇ O ˇ ˇ .m/ ˇ
ˇK.u/ˇ E ˇ IN .u=BN / I .u=BN / ˇ du N
1
ˇˇ ˇˇ Z 1 ˇ ˇ
ˇˇ .m/ ˇˇ ˇO ˇ
sup ˇˇIN .t/ IN .t/ˇˇ ˇK.u/ˇ du:
1<t <1 1 1
16.10 Proofs of Theorems 16.7 and 16.8 335
.m/ P
˙ N ! ˙ .m/ :
Since
˙ .m/ ! ˙ ;
The proof of Theorem 16.7 relies on Theorem A.1 of Aue et al. (2009), which we
state here for ease of reference.
where fW./.x/; x 2 Œ0; 1g is a mean zero Gaussian process with covariances
1
GN .x; / D Ln .x; /T Ȯ ./1 Ln .x; /T :
N
We notice that replacing the LN .x; / O with LN .x; Ǒ / does not change the test
statistic in (16.31). Furthermore, since by the second part of Proposition 16.3
j Ȯ ./
O Ȯ . Ǒ /j D oP .1/, it is enough to study the limiting behavior of the sequence
GN .x; Ǒ /. This is done by first deriving the asymptotics of GN .x; ˇ/ and then ana-
lyzing the effect of replacing ˇ with Ǒ .
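Let $\boldsymbol\beta_i^{(m)}$ be the m-dependent approximations for $\boldsymbol\beta_i$ which are obtained by
replacing $Y_i(t)$ in (16.25) by $Y_i^{(m)}(t)$. For a vector $v$ in $R^d$ we let $|v|$ be its Euclidean
norm. Then
\begin{align*}
E \big| \boldsymbol\beta_1 - \boldsymbol\beta_1^{(m)} \big|^2
& = E \sum_{\ell=1}^{d} \big( \beta_{\ell 1} - \beta_{\ell 1}^{(m)} \big)^2 \\
& = \sum_{\ell=1}^{d} E \Big( \int \big( Y_1(t) - Y_1^{(m)}(t) \big) v_\ell(t) \, dt \Big)^2 \\
& \le \sum_{\ell=1}^{d} E \int \big( Y_1(t) - Y_1^{(m)}(t) \big)^2 dt \int v_\ell^2(t) \, dt \\
& = d \, \nu_2^2 \big( Y_1 - Y_1^{(m)} \big).
\end{align*}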
1 D d Œ0;1
p SN .x; ˇ/ ! W.ˇ/.x/:
N
The coordinatewise absolute convergence of the series ˙ .ˇ/ follows from part (a)
of Theorem 16.5. By assumption the estimator Ȯ .ˇ/ is consistent and consequently
Z d Z
X
DŒ0;1
GN .x; ˇ/dx ! B`2 .x/dx
`D1
1 O Ǒ /j D oP .1/
sup p jSN .x; ˇ/ SN .x; C (16.60)
x2Œ0;1 N
and
j Ȯ .ˇ/ Ȯ .C
O Ǒ /j D oP .1/: (16.61)
Relation (16.61) follows from Proposition 16.3. To show (16.60) we observe that by
the Cauchy-Schwarz inequality and Theorem 16.2
1 O Ǒ /j2
sup jSN .x; ˇ/ SN .x; C
x2Œ0;1 N
d ˇ Z bN ˇ2
1 X ˇˇ X xc
ˇ
D sup ˇ Yn .t/.v` .t/ cO` vO` .t//dt ˇˇ
x2Œ0;1 N `D1 nD1
Z bN
X xc 2 d Z
X
1
sup Yn .t/ dt .v` .t/ cO` vO ` .t//2 dt
N x2Œ0;1 nD1 `D1
Z X
k 2
1
max Yn .t/ dt OP .N 1 /:
N 1kN
nD1
Define
1=2 X .r/ 1=2
g.t/ D EjY1 .t/j2 C 2 EjY1 .t/j2 EjY1Cr .t/ Y1Cr .t/j2 :
r1
and
O 2 is defined analogously.
It follows from (16.62) that TN .d / can be expressed as the sum of three terms:
TN .d / D T1;N .d / C T2;N .d / C T3;N .d /;
where
Z
1 1 O T Ȯ ./ O
T1;N .d / D LN .x; ˇ/ O 1 LN .x; ˇ/dxI
N 0
N
T2;N .d / D .1 /ŒO1 O 2 T Ȯ ./
O 1 ŒO1 O 2 I
2
Z 1
T3;N .d / D g.x; /LN .x; ˇ/O T Ȯ ./O 1 Œ
O1 O 2 dx;
0
˚
with g.x; / D 2 x.1 /Ifx g C .1 x/Ifx> g :
Since ˝ in (16.33) is positive definite (p.d.), Ȯ ./
O is almost surely p.d. for large
enough N (N is random). Hence for large enough N the term T1;N .d / is nonnega-
tive. We will show that N 1 T2;N .d / 1 C oP .1/; for a positive constant 1 , and
N 1 T3;N .d / D oP .1/: To this end we notice the following. Ultimately all eigen-
values of Ȯ ./
O are positive. Let .N / and .N / denote the largest, respectively,
the smallest eigenvalue. By Lemma 2.2, .N / ! a.s. and .N / ! a.s.,
where and are the largest and smallest eigenvalue of ˝. Next we claim that
O1
j O 2 j D j1 2 j C oP .1/:
To obtain this, we use the relation kvO i cOj vj k D oP .1/ which can be proven
similarly as Lemma A.1 of Berkes et al. (2009), but the law of large numbers in
a Hilbert space must be replaced by the ergodic theorem. The ergodicity of fYn g
follows from the representation Yn D f ."n ; "n1 ; : : :/. Notice that because of the
presence of a change point it cannot be claimed that kvO i cOj vj k D OP .N 1=2 /.
It follows that if N is large enough, then
1 1
O1
Œ O 2 T Ȯ ./
O 1 Œ
O1 O 2 > j O1 O 2 j2 D j1 2 j2 C oP .1/:
2 2
To verify N 1 T3;N .d / D oP .1/, observe that
ˇ ˇ
sup ˇLN .x; ˇ/O T Ȯ ./
O 1 Œ
O1 O 2 ˇ
x2Œ0;1
O Ȯ ./
sup jLN .x; ˇ/jj O 1 jj
O1
O 2j
x2Œ0;1
D oP .N /j1 2 j:
We first establish a technical bound which implies the consistency of the estimator
O `k given in (16.40). Let cO` D sign.hv` ; vO` i/ and dOk D sign.huk ; uO k i/.
16.11 Proof of Theorem 16.9 339
where
ZZ X
N
1
T1 D Xi .s/Yi .t/ EŒXi .s/Yi .t/ uk .s/v` .t/dt dsI (16.63)
N
i D1
N ZZ
1 X
T2 D EŒXi .s/Yi .t/ uk .t/v` .s/ dOk uO k .t/cO` vO ` .s/ dt ds: (16.64)
N
i D1
we obtain
ZZ X
N 2
1
T12 2 Xi .s/Yi .t/ EŒXi .s/Yi .t/ dt dsI (16.66)
N
i D1
T22 D 222 .X /22 .Y / kuk dOk uO k k2 C kv` cO` vO ` k2 : (16.67)
Hence by similar arguments as we used for the proof of Theorem 16.1 we get
NET12 D O.1/. The proof follows now immediately from Lemma 2.3 and The-
orem 16.1.
Now we are ready to verify (16.42). We have
X
K X
L
O KL .t; s/ D O 1
` O `k u
O k .t/vO ` .s/:
kD1 `D1
The orthogonality of the sequences fuk g and fv` g and (16.41) imply that
ZZ X X 2
1
` `k uk .t/v` .s/ dt ds
k>K `>L
X X ZZ
D 2
`
2 2 2
`k uk .t/v` .s/dt ds
k>K `>L
XX
D 2
`
2
`k !0 .L; K ! 1/:
k>K `>L
Therefore, letting
X
K X
L
KL .t; s/ D 1
` `k uk .t/v` .s/;
kD1 `D1
X L ZZ h
K X i2 P
KL 1
` `k uk .t/v` .s/ O 1
` O `k u
O k .t/vO ` .s/ dt ds ! 0
kD1 `D1
(16.68)
.N ! 1/:
Hence
ZZ h i2
1
1
` O 1
`k uk .t/v` .s/ ` O `k u
O k .t/vO ` .s/ dt ds
4
ˇ ˇ
2 O 2 2 ˇ 1 O 1 ˇ2
` j `k cO` dk O `k j C O `k ` `
C `k ` kuk dOk uO k k2 C kv` cO` vO ` k2 :
2 2
X
K X
L
P
KL 2
` j `k cO` dOk O `k j2 ! 0I (16.69)
kD1 `D1
X
K X
L
ˇ ˇ2 P
2 ˇ 1
KL O `k ` O 1
`
ˇ ! 0I (16.70)
kD1 `D1
X
K X
L
P
KL 2 2
`k ` kuk dOk uO k k2 C kv` cO` vO` k2 ! 0: (16.71)
kD1 `D1
Assumption 16.9 (ii) assures that the last term goes to zero. This completes the
proof.
Chapter 17
Spatially distributed functional data
Chapters 13, 14 and 16 focused on functional time series. The present chapter and
Chapter 18 deal with curves observed at spatial locations. The data consist of curves
$X(\mathbf{s}_k; t)$, $t \in [0, 1]$, observed at spatial locations $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$. We propose methods
for the estimation of the mean function and the FPC's for such data. We also
develop a significance test for the correlation of two such functional spatial fields.
The test we consider in this section is an extension of the test of Chapter 9 in which
the pairs of curves were assumed to be independent. The main feature of spatially
distributed curves is that the curves at neighboring locations look similar, so the
dependence cannot be neglected, and, together with the spatial distribution of the
locations $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$, is the main feature of the data. After validating the finite
sample performance of the test by means of a simulation study, we apply it to
determine if there is correlation between long term trends in the so called critical
ionospheric frequency and changes in the direction of the internal magnetic field
of the Earth. The test provides conclusive evidence for correlation thus solving a
long standing space physics conjecture. This conclusion is not apparent if the spa-
tial dependence of the curves is neglected. This chapter focuses on methodological
and computational issues. Chapter 18 investigates the asymptotic properties of the
sample mean and of the EFPC’s for spatially distributed functions.
This chapter is organized as follows. Section 17.1 introduces spatially distributed
functional data in greater detail, and provides the motivation for the research pre-
sented in this Chapter. In Section 17.2, we briefly describe the fundamental concepts
of spatial statistics required to understand the remaining sections. Sections 17.3 and
17.4 focus, respectively, on the estimation of the mean function and the FPC’s in the
spatial setting. Section 17.5 demonstrates by means of a simulation study that the
methods we propose improve on the standard approach, and discusses their relative
performance and computational cost. In Section 17.6, we develop a test for the cor-
relation of two functional spatial fields. This test requires estimation of a covariance
tensor. After addressing this issue in Section 17.7, we study in Section 17.8 the finite
sample properties of several implementations of the test. Finally, in Section 17.9, we
apply the methodology developed in the previous section to test for the correlation
between the ionospheric critical frequency and magnetic curves.
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 343
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_17,
© Springer Science+Business Media New York 2012
17.1 Introduction
[Figure 17.1 plot omitted; horizontal axis: Month, 01 to 12.]
Fig. 17.1 Average annual temperature curves at 35 locations in Canada. The continuous thick line
is the simple average, the dashed line is the average that takes into account spatial locations and
dependence.
from 1840 to 1990 at 191 Australian weather stations. Snow water curves measured
at several dozen locations in every state over many decades have been studied in the
purely spatial framework, e.g. Carroll et al. (1995) and Carroll and Cressie (1996).
Useful insights can potentially be gained by studying the whole curves reflecting the
temporal dynamics, rather than just temporal averages. Another important example
is pollution curves: $X(\mathbf{s}_k; t)$ is the concentration of a pollutant at time $t$ at location
$\mathbf{s}_k$. Data of this type were studied by Kaiser et al. (2002). A functional framework
might be convenient because such data are typically available only at sparsely
distributed time points $t_j$, which can be different at different locations. In many studies,
$X(\mathbf{s}_k; t)$ is the count at time $t$ of infectious disease cases, where $\mathbf{s}_k$ is a
representative location, e.g. a “middle point” of a county. Delicado et al. (2010) review other
examples and contributions to the methodology for spatially distributed functional
data. The work with geostatistical functional data has focused on kriging, see Deli-
cado et al. (2010), Nerini et al. (2010) and Giraldo et al. (2010, 2011).
The data set that most directly motivated the research described in this chapter
consists of the curves of the so–called F2–layer critical frequency, foF2. Three such
curves are shown in Figure 17.2. In principle, foF2 curves are available at over
200 locations throughout the globe, but sufficiently complete data are available at
only 30-40 locations which are very unevenly spread; for example, there is a dense
network of observatories over Europe and practically no data over the oceans. The
study of this data set has been motivated by the hypothesis of Roble and Dickinson
(1989) who suggested that the increasing amounts of (radiative) greenhouse gases
[Figure 17.2 plot omitted; three stacked panels, vertical axis: foF2, MHz.]
Fig. 17.2 F2-layer critical frequency curves at three locations. Top to bottom (latitude in parentheses):
Yakutsk (62.0), Yamagawa (31.2), Manila (14.7). The functions exhibit a latitudinal trend in
amplitude.
[Figure 17.3 plot omitted; vertical axis: Height, km; atmospheric layers from bottom to top: troposphere, stratosphere, mesosphere, thermosphere, exosphere; ionospheric regions D, E and F marked on the right.]
Fig. 17.3 Typical profile of the daytime ionosphere. The curve shows electron density as a function
of height. The right vertical axis indicates the D, E and F regions of the ionosphere.
where $\mu(t) = EX(\mathbf{s}; t)$, the $v_i$ are the eigenfunctions of the covariance operator
and $\xi_i(\mathbf{s}) = \langle X(\mathbf{s}) - \mu, v_i \rangle$. Note that the mean function $\mu$ and the FPC's $v_i$ do not
depend on $\mathbf{s}$. Even if model (17.1) does not hold, our estimates of the mean function
and the FPC's provide useful descriptive statistics, as illustrated in Figure 17.1. For
the applications we have in mind, it is enough to assume that the spatial domain is a
subset of the plane or a two-dimensional sphere.
Recall that for functions $X_1, X_2, \ldots, X_N$, the sample mean is defined as
$$\bar X_N = N^{-1} \sum_{n=1}^{N} X_n,$$
and the sample covariance operator as
$$\hat C(x) = N^{-1} \sum_{n=1}^{N} \langle X_n - \bar X_N, x \rangle\,(X_n - \bar X_N), \quad x \in L^2.$$
The EFPC's are computed as the eigenfunctions of $\hat C$. These are the estimates produced
by several software packages, including the popular R package fda. For sparse data
measured with error, nontrivial modifications are needed, see Section 1.5. In either
case, the consistency of the sample mean and the EFPC's relies on the assumption
that the functional observations form a simple random sample. In Chapter 16,
we showed that the consistency holds with the same rates for weakly dependent
functional time series. However, if the functions $X_k = X(\mathbf{s}_k)$ are spatially
distributed, the sample mean and the EFPC's need not be consistent, see Chapter
18. This happens if the spatial dependence is strong or if there are clusters of the
points $\mathbf{s}_k$. For moderately dependent spatially separated curves, these estimators are
consistent. We will demonstrate, however, that better estimators are available in finite
samples. We will then use these improved estimators as part of the procedure for
testing the independence of two functional fields $\{X(\mathbf{s}), \mathbf{s} \in S\}$ and $\{Y(\mathbf{s}), \mathbf{s} \in S\}$. First
we review some essential concepts of spatial statistics.
17.2 Essentials of spatial statistics
In order to make this chapter self-contained, we discuss in this section some relevant
concepts and methods of spatial statistics. We focus only on geostatistical data,
i.e. observations available at irregularly distributed points of a spatial domain. The
book of Schabenberger and Gotway (2005) offers an accessible and comprehensive
introduction to spatial statistics; a reader interested in a quick introduction to the basic
ideas of geostatistics, which goes beyond the information presented in this section,
is referred to Chapters 2 and 3 of Gelfand et al. (2010). In this section, we assume
that all data are scalars.
$$\{X(\mathbf{s}_k),\ \mathbf{s}_k \in S,\ k = 1, 2, \ldots, N\}.$$
for any points $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_m \in S$ and any shift $\mathbf{h}$. If we assume only that the mean
$EX(\mathbf{s})$ and the covariances $\mathrm{Cov}(X(\mathbf{s}), X(\mathbf{s}+\mathbf{h}))$ do not depend on $\mathbf{s}$, then the field is
called second-order stationary. For such a field, we define the covariance function
$$C(\mathbf{h}) = \mathrm{Cov}(X(\mathbf{s}), X(\mathbf{s}+\mathbf{h})).$$
If $C(\mathbf{h})$ depends only on the length $h$ of $\mathbf{h}$, we say that the random field is isotropic.
The covariance function of an isotropic random field is typically parametrized as
$$C(h) = \sigma^2 \phi(h), \quad h \ge 0, \quad \phi(0) = 1.$$
The function $\phi$ is then called the correlation function. The function $\phi(\cdot)$ quantifies
the strength of linear dependence between observations distance $h$ apart and the
smoothness of the field. The following correlation functions are frequently used.
The powered exponential correlation function is defined by
$$\phi(h) = \exp\left\{-\left(\frac{h}{\rho}\right)^{p}\right\}, \quad \rho > 0, \quad 0 < p \le 2.$$
If $p = 1$, this correlation function is called exponential; if $p = 2$, it is called
Gaussian. A very general family of correlation functions is the so-called Matérn
class. The Matérn class correlation functions are defined as
$$\phi(h) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{h}{\rho}\right)^{\nu} K_\nu(h/\rho), \quad \rho > 0, \quad \nu > 0,$$
where $K_\nu$ is the modified Bessel function, see Stein (1999) for the details. The
function $K_\nu$ decays monotonically and approximately exponentially fast; numerical
calculations show that $K_\nu(s)$ practically vanishes if $s > \nu$.
The correlation function of an isotropic random field is positive definite, and
every positive definite function is a correlation function of a (Gaussian) random
field. This follows from Kolmogorov's consistency theorem. A positive definite
function is called a valid correlation function. There exist examples of correlation
functions which are valid in one dimension, but are no longer valid in a
higher dimension, or on a manifold. For this reason, when working with globally
distributed data, we use the chordal distance defined as the Euclidean distance in
the three-dimensional space. Denoting the latitude by $L$ and the longitude by $l$, the
chordal distance, $0 \le d_{k,\ell} \le 2$, between two points, $\mathbf{s}_k, \mathbf{s}_\ell$, on the unit sphere is
given by
$$d_{k,\ell} = 2\left[\sin^2\left(\frac{L_k - L_\ell}{2}\right) + \cos L_k \cos L_\ell \sin^2\left(\frac{l_k - l_\ell}{2}\right)\right]^{1/2}. \tag{17.2}$$
If we work with distance (17.2), we can use any correlation function which is valid
in the three-dimensional Euclidean space.
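As an illustration of (17.2), the following is a minimal sketch in Python with NumPy. The function names `chordal_distance` and `chordal_matrix` are hypothetical (they do not come from the text), and latitudes and longitudes are assumed to be given in radians.

```python
import numpy as np

def chordal_distance(lat1, lon1, lat2, lon2):
    """Chordal distance (17.2) on the unit sphere; all angles in radians."""
    lat1, lon1, lat2, lon2 = (np.asarray(a, dtype=float)
                              for a in (lat1, lon1, lat2, lon2))
    inner = (np.sin((lat1 - lat2) / 2.0) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon1 - lon2) / 2.0) ** 2)
    return 2.0 * np.sqrt(inner)

def chordal_matrix(lat, lon):
    """N x N matrix of pairwise chordal distances between N locations."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    return chordal_distance(lat[:, None], lon[:, None],
                            lat[None, :], lon[None, :])
```

The resulting matrix can be passed to any correlation model that is valid in three-dimensional Euclidean space, for example the exponential model.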
In spatial statistics, the concept of intrinsic stationarity is very useful. The field
$\{X(\mathbf{s}), \mathbf{s} \in S\}$ is said to be intrinsically stationary if $\mathrm{Var}[X(\mathbf{s}+\mathbf{h}) - X(\mathbf{s})]$ does not
depend on $\mathbf{s}$. Notice that a second-order stationary field is intrinsically stationary.
The converse is not true. The Brownian motion is intrinsically stationary (has
stationary increments), but it is not a stationary process. If $\{X(\mathbf{s}), \mathbf{s} \in S\}$ is intrinsically
stationary, we define the semivariogram by
$$\gamma(\mathbf{h}) = \frac{1}{2}\mathrm{Var}[X(\mathbf{s}+\mathbf{h}) - X(\mathbf{s})].$$
(The variogram is defined as $2\gamma(\cdot)$.) The semivariogram of a second order stationary
field with the covariance function $C(\cdot)$ is given by
$$\gamma(\mathbf{h}) = C(\mathbf{0}) - C(\mathbf{h}). \tag{17.3}$$
Even for second order stationary fields, the estimation of the covariance function
proceeds through the estimation of the semivariogram. One advantage of this
approach is that the semivariogram is less sensitive to the misspecification or biased
estimation of the mean. Typically isotropy is assumed. First, an empirical variogram
is computed at several available lags $h > 0$. Then a parametric model (derived from
a valid covariance function via (17.3)) is fitted. There are several versions of the
empirical variogram. The classical estimator proposed by Matheron is given by
$$\hat\gamma(d) = \frac{1}{|N(d)|} \sum_{N(d)} \big(X(\mathbf{s}_k) - X(\mathbf{s}_\ell)\big)^2, \tag{17.4}$$
where $N(d)$ is the set of pairs $(\mathbf{s}_k, \mathbf{s}_\ell)$ approximately distance $d$ apart, and $|N(d)|$
is the count of pairs in $N(d)$. A robust estimator proposed by Cressie and Hawkins
is defined as
$$\hat\gamma(d) = \left(0.457 + \frac{0.494}{|N(d)|}\right)^{-1} \left(\frac{1}{|N(d)|} \sum_{N(d)} |X(\mathbf{s}_k) - X(\mathbf{s}_\ell)|^{1/2}\right)^{4}. \tag{17.5}$$
The precise definition of the summations in (17.4) and (17.5) requires the introduction
of a binning parameter which allows us to treat pairs of points as being
approximately distance $d$ apart. For details, we refer to Section 4.4 of Schabenberger
and Gotway (2005), where other ways of variogram estimation are also discussed.
Examples of empirical variograms and fitted parametric models are given in
Figure 17.11.
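The estimators (17.4) and (17.5) can be sketched as follows. This is only an illustration under simple assumptions: the function name `empirical_variogram` is hypothetical, the binning uses half-open intervals given by a user-supplied list of bin edges (one possible choice of the binning parameter), and each unordered pair of locations is counted once.

```python
import numpy as np

def empirical_variogram(values, dist, bin_edges, robust=False):
    """Empirical variogram on distance bins.

    values    : length-N array of scalar observations X(s_k)
    dist      : N x N matrix of pairwise distances
    bin_edges : increasing bin edges; bin b is [edge_b, edge_{b+1})
    robust    : False -> Matheron form (17.4), True -> Cressie-Hawkins form (17.5)
    Returns bin centers and estimates for the non-empty bins.
    """
    values = np.asarray(values, dtype=float)
    dist = np.asarray(dist, dtype=float)
    rows, cols = np.triu_indices(len(values), k=1)   # each pair once
    d = dist[rows, cols]
    abs_diff = np.abs(values[rows] - values[cols])
    centers, estimates = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        n_pairs = int(in_bin.sum())
        if n_pairs == 0:
            continue                                  # skip empty bins
        if robust:
            est = (np.mean(np.sqrt(abs_diff[in_bin])) ** 4
                   / (0.457 + 0.494 / n_pairs))
        else:
            est = np.mean(abs_diff[in_bin] ** 2)
        centers.append(0.5 * (lo + hi))
        estimates.append(est)
    return np.array(centers), np.array(estimates)
```

A parametric model, e.g. the exponential one, would then be fitted to the returned pairs by nonlinear least squares; that step is not shown here.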
depend only on the distance $d(\mathbf{s}_k, \mathbf{s}_\ell)$, this defines a “functional” second order
stationarity. The estimation of the $C_{k\ell}$ is the central issue, and will be discussed as we
introduce methods M1 and M2. Method M3 does not require the estimation of the
$C_{k\ell}$, but it requires the estimation of the corresponding covariances for the projections
of the functions $X(\mathbf{s}_k)$ onto several basis functions.
Methods M1 and M2 estimate $\mu$ by the weighted average
$$\hat\mu_N = \sum_{n=1}^{N} w_n X(\mathbf{s}_n). \tag{17.8}$$
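The optimal weights $w_k$ are defined to minimize $E\big\|\sum_{n=1}^{N} w_n X(\mathbf{s}_n) - \mu\big\|^2$ subject
to the condition $\sum_{n=1}^{N} w_n = 1$. Using the method of the Lagrange multiplier, we
seek to minimize the objective function
$$\begin{aligned}
\varphi(w_1, w_2, \ldots, w_N, r) &= E\Big\|\sum_{n=1}^{N} w_n X(\mathbf{s}_n) - \mu\Big\|^2 - 2r\Big(\sum_{n=1}^{N} w_n - 1\Big) \\
&= \sum_{k,\ell=1}^{N} w_k w_\ell C_{k\ell} - 2r\Big(\sum_{n=1}^{N} w_n - 1\Big).
\end{aligned} \tag{17.9}$$
The partial derivatives are
$$\frac{\partial\varphi}{\partial w_n} = 2\sum_{k=1}^{N} w_k C_{kn} - 2r, \quad n = 1, 2, \ldots, N; \qquad
\frac{\partial\varphi}{\partial r} = -2\Big(\sum_{n=1}^{N} w_n - 1\Big).$$
Setting them equal to zero, we obtain
$$\sum_{n=1}^{N} w_n = 1, \qquad \sum_{k=1}^{N} w_k C_{kn} - r = 0, \quad n = 1, 2, \ldots, N. \tag{17.10}$$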
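The system (17.10) is linear in the unknowns $(w_1, \ldots, w_N, r)$, so it can be solved directly once the matrix $C = [C_{k\ell}]$ is available. The sketch below is one way to do this with NumPy; the function name `optimal_weights` is hypothetical, and $C$ is assumed to be symmetric and such that the bordered system is nonsingular.

```python
import numpy as np

def optimal_weights(C):
    """Solve (17.10): sum_n w_n = 1 and sum_k w_k C[k, n] - r = 0 for all n.

    C is assumed symmetric (N x N). Returns the weights w and the multiplier r.
    """
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C          # sum_k w_k C[k, n], using symmetry of C
    A[:n, n] = -1.0        # the "- r" term in each of the N equations
    A[n, :n] = 1.0         # the constraint sum_n w_n = 1
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    solution = np.linalg.solve(A, rhs)
    return solution[:n], solution[n]
```

Multiplying $C$ by a positive constant rescales $r$ but leaves the weights unchanged, so the weights depend on $C$ only up to a multiplicative constant.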
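$$\mathrm{Cov}(X(\mathbf{s}_k; t_j), X(\mathbf{s}_\ell; t_j)) = \sigma^2(t_j) \exp\left\{-\frac{d(\mathbf{s}_k, \mathbf{s}_\ell)}{\rho(t_j)}\right\}. \tag{17.11}$$
It is clear how they can be modified for other popular models. Observe that under
model (17.11),
$$\begin{aligned}
C_{k\ell} &= E\int (X(\mathbf{s}_k; t) - \mu(t))(X(\mathbf{s}_\ell; t) - \mu(t))\,dt \\
&= \int \mathrm{Cov}(X(\mathbf{s}_k; t), X(\mathbf{s}_\ell; t))\,dt \\
&= \int \sigma^2(t) \exp\left\{-\frac{d(\mathbf{s}_k, \mathbf{s}_\ell)}{\rho(t)}\right\} dt.
\end{aligned}$$
One way to estimate $C_{k\ell}$ is to set
$$\hat C_{k\ell} = \int \hat\sigma^2(t) \exp\left\{-\frac{d(\mathbf{s}_k, \mathbf{s}_\ell)}{\hat\rho(t)}\right\} dt, \tag{17.12}$$
with the estimates $\hat\sigma^2(t_j)$ and $\hat\rho(t_j)$ obtained using some version of the empirical
variogram, for example (17.4) or (17.5).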
If the sample size $N$ is small, the ordinary nonlinear least squares method needed
to obtain $\hat\sigma^2(t_j)$ and $\hat\rho(t_j)$ may fail to converge for some $t_j$. An example based on
the critical frequency data is given in Figure 17.4. The convergence does however
take place for most $t_j$, so the integral in (17.12) can be approximated using a Riemann
sum.
Another way to proceed is to replace the $\hat\rho(t_j)$ by their average
$\hat\rho = m^{-1}\sum_{j=1}^{m} \hat\rho(t_j)$, where $m$ is the count of the $t_j$ at which the variogram is
estimated successfully. Then, the $C_{k\ell}$ are approximated by
$$\hat C_{k\ell} = \int \hat\sigma^2(t)\,dt\, \exp\left\{-\frac{d(\mathbf{s}_k, \mathbf{s}_\ell)}{\hat\rho}\right\}.$$
As explained above, in order to compute the weights $w_j$ in (17.10), it is enough
to know the matrix $C$ only up to a multiplicative constant. Thus we may set
$$\hat C_{k\ell} = \exp\left\{-\frac{d(\mathbf{s}_k, \mathbf{s}_\ell)}{\hat\rho}\right\}. \tag{17.13}$$
[Figure 17.4 plot omitted; vertical axis: Range parameter, rad.]
Fig. 17.4 The range parameter $\hat\rho(t_j)$ of the scaled foF2 curves, determined using method M1,
as a function of time. The horizontal line is its average value, $\bar\rho = 0.474$. The gaps indicate the
times $t_j$ where the method failed to converge.
Once the matrix $C$ has been estimated, we compute the weights $w_j$, and estimate
the mean via (17.8).
If (17.12) is used, we refer to this method as M1a; if (17.13) is used, we call it
M1b.
Method M1 relies on the estimation of the variograms
$$\begin{aligned}
2\gamma(\mathbf{s}_k, \mathbf{s}_\ell; t_j) &= E\big[X(\mathbf{s}_k; t_j) - X(\mathbf{s}_\ell; t_j)\big]^2 \\
&= 2\mathrm{Var}(X(\mathbf{s}_k; t_j)) - 2\mathrm{Cov}(X(\mathbf{s}_k; t_j), X(\mathbf{s}_\ell; t_j)),
\end{aligned} \tag{17.14}$$
which lead to the estimates in a parametric model. The model is the same for every
$t_j$, but the estimates ($\hat\sigma^2(t_j)$, $\hat\rho(t_j)$, for the exponential model) depend on $t_j$. An
advantage of this approach is that even if, for small $N$, parameter estimates may
not converge at some $t_j$, it is still possible to obtain estimates (17.12) and (17.13).
Method M2, described below, requires only one optimization, so it is much faster
than M1, but this optimization may fail to converge for small $N$. (This has not
happened though for our real and simulated data.)
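Method M2. We define the functional variogram
$$\begin{aligned}
2\gamma(\mathbf{s}_k, \mathbf{s}_\ell) &= E\|X(\mathbf{s}_k) - X(\mathbf{s}_\ell)\|^2 \\
&= 2E\|X(\mathbf{s}_k) - \mu\|^2 - 2E[\langle X(\mathbf{s}_k) - \mu, X(\mathbf{s}_\ell) - \mu \rangle] \\
&= 2E\|X(\mathbf{s}) - \mu\|^2 - 2C_{k\ell}.
\end{aligned} \tag{17.15}$$
The variogram (17.15) can be estimated by its empirical counterparts, like (17.4) or
(17.5), with the $|X(\mathbf{s}_k) - X(\mathbf{s}_\ell)|$ replaced by
$$\|X(\mathbf{s}_k) - X(\mathbf{s}_\ell)\| = \left\{\int \big(X(\mathbf{s}_k; t) - X(\mathbf{s}_\ell; t)\big)^2 dt\right\}^{1/2}.$$
The subscript $f$ is used to emphasize the functional variogram. Denoting by $\hat\rho_f$ the
resulting estimate, we estimate the $C_{k\ell}$ by (17.13) with $\hat\rho$ replaced by $\hat\rho_f$.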
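The only functional ingredient of Method M2 is the matrix of $L^2$ distances between curves. A minimal sketch follows, assuming that all curves are observed on a common grid `t` and that the integral is approximated by the trapezoidal rule; the function name `l2_pairwise` is hypothetical.

```python
import numpy as np

def l2_pairwise(curves, t):
    """Pairwise L2 distances ||X(s_k) - X(s_l)|| between discretized curves.

    curves : array of shape (N, m), row k holds X(s_k; t_1), ..., X(s_k; t_m)
    t      : increasing grid of length m (assumed common to all curves)
    """
    curves = np.asarray(curves, dtype=float)
    t = np.asarray(t, dtype=float)
    sq_diff = (curves[:, None, :] - curves[None, :, :]) ** 2   # shape (N, N, m)
    dt = np.diff(t)
    # trapezoidal rule applied along the time axis
    integral = 0.5 * np.sum((sq_diff[..., 1:] + sq_diff[..., :-1]) * dt, axis=-1)
    return np.sqrt(integral)
```

The entries of the returned matrix play the role of $|X(\mathbf{s}_k) - X(\mathbf{s}_\ell)|$ in (17.4) or (17.5), after which a scalar parametric variogram model is fitted once.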
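Method M3. This method uses a basis expansion of the functional data; it does not
use the weighted sum (17.8). Suppose $B_j$, $1 \le j \le K$, are elements of a functional
basis with $K$ so large that for each $k$
$$X(\mathbf{s}_k) \approx \sum_{j \le K} \langle B_j, X(\mathbf{s}_k) \rangle B_j. \tag{17.17}$$
$$\mathrm{Cov}\big(\langle B_j, X(\mathbf{s}_k) \rangle, \langle B_j, X(\mathbf{s}_\ell) \rangle\big) = \sigma_j^2 \exp\left\{-\frac{d(\mathbf{s}_k, \mathbf{s}_\ell)}{\rho_j}\right\}.$$
The mean $\langle B_j, \mu \rangle$ is estimated by a weighted average of the $\langle B_j, X(\mathbf{s}_k) \rangle$. The
weights depend on $j$ and are computed using (17.10) with the $C_{kn}$ replaced by
$\mathrm{Cov}(\langle B_j, X(\mathbf{s}_k) \rangle, \langle B_j, X(\mathbf{s}_n) \rangle)$. Denote the resulting estimate by $\hat\mu_j$. The mean
function is then estimated by
$$\hat\mu(t) = \sum_{j \le K} \hat\mu_j B_j(t).$$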
17.4 Estimation of the principal components
Assume now that the mean function has been estimated, and this estimate is
subtracted from the data. To simplify the formulas, in the following we thus assume
that $EX(\mathbf{s}) = 0$.
We consider analogs of methods M2 and M3. Extending Method M1 is possible,
but presents a computational challenge because a parametric spatial model would
need to be estimated for every pair $(t_i, t_j)$. For the ionosonde data studied in Section
17.9, there are 336 points $t_j$. Estimation on a single data set would be feasible, but
not a simulation study based on thousands of replications. In both approaches, which
we term CM2 and CM3, the FPC's are estimated by expansions of the form
$$v_j(t) = \sum_{\alpha=1}^{K} x_\alpha^{(j)} B_\alpha(t), \tag{17.19}$$
$$X(\mathbf{s}; t) = \sum_{j=1}^{\infty} b_j(\mathbf{s}) B_j(t),$$
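where, by the orthonormality of the $B_j$, the $b_j(\mathbf{s})$ form an observable field
$b_j(\mathbf{s}_k) = \langle B_j, X(\mathbf{s}_k) \rangle$. Using the orthonormality of the $B_j$ again, we obtain
$$\begin{aligned}
C(B_j) &= E\left[\Big\langle \sum_{\alpha=1}^{\infty} b_\alpha(\mathbf{s}) B_\alpha, B_j \Big\rangle \sum_{i=1}^{\infty} b_i(\mathbf{s}) B_i\right] \\
&= E\left[b_j(\mathbf{s}) \sum_{i=1}^{\infty} b_i(\mathbf{s}) B_i\right] \\
&= \sum_{i=1}^{\infty} E[b_i(\mathbf{s}) b_j(\mathbf{s})] B_i.
\end{aligned} \tag{17.20}$$
$$\hat C(B_j) = \sum_{i=1}^{K} \hat r_{ij} B_i. \tag{17.21}$$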
Setting
$$\mathbf{x} = [x_1, x_2, \ldots, x_K]^T, \qquad \hat{\mathbf{R}} = [\hat r_{ij},\ 1 \le i, j \le K],$$
we can write the above as a matrix equation
$$\hat{\mathbf{R}}\mathbf{x} = \lambda\mathbf{x}. \tag{17.22}$$
$$\hat v_j = \sum_{\alpha=1}^{K} \hat x_\alpha^{(j)} B_\alpha \tag{17.24}$$
are also orthonormal (because the $B_j$ are orthonormal). The $\hat v_j$ given by (17.24) are
the estimators of the FPC's, and the $\hat\lambda_j$ in (17.23) of the corresponding eigenvalues.
As in method M3, the value of $K$ can be taken to be the number of basis functions
used to create the functional objects in R, so it can be a relatively large number, e.g.
$K = 49$. Even though the range of $j$ in (17.23) and (17.24) runs up to $K$, only the
first few estimated FPC's $\hat v_j$ would be used in further work.
Method CM2. Recall that under the assumption of zero mean function, the covariance
operator is defined by $C(x) = E[\langle X(\mathbf{s}), x \rangle X(\mathbf{s})]$. For independent data it is
estimated by the simple average
$$\frac{1}{N}\sum_{n=1}^{N} \langle X(\mathbf{s}_n), \cdot \rangle X(\mathbf{s}_n) = \frac{1}{N}\sum_{n=1}^{N} C_n. \tag{17.25}$$
As for the mean, more precise estimates can be obtained by using the weighted
average
$$\hat C = \sum_{k=1}^{N} w_k C_k. \tag{17.26}$$
Before discussing the estimation of the weights $w_k$, we explain how the FPC's
$v_j$ and their eigenvalues $\lambda_j$ can be estimated using (17.26) and the representation
(17.19). As in method CM3, set $x = \sum_{1 \le \alpha \le K} x_\alpha B_\alpha$, and observe that
$$\hat C(x) = \sum_{j=1}^{K} \left(\sum_{\alpha=1}^{K} s_{j\alpha} x_\alpha\right) B_j,$$
where
$$s_{j\alpha} = \sum_{k=1}^{N} w_k \langle X_k, B_j \rangle \langle X_k, B_\alpha \rangle.$$
Thus, analogously to (17.22), we obtain a matrix equation $\mathbf{S}\mathbf{x} = \lambda\mathbf{x}$, from which the
estimates of the $v_j$, $\lambda_j$ can be found as in (17.23) and (17.24).
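The matrix equation $\mathbf{S}\mathbf{x} = \lambda\mathbf{x}$ can be solved with a standard symmetric eigensolver. The sketch below assumes that the basis coefficients $\langle X_k, B_j \rangle$ have already been computed and stored in an array `scores`, and that the weights `weights` are given; the function name `weighted_fpc` is hypothetical.

```python
import numpy as np

def weighted_fpc(scores, weights):
    """Eigen-decomposition of S with s[j, a] = sum_k w_k <X_k, B_j> <X_k, B_a>.

    scores  : array (N, K), scores[k, j] = <X_k, B_j> for an orthonormal basis
    weights : array (N,), the weights w_k in (17.26)
    Returns eigenvalues in decreasing order, the matching eigenvectors
    (columns hold the coefficients x_alpha^(j)), and the matrix S.
    """
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    S = (scores * w[:, None]).T @ scores       # symmetric K x K matrix
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]      # largest eigenvalue first
    return eigenvalues[order], eigenvectors[:, order], S
```

If `basis_values` is a $K \times m$ array holding the $B_\alpha$ evaluated on a grid, the estimated FPC's on that grid are the rows of `eigenvectors.T @ basis_values`. With equal weights $w_k = 1/N$ the construction reduces to the usual estimator based on (17.25).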
We now return to the estimation of the weights $w_k$ in (17.26). One way to define
the optimal weights is to require that they minimize the expected Hilbert–Schmidt
norm of $\hat C - C$. Recall that the Hilbert–Schmidt norm of an operator $K$ is defined
by
$$\|K\|_{\mathcal{S}}^2 = \sum_{i=1}^{\infty} \|K(e_i)\|^2 = \sum_{i=1}^{\infty} \int |K(e_i)(t)|^2\,dt,$$
we can repeat all algebraic manipulations needed to obtain the weight $w_i$ in (17.8).
The optimal weights in (17.26) thus satisfy
$$\sum_{n=1}^{N} w_n = 1, \qquad \sum_{k=1}^{N} w_k \kappa_{kn} - r = 0, \quad n = 1, 2, \ldots, N, \tag{17.27}$$
where
$$\kappa_{k\ell} = E[\langle C_k - C, C_\ell - C \rangle_{\mathcal{S}}].$$
Finding the weights thus reduces to estimating the expected inner products $\kappa_{k\ell}$.
Since method M2 of Section 17.3 relies only on estimating the inner product in the
Hilbert space $L^2$, it can be extended to the Hilbert space $\mathcal{S}$. First observe that,
analogously to (17.15),
by fitting a parametric model. In formulas (17.4) and (17.5), the squared distances
$(X(\mathbf{s}_k) - X(\mathbf{s}_\ell))^2$ must be replaced by the squared norms $\|C_k - C_\ell\|_{\mathcal{S}}^2$. These norms
are equal to
$$\|C_k - C_\ell\|_{\mathcal{S}}^2 = \sum_{i=1}^{\infty} \int \big(f_{ik} X_k(t) - f_{i\ell} X_\ell(t)\big)^2 dt,$$
where
$$f_{ik} = \int X_k(t) e_i(t)\,dt.$$
In this section, we report the results of a simulation study designed to compare the
performance of the methods proposed in Sections 17.3 and 17.4 in a realistic setting
motivated by the ionosonde data. It is difficult to design an exhaustive simulation
study due to the number of possible combinations of the point distributions, depen-
dence structures, shapes of mean functions and the FPC’s and ways of implementing
the methods (choice of spatial models, variogram estimation etc.). We do however
think that our study provides useful information and guidance for practical applica-
tion of the proposed methodology.
Data generating processes. We generate functional data at location $\mathbf{s}_k$ as
$$X(\mathbf{s}_k; t) = \mu(t) + \sum_{i=1}^{p} \xi_i(\mathbf{s}_k) v_i(t), \tag{17.28}$$
where the $v_i$ are orthonormal functions, cf. model (17.1), and the scalar fields $\xi_i$ are
independent.
To evaluate the estimators of the mean, we use $p = 2$ and
$$v_1(t) = \sqrt{2}\sin(2\pi t \cdot 6), \qquad v_2(t) = \sqrt{2}\sin(2\pi t/2). \tag{17.29}$$
and
$$\mu(t) = \sqrt{t}\,\sin(6\pi t). \tag{17.31}$$
The mean function (17.30) resembles the mean shape for the ionosonde data. It
is however a member of the Fourier basis, and can be isolated using only one basis
function, which could artificially enhance the performance of method M3.
We therefore also consider the mean function (17.31). Combining the mean function
(17.30) and the FPC's (17.29), we obtain functions which very closely resemble
the shapes of the ionosonde curves. In the above formulas, time is rescaled so that
$t \in [0, 1]$.
To evaluate the estimators of the FPC's, we set
$$X(\mathbf{s}_k; t) = \xi_1(\mathbf{s}_k)\frac{e_1(t) + e_2(t)}{\sqrt{2}} + \xi_2(\mathbf{s}_k) e_3(t), \tag{17.32}$$
where $e_1(t) = \sqrt{2}\sin(2\pi t \cdot 7)$, $e_2(t) = \sqrt{2}\sin(2\pi t \cdot 2)$, $e_3(t) = \sqrt{2}\sin(3\pi t \cdot 3)$.
Direct verification, which uses the independence of the fields $\xi_1$ and $\xi_2$, shows that
the FPC's are $v_1 = 2^{-1/2}(e_1 + e_2)$ and $v_2 = e_3$.
To complete the description of the data generating processes, we must specify
the dependence structure of the scalar spatial fields $\xi_1$ and $\xi_2$. We use the Gaussian
and exponential models:
$$\begin{aligned}
\text{Gaussian:}\quad & c(\mathbf{s}_k, \mathbf{s}_\ell) = c_0 + \sigma^2 \exp\{-d^2(k, \ell)/\rho^2\}; \\
\text{Exponential:}\quad & c(\mathbf{s}_k, \mathbf{s}_\ell) = c_0 + \sigma^2 \exp\{-d(k, \ell)/\rho\}.
\end{aligned} \tag{17.33}$$
The distances are the chordal distances (17.2) between the locations described
below. To make simulated data look similar to the real foF2 data we set $\sigma_1 = 1$,
$\rho_1 = \pi/6$ for the field $\xi_1(\mathbf{s})$ and $\sigma_2 = 0.1$, $\rho_2 = \pi/4$ for $\xi_2(\mathbf{s})$. For the simulated
data we set $c_0 = 0$. These parameters are the same for both the Gaussian and
exponential models. They result in effective ranges that differ by about 20%.
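The data generating process (17.28) with the exponential model in (17.33) and $c_0 = 0$ can be sketched as follows. Several points are assumptions of this illustration rather than statements of the text: the scalar fields are taken to be Gaussian and are simulated through a Cholesky factor, the parameters $\sigma_i$ are read as standard deviations, a tiny diagonal jitter is added for numerical stability, and all function names are hypothetical.

```python
import numpy as np

def exponential_cov(dist, sigma, rho):
    """Exponential covariance sigma^2 * exp(-d / rho), i.e. (17.33) with c_0 = 0."""
    return sigma ** 2 * np.exp(-np.asarray(dist, dtype=float) / rho)

def simulate_scores(cov, rng):
    """One realization of a mean-zero Gaussian vector with covariance `cov`."""
    n = cov.shape[0]
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))   # jitter: assumption
    return chol @ rng.standard_normal(n)

def simulate_curves(dist, t, rng):
    """Curves X(s_k; t) from (17.28) with mean (17.31) and FPC's (17.29)."""
    t = np.asarray(t, dtype=float)
    mu = np.sqrt(t) * np.sin(6 * np.pi * t)
    v1 = np.sqrt(2) * np.sin(2 * np.pi * t * 6)
    v2 = np.sqrt(2) * np.sin(2 * np.pi * t / 2)
    xi1 = simulate_scores(exponential_cov(dist, 1.0, np.pi / 6), rng)
    xi2 = simulate_scores(exponential_cov(dist, 0.1, np.pi / 4), rng)
    return mu + np.outer(xi1, v1) + np.outer(xi2, v2)
```

Here `dist` would be the matrix of chordal distances (17.2) between the station locations, and `rng` a generator such as `np.random.default_rng(seed)`.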
The locations $\mathbf{s}_k$ are selected to match the locations of the real ionosonde stations.
For the sample size 218 we use all available locations, shown in Figure 17.5. The
selected 32 locations correspond to the ionosondes with the longest record history.
The 100 stations were selected randomly out of the 218 stations.
Details of implementation. All methods require the specification of a parametric
spatial model for various variograms. Even though for some methods the variograms
are defined for $L^2$- or $\mathcal{S}$-valued objects, only scalar models are required. In this
simulation study, we employ the exponential model. Methods M3, CM2 and CM3
require the specification of a basis $\{B_j\}$ and the number $K$ of the basis functions.
We use the Fourier basis and $K = 1 + 4[\sqrt{\#\{t_j\}}]$, where $\#\{t_j\}$ is the count of
the points at which the curves are observed. For our real and simulated data
$K = 1 + 4[\sqrt{336}] = 73$, a number that falls between the recommended values of 49 and
99 for the number of basis functions. Specifically, the basis functions $B_j$ are
$$\{1,\ \sqrt{2}\sin(2\pi i t),\ \sqrt{2}\cos(2\pi i t);\ i = 1, 2, \ldots, 36\}. \tag{17.34}$$
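The rule for $K$ and the basis (17.34) can be sketched as follows. The brackets $[\cdot]$ are read here as the integer part, an interpretation that reproduces the value $K = 73$ quoted above; the function names are hypothetical.

```python
import math
import numpy as np

def number_of_basis_functions(n_points):
    """K = 1 + 4 * [sqrt(n_points)], with [.] read as the integer part."""
    return 1 + 4 * math.isqrt(n_points)

def fourier_basis(t, K):
    """Rows are 1, sqrt(2) sin(2 pi i t), sqrt(2) cos(2 pi i t), i = 1, ..., (K-1)/2."""
    if K % 2 == 0:
        raise ValueError("K must be odd: one constant plus sine/cosine pairs")
    t = np.asarray(t, dtype=float)
    rows = [np.ones_like(t)]
    for i in range(1, (K - 1) // 2 + 1):
        rows.append(np.sqrt(2) * np.sin(2 * np.pi * i * t))
        rows.append(np.sqrt(2) * np.cos(2 * np.pi * i * t))
    return np.vstack(rows)
```

For curves observed at 336 equispaced points of $[0, 1)$, the coefficients $\langle B_j, X(\mathbf{s}_k) \rangle$ can then be approximated by `curves @ fourier_basis(t, 73).T / 336`.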
[Figure 17.5 map omitted; axes: Longitude (degrees) and Latitude (degrees).]
Fig. 17.5 Locations of 218 ionosonde stations. Circles represent the 32 stations with the longest
complete records.
where $R$ is the number of replications; we use $R = 10^3$. For the FPC's, the definition
is fully analogous. We also compute the standard deviation for $L$, based on the
normal approximation for $R$ independent runs. We use the $L^1$ distance rather than
the $L^2$ distance, so as not to favor a priori methods which minimize the $L^2$ distance.
The results of the simulations for the mean function (17.31) are shown in
Figure 17.6. The DGP's have exponential covariance functions. If the $\xi_i$ in (17.28)
have Gaussian covariances, the results are not visually distinguishable. The error
values for mean (17.30) are slightly different, but the relative position of the box
plots practically does not change. All methods M1, M2 and M3 are significantly
better than the sample average. Method M2 strikes the best balance between the
computational cost and the precision of estimation.
Errors in the estimation of the FPC's in model (17.32) are shown in Figure 17.7.
The displayed errors are those for the $\xi_i$ with exponential covariances and the CH
variogram. The results for Gaussian covariances and the MT variogram are practically
the same. The performance of methods CM2 and CM3 is comparable, and
they are both much better than using the eigenfunctions of the empirical covariance
operator (17.25), which is the standard method implemented in the fda package.
The computational complexity of methods CM2 and CM3 is the same.
[Figure 17.6 box plots omitted; boxes labeled M1a, M1b, M2 and M3 for each sample size.]
Fig. 17.6 Errors in the estimation of the mean function for sample sizes: 32, 100, 218. The dashed
boxes are estimates using the CH variogram, empty are for the MT variogram. The right-most box
for each $N$ corresponds to the simple average. The bold line inside each box plot represents the
average value of $L$ (17.35). The upper and lower sides of rectangles show one standard deviation,
and horizontal lines show two standard deviations.
[Figure 17.7 box plots omitted; boxes labeled CM2 and CM3 for each sample size.]
Fig. 17.7 Errors in the estimation of the FPC's for sample sizes: 32, 100, 218. The bold line
inside each box plot represents the average value of $L$. The upper and lower sides of rectangles
show one standard deviation, and horizontal lines show two standard deviations.
Conclusions. For simulated data generated to resemble the ionosonde data, all
methods introduced in Sections 17.3 and 17.4 have integrated absolute deviations
(away from a true curve) statistically significantly smaller than the standard meth-
ods designed for iid curves. Methods M2 and CM2, based on weighted averages
17.6 Testing for correlation of two spatial fields
Motivated by the problem of testing for correlation between foF2 and magnetic
curves, described in detail in Section 17.9, we now propose a relevant statistical
significance test.
The data are observed at $N$ spatial locations: $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N$. At location $\mathbf{s}_k$, we
observe two curves:
$$X_k = X(\mathbf{s}_k) = X(\mathbf{s}_k; t), \quad t \in [0, 1],$$
and
$$Y_k = Y(\mathbf{s}_k) = Y(\mathbf{s}_k; t), \quad t \in [0, 1].$$
We assume that the sample $\{X_k\}$ is a realization of a random field $\{X(\mathbf{s}), \mathbf{s} \in S\}$,
and the sample $\{Y_k\}$ is a realization of a random field $\{Y(\mathbf{s}), \mathbf{s} \in S\}$. We want to test
the null hypothesis:
$H_0$: for each $\mathbf{s} \in S$, the random functions $X(\mathbf{s})$ and $Y(\mathbf{s})$ are independent
against the alternative that $H_0$ does not hold. The test statistic will detect departures
from $H_0$ that manifest themselves in correlation between the projections
$\langle x, X(\mathbf{s}) \rangle$ and $\langle y, Y(\mathbf{s}) \rangle$, for any $x, y \in L^2$. In fact, the lack of correlation
will be tested only for $x$ and $y$ from sufficiently large subspaces, those spanned by
the first $p$ FPC's of $X(\mathbf{s})$, and the first $q$ FPC's of $Y(\mathbf{s})$. The idea of the test thus
requires that the pairs $(X(\mathbf{s}), Y(\mathbf{s}))$ have the same distribution for every $\mathbf{s} \in S$. The
construction of the test assumes that both fields, $\{X(\mathbf{s}), \mathbf{s} \in S\}$ and $\{Y(\mathbf{s}), \mathbf{s} \in S\}$, are
strictly stationary, even though this assumption could be weakened to the stationarity
of some fourth order moments. Since we provide only a heuristic derivation of
the test, we are not concerned here with optimal assumptions. To lighten the notation,
assume that
$$EX_k(t) = 0 \quad \text{and} \quad EY_n(t) = 0.$$
The mean functions will be estimated and subtracted using one of the methods of
Section 17.3.
We now explain the idea of the test. We approximate the curves $X_n$ and $Y_n$ by
the expansions
$$X_n(t) \approx \sum_{i=1}^{p} \langle X_n, v_i \rangle v_i(t), \qquad Y_n(t) \approx \sum_{j=1}^{q} \langle Y_n, u_j \rangle u_j(t),$$
where the $v_i$ and the $u_j$ are the corresponding FPC's. At this point, the functions
$v_i$, $1 \le i \le p$, and $u_j$, $1 \le j \le q$, are deterministic, so the independence of the
curves $X_n$ and the curves $Y_n$ implies the independence of the vectors
$$[\langle X_n, v_1 \rangle, \langle X_n, v_2 \rangle, \ldots, \langle X_n, v_p \rangle]^T, \quad 1 \le n \le N,$$
and
$$[\langle Y_n, u_1 \rangle, \langle Y_n, u_2 \rangle, \ldots, \langle Y_n, u_q \rangle]^T, \quad 1 \le n \le N.$$
Then, under $H_0$, the expected value of the sample covariances
$$A_N(i, j) = \frac{1}{N}\sum_{n=1}^{N} \langle X_n, v_i \rangle \langle Y_n, u_j \rangle \tag{17.36}$$
is zero. If some of the estimated $A_N(i, j)$ are too large, we reject the null hypothesis.
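To construct a test statistic, we introduce the quantities
$$V_{k\ell}(i, i') = E[\langle v_i, X_k \rangle \langle v_{i'}, X_\ell \rangle], \qquad U_{k\ell}(j, j') = E[\langle u_j, Y_k \rangle \langle u_{j'}, Y_\ell \rangle].$$
Note that $V_{k\ell}(i, i') = 0$ and $U_{k\ell}(j, j') = 0$ for $k \ne \ell$, if the observations in each sample
are independent (and have mean zero). Thus, the $V_{k\ell}(i, i')$ and the $U_{k\ell}(j, j')$ with $k \ne \ell$ are
specific to dependent data; they do not occur in the testing procedure developed
in Chapter 9 for independent curves. Setting $X_{ik} = \langle v_i, X_k \rangle$, $Y_{jk} = \langle u_j, Y_k \rangle$,
observe that if the $X_{ik}$ are uncorrelated with the $Y_{jk}$, then
$$\begin{aligned}
E[\sqrt{N} A_N(i, j)\,\sqrt{N} A_N(i', j')] &= E\left[\frac{1}{N}\sum_{k=1}^{N}\sum_{\ell=1}^{N} X_{ik} Y_{jk} X_{i'\ell} Y_{j'\ell}\right] \\
&= \frac{1}{N}\sum_{k=1}^{N}\sum_{\ell=1}^{N} E[X_{ik} X_{i'\ell}]\,E[Y_{jk} Y_{j'\ell}]
= \frac{1}{N}\sum_{k=1}^{N}\sum_{\ell=1}^{N} V_{k\ell}(i, i') U_{k\ell}(j, j').
\end{aligned}$$
The covariance tensor of the $\sqrt{N} A_N(i, j)$ thus has the entries
$$\sigma_N(i, j; i', j') = \frac{1}{N}\sum_{k,\ell=1}^{N} V_{k\ell}(i, i') U_{k\ell}(j, j'). \tag{17.37}$$
With the FPC's replaced by their estimates, the sample covariances become
$$\hat A_N(i, j) = \frac{1}{N}\sum_{n=1}^{N} \langle X_n, \hat v_i \rangle \langle Y_n, \hat u_j \rangle.$$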
If the observations within each sample are independent, the test statistic introduced
in Chapter 9 is
$$N \sum_{i=1}^{p}\sum_{j=1}^{q} \hat\lambda_i^{-1}\hat\gamma_j^{-1}\hat A_N^2(i, j).$$
Since $\lambda_i = E[\langle v_i, X \rangle^2]$, the sum above is essentially the sum of all correlations, and
it usually tends to a chi-squared distribution with $pq$ degrees of freedom, as shown
in Chapter 9. This is however not necessarily true for dependent data. To explain, set
$\mathbf{a}_N = \mathrm{vec}(\mathbf{A}_N)$, i.e. $\mathbf{a}_N$ is a column vector of length $pq$ consisting of the columns
of $\mathbf{A}_N$ stacked on top of each other, starting with the first column. Then $\sqrt{N}\mathbf{a}_N$ is
approximated by a Gaussian vector $\mathbf{z}$ with covariance matrix $\Sigma$ constructed from
the entries (17.37). It follows that
$$\hat S_N = N \hat{\mathbf{a}}_N^T \hat\Sigma^{-1} \hat{\mathbf{a}}_N \approx \chi^2_{pq}, \tag{17.38}$$
where $\hat{\mathbf{a}}_N = \mathrm{vec}(\hat{\mathbf{A}}_N)$, see e.g. Theorem 2.9 of Seber and Lee (2003), which states
that for a zero mean normal vector $\mathbf{N}$ with covariance matrix $\Sigma$, the quadratic form
$\mathbf{N}^T \Sigma^{-1} \mathbf{N}$ has a chi-square distribution. The entries of the matrix $\hat\Sigma$ are
$$\hat\sigma_N(i, j; i', j') = \frac{1}{N}\sum_{k,\ell=1}^{N} \hat V_{k\ell}(i, i') \hat U_{k\ell}(j, j'), \tag{17.39}$$
where $\hat V_{k\ell}(i, i')$ and $\hat U_{k\ell}(j, j')$ are estimators of $V_{k\ell}(i, i')$ and $U_{k\ell}(j, j')$, respectively.
This estimation is discussed in Section 17.7. The test rejects $H_0$ if
$\hat S_N > \chi^2_{pq}(1 - \alpha)$, where $\chi^2_{pq}(1 - \alpha)$ is the $100(1 - \alpha)$th percentile of the chi-squared
distribution with $pq$ degrees of freedom. One can use Monte Carlo versions of the
above test, for example the test based on the approximation
$$\hat T_N := N \hat{\mathbf{a}}_N^T \hat{\mathbf{a}}_N \approx \mathbf{w}^T \hat\Sigma\, \mathbf{w}, \tag{17.40}$$
1. Subtract the mean functions, estimated by one of the methods of Section 17.3,
from both samples.
2. Estimate the FPC's by method CM2 or CM3.
3. Using a model for the covariance tensor (17.39), see Section 17.7, compute the
test statistic $\hat S_N$. (This tensor is not needed to compute $\hat T_N$, but it is needed to
find its Monte Carlo distribution.)
4. Find the P-value using either a Monte Carlo distribution or the $\chi^2$ approximation.
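Step 3 can be sketched in a few lines once the estimated scores and the arrays $\hat V_{k\ell}(i, i')$ and $\hat U_{k\ell}(j, j')$ are available. The sketch below only assembles (17.36), (17.39) and (17.38); how the arrays `V` and `U` are modeled is left to Section 17.7. The function names and the array layout are assumptions of this illustration, and the comparison with a chi-squared quantile is left to a statistics library.

```python
import numpy as np

def covariance_tensor(V, U):
    """Entries (17.39): sigma[i, j, i', j'] = (1/N) sum_{k,l} V[k,l,i,i'] U[k,l,j,j'].

    V has shape (N, N, p, p) and U has shape (N, N, q, q).
    """
    n = V.shape[0]
    return np.einsum('klab,klcd->acbd', V, U) / n

def correlation_statistic(x_scores, y_scores, V, U):
    """Statistic S_N of (17.38) and its degrees of freedom pq.

    x_scores[n, i] = <X_n, v_i> (shape (N, p)); y_scores[n, j] = <Y_n, u_j> (shape (N, q)).
    """
    n, p = x_scores.shape
    q = y_scores.shape[1]
    A = x_scores.T @ y_scores / n                      # A_N(i, j), see (17.36)
    a = A.reshape(-1, order='F')                       # vec: stack the columns
    sigma = covariance_tensor(V, U)                    # shape (p, q, p, q)
    Sigma = sigma.reshape(p * q, p * q, order='F')     # index (i, j) -> i + p * j
    S = n * a @ np.linalg.solve(Sigma, a)
    return S, p * q
```

When the observations are independent, so that `V[k, k]` and `U[k, k]` are diagonal matrices of eigenvalues and all other blocks vanish, the statistic reduces to the Chapter 9 form $N \sum_{i,j} \hat\lambda_i^{-1}\hat\gamma_j^{-1}\hat A_N^2(i, j)$.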
We now turn to the important issue of modeling and estimation of the matrix ˙ .
The estimation of the $V_{k\ell}(i, i')$ involves only the $X_n$, and the estimation of the
$U_{k\ell}(j, j')$ only the $Y_n$, so we describe only the procedure for the $V_{k\ell}(i, i')$. We
assume that the mean has been estimated and subtracted, so we define
and an extension of the multivariate intrinsic model, see e.g. Chapter 22 of Wackernagel
(2003). A most direct extension is to assume that
$$C_{\mathbf{h}} = C\,r(\mathbf{h}), \tag{17.41}$$
Under (17.42) (equivalently, under (17.41)), each scalar field $\langle X(\mathbf{s}), v_i \rangle$ has the
same correlation function; only their variances are different. Under (17.43), the
fields $\langle X(\mathbf{s}), v_i \rangle$ can have different correlation functions. As will be seen below,
model (17.43) also leads to a valid covariance matrix. The correlations $r_i(d(\mathbf{s}_k, \mathbf{s}_\ell))$
and the variances $\lambda_i$ can be estimated using a parametric model for the scalar
field $\xi_i(\mathbf{s}) = \langle X(\mathbf{s}), v_i \rangle$. The resulting estimates $\hat r_i(d(\mathbf{s}_k, \mathbf{s}_\ell))$ and $\hat\lambda_i$ lead to the
estimates $\hat V_{k\ell}(i, j)$ via (17.43). Analogous estimates of the functional field $Y$ are
$\hat\rho_j(d(\mathbf{s}_k, \mathbf{s}_\ell))$, $\hat\gamma_j$ and $\hat U_{k\ell}(i, j)$. For ease of reference, we note that under model
(17.43) and $H_0$, the covariance tensor,
$$\left[\frac{1}{N}\sum_{k=1}^{N}\sum_{\ell=1}^{N} \hat V_{k\ell}(i, i') \hat U_{k\ell}(j, j'),\quad 1 \le i, i' \le p,\ 1 \le j, j' \le q\right],$$
where
$$\hat\Sigma_i(k, \ell) = \frac{1}{\sqrt{N}} \hat\lambda_i \hat r_i(d(\mathbf{s}_k, \mathbf{s}_\ell))$$
and
$$\hat\Sigma_j(k, \ell) = \frac{1}{\sqrt{N}} \hat\gamma_j \hat\rho_j(d(\mathbf{s}_k, \mathbf{s}_\ell)).$$
This form is used to construct the Monte Carlo tests discussed in Section 17.8.
The matrix $\hat\Sigma$ with the estimates just specified is positive definite, as the following
verification shows. (The matrix $\Sigma$ is also positive definite by the same argument.)
To verify that the matrix $\hat\Sigma$ is positive definite, we must show that
$$\sum_{i,j}\sum_{i',j'} \hat\sigma(i,j;i',j')\,b_{ij}\,b_{i'j'} \ge 0, \tag{17.45}$$
Thus, (17.45) will follow once we have shown that for any $i, j$,
$$\sum_{k,\ell} \hat\sigma_i\hat r_i(d(s_k,s_\ell))\,\hat\tau_j\hat\rho_j(d(s_k,s_\ell)) \ge 0.$$
Since $\hat\sigma_i\hat r_i$ is a covariance function, there are mean zero random variables
$$\varepsilon_1(i), \varepsilon_2(i), \ldots, \varepsilon_N(i) \tag{17.46}$$
such that $\hat\sigma_i\hat r_i(d(s_k,s_\ell)) = E[\varepsilon_k(i)\varepsilon_\ell(i)]$. Similarly, there are random variables
$$\delta_1(j), \delta_2(j), \ldots, \delta_N(j) \tag{17.47}$$
such that $\hat\tau_j\hat\rho_j(d(s_k,s_\ell)) = E[\delta_k(j)\delta_\ell(j)]$. The families (17.46) and (17.47) can
be assumed independent. Using the above construction, we obtain
$$\sum_{k,\ell} \hat\sigma_i\hat r_i(d(s_k,s_\ell))\,\hat\tau_j\hat\rho_j(d(s_k,s_\ell)) = \sum_{k,\ell} E[\varepsilon_k(i)\varepsilon_\ell(i)]\,E[\delta_k(j)\delta_\ell(j)]$$
$$= \sum_{k,\ell} E\big[\{\varepsilon_k(i)\delta_k(j)\}\{\varepsilon_\ell(i)\delta_\ell(j)\}\big] = E\left[\sum_{k=1}^{N}\varepsilon_k(i)\delta_k(j)\right]^2 \ge 0.$$
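The argument above can be checked numerically. The following sketch is only an illustration under simplifying assumptions: the two covariance matrices are built as empirical second-moment matrices of simulated mean zero vectors (hypothetical stand-ins for the two independent families of random variables), and the sum of their entrywise products is compared with the corresponding average of squares.

```python
import random


def second_moment_matrix(samples):
    """(1/M) * sum_m x_m x_m^T for a list of mean zero sample vectors."""
    n = len(samples[0])
    m = len(samples)
    return [[sum(x[k] * x[l] for x in samples) / m for l in range(n)]
            for k in range(n)]


def entrywise_product_sum(A, B):
    """sum over (k, l) of A[k][l] * B[k][l]."""
    n = len(A)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n))


rng = random.Random(1)
N = 5
eps = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(40)]
delta = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(30)]

A = second_moment_matrix(eps)
B = second_moment_matrix(delta)
total = entrywise_product_sum(A, B)

# The same number written as an average of squared sums, hence nonnegative.
as_squares = sum(
    sum(e[k] * d[k] for k in range(N)) ** 2 for e in eps for d in delta
) / (len(eps) * len(delta))
```

The two quantities agree up to rounding, which is the finite-sample analog of the last display.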
As in Section 17.5, our objective is to evaluate the finite sample performance of the
test introduced in Section 17.6 in a realistic setting geared toward the application
presented in Section 17.9.
Data generating processes. We generate samples of zero mean Gaussian processes
$$X(s;t) = \sum_{i=1}^{p} \xi_i(s)v_i(t); \qquad Y(s;t) = \sum_{j=1}^{q} \eta_j(s)u_j(t). \tag{17.48}$$
Then we merged all $x_{1i}$ into vector $y_1 = [x_{11}, \ldots, x_{1N}]^T$ and all $x_{2i}$ into vector
$y_2 = [x_{21}, \ldots, x_{2N}]^T$. Performing the Cholesky rotation, we obtain correlated
spatial vectors:
[Figure: two panels, N = 32 and N = 100; horizontal axis: p = 1, ..., 7; vertical axis: empirical size; the nominal size 0.05 is marked.]
Fig. 17.8 Size of the correlation test as a function of $p$. Solid disks represent method S (based on
the $\chi^2$ distribution). Circles represent method SM (based on the Monte Carlo distribution).
Conclusions. As Figure 17.8 shows, the empirical size is higher than the nominal
size, and it tends to increase with the number $p$ of principal components used to
construct the test, especially for $N = 32$. The usual recommendation is to use $p$
which explains about 85% of the variance. For the foF2 data with $N = 32$, this corresponds
to $p = 4$. Tests of independence typically have larger than nominal size
because real or simulated data may have some spurious dependencies; to put it simply,
one cannot get “more independent data”. Applied to real data in Section 17.9,
all tests (S, SM and T) lead to extremely strong rejections, so the inflated empirical
size is not a problem. Figure 17.8 also shows that the Monte Carlo approximation
is useful for $N = 32$, which is the sample size we must use in Section 17.9. The size
of test T is practically indistinguishable from that of test SM. Figure 17.9 shows the
power of method SM; power curves for method T are practically the same, and method
S has higher power. The simulation study shows that a strong rejection when the test
is applied to real data can be viewed as reliable evidence of dependence.
[Figure: two panels, N = 32 and N = 100; horizontal axis: ρ (0.2 to 0.8); vertical axis: power (0.0 to 1.0).]
Fig. 17.9 Power of the correlation test SM as a function of the population correlation $\rho$. Each line
represents one of the four possible correlated pairs of spatial fields $(\xi_i, \eta)$, $i = 1, 2, 3, 4$. The test
was performed using $p = 4$, which explains about 85% of variance of the foF2 curves. Since all
curves in the graphs are practically the same, we do not specify which curve represents a particular
dependent pair $(\xi_i, \eta)$.
In this section, we apply the correlation test to foF2 and magnetic curves.
Description of the data. The F2 layer of the ionosphere is the upper part of the
F layer shown in Figure 17.3. The F2 layer electron critical frequency, foF2, is
measured using an instrument called the ionosonde, a type of radar. The foF2 fre-
quency is used to estimate the location of the peak electron density, so an foF2
trend corresponds to a trend in the average height of the ionosphere over a spatial
location. The foF2 data have therefore been used to test the hypothesis of ionospheric
global cooling discussed in Section 17.1. Hourly values of foF2 are available
from the SPIDR database http://spidr.ngdc.noaa.gov/spidr/ for
more than 200 ionosondes. We use monthly averages for 32 selected ionosondes,
with sufficiently complete records, for the period 1964–1992. Their locations are
shown in Figure 17.5. Three typical foF2 curves are shown in Figure 17.2. We omit
the details of the procedure for obtaining curves like those shown in Figure 17.2,
but we emphasize that it requires a great deal of work. In particular, the SPIDR data
suffer from two problems. First, for some data, the amplitude is artificially mag-
nified ten times, and needs to be converted into standard units (MHz). Second, in
many cases, missing observations are not replaced by the standard notation 9999,
but rather just skipped. Thus if one wants to use equally-spaced time series, skipped
data must be found and replaced by missing values. For filling in missing values, we
perform linear interpolation. We developed a customized C++ code to handle these
17.9 Application to critical ionospheric frequency and magnetic curves 369
issues. We emphasize that one of the reasons why this global data set has not been
analyzed prior to the work of Gromenko et al. (2011) is that useable data had been
derived only over relatively small regions, like Western Europe, see e.g. Bremer
(1998), and more often only for a single location, see e.g. Lastovicka et al. (2006).
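The handling of skipped observations described above can be sketched as follows. This is a hypothetical Python illustration, not the customized C++ code mentioned in the text: the function names, the record layout (a mapping from (year, month) to a value) and the toy numbers are assumptions. Skipped months and values equal to the missing code 9999 are first turned into explicit missing values, and interior gaps are then filled by linear interpolation.

```python
def fill_skipped_months(records, first, last, missing_code=9999):
    """One value per month from first to last (inclusive); None where no
    usable observation exists. records maps (year, month) -> value."""
    series = []
    year, month = first
    while (year, month) <= last:
        value = records.get((year, month))
        usable = value is not None and value != missing_code
        series.append(float(value) if usable else None)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series


def interpolate_linear(series):
    """Fill interior runs of None by linear interpolation between neighbours.
    Leading and trailing None values are left untouched."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Toy record: February carries the missing code, March was skipped entirely.
records = {(1964, 1): 6.0, (1964, 2): 9999, (1964, 4): 9.0}
monthly = fill_skipped_months(records, (1964, 1), (1964, 4))
completed = interpolate_linear(monthly)
```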
We use the foF2 data to test a hypothesis on long term ionospheric trends
extending over several decades. We thus removed annual and higher frequency
variations using 16 month averaging with the MODWT filter, see Chapter 5 of Percival
and Walden (2000). This leads to 32 time series at different locations, each
containing 336 equally–spaced temporal observations. The amplitude of the foF2
curves exhibits a nonlinear latitudinal trend; it decreases as the latitude increases, see
Figure 17.2. To remove this trend, which may potentially bias the test, we assume
that the foF2 signal, $F(s;t)$, at location $s$ follows the model
where $X(s;t)$ is a constant amplitude field, and $G(\cdot)$ is a scaling function which
depends only on the magnetic latitude $L$ (in radians). Since the trend in the amplitude
of $F(s;t)$ is caused by the solar radiation which is nonlinearly proportional to
the zenith angle, we postulate that the function $G(\cdot)$ has the form
In the above formula, $V_{nx}$, $V_{ny}$ and $V_{nz}$ are, respectively, meridional (parallel to
constant longitude lines), zonal (parallel to constant latitude lines) and vertical components
of the thermospheric neutral wind; $I$ and $D$ are inclination and declination
of the earth magnetic field, see Figure 13.2 in Kivelson and Russell (1997). Usually
$V_{nz} \ll V_{nx}, V_{ny}$; and assuming that the difference between magnetic and geographic
coordinates, $D$, is small (at least for low- and mid-latitude regions) we can
simplify the above formula to $W = V_{nx} \sin I \cos I$. Thus, only the meridional thermospheric
wind is significant. Measuring neutral wind components ($V_{nx}$, $V_{ny}$, $V_{nz}$)
[Figure: scaling function (vertical axis, 0.6 to 1.0) plotted against latitude in radians (horizontal axis).]
Fig. 17.10 Dots represent the scaling function $G_L(s_i)$ in the magnetic coordinate system and
crosses are the same in the geographic coordinate system. The line is the best fit for $G_L$ in the magnetic
coordinate system.
is difficult, and long term wind records are not available. We therefore replace $V_{nx}$
by its average. For our test, which uses correlations, the specific value of this average
plays no role, so we define the magnetic curves as
The curves $I(s;t)$ are computed using the international geomagnetic reference field
(IGRF); the software is available at http://www.ngdc.noaa.gov/IAGA/vmod/.
The test is applied to the curves $X(s_k;t)$ defined by (17.49) and (17.50), and to
the curves $Y(s_k;t)$ defined by (17.51).
Application of the correlation test. We first estimate and subtract the mean functions
of the fields $X(s_k)$ and $Y(s_k)$ using method M2 (the other spatial methods give
practically the same estimates). The principal components $v_i$ and $u_i$ are estimated
using method CM2 (method CM3 gives practically the same curves).
We apply the test for all $1 \le p \le 7$ and $q = 1$. The first seven eigenvalues
of the field $X$ (computed per (17.22) or its analog for method CM2) explain
about 95% of the variance. The first eigenvalue of the field $Y$ explains about 99%
of the variance. The eigenfunction $u_1$ is approximately equal to the linear function:
$u_1(t) \propto t$. This means that at any location, after removing the average, the
magnetic field either linearly increases or decreases, with slopes depending on the
location, see Figure 17.12. To lighten the notation, we drop the “hats” from the
estimated scores and denote the zero mean vector $[\xi_i(s_1), \ldots, \xi_i(s_N)]^T$ by $\xi_i$, and
[Figure: eight panels of fitted empirical variograms, one for each of the spatial fields η, ξ1, ξ2, ξ3, ξ4, ξ5, ξ6, ξ7.]
Fig. 17.11 Fitted empirical variograms for different spatial fields. The horizontal axes show
normalized lag distance. The dots correspond to estimated variograms.
$[\eta_1(s_1), \ldots, \eta_1(s_N)]^T$ by $\eta$. The covariances $\Sigma_{\xi_i}$ and $\Sigma_{\eta}$ are estimated using
parametric spatial models determined by the inspection of the empirical variograms. In
this application, it is sufficient to use only two covariance models, the Gaussian
and the exponential models defined in (17.33). When the scores do not have a spatial
structure, we use the sample variance (flat variogram). The fitted variograms
are shown in Figure 17.11. The estimated models and their parameters are listed in
Table 17.1.
The P–values for different numbers of FPC’s, $1 \le p \le 7$, are summarized in Table
17.2. Independent of $p$ and a specific implementation of the test, all P–values are
very small, and so the rejection of the null hypothesis is conclusive; we conclude
that there is a statistically significant correlation between the foF2 curves $X(s_k)$ and
the magnetic curves $Y(s_k)$. We also applied the test of Chapter 9, which neglects any
spatial dependence. The P–values for that test hover around the 5% level, but still
Table 17.1 Models and estimated covariance parameters for the transformed foF2 curves and the
magnetic curves.

Spatial field   Model         Parameters
                              $c_0$          $\sigma^2$       $\rho$
$\eta$          Gaussian      –              5.99 ± 0.48      0.32 ± 0.04
$\xi_1$         Gaussian      –              20.05 ± 2.20     0.12 ± 0.03
$\xi_2$         –             –              3.30 ± 0.43      –
$\xi_3$         Exponential   –              2.63 ± 0.52      0.16 ± 0.07
$\xi_4$         Gaussian      –              2.66 ± 0.39      0.18 ± 0.05
$\xi_5$         –             –              2.74 ± 0.32      –
$\xi_6$         Gaussian      0.16 ± 0.02    0.85 ± 0.24      0.17 ± 0.06
$\xi_7$         –             –              1.22 ± 0.18      –
Table 17.2 P–values of the correlation tests applied to the transformed foF2 data. The first column
shows the number of FPC’s, the second column shows cumulative variances computed as the ratios
of the eigenvalues estimated using method CM2. Testing procedures S, SM and T are defined in
Section 17.8. The “simple” procedure neglects the spatial dependence of the curves.

p   CV, %   Spatial                                                              Simple
            S                       SM                     T
1   47.88   $6.22\cdot10^{-5}$      $3.05\cdot10^{-4}$     $3.05\cdot10^{-4}$    0.035
2   62.59   $3.26\cdot10^{-6}$      $2.91\cdot10^{-4}$     $2.99\cdot10^{-4}$    0.095
3   73.67   $4.53\cdot10^{-8}$      $2.43\cdot10^{-4}$     $2.32\cdot10^{-4}$    0.043
4   84.40   $1.47\cdot10^{-26}$     $1.6\cdot10^{-7}$      $2.24\cdot10^{-5}$    0.039
5   88.70   $4.95\cdot10^{-26}$     $2.6\cdot10^{-7}$      $2.27\cdot10^{-5}$    0.046
6   92.21   $6.73\cdot10^{-27}$     $5.9\cdot10^{-7}$      $2.21\cdot10^{-5}$    0.060
7   94.57   $2.12\cdot10^{-32}$     $1.6\cdot10^{-7}$      $1.92\cdot10^{-5}$    0.030
point toward rejection. The evidence is, however, much less clear cut. This may partially
explain why this issue has been a matter of much debate in the space physics
community. The correlation between the foF2 and magnetic curves is far from obvious.
Figure 17.12 shows these pairs at all 32 locations. It is hard to conclude by eye
that the direction of the magnetic field change impacts the foF2 curves.
Discussion. A very important role in our analysis is played by the transformation
(17.50). Applying the test to the original foF2 curves $F(s_k;t)$ gives the P–values
0.209 ($p = 1$) and 0.011 ($p = 2$) for the spatial S test, and 0.707 ($p = 1$), 0.185
($p = 2$), 0.139 ($p = 3$) for the “simple” test. These values of $p$ explain over 90%
of the variance. As explained above, the amplitude of the field $F(s_k;t)$ evolves with
the latitude. This invalidates the assumption of a mean function which is independent
of the spatial location. Thus even for the spatial test, the mean function confounds
the first FPC. However, the spatial estimation of the mean function and of the
FPC’s “quickly corrects” for the violation of assumptions, and the null hypothesis
is rejected for $p \ge 2$. When the spatial structure is neglected (and no latitudinal
transformation is applied) no correlation between the foF2 curves and magnetic curves
is found.
[Figure: 32 panels, one per location, each showing a pair of curves over the years 1970 to 1990.]
Fig. 17.12 Transformed and centered foF2 curves (continuous) and centered magnetic curves
(dashed) at 32 locations denoted with circles in Figure 17.5. The scales for the two families of
curves are different: the foF2 curves have the same scale, the scale of the magnetic curves changes
The rejection of the null hypothesis means that after adjusting the foF2 curves
for the latitude and the global mean, their regional variability is correlated with the
regional changes in the magnetic field. This means that long term magnetic
trends must be considered as additional covariates in testing for long term trends in
the foF2 curves. (The main covariate is the solar activity which drives the shape of
the mean function.)
A broader conclusion of the work presented in this chapter is that methods of
functional data analysis must be applied with care to curves obtained at spatial
locations. Neglecting the spatial dependence can lead to incorrect conclusions and
biased estimates. The same applies to space physics research. If trends or models are
estimated separately at each spatial location, one should not rely on results obtained
by some form of a simple averaging. Interestingly, the results related to global iono-
spheric trends are often on the borderline of statistical significance if the spatial
dependence structure is neglected. Standard t-tests lead either to rejection or accep-
tance, depending on a specific method used (a similar phenomenon is observed in
the last column of Table 17.2). It is hoped that the methodology presented in this
chapter will be useful in addressing such issues.
Chapter 18
Consistency of the simple mean and the
empirical functional principal components for
spatially distributed curves
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 375
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3_18,
© Springer Science+Business Media New York 2012
376 18 Consistent estimation for spatially distributed curves
distributed points. At least three types of point distributions have been considered,
see Cressie (1993): When the region $R_N$ where the points $\{s_{i,N};\ 1 \le i \le N\}$ are
sampled remains bounded, then we are in the so-called infill domain sampling case.
Classical asymptotic results, like the law of large numbers or the central limit theorem
will usually fail, see Lahiri (1996). The other extreme situation is described
by the increasing domain sampling. Here a minimum separation between the sampling
points $\{s_{i,N}\} \in R_N$ for all $i$ and $N$ is required. This is of course only possible
if $\mathrm{diam}(R_N) \to \infty$. We shall also explore the nearly infill situation studied by
Lahiri (2003) and Park et al. (2009). In this case, the domain of the sampling region
becomes unbounded ($\mathrm{diam}(R_N) \to \infty$), but at the same time the number of sites
in any given subregion tends to infinity, i.e. the points become more dense. These
issues are also studied by Zhang (2004), Loh (2005), Lahiri and Zhu (2006), Du
et al. (2009). We formalize these concepts in Sections 18.2 and 18.3. Finally, the
interplay of the geostatistical spatial structure and the functional temporal structure
must be cast into a workable framework.
For the reasons explained above, the framework advocated in Chapter 16,
designed for functional time series, is inappropriate for functional spatial fields.
The starting point for the theory of Chapter 16 is the representation $X_k =
f(\varepsilon_k, \varepsilon_{k-1}, \ldots)$ of a function $X_k$ in terms of iid error functions $\varepsilon_k$. While all time
series models used in practice admit such a representation, no analog representations
exist for geostatistical spatial data. (Even though not widely used, spatial autoregres-
sive processes have been proposed, but no Volterra (nonlinear moving average) type
expansions have been developed for them.)
This chapter is organized as follows. Section 18.1 describes in greater detail the
objectives of this research by developing several examples which show how spa-
tially distributed functional data differ from functional random samples and from
functional time series. In simple settings, it illustrates what kind of consistency or
inconsistency results can be expected, and what kind of difficulties must be over-
come. After the stage has been set, we formulate the asymptotic assumptions in
Section 18.2. A crucial part of these assumptions consists of conditions on the spatial
distribution of the points $s_k$. Section 18.3 compares our conditions to those typically
assumed for scalar spatial processes. In Sections 18.4 and 18.5, we establish
consistency results, respectively, for the functional mean and the covariance oper-
ator. These sections also contain examples specializing the general results to more
specific settings. Section 18.6 explains, by means of general theorems and examples,
when the sample principal components are not consistent. The proofs are collected
in Section 18.7.
We have shown in this book that the FPC’s play a fundamental role in functional
data analysis, much greater than the usual multivariate principal components. This
is mostly due to the fact that the Karhunen-Loève expansion allows to represent
18.1 Motivating examples 377
In Section 16.2, we showed that (18.1) continues to hold for weakly dependent
time series, in particular for $m$–dependent $X_k$. Our first example shows why
$m$–dependence does not necessarily imply (18.1) for spatially distributed curves.
and denote by $|B_N(m)|$ the count of pairs in $B_N(m)$. A brief calculation which uses
the identity
$$N E\|\hat C_N - C\|_{\mathcal S}^2 = N \iint \mathrm{Var}\left\{ N^{-1}\sum_{k=1}^{N} \big( X_k(t)X_k(s) - E[X_k(t)X_k(s)] \big) \right\} dt\, ds \tag{18.2}$$
If the $s_k$ are the points in $\mathbb{R}^d$ with integer coordinates, then $|B_N(m)|$ is asymptotically
proportional to $mN$, implying $\limsup_{N\to\infty} N^{-1}|B_N(m)| < \infty$, and the
standard rate (18.1). But if there are too many pairs in $B_N(m)$ this rate will no longer
hold.
Example 18.1 shows that if the points $s_k$ are not equispaced and too densely
distributed, then the standard rate (18.1) need not hold. The next example shows
that in such cases the EFPC’s $\hat v_k$ may not converge at all.
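The role of the pair count can be illustrated numerically. The sketch below assumes, as a working definition that is not taken from the text, that $B_N(m)$ consists of the ordered pairs of distinct points on the line whose distance is at most $m$; the function names and the two toy designs are hypothetical. For integer locations the ratio $N^{-1}|B_N(m)|$ stays bounded, while for points squeezed into a fixed interval it grows with $N$.

```python
def count_close_pairs(points, m):
    """Ordered pairs (k, l), k != l, with |s_k - s_l| <= m (points on the line)."""
    pts = sorted(points)
    count = 0
    for i, s in enumerate(pts):
        j = i + 1
        while j < len(pts) and pts[j] - s <= m:
            j += 1
        count += 2 * (j - i - 1)   # each unordered pair counted twice
    return count


def pair_ratio(points, m):
    """N^{-1} |B_N(m)| under the working definition above."""
    return count_close_pairs(points, m) / len(points)


m = 2
for N in (100, 200, 400):
    grid = list(range(N))                        # integer locations
    infill = [10.0 * k / N for k in range(N)]    # N points inside [0, 10)
    print(N, pair_ratio(grid, m), pair_ratio(infill, m))
```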
Example 18.2. This example presents only an intuitive idea. A more precise argu-
ment, with a numerical example, is developed in Example 18.6.
where $\{e_j, j \ge 1\}$ is a complete orthonormal system and the $\xi_j(s)$ are mean zero
random variables with $E[\xi_j(s)\xi_j(s+\mathbf{h})] = \lambda_j\rho_j(h)$, $h = \|\mathbf{h}\|$, where $\sum_{j=1}^{\infty}\lambda_j < \infty$
and each $\rho_j(\cdot)$ is a positive correlation function. Direct verification shows that
$C(x) = \sum_{j=1}^{\infty} \lambda_j \langle e_j, x\rangle e_j$, so the $\lambda_j$ are the eigenvalues of $C$, and the $e_j$ the
corresponding eigenfunctions.
Now consider a sequence $s_n \to 0$. Because of the positive dependence, $X(s_n)$
is close to $X(0)$, so $\hat C_N$, as an arithmetic average, is close to the random operator
$X^\star = \langle X(0), \cdot\rangle X(0)$. Observe that $X^\star(X(0)) = \|X(0)\|^2 X(0)$. Thus
$\|X(0)\|^2 = \sum_{j=1}^{\infty} \xi_j^2(0)$ is an eigenvalue of $X^\star$. Since it is random, it cannot be close to any of
the $\lambda_j$. The eigenfunctions of $\hat C_N$ are also close to random functions in $L^2$, and do
not converge to the FPC’s $e_j$.
The above examples show that if the points $s_n$ are too close to each other, then the
empirical functional principal components are not consistent estimates of the popu-
lation principal components. Other examples of the lack of consistency are known,
see Johnstone and Lu (2009) and references therein. They fall into the “small n
large p” framework, and the lack of consistency is due to noisy data which are
not sparsely represented. A solution is to perform the principal component analysis
on transformed data which admits a sparse representation. (A different asymptotic
approach is taken by Jung and Marron (2009).) The spatial functional data intro-
duced in Chapter 17 admit natural sparse representations; the lack of consistency
may be due to dependence and densely distributed locations of the observations. It
is actually not crucial that the $s_n$ be close to each other. What matters is the interplay
of the spatial distances between these points and the strength of dependence
between the curves. To illustrate, suppose in Example 18.2, the covariance between
$X(s_n)$ and $X(0)$ is
$$E[\langle X(s_n), x\rangle \langle X(0), y\rangle] = \sum_{j=1}^{\infty} \lambda_j \exp\left\{-\frac{\|s_n\|}{\rho_j}\right\} \langle e_j, x\rangle \langle e_j, y\rangle.$$
In a finite sample, small $\|s_n\|$ have the same effect as large $\rho_j$, i.e. as stronger dependence.
These considerations show that it is useful to have general criteria for functional
spatial data, which combine the spatial distribution of the points and the strength
of dependence, and which ensure that the functional principal components can be
consistently estimated, and, consequently, that further statistical inference for spatial
functional data can be carried out. Such criteria should hold for practically useful
models for functional spatial data. The next example discusses such models, with a
rigorous formulation presented in Section 18.2.
where the $\xi_j(s)$ are zero mean random variables. In principle, all properties of $X$,
including the spatial dependence structure, can be equivalently stated as properties
of the family of the scalar fields $\xi_j$. Representation (18.4) is thus the most natural
and convenient model for spatially distributed functional data.
Assume that $\mu = 0$ and the field $X$ is strictly stationary (in space). (Strict
stationarity can be replaced by weaker moment conditions formulated in Section 18.2.)
Suppose we want to predict $X(s_0)$ using a linear combination of the
curves $X(s_1), X(s_2), \ldots, X(s_N)$, i.e. we want to minimize
$$E\left\| X(s_0) - \sum_{n=1}^{N} a_n X(s_n) \right\|^2 = E\langle X(s_0), X(s_0)\rangle - 2\sum_{n=1}^{N} a_n E\langle X(s_n), X(s_0)\rangle + \sum_{k,\ell=1}^{N} a_k a_\ell E\langle X(s_k), X(s_\ell)\rangle. \tag{18.5}$$
Thus for the problem of the least squares linear prediction of a mean zero spatial
process (kriging), we need to know only
$$K(s, s') = E\langle X(s), X(s')\rangle. \tag{18.6}$$
$$= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} E[\xi_j(s)\xi_i(s')] \langle e_j, e_i\rangle = \sum_{j=1}^{\infty} E[\xi_j(s)\xi_j(s')].$$
Thus, the functional covariances (18.6) are fully determined by the covariances
Notice that we do not need to know the cross covariances $E[\xi_j(s)\xi_i(s')]$ for
$i \ne j$. Thus, if we are interested in kriging, we can assume that the spatial processes
$\xi_j(\cdot)$ in (18.18) are independent. Such an assumption simplifies the verification of
some fourth order properties discussed in the following sections. This observation
remains true if the spatial field does not have zero mean, i.e. if we observe realizations
of $Z(s) = \mu(s) + X(s)$. A brief calculation shows that for kriging, it is enough
to know $\mu(\cdot)$ and the covariances (18.7). Stein (1999) and Cressie (1993) provide
rigorous accounts of kriging for scalar spatial data.
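The least squares prediction problem reduces to a linear system: setting the derivatives of the quadratic criterion (18.5) with respect to the weights to zero gives $\sum_{\ell} a_\ell K(s_k, s_\ell) = K(s_k, s_0)$ for every $k$. The sketch below solves this system for points on the line. The covariance `K`, which sums exponentially decaying terms with made-up variances and ranges, and all function names are assumptions chosen for illustration only.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def K(s, t, variances=(1.0, 0.5, 0.25), ranges=(2.0, 1.0, 0.5)):
    """Hypothetical functional covariance: sum_j var_j * exp(-|s - t| / range_j)."""
    return sum(v * math.exp(-abs(s - t) / r) for v, r in zip(variances, ranges))


def kriging_weights(sites, s0):
    """Weights a_1, ..., a_N minimizing the mean squared prediction error."""
    gram = [[K(sk, sl) for sl in sites] for sk in sites]
    rhs = [K(sk, s0) for sk in sites]
    return solve(gram, rhs)


weights = kriging_weights([0.0, 1.0, 2.0], 1.3)
```

When the prediction location coincides with one of the sites, the weights reduce to an indicator of that site, which is a convenient sanity check.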
Our next example shows how representation (18.4) and the independence of the
$\xi_j$ allow us to derive the standard rate (18.1), if the points $s_k$ are equispaced on the line
and the covariances decay exponentially. In the following sections, we construct a
theory that allows us to obtain the standard and nonstandard rates of consistency in
much more general settings. We will use the following well–known lemma, which
follows from a direct verification using the bivariate normal density.
Lemma 18.1. Suppose $X$ and $Y$ are jointly normal mean zero random variables
such that $EX^2 = \sigma^2$, $EY^2 = \nu^2$, $E[XY] = \sigma\nu\rho$. Then
$$\mathrm{Cov}(X^2, Y^2) = 2\sigma^2\nu^2\rho^2.$$
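A quick simulation can be used to check the identity of Lemma 18.1, read here as $\mathrm{Cov}(X^2, Y^2) = 2(E[XY])^2$ for jointly normal mean zero variables. The following is a minimal sketch; the function name, the parameter values, the sample size and the seed are arbitrary choices made for illustration.

```python
import math
import random


def simulate_cov_of_squares(sigma, nu, rho, n=200_000, seed=0):
    """Sample covariance of X^2 and Y^2 for mean zero jointly normal (X, Y)
    with Var X = sigma^2, Var Y = nu^2 and correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = sigma * z1
        y = nu * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        xs.append(x * x)
        ys.append(y * y)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n


sigma, nu, rho = 1.0, 2.0, 0.5
estimate = simulate_cov_of_squares(sigma, nu, rho)
theory = 2 * sigma**2 * nu**2 * rho**2
```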
Without any further assumptions, a sufficient condition for the EFPC’s to be consistent
with the rate $N^{-1/2}$ is that the right–hand side of (18.8) is bounded from above
by a constant. Under additional assumptions, more precise sufficient conditions are
possible.
Suppose first that representation (18.18) holds with independent strictly stationary
scalar fields $\xi_j(\cdot)$. Define the covariances
$$E[\xi_j(s_k)\xi_j(s_\ell)] = \phi_j(s_k - s_\ell), \qquad \mathrm{Cov}(\xi_j^2(s_k), \xi_j^2(s_\ell)) = \psi_j(s_k - s_\ell).$$
and
$$\limsup_{N\to\infty} N^{-1} \sum_{k,\ell=1}^{N} \sum_{j=1}^{\infty} \big|\psi_j(s_k - s_\ell)\big| < \infty. \tag{18.10}$$
so that
$$\mathrm{Cov}(\xi_j^2(s_k), \xi_j^2(s_\ell)) = 2\sigma_j^4 \exp\big\{-2\rho_j^{-1} d(s_k, s_\ell)\big\}. \tag{18.12}$$
Suppose the points $s_k$ are equispaced on the line. Denoting the smallest distance
between the points by $d$, we see that
$$N^{-1}\sum_{k,\ell=1}^{N}\left\{\sum_{j=1}^{\infty}\phi_j(s_k - s_\ell)\right\}^2 = \left\{\sum_{j=1}^{\infty}\sigma_j^2\right\}^2 + 2N^{-1}\sum_{m=1}^{N-1}(N-m)\left\{\sum_{j=1}^{\infty}\sigma_j^2\exp\big(-\rho_j^{-1} m d\big)\right\}^2.$$
If we assume that
$$\sum_{j\ge 1}\sigma_j^2 < \infty \quad\text{and}\quad \sup_{j\ge 1}\rho_j < \rho < \infty, \tag{18.13}$$
then Conditions (18.9) and (18.10), and so the standard rate (18.1), hold. Condition
(18.13) means that the correlation functions of all processes $\xi_j(\cdot)$ must decay
uniformly sufficiently fast.
To verify (18.9), observe that
$$N^{-1}\sum_{m=1}^{N-1}(N-m)\left\{\sum_{j=1}^{\infty}\sigma_j^2\exp\big(-\rho_j^{-1} m d\big)\right\}^2 \le \sum_{m=1}^{N-1}\left\{\sum_{j=1}^{\infty}\sigma_j^2\exp\big(-\rho_j^{-1} m d\big)\right\}^2$$
$$\le \sum_{m=1}^{N-1}\left\{\sum_{j=1}^{\infty}\sigma_j^2\exp\big(-\rho^{-1} m d\big)\right\}^2 = O(1)\left(\sum_{j=1}^{\infty}\sigma_j^2\right)^2 = O(1).$$
The verification of (18.10) is analogous because (18.13) implies $\sum_{j=1}^{\infty}\sigma_j^4 < \infty$.
We will see that Condition (18.13) (formulated analogously for several classes of
models) is applicable in much more general settings than equispaced points on the
line.
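The boundedness claimed above can be examined numerically for a truncated model. In the sketch below the number of components, the variances, the ranges and the spacing are assumptions chosen for illustration. The code evaluates the normalized double sum over equispaced points directly, evaluates it again through the sum over lags, and compares the result with the geometric-series bound implied by the uniform bound on the ranges.

```python
import math


def direct_sum(N, variances, ranges, d):
    """N^{-1} * sum_{k,l} { sum_j var_j * exp(-|k - l| d / range_j) }^2."""
    total = 0.0
    for k in range(N):
        for l in range(N):
            inner = sum(v * math.exp(-abs(k - l) * d / r)
                        for v, r in zip(variances, ranges))
            total += inner ** 2
    return total / N


def lag_formula(N, variances, ranges, d):
    """Same quantity, grouped by the lag m = |k - l|."""
    diagonal = sum(variances) ** 2
    off_diagonal = 0.0
    for m in range(1, N):
        inner = sum(v * math.exp(-m * d / r) for v, r in zip(variances, ranges))
        off_diagonal += (N - m) * inner ** 2
    return diagonal + 2.0 * off_diagonal / N


J = 30                                                 # truncation level
variances = [1.0 / j**2 for j in range(1, J + 1)]      # summable variances
ranges = [1.0 - 0.5 / j for j in range(1, J + 1)]      # all ranges below 1
d, rho = 1.0, 1.0
# Bound from replacing every range by rho and summing the geometric series.
bound = sum(variances) ** 2 * (1.0 + 2.0 / (math.exp(2.0 * d / rho) - 1.0))
values = {N: lag_formula(N, variances, ranges, d) for N in (10, 100, 1000)}
```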
The mean function $\mu(s) = \{EX(s;t),\ t \in [0,1]\}$ and the covariance operator is
then defined for $x \in L^2$ by
For the existence of $C_{s_1,s_2}$, a minimal assumption is that the variables have finite
second moments in the sense that
Assumption 18.1. The spatial process $\{X(s), s \in \mathbb{R}^d\}$ satisfies (18.14) and (18.15).
In addition,
$$|E\langle X(s_1), X(s_2)\rangle| \le h(\|s_1 - s_2\|_2), \tag{18.16}$$
where $h: [0,\infty) \to [0,\infty)$ with $h(x) \searrow 0$, as $x \to \infty$.
18.2 Models and Assumptions 383
Examples 18.5 and 18.6 consider typical spatial covariance functions, and show
when condition (18.20) holds with a function $h$ as in Assumption 18.1.
Example 18.5. Suppose that the fields $\{\xi_j(s), s \in \mathbb{R}^d\}$, $j \ge 1$, are zero mean,
strictly stationary and $\alpha$-mixing. That is
with $\alpha_j(\mathbf{h}) \to 0$ if $\|\mathbf{h}\|_2 \to \infty$. Let $\alpha_j'(h) = \sup\{\alpha_j(\mathbf{h}) : \|\mathbf{h}\|_2 = h\}$. Then
$\alpha_j(h) = \sup\{\alpha_j'(x) : x \ge h\} \searrow 0$ as $h \to \infty$. Using stationarity and the main
result in Rio (1993) it follows that
where $Q_j(u) = \inf\{t : P(|\xi_j(0)| > t) \le u\}$ is the quantile function of $|\xi_j(0)|$.
Note that $\alpha_j(h) \le 1/4$ for any $h$, and thus $\phi_j(x) \le 2\int_0^1 Q_j^2(u)\,du = 2E[\xi_j^2(0)]$.
If $\sum_{j\ge 1} E\xi_j^2(0) < \infty$, then (18.16) holds with $h(x) = \sum_{j\ge 1}\phi_j(x)$. (Note that
$|h(x)| \searrow 0$ follows from $\alpha_j(x) \searrow 0$ and the monotone convergence theorem.)
We note that ˛-mixing is one of the classical assumptions in random field liter-
ature to establish limit theorems. It is in fact a much stronger assumption than ours
and it is suitable if one needs more delicate results, like a central limit theorem (see
e.g. Bolthausen (1982)) or uniform laws of large numbers, see Jenish and Prucha
(2009). Besides the restriction to scalar observations, many papers restrict to the
so-called “purely increasing domain sampling”, an assumption that we are going to
relax in the following.
Example 18.6. Suppose (18.19) holds, and set $h(x) = \sum_{j\ge 1}\phi_j(x)$. If each $\phi_j$ is a
powered exponential covariance function defined by
$$\phi_j(x) = \sigma_j^2 \exp\left\{-\left(\frac{x}{\rho_j}\right)^p\right\},$$
then $h$ satisfies the conditions of Assumption 18.1 if
$$\sum_{j\ge 1}\sigma_j^2 < \infty \quad\text{and}\quad \sup_{j\ge 1}\rho_j < \infty. \tag{18.21}$$
Condition (18.21) is also sufficient if all $\phi_j$ are in the Matérn class, see Stein (1999),
with the same $\nu$, i.e.
$$\phi_j(x) = \sigma_j^2\, x^{\nu} K_{\nu}(x/\rho_j),$$
because the modified Bessel function $K_\nu$ decays monotonically and approximately
exponentially fast; numerical calculations show that $K_\nu(s)$ practically vanishes if
$s > \nu$. Condition (18.21) is clearly sufficient for spherical $\phi_j$ defined (for $d = 3$) by
$$\phi_j(x) = \begin{cases} \sigma_j^2\left(1 - \dfrac{3x}{2\rho_j} + \dfrac{x^3}{2\rho_j^3}\right), & x \le \rho_j, \\ 0, & x > \rho_j. \end{cases}$$
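The covariance families of Example 18.6 are easy to evaluate, and the resulting function $h$ can be inspected on a grid. The sketch below is an illustration under assumed parameter values (variances $1/j^2$, ranges $2/j$, power 1.5, truncation at 50 components); the function names are hypothetical.

```python
import math


def powered_exponential(x, variance, range_par, p):
    """variance * exp(-(x / range_par)^p)."""
    return variance * math.exp(-((x / range_par) ** p))


def spherical(x, variance, range_par):
    """Spherical covariance (valid for d = 3): zero beyond the range."""
    if x > range_par:
        return 0.0
    u = x / range_par
    return variance * (1.0 - 1.5 * u + 0.5 * u ** 3)


def h(x, variances, ranges, cov):
    """Truncated version of h(x): sum over components of cov(x, var_j, range_j)."""
    return sum(cov(x, v, r) for v, r in zip(variances, ranges))


variances = [1.0 / j**2 for j in range(1, 51)]   # summable variances
ranges = [2.0 / j for j in range(1, 51)]         # supremum of the ranges is 2
grid = [0.25 * i for i in range(41)]             # 0, 0.25, ..., 10

h_spherical = [h(x, variances, ranges, spherical) for x in grid]
h_powered = [
    h(x, variances, ranges, lambda x, v, r: powered_exponential(x, v, r, 1.5))
    for x in grid
]
```

On this grid both versions of $h$ are nonincreasing, and the spherical version is exactly zero beyond the largest range.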
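Lemma 18.2. Let $X(s)$ have representation (18.4) with zero mean and $E\|X(s)\|^4 < \infty$.
Assume further that $\xi_i(\cdot)$ and $\xi_j(\cdot)$ are independent if $i \ne j$. Then
$$\big|E\langle X(s_1)\otimes X(s_1) - C,\ X(s_2)\otimes X(s_2) - C\rangle_{\mathcal S}\big| \le \Big|\sum_{j\ge 1}\mathrm{Cov}\big(\xi_j^2(s_1), \xi_j^2(s_2)\big)\Big| + \Big|\sum_{j\ge 1} E\big[\xi_j(s_1)\xi_j(s_2)\big]\Big|^2.$$
Proof. If $\xi_i(\cdot)$ and $\xi_j(\cdot)$ are independent for $i \ne j$, then the $e_j$ are the eigenfunctions
of $C$, and the $\xi_j(s)$ are the principal component scores with $E\xi_j^2(s) = \lambda_j$. Using
continuity of the inner product and dominated convergence we obtain
$$\begin{aligned}
&\big|E\langle X(s_1)\otimes X(s_1) - C,\ X(s_2)\otimes X(s_2) - C\rangle_{\mathcal S}\big| \\
&= \Big|E\sum_{j\ge 1}\big\langle \langle X(s_1), e_j\rangle X(s_1) - C(e_j),\ \langle X(s_2), e_j\rangle X(s_2) - C(e_j)\big\rangle\Big| \\
&= \Big|E\sum_{j\ge 1}\Big\langle \xi_j(s_1)\sum_{\ell\ge 1}\xi_\ell(s_1)e_\ell - \lambda_j e_j,\ \xi_j(s_2)\sum_{k\ge 1}\xi_k(s_2)e_k - \lambda_j e_j\Big\rangle\Big| \\
&= \Big|E\sum_{j\ge 1}\Big(\xi_j(s_1)\xi_j(s_2)\sum_{\ell\ge 1}\xi_\ell(s_1)\xi_\ell(s_2) + \lambda_j^2 - \lambda_j\xi_j^2(s_1) - \lambda_j\xi_j^2(s_2)\Big)\Big| \\
&\le \Big|\sum_{j\ge 1}\mathrm{Cov}\big(\xi_j^2(s_1), \xi_j^2(s_2)\big)\Big| + \Big|\sum_{j\ge 1}E\big[\xi_j(s_1)\xi_j(s_2)\big]\sum_{\ell\ne j}E\big[\xi_\ell(s_1)\xi_\ell(s_2)\big]\Big| \\
&\le \Big|\sum_{j\ge 1}\mathrm{Cov}\big(\xi_j^2(s_1), \xi_j^2(s_2)\big)\Big| + \Big|\sum_{j\ge 1}E\big[\xi_j(s_1)\xi_j(s_2)\big]\Big|^2. \qquad\Box
\end{aligned}$$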
where $|S|$ denotes the number of elements of $S$. The quantity $I_\rho(S)$ is the maximal
fraction of $S$–points in a ball of radius $\rho$ centered at an element of $S$. Notice that
$1/|S| \le I_\rho(S) \le 1$. We call $\rho \mapsto I_\rho(S)$ the intensity function of $S$.
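The intensity function can be computed directly from its verbal description. The sketch below assumes closed Euclidean balls and represents the points as coordinate tuples; the function name and the toy point set are hypothetical.

```python
import math


def intensity(points, rho):
    """Maximal fraction of points lying in a closed ball of radius rho
    centered at one of the points."""
    n = len(points)
    best = 0
    for center in points:
        inside = sum(1 for y in points if math.dist(center, y) <= rho)
        best = max(best, inside)
    return best / n


S = [(0.0,), (1.0,), (2.0,), (10.0,)]
for rho in (0.5, 1.0, 100.0):
    print(rho, intensity(S, rho))
```

Since every ball contains its own center, the value is never below $1/|S|$, and it never exceeds 1.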
The following Lemma relates the non-random regular design to Definition 18.1.
We write $a_N \gtrsim b_N$ if $\limsup b_N/a_N < \infty$.
Lemma 18.4. In the above described design the following pairs of statements are
equivalent:
(i) $\alpha_N$ remains bounded $\Leftrightarrow$ Type A sampling;
(ii) $\alpha_N \to \infty$ and $\alpha_N = o(N^{1/d})$ $\Leftrightarrow$ Type B sampling;
(iii) $\alpha_N \gtrsim N^{1/d}$ $\Leftrightarrow$ Type C sampling.
Proof. Let $U_\varepsilon(x)$ be the sphere in $\mathbb{R}^d$ with center $x$ and radius $\varepsilon$. Assume first that $\alpha_N = o(N^{1/d})$, which covers (i) and (ii). In this case the volume of the rectangles $L_{i,n}$ as described in the proof of Lemma 18.3 satisfies
d
˛N
Vol.Li;n / D d dN D ! 0: (18.24)
N
Hence jU .x/ \ Z.N ı/j is asymptotically proportional to
d
Vol.U .x//=Vol.Li;n / D Vd N;
˛N
By the required Riemann measurability we can find an $x \in R_0$ such that for some small enough $\varepsilon$ we have $U_{2\varepsilon}(x) \subset R_0$. Then $U_{2\varepsilon\alpha_N}(\alpha_N x) \subset R_N$. Hence for any
20 "˛N ,
d
jU =2 .˛N x/ \ SN j jU2 .˛N x/ \ SN j
CL I .SN /
˛N N N
d
CU :
˛N
With the help of the above inequalities (i) and (ii) are easily checked.
Now we prove (iii). We notice that by (18.24) $\alpha_N \gg N^{1/d}$ is equivalent to $\mathrm{Vol}(L_{i,n})$ not converging to 0. Assume first that we have Type C sampling. Then by the arguments above we find an $x$ and a $\rho > 0$ such that $U_\rho(\alpha_N x) \subset R_N$. Thus
irregular sets. For the sake of simplicity we shall assume that on $R_0$ the density is bounded away from zero, so that we have $0 < f_L \le \inf_{x\in R_0} f(x)$. The point set $\{s_{k,N},\ 1 \le k \le N\}$ is defined by $s_{k,N} = \alpha_N s_k$ for $k = 1, \ldots, N$. For fixed $N$, this is equivalent to: $\{s_{k,N},\ 1 \le k \le N\}$ is an iid sequence on $R_N = \alpha_N R_0$ with density $\alpha_N^{-d} f(\alpha_N^{-1} s)$.
We cannot expect to obtain a full analogue of Lemma 18.4 in the randomized
setup. For Type C sampling, the problem is much more delicate, and a closer study
shows that it is related to the oscillation behavior of multivariate empirical pro-
cesses. While Stute (1984) gives almost sure upper bounds, we would need here
sharp results on the moments of the modulus of continuity of multivariate empirical
process. Such results exist, see Einmahl and Ruymgaart (1987), but are connected
to technical assumptions on the bandwidth for the modulus (here determined by
˛N ) which are not satisfied in our setup. Since a detailed treatment would be very
difficult, we only state the following lemma.
Lemma 18.5. In the above described sampling scheme the following statements hold:
(i) $\alpha_N$ remains bounded $\Rightarrow$ Type A sampling;
(ii) $\alpha_N \to \infty$ and $\alpha_N = o(N^{1/d})$ $\Rightarrow$ Type B sampling;
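Proof. By Jensen's inequality we infer that
\[
\begin{aligned}
E I_\rho(S_N) &= E \sup_{x\in R_N} \frac{1}{N} \sum_{k=1}^{N} I\{ s_{k,N} \in U_\rho(x) \cap R_N \} \\
&\ge \sup_{x\in R_N} P\big( s_{1,N} \in U_\rho(x) \cap R_N \big) \\
&= \sup_{x\in R_0} P\big( s_1 \in U_{\rho/\alpha_N}(x) \cap R_0 \big) \\
&= \sup_{x\in R_0} \int_{U_{\rho/\alpha_N}(x)\cap R_0} f(s)\, ds .
\end{aligned}
\]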
Our goal is to establish the consistency of the sample mean for functional spatial
data. We consider Type B or Type C sampling and obtain rates of convergence. We
start with a general setup, and show that the rates can be improved in special cases.
The general results are applied to functional random fields with specific covariance
structures. The proofs of the main results, Propositions 18.1, 18.2, 18.3, are collected
in Section 18.7.
For independent or weakly dependent functional observations $X_k$,
\[
E \Big\| \frac{1}{N} \sum_{k=1}^{N} X_k - \mu \Big\|^2 = O\big(N^{-1}\big). \tag{18.26}
\]
The technical assumptions on $h$ pose no practical problem; they are satisfied for all important examples, see Example 18.6. A common situation is that $x^{d-1}h(x)$ is increasing on $[0, b]$ and decreasing thereafter.
Our first example shows that for most typical covariance functions, under nearly
infill domain sampling, the rate of consistency may be much slower than for the iid
case, if the size of the domain does not increase fast enough.
Example 18.9. Suppose the functional spatial process has representation (18.4), and (18.19) holds with the covariance functions $\phi_j$ as in Example 18.6 (powered exponential, Matérn or spherical). Define $h(x) = \sum_{j\ge 1} \phi_j(x)$, and assume that condition (18.21) holds. Assumption 18.1 is then satisfied and
\[
\int_0^\infty x^{d-1} h(x)\, dx < \infty \quad\text{and}\quad \sup_{x\in\mathbb{R}} x^{d-1} h(x) < \infty. \tag{18.29}
\]
Choosing $\varepsilon_N$ such that $\varepsilon_N \to 0$ and $\alpha_N \varepsilon_N \to \infty$, it follows that under Type B or Type C sampling, the sample mean is consistent.
The bound in Proposition 18.3 can be easily applied to any specific random sam-
pling design and any model for the functions j in (18.18). It nicely shows that what
matters for the rate of consistency is the interplay between the rate of growth of the
sampling domain and the rate of decay of dependence.
Let us explain in slightly more detail a Type C sampling situation. Here typically we have $\alpha_N = N^{1/d}$. Then taking $\varepsilon_N = a N^{-1/d} \log N$, $a > 0$, we see that the rate of consistency is $h(a \log N) \vee N^{-1}$. For typical covariance functions $\phi_j$, like powered exponential, Matérn or spherical, $h(a \log N)$ decays faster than $N^{-1}$. In such cases, the rate of consistency is, up to some logarithmic factor, the same as for an iid sample. For ease of reference, we formulate the following corollary, which can be used in practical applications.
Corollary 18.1. Assume the random sampling design with the sequence $\{s_{k,N}\}$ independent of the process $X$. Suppose that $X(s)$ has representation (18.18) and that (18.19) holds with the $\phi_j$ in one of the families specified in Example 18.6. If Condition (18.21) holds, and $\alpha_N \gg N^{1/d}$, then (18.26) holds up to some multiplicative logarithmic factor.
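To make the comparison between the dependence term and the iid rate concrete, here is a small sketch for an exponential covariance. The choice h(x) = exp(-x / rho), the value rho = 1 and the constant a = 2 are assumptions made purely for illustration; with this h, the dependence term at a log N equals N^(-a/rho), which decays faster than 1/N exactly when a > rho.

```python
import math

def h_exponential(x, rho=1.0):
    """Hypothetical dependence bound h(x) = exp(-x / rho)."""
    return math.exp(-x / rho)

def type_c_terms(n, a, rho=1.0):
    """Return (h(a log N), 1/N), the two terms compared in the text."""
    return h_exponential(a * math.log(n), rho), 1.0 / n

for n in (10**2, 10**4, 10**6):
    dep, iid = type_c_terms(n, a=2.0)
    # With a > rho the dependence term is negligible relative to 1/N.
    print(n, dep, iid, dep < iid)
```

Choosing a smaller than rho reverses the comparison, which is why the constant a in the choice of the radius matters in practice.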
18.5 Consistency of the empirical covariance operator
In Section 18.4 we found the rates of consistency for the functional sample mean.
We now turn to the rates for the sample covariance operator. Assuming the func-
tional observations have mean zero, the natural estimator of the covariance operator
C is the sample covariance operator given by
\[
\widehat{C}_N = \frac{1}{N} \sum_{k=1}^{N} X(s_k) \otimes X(s_k).
\]
where
\[
\bar{X}_N = \frac{1}{N} \sum_{k=1}^{N} X(s_k).
\]
Both operators are implemented in statistical software packages, for example in the popular R package fda and in a similar MATLAB package, see Ramsay et al. (2009). The operator $\widehat{\Gamma}_N$ is used to compute the EFPC's for centered data, while $\widehat{C}_N$ is used for data without centering.
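Both estimators are easy to write down for curves observed on a common grid. The sketch below is an illustration under stated assumptions: the curves are stored as rows of a NumPy array, the grid is equispaced on [0, 1], and integrals are approximated by grid averages. It is not the implementation used in the fda package, and all function names are hypothetical.

```python
import numpy as np

def cov_kernel_uncentered(curves):
    """Kernel of (1/N) sum_k X(s_k) (x) X(s_k) on the observation grid."""
    x = np.asarray(curves, dtype=float)
    return x.T @ x / x.shape[0]

def cov_kernel_centered(curves):
    """Same kernel after subtracting the sample mean curve."""
    x = np.asarray(curves, dtype=float)
    xc = x - x.mean(axis=0)
    return xc.T @ xc / x.shape[0]

def efpc(kernel):
    """Eigenvalues (descending) and eigenfunctions of a kernel on an equispaced grid of [0, 1]."""
    m = kernel.shape[0]
    # Dividing by m approximates the integral operator on [0, 1].
    vals, vecs = np.linalg.eigh(kernel / m)
    order = np.argsort(vals)[::-1]
    # Rescale eigenvectors so that the grid average of their squares is one.
    return vals[order], vecs[:, order] * np.sqrt(m)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
# Simulated curves with a nonzero mean function (purely illustrative data).
curves = (1.0
          + rng.normal(size=(200, 1)) * np.sin(np.pi * t)
          + 0.5 * rng.normal(size=(200, 1)) * np.cos(np.pi * t))
c_unc = cov_kernel_uncentered(curves)
c_cen = cov_kernel_centered(curves)
values, functions = efpc(c_cen)
```

On such a grid the two kernels differ exactly by the outer product of the sample mean curve with itself, so the choice between them only matters when the data have not been centered.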
We first derive the rates of consistency for $\widehat{C}_N$ assuming $EX(s) = 0$. Then we turn to the operator $\widehat{\Gamma}_N$. The proofs are obtained by applying the technique
developed for the estimation of the functional mean. It is a general approach based
on the estimation of the second moments of an appropriate norm (between estimator
and estimand) so that the conditions in Definition 18.1 can come into play. It is
broadly applicable to all statistics obtained by simple averaging. The proofs are
thus similar to those presented in the simplest case in Section 18.7, but the notation
becomes more cumbersome because of the increased complexity of the objects to
be averaged. To conserve space these proofs are not included.
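We begin by observing that
\[
E\big\| \widehat{C}_N - C \big\|_{\mathcal S}^2
= E\big\langle \widehat{C}_N - C,\; \widehat{C}_N - C \big\rangle_{\mathcal S}
= \frac{1}{N^2} \sum_{k=1}^{N} \sum_{\ell=1}^{N} E\big\langle X(s_k)\otimes X(s_k) - C,\; X(s_\ell)\otimes X(s_\ell) - C \big\rangle_{\mathcal S}. \tag{18.31}
\]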
Relation (18.31) is used as the starting point of all proofs, cf. the proof of Propo-
sition 18.1 in Section 18.4. Modifying the proofs of Section 18.4, we arrive at the
following results.
Proposition 18.4. Let Assumption 18.2 hold, and assume that $S_N$ defines a non-random design of Type A, B or C. Then for any $\rho_N > 0$
\[
E\big\| \widehat{C}_N - C \big\|_{\mathcal S}^2 \le H(\rho_N) + H(0)\, I_{\rho_N}(S_N).
\]
It follows that under Type B or Type C sampling the sample covariance operator is consistent.
Example 18.12. Let $X$ have representation (18.4), in which the scalar fields $\xi_j(\cdot)$ are independent and Gaussian, and (18.11), (18.12) and (18.13) hold.
It follows that for some large enough constant A,
ˇ ˇ ˇ ˇ2
ˇX 2 ˇ ˇ X ˇ
ˇ Cov .s /; 2
.s / ˇ C ˇ E .s / .s / ˇ
ˇ j 1 j 2 ˇ ˇ j 1 j 2 ˇ
j 1 j 1
1
A exp 2 ks1 s2 k2 :
Hence by Lemma 18.2, Assumption 18.2 holds with H.x/ D A exp 21 ks1
s2 k2 . Proposition 18.4 yields consistency of the estimator under Type B or Type C
sampling, as
EkCbN C k2S A exp.21 N / C I N .SN / :
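we see that
\[
\widetilde{\Gamma}_N - \widehat{\Gamma}_N = (\bar{X}_N - \mu) \otimes (\bar{X}_N - \mu).
\]
Therefore
\[
E\big\| \widehat{\Gamma}_N - C \big\|_{\mathcal S}^2 \le 2E\big\| \widetilde{\Gamma}_N - C \big\|_{\mathcal S}^2 + 2E\big\| (\bar{X}_N - \mu) \otimes (\bar{X}_N - \mu) \big\|_{\mathcal S}^2 .
\]
The bounds in Propositions 18.4, 18.5 and 18.6 apply to $E\| \widetilde{\Gamma}_N - C \|_{\mathcal S}^2$. Observe that
\[
E\big\| (\bar{X}_N - \mu) \otimes (\bar{X}_N - \mu) \big\|_{\mathcal S}^2 = E\| \bar{X}_N - \mu \|^4 .
\]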
If $X(s)$ are bounded variables, i.e. $\sup_{t\in[0,1]} |X(s;t)| \le B < \infty$ a.s., then $\|\bar{X}_N - \mu\|^4 \le 4B^2 \|\bar{X}_N - \mu\|^2$. It follows that under Assumption 18.1 we obtain the same order of magnitude for the bounds of $E\|\bar{X}_N - \mu\|^4$ as we have obtained in Propositions 18.1, 18.2 and 18.3 for $E\|\bar{X}_N - \mu\|^2$. In general $E\|\bar{X}_N - \mu\|^4$ can neither be bounded in terms of $E\|\bar{X}_N - \mu\|^2$ nor with $E\|\widehat{C}_N - C\|_{\mathcal S}^2$. To bound fourth order moments, conditions on the covariance between the variables $Z_{k,\ell} := \langle X(s_{k,N}) - \mu,\; X(s_{\ell,N}) - \mu \rangle$ and $Z_{i,j}$ for all $1 \le i, j, k, \ell \le N$ are unavoidable. However, a simpler general approach is to require higher order moments of $\|X(s)\|$. More precisely, we notice that for any $p > 1$, by the Hölder inequality,
\[
E\|\bar{X}_N - \mu\|^4 \le \Big( E\|\bar{X}_N - \mu\|^2 \Big)^{1/p} \Big( E\|\bar{X}_N - \mu\|^{\frac{4p-2}{p-1}} \Big)^{(p-1)/p} .
\]
Thus as long as $E\|X(s)\|^{\frac{4p-2}{p-1}} < \infty$, we conclude that, by stationarity,
\[
E\|\bar{X}_N - \mu\|^4 \le M(p) \Big( E\|\bar{X}_N - \mu\|^2 \Big)^{1/p},
\]
where $M(p)$ depends on the distribution of $X(s)$ and on $p$, but not on $N$. It is now evident how the results of Section 18.4 can be used to obtain bounds for $E\|\widehat{\Gamma}_N - C\|_{\mathcal S}^2$. We state in Proposition 18.7 the version for the general non-random design. The special cases follow, and the random designs are treated analogously. It follows that if Assumptions 18.1 and 18.2 hold, then $E\|\widehat{\Gamma}_N - C\|_{\mathcal S}^2 \to 0$, under Type B or C sampling, provided $E\|X(s)\|^{4+\delta} < \infty$.
Proposition 18.7. Let Assumptions 18.1 and 18.2 hold and assume that for some $\delta > 0$ we have $E\|X(s)\|^{4+\delta} < \infty$. Assume further that $S_N$ defines a non-random design of Type A, B or C. Then for any $\rho_N > 0$ we have
\[
E\big\| \widehat{\Gamma}_N - C \big\|_{\mathcal S}^2
\le 2\big\{ H(\rho_N) + H(0)\, I_{\rho_N}(S_N) \big\}
+ 2C(\delta) \big\{ h(\rho_N) + h(0)\, I_{\rho_N}(S_N) \big\}^{\frac{\delta}{2+\delta}} . \tag{18.32}
\]
If $X(s_1)$ is a.s. bounded by some finite constant $B$, then we can formally let $\delta$ in (18.32) go to $\infty$, with $C(\infty) = 4B^2$.
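The exponent in (18.32) can be traced back to the Hölder bound stated before the proposition. The short computation below is a reader's check rather than part of the original argument; it matches the moment order required there with the moment order assumed in Proposition 18.7.

```latex
\[
\frac{4p-2}{p-1} = 4+\delta
\iff 4p - 2 = (4+\delta)(p-1)
\iff \delta p = 2 + \delta
\iff p = \frac{2+\delta}{\delta},
\qquad\text{so}\qquad
\frac{1}{p} = \frac{\delta}{2+\delta}.
\]
```

As $\delta \to \infty$ we get $p \to 1$ and the exponent tends to 1, which agrees with the bounded case, where the fourth moment of the centered sample mean is bounded by a constant multiple of its second moment.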
We begin by formalizing the intuition behind Example 18.2. By Lemma 2.3, the claims in that example follow from Proposition 18.8. Recall that $X^\star = X(0) \otimes X(0)$, and observe that for $x \in L^2$,
\[
X^\star(x)(t) = \Big( \int X(0;u)\, x(u)\, du \Big) X(0;t) = \int c^\star(t,u)\, x(u)\, du,
\]
where
\[
c^\star(t,u) = X(0;t)\, X(0;u).
\]
Since
\[
E \iint \big( c^\star(t,u) \big)^2 \, dt\, du = E\|X(0)\|^4 < \infty,
\]
Proposition 18.8. Suppose representation (18.18) holds with stationary mean zero Gaussian processes $\xi_j$ such that
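where $\xi_1$ and $\xi_2$ are iid processes on the line, and $0 < \lambda < 1$. Assume that the processes $\xi_1$ and $\xi_2$ are Gaussian with mean zero and covariances $E[\xi_j(s)\xi_j(s+h)] = \exp\{-h^2\}$, $j = 1, 2$. Thus, each $Z_j := \xi_j(0)$ is standard normal. Rearranging the terms, we obtain
\[
X^\star(x) = \Big( Z_1^2 \langle x, e_1\rangle + \sqrt{\lambda}\, Z_1 Z_2 \langle x, e_2\rangle \Big) e_1
+ \Big( \sqrt{\lambda}\, Z_1 Z_2 \langle x, e_1\rangle + \lambda Z_2^2 \langle x, e_2\rangle \Big) e_2 .
\]
The matrix
\[
\begin{pmatrix}
Z_1^2 & \sqrt{\lambda}\, Z_1 Z_2 \\
\sqrt{\lambda}\, Z_1 Z_2 & \lambda Z_2^2
\end{pmatrix}
\]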
18.6 Inconsistent empirical functional principal components 397
has only one positive eigenvalue $Z_1^2 + \lambda Z_2^2 = \|X(0)\|^2$. A normalized eigenfunction associated with it is
\[
f := \frac{X(0)}{\|X(0)\|} = \big( Z_1^2 + \lambda Z_2^2 \big)^{-1/2} \big( Z_1 e_1 + \sqrt{\lambda}\, Z_2 e_2 \big). \tag{18.35}
\]
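The eigen-structure claimed for the 2 x 2 matrix is easy to confirm numerically. The following sketch draws two standard normal values and compares the output of a generic symmetric eigen-solver with the closed form; the weight of the second component (called `lam` below, with 0 < lam < 1), its value 0.4 and the function name are assumptions for illustration.

```python
import numpy as np

def check_rank_one(z1, z2, lam):
    """Compare the eigen-structure of the 2 x 2 matrix with the closed form in the text."""
    r = np.sqrt(lam)
    a = np.array([[z1 ** 2, r * z1 * z2],
                  [r * z1 * z2, lam * z2 ** 2]])
    vals, vecs = np.linalg.eigh(a)              # eigenvalues in ascending order
    top = z1 ** 2 + lam * z2 ** 2               # claimed positive eigenvalue
    f = np.array([z1, r * z2]) / np.sqrt(top)   # claimed normalized eigenvector
    return vals, vecs[:, -1], top, f

rng = np.random.default_rng(1)
z1, z2 = rng.standard_normal(2)
vals, v, top, f = check_rank_one(z1, z2, lam=0.4)
```

The matrix is the outer product of the coefficient vector of X(0) with itself, so it has rank one: the smaller eigenvalue is zero and the leading eigenvector is the normalized coefficient vector, up to sign.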
We now state a general result showing that Type A sampling generally leads to
inconsistent estimators if the spatial dependence does not vanish.
where $B(x)$ is non-increasing, then under Type A sampling the sample covariance $\widehat{C}_N$ is not a consistent estimator of $C$.
Example 18.14. We focus on condition (18.36) for the FPC’s. For the general model
(18.18), the left–hand side of (18.36) is equal to
X
.s1 ; s2 / D Cov.i .s1 /j .s1 /; i .s2 /j .s2 //:
i;j 1
Then,
.s1 ; s2 / D f ./.ks1 s2 k/;
where
p h i
f ./ D .3 2 2/.1 C 2 / C 2 1 C C 2 .1 C 3=2 /.1 C /1=2 :
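\[
\begin{aligned}
\frac{1}{N^2} \sum_{k=1}^{N} \sum_{\ell=1}^{N} h\big( \|s_{k,N} - s_{\ell,N}\|_2 \big)
&\le \frac{1}{N^2} \sum_{k=1}^{N} \sum_{\ell=1}^{N} \Big[ h(\rho_N)\, I\{ \|s_{k,N} - s_{\ell,N}\|_2 > \rho_N \}
 + h(0)\, I\{ \|s_{k,N} - s_{\ell,N}\|_2 \le \rho_N \} \Big] \\
&\le h(\rho_N) + h(0)\, I_{\rho_N}(S_N). \qquad \square
\end{aligned}
\]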
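The last display can be checked numerically on a simulated design. In the sketch below, the function h(x) = exp(-x), the uniform design on a square and the three radii are hypothetical choices; the intensity is computed as the maximal fraction of points within the given radius of a design point, and a closed ball is assumed.

```python
import numpy as np

def mean_pairwise_h(points, h):
    """Average of h over all pairwise distances, together with the distance matrix."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return h(dist).mean(), dist

def bound(dist, h, rho):
    """Right-hand side h(rho) + h(0) * intensity at radius rho."""
    intensity = (dist <= rho).mean(axis=1).max()
    return h(rho) + h(0.0) * intensity

h = lambda x: np.exp(-x)   # hypothetical non-increasing h
rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 20.0, size=(300, 2))
lhs, dist = mean_pairwise_h(pts, h)
rhs = [bound(dist, h, rho) for rho in (0.5, 2.0, 5.0)]
```

A small radius makes the intensity term small but leaves the first term large, and a large radius does the opposite; this is the trade-off that the choice of the radius sequence has to balance.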
The following Lemma is a simple calculus problem and will be used in the proof
of Proposition 18.2.
\[
\sum_{k=0}^{L} f\Big( \frac{k}{N} \Big) \frac{1}{N} \le \int_0^{L/N} f(x)\, dx + \frac{2}{N} \sup_{x\in[0, L/N]} |f(x)| .
\]
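A quick numerical check of this inequality is sketched below. The hypotheses of the lemma are not reproduced here, so the example assumes a nonnegative function that increases up to some point and decreases afterwards, which is the situation described for the dependence bound in this chapter. The test function, the grid sizes and the helper names are all hypothetical.

```python
import math

def riemann_sum(f, n, big_l):
    """(1/N) * sum_{k=0}^{L} f(k/N)."""
    return sum(f(k / n) for k in range(big_l + 1)) / n

def integral_plus_slack(f, n, big_l, grid=100_000):
    """Midpoint-rule value of the integral over [0, L/N] plus (2/N) sup |f| on a fine grid."""
    upper = big_l / n
    step = upper / grid
    integral = sum(f((i + 0.5) * step) for i in range(grid)) * step
    sup = max(abs(f(i * step)) for i in range(grid + 1))
    return integral + 2.0 * sup / n

f = lambda x: x * math.exp(-x)   # increasing on [0, 1], decreasing afterwards
lhs = riemann_sum(f, n=50, big_l=500)
rhs = integral_plus_slack(f, n=50, big_l=500)
```

The slack term accounts for the few summands near the maximum of the function, where the sum cannot be compared with the integral over a neighbouring interval.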
\[
\frac{1}{S_N^2} \sum_{k=1}^{S_N} \sum_{\ell=1}^{S_N} h\big( \|s_{k,N} - s_{\ell,N}\|_2 \big).
\]
where $K$ depends on $\mathrm{diam}(R_0)$. It is easy to see that the number of points on the grid having distance $m$ from a given point is less than $2d(2m+1)^{d-1}$, $m \ge 0$. Hence the number of pairs for which (18.38) holds is less than $2d(2m+1)^{d-1}N$. On the other hand, if $d(s_{k,N}, s_{\ell,N}) = m$, then $\|s_{k,N} - s_{\ell,N}\|_2 \ge m\delta_0\eta_N$. Let us assume without loss of generality that $\delta_0 = 1$. Noting that there is no loss of generality if we assume that $x^{d-1}h(x)$ is also monotone on $[0, b]$, we obtain by Lemma 18.6 for large enough $N$ and $K < K' < K''$
1 X
SN X
SN
2
h ksk;N s`;n k2
SN
kD1 `D1
K 0X
N 1=d
.2m C 1/d 1 2h.0/
2d h mN C
mD1
N N
K 0 N 1=d C1 d 1
3 d 1 X m m 1 2h.0/
2d N N h N N C
N mD0
N N N N
d 1 Z K 00 N 1=d 1
3
2d .N N x/d 1 h N N x dx
N 0
2 d 1 2h.0/
C sup x h.x/ C
N x2Œ0;K 00 ˛N = N
Z K 00 ˛N =
.3/ dd
D d
x d 1 h x dx
˛N 0
4d.3/d 1 2h.0/
C d 1 1=d
sup x d 1 h.x/ C :
˛N N x2Œ0;K 00 ˛N = N
By Lemma 18.4, Type B sampling implies $\alpha_N \to \infty$ and $\alpha_N = o(N^{1/d})$. This shows (18.28). Under Type C sampling $1/\alpha_N^d \ll 1/N$. The proof is finished. $\square$
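Proof of Proposition 18.3. This time we have
\[
\begin{aligned}
E\Big\| \frac{1}{N} \sum_{k=1}^{N} X(s_{k,N}) - \mu \Big\|^2
&\le \frac{1}{N^2} \sum_{k=1}^{N} \sum_{\ell=1}^{N} E\, h\big( \|s_{k,N} - s_{\ell,N}\|_2 \big) \\
&\le \alpha_N^{-2d} \int_{R_N} \int_{R_N} h\big( \|s - r\|_2 \big) f(\alpha_N^{-1} s) f(\alpha_N^{-1} r)\, ds\, dr + \frac{h(0)}{N} \\
&= \int_{R_0} \int_{R_0} h\big( \alpha_N \|s - r\|_2 \big) f(s) f(r)\, ds\, dr + \frac{h(0)}{N} .
\end{aligned}
\]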
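Furthermore, for any $\varepsilon_N > 0$,
\[
\begin{aligned}
\int_{R_0} \int_{R_0} h\big( \alpha_N \|s - r\|_2 \big) f(s) f(r)\, ds\, dr
&\le h(0) \int_{R_0} \int_{R_0} f(s) f(r)\, I\{ \|s - r\|_2 \le \varepsilon_N \}\, ds\, dr + h(\alpha_N \varepsilon_N) \\
&\le h(0) \sup_{s\in R_0} f^2(s) \int_{R_0} \int_{R_0} I\{ \|s - r\|_2 \le \varepsilon_N \}\, ds\, dr + h(\alpha_N \varepsilon_N).
\end{aligned}
\]
Now for fixed $r$ it is not difficult to show that $\int_{R_0} I\{ \|s - r\|_2 \le \varepsilon_N \}\, ds \le 6\varepsilon_N^d$. (The constant 6 could be replaced with $\pi^{d/2} / \Gamma(d/2 + 1)$.)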
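The constant 6 works in every dimension because the volume of the Euclidean unit ball, given by the formula in parentheses, never exceeds it. The short sketch below evaluates that formula for dimensions 1 to 30; the function name and the range of dimensions are illustrative choices.

```python
import math

def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

volumes = {d: unit_ball_volume(d) for d in range(1, 31)}
largest_d = max(volumes, key=volumes.get)   # dimension with the largest unit-ball volume
```

Within this range the volume peaks in dimension 5 at a value below 6, and it decreases for larger dimensions.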
18.7 Proofs of the results of Sections 18.4, 18.5 and 18.6 401
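\[
\big\| \widehat{C}_N - X^\star \big\|_{\mathcal S}^2 = \iint \Big\{ \frac{1}{N} \sum_{n=1}^{N} \big[ X(s_n;t) X(s_n;u) - X(0;t) X(0;u) \big] \Big\}^2 dt\, du .
\]
Therefore,
\[
\big\| \widehat{C}_N - X^\star \big\|_{\mathcal S}^2 \le 2 I_1(N) + 2 I_2(N),
\]
where
\[
I_1(N) = \iint \Big\{ \frac{1}{N} \sum_{n=1}^{N} X(s_n;t) \big( X(s_n;u) - X(0;u) \big) \Big\}^2 dt\, du
\]
and
\[
I_2(N) = \iint \Big\{ \frac{1}{N} \sum_{n=1}^{N} X(0;u) \big( X(s_n;t) - X(0;t) \big) \Big\}^2 dt\, du .
\]
We will show that $E I_1(N) \to 0$. The argument for $I_2(N)$ is the same. Observe that
\[
\begin{aligned}
I_1(N)
&= \frac{1}{N^2} \sum_{k,\ell=1}^{N} \iint X(s_k;t) \big( X(s_k;u) - X(0;u) \big) X(s_\ell;t) \big( X(s_\ell;u) - X(0;u) \big)\, dt\, du \\
&= \frac{1}{N^2} \sum_{k,\ell=1}^{N} \int X(s_k;t) X(s_\ell;t)\, dt \int \big( X(s_k;u) - X(0;u) \big) \big( X(s_\ell;u) - X(0;u) \big)\, du .
\end{aligned}
\]
Thus,
\[
E I_1(N) \le \frac{1}{N^2} \sum_{k,\ell=1}^{N}
\Big\{ E \Big( \int X(s_k;t) X(s_\ell;t)\, dt \Big)^2 \Big\}^{1/2}
\Big\{ E \Big( \int Y_k(u) Y_\ell(u)\, du \Big)^2 \Big\}^{1/2},
\]
where
\[
Y_k(u) = X(s_k;u) - X(0;u).
\]
The right hand side tends to zero by the Dominated Convergence Theorem. This establishes (18.39), and completes the proof of (18.33). $\square$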
Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst.
Statist. Math., 30A, 9–14.
Akhiezer, N. I. and Glazman, I. M. (1993). Theory of Linear Operators in Hilbert
Space. Dover, New York.
Andersen, T. G. and Bollerslev, T. (1997a). Heterogeneous information arrivals
and return volatility dynamics: uncovering the long run in high frequency data.
Journal of Finance, 52, 975–1005.
Andersen, T. G. and Bollerslev, T. (1997b). Intraday periodicity and volatility
persistence in financial markets. Journal of Empirical Finance, 2–3, 115–158.
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. Wiley,
New York.
Anderson, T. W. (1994). The Statistical Analysis of Time Series. Wiley and Sons.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent
covariance matrix estimation. Econometrica, 59, 817–858.
Andrews, D. W. K. and Monahan, J. C. (1992). An improved heteroskedasticity
and autocorrelation consistent covariance matrix estimator. Econometrica, 60,
953–966.
Antoch, J., Hušková, M. and Prášková, Z. (1997). Effect of dependence on statistics
for determination of change. Journal of Statistical Planning and Inference, 60,
291–310.
Antoniadis, A., Paparoditis, E. and Sapatinas, T. (2006). A functional wavelet–
kernel approach for time series prediction. Journal of the Royal Statistical Soci-
ety, Series B, 68, 837–857.
Aston, J. A. D. and Kirch, C. (2011a). Detecting and estimating epidemic changes
in dependent functional data. CRiSM Research Report 11–07. University of War-
wick.
Aston, J. A. D. and Kirch, C. (2011b). Estimation of the distribution of change-
points with application to fMRI data. CRiSM Research Reports. University of
Warwick.
Aue, A., Gabrys, R., Horváth, L. and Kokoszka, P. (2009). Estimation of a change–
point in the mean function of functional data. Journal of Multivariate Analysis,
100, 2254–2269.
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications, 405
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3,
© Springer Science+Business Media New York 2012
406 References
Aue, A., Hörmann, S., Horváth, L. and Hušková, M. (2010). Sequential stability test
for functional linear models. Technical Report. University of California Davis.
Aue, A., Hörmann, S., Horváth, L. and Reimherr, M. (2009). Break detection in the
covariance structure of multivariate time series models. The Annals of Statistics,
37, 4046–4087.
Benko, M., Härdle, W. and Kneip, A. (2009). Common functional principal com-
ponents. The Annals of Statistics, 37, 1–34.
Berkes, I., Gabrys, R., Horváth, L. and Kokoszka, P. (2009). Detecting changes in
the mean of functional observations. Journal of the Royal Statistical Society (B),
71, 927–946.
Berkes, I., Hörmann, S. and Horváth, L. (2008). The functional central limit theorem
for a family of GARCH observations with applications. Statistics and Probability
Letters, 78, 2725–2730.
Berkes, I., Hörmann, S. and Schauer, J. (2009). Asymptotic results for the empirical
process of stationary sequences. Stochastic Processes and their Applications,
119, 1298–1324.
Berkes, I. and Horváth, L. (2001). Strong approximation for the empirical process
of a GARCH sequence. The Annals of Applied Probability, 11, 789–809.
Berkes, I. and Horváth, L. (2003a). Asymptotic results for long memory LARCH
sequences. The Annals of Applied Probability, 13, 641–668.
Berkes, I. and Horváth, L. (2003b). Limit results for the empirical process of
squared residuals in GARCH models. Stochastic Processes and their Applica-
tions, 105, 279–298.
Berkes, I., Horváth, L. and Kokoszka, P. (2005). Near integrated GARCH
sequences. Annals of Applied Probability, 15, 890–913.
Berkes, I., Horváth, L., Kokoszka, P. and Shao, Q-M. (2005). Almost sure conver-
gence of the Bartlett estimator. Periodica Mathematica Hungarica, 51, 11–25.
Berkes, I., Horváth, L., Kokoszka, P. and Shao, Q-M. (2006). On discriminating
between long-range dependence and changes in mean. The Annals of Statistics,
34, 1140–1165.
Berkes, I., Horváth, L. and Kokoszka, P. S. (2003). GARCH processes: structure
and estimation. Bernoulli, 9, 201–227.
Besse, P., Cardot, H. and Stephenson, D. (2000). Autoregressive forecasting of some
functional climatic variations. Scandinavian Journal of Statistics, 27, 673–687.
Bhansali, R. J. (1993). Order selection for linear time series models: a review.
In Developments in Time Series Analysis, London (ed. T. Subba Rao), pp. 50–6.
Chapman and Hall.
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Billingsley, P. (1995). Probability and Measure, 3rd edn. Wiley, New York.
Billingsley, P. (1999). Convergence of Probability Measures; Second Edition.
Wiley, New York.
Boente, G. and Fraiman, R. (2000). Kernel–based functional principal components.
Statistics and Probability Letters, 48, 335–345.
Boente, G., Rodriguez, D. and Sued, M. (2011). Testing the equality of covariance
operators. In Recent Advances in Functional Data Analysis and Related Topics
(ed. F. Ferraty). Physica–Verlag.
Bolthausen, E. (1982). On the central limit theorem for stationary mixing random
fields. The Annals of Probability, 10, 1047–1050.
Borggaard, C. and Thodberg, H. (1992). Optimal minimal neural interpretation of
spectra. The Annals of Chemistry, 64, 545–551.
Bosq, D. (2000). Linear Processes in Function Spaces. Springer, New York.
Bosq, D. and Blanke, D. (2007). Inference and Prediction in Large Dimensions.
Wiley.
Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994). Time Series Analysis:
Forecasting and Control, Third edn. Prentice Hall, Englewood Cliffs.
Bradley, R. C. (2007). Introduction to Strong Mixing Conditions, volume 1,2,3.
Kendrick Press.
Bremer, J. (1998). Trends in the ionospheric E and F regions over Europe. Annales
Geophysicae, 16, 986–996.
Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods.
Springer, New York.
Brodsky, B. E. and Darkhovsky, B. S. (1993). Nonparametric Methods in Change–
Point Problems. Kluwer.
Cai, T. and Hall, P. (2006). Prediction in functional linear regression. The Annals
of Statistics, 34, 2159–2179.
Campbell, J. Y., Lo, A. W. and MacKinlay, A. C. (1997). The Econometrics of
Financial Markets. Princeton University Press, New Jersey.
Cardot, H., Faivre, R. and Goulard, M. (2003c). Functional approaches for predicting
land use with the temporal evolution of coarse resolution remote sensing data.
Journal of Applied Statistics, 30, 1185–1199.
Cardot, H., Ferraty, F., Mas, A. and Sarda, P. (2003). Testing hypothesis in the
functional linear model. Scandinavian Journal of Statistics, 30, 241–255.
Cardot, H., Ferraty, F. and Sarda, P. (2003b). Spline estimators for the functional
linear model. Statistica Sinica, 13, 571–591.
Carey, J. R., Liedo, P., Harshman, L., Müller, H. G., Partridge, L. and Wang, J. L.
(2002). Life history response of mediterranean fruit flies to dietary restrictions.
Aging Cell, 1, 140–148.
Carroll, S. S. and Cressie, N. (1996). A comparison of geostatistical methodologies
used to estimate snow water equivalent. Water Resources Bulletin, 32, 267–278.
Carroll, S. S., Day, G. N., Cressie, N. and Carroll, T. R. (1995). Spatial modeling
of snow water equivalent using airborne and ground–based snow data. Environ-
metrics, 6, 127–139.
Cattell, R. B. (1966). The scree test for the number of factors. Journal of Multivari-
ate Behavioral Research, 1, 245–276.
Chatfield, C. (1998). Durbin–Watson test. In Encyclopedia of Biostatistics (eds P.
Armitage and T. Colton), volume 2, pp. 1252–1253. Wiley.
Chiou, J-M. and Müller, H-G. (1998). Quasi–likelihood regression with unknown
link and variance functions. Journal of the American Statistical Association, 92,
72–83.
Chiou, J-M. and Müller, H-G. (2007). Diagnostics for functional regression via
residual processes. Computational Statistics and Data Analysis, 15, 4849–4863.
Chiou, J-M., Müller, H-G. and Wang, J-L. (2004). Functional response models.
Statistica Sinica, 14, 675–693.
Chiou, J-M., Müller, H-G., Wang, J-L. and Carey, J. R. (2003). A functional multiplicative
effects model for longitudinal data, with application to reproductive histories
of female medflies. Statistica Sinica, 13, 1119–1133.
Chitturi, R. V. (1976). Distribution of multivariate white noise autocorrelation.
Journal of the American Statistical Association, 71, number 353, 223–226.
Clarkson, D. B., Fraley, C., Gu, C. and Ramsay, J. O. (2005). S+ Functional Data
Analysis. Springer.
Cochrane, D. and Orcutt, G. H. (1949). Application of least squares regression
to relationships containing auto-correlated error terms. Journal of the American
Statistical Association, 44, 32–61.
Cook, D. R. (1977). Detection of influential observations in linear regression.
Technometrics, 19, 15–18.
Cook, R. D. (1994). On interpretation of regression plots. Journal of the American
Statistical Association, 89, 177–189.
Cook, R. D. and Weisberg, S. (1982). Residuals and Inference in Regression. Chap-
man and Hall.
Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley.
Csörgő, M. and Horváth, L. (1993). Weighted Approximations in Probability and
Statistics. Wiley, New York.
Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-Point Analysis.
Wiley, New York.
Cuevas, A., Febrero, M. and Fraiman, R. (2002). Linear functional regression: the
case of fixed design and functional response. The Canadian Journal of Statistics,
30, 285–300.
Cuevas, A., Febrero, M. and Fraiman, R. (2004). An ANOVA test for functional
data. Computational Statistics and Data Analysis, 47, 111–122.
Cupidon, J., Gilliam, D. S., Eubank, R. and Ruymgaart, F. (2007). The delta method
for analytic functions of random operators with application to functional data.
Bernoulli, 13, 1179–1194.
Daglis, I. A., Kozyra, J. U., Kamide, Y., Vassiliadis, D., Sharma, A. S., Liemohn,
M.W., Gonzalez, W. D., Tsurutani, B. T. and Lu, G. (2003). Intense space
storms: Critical issues and open disputes. Journal of Geophysical Research, 108,
doi:10.1029/2002JA009722.
DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer.
Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for principal
component analysis of a vector random function. Journal of Multivariate Analy-
sis, 12, 136–154.
Fraiman, R. and Muniz, G. (2001). Trimmed means for functional data. Test, 10,
419–440.
Fremdt, S., Horváth, L., Kokoszka, P. and Steinebach, J. (2011). Testing the equality
of covariance operators in functional samples. Technical report. Universität zu
Köln.
Gabrys, R., Horváth, L. and Kokoszka, P. (2010). Tests for error correlation in the
functional linear model. Journal of the American Statistical Association, 105,
1113–1125.
Gabrys, R. and Kokoszka, P. (2007). Portmanteau test of independence for func-
tional observations. Journal of the American Statistical Association, 102, 1338–
1348.
Gelfand, A. E., Diggle, P. J., Fuentes, M. and Guttorp, P. (2010) (eds). Handbook
of Spatial Statistics. CRC Press.
Gervini, D. (2008). Robust functional estimation using the spatial median and
spherical principal components. Biometrika, 95, 587–600.
Giraitis, L., Kokoszka, P. S. and Leipus, R. (2000). Stationary ARCH models:
dependence structure and Central Limit Theorem. Econometric Theory, 16, 3–
22.
Giraitis, L., Kokoszka, P. S., Leipus, R. and Teyssière, G. (2003). Rescaled variance
and related tests for long memory in volatility and levels. Journal of Economet-
rics, 112, 265–294.
Giraldo, R., Delicado, P. and Mateu, J. (2010). Ordinary kriging for function–valued
spatial data. Environmental and Ecological Statistics, 18, 411–426.
Giraldo, R., Delicado, P. and Mateu, J. (2011). A generalization of cokriging and
multivariable spatial prediction for functional data. Technical report. Universitat
Politécnica de Catalunya, Barcelona.
Gohberg, I., Golberg, S. and Kaashoek, M. A. (1990). Classes of Linear Operators.
Operator Theory: Advances and Applications, volume 49. Birkhäuser.
Gohberg, I. C. and Krein, M. C. (1969). Introduction to the Theory of Linear
Nonselfadjoint Operators in Hilbert Space. Translations of Mathematical Mono-
graphs. AMS.
Graham, A. (1981). Kronecker Products and Matrix Calculus with Applications.
John Wiley and Sons.
Griswold, C., Gomulkiewicz, R. and Heckman, N. (2008). Hypothesis testing in
comparative and experimental studies of function-valued traits. Evolution, 62,
1229–42.
Gromenko, O. and Kokoszka, P. (2011). Testing the equality of mean func-
tions of ionospheric critical frequency curves. Technical Report. Utah State
University.
Gromenko, O., Kokoszka, P., Zhu, L. and Sojka, J. (2011). Estimation and testing
for spatially indexed curves with application to ionospheric and magnetic field
trends. The Annals of Applied Statistics, Forthcoming.
Gu, C. (2002). Smoothing Spline ANOVA Models. Springer.
Guillaume, D. M., Dacorogna, M. M., Dave, R. D., Müller, U. A., Olsen, R. B. and
Pictet, O. V. (1997). From the bird’s eye to the microscope: a survey of new
stylized facts of the intra-daily foreign exchange markets. Finance and Stochas-
tics, 1, 95–129.
Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Applications.
Academic Press, New York.
Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal
components. Journal of the Royal Statistical Society (B), 68, 109–126.
Hall, P. and Hosseini-Nasab, M. (2007). Theory for high–order bounds in functional
principal components analysis. Technical Report. The University of Melbourne.
Hall, P. and Keilegom, I. Van (2007). Two–sample tests in functional data analysis
starting from discrete data. Statistica Sinica, 17, 1511–1531.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press, Princeton,
NJ.
Hannan, E. J. (1980). The estimation of the order of an ARMA process. The Annals
of Statistics, 8, 1071–1081.
Hannan, E. J. and Quinn, B. G. (1979). The determination of the order of an auto-
regression. J. Royal Statist. Soc. B, 41, 190–195.
Hannan, E. J. and Rissanen, J. (1982). Recursive estimation of mixed
autoregressive-moving average order. Biometrika, 69, 81–94; Correction (1983) 70,
303.
Hansen, B. E. (1995). Rethinking the univariate approach to unit root testing:
using covariates to increase power. Econometric Theory, 11, 1148–1171; Code
available at http://www.ssc.wisc.edu/~bhansen.
He, G., Müller, H-G. and Wang, J-L. (2003). Functional canonical analysis for
square integrable stochastic processes. Journal of Multivariate Analysis, 85,
54–77.
He, G., Müller, H-G. and Wang, J-L. (2004). Methods of canonical analysis for
functional data. Journal of Statistical Planning and Inference, 122, 141–159.
Hörmann, S. (2008). Augmented GARCH sequences: Dependence structure and
asymptotics. Bernoulli, 14, 543–561.
Hörmann, S., Horváth, L. and Reeder, R. (2010). A functional version of the ARCH
model. Technical Report. University of Utah.
Hörmann, S. and Kokoszka, P. (2010). Weakly dependent functional data. The
Annals of Statistics, 38, 1845–1884.
Hörmann, S. and Kokoszka, P. (2011). Consistency of the mean and the principal
components of spatially indexed functional data. Bernoulli, Forthcoming.
Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge University
Press.
Horn, R. A. and Johnson, C. R. (1991). Topics in Matrix Analysis. Cambridge
University Press.
Horváth, L., Horváth, Z. and Hušková, M. (2008). Ratio tests for change point
detection. In Beyond Parametrics in Interdisciplinary Research: Festschrift in
Honor of Professor Pranab K. Sen, IMS Collections, pp. 293–304. IMS.
Horváth, L., Hušková, M. and Kokoszka, P. (2010). Testing the stability of the
functional autoregressive process. Journal of Multivariate Analysis, 101, 352–
367.
Horváth, L., Kokoszka, P. and Reeder, R. (2011). Estimation of the mean of func-
tional time series and a two sample problem. Journal of the Royal Statistical
Society (B), Forthcoming.
Horváth, L., Kokoszka, P. and Reimherr, M. (2009). Two sample inference in
functional linear models. Canadian Journal of Statistics, 37, 571–591.
Horváth, L., Kokoszka, P. S. and Steinebach, J. (1999). Testing for changes in
multivariate dependent observations with applications to temperature changes.
Journal of Multivariate Analysis, 68, 96–119.
Horváth, L. and Reeder, R. (2011). Detecting changes in functional linear models.
Technical Report. University of Utah.
Horváth, L. and Reeder, R. (2011b). A test of significance in functional
quadratic regression. Technical Report. University of Utah. Preprint available
at http://arxiv.org/abs/1105.0014.
Hosking, J. R. M. (1980). The multivariate portmanteau statistic. Journal of the
American Statistical Association, 75, 602–608.
Hosking, J. R. M. (1981). Equivalent forms of the multivariate portmanteau statis-
tics. Journal of the Royal Statistical Society (B), 43, 261–262.
Hosking, J. R. M. (1984). Modeling persistence in hydrological time series using
fractional differencing. Water Resources Research, 20, number 12, 1898–1908.
Hosking, J. R. M. (1989). Corrigendum: Equivalent forms of the multivariate port-
manteau statistics. Journal of the Royal Statistical Society (B), 51, 303–303.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal
components. Journal of Educational Psychology, 24, 498–520.
Hušková, M., Prášková, Z. and Steinebach, J. (2007). On the detection of changes
in autoregressive time series I. Asymptotics. Journal of Statistical Planning and
Inference, 137, 1243–1259.
Izem, R. and Marron, J. S. (2007). Functional data analysis of nonlinear modes of
variation. Electronic Journal of Statistics, 1, 641–676.
Jach, A. and Kokoszka, P. (2008). Wavelet domain test for long–range dependence
in the presence of a trend. Statistics, 42, 101–113.
Jach, A., Kokoszka, P., Sojka, J. and Zhu, L. (2006). Wavelet–based index of
magnetic storm activity. Journal of Geophysical Research, 111, A09215.
Jenish, N. and Prucha, I. R. (2009). Central limit theorems and uniform laws of
large numbers for arrays of random fields. Journal of Econometrics, 150, 86–98.
Jiofack, J. G. A. and Nkiet, G. M. (2010). Testing for lack of dependence between
functional variables. Statistics and Probability Letters, 80, 1210–1217.
Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analy-
sis. Prentice Hall.
Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for
principal components analysis in high dimensions. Journal of the American
Statistical Association, 104, 682–693.
Jolliffe, I. T. (2002). Principal Component Analysis. Springer.
Jones, M. C. and Rice, J. A. (1992). Displaying the important features of a large
collection of similar curves. The American Statistician, 46, 140–145.
Jung, S. and Marron, J. S. (2009). PCA consistency in high dimension and low
sample size. The Annals of Statistics, 37, 4104–4130.
Kaiser, M. S., Daniels, M. J., Furukawa, K. and Dixon, P. (2002). Analysis of
particulate matter air pollution using Markov random field models of spatial
dependence. Environmetrics, 13, 615–628.
Kamide, Y., Baumjohann, W., Daglis, I. A., Gonzalez, W. D., Grande, M.,
Joselyn, J. A., McPherron, R. L., Phillips, J. L., Reeves, E. G. D., Rostoker, G.,
Sharma, A. S., Singer, H. J., Tsurutani, B. T. and Vasyliunas, V. M. (1998).
Current understanding of magnetic storms: Storm–substorm relationships. Journal
of Geophysical Research, 103, 17705–17728.
Kargin, V. and Onatski, A. (2008). Curve forecasting by functional autoregression.
Journal of Multivariate Analysis, 99, 2508–2526.
Kiefer, J. (1959). K-sample analogues of the Kolmogorov-Smirnov and
Cramér-v. Mises tests. The Annals of Mathematical Statistics, 30, 420–447.
Kirkpatrick, M. and Heckman, N. (1989). A quantitative genetic model for growth,
shape, reaction norms and other infinite–dimensional characters. Journal of
Mathematical Biology, 27, 429–450.
Kivelson, M. G. and Russell, C. T. (1997) (eds). Introduction to Space Physics.
Cambridge University Press.
Kokoszka, P., Maslova, I., Sojka, J. and Zhu, L. (2008). Testing for lack of depen-
dence in the functional linear model. Canadian Journal of Statistics, 36, 207–222.
Kokoszka, P. and Reimherr, M. (2011). Determining the order of the functional
autoregressive model. Technical Report. University of Chicago.
Kokoszka, P. and Zhang, X. (2010). Improved estimation of the kernel of the
functional autoregressive process. Technical Report. Utah State University.
Kokoszka, P. and Zhang, X. (2011). Functional prediction of cumulative intraday
returns. Technical Report. Utah State University.
Koul, H. L. (2002). Weighted Empirical Processes in Dynamic Nonlinear Models.
Springer.
Kraus, D. and Panaretos, V. M. (2011). Statistical inference on the second–order
structure of functional data in the presence of influential observations. Technical
report. École Polytechnique Fédérale de Lausanne.
Kuelbs, J. (1973). The invariance principle for Banach space valued random vari-
ables. Journal of Multivariate Analysis, 3, 161–172.
Lahiri, S. N. (1996). On inconsistency of estimators based on spatial data under
infill asymptotics. Sankhya, Series A, 58, 403–417.
Lahiri, S. N. (2003). Central limit theorems for weighted sums of a spatial process
under a class of stochastic and fixed designs. Sankhya, Series A, 65, 356–388.
Lahiri, S. N. and Zhu, J. (2006). Resampling methods for spatial regression models
under a class of stochastic designs. Annals of Statistics, 34, 1774–1813.
Lastovicka, J., Mikhailov, A. V., Ulich, T., Bremer, J., Elias, A., Ortiz de Adler, N.,
Jara, V., Abarca del Rio, R., Foppiano, A., Ovalle, E. and Danilov, A. (2006).
Long-term trends in foF2: a comparison of various methods. Journal of
Atmospheric and Solar-Terrestrial Physics, 68, 1854–1870.
Lastovicka, J., Akmaev, R. A., Beig, G., Bremer, J., Emmert, J. T., Jacobi, C., Jarvis,
J. M., Nedoluha, G., Portnyagin, Yu. I. and Ulich, T. (2008). Emerging pattern
of global change in the upper atmosphere and ionosphere. Annales Geophysicae,
26, 1255–1268.
Laukaitis, A. and Račkauskas, A. (2002). Functional data analysis of payment
systems. Nonlinear Analysis: Modeling and Control, 7, 53–68.
Laukaitis, A. and Račkauskas, A. (2005). Functional data analysis for clients seg-
mentation tasks. European Journal of Operational Research, 163, 210–216.
Lavielle, M. and Teyssière, G. (2006). Detection of multiple change-points in
multivariate time series. Lithuanian Mathematical Journal, 46, 287–306.
Lehmann, E. L. (1999). Elements of Large Sample Theory. Springer.
Leng, X. and Müller, H-G. (2006). Classification using functional data analysis for
temporal gene expression data. Bioinformatics, 22, 68–76.
Leon, S. (2006). Linear Algebra with Applications. Pearson.
Leurgans, S. E., Moyeed, R. A. and Silverman, B. W. (1993). Canonical correlation
analysis when the data are curves. Journal of the Royal Statistical Society (B),
55, 725–740.
Li, W. K. and McLeod, A. I. (1981). Distribution of the residual autocorrelations in
multivariate ARMA time series models. Journal of the Royal Statistical Society
(B), 43, 231–239.
Li, Y. and Hsing, T. (2007). On rates of convergence in functional linear regression.
Journal of Multivariate Analysis, 98, 1782–1804.
Li, Y. and Hsing, T. (2010). Deciding the dimension of effective dimension reduc-
tion space for functional and high-dimensional data. The Annals of Statistics, 38,
3028–3062.
Li, Y., Wang, N. and Carroll, R. J. (2010). Generalized functional linear models with
semiparametric single–index interactions. Journal of the American Statistical
Association, 105, 621–633.
Liu, W. and Wu, W. B. (2010). Asymptotics of spectral density estimates. Econo-
metric Theory, 26, 1218–1245.
Ljung, G. and Box, G. (1978). On a measure of lack of fit in time series models.
Biometrika, 65, 297–303.
Loh, W.-L. (2005). Fixed-domain asymptotics for a subclass of Matérn-type
Gaussian random fields. Annals of Statistics, 33, 2344–2394.
López-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data.
Journal of the American Statistical Association, 104, 718–734.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer.
Ma, P. and Zhong, W. (2008). Penalized clustering of large scale functional data
with multiple covariates. Journal of the American Statistical Association, 103,
625–636.
Malfait, N. and Ramsay, J. O. (2003). The historical functional model. Canadian
Journal of Statistics, 31, 115–128.
Mas, A. (2002). Weak convergence for the covariance operators of a Hilbertian
linear process. Stochastic Processes and their Applications, 99, 117–135.
Maslova, I., Kokoszka, P., Sojka, J. and Zhu, L. (2009). Removal of nonconstant
daily variation by means of wavelet and functional data analysis. Journal of
Geophysical Research, 114, A03202.
Maslova, I., Kokoszka, P., Sojka, J. and Zhu, L. (2010a). Estimation of Sq varia-
tion by means of multiresolution and principal component analyses. Journal of
Atmospheric and Solar–Terrestrial Physics, 72, 625–632.
Maslova, I., Kokoszka, P., Sojka, J. and Zhu, L. (2010b). Statistical significance
testing for the association of magnetometer records at high–, mid– and low lati-
tudes during substorm days. Planetary and Space Science, 58, 437–445.
McKeague, I. and Sen, B. (2010). Fractals with point impact in functional linear
regression. The Annals of Statistics, 38, 2559–2586.
McMurry, T. and Politis, D. N. (2010). Resampling methods for functional data. In
Oxford Handbook on Statistics and FDA (eds F. Ferraty and Y. Romain). Oxford
University Press.
Mikhailov, A. V. and Marin, D. (2001). An interpretation of the foF2 and hmF2
long-term trends in the framework of the geomagnetic control concept. Annales
Geophysicae, 19, 733–748.
Móricz, F. (1976). Moment inequalities and the strong law of large numbers.
Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 35, 299–314.
Müller, H-G. and Yao, F. (2008). Functional additive models. Journal of the
American Statistical Association, 103, 1534–1544.
Müller, H.-G. (2009). Functional modeling of longitudinal data. In Longitudi-
nal Data Analysis (eds G. Fitzmaurice, M. Davidian, G. Verbeke and G. Molen-
berghs), pp. 223–252. Wiley, New York.
Müller, H-G. and Stadtmüller, U. (2005). Generalized functional linear models. The
Annals of Statistics, 33, 774–805.
Nerini, D., Monestiez, P. and Manté, C. (2010). Cokriging for spatial functional
data. Journal of Multivariate Analysis, 101, 409–418.
Newey, W. K. and West, K. D. (1987). A simple, positive semi-definite, het-
eroskedasticity and autocorrelation consistent covariance matrix. Econometrica,
55, 703–708.
Noble, B. (1969). Applied Linear Algebra. Prentice Hall, Englewood Cliffs, NJ.
Opsomer, J., Wang, Y. and Yang, Y. (2001). Nonparametric regression with
correlated errors. Statistical Science, 16, 134–153.
Panaretos, V. M., Kraus, D. and Maddocks, J. H. (2010). Second-order comparison
of Gaussian random functions and the geometry of DNA minicircles. Journal of
the American Statistical Association, 105, 670–682.
Park, B. U., Kim, T. Y., Park, J-S. and Hwang, S. Y. (2009). Practically applicable
central limit theorem for spatial statistics. Mathematical Geosciences, 41, 555–
569.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space.
Philosophical Magazine, 2, 559–572.
Percival, D. B. and Walden, A. T. (2000). Wavelet Methods for Time Series Analysis.
Cambridge University Press, Cambridge.
L. Horváth and P. Kokoszka, Inference for Functional Data with Applications,
Springer Series in Statistics 200, DOI 10.1007/978-1-4614-3655-3,
© Springer Science+Business Media New York 2012