Functional Data Analysis with R and Matlab

Spencer Graves
Productive Systems Engineering
751 Emerson Ct.
San Jose, CA 95126
USA
[email protected]

Series Editors:
Robert Gentleman, Program in Computational Biology, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., M2-B876, Seattle, Washington 98109, USA
Kurt Hornik, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, A-1090 Wien, Austria
Giovanni Parmigiani, The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, 550 North Broadway, Baltimore, MD 21205-2011, USA
Preface
level languages, and above all its cost explain its popularity in many fields served
by statisticians, students and new researchers. We hope that we can help many of
our readers to appreciate the strengths of each language, so as to invest wisely later
on. Secondarily, we hope that any user of either language wanting to learn the other
can benefit from seeing the same analyses done in both languages.
As with most books in this useR! series, this is not the place to gain enough
technical knowledge to claim expertise in functional data analysis nor to develop
new tools. But we do hope that some readers will find enough of value here to
want to turn to monographs on functional data analysis already published, such as
Ramsay and Silverman (2005), and to even newer works.
We wish to end this preface by thanking our families, friends, students, employ-
ers, clients and others who have helped make us what we are today and thereby
contributed to this book and to our earlier efforts. In particular, we wish to thank
John Kimmel of Springer for organizing this series and inviting us to create this
book.
James Ramsay, McGill University
Giles Hooker, Cornell University
Spencer Graves, San Jose, CA
Chapter 1
Introduction to Functional Data Analysis
The main characteristics of functional data and of functional models are introduced.
Data on the growth of girls illustrate samples of functional observations, and data
on the US nondurable goods manufacturing index are an example of a single long
multilayered functional observation. Data on the gait of children and handwriting
are multivariate functional observations. Functional data analysis also involves esti-
mating functional parameters describing data that are not themselves functional, and
estimating a probability density function for rainfall data is an example. A theme in
functional data analysis is the use of information in derivatives, and examples are
drawn from growth and weather data. The chapter also introduces the important
problem of registration: aligning functional features.
The use of code is not taken up in this chapter, but R code to reproduce virtually
all of the examples (and figures) appears in the file "fdarm-ch01.R" in the "scripts"
subdirectory of the companion "fda" package for R, though without extensive explana-
tion in this chapter of why we used a specific command sequence.
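For readers who want to follow along, the following sketch (not from the book) shows one way to locate that script from within R, assuming the fda package has been installed from CRAN:

library(fda)
scriptsDir <- system.file("scripts", package = "fda")
list.files(scriptsDir)     # should include "fdarm-ch01.R" among the chapter scripts
# source(file.path(scriptsDir, "fdarm-ch01.R"))   # run the Chapter 1 examples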
Figure 1.1 provides a prototype for the type of data that we shall consider. It shows
the heights of 10 girls measured at a set of 31 ages in the Berkeley Growth Study
(Tuddenham and Snyder, 1954). The ages are not equally spaced; there are four
measurements while the child is one year old, annual measurements from two to
eight years, followed by heights measured biannually. Although great care was taken
in the measurement process, there is an average uncertainty in height values of at
least three millimeters. Even though each record is a finite set of numbers, their
values reflect a smooth variation in height that could be assessed, in principle, as
often as desired, and is therefore a height function. Thus, the data consist of a sample
of 10 functional observations $\mathrm{Height}_i(t)$.
Fig. 1.1 The heights of 10 girls measured at 31 ages. The circles indicate the unequally spaced ages of measurement.
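A plot like Figure 1.1 can be produced with a few lines of R. This sketch is not the book's own code; it assumes the growth data set distributed with the fda package, whose component hgtf holds the Berkeley girls' heights, and simply plots the first 10 columns:

library(fda)
age  <- growth$age            # the 31 unequally spaced ages of measurement
hgtf <- growth$hgtf[, 1:10]   # heights (cm) of the first 10 girls
matplot(age, hgtf, type = "b", pch = "o", lty = 1,
        xlab = "Age (years)", ylab = "Height (cm)")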
There are features in these data too subtle to see in this type of plot. Figure 1.2
displays the acceleration curves $D^2\mathrm{Height}_i$ estimated from these data by Ramsay
et al. (1995a) using a technique discussed in Chapter 5. We use the notation D for
differentiation, as in
$$D^2\,\mathrm{Height} = \frac{d^2\,\mathrm{Height}}{dt^2}.$$
The pubertal growth spurt shows up as a pulse of strong positive acceleration
followed by sharp negative deceleration. But most records also show a bump at
around six years that is termed the midspurt. We therefore conclude that some of
the variation from curve to curve can be explained at the level of certain derivatives.
The fact that derivatives are of interest is further reason to think of the records as
functions rather than vectors of observations in discrete time.
The ages are not equally spaced, and this affects many of the analyses that might
come to mind if they were. For example, although it might be mildly interesting to
correlate heights at ages 9, 10 and 10.5, this would not take account of the fact that
we expect the correlation for two ages separated by only half a year to be higher
than that for a separation of one year. Indeed, although in this particular example
the ages at which the observations are taken are nominally the same for each girl,
there is no real need for this to be so. In general, the points at which the functions
are observed may well vary from one record to another.
Fig. 1.2 The estimated accelerations of height for 10 girls, measured in centimeters per year per year. The heavy dashed line is the cross-sectional mean and is a rather poor summary of the curves.
The replication of these height curves invites an exploration of the ways in which
the curves vary. This is potentially complex. For example, the rapid growth during
puberty is visible in all curves, but both the timing and the intensity of pubertal
growth differ from girl to girl. Some type of principal components analysis would
undoubtedly be helpful, but we must adapt the procedure to take account of the
unequal age spacing and the smoothness of the underlying height functions.
It can be important to separate variation in timing of significant growth events,
such as the pubertal growth spurt, from variation in the intensity of growth. We will
look at this in detail in Chapter 8 where we consider curve registration.
Not all functional data involve independent replications; we often have to work
with a single long record. Figure 1.3 shows an important economic indicator: the
nondurable goods manufacturing index for the United States. Data like these often
show variation at multiple levels.
There is a tendency for the index to show geometric or exponential increase over
the whole century, and plotting the logarithm of the data in Figure 1.4 makes this
trend appear linear while giving us a better picture of other types of variation. At
a finer scale, we see departures from this trend due to the depression, World War
II, the end of the Vietnam War and other more localized events. Moreover, at an
Fig. 1.3 The monthly nondurable goods manufacturing index for the United States.
even finer scale, there is a marked annual variation, and we can wonder whether
this seasonal trend itself shows some longer-term changes. Although there are no
independent replications here, there is still a lot of repetition of information that we
can exploit to obtain stable estimates of interesting curve features.
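The index is available as the time series nondurables in the fda package, so versions of Figures 1.3 and 1.4 can be sketched as follows (an illustration under that assumption, not code from the book):

library(fda)
plot(nondurables, xlab = "Year", ylab = "Nondurable Goods Index")
plot(log10(nondurables), xlab = "Year",
     ylab = "Log10 Nondurable Goods Index")   # the century-long trend now looks roughly linear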
Functional data also arise as input/output pairs, such as in the data in Figure 1.5
collected at an oil refinery in Texas. The amount of a petroleum product at a certain
level in a distillation column or cracking tower, shown in the top panel, reacts to
the change in the flow of a vapor into the tray, shown in the bottom panel, at that
level. How can we characterize this dependency? More generally, what tools can we
devise that will show how a system responds to changes in critical input functions
as well as other covariates?
Fig. 1.4 The logarithm of the nondurable goods manufacturing index for the United States.
Functional data are often multivariate. Our third example is in Figure 1.6. The Mo-
tion Analysis Laboratory at Children’s Hospital, San Diego, CA, collected these
data, which consist of the angles formed by the hip and knee of each of 39 children
over each child’s gait cycle. See Olshen et al. (1989) for full details. Time is mea-
sured in terms of the individual gait cycle, which we have translated into values of
t in [0, 1]. The cycle begins and ends at the point where the heel of the limb under
observation strikes the ground. Both sets of functions are periodic and are plotted as
dotted curves somewhat beyond the interval for clarity. We see that the knee shows
a two-phase process, while the hip motion is single-phase. It is harder to see how
the two joints interact: The figure does not indicate which hip curve is paired with
which knee curve. This example demonstrates the need for graphical ingenuity in
functional data analysis.
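As a rough sketch (not code from the book), the two panels of Figure 1.6 can be approximated from the gait array supplied with the fda package; the assumptions here are that its first dimension indexes time points within the cycle and that its third dimension holds hip and knee angles, in that order:

library(fda)
gaittime <- seq(0, 1, length.out = dim(gait)[1])   # approximate positions within the cycle
matplot(gaittime, gait[, , 1], type = "l", lty = 1,
        xlab = "Proportion of gait cycle", ylab = "Hip angle (degrees)")
matplot(gaittime, gait[, , 2], type = "l", lty = 1,
        xlab = "Proportion of gait cycle", ylab = "Knee angle (degrees)")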
Figure 1.7 shows the gait cycle for a single child by plotting knee angle against
hip angle as time progresses round the cycle. The periodic nature of the process
implies that this forms a closed curve. Also shown for reference purposes is the
same relationship for the average across the 39 children. An interesting feature in
this plot is the cusp occurring at the heel strike as the knee momentarily reverses
its extension to absorb the shock. The angular velocity is clearly visible in terms
of the spacing between numbers, and it varies considerably as the cycle proceeds.
Fig. 1.5 The top panel shows 193 measurements of the amount of petroleum product at tray level 47 in a distillation column in an oil refinery. The bottom panel shows the flow of a vapor into that tray during an experiment.
Fig. 1.6 The angles in the sagittal plane formed by the hip and knee as 39 children go through a gait cycle. The interval [0, 1] is a single cycle, and the dotted curves show the periodic extension of the data beyond either end of the cycle.
The child whose gait is represented by the solid curve differs from the average in
two principal ways. First, the portion of the gait pattern in the C–D part of the cycle
shows an exaggeration of movement relative to the average. Second, in the part
of the cycle where the hip is most bent, this bend is markedly less than average;
interestingly, this is not accompanied by any strong effect on the knee angle. The
overall shape of the cycle for this particular child is rather different from the average.
The exploration of variability in these functional data must focus on features such
as these.
Fig. 1.7 Solid line: The angles in the sagittal plane formed by the hip and knee for a single child plotted against each other. Dotted line: The corresponding plot for the average across children. The points indicate 20 equally spaced time points in the gait cycle. The letters are plotted at intervals of one fifth of the cycle with A marking the heel strike.
Multivariate functional data often arise from tracking the movements of points
through space, as illustrated in Figure 1.8, where the X-Y coordinates of 20 samples
of handwriting are superimposed. The role of time is lost in plots such as these, but
can be recovered to some extent by plotting points at regular time intervals.
Figure 1.9 shows the first sample of the writing of “statistical science” in sim-
plified Chinese with gaps corresponding to the pen being lifted off the paper. Also
plotted are points at 120-millisecond intervals; many of these points seem to coin-
cide with points of sharp curvature and the ends of strokes.
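A plot in the spirit of Figure 1.8 can be sketched from the handwrit array in the fda package (20 replications of a short handwriting sample, with the X and Y pen coordinates in its third dimension); this is an illustration under those assumptions, not the book's own code:

library(fda)
matplot(handwrit[, , 1], handwrit[, , 2], type = "l", lty = 1,
        xlab = "X", ylab = "Y", asp = 1)   # asp = 1 keeps the script undistorted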
Fig. 1.8 Twenty samples of handwriting. The axis units are centimeters.
Fig. 1.9 The first sample of writing “statistical science” in simplified Chinese. The plotted points correspond to 120-millisecond time steps.
Finally, in this introduction to types of functional data, we must not forget that
they may come to our attention as full-blown functions, so that each record may
consist of functions observed, for all practical purposes, everywhere. Sophisticated
online sensing and monitoring equipment now routinely used in research in fields
such as medicine, seismology, meteorology and physiology can record truly func-
tional data.
The data examples above seem to deserve the label “functional” since they so clearly
reflect the smooth curves that we assume generated them. Beyond this, functional
data analysis tools can be used for many data sets that are not so obviously func-
tional.
Consider the problem of estimating a probability density function p to describe
the distribution of a sample of observations x1 , . . . , xn . The classic approach to this
problem is to propose, after considering basic principles and closely studying the
data, a parametric model with values p(x|θ ) defined by a fixed and usually small
number of parameters in the vector θ . For example, we might consider the normal
distribution as appropriate for the data, so that $\theta = (\mu, \sigma^2)'$. The parameters themselves are usually chosen to be descriptors of the shape of the density, as in location
and spread for the normal density, and are therefore the focus of the analysis.
But suppose that we do not want to assume in advance one of the many textbook
density functions. We may feel, for example, that the application cannot justify the
assumptions required for using any of the standard distributions. Or we may see
features in histograms and other graphical displays that seem not to be captured by
any of the most popular distributions. Nonparametric density estimation methods
assume only smoothness, and permit as much flexibility in the estimated p(x) as the
data require or the data analyst desires. To be sure, parameters are often involved,
as in the density estimation method of Chapter 5, but the number of parameters is
not fixed in advance of the data analysis, and our attention is focused on the density
function p itself, not on parameter estimates. Much of the technology for estimation
of smooth functional parameters was originally developed and honed in the density
estimation context, and Silverman (1986) can be consulted for further details.
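The contrast between the parametric and nonparametric points of view can be seen with base R alone. The following sketch is not from the book and uses base R's kernel estimator rather than the roughness-penalty method of Chapter 5; it fits a two-parameter normal density and a kernel density estimate to the same simulated skewed sample:

set.seed(1)
x <- rgamma(200, shape = 3, rate = 1)   # a skewed sample that a normal fit will miss
hist(x, freq = FALSE, main = "", xlab = "x")
lines(density(x), lwd = 2)              # nonparametric (kernel) density estimate
m <- mean(x); s <- sd(x)
curve(dnorm(x, m, s), add = TRUE, lty = 2)   # the fitted normal density, for comparison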
Psychometrics or mental test theory also relies heavily on functional models for
seemingly nonfunctional data. The data are usually zeros and ones indicating un-
successful and correct answers to test items, but the model consists of a set of item
response functions, one per test item, displaying the smooth relationship between
the probability of success on an item and a presumed latent ability continuum. Fig-
ure 1.10 shows three such functional parameters for a test of mathematics estimated
by the functional data analytic methods reported in Rossi et al. (2002).
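For comparison with the nonparametric estimates in Figure 1.10, a classic parametric item response function is the two-parameter logistic model. This small sketch is not from the book; the discrimination a and difficulty b are arbitrary illustrative values:

irf <- function(theta, a = 1.5, b = 0) plogis(a * (theta - b))   # two-parameter logistic
theta <- seq(-2, 2, length.out = 101)
plot(theta, irf(theta), type = "l", xlab = expression(theta),
     ylab = "Probability of a correct response")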
Fig. 1.10 Each panel shows an item response function relating an examinee’s position θ on a latent ability continuum to the probability of a correct response to an item in a mathematics test.
$$\mathrm{Temp}_i(t) \approx c_{i1} + c_{i2}\sin(\pi t/6) + c_{i3}\cos(\pi t/6) \qquad (1.1)$$
should do rather nicely for these data, where $\mathrm{Temp}_i$ is the temperature function for the ith weather station, and $(c_{i1}, c_{i2}, c_{i3})$ is a vector of three parameters associated with that station.
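Model (1.1) is an ordinary linear model in the three coefficients, so it can be fit by least squares. The sketch below is not from the book; it assumes the CanadianWeather data in the fda package include a matrix monthlyTemp of mean monthly temperatures with station names as column labels:

library(fda)
month <- 1:12                                    # so the period is 12 months
y <- CanadianWeather$monthlyTemp[, "Montreal"]
fit <- lm(y ~ sin(pi * month / 6) + cos(pi * month / 6))
coef(fit)                                        # estimates of c_i1, c_i2 and c_i3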
In fact, there are clear departures from sinusoidal or simple harmonic behavior.
One way to see this is to compute the function
$$L\,\mathrm{Temp} = (\pi/6)^2 D\,\mathrm{Temp} + D^3\,\mathrm{Temp}. \qquad (1.2)$$
Fig. 1.11 Mean temperatures (deg C) over the year at four Canadian weather stations: Montreal (M), Edmonton (E), Prince Rupert (P) and Resolute (R).
The notation $D^m\mathrm{Temp}$ means “take the mth derivative of function Temp,” and the notation $L\mathrm{Temp}$ stands for the function which results from applying the linear differential operator $L = (\pi/6)^2 D + D^3$ to the function Temp. The resulting function,
LTemp, is often called a forcing function. If a temperature function is truly sinusoidal, then LTemp should be exactly zero, as it would be for any function of the form (1.1). That is, it would conform to the differential equation
$$L\,\mathrm{Temp} = (\pi/6)^2 D\,\mathrm{Temp} + D^3\,\mathrm{Temp} = 0.$$
But Figure 1.12 indicates that the functions LTempi display systematic features
that are especially strong in the summer and autumn months. Put another way, tem-
perature at a particular weather station can be described as the solution of the non-
homogeneous differential equation corresponding to LTemp = u, where the forcing
function u can be viewed as input from outside of the system, or as an exogenous
influence. Meteorologists suggest, for example, that these spring and autumn effects
are partly due to the change in the reflectance of land when snow or ice melts, and
this would be consistent with the fact that the least sinusoidal records are associated
with continental stations well separated from large bodies of water.
Here, the point is that we may often find it interesting to remove effects of a sim-
ple character by applying a differential operator, rather than by simply subtracting
them. This exploits the intrinsic smoothness in the process. Long experience in the
natural and engineering sciences suggests that this may get closer to the underlying
driving forces at work than just adding and subtracting effects, as is routinely done
in multivariate data analysis. We will consider this idea in depth in Chapter 11.
Fig. 1.12 The result of applying the differential operator $L = (\pi/6)^2 D + D^3$ to the estimated temperature functions in Figure 1.11. If the variation in temperature were purely sinusoidal, these curves would be exactly zero.
Assuming that a functional datum for replication i arrives as a finite set of mea-
sured values, yi1 , . . . , yin , the first task is to convert these values to a function xi with
values xi (t) computable for any desired argument value t. If these observations are
assumed to be errorless, then the process is interpolation, but if they have some
observational error that needs removing, then the conversion from (finite) data to
functions (which can theoretically be evaluated at an infinite number of points) may
involve smoothing.
Chapter 5 offers a survey of these procedures. The roughness penalty smoothing
method discussed there will be used much more broadly in many contexts through-
out the book, and not merely for the purpose of estimating a function from a set of
observed values. The daily precipitation data for Prince Rupert, one of the wettest
places on the continent, is shown in Figure 1.13. The curve in the figure, which
seems to capture the smooth variation in precipitation, was estimated by penaliz-
ing the squared deviations in harmonic acceleration as measured by the differential
operator (1.2).
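A sketch of such a fit follows; it is not the book's own code (which appears in Chapter 5), it assumes the CanadianWeather data in the fda package, and the smoothing parameter lambda = 1e5 is only an illustrative value:

library(fda)
day  <- (1:365) - 0.5
prec <- CanadianWeather$dailyAv[, "Pr. Rupert", "Precipitation.mm"]
daybasis <- create.fourier.basis(c(0, 365), 365)
# penalize harmonic acceleration, the operator in (1.2) adapted to a period of 365 days
harmaccelLfd <- vec2Lfd(c(0, (2*pi/365)^2, 0), c(0, 365))
precfdPar <- fdPar(daybasis, harmaccelLfd, lambda = 1e5)
precfd <- smooth.basis(day, prec, precfdPar)$fd
plot(precfd, xlab = "Day", ylab = "Precipitation (mm)")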
The gait data in Figure 1.6 were converted to functions by the simplest of interpo-
lation schemes: joining each pair of adjacent observations by a straight line segment.
This approach would be inadequate if we required derivative information. However,
Fig. 1.13 The points indicate average daily rainfall at Prince Rupert on the northern coast of British Columbia. The curve was fit to these data using a roughness penalty method.
one might perform a certain amount of smoothing while still respecting the period-
icity of the data by fitting a Fourier series to each record: A constant plus three pairs
of sine and cosine terms does a reasonable job for these data. The growth data in
Figures 1.1, 1.2, and 1.15 were fit using smoothing splines. The temperature data
in Figure 1.11 were fit by smoothing with a finite Fourier series. This more sophisticated
technique can also provide high-quality derivative information.
There are often conceptual constraints on the functions that we estimate. For
example, a smooth of precipitation such as that in Figure 1.13 should logically never
be negative. There is no danger of this happening for a station as moist as Prince
Rupert, but a smooth of the data in Resolute, the driest place that we have data for,
can easily violate this constraint. The growth curve fits should be strictly increasing,
and we shall see that imposing this constraint results in a rather better estimate of the
acceleration curves that we saw in Figure 1.2. Chapter 5 shows how to fit a variety
of constrained functions to data.
Figure 1.14 shows some biomechanical data. The curves in the figure are 20 records
of the force exerted on a meter during a brief pinch by the thumb and forefinger.
The subject was required to maintain a certain background force on a force meter
and then to squeeze the meter aiming at a specified maximum value, returning af-
Fig. 1.14 Twenty recordings of the force exerted by the thumb and forefinger where a constant background force of 2 newtons was maintained prior to a brief impulse targeted to reach 10 newtons. Force was sampled 500 times per second.
terwards to the background level. The purpose of the experiment was to study the
neurophysiology of the thumb–forefinger muscle group. The data were collected
at the MRC Applied Psychology Unit, Cambridge, by R. Flanagan (Ramsay et al.
1995b).
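These records are available in the fda package as pinch (the forces) and pinchtime (the sampling times in seconds), so a version of Figure 1.14 can be sketched as follows (an illustration under those assumptions, not code from the book):

library(fda)
matplot(pinchtime, pinch, type = "l", lty = 1,
        xlab = "Seconds", ylab = "Force (N)")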
These data illustrate a common problem in functional data analysis. The start of
the pinch is located arbitrarily in time, and a first step is to align the records by
some shift of the time axis. In Chapter 8 we take up the question of how to estimate
this shift and how to go further if necessary to estimate record-specific linear or
nonlinear transformations of the argument.
Displaying the results of a functional data analysis can be a challenge. With the gait
data in Figures 1.6 and 1.7, we have already seen that different displays of data
can bring out different features of interest, and the standard plot of x(t) against t is
not necessarily the most informative. It is impossible to be prescriptive about the
best type of plot for a given set of data or procedure, but we shall give illustrations
of various ways of plotting the results. These are intended to stimulate the reader’s
imagination rather than to lay down rigid rules.
Fig. 1.15 The second derivative or acceleration curves are plotted against the first derivative or velocity curves for the ten female growth curves in Figure 1.1. Each curve begins in time off the lower right with the strong velocity and deceleration of infant growth. The velocities and accelerations at age 11.7 years, the average age of the middle of the growth spurt, are marked on each curve by circles. The curve highlighted by a heavy dashed line is that of a girl who goes through puberty at the average age.
The examples considered so far offer a glimpse of ways in which the variability of
a set of functional data can be interesting, but there is a need for more detailed and
sophisticated ways of investigating variability. These are a major theme of this book.
Any data analysis begins with the basics: estimating means and standard deviations.
Functional versions of these elementary statistics are given in Chapter 7. But what
is elementary for univariate and classic multivariate data turns out to be not always
so simple for functional data. Chapter 8 returns to the functional data summary
problem, and shows that curve registration or feature alignment may have to be
applied in order to separate amplitude variation from phase variation before these
statistics are used.
Most sets of data display a small number of dominant or substantial modes of vari-
ation, even after subtracting the mean function from each observation. An approach
to identifying and exploring these, set out in Chapter 7, is to adapt the classic mul-
tivariate procedure of principal components analysis to functional data. Techniques
of smoothing are incorporated into the functional principal components analysis
itself, thereby demonstrating that smoothing methods have a far wider rôle in func-
tional data analysis than merely in the initial step of converting a finite number of
observations to functional form.
How do two or more sets of records covary or depend on one another? While study-
ing Figure 1.7, we might consider how correlations embedded in the record-to-
record variations in hip and knee angles might be profitably examined and used
to further our understanding the biomechanics of walking.
The functional linear modeling framework approaches this question by consid-
ering one of the sets of functional observations as a covariate and the other as a
response variable. In many cases, however, it does not seem reasonable to impose
this kind of asymmetry. We shall develop two rather different methods that treat
both sets of variables in an even-handed way. One method essentially treats the
pair (Hipi , Kneei ) as a single vector-valued function, and then extends the func-
tional principal components approach to perform an analysis. Chapter 7 takes an-
other approach, a functional version of canonical correlation analysis, identifying
components of variability in each of the two sets of observations which are highly
correlated with one another.
For many of the methods we discuss, a naïve approach extending the classic mul-
tivariate method will usually give reasonable results, though regularization will of-
ten improve these. However, when a linear predictor is based on a functional obser-
vation, and also in functional canonical correlation analysis, imposing smoothness
on functional regression coefficients is not an optional extra, but rather an intrinsic
and necessary part of the analysis; the reasons are discussed in Chapters 7 and 8.
The techniques of linear regression, analysis of variance, and linear modeling all
investigate the way in which variability in observed data can be accounted for by
other known or observed variables. They can all be placed within the framework of
the linear model
y = Zβ + ε (1.3)
where, in the simplest case, y is typically a vector of observations, β is a parameter
vector, Z is a matrix that defines a linear transformation from parameter space to
observation space, and ε is an error vector with mean zero. The design matrix Z
incorporates observed covariates or independent variables.
To extend these ideas to the functional context, we retain the basic structure (1.3)
but allow more general interpretations of the symbols within it. For example, we
might ask of the Canadian weather data:
• If each weather station is broadly categorized as being Atlantic, Pacific, Conti-
nental or Arctic, in what way does the geographical category characterize the de-
tailed temperature profile Temp and account for the different profiles observed?
In Chapter 10 we introduce a functional analysis of variance methodology, where
both the parameters and the observations become functions, but the matrix Z re-
mains the same as in the classic multivariate case.
• Could a temperature record Temp be used to predict the logarithm of total an-
nual precipitation? In Chapter 9 we extend the idea of linear regression to the
case where the independent variable, or covariate, is a function, but the response
variable (log total annual precipitation in this case) is not.
• Can the temperature record Temp be used as a predictor of the entire precipitation
profile, not merely the total precipitation? This requires a fully functional linear
model, where all the terms in the model have more general form than in the
classic case. This topic is considered in Chapter 10.
• We considered earlier the many roles that derivatives play in functional data anal-
ysis. In the functional linear model, we may use derivatives as dependent and
independent variables. Chapter 10 is a first look at this idea, and sets the stage
for the following chapters on differential equations.
In Section 1.4 we have already had a taste of the ways in which derivatives and linear
differential operators are useful in functional data analysis. The use of derivatives
is important both in extending the range of simple graphical exploratory methods
and in the development of more detailed methodology. This is a theme that will
be explored in much more detail in Chapter 11, but some preliminary discussion is
appropriate here.
Chapter 11 takes up the question, unique to functional data analysis, of how
to use derivative information in studying components of variation. An approach
called principal differential analysis identifies important variance components by
estimating a linear differential operator that will annihilate them (if the model is
adequate). Linear differential operators, whether estimated from data or constructed
from external modeling considerations, also play an important part in developing
regularization methods more general than those in common use.
data in a new way but more modest in that the methods described in this book are
hardly the last word in how to approach any particular problems. We believe that
readers will gain more by experimenting with various modifications of the principles
described herein than by following any suggestion to the letter. To make this easier,
script files like "fdarm-ch01.R" in the "scripts" subdirectory of the companion "fda"
package for R can be copied and revised to test understanding of the concepts. The
"debug" function in R allows a user to walk through standard R code line by line
with real examples until any desired level of understanding is achieved.
For those who would like access to the software that we have used, a selection is
available on the website:
http://www.functionaldata.org
and in the fda package in R. This website will also be used to publicize related and
future work by the authors and others, and to make available the data sets referred
to in the book that we are permitted to release publicly.
In this and subsequent chapters, we suggest some simple exercises that you might
consider trying.
1. Find some samples of functional data and plot them. Make a short list of ques-
tions that you have about the processes generating the data. If you do not have
some data lying around in a file somewhere, here are some suggestions:
a. Use your credit card or debit/bank card transactions in your last statement. If
you keep your statements or maintain an electronic record, consider entering
also the statements for five or so previous months or even for the same month
last year.
b. Bend over and try to touch your toes. Please do not strain! Have someone
measure how far your fingers are from the floor (or your wrist if you are that
flexible). Now inhale and exhale slowly. Remeasure and repeat for a series of
breath cycles. Now repeat the exercise, but for the person doing the measuring.
c. Visit some woods and count the number of birds that you see or the number
of varieties. Do this for a series of visits, spread over a day or a week. If over
a week, record the temperature and cloud and precipitation status as well.
d. Visit a weather website, and record the five-day temperature forecasts for a
number of cities.
e. If you have a chronic concern like allergies, brainstorm a list of terms to de-
scribe the severity of the condition, sort the terms from mild to severe and
assign numbers to them. Also brainstorm a list of possible contributing fac-
tors and develop a scale for translating variations in each contributing factor
into numbers. Each day record the level of the condition and each potential
contributing factor. One of us solved a serious allergy problem doing this.
Chapter 2
Essential Comparisons of the Matlab and R
Languages
There are similarities and differences in the syntax for Matlab and R.
Here is a quick list of the more commonly occurring differences, so that you can easily
translate a command in one language into its counterpart in the other:
• Your code will be easier to read if function names describe what the function
does. This often produces a preference for names with words strung together.
This is often done in Matlab by connecting words or character strings with under-
scores, as in create_fourier_basis. This is also acceptable in R. However,
it is not used that often, because previous versions of R (and S-Plus) accepted an
underscore as a replacement operator. Names in R are more likely to use dots or
periods to separate strings, as in create.fourier.basis used below.
The ways in which arguments are passed to functions and computed results returned
is, unfortunately, different in the two languages. We can illustrate the differences by
the ways in which we use the important smoothing function, smooth.basis in R
and smooth basis in Matlab. Here is a full function call in R:
smoothlist = smooth.basis(argvals, y, fdParobj,
wtvec, fdnames)
and here is the Matlab counterpart:
[fdobj, df, gcv, coef, SSE, penmat, y2cMap] = ...
smooth_basis(argvals, y, fdParobj, wtvec, fdnames);
An R function outputs only a single object, so that if multiple objects need to
be returned, as in this example, then R returns them within a list object. But Matlab
returns its outputs as a set of variables; if there is more than one, their names are contained
within square brackets.
The handy R feature of being able to use argument names to provide any subset
of arguments in any order does not exist in Matlab. Matlab function calls require
the arguments in a rigid order, though only a subsequence of leading arguments can
be supplied. The same is true of the outputs. Consequently, Matlab programmers
position essential arguments and returned objects first.
For example, most of the time we just need three arguments and a single output
for smooth.basis and its Matlab counterpart, so that a simpler R call might be
myfdobj = smooth.basis(argvals, y, fdParobj)$fd
and the Matlab version would be
myfdobj = smooth_basis(argvals, y, fdParobj);
Here R gets around the fact that it can only return a single object by returning a
list and using the $fd suffix to select from that list the object required. Matlab just
returns the single object. If we want the third output gcv, we could get that in R by
replacing fd with gcv; in Matlab, we need to provide explicit names for the earlier
outputs, even if they are not wanted, as in [fdobj, df, gcv] in this example. R also has the advantage of
being able to change the order of arguments by a call like
myfdobj = smooth.basis(y=yvec, argvals=tvec,
fdParobj)$fd
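The returned list can also be saved and inspected, which makes the naming convention clear. A small sketch, assuming argvals, y and fdParobj have already been defined:

smoothlist <- smooth.basis(argvals, y, fdParobj)
names(smoothlist)          # includes "fd", "df" and "gcv", among other components
myfdobj <- smoothlist$fd   # the smoothed functional data object
gcvvals <- smoothlist$gcv  # generalized cross-validation value(s)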
In order to keep things simple, we will try to keep the function calls as similar as
possible in the examples in this book.
The default behavior in matrices and arrays with a singleton dimension is exactly
the opposite between R and Matlab: R drops apparently redundant dimensions, com-
pressing a matrix to a vector or an array to a matrix or vector. Matlab does not.
For example, temp = matrix(c(1,2,3,4),2,2) sets up a 2 × 2 ma-
trix in R, and class(temp) tells us this is a "matrix". However, class(
temp[,1]) yields "numeric", which says that temp[,1] is no longer a ma-
trix. If you want a matrix from this operation, use temp[,1, drop=FALSE].
This can have unfortunate consequences in that an operation that expects temp[,
index] to be a matrix will work when length(index) > 1 but may throw
an error when length(index) = 1. If A is a three-dimensional array, A1 =
A[,1,] will be a matrix provided the first and third dimensions of A both have
multiple levels. If this is in doubt, dim(A1) = dim(A)[-2] will ensure that A1
is a matrix, not a vector as it would be if the first or third dimension of A were a
singleton.
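A short R session (not from the book) illustrates the dimension dropping just described:

temp <- matrix(c(1, 2, 3, 4), 2, 2)
class(temp)                      # "matrix" (and also "array" in recent versions of R)
class(temp[, 1])                 # "numeric": the single-column result is no longer a matrix
class(temp[, 1, drop = FALSE])   # still a matrix, with one column

A  <- array(1:8, dim = c(2, 2, 2))
A1 <- A[, 1, ]                   # a 2 x 2 matrix, since neither remaining dimension is 1
dim(A1)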
Matlab has the complementary problem. An array with a single index, as in temp
= myarray(:,1,:), is still an array with the same number of dimensions. If you
want to multiply this by a matrix or plot its columns, the squeeze() function will
eliminate unwanted singleton dimensions. In other words, squeeze(temp) is a
matrix, as long as only one of the three dimensions of temp is a singleton.
A user who does not understand these issues in R or Matlab can lose much time
programming around problems that are otherwise easily handled.
Our code uses object-oriented programming, which brings great simplicity to the
use of some of the functions. For example, we can use the plot command in either
language to create specialized graphics tailored to the type of object being plotted,
e.g., for basis function systems or functional data objects, as we shall see in the next
chapter.
The notion of a class is built on the more primitive notion of a list object in R and
its counterpart in Matlab, a struct object. Lists and structs are used to group together
types of information with different internal characteristics. For example, we might
want to combine a vector of numbers with a fairly lengthy name or string that can be
used as a title for plots. The vector of numbers is a numeric object in R or a double
object in Matlab, while the title string is a character object in R and a char object
in Matlab.
Once we have this capacity of grouping together things with arbitrary proper-
ties, it is an easy additional step to define a class as a specific recipe or predefined
combination of types of information, along with a name to identify the name of the
recipe. For example, in the next chapter we will define the all-important class fd as,
minimally, a coefficient array combined with a recipe for a set of basis functions.
For example, the Matlab code includes an extractor function getcoef(fdobj) that extracts the coefficient array from object fdobj of the fd
class. Similarly, a coefficient array can be inserted into a Matlab fd object with the
command fdobj = putcoef(fdobj, coefmat). In Matlab, all the extrac-
tion functions associated with a class can be accessed by the command methods
myclass.
The procedure for extracting coefficients from an R object depends on the class
of the object. If obj is an object of class fd, fdPar or fdSmooth, coef(obj)
will return the desired coefficients. (The fd class is discussed in Chapter 4, and the
fdPar and fdSmooth classes are discussed in Chapter 5.) This is quite useful,
because without this generic function, a user must know more about the internal
structure of the object to get the desired coefficients. If obj has class fd, then
obj$coefs is equivalent to coef(obj). However, if obj is of class fdPar or
fdSmooth, then obj$coefs will return NULL; obj$fd$coefs will return the
desired coefficients.
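As a quick check (not from the book), assuming tempfd is some object of class fd:

coef1 <- coef(tempfd)     # the generic extractor
coef2 <- tempfd$coefs     # direct access to the list component
all.equal(coef1, coef2)   # TRUE for objects of class "fd"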
As of this writing, a “method” has not yet been written for the generic coef
function for objects of class monfd, returned by the smooth.monotone function
discussed in Section 5.4.2. If obj has that class, it is not clear what a user might
want, because it has two different types of coefficients: obj$Wfdobj$coefs give
the coefficients of a functional data object that is exponentiated to produce some-
thing that is always positive and integrated to produce a nondecreasing function.
This is then shifted and scaled by other coefficients in obj$beta to produce the
desired monotonic function. In this case, the structure of objects of class monfd is
described in the help page for the smooth.monotone function. However, we can
also get this information using str(obj), which will work for many other objects
regardless of the availability of a suitable help page.
To find the classes for which methods have been written for a particular generic
function like coef, use methods('coef'). Conversely, to find generic functions
for which methods of a particular class have been written, use, e.g.,
methods(class='fd'). Unfortunately, neither of these approaches is guaranteed to find
everything, in part because of "inheritance" of classes, which is beyond the scope
of the present discussion. For more on methods in R, see Appendix A in Chambers
and Hastie (1991).
This is in addition to the documentation beyond help that comes with the standard R
installation, available from help.start(), which opens a browser with additional
documentation on the language, on managing installations, and on "Writing R Extensions."
Chapter 3
How to Specify Basis Systems for Building
Functions
We need to work with functions with features that may be both unpredictable and
complicated. Consequently, we require a strategy for constructing functions that
works with parameters that are easy to estimate and that can accommodate nearly
any curve feature, no matter how localized. On the other hand, we do not want to use
more parameters than we need, since doing so would greatly increase computation
time and complicate our analyses in many other ways as well.
We use a set of functional building blocks φk , k = 1, . . . , K called basis functions,
which are combined linearly. That is, a function x(t) defined in this way is expressed
in mathematical notation as
$$x(t) = \sum_{k=1}^{K} c_k \phi_k(t) = \mathbf{c}'\boldsymbol{\phi}(t), \qquad (3.1)$$
and called a basis function expansion. The parameters c1 , c2 , . . . , cK are the coef-
ficients of the expansion. The matrix expression in the last term of (3.1) uses c to
stand for the vector of K coefficients and φ to denote a vector of length K containing
the basis functions.
We often want to consider a sample of N functions, $x_i(t) = \sum_{k=1}^{K} c_{ik}\phi_k(t)$, $i = 1, \ldots, N$, and in this case matrix notation for (3.1) becomes
$$\mathbf{x}(t) = \mathbf{C}\boldsymbol{\phi}(t), \qquad (3.2)$$
where $\mathbf{x}(t)$ is a vector of length N containing the functions $x_i(t)$, and the coefficient matrix $\mathbf{C}$ has N rows and K columns.
Two brief asides on notation are in order here. We often need to distinguish be-
tween referring to a function in a general sense and referring to its value at a spe-
cific argument value t. Expression (3.1) refers to the basis function expansions of
the value of function x at argument value t, but the expansion of x is better written
as
$$x = \sum_{k=1}^{K} c_k \phi_k = \mathbf{c}'\boldsymbol{\phi}. \qquad (3.3)$$
We will want to indicate the result of taking the mth derivative of a function x,
and we will often refer to the first derivative, m = 1, as the velocity of x and to the
second derivative, m = 2, as its acceleration. No doubt readers will be familiar with
the notation
$$\frac{dx}{dt}, \frac{d^2x}{dt^2}, \ldots, \frac{d^mx}{dt^m}$$
used in introductory calculus courses. In order to avoid using ratios in text, and for
a number of other reasons, we rather prefer the notation Dx and D2 x for the velocity
and acceleration of x, and so on. The notation can also be extended to zero and
negative values of m, since D0 x = x and D−1 x refers to the indefinite integral of x
from some unspecified origin.
The notion of a basis system is hardly new; a polynomial such as $x(t) = 18t^4 - 2t^3 + 17t^2 + \sqrt{\pi}/2$ is just such a linear combination of the monomial basis functions $1, t, t^2, t^3$, and $t^4$ with coefficients $\sqrt{\pi}/2$, 0, 17, $-2$, and 18, respectively. Within the monomial basis system, the single basis function 1 is often needed by itself, and we call it the constant basis system.
But polynomials are of limited usefulness when complex functional shapes are
required. Therefore we do most of our heavy lifting with two basis systems: splines
and Fourier series. These two systems often need to be supplemented by the con-
stant and monomial basis systems. These four systems can deal with most of the
applied problems that we see in practice.
For each basis system we need a function in either R or Matlab to define a specific
set of K basis functions φk ’s. These are the create functions. Here are the calling
statements of the create functions in R that set up constant, monomial, Fourier
and spline basis systems, omitting arguments that tend only to be used now and then
as well as default values:
basisobj = create.constant.basis(rangeval)
basisobj = create.monomial.basis(rangeval, nbasis)
basisobj = create.fourier.basis(rangeval, nbasis,
period)
basisobj = create.bspline.basis(rangeval, nbasis,
norder, breaks)
We will take each of these functions up in detail below, where we will explain the
roles of the arguments. The Matlab counterparts of these create functions are:
basisobj = create_constant_basis(rangeval);
basisobj = create_monomial_basis(rangeval, nbasis);
basisobj = create_fourier_basis(rangeval, nbasis, ...
period);
basisobj = create_bspline_basis(rangeval, nbasis, ...
norder, breaks);
In either language, the specific basis system that we set up, named in these com-
mands as basisobj, is said to be a functional basis object with the class name
basis (Matlab) or basisfd (R). Fortunately, users rarely need to worry about the
difference in class name between Matlab and R, as they rarely need to specify the
class name directly in either language.
However, we see that the first argument rangeval is required in each create
function. This argument specifies the lower and upper limits of the values of argu-
ment t and is a vector object of length 2. For example, if we need to define a basis
over the unit interval [0, 1], we would use a statement like rangeval = c(0,1)
in R or rangeval = [0,1] in Matlab.
The second argument nbasis specifies the number K of basis functions. It does
not appear in the constant basis call because it is automatically 1.
Either language can use the class name associated with the object to select the
right kind of function for operations such as plotting or to check that the object is
appropriate for the task at hand. You will see many examples of this in the examples
that we provide.
We will defer a more detailed discussion of the structure of the basis or
basisfd class to the end of this chapter since this information will only tend to
matter in relatively advanced uses of either language, and we will not, ourselves,
use this information in our examples.
We will now look at the nature of each basis system in turn, beginning with the
only mildly complicated Fourier basis. Then we will discuss the more challeng-
ing B-spline basis. That will be followed by more limited remarks on constant and
monomial bases. Finally, we will mention only briefly a few other basis systems that
are occasionally useful.
Many functions are required to repeat themselves over a certain period T , as would
be required for expressing seasonal trend in a long time series. The Fourier series is
$$\begin{aligned}
\phi_1(t) &= 1\\
\phi_2(t) &= \sin(\omega t)\\
\phi_3(t) &= \cos(\omega t)\\
\phi_4(t) &= \sin(2\omega t)\\
\phi_5(t) &= \cos(2\omega t)\\
&\;\;\vdots
\end{aligned} \qquad (3.4)$$
where $\omega = 2\pi/T$.
We see that, after the first constant basis function, Fourier basis functions are ar-
ranged in successive sine/cosine pairs, with both arguments within any pair being
multiplied by one of the integers 1, 2, . . . up to some upper limit m. If the series
contains both elements of each pair, as is usual, the number of basis functions is
K = 1 + 2m. Because of how we define ω , each basis function repeats itself after T
time units have elapsed.
Only two pieces of information are required to define a Fourier basis system:
• the number of basis functions K and
• the period T ,
but the second value T can often default to the range of t values spanned by the
data. We will use a Fourier basis in the next chapter to smooth daily temperature
data. The following commands set up a Fourier basis with K = 65 basis functions
in R and Matlab with a period of 365 days:
daybasis65 = create.fourier.basis(c(0,365), 65)
daybasis65 = create_fourier_basis([0,365], 65);
Note that these function calls use the default of T = 365, but if we wanted to specify
some other period T , we would use
create.fourier.basis(c(0,365), 65, T)
in R.
In either language, if K is even, the create functions for Fourier series add on
the missing cosine and set K = K + 1. When this leads to more basis functions than
values to be fit, the code takes steps to avoid singularity problems.
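A quick check of this convention (not from the book; it assumes the fda package is loaded):

fourier64 <- create.fourier.basis(c(0, 365), 64)
fourier64$nbasis    # expected to be 65: the even request is completed to a full sine/cosine pair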
There are situations where periodic functions are defined in terms of only sines or
only cosines. For example, a pure sine series will define functions that have the value
0 at the boundary values 0 and T , while a pure cosine series will define functions
with zero derivatives at these points. Bases of this nature can be set up by selecting
only the appropriate terms in the series by either subscripting the basis object or by
using a component of the class called dropind that contains a vector of indices
of basis functions to remove from the final series. For example, if we wanted to set
up a Fourier basis for functions centered on zero, we would want to not include the
initial constant term, and this could be achieved by either a command like
zerobasis = create.fourier.basis(rangeval, nbasis,
dropind=1)
or, using a basis object that has already been created, by something like
zerobasis = daybasis65[2:65]
Here is the complete calling sequence in R for the create.fourier.basis
in R:
create.fourier.basis(rangeval=c(0, 1), nbasis=3,
period=diff(rangeval), dropind=NULL, quadvals=NULL,
values=NULL, basisvalues=NULL, names=NULL,
axes=NULL)
A detailed description of the use of the function can be obtained by the command
help(create.fourier.basis) or ?create.fourier.basis
help create_fourier_basis or doc create_fourier_basis
in R and Matlab, respectively.
Splines are piecewise polynomials. Spline bases are more flexible and therefore
more complicated than finite Fourier series. They are defined by the range of va-
lidity, the knots, and the order. There are many different kinds of splines. In this
section, we consider only B-splines.
We will assume in this book that the order of the polynomial segments is the same
for each subinterval.
A spline basis is actually defined in terms of a set of knots. These are related to
the break points in the sense that every knot has the same value as a break point, but
there may be multiple knots at certain break points.
At each break point, neighboring polynomials are constrained to have a certain
number of matching derivatives. The number of derivatives that must match is de-
termined by the number of knots positioned at that break point. If only one knot is
positioned at a break point, derivatives up to order two less than the order of the
spline (counting the function value itself as the derivative of order 0) are required
to match, which ensures that for splines of more than order two the join will be
seen to be smooth. This is because a function composed of straight line segments
of order two will have only the function value (the derivative of order 0) matching,
so the function is continuous but its slope is not; this means that the joins would
not be seen as smooth by most standards.
Order four splines are often used, consisting of cubic polynomial segments (degree
three), and the single knot per break point makes the function values and first and
second derivative values match.
By default, and in the large majority of applications, there will be only a single
knot at every break point except for the boundary values at each end of the whole
range of t. The end points, however, are assigned as many knots as the order of the
spline, implying that the function value will, typically, drop to zero outside of the
interval over which the function is defined.
3.3.3 Examples
You will not have to worry about those multiple knots at the end points; the code
takes care of this automatically. You will typically be constructing spline functions
where you will only have to supply break points, and if these break points are equally
spaced, you will not even have to supply these.
To summarize, spline basis systems are defined by the following:
• the break points defining subintervals,
• the degree or order of the polynomial segments, and
• the sequence of knots.
The number K of basis functions in a spline basis system is determined by the relation
$$K = \text{order} + \text{number of interior knots}. \qquad (3.5)$$
By interior here we mean only knots that are placed at break points which are not
either at the beginning or end of the domain of definition of the function. In the knot
sequence examples above, that would mean only knots positioned at 0.5.
3.3.4 B-Splines
Within this framework, however, there are several different basis systems for con-
structing spline functions. We use the most popular, namely the B-spline basis sys-
tem. Other possibilities are M-splines, I-splines, and truncated power functions.
For a more extensive discussion of splines, see, e.g., de Boor (2001) or Schumaker
(1981).
Figure 3.1 shows the 13 order four B-splines corresponding to nine equally
spaced interior knots over the interval [0, 10], constructed in R by the command
splinebasis = create.bspline.basis(c(0,10), 13)
or, as we indicated in Chapter 2, by the Matlab command
splinebasis = create_bspline_basis([0,10], 13);
Figure 3.1 results from executing the command plot(splinebasis).
Aside from the two end basis functions, each basis function begins at zero and,
at a certain knot location, rises to a peak before falling back to zero and remaining
there until the right boundary. The first and last basis functions rise from the first
and last interior knot to a value of one on the right and left boundary, respectively,
but are otherwise zero. Basis functions in the center are positive only over four
intervals, but the second and third basis functions, along with their counterparts
on the right, are positive over two and three intervals, respectively. That is, all B-
spline basis functions are positive over at most four adjacent intervals. This compact
support property is important for computational efficiency since the effort required
is proportional to K as a consequence, rather than to K 2 for basis functions not
having this property.
Fig. 3.1 The 13 spline basis functions defined over the interval [0,10] by nine interior boundaries or knots. The polynomial segments are cubic or order four polynomials, and at each knot the polynomial values and their first two derivatives are required to match.
The role of the order of a spline is illustrated in Figure 3.2, where we have plotted
linear combinations of spline basis functions of orders two, three and four, called
spline functions, that best fit a sine function and its first derivative. The three R
commands that set up these basis systems are
basis2 = create.bspline.basis(c(0,2*pi), 5, 2)
basis3 = create.bspline.basis(c(0,2*pi), 6, 3)
basis4 = create.bspline.basis(c(0,2*pi), 7, 4)
Recall from relation (3.5) that, using three interior knots in each case, we increase
the number of basis functions each time that we increase the order of the spline
basis.
We see in the upper left panel the order two spline function, a polygon, that best
fits the sine function, and in the upper right panel we see how poorly its derivative,
a step function, fits the sine's derivative. As we increase order, going down the panels, we
see that the fit to both the sine and its derivative improves, as well as the smoothness
of these two fits. In general, if we need smooth and accurate derivatives, we need to
increase the order of the spline. A useful rule to remember is to fix the order of the
spline basis to be at least two higher than the highest order derivative to be used.
By this rule, a cubic spline basis is a good choice as long as you do not need to look
at any of its derivatives.
Fig. 3.2 In the left panels, the solid line indicates the spline function of a particular order that fits
the sine function shown as a dashed line. In the right panels, the corresponding fits to its derivative,
a cosine function, are shown. The vertical dotted lines are the interior knots defining the splines.
B-spline basis functions also have the unit sum property: their values at any point t sum exactly to one. As a consequence, the coefficient multiplying any basis function is approximately equal to the value of the spline function near where that basis function peaks. Indeed, this is exactly true at the boundaries.
Although spline basis functions are wonderful in many respects, they tend to pro-
duce rather unstable fits to the data near the beginning or the end of the interval over
which they are defined. This is because in these regions we run out of data to define
them, so at the boundaries the spline function values are entirely determined by a
single coefficient. This boundary instability of spline fits becomes especially seri-
ous for derivative estimation, and the higher the order of the derivative, the wilder
its behavior tends to be at the two boundaries. However, a Fourier series does not
have this problem because it is periodic; in essence, the data on the right effectively
“wrap around” to help estimate the curve at the left and vice versa.
Let us set up a spline basis for fitting the growth data by the methods that we will
use in Chapter 5. We will want smooth second derivatives, so we will use order six
splines. There are 31 ages for height measurements in the data, ranging from 1 to
18, and we want to position a knot at each of these sampling points. Relation (3.5) indicates that the number of basis functions is 29 + 6 = 35. If the ages of measurement
are in vector age, then the command that will set up the growth basis in Matlab is
heightbasis = create_bspline_basis([1,18], 35,6,age);
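The corresponding R command, assuming as elsewhere that the 31 ages of measurement are in the vector age, is

heightbasis = create.bspline.basis(c(1,18), 35, 6, age)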
As with the Fourier basis, we can select subsets of B-spline basis functions to
define a basis by either dropping basis functions using the dropind argument or
by selecting those basis functions that we want by using subscripts.
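For example, here is a sketch of dropping the first basis function, the only one that is nonzero at the left boundary, so that every function built from the reduced basis is zero at age 1. The object name heightbasis34 is ours:

heightbasis34 = create.bspline.basis(c(1,18), 35, 6, age, dropind=1)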
Here is the complete calling sequence in R for create.bspline.basis:
create.bspline.basis(rangeval=NULL, nbasis=NULL,
norder=4, breaks=NULL, dropind=NULL, quadvals=NULL,
values=NULL, basisvalues=NULL,
names="bspl", axes=NULL)
A detailed description of the use of the function can be obtained by the commands
help(create.bspline.basis)
help create_bspline_basis
in R and Matlab, respectively.
We conclude this section with a tip that can be important if you are using large
numbers of spline basis functions. As with any calculation on a computer where the
accuracy of results is limited by the number of bits used to express a value, some
accuracy can be lost along the way. This can occasionally become serious. A spline
is constructed by computing a series of differences. These are especially prone to
rounding errors when the values being differenced are close together. To avoid this,
you may need to redefine t so that the length of each subinterval is roughly equal to
one. For the gait data example shown in Figure 1.6, where we would construct 23
basis functions if we placed a knot at each time of observation, it would be better, in
fact, to run time from 0 to 20 than from 0 to 1 as shown. The handwriting example
is even more critical, and by changing the time unit from seconds to milliseconds,
we can avoid a substantial amount of rounding error.
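A minimal sketch of this rescaling idea, with hypothetical sampling times rather than the actual gait values, is the following; only the time unit changes, not the data.

gaittime  = seq(0.5, 19.5, len=20)              # hypothetical times, rescaled to [0,20]
gaitbasis = create.bspline.basis(c(0,20), 23)   # interknot lengths are now close to one
gaitmat   = eval.basis(gaittime, gaitbasis)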
On the other hand, computations involving Fourier basis functions tend to be
more accurate and stable if the interval [0, T ] is not too different from [0, 2π ]. We
have encountered computational issues, for example, in analyses of the weather data
when we worked with [0, 365]. Once results have been obtained, it is usually a sim-
ple matter to rescale them for plotting purposes to a more natural interval.
3.4 Constant, Monomial and Other Bases
Different situations call for different basis systems. One such case leads to the sim-
plest basis system. This is the constant basis, which contains only a single function
whose value is equal to one no matter what value of t is involved. We need the
constant basis surprisingly often. For example, we will see in functional regression
and elsewhere that we might need to compare an analysis using an unconstrained
time-varying function (represented by a functional data or functional parameter ob-
ject discussed in Chapters 4 and 5, respectively) with a comparable analysis using a
constant. We can also convert a conventional scalar variable into functional form by
using the values of that variable as coefficients multiplying the constant basis.
The constant basis over, say [0, 1], is constructed in R by
conbasis = create.constant.basis(c(0,1))
conbasis = create_constant_basis([0,1]);
in R and Matlab, respectively.
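As a sketch of the conversion mentioned above, and using the fd constructor described in Chapter 4, a conventional scalar variable can be turned into a set of constant functions. The values in z below are hypothetical:

z   = c(1.2, 0.7, 3.1)                # one scalar value per record
zfd = fd(matrix(z, 1, 3), conbasis)   # three constant functions on [0,1]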
Simple trends in data are often fit by straight lines, quadratic polynomials, and so
on. Polynomial regression is a topic found in most texts on the linear model or
regression analysis, and is, along with Fourier analysis, a form of functional data
analysis that has been used in statistics for a long time. As with constant functions,
these may often serve as benchmark or reference functions against which spline-
based functions are compared.
The basis functions in a monomial basis are the successive powers of t: 1, t, t², t³ and so on. The number of basis functions is one more than the highest power in the
sequence. No parameters other than the interval over which the basis is defined are
needed. A basis for cubic polynomials is defined over [0, 1] in R by
monbasis = create.monomial.basis(c(0,1), 4)
monbasis = create_monomial_basis([0,1], 4);
Be warned that beyond nbasis = 7, the monomial basis system functions be-
come so highly correlated with each other that near singularity conditions can arise.
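A quick way to see this warning in action is to look at the condition number of the cross-product matrix of monomial basis values. The sketch below uses eval.basis, described in the next section, and base R's kappa function; all object names are ours.

tfine = seq(0, 1, len=101)
mon5  = eval.basis(tfine, create.monomial.basis(c(0,1), 5))
mon8  = eval.basis(tfine, create.monomial.basis(c(0,1), 8))
kappa(crossprod(mon5))   # already large
kappa(crossprod(mon8))   # larger by several orders of magnitude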
3.5 Methods for Functional Basis Objects
Common tasks like plot are called generic functions, for which methods are written for objects of different classes; see Section 2.3. In R, to see a list of generic functions available for basis objects, use methods(class='basisfd').
Once a basis object is set up, we would like to use some of these generic
functions via methods written for objects of class basisfd in R or basis
in Matlab. Some of the most commonly used generic functions with methods for
functional basis objects are listed here. Others requiring more detailed treatment
are discussed later. The R function is shown first and the Matlab version second,
separated by a /.
In R, the actual name of the function has the suffix .basisfd, but the function
is usually used with its initial generic part only, though you may see some exceptions
to this general rule. That is, one types print(basisobj) to display the structure
of functional basis object basisobj, even though the actual name of the function
is print.basisfd. In Matlab, however, the complete function name is required.
print/display The type, range, number of basis functions, and parameters
of the functional basis object are displayed. Function print is used in R and
display in Matlab. These are invoked if the object name is typed (without a
semicolon in Matlab).
summary A more compact display of the structure of the basis object.
==/eq The equality of two functions is tested, and a logical value returned, as in
basis1 == basis2 in R or eq(basis1,basis2) in Matlab.
is/isa basis Returns a logical value indicating whether the object is a func-
tional basis object. In R the function inherits is similar.
In R, we can extract or insert/replace a component of a basis object, such as its
params vector, by using the component name preceded by $, as in basisobj$params. This is a standard R protocol for accessing components of a list. In Matlab,
there is a separate function for each component to be extracted. Not all components
of an object can be changed safely; some component values interlock with others to
define the object, and if you change these, you may later get a cryptic error message
or (worse) erroneous results. But for those less critical components, which include
container components dropind, quadvals, basisvalues and values, the
R procedure is simple. The object name with the $ suffix appears on the left side of
the assignment operator. In Matlab, each reasonable replacement operation has its
own function, beginning with put. The first argument in the function is the name
of the basis object, and the second argument is the object to be extracted or inserted.
The names of these extractor and insertion functions are displayed in Table 3.1 in
Section 3.6.
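Here is a brief R sketch of this protocol, applied to the B-spline basis created earlier in this chapter; the components touched are among the less critical ones listed above.

splinebasis$params              # extract the interior knots
splinebasis$dropind = c(1, 13)  # replace the dropind component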
It is often handy to set up a matrix of basis function values, say for some spe-
cialized plotting operation or as an input into a regression analysis. To this end, we
have the basis evaluation functions
basismatrix = eval.basis(tvec, mybasis)
basismatrix = eval_basis(tvec, mybasis)
where argument tvec is a vector of n argument values within the range used to
define the basis, and argument mybasis is the name of the basis system that you
have created. The resulting basismatrix is n by K. One can also compute the
derivatives of the basis functions by adding a third argument that specifies the degree of the derivative, as in

Dbasismatrix = eval.basis(tvec, mybasis, 1)
Dbasismatrix = eval_basis(tvec, mybasis, 1)

in R and Matlab, respectively.
3.6 The Structure of the basisfd or basis Class
All basis objects share a common structure, and all of the create functions are
designed to make the call to the function basisfd in R or basis in Matlab more
convenient. Functions like these two that set up objects of a specific class are called
constructor functions. The complete calling sequence for basisfd in R is
basisfd(type, rangeval, nbasis, params,
dropind=vector("list", 0),
quadvals=vector("list", 0),
values=vector("list", 0),
basisvalues=vector("list", 0))
The equivalent Matlab calling sequence lacks specification of default values:
basis(basistype, rangeval, nbasis, params, dropind,
quadvals, values, basisvalues)
We include a brief description of each argument here for R users, but you should
use the help command in either language to get more information.
type A character string indicating the type of basis. A number of character se-
quences are permitted for each type to allow for abbreviations and optional cap-
italization.
rangeval A vector of length two containing the lower and upper boundaries of
the range over which the basis is defined. If a positive number is supplied instead,
the lower limit is set to zero.
nbasis The number of basis functions.
params A vector of parameter values defining the basis. If the basis type is
"fourier", this is a single number indicating the period. That is, the basis
functions are periodic on the interval (0,PARAMS) or any translation of it. If
the basis type is bspline, the values are interior knots at which the piecewise
polynomials join.
dropind A vector of integers specifying the basis functions to be dropped, if
any. For example, if it is required that a function be zero at the left boundary, this
is achieved by dropping the first basis function, the only one that is nonzero at
that point.
The final three arguments, quadvals, values, and basisvalues, are used to
store basis function values in situations where a basis system is evaluated repeatedly.
quadvals A matrix with two columns and a number of rows equal to the num-
ber of argument values used to approximate an integral (e.g., using Simpson’s
rule). The first column contains the argument values. A minimum of five values
is required. For type = 'bspline', quadrature points are used within each interknot interval, and the minimum of five values per interval is often enough. These are typically equally spaced be-
tween adjacent knots. The second column contains the weights. For Simpson’s
rule, these are proportional to 1, 4, 2, 4, ..., 2, 4, 1.
values A list, with entries containing the values of the basis function derivatives
starting with 0 and going up to the highest derivative needed. The values corre-
spond to quadrature points in quadvals. It is up to the user to decide whether
or not to multiply the derivative values by the square roots of the quadrature
weights so as to make numerical integration a simple matrix multiplication. Val-
ues are checked against quadvals to ensure the correct number of rows, and
against nbasis to ensure the correct number of columns; values contains
values of basis functions and derivatives at quadrature points weighted by square
root of quadrature weights. These values are only generated as required, and only if quadvals is not empty (matrix("numeric",0,0)).
basisvalues A list of lists. This is designed to avoid evaluation of a basis sys-
tem repeatedly at a set of argument values. Each sublist corresponds to a specific
set of argument values, and must have at least two components, which may be
named as you wish. The first component in an element of the list vector contains
the argument values. The second component is a matrix of values of the basis
functions evaluated at the arguments in the first component. Subsequent com-
ponents, if present, are matrices of values of their derivatives up to a maximum
derivative order. Whenever function getbasismatrix is called, it checks the
first list in each row to see first if the number of argument values corresponds
to the size of the first dimension, and if this test succeeds, checks that all of the
argument values match.
The names of the suffixes in R or the functions in Matlab that either extract or
insert component information into a basis object are shown in Table 3.1.
Table 3.1 The methods for extracting and modifying information in a basisfd (R) or basis
(Matlab) object
Chapter 4
How to Build Functional Data Objects

We saw in the last chapter that functions are built up from basis systems φ₁(t), ..., φ_K(t) by defining the linear combination

x(t) = ∑_{k=1}^K c_k φ_k(t) = c'φ(t).
That chapter described how to build a basis system. Now we take the next step,
defining a functional data object by combining a set of coefficients ck (and other
useful information) with a previously specified basis system.
4.1 Adding Coefficients to Bases to Define Functions
Once we have selected a basis, we have only to supply coefficients in order to define
an object of the functional data class (with class name fd).
If there are K basis functions, we need a coefficient vector of length K for each
function that we wish to define. If only a single function is defined, then the coeffi-
cients are loaded into a vector of length K or a matrix with K rows and one column.
If N functions are needed, say for a sample of functional observations of size N,
we arrange these coefficient vectors in a K by N matrix. If the functions themselves
are multivariate of dimension m, as would be the case, for example, for positions
in three-dimensional space (m = 3), then we arrange the coefficients into a three-
dimensional array of dimensions K, N, and m, respectively. (A single multivariate
function is defined with a coefficient array with dimensions K, 1, and m; see Section
2.2 for further information on this case.) That is, the dimensions are in the order
“number of basis functions,” “number of functions or functional observations” and
“number of dimensions of the functions.”
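A small sketch of these conventions, using the fd constructor introduced immediately below and arbitrary random coefficients (all object names here are ours):

smallbasis = create.bspline.basis(c(0,1), 5)        # K = 5
coefmatN   = matrix(rnorm(5*10), 5, 10)             # N = 10 curves: K by N
coefarr3   = array(rnorm(5*10*3), c(5, 10, 3))      # m = 3 variables: K by N by m
fdN = fd(coefmatN, smallbasis)
fd3 = fd(coefarr3, smallbasis)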
Here is the command that creates a functional data object using the basis with
name daybasis65 that we created in the previous chapter, with the coefficients
for mean temperature for each of the 35 weather stations organized into the 65 by
35 matrix coefmat:
tempfd = fd(coefmat, daybasis65)
You will seldom need to use the fd function explicitly because other functions
call it after computing coefmat as a representation of functional data in terms of
the specified basis set. We will discuss some of these functions briefly later in this
chapter and in more detail in the next.
Let us take a moment here to reflect on what functional data objects mean. Func-
tional data objects represent functions, and functions are one-to-one mappings or
relationships between values in a domain and values in a range. In the language of
graphics, the domain values are points on the horizontal coordinate or abscissa, and
the range values are points in a vertical coordinate or ordinate. For the purpose of
this book, we consider mostly one-dimensional domains, such as time, but we do
allow for the possibility that the range space is multidimensional, such as (X,Y,Z)
triples for the coordinates of points in a three-dimensional space. Finally, we also
allow for the possibility of multiple or replicated functions.
Adding labels to functional data objects is a convenient way to supply the infor-
mation needed for graphical displays. Specialized plotting functions that the code
supplies in either language can look for these labels, and if they are present, place
them where appropriate for various kinds of plots. The component for labels for
functional data objects is called fdnames.
If we want to supply labels, we will typically need three, and they are, in order:
1. A label for the domain, such as 'Time', 'Day', and so on.
2. A label for the replication dimension, such as 'Weather station', 'Child', etc.
3. A label for the range, such as 'Temperature (deg. C)', 'Space', etc.
We refer to these three labels as the generic labels for the functional data object.
In R, we supply labels in a list object of length three. An empty version of such
a list can be set up by the command
fdnames = vector("list", 3)
The corresponding object in Matlab is a cell array of length three, which may be set
up by
fdnames = cell(1,3)
In addition to generic labels for each dimension of the data, we may also want,
for the range and/or for the replication dimension, to supply sets of labels, each la-
bel applying to a specific dimension or replicate. For example, for the gait data, we
may want a label such as “Angle” to be common or generic to the two observed
angles, but in addition require two labels such as “Knee” and “Hip” to distinguish
which angle is being plotted. Similarly, in addition to “Weather Station” to describe
generically the replication dimension for the weather data as a whole, we probably
want to supply names for each weather station. Thus, labels for replicates and vari-
ables have the potential to have two levels, a generic level and a specific level. Of
course, if there is only one dimension for range or only one replicate, a two-level
labels structure of this nature would usually be superfluous.
In the simple case where a dimension only needs a single name, labels are sup-
plied as strings having the class character in R or char in Matlab. For example,
we may supply only a common name such as “Child” for the replication dimension
of the growth data, and “Height(cm)” for the range, combined with “Age (years)”
for the domain. Here is a command that sets up these labels in R directly, without
bothering to set up an empty list first,
fdnames = list("Age (years)", "Child", "Height (cm)")
or, assuming that the empty list has already been defined:
fdnames[[1]] = "Age (years)"
fdnames[[2]] = "Child"
fdnames[[3]] = "Height (cm)"
Since Matlab accesses cell array elements by curly brackets the expressions are
fdnames{1} = ’Age (years)’
fdnames{2} = ’Child’
fdnames{3} = ’Height (cm)’
However, when the required label structure for either the replication or the range
dimension is two-level, we take advantage of the fact that the elements of a list in
R can be character vectors or lists, and entries in cell arrays in Matlab can be cell
arrays. We deal with the two languages separately in the following two paragraphs.
In R, generic and specific names can be supplied by a named list. The common
or generic label is supplied by the name of the list and the individual labels by the
entry of the list, this entry being of either the character or list class. Take weather
stations for the weather data, for example. The second element is itself a list, defined
perhaps by the commands
station = vector("list", 35)
station[[ 1]] = "St. Johns"
.
.
.
station[[35]] = "Resolute"
A command to set up a labels list for the daily temperature data might be
fdnames = list("Day",
"Weather Station" = station,
"Mean temperature (deg C)")
Notice that the names attribute of a list entry can be a quoted string containing
blanks, such as what we have used here. The other two names, argname and
varname, will only be used if the entry is NULL or "" or, in the case of vari-
able name, if the third list entry contains a vector of names of the same length as
the number of variables. The code also checks that the number of labels in the label
vector for replications equals the number of replications and uses the names value
if this condition fails.
Matlab does not have an analogue of the names attribute in R, but each entry
in the cell array of length three can itself be a cell array. If the entry is either a
string or a cell array whose length does not match the required number of labels,
then the Matlab plotting functions will find in this entry a generic name common
to all replicates or variables. But if the entry for either the replicates or variables
dimension is a cell array of length two, then the code expects the generic label
in the first entry and a character matrix of the appropriate number of rows in the
second. The weather station example above in Matlab becomes
station=cell(1,2);
station{1} = ’Weather Station’;
station{2} = [’St. Johns ’;
’Charlottetown’;
.
.
.
’Resolute ’];
Note that a series of names are stored as a matrix of characters, so that enough
trailing blanks in each name must be added to allow for the longest name to be
used.
4.2 Methods for Functional Data Objects
As for the basis class, there are similar generic functions for printing, summarizing
and testing for class and identity for functional data objects.
There are, in addition, some useful methods for doing arithmetic on functional
data objects and carrying out various transformations. For example, we can take the
sum, difference, power or pointwise product of two functions with commands like
fdsumobj = fdobj1 + fdobj2
fddifobj = fdobj1 - fdobj2
fdprdobj = fdobj1 * fdobj2
fdsqrobj = fdobj^2
One can, as well, substitute a scalar constant for either argument in the three arith-
metic commands. We judged pointwise division to be too risky since it is difficult to
detect if the denominator function is nonzero everywhere. Similarly,
fdobj^a
may produce an error or nonsense if a is negative and fdobj is possibly zero at
some point.
Beyond this, the results of multiplication and exponentiation may not be what
one might naively expect. For example, the following produces a straight line from
(-1) to 2 with a linear spline basis:
tstFn0 <- fd(c(-1, 2), create.bspline.basis(norder=2))
plot(tstFn0)
However,
tstFn0^2
is not a parabola but a straight line that approximates this parabola over rangeval
using the same linear basis set. We get a similar approximation from
tstFn0*tstFn0, but it differs in the third significant digit.
What do we get from
tstFn0^(-1)?
The result may be substantially different from what many people expect. These are
known “infelicities” in fda, which the wise user will avoid. Using cubic or higher-
order splines with basis sets larger than in this example will reduce substantially
these problems in many but not all cases.
The mean of a set of functions is achieved by a command like
fdmeanobj = mean(fdobj)
Similarly, functions are summed by the sum function. As the software evolves, we
expect that other useful methods will be added (and infelicities further mitigated).
We often want to work with the values of a function at specified values of ar-
gument t, stored, say, in vector tvec. The evaluation function comparable to that
used in Chapter 3 for basis functions is eval.fd in R and eval_fd in Matlab.
For example, we could evaluate functional data object thawfd at the argument values in tvec by the R command
thatvec = eval.fd(tvec, thawfd)
The same command can be used to evaluate a derivative of thawfd by supplying
the index of the derivative as the third argument. The second derivative of thawfd
is evaluated by
D2thatvec = eval.fd(tvec, thawfd, 2)
More generally, if Lfdobj is an object of the linear differential operator Lfd class,
defined in Section 4.4, then

Ltvec = eval.fd(tvec, thawfd, Lfdobj)

evaluates the result of applying that operator to the function.
We pointed out in Chapter 3 that curves defined by B-spline bases tend to follow the same track as their coefficients. Figure 4.2 illustrates this: the 13 spline basis functions of Figure 3.1 are combined with sinusoidal coefficient values, and the resulting curve closely tracks those coefficients. A code sketch that produces such a figure follows the figure caption below.
The topic of smoothing data will be taken up in detail in Chapter 5. However, we can
sometimes get good results without more advanced smoothing machinery simply
by keeping the number of basis functions small relative to the amount of data being
approximated.
Fig. 4.2 The 13 spline basis functions defined in Figure 3.1 are combined with coefficients whose
values are sinusoidal to construct the functional data object plotted as a solid line. The coefficients
themselves are plotted as circles.
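A minimal R sketch that produces a figure like Figure 4.2; the object names and the exact coefficient values here are our own choices:

splinebasis = create.bspline.basis(c(0,10), 13)   # the basis of Figure 3.1
tpeaks  = seq(0, 10, len=13)          # rough peak locations of the basis functions
sincoef = sin((2*pi/10)*tpeaks)       # sinusoidal coefficient values
sinfd   = fd(sincoef, splinebasis)
plot(sinfd)
points(tpeaks, sincoef)               # the coefficients plotted as circles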
Canadians love to talk about the weather, and especially in midwinter when the
weather puts a chill on many other activities. The January thaw is eagerly awaited,
and in fact the majority of Canadian weather stations show clear evidence of these
few days of relief. The following code loads 34 years of daily temperature data for
Montreal, extracts temperatures for January 16th to February 15th and plots their
mean, shown in Figure 4.3.
# This assumes the data are in "MtlDaily.txt"
# in the working directory getwd()
MtlDaily = matrix(scan("MtlDaily.txt",0),34,365)
thawdata = t(MtlDaily[,16:47])
daytime = ((16:47)+0.5)
par(cex=1.2)
plot(daytime, apply(thawdata,1,mean), "b", lwd=2,
xlab="Day", ylab="Temperature (deg C)")
We can fit these data by regression analysis by using a matrix of values of a basis
system taken at the times in vector daytime. Here we construct a basis system
over the interval [16,48] using seven cubic B-splines, and evaluate this basis at these
points to produce a 32 by 7 matrix. By default the knots are equally spaced over this
interval.
Fig. 4.3 Temperatures at Montreal from January 16 to February 15 averaged over 1961 to 1994.
thawbasis = create.bspline.basis(c(16,48),7)
thawbasismat = eval.basis(daytime, thawbasis)
Now we can compute coefficients for our functional data object by the usual equations for regression coefficients, b = (X'X)⁻¹X'y, and construct a functional data
object by combining them with our basis object. A plot of these curves is shown
in Figure 4.4 and, sure enough, we do see a fair number of them peaking between
January 20 and 25, and a few others with later peaks as well.
thawcoef = solve(crossprod(thawbasismat),
crossprod(thawbasismat,thawdata))
thawfd = fd(thawcoef, thawbasis,
list("Day", "Year", "Temperature (deg C)"))
plot(thawfd, lty=1, lwd=2, col=1)
We can use these objects to illustrate two useful tools for working with func-
tional data objects. We often want to compare a curve to the data from which it was
estimated. In the following command we use function plotfit.fd to plot the
data for 1961 along with corresponding B-spline fit. The command also illustrates
the possibility of using subscripts on functional data objects. The result is shown
in Figure 4.5, where the fit suggests a thaw before January 15 and another in early
February. The legend on the plot indicates that the standard deviation of the variation
of the actual temperatures around the curve is four degrees Celsius.
plotfit.fd(thawdata[,1], daytime, thawfd[1],
lty=1, lwd=2)
Fig. 4.4 Functional versions of temperature curves for Montreal between January 16 and February 15. Each curve corresponds to one of the years from 1961 to 1994.
Fig. 4.5 The temperature curve for 1961 along with the actual temperatures from which it is esti-
mated.
The availability of a sample of N curves makes us wonder how they vary among
themselves. The analogue of the correlation and covariance matrices in the multi-
variate context are the correlation and covariance functions or surfaces, ρ (s,t) and
σ (s,t). The value ρ (s,t) specifies the correlation between the values x(s) and x(t)
over a sample or population of curves, and similarly for σ (s,t). This means that
we also need to be able to define functions of two arguments, in this case s and t.
We will need this capacity elsewhere. Certain types of functional regression require
bivariate regression coefficient functions.
The bivariate functional data class with name bifd is designed to do this. Ob-
jects of this class are created in much the same way as fd objects, but this now
requires two basis systems and a matrix of coefficients for a single such object. In
mathematical notation, we define an estimate of a bivariate correlation surface as

r(s,t) = ∑_{k=1}^K ∑_{ℓ=1}^L b_{kℓ} φ_k(s) ψ_ℓ(t) = φ'(s) B ψ(t),   (4.2)
where φk (s) is a basis function for variation over s and ψ` (t) is a basis function for
variation over t. The following command sets up such a bivariate functional object:
corrfd = bifd(corrmat, sbasis, tbasis)
However, situations where you would have to set up bivariate functional data
objects are rare, since most of these are set up by the functions var.fd and var_fd in R and Matlab, respectively. We will use these functions in Chapter
6.
To summarize the most important points of this chapter, we give here the arguments
of the constructor function fd for an object of the fd class.
coef A vector, matrix, or three-dimensional array of coefficients. The first di-
mension (or elements of a vector) corresponds to basis functions. A second di-
mension corresponds to the number of functional observations, curves or repli-
cates. If coef is a three-dimensional array, the third dimension corresponds to
variables for multivariate functional data objects.
basisobj A functional basis object defining the basis.
fdnames A list of length three, each member potentially being a string vector
containing labels for the levels of the corresponding dimension of the data. The
first dimension is for argument values and is given the default name "time".
The second is for replications and is given the default name "reps". The third
is for functions and is given the default name "values".
The arguments of the constructor function Lfd for objects of the linear differential
operator class are
nderiv A nonnegative integer specifying the order m of the highest order deriva-
tive in the operator.
bwtlist A list of length m. Each member contains a functional data object that
acts as a weight function for a derivative. The first member weights the function,
the second the first derivative, and so on up to order m − 1.
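The Lfd constructor itself is rarely called directly. As a sketch, two helper functions in the fda package cover the common cases; the second is used again in Chapter 5.

D2Lfd        = int2Lfd(2)                                  # L = D^2: penalize curvature
harmaccelLfd = vec2Lfd(c(0, (2*pi/365)^2, 0), c(0, 365))   # L = omega^2 D + D^3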
d. Now set up the basis function system in the language you are working with.
Plot the basis to see how it looks using the plot command (as described in
the previous chapter on basis sets).
e. Next define a vector of random coefficients using your language’s normal ran-
dom number generator. These can vary about zero as a mean, but you can
also vary them around some function, such as sin(2π t) over [0,1]. If you use
a trend, because of the unit sum property of B-splines described above, the
function you define will also vary around this trend. You may want to play around with the standard deviation of the coefficients as a part of this exercise.
f. Finally, set up a functional data object having a single function using the fd
command.
2. Plot this function using the plot command.
3. Plot both the function and the coefficients on the same graph. To plot the coef-
ficients for order four splines, plot all but the second and third in from each end
against knot locations. For example, if you have 23 basis functions, and hence 23
coefficients, plot coefficients 1, 4, 5, and so on up to 20, and then the 23rd. The
21 knots (including end points) are equally spaced by default. At the same time,
evaluate the function using the eval.fd (R) or eval_fd (Matlab) function at
a fine mesh of values, such as 51 equally spaced values. Plot these values over
the coefficients that you have just plotted. Compare the trend in the coefficients
and the curve. If you specified a mean function for the random coefficients, you
might want to add this to the plot as well.
4. You might want to extend this exercise to generating N random functions, and
plot all of them simultaneously to see how much variation there is from curve
to curve. This will, of course, depend on the standard deviation of the random
coefficients that you use.
5. Why not also plot the first and second derivatives of these curves, evaluated again
using the eval.fd function and specifying the order of derivative as the third
argument. You might want to compare the first derivative with the difference
values for the coefficients.
Chapter 5
Smoothing: Computing Curves from Noisy Data
The previous two chapters have introduced the Matlab and R code needed to specify basis function systems and then to define curves by combining these systems with coefficient arrays. For example, we saw how to construct a basis object such as heightbasis
to define growth curves and how to combine it with a matrix of coefficients such as
heightcoef so as to define growth functional data objects such as were plotted
in Figure 1.1.
We now turn to methods for computing these coefficients with more careful con-
sideration of measurement error. For example, how do we compute these coefficients
to obtain an optimal fit to data such as the height measurements for 54 girls in the
Berkeley growth study stored in the 31 by 54 matrix that we name heightmat? Or
how do we replace the rather noisy mean daily precipitation observations by smooth
curves?
Two strategies are discussed. The simplest revisits the use of regression analysis
that concluded Chapter 4, but now uses a special function for this purpose. The
second and more elaborate strategy aims to miss nothing of importance in the data
by using a powerful basis expansion, but avoids overfitting the data by imposing a
penalty on the “roughness” of the function, where the meaning of “rough” can be
adapted to special features of the application from which the data were obtained.
5.1 Regression Splines: Smoothing by Regression Analysis
We tend, perhaps rather too often, to default to defining data fitting as the minimization of the sum of squared errors or residuals,

SSE(x) = ∑_{j=1}^n [y_j − x(t_j)]².   (5.1)
When smoothing function x is defined as a basis function expansion (3.1), the least-
squares estimation problem becomes
SSE(c) = ∑_{j=1}^n [y_j − ∑_{k=1}^K c_k φ_k(t_j)]² = ∑_{j=1}^n [y_j − φ'(t_j) c]².   (5.2)

Implicit in the use of least squares is the error model

y_j = x(t_j) + ε_j = φ'(t_j) c + ε_j,   (5.3)

where the true errors or residuals ε_j are statistically independent and have a normal
or Gaussian distribution with mean 0 and constant variance. Of course, if we look
closely, we often see that this error model is too simple. Nevertheless, the least-
squares estimation process can be defended on the grounds that it tends to give
nearly optimal answers relative to “best” estimation methods so long as the true
error distribution is fairly short-tailed and departures from the other assumptions
are reasonably mild.
Readers will no doubt recognize (5.3) as the standard regression analysis model,
along with its associated least-squares solution. Using matrix notation, let the n-
vector y contain the n values to be fit, vector ε contain the corresponding true resid-
ual values, and n by K matrix Φ contain the basis function values φ_k(t_j). Then we
have
y = Φc + ε
and the least-squares estimate of the coefficient vector c is
ĉ = (Φ'Φ)⁻¹ Φ' y.   (5.4)
R and Matlab already have the capacity to smooth data through their functions
for regression analysis. Here is how we can combine these functions with the ba-
sis creation functions available in the fda package. Suppose that we want a basis
system for the growth data with K = 12 basis functions using equally spaced knots.
This can be accomplished in R with the following command:
heightbasis12 = create.bspline.basis(c(1,18), 12, 6)
If we evaluate the basis functions at the ages of measurement in vector object age
by the command basismat = eval.basis(age, heightbasis12) (in
R), then we have a 31 by 12 matrix of covariate or design values that we can use in
a least-squares regression analysis defined by commands such as
heightcoef = lsfit(basismat, heightmat,
intercept=FALSE)$coef
heightcoef = basismat\heightmat
in R and Matlab, respectively. Spline curves fit by regression analysis are often
referred to as regression splines in statistical literature.
However, the functions smooth.basis (R) and smooth_basis (Matlab) are
provided to produce the same results as well as much more without the need to
explicitly evaluate the basis functions, through the R command
heightList = smooth.basis(age, heightmat,
heightbasis12)
and the Matlab version
[fdobj, df, gcv, coef, SSE, penmat, y2cMap] = ...
smooth_basis(age, heightmat, heightbasis12);
The R function smooth.basis returns an object heightList of the list class, and the Matlab function smooth_basis returns all seven of its objects as an explicit sequence of variable names surrounded by square brackets. However, if
we just wanted the first three returned objects as separate objects, in R we would
have to extract them individually:
heightfd = heightList$fd
height.df = heightList$df
height.gcv = heightList$gcv
In Matlab, we would just request only the first three objects:
[fdobj, df, gcv] = ...
smooth_basis(age, heightmat, heightbasis12);
In any case, the three most important returned objects are the following, where the
names in bold type are used in each language to retrieve the objects:
fd An object of class fd containing the curves that fit the data.
df The degrees of freedom used to define the fitted curves.
gcv The value of the generalized cross-validation criterion: a measure of lack
of fit discounted for degrees of freedom. If there are multiple curves, a vector
is returned containing gcv values for each curve. (See Ramsay and Silverman
(2005) for details.)
Notice that the coefficient estimate ĉ in (5.4) is obtained from the data in the vector y by multiplying this vector by a matrix. We will use this matrix in many places in this book where we need to estimate the variability in quantities determined by ĉ, so we give it the name y2cMap:

y2cMap = (Φ'Φ)⁻¹ Φ'.
Here is the corresponding R code for computing this matrix for the growth data:
age = growth$age
heightbasismat = eval.basis(age, heightbasis12)
y2cMap = solve(crossprod(heightbasismat),
t(heightbasismat))
In Matlab this last command would be
y2cMap = (heightbasismat’*heightbasismat) \ ...
heightbasismat’;
This code for the mapping matrix y2cMap only applies to regression-based
smoothing. More general expressions for y2cMap include other term(s) that disap-
pear with zero smoothing. This is important because as we change the smoothing,
y2cMap changes, but ĉ is still the product of y2cMap, however changed, and the
data.
While we are at it, we also will need what is often called the “hat” matrix, denoted by H. This maps the data vector y into the vector of fitted values ŷ,

ŷ = Hy,   where for regression smoothing H = Φ(Φ'Φ)⁻¹Φ'.
The regression approach to smoothing data only works if the number K of ba-
sis functions is substantially smaller than the number n of sampling points. With
the growth data, it seems that roughly K = 12 spline basis functions are needed to smooth the data adequately. Larger values of K will tend to undersmooth
or overfit the data. Interestingly, after over a century of development of parametric
growth curve models, the best of these also use about 12 parameters in this example.
Although regression splines are often adequate for simple jobs where only curve
values are to be used, the instability of regression spline derivative estimates at the
boundaries is especially acute. The next section describes a more sophisticated ap-
proach that can produce much better derivative results and also allows finer control
over the amount of smoothing.
5.2 Data Smoothing with Roughness Penalties
The roughness penalty approach uses a large number of basis functions, possibly
extending to one basis function per observation and even beyond, but at the same
time imposing smoothness by penalizing some measure of function complexity. For
example, we have already in the last chapter defined a basis system for the growth
data called heightbasis that has 35 basis functions, even though we have only
31 observations per child. Would using such a basis system result in overfitting the
data, as well as singularity problems on the computational side? The answer is,
“Not if a positive penalty is applied to the degree to which the fit is not smooth.”
We define a measure of the roughness of the fitted curve, and then minimize a fitting
criterion that trades off curve roughness against lack of data fit.
Here is a popular way to quantify the notion “roughness” of a function. The
square of the second derivative [D2 x(t)]2 of a function x at argument value t is of-
ten called its curvature at t, since a straight line, which we all tend to agree has
no curvature, has second derivative zero. Consequently, a measure of a function’s roughness is its total curvature, or integrated squared second derivative,

PEN₂(x) = ∫ [D²x(t)]² dt.
(Unless otherwise stated, all integrals in this book are definite integrals over the
range of t.)
Penalty terms such as PEN2 (x) provide smoothing because wherever the function
is highly variable, the square of the second derivative [D2 x(t)]2 is large. We can
apply this concept to derivative estimation as well. If we are interested in the second
derivative D2 x of x, chances are that we want it to appear to be smooth. This suggests
that we ought to penalize the curvature of the second derivative, that is, use the
roughness measure

PEN₄(x) = ∫ [D⁴x(t)]² dt.   (5.8)
But is “roughness” always related to the second derivative? Thinking a bit more
broadly, we can define roughness as the extent to which a function departs from
some baseline “smooth” behavior. For periodic functions of known period that can
vary in level, such as mean temperature curves, the baseline behavior can be con-
sidered to be shifted sinusoidal variation,

x(t) = c₁ + c₂ sin(ωt) + c₃ cos(ωt),   (5.9)

that is, the first three terms in a Fourier series for some known ω = 2π/T. If we compute ω²Dx + D³x for such a simple function, we find that the result is exactly 0. We refer to the differential operator L = ω²D + D³ in Ramsay and Silverman (2005) as the harmonic acceleration operator. What happens when we apply this harmonic acceleration operator to a higher-order term in the Fourier series?

ω²D sin(jωt) + D³ sin(jωt) = (j − j³) ω³ cos(jωt).   (5.10)
This expression is 0 for j = 1 and increases with the cube of j. This property suggests that the integral of the square of the harmonic acceleration of a function may be a suitable measure of roughness for periodic data like the temperature curves:

PEN_L(x) = ∫ [Lx(s)]² ds.   (5.11)
The roughness-penalized fitting criterion combining lack of fit with the curvature penalty is

F(c) = ∑_j [y_j − x(t_j)]² + λ PEN₂(x),   (5.12)

where x(t) = c'φ(t). The smoothing parameter λ specifies the emphasis on the sec-
ond term penalizing curvature relative to goodness of fit quantified in the sum of
squared residuals in the first term. As λ moves from 0 upward, curvature becomes
increasingly penalized. With λ sufficiently large, D²x will be essentially 0. This in turn implies that x will be essentially a straight line (a polynomial of degree one, order two), except possibly at a finite number of isolated points such as join points
or knots of a B-spline. At the other extreme, λ → 0 leaves the function x free to fit
the data as closely as possible with the selected basis set, sometimes at the expense
of some fairly wild variations in the approximating function.
It is usually convenient to plot and modify λ on a logarithmic scale. More gen-
erally, the use of a differential operator L to define roughness will result in λ → ∞
forcing the fit to approach more and more closely a solution to the differential equa-
tion Lx = 0. If L = Dm , this solution will be a polynomial of order m (i.e., degree
m − 1). For the harmonic acceleration operator, this solution will be of the form
(5.9). In this way, we can achieve an important new form of control over the smooth-
ing process, namely by having the capacity to define the concept “smooth” in a way
that is appropriate to the application.
We can now provide an explicit form of the estimate of the coefficient vector ĉ for
roughness penalty smoothing that is the counterpart of (5.4) for regression smooth-
ing. The general version of the roughness penalized fitting criterion (5.12) is
F(c) = ∑_j [y_j − x(t_j)]² + λ ∫ [Lx(t)]² dt.   (5.13)
If we substitute the basis expansion x(t) = c'φ(t) = φ'(t)c into this equation, we get

F(c) = ∑_j [y_j − φ'(t_j) c]² + λ c' [ ∫ Lφ(t) Lφ'(t) dt ] c.   (5.14)
With this defined, and writing the roughness penalty matrix as

R = ∫ Lφ(t) Lφ'(t) dt,   (5.15)

it is a relatively easy exercise in matrix algebra to work out that

ĉ = (Φ'Φ + λR)⁻¹ Φ' y.   (5.16)
From here we can define the matrix y2cMap that we will use in Chapter 6 for computing confidence regions about estimated curves,

y2cMap = (Φ'Φ + λR)⁻¹ Φ',   (5.17)

as well as the corresponding smoothing or “hat” matrix

H = Φ (Φ'Φ + λR)⁻¹ Φ'.   (5.18)
But how is one to compute matrix R in either language? This is taken care of in
the function eval.penalty in R and eval_penalty in Matlab. These func-
tions require two arguments:
basisobj A functional basis object of the basisfd class in R and basis
class in Matlab.
Lfdobj A linear differential operator object of the Lfd class.
In the case of the harmonic acceleration operator, we can calculate the roughness
penalty matrix Rmat in R by
Rmat = eval.penalty(tempbasis, harmaccelLfd)
We hasten to add that most routine functional data analyses will not actually
need to calculate roughness penalty matrices, since this happens inside functions
such as smooth.basis. Computing R can involve numerical approximations to
the integrals involved in (5.15). However, for a spline basis, if L is a power of D, then
the integrals are analytically available and evaluated to within machine precision.
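As a short illustration, and assuming the basis heightbasis12 defined earlier for the growth data, the penalty matrix for a second-derivative penalty can be computed exactly:

Rmat2 = eval.penalty(heightbasis12, int2Lfd(2))
dim(Rmat2)   # 12 by 12, symmetric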
The values x(t_j), j = 1, ..., n, defined by minimizing criterion (5.14) are critical for a detailed analysis of how well alternative choices of λ work for fitting the data values
y j . Let us denote these by the vector x̂ and the corresponding data values by y. It
turns out (see Ramsay and Silverman (2005) for details) that x̂ has the following
linear relationship to y:
x̂ = H(λ )y. (5.19)
The smoothing matrix H(λ ) is square, symmetric and of order n and, needless to say,
a function of λ. It has many uses, among which is that a measure of the effective degrees of freedom of the fit defined by λ is given by

df(λ) = trace[H(λ)].
Going beyond the smoothing problem, we need the general capacity in functional
data analysis to impose smoothness on estimated functional parameters, of which
the smoothing curve is only one example. We now explain how this is made possible
in the two programming languages.
A roughness penalty is defined by constructing a functional parameter object
consisting of:
• a basis object,
• a derivative order m or a differential operator L to be penalized and
• a smoothing parameter λ .
We put these elements together by using the fdPar class in either language and the
function fdPar to construct an object of that class.
The following R commands do two things: First they set up an order six B-spline
basis for smoothing the growth data using a knot at each age. Then they define a
functional parameter object that penalizes the roughness of growth acceleration by
using the fourth derivative in the roughness penalty. The smoothing parameter value
that we have found works well here is λ = 0.01.
norder = 6
nbasis = length(age) + norder - 2
heightbasis = create.bspline.basis(c(1,18),
nbasis, norder, age)
heightfdPar = fdPar(heightbasis, 4, 0.01)
The data are in array heightmat. In Chapter 4, these data were passed to
smooth.basis with a basis object as the third argument. Here, we will use the
functional parameter object heightfdPar as the third argument:
heightfd = smooth.basis(age, heightmat,
heightfdPar)$fd
Notice that we set up a functional data object heightfd directly by using the
suffix $fd. In Matlab, we would use
heightfd = smooth_basis(age, heightmat, heightfdPar)
A popular method for choosing the smoothing parameter λ is to minimize the generalized cross-validation criterion

GCV(λ) = ( n / (n − df(λ)) ) ( SSE / (n − df(λ)) ).

Notice that this is a twice-discounted mean square error measure. The right factor is
the unbiased estimate of error variance σ 2 familiar in regression analysis, and thus
represents some discounting by subtracting d f (λ ) from n. The left factor further
discounts this estimate by multiplying by n/(n − d f (λ )).
Figure 5.1 shows how the generalized cross-validation (GCV) criterion varies as
a function of log10 (λ ) for the entire female Berkeley growth data. Matlab code for
generating the plotted values is
loglam = -6:0.25:0;
gcvsave = zeros(length(loglam),1);
dfsave = gcvsave;
for i=1:length(loglam)
lambdai = 10^loglam(i);
hgtfdPari = fdPar(heightbasis, 4, lambdai);
[hgtfdi, dfi, gcvi] = ...
smooth_basis(age, hgtfmat, hgtfdPari);
gcvsave(i) = sum(gcvi);
dfsave(i) = dfi;
end
The minimizing value of λ is about 10⁻⁴, and at that value df(λ) = 20.2. In fact, the value λ = 10⁻⁴ is rather smaller than the value of 10⁻² that we chose to work with in our definition of the fdPar object in Section 5.2.4, for which df(λ) = 12.7. We
explain our decision in Section 5.3, and recommend a cautious and considered ap-
proach to choosing the smoothing parameter rather than relying solely on automatic
methods such as GCV minimization.
GCV values often change slowly with log10 λ near the minimizing value, so that
a fairly wide range of λ values may give roughly the same GCV value. This is a sign
that the data are not especially informative about the “true” value of λ . If so, it is not
worth investing a great deal of effort in precisely locating the minimizing value, and
simply plotting GCV over a mesh of log10 λ values might be sufficient. Plotting the function GCV(λ) in any case will inform us about its curvature near its minimum. If
the data are not telling us all that much about λ , then it is surely reasonable to use
your judgment in working with values which seem to provide more useful results
than the minimizing value does. Indeed, Chaudhuri and Marron (1999) argue per-
suasively for inspecting data smooths over a range of λ values in order to see what
is revealed at each level of smoothing. However, if a more precise value seems im-
portant, the function lambda2gcv can be used as an argument in an optimization
function that will return the minimizing value.
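Here is a sketch of such an optimization for the growth data, written with smooth.basis rather than lambda2gcv so that it is self-contained; the function name gcvfun and the search interval are our own choices.

gcvfun = function(loglam) {
  hgtfdPari = fdPar(heightbasis, 4, 10^loglam)
  sum(smooth.basis(age, heightmat, hgtfdPari)$gcv)
}
optimize(gcvfun, interval=c(-6, 0))$minimum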
5.3 Case Study: The Log Precipitation Data
The fda package for R includes CanadianWeather data, which includes the
base 10 logarithms of the average annual precipitation in millimeters (after replacing
Fig. 5.1 The values of the generalized cross-validation or GCV criterion for choosing the smooth-
ing parameter λ for fitting the female growth curves.
zeros with 0.05) for each day of the year at 35 different weather stations. We put
these data in logprecav, shifted to put winter in the middle, so the year begins
with July 1 and ends with June 30:
logprecav = CanadianWeather$dailyAv[
dayOfYearShifted, , ’log10precip’]
Next we set up a saturated Fourier basis for the data:
dayrange = c(0,365)
daybasis = create.fourier.basis(dayrange, 365)
We will smooth the data using a harmonic acceleration roughness penalty that penal-
izes departures from a shifted sine, x(t) = c1 + c2 sin(2π t/365) + c3 cos(2π t/365).
Here we define this penalty. The first command sets up a vector containing the
three coefficients required for the linear differential operator, and the second uses
function vec2Lfd to convert this vector to the linear differential operator object
harmaccelLfd.
Lcoef = c(0,(2*pi/diff(dayrange))^2,0)
harmaccelLfd = vec2Lfd(Lcoef, dayrange)
Now that we are set up to do some smoothing, we will want to try a range of
smoothing parameter λ values and examine the degrees of freedom and values of
the generalized cross–validation coefficient GCV associated with each value of λ .
First we set up a range of values (identified, of course, by some preliminary trial-
and-error experiments). We also set up two vectors to contain the degrees of freedom
and GCV values.
loglam = seq(4,9,0.25)
nlam = length(loglam)
dfsave = rep(NA,nlam)
gcvsave = rep(NA,nlam)
Here are commands that loop through the smoothing values, storing degrees of free-
dom and GCV along the way:
for (ilam in 1:nlam) {
cat(paste(’log10 lambda =’,loglam[ilam],’\n’))
lambda = 10^loglam[ilam]
fdParobj = fdPar(daybasis, harmaccelLfd, lambda)
smoothlist = smooth.basis(day.5, logprecav,
fdParobj)
dfsave[ilam] = smoothlist$df
gcvsave[ilam] = sum(smoothlist$gcv)
}
The GCV values have to be summed, since function smooth.basis returns a
vector of GCV values, one for each replicate.
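The plotting command itself is not listed; a minimal sketch using the vectors accumulated in the loop above would be

plot(loglam, gcvsave, type="b", lwd=2,
     xlab="log10(lambda)", ylab="GCV Criterion")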
Figure 5.2 plots the GCV values. This shows a minimum at log10 (λ ) = 6. Next we
smooth at this level and add labels to the resulting functional data object. Then we
plot all the log precipitation curves in a single plot, followed by a curve–by–curve
plot of the raw data and the fitted curve.
lambda = 1e6
fdParobj = fdPar(daybasis, harmaccelLfd, lambda)
logprec.fit = smooth.basis(day.5,logprecav,fdParobj)
logprec.fd = logprec.fit$fd
fdnames = list("Day (July 1 to June 30)",
"Weather Station" = CanadianWeather$place,
"Log 10 Precipitation (mm)")
logprec.fd$fdnames = fdnames
plot(logprec.fd)
plotfit.fd(logprecav, day.5, logprec.fd)
This example will be revisited in Chapter 7. There, we will see that λ = 1e6 leaves some interesting structure in the residuals for a few weather stations.
Moreover, the curvature in the GCV function is rather weak, suggesting we will not
lose much by using other values of λ in the range of 1e5 to 1e8. Our advice at the
end of Section 5.2.5 seems appropriate here, and perhaps we should have worked
with a lower value of λ .
Fig. 5.2 The values of the generalized cross-validation or GCV criterion for the log precipitation
data. The roughness penalty was defined by harmonic acceleration.
5.4 Positive, Monotone, Density and Other Constrained Functions
Often estimated curves must satisfy one or more side constraints. If the data are
counts or other values that cannot be negative, then we do not want negative curve
values, even over regions where values are at or close to zero. If we are estimating
growth curves, it is probably the case that negative slopes are implausible, even if
the noisy measurements do go down here and there. If the data are proportions, it
would not make sense to have curve values outside the interval [0,1].
Unfortunately, linear combinations of basis functions such as those we have been
using up to this point are difficult to constrain in these ways. The solution to the
problem is simple: We transform the problem to one where the curve being esti-
mated is unconstrained. We lose simple closed form expressions for the smoothing
curve and therefore must resort to iterative methods for calculating the transformed
curve, but the price is well worth paying.
This transformation strategy is easiest to see in the case of positive (or negative) curves. We express the smoothing problem (5.3) as the transformed problem

y_j = exp[w(t_j)] + ε_j.
That is, function w(t) is now the logarithm of the data-fitting function x(t) =
exp[w(t)], and consequently is unconstrained as to its sign, while at the same time
the fitting function is guaranteed to be positive. It can go as close to zero as we like
by permitting the values of w(t) to be arbitrarily large negative numbers.
For example, we can smooth Vancouver’s mean daily precipitation data, which
can have zero but not negative values, using these commands built around the function
smooth.pos in R or smooth_pos in Matlab:
lambda = 1e3
WfdParobj = fdPar(daybasis, harmaccelLfd, lambda)
VanPrec = CanadianWeather$dailyAv[
dayOfYearShifted, ’Vancouver’, ’Precipitation.mm’]
VanPrecPos = smooth.pos(day.5, VanPrec, WfdParobj)
Wfd = VanPrecPos$Wfdobj
These commands plot Wfd, the estimated log precipitation.
Wfd$fdnames = list("Day (July 1 to June 30)",
"Weather Station" = CanadianWeather$place,
"Log 10 Precipitation (mm)")
plot(Wfd)
The fit to the data, shown in Figure 5.3, is displayed by
precfit = exp(eval.fd(day.5, Wfd))
plot(day.5, VanPrec, type="p", cex=1.2,
xlab="Day (July 1 to June 30)",
ylab="Millimeters",
main="Vancouver’s Precipitation")
lines(day.5, precfit,lwd=2)
Some applications require a fitting function x(t) that is either monotonically increas-
ing or decreasing, even though the observations may not exhibit perfect monotonic-
ity:
$$y_j = \beta_0 + \beta_1 x(t_j) + \epsilon_j \qquad (5.23)$$
We can get this easily by letting
$$x(t) = \int_{t_0}^{t} \exp[w(u)]\,du. \qquad (5.24)$$
Here t0 is the fixed origin for the range of t-values for which the data are being
fit. The intercept term β0 in (5.23) is the value of the approximating function at t0 .
Fig. 5.3 Vancouver’s precipitation data, along with a fit estimated by positive smoothing.
Figure 5.4 shows the length of the tibia of a newborn infant, measured by Dr.
Michael Hermanussen with an error of the order of 0.1 millimeters, over its first
40 days. The staircase nature of growth in this early period and the need to estimate
the velocity of change in bone length, also shown in the figure, make monotone
smoothing essential. It seems astonishing that this small bone in the baby’s lower
leg has the capacity to grow as much as two millimeters in a single day.
Variables day and tib in the following code contain the numbers of the days
and the measurements, respectively. A basis for function w and a smoothing profile
are set up, the data are smoothed, the values of the functional data object for w
and the coefficients β0 and β1 are returned. Then the values of the smoothing and
velocity curves are computed.
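The following lines are a minimal sketch of these steps, not the book's exact code: the basis dimension, penalty order and smoothing parameter are assumptions, and the component names Wfdobj and beta are taken from the fda package's smooth.monotone function.
Wbasis  = create.bspline.basis(c(1,40), 13, 5)  # basis for the function w (choices assumed)
WfdPar  = fdPar(Wbasis, 2, 1e-4)                # smoothing profile (lambda assumed)
tibfit  = smooth.monotone(day, tib, WfdPar)     # monotone smooth of the tibia data
Wfd     = tibfit$Wfdobj                         # functional data object for w
beta    = tibfit$beta                           # coefficients beta0 and beta1
dayfine = seq(1, 40, len=151)
tibhat  = beta[1] + beta[2]*eval.monfd(dayfine, Wfd)   # smoothing curve values
tibvel  = beta[2]*eval.monfd(dayfine, Wfd, 1)          # velocity curve values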
Fig. 5.4 The left panel shows measurements of the length of the tibia of a newborn infant over its
first 40 days, along with a monotone smooth of these data. The right panel shows the velocity or
first derivative of the smoothing function.
In Chapter 8 we will need our best estimates of the growth acceleration functions for
the Berkeley girls, and smoothing their data monotonically substantially improves
these estimates over direct smoothing, especially in the neighborhood of the
pubertal growth spurts.
We set up an order 6 spline basis with knots at the ages of observation for their functions w, along with a roughness penalty on their third derivatives and a smoothing parameter of 1/√10, in these commands:
wbasis = create.bspline.basis(c(1,18), 35, 6, age)
growfdPar = fdPar(wbasis, 3, 10^(-0.5))
The monotone smoothing of the data in the 31 by 54 matrix hgtf, and the extraction
of the functional data object Wfd for the wi functions, the coefficients β0i , β1i
and the functional data object hgtfhatfd for the functions fitting the data are
achieved by
growthMon = smooth.monotone(age, hgtf, growfdPar)
Wfd = growthMon$Wfd
betaf = growthMon$beta
hgtfhatfd = growthMon$yhatfd
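As a further sketch of ours (not from the book), the fitted height curves and their accelerations, which Chapter 8 will draw on, can be evaluated on a fine age grid with eval.fd; the grid and plotting choices are assumptions.
agefine  = seq(1, 18, len=101)
hgtfhat  = eval.fd(agefine, hgtfhatfd)      # fitted heights for the 54 girls
accelhat = eval.fd(agefine, hgtfhatfd, 2)   # growth accelerations (second derivative)
matplot(agefine, accelhat, type="l", lty=1,
        xlab="Age (years)", ylab="Acceleration (cm/yr/yr)")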
In the region around Regina, Saskatchewan, the month of June falls within the crop-growing phase, and an adequate supply of moisture in the soil is essential for a good crop.
Precipitation is a difficult quantity to model for several reasons. First of all, on
about 65% of the days in this region, no rain is even possible, so that zero really
means a “nonprecipitation day” rather than “no rain.” Since there can be a small
amount of precipitation from dew, we used only days when the measured precip-
itation exceeded two millimetres. Also, precipitation can come down in two main
ways: as a gentle drizzle and, more often, as a sudden and sometimes violent thun-
derstorm. Consequently, the distribution of precipitation is extremely skewed, and
Regina experienced three days in this period with more than 40 mm of rain. We
deleted these days, too, in order to improve the graphical displays, leaving N = 212
rainfall values.
Figure 5.5 plots the ordered rainfalls for the 1,006 days when precipitation was
recorded against their rank orders, a version of a quantile plot. We can see just how
extreme precipitation can be; the highest rainfall of 132.6 mm on June 25, 1975, is
said to have flooded 20,000 basements.
Fig. 5.5 The empirical quantile function for daily rainfall at Regina in the month of June over 34
years.
We set up the break points for a cubic B-spline basis to be the rainfalls at 11
equally spaced ranks, beginning at the first and ending at N. In this code variable
RegPrec contains the 212 sorted rainfall amounts between 2 and 45 mm.
Wknots = RegPrec[round(N*seq(1/N,1,len=11),0)]
Wnbasis = length(Wknots) + 2
Wbasis = create.bspline.basis(range(RegPrec), 13, 4,
Wknots)
Now we estimate the density, applying a light amount of smoothing, and extract
the functional data object Wfd and the normalizing constant C from the list that
density.fd returns.
Wlambda = 1e-1
WfdPar = fdPar(Wbasis, 2, Wlambda)
densityList = density.fd(RegPrec, WfdPar)
Wfd = densityList$Wfdobj
C = densityList$C
These commands set up the density function values over a fine mesh of values.
Zfine = seq(RegPrec[1],RegPrec[N],len=201)
Wfine = eval.fd(Zfine, Wfd)
Pfine = exp(Wfine)/C
The estimated density is shown in Figure 5.6. The multiphase nature of precip-
itation is clear here. The first phase is due to heavy dew or a few drops of rain,
followed by a peak related to light rain from low-pressure ridges that arrive in this
area from time to time, and then thunderstorm rain that can vary from about 7 mm
to catastrophic levels.
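A minimal plotting sketch for a display like Figure 5.6 follows; it is our assumption rather than the book's code, with the knot values marked by vertical dashed lines.
plot(Zfine, Pfine, type="l", lwd=2,
     xlab="Precipitation (mm)", ylab="Probability density")
abline(v=Wknots, lty=2)   # knots defining the B-spline expansion of w = ln p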
Fig. 5.6 The solid line indicates the probability density function p(z) for rainfall in Regina of 2
mm or greater, but stopping at about 45 mm. The vertical dashed lines indicate the knot values
used to define the cubic B-spline expansion for w = ln p.
5.5 Assessing the Fit to the Data
Having smoothed the data, there are many questions to ask, and these direct us to
do some further analyses of the residuals, ri j = yi j − xi (t j ). These analyses can be
functional since there is some reason to suppose that at least part of the variation in
these residuals across t is smooth.
Did we miss some important features in the data by oversmoothing? Perhaps,
for example, there may have been something unusual in one or two curves that we
missed because the GCV criterion selected a level of smoothing that worked best
for all samples simultaneously. Put another way, could there be an indication that
we might have done better to smooth each weather station’s log precipitation data
separately? We will defer looking at this question until the end of the next chapter,
since principal components analysis can be helpful here.
A closely related question concerns whether the variation in the residuals con-
forms to the assumptions implicit in the type of smoothing that we performed. The
use of the unweighted least-squares criterion is only optimal if the residuals for all
time points are normally distributed and if the variance of these residuals is constant
across both years and weather stations (curves).
We now return to the log precipitation data considered in Section 5.3 and create a
365 by 35 matrix of residuals from the fit discussed there. We then use this to create
variance vectors across
• stations, of length 365, dividing by 35 since the residuals need not sum to zero
on any day,
• time, of length 35, dividing by 365-12; the number “12” here is essentially the
equivalent degrees of freedom in the fit (logprec.fit$df).
logprecmat = eval.fd(day.5, logprec.fd)
logprecres = logprecav - logprecmat
# across stations
logprecvar1 = apply(logprecres^2, 1, sum)/35
# across time
logprecvar2 = apply(logprecres^2, 2, sum)/(365-12)
Let us look at how residual variation changes over stations; Figure 5.7 displays
their standard deviations. With labels on a few well-known stations and recalling
that we number the stations from east to west to north, we see that there tends to be
more variation for prairie and northerly stations in the center of the country, and less
for marine stations. This is interesting but perhaps not dramatic enough to make us
want to pursue the matter further.
Figure 5.8 shows how standard deviations taken over stations and within days
vary. The smooth line in the plot was computed by smoothing the log of the standard
deviations and exponentiating the result by these two commands:
logstddev.fd = smooth.basis(day.5,
log(logprecvar1)/2, fdParobj)$fd
logprecvar1fit = exp(eval.fd(day.5, logstddev.fd))
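A display like Figure 5.8 can then be produced along these lines (a sketch with plotting details assumed):
plot(day.5, sqrt(logprecvar1), xlab="Day",
     ylab="Standard deviation across stations")
lines(day.5, logprecvar1fit, lwd=2)   # exponentiated smooth of the log variances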
Fig. 5.7 Standard deviations of the residuals from the smooth of the log precipitation taken across
days and within stations.
We could also have used smooth.pos to do the job. We see now that there is a
seasonal variation in the size of the residuals, with more variation in summer months
than in winter. Nevertheless, this form of variation is not strong enough to justify
returning to do a weighted least-squares analysis using smooth.basis; we would
need much larger variations in the variability for it to create a substantive difference
between weighted and unweighted solutions.
Also implicit in our smoothing technology is the assumption that residuals are
uncorrelated. This is a rather unlikely situation; departures from smooth variation
tend also to be smooth, implying a strong positive autocorrelation between neigh-
boring residuals. If observation times are equally spaced, we can use standard time
series techniques to explore this autocorrelation structure.
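For example, since the weather data are recorded daily, the base R function acf can be applied to the residuals of a single station; this sketch (not from the book) uses the residual matrix logprecres computed above.
acf(logprecres[, 1], lag.max=30,
    main="Autocorrelation of residuals, station 1")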
We give here the arguments of the constructor function fdPar that constructs an
object of the functional parameter fdPar class. The complete calling sequence is
fdPar(fdobj=NULL, Lfdobj=NULL, lambda=0,
estimate=TRUE, penmat=NULL)
Fig. 5.8 Standard deviations of the residuals from the smooth of the log precipitation taken across
stations and within days. The solid line is an exponentiated smooth of the log of the variances.
e. Plot the velocity versus acceleration curves for the fit using a Fourier basis
and using the B-spline basis with a harmonic acceleration penalty. Are they
substantially different? Do they provide evidence of subcycles?
There is a large literature on smoothing methods, and Ramsay and Silverman (2005)
devote a number of chapters to the problem. Recent book-length references are Eu-
bank (1999), Ruppert et al. (2003), and Simonoff (1996). Moreover, there are smoothing
methods that do not define x explicitly in terms of basis functions and that may serve
as well, such as local polynomial smoothing. However, the well-known method
of kernel smoothing, made all too available in software packages, should now be
viewed as obsolete because of its poor performance near the end points of the interval
(Fan and Gijbels, 1996).
Chapter 6
Descriptions of Functional Data
This chapter and the next are the exploratory data analysis end of functional data
analysis. Here we recast the concepts of mean, standard deviation, covariance and
correlation into functional terms and provide R and Matlab functions for computing
and viewing them.
Exploratory tools are often the most fruitful when applied to residual variation
around some model, where we often see surprising effects once we have removed
relatively predictable structures from the data. Summary descriptions of residual
variation are also essential for estimating confidence regions.
Contrasts are often used in analysis of variance to explore prespecified patterns
of variation. We introduce the more general concept of a functional probe as a means
of looking for specific patterns or shapes of variation in functional data and of pro-
viding methods for estimating confidence limits for estimated probe values.
The phase-plane plot has turned out to be a powerful tool for exploring data
for harmonic variation, even in data on processes such as human growth where we
do not ordinarily think of cyclic variation as of much interest. It is essentially a
graphical analogue of a second order linear differential equation. In fact, the phase-
plane plot, developed in detail in this chapter, is a precursor to the dynamic equations
that we will explore in Chapter 11.
The functional mean and standard deviation are computed, for the log precipitation data considered in Section 5.3, as follows:
meanlogprec = mean(logprec.fd)
stddevlogprec = std.fd(logprec.fd)
As always in statistics, choices of descriptive measures like the mean and variance
should never be automatic or uncritical. The distribution of precipitation is strongly
skewed, and by logging these data, we effectively work with the geometric mean of
precipitation as a more appropriate measure of location in the presence of substantial
skewness.
Beyond this specific application, the functional standard deviation focuses on
the intrinsic variability between observations, e.g., Canadian weather stations, after
removing variations that are believed to represent measurement and replication error
not attributable to the variability between observations. A proper interpretation of
the analyses of this section requires an understanding of exactly what we mean by
std.fd and what is discarded in smoothing.
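The variance-covariance surface v(s,t) displayed in Figures 6.1 and 6.2 can be estimated with the function var.fd; the following is a minimal sketch, with the evaluation grid and plotting arguments assumed rather than taken from the book.
logprecvar.bifd = var.fd(logprec.fd)        # bivariate functional data object for v(s,t)
weektime        = seq(0, 365, length=53)    # coarse evaluation grid (assumed)
logprecvar.mat  = eval.bifd(weektime, weektime, logprecvar.bifd)
persp(weektime, weektime, logprecvar.mat,   # surface plot as in Figure 6.1
      xlab="Day (July 1 to June 30)", ylab="Day (July 1 to June 30)",
      zlab="variance(log10 precip)")
contour(weektime, weektime, logprecvar.mat) # contour plot as in Figure 6.2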
Fig. 6.1 The estimated variance-covariance surface v(s,t) for the log precipitation data.
Fig. 6.2 A contour plot of the bivariate variance-covariance surface for the log precipitation data.
6.3 Functional Probes ρξ
Purely descriptive methods such as displaying mean and variance functions allow
us to survey functional variation without having to bring any preconceptions about
exactly what kind of variation might be important. This is fine as far as it goes, but
functions and their derivatives are potentially complex structures with a huge scope
for surprises, and we may need to “zoom in” on certain curve features.
Moreover, our experience suggests that a researcher seldom approaches func-
tional data without some fairly developed sense of what will be seen. We would be
surprised if we did not see the pubertal growth spurt in growth curves or sinusoidal
variation in temperature profiles. When we have such a structure in mind, we typi-
cally need to do two things: check the data to be sure that what we expect is really
there, and then do something clever to look around and beyond what we expect in
order to view the unexpected. Chapter 7 is mainly about looking for the dominant
modes of variation and covariation, but the tools that we develop there can also be
used to highlight interesting but more subtle features.
A probe ρξ is a tool for highlighting specific variation. Probes are variably
weighted linear combinations of function values. Let ξ be a weight function that
we apply to a function x as follows:
$$\rho_\xi(x) = \int \xi(t)\,x(t)\,dt. \qquad (6.2)$$
The two concepts of energy and of functional data having variation on more than one
timescale lead to the graphical technique of plotting one derivative against another,
something that we will call phase-plane plotting. We saw an example in Figure 1.15
for displaying the dynamics in human growth.
We now return to the US nondurable goods manufacturing index, plotted in Fig-
ures 1.3 and 1.4, to illustrate these ideas. A closer look at a comparatively stable
period, 1964 to 1967, shown in Figure 6.3, suggests that the index varies fairly
smoothly and regularly within each year. The solid line is a smooth of these data
using the roughness penalty method described in Chapter 5. We now see that the
variation within this year is more complex than Figure 1.4 can possibly reveal. This
curve oscillates three times during the year, with the size of the oscillation being
smallest in spring, larger in the summer, and largest in the autumn. In fact, each
year shows smooth variation with a similar amount of detail, and we now consider
how we can explore these within-year patterns.
Now that we have derivatives at our disposal, we can learn new things by studying
how derivatives relate to each other. Our tool is the plot of acceleration against ve-
locity. To see how this might be useful, consider the phase-plane plot of the function
sin(2π t), shown in Figure 6.4. This simple function describes a basic harmonic pro-
cess, such as the vertical position of the end of a suspended spring bouncing with a
period of one time unit.
Springs and pendulums oscillate because energy is exchanged between two
states: potential and kinetic. At times t = 1/4, 3/4, . . . the spring is at one or the other
end of its trajectory, and the restorative force due to its stretching has brought it
to a standstill. At that point, its potential energy is maximized, and so is the force,
which is acting either upward (positively) or downward. Since force is proportional
to acceleration, the second derivative of the spring position, −(2π )2 sin(2π t), is also
at its highest absolute value, in this case about ±40. On the other hand, when the
spring is passing through the position 0, its velocity, 2π cos(2π t), is at its greatest,
about ±6, but its acceleration is zero. Since kinetic energy is proportional to the
square of velocity, this is the point of highest kinetic energy. The phase-plane plot
shows this energy exchange nicely, with potential energy being maximized at the
extremes of Y and kinetic energy at the extremes of X.
The amount of energy in the system is related to the width and height of the
ellipse in Figure 6.4; the larger it is, the more energy the system exhibits, whether
in potential or kinetic form.
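Figure 6.4 can be reproduced directly from the derivatives of sin(2πt); the following sketch is ours rather than the book's, plotting acceleration against velocity over one period.
tvec = seq(0, 1, len=201)
vel  = 2*pi*cos(2*pi*tvec)        # velocity: first derivative of sin(2*pi*t)
acc  = -(2*pi)^2*sin(2*pi*tvec)   # acceleration: second derivative
plot(vel, acc, type="l", xlab="Velocity", ylab="Acceleration")
abline(h=0, v=0, lty=3)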
Fig. 6.4 A phase-plane plot of the simple harmonic function sin(2π t). Kinetic energy is maximized
when acceleration is 0, and potential energy is maximized when velocity is 0.
Harmonic processes and energy exchange are found in many situations besides me-
chanics. In economics, potential energy corresponds to resources including capital,
human resources, and raw material that are available to bring about some economic
activity. This energy exchange can be evaluated for nondurable goods manufactur-
ing as displayed in Figure 6.3. Kinetic energy corresponds to the manufacturing
process in full swing, when these resources are moving along the assembly line and
the goods are being shipped out the factory door.
We use the phase-plane plot, therefore, to study the energy transfer within the
economic system. We can examine the cycle within individual years, and also see
more clearly how the structure of the transfer has changed throughout the 20th cen-
tury. Figure 6.5 presents a phase-plane plot for 1964, a year in a relatively stable
period for the index. To read the plot, find “jan” in the middle right of the plot and
move around the diagram clockwise, noting the letters indicating the months as you
go. You will see that there are two large cycles surrounding zero, plus some small
cycles that are much closer to the origin.
The largest cycle begins in mid-May (M), with positive velocity and near zero
acceleration. Production is increasing linearly or steadily at this point. The cycle
moves clockwise through June (“Jun”) and passes the horizontal zero acceleration
line at the end of the month, when production is now decreasing linearly. By mid-
July (“Jly”) kinetic energy or velocity is near zero because vacation season is in full
swing. But potential energy or acceleration is high, and production returns to the
[Fig. 6.5: a phase-plane plot (acceleration versus velocity) of the smoothed nondurable goods index for 1964; letters mark the months.]
positive kinetic/zero potential phase in early August (“Aug”), and finally concludes
with a cusp at summer’s end (S). At this point the process looks like it has run out
of both potential and kinetic energy.
The cusp, near where both derivatives are zero, corresponds to the start of school
in September and the beginning of the next big production cycle passing through
the autumn months of October through November. Again this large cycle terminates
in a small cycle with little potential and kinetic energy. This takes up the months of
February and March (F and mar). The tiny subcycle during April and May seems
to be due to the spring holidays, since the summer and fall cycles, as well as the
cusp, do not change much over the next two years, but the spring cycle cusp moves
around, reflecting the variability in the timings of Easter and Passover.
To summarize, the production year in the 1960s has two large cycles swinging
widely around zero, each terminating in a small cusplike cycle. This suggests that
each large cycle is like a balloon that runs out of air, the first at the beginning of
school and the second at the end of winter. At the end of each cycle, it may be that
new resources must be marshaled before the next production cycle can begin.
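A plot like Figure 6.5 can be sketched from any smoothed functional data object for the index; here the object name nondurfd and the evaluation grid are hypothetical, standing in for a smooth of the nondurable goods index such as the one in Figure 6.3.
yearfine = seq(1964, 1965, len=161)     # one production year (grid assumed)
vel = eval.fd(yearfine, nondurfd, 1)    # velocity of the smoothed index
acc = eval.fd(yearfine, nondurfd, 2)    # acceleration of the smoothed index
plot(vel, acc, type="l", xlab="Velocity", ylab="Acceleration")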
Here are the commands in Matlab used to produce Figure 1.15. They use a func-
tional data object hgtfmonfd that contains the 54 curves for the Berkeley girls
Pointwise confidence limits tell us how much information there is in the data used to estimate these functions. See Figure 6.6 below for an example.
More generally, confidence regions are often required for the values of linear
probes ρξ defined in (6.2), of which x(t) and D^m x(t) are specific examples.
In order to study the sampling behavior of ρξ , we need to compute two linear map-
pings plus their composite. They are given names and described as follows:
1. Mapping y2cMap, which converts the raw data vector y to the coefficient vector
c of the basis function expansion of x. If y and c have lengths n and K, respec-
tively, this mapping is a K by n matrix y2cMap such that
c = y2cMap y
2. Mapping c2rMap, which converts the coefficient vector c into the probe value,
ρξ(x) = Lc = c2rMap c,
so that c2rMap is a 1 by K row vector.
3. The composite mapping c2rMap y2cMap, which converts a data vector y directly
into the probe value; this is a 1 by n row vector.
How is L = c2rMap actually calculated? In general, the computation includes
the use of the all-important inner product function inprod to compute the integral
(6.2). This function is working away behind the scenes in almost every functional
data analysis. It evaluates the integral of the product of two functions (or the matrices
defined by products of sets of functions), such as that defining the roughness penalty
matrix R = ∫ Lφ(t) [Lφ(t)]′ dt defined in Subsection 5.2.2. Where possible, this function uses
an analytic expression for these integral values. However, more often than not, this
computation requires numerical approximation.
The four important arguments to function inprod are as follows:
fdobj1 Either a functional data object or a functional basis object.
fdobj2 Also either a functional data object or a functional basis object. It is the
integral of the products of these two objects that is computed. If either of these
first two arguments is a basis object, it is converted to a functional data object
with an identity matrix as its coefficient matrix.
Recall that the coefficient vector c of the basis expansion is estimated from the linear model
y = Zc + e,
where Z contains the values of the basis functions at the observation points; the matrix y2cMap carries the data vector y into the estimated coefficients.
As an illustration, consider the probe weight function ξ(t) = exp[20 cos(2π(t − 197)/365)],
which is proportional to the density for the von Mises distribution of data on a
circle; the concentration parameter value 20 weights substantially about two months,
and the location value 197 centers the weighting on approximately January 15 (see
Fisher et al., 1987, for more details). The following code sets up the functional
data object for ξ and then carries out the two integrations required for the two sets
of 35 probe values produced by integrating the product of ξ with each of the basis
functions in each of the two systems.
dayvec = seq(0,365,len=101)
xivec = exp(20*cos(2*pi*(dayvec-197)/365))
xibasis = create.bspline.basis(c(0,365),13)
xifd = smooth.basis(dayvec, xivec, xibasis)$fd
tempLmat = inprod(tempbasis, xifd)
precLmat = inprod(precbasis, xifd)
The random behavior of the estimator of whatever we choose to estimate is ul-
timately tied to the random behavior of the data vector y. Let us indicate the order
n variance-covariance matrix of y as Var(y) = Σe. Recall that we are operating in
this chapter with the model
y = x(t) + ε ,
where x(t) here means the n-vector of values of x at the n argument values t j . In this
model x(t) is regarded as fixed, and as a consequence Σ e = Var(ε ).
We compute confidence limits in this book by a rather classic method: The covari-
ance matrix Σξ of ξ = Ay is
$$\Sigma_\xi = A\,\Sigma_y\,A'. \qquad (6.3)$$
If the residuals from a smooth of the data have a variance-covariance matrix Σe ,
then we see from ĉ = y2cMap y that the coefficients will have a variance-covariance
matrix
Σc = y2cMap Σe y2cMap′
We use the conditional variance of the residuals in this equation because we are
only interested in the uncertainty in our estimate of c that comes from unexplained
variation in y after we have explained what we can with our smoothing process. This
in turn estimates the random variability in our estimate of the smooth.
We apply (6.3) a second time to get the variance-covariance matrix Σξ for a
functional probe by
Σξ = c2rMap Σc c2rMap′ = c2rMap y2cMap Σe y2cMap′ c2rMap′.
We can now plot the smooth of the precipitation data for Prince Rupert, British
Columbia, Canada’s rainiest weather station. The log precipitation data are stored
in 365 by 35 matrix logprecav, and Prince Rupert is the 29th weather station in
our database. We first smooth the data:
lambda = 1e6;
fdParobj = fdPar(daybasis, harmaccelLfd, lambda)
logprecList= smooth.basis(day.5, logprecav, fdParobj)
logprec.fd = logprecList$fd
fdnames = list("Day (July 1 to June 30)",
"Weather Station" = CanadianWeather$place,
"Log 10 Precipitation (mm)")
logprec.fd$fdnames = fdnames
Next we estimate Σe , which we assume is diagonal. Consequently, we need only
estimate the variance of the residuals across weather stations for each day. We do
this by smoothing the log of the mean square residuals and then exponentiating the
result:
logprecmat = eval.fd(day.5, logprec.fd)
logprecres = logprecav - logprecmat
logprecvar = apply(logprecres^2, 1, sum)/(35-1)
lambda = 1e8
resfdParobj = fdPar(daybasis, harmaccelLfd, lambda)
logvar.fit = smooth.basis(day.5, log(logprecvar),
resfdParobj)
logvar.fd = logvar.fit$fd
varvec = exp(eval.fd(daytime, logvar.fd))
SigmaE = diag(as.vector(varvec))
Next we get y2cMap from the output of smooth.basis, and compute c2rMap
by evaluating the smoothing basis at the sampling points. We then compute the
variance-covariance matrix for curve values, and finish by plotting the log precip-
itation curve for Prince Rupert along with this curve plus and minus two standard
errors. The result is Figure 6.6.
y2cMap = logprecList$y2cMap
c2rMap = eval.basis(day.5, daybasis)
Sigmayhat = c2rMap %*% y2cMap %*% SigmaE %*%
t(y2cMap) %*% t(c2rMap)
logprec.stderr = sqrt(diag(Sigmayhat))
logprec29 = eval.fd(day.5, logprec.fd[29])
plot(logprec.fd[29], lwd=2, ylim=c(0.2, 1.3))
lines(day.5, logprec29 + 2*logprec.stderr,
lty=2, lwd=2)
lines(day.5, logprec29 - 2*logprec.stderr,
lty=2, lwd=2)
points(day.5, logprecav[,29])
1. The 35 Canadian weather stations are divided into four climate zones. These are
given in the character vector CanadianWeather$region that is available
in the fda package. After computing and plotting the variance-covariance func-
tional data object for the temperature data, compare this with the same analysis
applied only to the stations within each region to see if the variability varies be-
tween regions. In Chapter 10 we will examine how the mean temperature curves
change from one region to another.
2. What does the covariance bivariate functional data object look like describing the
covariation between temperature and log precipitation?
Fig. 6.6 The solid curve is the smoothed base 10 logarithm of the precipitation at Prince Rupert,
British Columbia. The dashed lines indicate 95% pointwise confidence limits for the smooth curve
based on the data shown as circles.
Chapter 7
Exploring Variation: Functional Principal and Canonical Components Analysis
Now we look at how observations vary from one replication or sampled value to the
next. There is, of course, also variation within observations, but we focused on that
type of variation when considering data smoothing in Chapter 5.
Principal components analysis, or PCA, is often the first method that we turn to
after descriptive statistics and plots. We want to see what primary modes of varia-
tion are in the data, and how many of them seem to be substantial. As in multivariate
statistics, eigenvalues of the bivariate variance-covariance function v(s,t) are indi-
cators of the importance of these principal components, and plotting eigenvalues is
a method for determining how many principal components are required to produce
a reasonable summary of the data.
In functional PCA, there is an eigenfunction associated with each eigenvalue,
rather than an eigenvector. These eigenfunctions describe major variational compo-
nents. Applying a rotation to them often results in a more interpretable picture of
the dominant modes of variation in the functional data, without changing the total
amount of common variation.
We take some time over PCA partly because this may be the most common func-
tional data analysis and because the tasks that we face in PCA and our approaches to
them will also be found in more model-oriented tools such as functional regression
analysis. For example, we will see that each eigenfunction can be constrained to
be smooth by the use of roughness penalties, just as in the data smoothing process.
Should we use rough functions to capture every last bit of interesting variation in the
data and then force the eigenfunctions to be smooth, or should we carefully smooth
the data first before doing PCA?
A companion problem is the analysis of the covariation between two different
functional variables based on samples taken from the same set of cases or individu-
als. For example, what types of variation over weather stations do temperature and
log precipitation share? How do knee and hip angles covary over the gait cycle?
Canonical correlation analysis (CCA) is the method of choice here. We will see
many similarities between PCA and CCA.
The corresponding correlation surface is obtained by normalizing the variance-covariance surface v(s,t):
$$r(s,t) = \frac{v(s,t)}{\sqrt{v(s,s)\,v(t,t)}}.$$
Principal components analysis may be defined in many ways, but its motivation
is perhaps clearer if we define PCA as the search for a probe ξ , of the kind that we
defined in Chapter 6, that reveals the most important type of variation in the data.
That is, we ask, “For what weight function ξ would the probe scores
$$\rho_\xi(x_i) = \int \xi(t)\,x_i(t)\,dt$$
have the largest possible variation?” In order for the question to make sense, we
have to impose a size restriction on ξ, and it is mathematically natural to require
that $\int \xi^2(t)\,dt = 1$.
Of course, the mean curve by definition is a mode of variation that tends to be
shared by most curves, and we already know how to estimate this. Consequently, we
usually remove the mean first and then probe the functional residuals xi − x̄. Later,
when we look at various types of functional regression, we may also want to first
remove other known sources of variation that are explainable by multivariate and/or
functional covariates.
The probe score variance $\mathrm{Var}\big[\int \xi(t)\,\{x_i(t) - \bar{x}(t)\}\,dt\big]$ associated with a probe
weight ξ is the value of
$$\mu = \max_{\xi}\Big\{\sum_i \rho_\xi^2(x_i)\Big\} \quad \text{subject to} \quad \int \xi^2(t)\,dt = 1. \qquad (7.1)$$
In standard terminology, µ and ξ are referred to as the largest eigenvalue and eigen-
function, respectively, of the estimated variance-covariance function v. An alterna-
tive to the slightly intimidating term “eigenfunction” is harmonic.
As in multivariate PCA, a nonincreasing sequence of eigenvalues µ1 ≥ µ2 ≥ . . . ≥ µk
can be constructed stepwise by requiring each new eigenfunction, computed
in step ℓ, to be orthogonal to those computed on previous steps,
$$\int \xi_j(t)\,\xi_\ell(t)\,dt = 0,\quad j = 1, \ldots, \ell-1, \qquad \text{and} \qquad \int \xi_\ell^2(t)\,dt = 1. \qquad (7.2)$$
We see here as well as elsewhere that going from multivariate to functional data
analysis is often only a matter of replacing summation over integer indices by inte-
gration over continuous indices such as t. Although the computation details are not
at all the same, this is thankfully hidden by the notation and dealt with in the fda
package.
However, there is an important difference between multivariate and functional
PCA caused by the fact that, whereas in multivariate data the number of variables p
is usually less than the number of observations N, for functional data the number of
observed function values n is usually greater than N. This implies that the maximum
number of nonzero eigenvalues in the functional context is min{N − 1, K, n}, and in
most applications will be N − 1.
Suppose, then, that our software can present us with, say, N − 1 positive eigen-
value/eigenfunction pairs (µj, ξj). What do we do next? For each choice of ℓ,
1 ≤ ℓ ≤ N − 1, the ℓ leading eigenfunctions or harmonics define a basis system
that can be used to approximate the sample functions xi. These basis functions are
orthogonal to each other and are normalized in the sense that $\int \xi_\ell^2 = 1$. They are
therefore referred to as an orthonormal basis. They are also the most efficient basis
possible of size ℓ in the sense that the total error sum of squares
$$\mathrm{PCASSE} = \sum_{i=1}^{N} \int \big[x_i(t) - \bar{x}(t) - \mathbf{c}_i'\boldsymbol{\xi}(t)\big]^2\,dt \qquad (7.4)$$
is the minimum achievable with only ℓ basis functions. Of course, other ℓ-dimensional
systems certainly exist that will do as well, and we will consider some shortly, but
none will do better. In the physical sciences, these optimal basis functions ξ j are
often referred to as empirical orthogonal functions.
It turns out that there is a simple relationship between the optimal total squared
error and the eigenvalues that are discarded, namely that
$$\mathrm{PCASSE} = \sum_{j=\ell+1}^{N-1} \mu_j.$$
As we will show below, the principal component scores $c_{ij} = \int \xi_j(t)\,[x_i(t) - \bar{x}(t)]\,dt$ can be quite helpful in interpreting the nature of the
variation identified by the PCA. It is also common practice to treat these scores as
“data” to be subjected to a more conventional multivariate analysis.
We suggested that the eigenfunction basis was optimal but not unique. In fact, for
any nonsingular square matrix T of order ℓ, the system φ = Tξ is also optimal and
spans exactly the same functional subspace as that spanned by the eigenfunctions.
Moreover, if T′ = T⁻¹, such matrices being often referred to as rotation matrices,
the new system φ is also orthonormal. There is, in short, no mystical significance to
the eigenfunctions that PCA generates, a simple fact that is often overlooked in text-
books on multivariate statistics. Well, okay, perhaps ℓ = 1 is an exception. In fact,
it tends to happen that only the leading eigenfunction has an obvious meaningful
interpretation in terms of processes known to generate the data.
But for ℓ > 1, there is nothing to prevent us from searching among the infinite
number of alternative systems φ = Tξ to find one where all of the orthonormal basis
functions φ j are seen to have some substantive interpretation. In the social sciences,
where this practice is routine, a number of criteria for optimizing the chances of
interpretability have been devised for choosing a rotation matrix T, and we will
demonstrate the usefulness of the popular VARIMAX criterion in our examples.
Readers are referred at this point to standard texts on multivariate data analysis
or to the more specialized treatment in Jolliffe (2002) for further information on
principal components analysis. Most of the material in these sources applies to this
functional context.
In R, a functional PCA is carried out with the function pca.fd, whose complete calling sequence is
pca.fd(fdobj, nharm = 2, harmfdPar=fdPar(fdobj),
centerfns = TRUE)
The first argument is a functional data object containing the functional data to be
analyzed, and the second specifies the number ` of principal components to be re-
tained. The third argument is a functional parameter object that provides the in-
formation necessary to smooth the eigenfunctions if necessary; we will postpone
this topic to Section 7.3. Finally, although most principal components analyses are
applied to data with the mean function subtracted from each function, the final ar-
gument permits this to be suppressed.
Function pca.fd in R returns an object with the class name pca.fd, so that it
is effectively a constructor function. Here are the named components for this class:
harmonics A functional data object for the ` harmonics or eigenfunctions ξ j .
values The complete set of eigenvalues µ j .
scores The matrix of scores ci j on the principal components or harmonics.
varprop A vector giving the proportion µ j / ∑ µ j of variance explained by each
eigenfunction.
meanfd A functional data object giving the mean function x̄.
Here is the command to do a PCA using only two principal components for the log
precipitation data and to display the eigenvalues.
logprec.pcalist = pca.fd(logprec.fd, 2)
print(logprec.pcalist$values)
We observe that these two harmonics account for 96% of the variation around the
mean log precipitation curve; the first four eigenvalues are 39.5, 3.9, 1.0 and 0.4,
respectively.
The two principal components are plotted by the command
plot.pca.fd(logprec.pcalist)
Figure 7.1 shows the two principal component functions by displaying the mean
curve along +’s and -’s indicating the consequences of adding and subtracting a
small amount of each principal component. We do this because a principal com-
ponent represents variation around the mean, and therefore is naturally plotted as
such. We see that the first harmonic, accounting for 88% of the variation, represents
a relatively constant vertical shift in the mean, and that the second shows essentially a
contrast between winter and summer precipitation levels.
It is in fact usual for unrotated functional principal components to display the
same sequence of variation no matter what is being analyzed. The first will be a
constant shift, the second a linear contrast between the first and second half with a
single crossing of zero, the third a quadratic pattern, and so on. That is, we tend to
see the sequence of orthogonal polynomials. However, for periodic data, where only
periodic harmonics are possible, the linear contrast is suppressed.
Fig. 7.1 The two principal component functions or harmonics are shown as perturbations of the
mean, which is the solid line. The +’s show what happens when a small amount of a principal
component is added to the mean, and the -’s show the effect of subtracting the component.
The fact that unrotated functional principal components are so predictable em-
phasizes the need for looking for a rotation of them that can reveal more meaningful
components of variation. The VARIMAX rotation algorithm is often used for this
purpose. The following command applies this rotation and then plots the result:
logprec.rotpcalist = varmx.pca.fd(logprec.pcalist)
plot.pca.fd(logprec.rotpcalist)
The results are plotted in Figure 7.2. The first component portrays variation that is
strongest in midwinter and the second captures primarily summer variation.
It can be profitable to plot the principal component scores for pairs of harmon-
ics to see how curves cluster and otherwise distribute themselves within the K-
dimensional subspace spanned by the eigenfunctions. Figure 7.3 reveals some fas-
cinating structure. Most of the stations are contained within two clusters: the upper
right with the Atlantic and central Canada stations and the lower left with the prairie
and mid-Arctic stations. The outliers are the three west coast stations and Resolute
in the high Arctic. Often, functional data analyses will turn into a multivariate data
analysis at this point by using the component scores as “data matrices” in more
conventional analyses.
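A scores plot like Figure 7.3 can be sketched as follows, on the assumption that the rotated object returned by varmx.pca.fd carries the rotated scores in its scores component:
scores = logprec.rotpcalist$scores
plot(scores[,1], scores[,2], type="n",
     xlab="Rotated Harmonic I", ylab="Rotated Harmonic II")
text(scores[,1], scores[,2], labels=CanadianWeather$place, cex=0.8)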
It may be revealing to apply PCA to some order of derivative rather than to
the curves themselves, because underlying processes may reveal their effects at the
change level rather than at the level of what we measure. This is certainly true of
growth curve data, where hormonal processes and other growth activators change
Fig. 7.2 The two rotated principal component functions are shown as perturbations of the mean,
which is the solid line. The top panel contains the strongest component (rotated PC I, 76% of the
variation), with variation primarily in the midwinter. The bottom panel (rotated PC II, 20%) shows
primarily summer variation.
Fig. 7.3 The scores for the two rotated principal component functions are shown as circles. Se-
lected stations are labeled in order to identify the two central clusters and the outlying stations.
the rate of change of height and can be especially evident at the level of the acceler-
ation curves that we plotted in Section 1.1.
We can now return to exploring the residuals from the smooths of the log precipita-
tion curves in Chapter 5. First, we set up function versions of the residuals and plot
them:
logprecres.fd = smooth.basis(day.5, logprecres,
fdParobj)$fd
plot(logprecres.fd, lwd=2, col=1, lty=1, cex=1.2,
xlim=c(0,365), ylim=c(-0.07, 0.07),
xlab="Day", ylab="Residual (log 10 mm)")
These are shown in Figure 7.4. There we see that, while most of these residual
functions show fairly chaotic variation, three stations have large oscillations in sum-
mer and autumn. The result of estimating a single principal component is shown
in Figure 7.5, where we see the mean residual along with the effect of adding and
subtracting this first component. The mean residual itself shows the oscillation that
we have noted. The principal component accounts for about 49% of the residual
variance about this mean. It defines variation around the mean oscillation located
in these months. Three stations have much larger scores on this component: They
are Kamloops, Victoria and Vancouver, all in southern British Columbia. It seems
that rainfall events come in cycles in this part of Canada at this time of the year, and
there is interesting structure to be uncovered in these residuals.
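The single-component analysis behind Figure 7.5 can be sketched with pca.fd applied to the residual functions; this is our sketch rather than necessarily the book's exact commands.
logprecres.pca = pca.fd(logprecres.fd, nharm=1)
plot.pca.fd(logprecres.pca)    # display as in Figure 7.5
logprecres.pca$varprop         # proportion of residual variance explained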
In multivariate PCA, we control the level of fit to the data by selecting the number
of principal components. In functional PCA, we can also modulate fit by controlling
the roughness of the estimated eigenfunctions. We do this by modifying the defini-
tion of orthogonality. If, for example, we want to penalize excessive curvature in
principal components, we can use this generalized form of orthogonality:
$$\int \xi_j(t)\,\xi_k(t)\,dt + \lambda \int D^2\xi_j(t)\,D^2\xi_k(t)\,dt = 0, \qquad (7.6)$$
Fig. 7.4 The smoothed residual functions for the log precipitation data.
Fig. 7.5 The first principal component for the log precipitation residual functions, shown by adding
(+) and subtracting (-) the component from the mean function (solid line).
Principal components can be large in two ways: first, obviously in terms of their amplitude, and second in terms of their complexity or amount
of high-frequency variation. This second feature is closely related to how rapidly a
Fourier series expansion of a function converges, and is therefore simply another
aspect of how PCA itself works. This second type of size of principal components
is what λ controls. Ramsay and Silverman (2005) show how λ in PCA can be data-
defined via cross-validation.
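In the fda package this penalization is requested through the harmfdPar argument of pca.fd; a hedged sketch for the log precipitation data, with the value of the smoothing parameter assumed, is:
pcafdPar      = fdPar(daybasis, harmaccelLfd, 1e5)   # lambda value assumed
logprec.pcaSm = pca.fd(logprec.fd, nharm=2, harmfdPar=pcafdPar)
plot.pca.fd(logprec.pcaSm)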
neig = 12
x = matrix(1,neig-nharm,2)
x[,2] = (nharm+1):neig
y = log10(fdaeig[(nharm+1):neig])
c = lsfit(x,y,int=FALSE)$coef
par(mfrow=c(1,1),cex=1.2)
plot(1:neig, log10(fdaeig[1:neig]), "b",
xlab="Eigenvalue Number",
ylab="Log10 Eigenvalue")
lines(1:neig, c[1]+ c[2]*(1:neig), lty=2)
The result is Figure 7.6. The first three log eigenvalues seem well above the linear
trend in the next nine, suggesting that the leading three harmonics are important.
Together they account for 62% of the variation in the scripts.
Fig. 7.6 The logarithms (base 10) of the first 12 eigenvalues in the principal components analysis
of the “fda” handwriting data. The dashed line indicates the linear trend in the last nine in the
sequence.
Fig. 7.7 Two of the rotated harmonics are plotted as perturbations of the mean “fda” script,
shown as a heavy solid line.
of subject characteristics such as age, ethnicity, etc. See Ramsay and Silverman
(2005) for further details.
We often want to examine the ways in which two sets of curves (xi , yi ), i = 1, . . . , N,
share variation. How much variation, for example, is shared between temperature
and log precipitation over the 35 Canadian weather stations? This question is re-
lated to the issue of how well one can predict one from another, which we will take
up in the next chapter. Here, we consider a symmetric view on the matter that does
not privilege either variable. We offer here only a quick summary of the mathemat-
ical aspects of canonical correlation analysis, and refer the reader to Ramsay and
Silverman (2005) for a more detailed account.
To keep the notation tidy, we will assume that the two sets of variables have been
centered, that is, xi and yi have been replaced by the residuals xi − x̄ and yi − ȳ,
respectively, if this was considered appropriate. That is, we assume that x̄ = ȳ = 0.
As before, we define modes of variation for the xi ’s and the yi ’s in terms of the pair
of probe weight functions ξ and η that define the integrals
$$\rho_{\xi i} = \int \xi(t)\,x_i(t)\,dt \qquad \text{and} \qquad \rho_{\eta i} = \int \eta(t)\,y_i(t)\,dt, \qquad (7.7)$$
respectively. The N pairs of probe scores (ρξ i , ρη i ) defined in this way represent
shared variation if they correlate strongly with one another.
The canonical correlation criterion is the squared correlation
$$R^2(\xi,\eta) = \frac{\big[\sum_i \rho_{\xi i}\,\rho_{\eta i}\big]^2}{\big[\sum_i \rho_{\xi i}^2\big]\big[\sum_i \rho_{\eta i}^2\big]} = \frac{\big[\sum_i \big(\int \xi(t)x_i(t)dt\big)\big(\int \eta(t)y_i(t)dt\big)\big]^2}{\big[\sum_i \big(\int \xi(t)x_i(t)dt\big)^2\big]\big[\sum_i \big(\int \eta(t)y_i(t)dt\big)^2\big]}. \qquad (7.8)$$
As in PCA, the probe weights ξ and η are then specified by finding that weight
pair that optimizes the criterion R2 (ξ , η ). But, again as in PCA, we can compute a
nonincreasing series of squared canonical correlations $R_1^2, R_2^2, \ldots, R_k^2$ by constraining
successive canonical probe values to be orthogonal. The length k of the sequence is
the smallest of the sample size N, the number of basis functions for either functional
variable, or the number of basis functions used for ξ and η .
That we are now optimizing with respect to two probes at the same time makes
canonical correlation analysis an exceedingly greedy procedure, where this term
borrowed from data mining implies that CCA can capitalize on the tiniest variation
in either set of functions in maximizing this ratio to the extent that, unless we ex-
ert some control over the process, it can be hard to see anything of interest in the
result. It is in practice essential to enforce strong smoothness on the two weight
functions ξ and η to limit this greediness. This can be done by either selecting a
low-dimensional basis for each or by using an explicit roughness penalty in much
the same manner as is possible for functional PCA.
Let us see how this plays out in the exploration of covariation between daily
temperature and log precipitation, being careful to avoid the greediness pitfall by
placing very heavy penalties on roughness of the canonical weight functions as
measured by the size of their second derivatives. Here are the commands in R that
use function cca.fd to do the job:
ccafdPar = fdPar(daybasis, 2, 5e6)
ncon = 3
ccalist = cca.fd(temp.fd, logprec.fd, ncon,
ccafdPar, ccafdPar)
The third argument of cca.fd specifies the number of canonical weight/variable
pairs that we want to examine, which, in this case, is the complete sequence. The
final two arguments specify the bases for the expansion of ξ and η , respectively, as
well as their roughness penalties.
The canonical weight functional data objects and the corresponding three squared
canonical correlations are extracted from the list object ccalist produced by
function cca.fd as follows:
ccawt.temp = ccalist$ccawtfd1
ccawt.logprec = ccalist$ccawtfd2
corrs = ccalist$ccacorr
The squared correlations are 0.92, 0.62 and 0.35; so that there is a dominant pair
of modes of variation that correlates at a high level, and then two subsequent pairs
with modest but perhaps interesting correlations.
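The canonical variable scores plotted in Figure 7.9 can be sketched by probing each centered curve with its canonical weight function using inprod; this is our sketch, assuming the weight objects extracted above contain one replicate per canonical variable.
temp.scores    = inprod(center.fd(temp.fd),    ccawt.temp)
logprec.scores = inprod(center.fd(logprec.fd), ccawt.logprec)
plot(temp.scores[,1], logprec.scores[,1], type="n",
     xlab="Temperature Canonical Weight",
     ylab="Log Precipitation Canonical Weight")
text(temp.scores[,1], logprec.scores[,1], labels=CanadianWeather$place, cex=0.8)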
Consider first the type of variation associated with the first canonical correlation.
Figure 7.8 displays the corresponding two canonical weight functions. The tempera-
ture canonical weight function ξ1 resembles a sinusoid with period 365/2 and having
zeros in July, October, January and April. But the log precipitation counterpart η1
is close to a sinusoid with period 365 and zeros in July and January.
Fig. 7.8 The first pair of canonical weight functions or probes (ξ , η ) correlating temperature and
log precipitation for the Canadian weather data.
St. John’s, on the other hand, are actually relatively warm in the winter and get
more precipitation in the fall than in the winter, and therefore anchor the lower left
of the plot. Note, though, that the linear order in Figure 7.9 misses Kamloops by a
noticeable amount. The position of this interior British Columbia city deep in a val-
ley, where relatively little rain or snow falls at any time of the year, causes it to be
anomalous in many types of analysis.
Fig. 7.9 The scores for the first pair of canonical variables plotted against each other, with labels
for selected weather stations.
We give here the arguments of the constructor function pca.fd that carries out a
functional principal components analysis and constructs an object of the pca.fd
class. The complete calling sequence is
pca.fd(fdobj, nharm = 2, harmfdPar=fdPar(fdobj),
centerfns = TRUE)
The arguments are as follows:
fdobj A functional data object.
nharm The number of harmonics or principal components to compute.
1. Medfly Data: The medfly data have been a popular dataset for functional data
analysis and are included in the fda package. The medfly data consist of records
of the number of eggs laid by 50 fruit flies on each of 31 days, along with each
individual’s total lifespan.
a. Smooth the data for the number of eggs, choosing the smoothing parameter
by generalized cross-validation (GCV). Plot the smooths.
b. Conduct a principal components analysis using these smooths. Are the com-
ponents interpretable? How many do you need to retain to recover 90% of the
variation? If you believe that smoothing the PCA will help, do so.
c. Try a linear regression of lifespan on the principal component scores from
your analysis. What is the R2 for this model? Does lm find that the model is
significant? Reconstruct and plot the coefficient function for this model along
with confidence intervals. How does it compare to the model obtained through
functional linear regression?
2. Apply principal components analysis to the functional data object Wfd re-
turned by the monotone smoothing function smooth.monotone applied to
the growth data. These functions are the logs of the first derivatives of the growth
curves. What is the impact of the variation in the age of the pubertal growth
spurt on these components?
Chapter 8
Registration: Aligning Features for Samples of Curves
This chapter presents two methods for separating phase variation from amplitude
variation in functional data: landmark and continuous registration. We mentioned
this problem in Section 1.1.1. We saw in the height acceleration curves in Figure
1.2 that the age of the pubertal growth spurt varies from girl to girl; this is phase
variation. In addition, the intensity of the pubertal growth spurt also varies; this
is amplitude variation. Landmark registration aligns features that are visible in all
curves by estimating a strictly increasing nonlinear transformation of time that takes
all the times of a given feature into a common value. Continuous registration uses the
entire curve rather than specified features and can provide a more complete curve
alignment. The chapter also describes a decomposition technique that permits the
expression of the amount of phase variation in a sample of functional variation as a
proportion of total variation.
Figure 1.2 presented the problem that curve registration is designed to solve. This
figure is reproduced in the top panel of Figure 8.1 along with a solution in the
bottom panel. In both panels, the dashed line indicates the mean of these ten growth
acceleration curves. In the top panel, this mean curve is unlike any of the individual
curves in that the duration of the mean pubertal growth is longer than it should be
and the drop in acceleration is not nearly as steep as even the shallowest of the
individual curves. These aberrations are due to the ten girls not being in the same
phase of growth at around 10 to 12 years of age. We see from the figure that peak
growth acceleration occurs around age 10.5 for many girls, but this occurred before
age 8 for one girl and after age 13 for another. Similarly, the maximum pubertal
growth rate occurs where the acceleration drops to zero following the maximum
pubertal acceleration. This occurs before age 10 for two girls and around age 14 for
another, averaging around 11.7 years of age. If we average the growth accelerations
at that age, one girl has not yet begun her pubertal growth spurt, three others are
at or just past their peak acceleration, and the rest are beyond their peak pubertal
growth rate with negative acceleration. This analysis should make it fairly easy to
understand why the average of these acceleration curves displays an image that is
very different from any of the individual curves.
Fig. 8.1 The top panel reproduces the second derivatives of the growth curves shown in Figure
1.2. The landmark–registered curves corresponding to these are shown in the bottom panel, where
the single landmark was the crossing of zero in the middle of the mean pubertal growth spurt. The
dashed line in each panel indicates the mean curve for the curves in that panel.
The bottom panel in Figure 8.1 uses landmark registration to align these curves
so the post–spurt accelerations for all girls cross zero at the same time. Then when
we average the curves, we get a much more realistic representation of the typical
pubertal growth spurt, at least among the girls in this study.
Functions can vary in both phase and amplitude, as illustrated schematically in
Figure 8.2. Phase variation is illustrated in the top panel as a variation in the loca-
tion of curve features along the horizontal axis, as opposed to amplitude variation,
shown in the bottom panel as the size of these curves. The mean curve in the top
panel, shown as a dashed curve, does not resemble any curve; it has less amplitude
variation, but its horizontal extent is greater than that of any single curve. The mean
has, effectively, borrowed from amplitude to accommodate phase. Moreover, if we
carry out a functional principal components analysis of the curves in each panel, we
find in the top panel that the first three principal components account for 55%, 39%
and 5% of the variation. On the other hand, the same analysis of the amplitude-
varying curves requires a single principal component to account for 100% of the
variation. Like the mean and principal components, most statistical methods when
translated into the functional domain are designed to model purely amplitude varia-
tion.
Fig. 8.2 The top panel shows five curves varying only in phase. The bottom panel shows five
curves varying only in amplitude. The dashed line in each panel indicates the mean of the five
curves. In the bottom panel, this mean curve is superimposed exactly on the central curve.
There is physiological growth time that unrolls at different rates from child to
child relative to clock time. In terms of growth time, all girls experience puberty at
the same age, with the peak growth rate (zero acceleration) occurring at about 11.7
years of age for the Berkeley sample. If we want a reasonable sense of amplitude
variation, we must consider it with this growth time frame of reference. Growth time
itself is an elastic medium that can vary randomly from girl to girl when viewed
relative to clock time, and functional variation has the potential to be bivariate, with
variation in both the range and domain of a function.
We can remove phase variation from the growth data if we can estimate a time-
warping function hi (t) that transforms growth time t to clock time for child i. For
example, we can require that hi (11.7) = ti for all girls, where 11.7 years is the
average time at which the Berkeley girls reached their midpubertal spurt (PGS) and
ti is the clock age at which the ith girl reached this event. If, at any time t, hi (t) <
t, we may say that the girl is growing faster than average at that clock time but
slower than average if hi (t) > t. This is illustrated in Figure 8.3, where the growth
acceleration curves for the earliest and latest of the first ten girls are shown in the
left panels and their corresponding time-warping functions in the right panels.
Fig. 8.3 The top panels show the growth acceleration curve on the left and the corresponding time-
warping function h(t) on the right for the girl among the first ten in the Berkeley growth study with
the earliest pubertal growth spurt. The corresponding plots for the girl with the latest growth spurt
are in the bottom two panels. The middle of the growth spurt is shown as the vertical dashed line
in all panels.
The warping function for the girl with the late spurt effectively slows down or stretches out her clock time so as to conform with growth time. The functional parameter object defining the warping functions is set up as follows, using a B-spline basis of order three with a single interior knot at the mean mid-spurt age PGSctrmean:
wbasisLM = create.bspline.basis(c(1,18), 4, 3,
                                c(1,PGSctrmean,18))
WfdLM    = fd(matrix(0,4,1), wbasisLM)   # W = 0 corresponds to no warping
WfdParLM = fdPar(WfdLM, 1, 1e-12)        # very light roughness penalty
The landmark registration using function landmarkreg along with the extrac-
tion of the registered acceleration functions, warping function and w-functions is
achieved by the commands
regListLM = landmarkreg(accelfdUN, PGSctr,
                        PGSctrmean, WfdParLM, TRUE)
accelfdLM     = regListLM$regfd     # landmark-registered acceleration curves
accelmeanfdLM = mean(accelfdLM)     # mean of the registered curves
warpfdLM      = regListLM$warpfd    # warping functions h_i(t)
WfdLM         = regListLM$Wfd       # W functions defining the warps
The final logical argument value TRUE requires the warping functions hi to them-
selves be strictly monotone functions.
The bottom panel of Figure 8.1 displays the same ten female growth acceleration
curves after registering to the middle of the pubertal growth spurt. We see that the
curves are now exactly aligned at the mean PGS (pubertal growth spurt) age, but that
there is still some misalignment for the maximum and minimum acceleration ages.
Our eye is now drawn to the curve for girl seven, whose acceleration minimum is
substantially later than the others and who has still not reached zero acceleration by
age 18. The long period of near zero acceleration for girl four prior to puberty also
stands out as unusual. The mean curve is now much more satisfactory as a summary
of the typical shape of growth acceleration curves, and in particular is nicely placed
in the middle of the curves for the entire pubertal growth spurt period.
We may need registration methods that use the entire curves rather than their values
at specified points. A number of such methods have been developed, and the prob-
lem continues to be actively researched. Landmark registration is usually a good first
step, but we need a more refined registration process if landmarks are not visible in
all curves. For example, many but not all female growth acceleration curves have at
least one peak prior to the pubertal growth spurt that might be considered a land-
mark. Even when landmarks are clear, identifying their timing may involve tedious
interactive graphical procedures, and we might prefer a fully automatic method. Fi-
nally, as we saw in Figure 8.1, landmark registration using just a few landmarks can
still leave aspects of the curves unregistered at other locations.
Here we illustrate the use of function register.fd to further improve the ac-
celeration curves that have already been registered using function landmarkreg.
The idea behind this method is that if an arbitrary sample registered curve x[h(t)]
and target curve x0 (t) differ only in terms of amplitude variation, then their values
will tend to be proportional to one another across the range of t-values. That is, if we
were to plot the values of the registered curve against the target curve, we would see
something approaching a straight line tending to pass through the origin, although
not necessarily at an angle of 45 degrees with respect to the axes of the plot. If this is true, then a principal components analysis of the following order two matrix C(h) of integrated products of these values should reveal essentially one component, and the smallest eigenvalue should be near 0:
   C(h) = [ ∫ {x_0(t)}^2 dt        ∫ x_0(t) x[h(t)] dt ]
          [ ∫ x_0(t) x[h(t)] dt    ∫ {x[h(t)]}^2 dt    ]        (8.2)
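Continuous registration can then be applied to the landmark-registered acceleration curves. The following is only a sketch, assuming the landmark-registered object accelfdLM created above; the basis dimension and penalty values are illustrative choices rather than those used to produce the figures.

wbasisCR  = create.bspline.basis(c(1,18), 15, 5)   # basis for the W functions
Wfd0CR    = fd(matrix(0,15,10), wbasisCR)          # initial W functions (no warping)
WfdParCR  = fdPar(Wfd0CR, 1, 1)                    # light penalty on the first derivative of W
regListCR = register.fd(mean(accelfdLM), accelfdLM, WfdParCR)
accelfdCR = regListCR$regfd                        # continuously registered curves
warpfdCR  = regListCR$warpfd                       # estimated warping functions

Here the target curve is the mean of the landmark-registered curves, as described above.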
Fig. 8.4 The continuous registration of the landmark–registered height acceleration curves in Fig-
ure 8.1. The vertical dashed line indicates the target landmark age used in the landmark registration.
Fig. 8.5 The mean of the continuously registered acceleration curves is shown as a heavy solid
line, while that of the landmark-registered curves is a light solid line. The light dashed line is the
mean of the unregistered curves.
Kneip and Ramsay (2008) developed a useful way of quantifying the amount of
these two types of variation by comparing results for a sample of N functional ob-
servations before and after registration. The notation xi stands for the unregistered
version of the ith observation, yi for its registered counterpart and hi for associated
warping function. The sample means of the unregistered and registered samples are
x̄ and ȳ, respectively.
The total mean square error is defined as

   MSE_total = N^{-1} ∑_{i=1}^{N} ∫ [x_i(t) − x̄(t)]^2 dt.        (8.3)
It can be shown that, defined in this way, MSE_total = MSE_amp + MSE_phase.
The interpretation of this decomposition is as follows. If we have registered our
functions well, then the registered functions yi will have higher and sharper peaks
and valleys, since the main effect of mixing phase variation with amplitude varia-
tion is to smear variation over a wider range of t values, as we saw in Figure 1.2 and
Figure 8.2. Consequently, the first term in MSE_phase will exceed the second and is a measure of how much phase variation has been removed from the y_i's by registration. On the other hand, MSE_amp is now a measure of pure amplitude variation to the extent that the registration has been successful. The decomposition does depend on the success of the registration step, however, since it is possible in principle for MSE_phase to be negative.
From this decomposition we can get a useful squared multiple correlation index of the proportion of the total variation due to phase:

   R^2 = MSE_phase / MSE_total.        (8.6)
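The fda package provides the function AmpPhaseDecomp for computing this decomposition. A minimal sketch, assuming the unregistered curves accelfdUN, the continuously registered curves accelfdCR and the warping functions warpfdCR from the continuous registration step, is:

decomp = AmpPhaseDecomp(accelfdUN, accelfdCR, warpfdCR)
decomp$MS.amp   # mean square for amplitude variation
decomp$MS.pha   # mean square for phase variation
decomp$RSQR     # squared multiple correlation: proportion of variation due to phase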
The handwriting data discussed in Section 1.2.2 consisted of the writing of “statis-
tics” in simplified Chinese 50 times. The average time of writing was six seconds,
with the X-, Y- and Z-coordinates of the pen position being recorded 400 times per
second. The handwriting involves 50 strokes, corresponding to about eight strokes
per second, or 120 milliseconds per stroke. The processing of these data was done
entirely in Matlab, and is too complex to describe in detail here.
The registration phase was carried out in two steps, as was the case for the growth
data. In the first phase, three clear landmarks were visible in all curves in the vertical
Z-coordinate corresponding to points where the pen was lifted from the paper. These
were used in a preliminary landmark registration process for the Z-coordinate alone.
The decomposition described above indicated that 66.6% of the variation in Z was
due to phase. The warping functions were applied to the X- and Y-coordinates as
well, and the decompositions indicated percentages of phase variation of 0% and
75%, respectively. This suggests that most of the phase variation in movement off
the writing plane was associated with motion that was also vertical in the writing
plane.
In a second registration phase, the scalar tangential accelerations,

   TA_i(t) = √( [D^2 X_i(t)]^2 + [D^2 Y_i(t)]^2 ),
of the tip of the pen along the writing path were registered using continuous registra-
tion. This corresponded to 48% of the variation in the landmark-registered tangential
accelerations being due to phase. Figure 8.6 plots the tangential acceleration for all
50 replications before and after applying this two–stage registration procedure. Af-
ter alignment, we see the remarkably small amount of amplitude variation in many
of the acceleration peaks, and we also see how evenly spaced in time these peaks
are. The pen hits acceleration of 30 meters/sec/sec, or three times the force of grav-
ity. If sustained, this would launch a satellite into orbit in about seven minutes and
put us in a plane’s luggage rack if our seat belts were not fastened. It is also striking
that near zero acceleration is found between these peaks.
Fig. 8.6 The acceleration along the pen trajectory for all 50 replications of the script in Figure 1.9
before and after registration.
The complete calling sequence for landmarkreg is
landmarkreg(fdobj, ximarks, x0marks, WfdPar, monwrd=FALSE)
The arguments are as follows:
fdobj A functional data object containing the curves to be registered.
ximarks A matrix containing the timings or argument values associated with
the landmarks for the observations in fdobj to be registered. The number of rows
N equals the number of observations and the number of columns NL equals the
number of landmarks. These landmark times must be in the interior of the interval
over which the functions are defined.
x0marks A vector of times of landmarks for target curve. If not supplied, the
mean of the landmark times in ximarks is used.
WfdPar A functional parameter object defining the warping functions that trans-
form time in order to register the curves.
monwrd A logical value: if TRUE, the warping function is estimated using a
monotone smoothing method; otherwise, a regular smoothing method is used,
which is not guaranteed to give strictly monotonic warping functions.
Function landmarkreg returns a list with two components:
regfd A functional data object for the registered curves.
warpfd A functional data object for the warping functions.
It is essential that the location of every landmark be clearly defined in each of the
curves as well as the template function. If this is not the case, consider using the con-
tinuous registration function register.fd. Although requiring that a monotone
smoother be used to estimate the warping functions is safer, it adds considerably to
the computation time since monotone smoothing is itself an iterative process. It is
usually better to try an initial registration without this feature to see if there are any fail-
ures of monotonicity. Moreover, monotonicity failures can usually be cured by in-
creasing the smoothing parameter defining WfdPar. Not much curvature is usually
required in the warping functions, so a low-dimensional basis, whether B-splines or
monomials, is suitable for defining the functional parameter argument WfdPar. A
registration with a few prominent landmarks is often a good preliminary to using
the more sophisticated but more lengthy process in register.fd.
1. In the exercises for Chapter 7, we suggested applying principal components analysis to the growth data and considering the impact of variation in the age of the pubertal growth spurt on the components. Now repeat this analysis for the registered growth curves, and compare the
results. What about the impact of the pubertal growth spurt now?
2. Try applying continuous registration to the unregistered growth curves. You will
see that a few curves are badly misaligned, indicating that there are limits to how
well continuous registration works. What should we do with these misaligned
curves? Could we try, for example, starting the continuous registrations off with
initial estimates of function Wfd set up from the landmark registered results?
3. Using only those girls whose curves are well registered by continuous registra-
tion, now use canonical correlation analysis to explore the covariation between
the Wfd object returned by function register.fd and the Wfd object from
the monotone smooth. Look for interesting ways in which the amplitude variation
in growth is related to its phase variation.
4. Medfly Data: In Section 7.7, we suggested applying principal components anal-
ysis to the medfly data. Here, we suggest you extend that analysis as follows:
a. Perform a functional linear regression to predict the total lifespan of the fly
from their egg laying. Choose a smoothing parameter by cross-validation, and
plot the coefficient function along with confidence intervals.
b. Conduct a permutation test for the significance of the regression. Calculate
the R2 for your regression.
c. Compare the results of the functional linear regression with the linear regres-
sion on the principal component scores from your analysis in Section 7.7.
The classic paper on the estimation of time warping functions is Sakoe and Chiba
(1978), who used dynamic programming to estimate the warping function in a con-
text where there was no need for the warping function to be smooth.
Landmark registration has been studied in depth by Kneip and Gasser (1992)
and Gasser and Kneip (1995), who refer to a landmark as a structural feature, its
location as a structural point, to the distribution of landmark locations along the
t axis as structural intensity, and to the process of averaging a set of curves after
registration as structural averaging. Their papers contain various technical details
on the asymptotic behavior of landmark estimates and warping functions estimated
from them. Another source of much information on the study of landmarks and their
use in registration is Bookstein (1991).
The literature on continuous registration is evolving rapidly, but is still somewhat
technical. Gervini and Gasser (2004) and Liu and Müller (2004) are recent papers
that review the literature and discuss some theoretical issues.
Chapter 9
Functional Linear Models for Scalar Responses
This is the first of two chapters on the functional linear model. Here we have a
dependent or response variable whose value is to be predicted or approximated on
the basis of a set of independent or covariate variables, and at least one of these is
functional in nature. The focus here is on linear models, or functional analogues of
linear regression analysis. This chapter is confined to considering the prediction of a
scalar response on the basis of one or more functional covariates, as well as possible
scalar covariates.
Confidence intervals are developed for estimated regression functions in order to permit conclusions about where along the t axis a covariate plays a strong role in predicting the response. The chapter also offers some permutation tests of hypotheses.
More broadly, we begin here the study of input/output systems. This and the next
chapter lead in to Chapter 11, where the response is the derivative of the output from
the system.
A coefficient β_0 is usually included because the origin of the response variable and/or one or more of the independent variables can be arbitrary, and β_0 codes the constant needed to allow for this. It is often called the intercept term.
The term εi allows for sources of variation considered extraneous, such as mea-
surement error, unimportant additional causal factors, sources of nonlinearity and
so forth, all swept under the statistical rug called error. The εi are assumed to add to
the prediction of the response, and are usually considered to be independently and
identically distributed.
In this chapter we replace at least one of the p observed scalar covariate vari-
ables on the right side of the classic equation by a functional covariate. To simplify
the exposition, though, we will describe a model consisting of a single functional
independent variable plus an intercept term.
Our test-bed problem in this section is to predict the logarithm of annual precipita-
tion for 35 Canadian weather stations from their temperature profiles. The response
in this case is, in terms of the fda package in R,
annualprec = log10(apply(daily$precav,2,sum))
We want to use as the predictor variable the complete temperature profile as well as
a constant intercept. These two covariates can be stored in a list of length 2, or in
Matlab as a cell array. Here we set up a functional data object for the 35 temperature
profiles, called tempfd. To keep things simple and the computation rapid, we will
use 65 basis functions without a roughness penalty. This number of basis functions
has been found to be adequate for most purposes, and can, for example, capture the
ripples observed in early spring in many weather stations.
tempbasis  = create.fourier.basis(c(0,365), 65)
tempSmooth = smooth.basis(day.5, daily$tempav, tempbasis)
tempfd     = tempSmooth$fd
But what can we do when the vector of covariate observations xi = (xi1 , . . . , xip ) in
(9.1) is replaced by a function xi (t)? A first idea might be to discretize each of the
N functional covariates xi (t) by choosing a set of times t1 , . . . ,tq and consider fitting
the model
   y_i = α_0 + ∑_{j=1}^{q} x_i(t_j) β_j + ε_i.
But which times t j are important, given that we must have q < N?
If we choose a finer and finer mesh of times, the summation approaches an inte-
gral equation:
   y_i = α_0 + ∫ x_i(t) β(t) dt + ε_i.        (9.2)
We now have a finite number N of observations with which to determine the infinite-
dimensional β (t). This is an impossible problem: it is almost always possible to find
a β (t) so that (9.2) is satisfied with εi = 0. More importantly, there are always an
infinite number of possible regression coefficient functions β (t) that will produce
exactly the same predictions ŷi . Even if we expand each functional covariate in terms
of a limited number of basis functions, it is entirely possible that the total number
of basis functions will exceed or at least approach N.
Three strategies have been developed to deal with this underdetermination issue.
The first two redefine the problem using a basis coefficient expansion of β :
   β(t) = ∑_{k=1}^{K} c_k φ_k(t) = c′φ(t).        (9.3)
templist = vector("list",2)
templist[[1]] = rep(1,35)
templist[[2]] = tempfd
Notice that the intercept term is here set up as a constant function with 35 repli-
cations.
Fig. 9.1 Estimated β (t) for predicting log annual precipitation from average daily temperature
using five Fourier basis functions.
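The estimate in Figure 9.1 restricts β(t) to a five-function Fourier basis. A sketch of this first strategy (not necessarily the exact code used for the figure), using the annualprec and templist objects defined above and defining the error sums of squares used just below, is:

conbasis   = create.constant.basis(c(0,365))
betabasis5 = create.fourier.basis(c(0,365), 5)
betalist   = list(fdPar(conbasis), fdPar(betabasis5))
fRegressList  = fRegress(annualprec, templist, betalist)
betaestlist   = fRegressList$betaestlist    # estimated intercept and beta(t)
annualprechat = fRegressList$yhatfdobj      # predicted log annual precipitation
SSE1.1 = sum((annualprec - annualprechat)^2)     # error SS about the fit
SSE0   = sum((annualprec - mean(annualprec))^2)  # error SS about the mean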
We can now compute the squared multiple correlation and the usual F-ratio for
comparing these two fits.
RSQ1 = (SSE0-SSE1.1)/SSE0
Fratio1 = ((SSE0-SSE1.1)/5)/(SSE1.1/29)
The squared multiple correlation is 0.80, and the corresponding F-ratio with 5 and
29 degrees of freedom is 22.6, suggesting a fit to the data that is far better than we
would expect by chance.
There are two ways to obtain a smooth fit. The simplest is to use a low-dimensional
basis for β (t). However, we can get more direct control over what we mean by
“smooth” by using a roughness penalty. The combination of a high-dimensional
basis with a roughness penalty reduces the possibilities that either (a) important
features are missed or (b) extraneous features are forced into the image by using a
basis set that is too small for the application.
Suppose, for example, that we fit (9.2) by minimizing the penalized sum of squares

   PENSSE_λ(α_0, β) = ∑_i [ y_i − α_0 − ∫ x_i(t) β(t) dt ]^2 + λ ∫ [Lβ(t)]^2 dt.        (9.4)
This allows us to shrink variation in β as close as we wish to the solution of the dif-
ferential equation Lβ = 0. Suppose, for example, that we are working with periodic
data with a known period. As noted with expression (5.11), the use of a harmonic
acceleration operator,
   Lβ = ω^2 Dβ + D^3 β,
places no penalty on a simple sine wave and increases the penalty on higher-order
harmonics in a Fourier approximation approximately in proportion to the sixth
power of the order of the harmonic. (In this expression, ω is determined by the pe-
riod, which is assumed to be known.) Thus, increasing the penalty λ in (9.4) forces
β to look more and more like β(t) = c_1 + c_2 sin(ωt) + c_3 cos(ωt).
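A sketch of this second strategy for the precipitation example, setting up the harmonic acceleration operator and using the smoothing parameter λ = 10^12.5 discussed below (the object names here are illustrative assumptions), is:

Lcoef        = c(0, (2*pi/365)^2, 0)
harmaccelLfd = vec2Lfd(Lcoef, c(0,365))       # harmonic acceleration operator
betabasis35  = create.fourier.basis(c(0,365), 35)
lambda       = 10^12.5
betafdPar    = fdPar(betabasis35, harmaccelLfd, lambda)
betalist[[2]] = betafdPar                     # replace the beta(t) specification
annPrecTemp    = fRegress(annualprec, templist, betalist)
betaestlist2   = annPrecTemp$betaestlist
annualprechat2 = annPrecTemp$yhatfdobj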
More than one functional covariate can be incorporated into this model and
scalar covariates may also be included. Let us suppose that in addition to yi we
have measured p scalar covariates zi = (zi1 , . . . , zip ) and q functional covariates
xi1 (t), . . . , xiq (t). We can put these into a linear model as follows
   y_i = α_0 + z_i′α + ∑_{j=1}^{q} ∫ x_{ij}(t) β_j(t) dt + ε_i,        (9.5)
Fig. 9.2 Observed log annual precipitation values plotted against values predicted by functional
linear regression on temperature curves using a roughness penalty.
How did we come up with λ = 1012.5 for the smoothing parameter in this analysis?
Although smoothing parameters λ j can certainly be chosen subjectively, we can also
consider cross-validation as a way of using the data to define smoothing level. To
define a cross-validation score, we let α_λ^(−i) and β_λ^(−i) be the estimated regression parameters estimated without the ith observation. The cross-validation score is then

   CV(λ) = ∑_{i=1}^{N} [ y_i − α_λ^(−i) − ∫ x_i(t) β_λ^(−i)(t) dt ]^2.        (9.7)
Fig. 9.3 Estimated β(t) for predicting log annual precipitation from average daily temperature with
a harmonic acceleration penalty and smoothing parameter set to 1012.5 . The dashed lines indicate
pointwise 95% confidence limits for values of β (t).
Observing that

   ŷ = Z (Z′Z + R(λ))^{-1} Z′ y = Hy,

standard calculations give us that

   CV(λ) = ∑_{i=1}^{N} ( (y_i − ŷ_i) / (1 − H_ii) )^2.        (9.8)
These quantities are returned by fRegress for scalar responses only. The GCV(λ) criterion was discussed (in different notation) in Section 5.2.5. For a comparison of CV and GCV, including references to further literature, see Section 21.3.4, p. 368, in Ramsay and Silverman (2005).
The following code generates the data plotted in Figure 9.4.

loglam = seq(5,15,0.5)
nlam   = length(loglam)
SSE.CV = matrix(0,nlam,1)
for (ilam in 1:nlam) {
  lambda     = 10^loglam[ilam]
  betalisti  = betalist
  betafdPar2 = betalisti[[2]]
  betafdPar2$lambda = lambda
  betalisti[[2]]    = betafdPar2
  fRegi = fRegress.CV(annualprec, templist, betalisti)
  SSE.CV[ilam] = fRegi$SSE.CV
}

Fig. 9.4 Cross-validation scores CV(λ) for fitting log annual precipitation by daily temperature profile, with a penalty on the harmonic acceleration of β(t).
Once we have conducted a functional linear regression, we want to measure the precision with which we have estimated each of the β̂_j(t). This can be done in the same manner as confidence intervals for probes in smoothing. Under the usual independence assumption, the ε_i are independently normally distributed around zero with variance σ_e^2. The covariance matrix of ε is then

   Σ = σ_e^2 I.
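Pointwise confidence limits like those in Figure 9.3 can then be obtained with fRegress.stderr. A sketch, assuming the penalized fit annPrecTemp above and the smoothing output tempSmooth, is:

resid  = annualprec - annualprechat2
SigmaE = diag(rep(sum(resid^2)/(35 - annPrecTemp$df), 35))  # sigma_e^2 I
y2cMap = tempSmooth$y2cMap
stderrList     = fRegress.stderr(annPrecTemp, y2cMap, SigmaE)
betastderrlist = stderrList$betastderrlist   # standard error functions for each beta_j

The estimated β(t) plus and minus two of these standard errors gives limits like the dashed lines in Figure 9.3.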
A third alternative for functional linear regression with a scalar response is to regress
y on the principal component scores for the functional covariate. The use of principal
components analysis in multiple linear regression is a standard technique:
1. Perform a principal components analysis on the covariate matrix X and derive the principal component scores c_ij for each observation i on each principal component j.
2. Regress the response yi on the principal component scores ci j .
We often observe that we need only the first few principal component scores, thereby
considerably improving the stability of the estimate by increasing the degrees of
freedom for error.
In functional linear regression, we consider the scores resulting from a functional principal components analysis of the temperature curves conducted in Chapter 7. We can write

   x_i(t) = x̄(t) + ∑_{j≥1} c_ij ξ_j(t),

and we regress the response on the principal component scores:

   y_i = β_0 + ∑_j c_ij β_j + ε_i.        (9.10)

Now we recall that c_ij = ∫ ξ_j(t)(x_i(t) − x̄(t)) dt. If we substitute this in (9.10), we can see that

   y_i = β_0 + ∑_j β_j ∫ ξ_j(t)(x_i(t) − x̄(t)) dt + ε_i.

This gives us

   β(t) = ∑_j β_j ξ_j(t).

Thus, (9.10) expresses exactly the same relationship as (9.2) when we absorb the mean function into the intercept:

   β̃_0 = β_0 − ∫ β(t) x̄(t) dt.
The following code carries out this idea for the annual cycles in daily tempera-
tures at 35 Canadian weather stations. First we resmooth the data using a saturated
basis with a roughness penalty. This represents rather more smoothing than in the
earlier version of tempfd that did not use a roughness penalty.
daybasis365=create.fourier.basis(c(0, 365), 365)
lambda =1e6
tempfdPar =fdPar(daybasis365, harmaccelLfd, lambda)
tempfd =smooth.basis(day.5, daily$tempav,
tempfdPar)$fd
Next we perform the principal components analysis, again using a roughness
penalty.
lambda = 1e0
tempfdPar = fdPar(daybasis365, harmaccelLfd, lambda)
temppca = pca.fd(tempfd, 4, tempfdPar)
harmonics = temppca$harmonics
Approximate pointwise standard errors can now be constructed out of the covari-
ance matrix of the β_j:

   var[β̂(t)] = [ξ_1(t) … ξ_k(t)] Var[β] (ξ_1(t), …, ξ_k(t))′.
Since the coefficients are orthogonal, the covariance of the β j is diagonal and can be
extracted from the standard errors reported by lm. When smoothed principal com-
ponents are used, however, this orthogonality no longer holds and the full covariance
must be used.
The final step is to do the linear model using principal component scores and to
construct the corresponding functional data objects for the regression functions.
pcamodel = lm(annualprec ~ temppca$scores)
pcacoefs = summary(pcamodel)$coef
betafd   = pcacoefs[2,1]*harmonics[1] +
           pcacoefs[3,1]*harmonics[2] +
           pcacoefs[4,1]*harmonics[3]
coefvar  = pcacoefs[,2]^2
betavar  = coefvar[2]*harmonics[1]^2 +
           coefvar[3]*harmonics[2]^2 +
           coefvar[4]*harmonics[3]^2
The quantities resulting from the code below are plotted in Figure 9.5. In this case the R-squared statistic, 0.72, is similar to that for the previous analysis.
plot(betafd, xlab="Day", ylab="Regression Coef.",
ylim=c(-6e-4,1.2e-03), lwd=2)
lines(betafd+2*sqrt(betavar), lty=2, lwd=1)
lines(betafd-2*sqrt(betavar), lty=2, lwd=1)
Functional linear regression by functional principal components has been studied
extensively. Yao et al. (2005) observe that instead of presmoothing the data, we can
estimate the covariance surface directly by a two-dimensional smooth and use this
to derive the fPCA. From here the principal component scores can be calculated by
fitting the principal component functions to the data by least squares. This can be
advantageous when some curves are sparsely observed.
Fig. 9.5 Estimated β(t) for predicting log annual precipitation from average daily temperature us-
ing scores from the first three functional principal components of temperature. The dashed lines
indicate pointwise 95% confidence limits for values of β (t).
For a permutation test of the significance of this functional regression, we use the statistic

   F = Var[ŷ] / ( (1/n) ∑_i (y_i − ŷ_i)^2 ),
where ŷ is the vector of predicted responses. This statistic varies from the classic F
statistic in the manner in which it normalizes the numerator and denominator sums
of squares. The statistic is calculated several hundred times using a different random
permutation each time. The p value for the test can then be calculated by counting
the proportion of permutation F values that are larger than the F statistic for the
observed pairing.
This procedure is implemented for the Canadian weather example using the permutation test function Fperm.fd from the fda package.
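A minimal sketch of the call, using the annualprec, templist and betalist objects from the earlier analyses, is:

F.res = Fperm.fd(annualprec, templist, betalist, nperm=200)
F.res$Fobs   # the F statistic for the observed pairing
F.res$qval   # the 0.05 critical value from the permutation distribution
F.res$pval   # the permutation p-value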
Functional linear regression for scalar responses has a large associated literature.
Models based on functional principal components analysis are found in Cardot et al.
(1999), Cardot et al. (2003a) and Yao et al. (2005). Tests for no effect are developed
in Cardot et al. (2004), Cardot et al. (2003b) and Delsol et al. (2008). More recent
work by James et al. (2009) has focused on using absolute value penalties to insist
that β (t) be zero or exactly linear over large regions.
Escabias et al. (2004) and James (2002) look at the larger problem of how to
adapt the generalized linear model to the presence of a functional predictor vari-
able. Müller and Stadtmüller (2005) also investigate what they call the generalized
functional linear model. James and Hastie (2001) consider linear discriminant anal-
ysis where at least one of the independent variables used for prediction is a function
and where the curves are irregularly sampled.
Chapter 10
Linear Models for Functional Responses
In this second chapter on the functional linear model, the dependent or response
variable is functional. We first consider a situation in which all of the independent
variables are scalar and in particular look at two functional analyses of variance.
When one or more of the independent variables is also functional, we have two
possible classes of linear models. The simpler case is called concurrent, where
the value of the response variable y(t) is predicted solely by the values of one or
more functional covariates at the same time t. The more general case where func-
tional variables contribute to the prediction for all possible time values s is briefly
reviewed.
While we often find functional covariates associated with scalar responses, there are
also cases where the interest lies in the prediction of a functional response. We begin
this chapter with two examples of functional analysis of variance (fANOVA), where
variation in a functional response is decomposed into functional effects through the
use of a scalar design matrix Z. That is, in both of these examples, the covariates
are all scalar.
In the Canadian weather data, for example, we can divide the weather stations into
four distinct groups: Atlantic, Pacific, Prairie and Arctic. It may be interesting to
know the effect of geographic location on the shape of the temperature curves. That
is, we have a model of the form
   y_i(t) = β_0(t) + ∑_{j=1}^{4} x_ij β_j(t) + ε_i(t)        (10.1)
where yi (t) is a functional response. In this case, the values of xi j are either 0 or 1. If
the 35 by 5 matrix Z contains these values, then the first column has all entries equal
to 1, which codes the contribution of the Canadian mean temperature; the remaining
four columns contain 1 if that weather station is in the corresponding climate zone
and 0 otherwise. In order to identify the specific effects of the four climate zones,
we have to add the constraint
   ∑_{j=1}^{4} β_j(t) = 0   for all t.        (10.2)
There are a number of methods of imposing this constraint. In this example we will
do this by adding the above equation as an additional 36th “observation” for which
y36 (t) = 0.
We first create a list containing five indicator variables for the intercept term and
each of the regions. In this setup, the intercept term is effectively the Canadian mean
temperature curve, and each of the remaining regression coefficients is the pertur-
bation of the Canadian mean required to fit a region’s mean temperature. These
indicator variables are stored in the List object regionList.
regions = unique(CanadianWeather$region)
p = length(regions) + 1
regionList = vector("list", p)
regionList[[1]] = c(rep(1,35),0)
for (j in 2:p) {
xj = CanadianWeather$region == regions[j-1]
regionList[[j]] = c(xj,1)
}
The next step is to augment the temperature functional data object by a 36th obser-
vation that takes only zero values as required by (10.2).
coef = tempfd$coef
coef36 = cbind(coef,matrix(0,65,1))
temp36fd = fd(coef36,tempbasis,tempfd$fdnames)
We now create functional parameter objects for each of the coefficient functions,
using 11 Fourier basis functions for each.
betabasis = create.fourier.basis(c(0, 365), 11)
betafdPar = fdPar(betabasis)
betaList = vector("list",p)
for (j in 1:p) betaList[[j]] = betafdPar
Now call fRegress, extract the coefficients and plot them, along with the pre-
dicted curves for the regions.
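A sketch of that call and of the plotting (the plotting details are illustrative rather than exactly those used for Figure 10.1) is:

regionRegress = fRegress(temp36fd, regionList, betaList)
betaestlist   = regionRegress$betaestlist
regionNames   = c("Canada", as.character(regions))
par(mfrow=c(2,3))
for (j in 1:p) plot(betaestlist[[j]]$fd,
                    xlab="Day (July 1 to June 30)", ylab="",
                    main=regionNames[j])
# predicted temperature curves for the stations are in regionRegress$yhatfdobj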
Fig. 10.1 The regression coefficients estimated for predicting temperature from climate region.
The first panel is the intercept coefficient, corresponding to the Canadian mean temperature. The
last panel contains the predicted mean temperatures for the four regions.
These marine surveys have been conducted by the Kodiak National Wildlife Refuge using a standard protocol, revisiting a fixed set of transects in each bay. The bird counts analyzed here are from two bays, Uyak and Uganik, both influenced by the waters of the Shelikof Strait. We focus on results from 1986 to 2005, less 1998 when
the boat was in dry dock. We want to investigate potential differences in time trends
of bird species that primarily eat fish compared to those that primarily eat shellfish
and mollusks.
Figure 10.2 shows the base 10 logarithms of counts of the 13 species averaged
over transects and sites, and separated according to diet. It is obvious that there
are substantial differences in abundances over species that are consistent across the
years of observation, and there is more variation in abundance among the fish-eating
birds. The two mean functions suggest a slight tendency for the fish-eating birds to
be increasing in abundance relative to what we see for shellfish and mollusk eaters,
although this may be due to the sharply increasing trend for one fish-eating species.
We will use fRegress to see how substantial this difference is.
Fig. 10.2 The base 10 logarithms of seabird counts on Kodiak Island averaged over transects and
sites. The top panel shows counts for birds who eat shellfish and mollusks, and the bottom shows
counts for fish-eating birds. In each panel, the mean count taken across birds is shown as a heavy
solid line.
We elected to fit the count data exactly using a polygonal basis, since we were
less interested in estimating smooth trends for each species than we were in esti-
mating the functional diet effect. By interpolating the data in this way, we were sure
to retain all the information in the original data. The following two commands set
up the polygonal basis and fit the data in the 19 by 26 matrix logCounts2, with
yearCode = c(1:12, 14:20) (because no data were collected for 1998, year
13). In this matrix, the first 13 columns are for the Uganik site and the remaining 13
for the Uyak site, and each row corresponds to a year of observation.
birdbasis = create.polygonal.basis(yearCode)
birdList = smooth.basis(yearCode,logCounts2,birdbasis)
birdfd = birdList$fd
We analyze the time trend in the log mean counts as affected by the type of diet, with bird species nested within the diet factor. The model includes an intercept function, a functional effect for diet, and a constant effect for each bird species nested within its diet group; the design matrix Z coding these effects is set up as follows:
Zmat0 = matrix(0,26,15)
Zmat0[,1] = 1
Zmat0[,2] = foodvarbl
Zmat0[,3:15] = birdvarbl
However, defining an effect for each bird in this way would make Z singular
since the sum of these effects (columns 3:15) in each row is 1, and so is the intercept
value. To correct for this, we need to force the sum of the bird effects within each diet
group to add to zero. This requires two steps: we add two rows to the bottom of Z
coding the sum of the bird effects for each diet group, and we add two corresponding
functional observations to the 26 log count curves whose values are identically zero
for all t.
Zmat = rbind(Zmat0, matrix(0,2,15))
fishindex = (1:13)[-foodindex]
Zmat[27,foodindex+2] = 1
Zmat[28,fishindex+2] = 1
birdextfd = birdfd
birdextfd$coefs =
  cbind(birdextfd$coefs, matrix(0,19,2))
Now we set up the arguments for fRegress. In these commands we insert each
column in turn in matrix Z into the corresponding position in a list object xfdlist.
xfdlist = vector("list",15)
names(xfdlist) = c("const", "diet", birds)
for (j in 1:15) xfdlist[[j]] = Zmat[,j]
Now we define the corresponding list object betalist. We only want constant functions for the bird regression coefficient functions, since only the mean counts at the two sites are available to estimate any bird's effect. However, for
both the intercept and the food effect, we use a B-spline basis with knots at observa-
tion times. We determined the level of smoothing to be applied to the intercept and
food regression functions by minimizing the cross-validated error sum of squares,
as described in the next section, and the result was λ = 10.
betalist = xfdlist
foodbasis = create.bspline.basis(rng,21,4,yearCode)
foodfdPar = fdPar(foodbasis, 2, 10)
betalist[[1]] = foodfdPar
betalist[[2]] = foodfdPar
conbasis = create.constant.basis(rng)
for (j in 3:15) betalist[[j]] = fdPar(conbasis)
Next we fit the model using fRegress, supplying the lists defined above for the covariates (in this case all scalar) and for the low-dimensional regression coefficient functions.
birdRegress = fRegress(birdextfd, xfdlist, betalist)
betaestlist = birdRegress$betaestlist
Figure 10.3 displays the regression functions for the intercept and food effects,
along with 95% pointwise confidence intervals estimated by the methods described
in Section 10.2.2. The trend in the intercept in the top panel models the mean trend
over all species, and indicates a steady increase up to 1999 followed by some de-
cline. The difference in the mean trends of the two food groups is shown in the
bottom panel, and suggests a steady decline in the shellfish and mollusk eaters rel-
ative to the fish eaters starting in 1988. This is what we noticed in the log mean
counts in Figure 10.2.
Fig. 10.3 The top panel shows the mean trend across all species, and the bottom panel shows
the difference between being a shellfish/mollusk eater and a fish eater. The dashed lines indicate
pointwise 95% confidence limits for these effects.
As for scalar response models, we would like to have a criterion for choosing any
smoothing parameters that we use. Unfortunately, while ordinary cross-validation
can be calculated for scalar response models without repeatedly re-estimating the
model, this can no longer be done efficiently with functional response models.
Here, we use the function fRegress.CV to compute the cross-validated integrated squared error:

   CVISE(λ) = ∑_{i=1}^{N} ∫ ( y_i(t) − ŷ_i^(−i)(t) )^2 dt,

where ŷ_i^(−i)(t) is the predicted value for y_i(t) when it is omitted from the estimation.
In the following code, we search over a range of values for λ applied to both the
intercept and the food effect:
loglam = seq(-2,4,0.25)
SSE.CV1 = rep(NA,length(loglam))
betalisti = betaestlist
for (i in 1:length(loglam){
for(j in 1:2)
betalisti[[j]]$lambda = 10ˆloglam[i]
  CVi = fRegress.CV(birdextfd, xfdlist,
                    betalisti)
SSE.CV1[i] = CVi$SSE.CV
}
This produces Figure 10.4, which indicates a unique minimum with λ approximately √10, although the discontinuities in the plot suggest that the cross-validated error sum of squares can be rather sensitive to non-smooth variation in the response functions as we defined them.
Fig. 10.4 The cross-validated integrated square errors for the bird data over a range of logarithms
of λ .
Each regression coefficient function β_j is expanded in terms of K_j basis functions θ_kj, with coefficient vector b_j. In order to express (10.5) and (10.7) in matrix notation referring explicitly to these expansions, we need to construct some composite or supermatrices.
Defining K_β = ∑_{j}^{q} K_j, we first construct the vector b of length K_β by stacking these coefficient vectors vertically, that is,

   b = (b_1′, b_2′, …, b_q′)′.
Now assemble the q by K_β matrix function Θ(t) as follows:

   Θ(t) = [ θ_1(t)′    0        ⋯    0        ]
          [ 0         θ_2(t)′   ⋯    0        ]        (10.8)
          [ ⋮          ⋮               ⋮       ]
          [ 0          0        ⋯    θ_q(t)′  ]
Next let R(λ) be the block diagonal matrix whose jth block is

   λ_j ∫ [L_j θ_j(t)] [L_j θ_j(t)]′ dt.
The penalized least squares criterion for the fit can then be written as

   ∫ [y(t) − Z(t)Θ(t)b]′ [y(t) − Z(t)Θ(t)b] dt + b′R(λ)b.

If we differentiate this with respect to the coefficient vector b and set the result to zero, we get the penalized least squares normal equations defining the composite coefficient vector b̂:

   [ ∫ Θ′(t) Z′(t) Z(t) Θ(t) dt + R(λ) ] b̂ = ∫ Θ′(t) Z′(t) y(t) dt.        (10.9)
This is a linear matrix equation, Ab̂ = d, defining the scalar coefficients in the vector b̂, where the normal equation matrix is

   A = ∫ Θ′(t) Z′(t) Z(t) Θ(t) dt + R(λ),        (10.10)

and the right side is

   d = ∫ Θ′(t) Z′(t) y(t) dt.
These equations are all given in terms of integrals of basis functions with func-
tional data objects. In some cases, it is possible to evaluate them explicitly, but we
will otherwise revert to numerical integration. In practice, numerical integration is
both feasible and accurate (with reasonable choices for basis sets, etc.).
Concurrent linear models make up an important subset of all possible linear
functional response models, especially for examining dynamics (see Chapter 11).
However, they can be particularly restrictive; we discuss the general class of linear
functional response models in Section 10.3.
To compute confidence intervals, we first define the residuals

   r_ij = y_ij − Z_j(t_i) β(t_i),

and estimate the error covariance matrix as

   Σ_e* = (1/N) r r′,        (10.12)

where r is the matrix of residuals.
In the Seabird data, the error variance is calculated from
yhatmat = eval.fd(year, yhatfdobj)
rmatb = logCounts2 - yhatmat
SigmaEb = var(t(rmatb))
With this estimate of Σe∗ , we must consider the smoothing done to take the obser-
vations of the response y onto the space spanned by the response basis functions
φ (t). Let C denote the matrix of regression coefficients for this representation, so
y(t) = Cφ (t). Substituting this into (10.9), we get
   b̂ = A^{-1} [ ∫ Θ(t)′ Z(t)′ C φ(t) dt ]
      = A^{-1} [ ∫ φ(t)′ ⊗ (Θ(t)′ Z(t)′) dt ] vec(C)
      = c2bMap vec(C)        (10.13)
where ⊗ is used to represent the Kronecker product. The explicit use of a basis
expansion for y(t) allows the flexibility of modeling variation in y by itself or of
including the original measurements of each response curve into the variance calcu-
lation.
We now require the matrix y2cMap that is used to compute the regression coefficient matrix C from the original observations y. This can be obtained from functions like smooth.basis or smooth.basisPar. The composite map (c2bMap y2cMap) now maps the original observations directly to b̂. Therefore:
   Var[b̂] = c2bMap y2cMap Σ_e* y2cMap′ c2bMap′.        (10.14)
In the fda package, these intervals are created using fRegress.stderr. This
requires the result of a call to fRegress along with the matrices y2cMap and
Σe∗ . The standard errors for the regression coefficients used to create Figure 10.3 are
computed using the following code.
y2cMap = birdList2$y2cMap
stderrList = fRegress.stderr(birdRegress, y2cMap,
SigmaEb)
betastderrlist = stderrList$betastderrlist
Finally we plot the results using the special purpose plotting function plotbeta.
When the original curves are not the result of smoothing data that have common observation times over curves, we can at least estimate confidence intervals based on the variation of the smoothed curves about the model predictions. To do this, we simply create pseudo data by evaluating the residual functions at a fine grid of points and calculating the variance matrix from these values. When we do this, the use of y2cMap above is no longer valid. Instead we replace it with a projection matrix that takes the pseudo data to the coefficients C; this is simply [Φ(t)′Φ(t)]^{-1}Φ(t)′, as sketched below.
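As a sketch of this idea (all object names here are assumptions: a range rng, residual functions residfd and a response basis ybasis), one could write:

tfine  = seq(rng[1], rng[2], length=101)      # fine grid over the response range
resmat = eval.fd(tfine, residfd)              # residual functions evaluated on the grid
SigmaE = cov(t(resmat))                       # covariance of the pseudo data
Phimat = eval.basis(tfine, ybasis)            # response basis evaluated on the grid
y2cMap = solve(crossprod(Phimat), t(Phimat))  # (Phi'Phi)^{-1} Phi': pseudo data -> coefficients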
The gait data displayed in Figure 1.6 are measurements of angle at the hip and knee
of 39 children as they walk through a single gait cycle. The cycle begins at the point
where the child’s heel under the leg being observed strikes the ground. For plotting
simplicity we run time here over the interval [0,20], since there are 20 times at which
the two angles are observed. This analysis is inspired by the question, “How much
control does the hip angle have over the knee angle?”
Figure 10.5 plots the mean knee angle along with its angular velocity and accel-
eration, and Figure 10.6 plots knee angle acceleration against velocity. We can see
three distinct phases in knee angle of roughly equal durations:
Fig. 10.5 Knee angle and its velocity and acceleration over a single gait cycle, which begins when
the heel strikes the ground. The vertical dashed lines separate distinct phases in the cycle.
1. From time 0 to 7.5, the leg is bearing the weight of the child by itself, and the
knee is close to being straight. This corresponds to the small loop in the cycle
plot starting just before the marker “1” and up to the cusp.
2. From time 7.5 to time 14.7, the knee flexes in order to lift the foot off the ground,
reaching a maximum mean angle of about 70 degrees.
3. From time 14.7 to time 20, the knee is extended to receive the load at the next
heel-strike.
Together the second and third phases look like straightforward harmonic motion. A
similar analysis of the hip motion reveals only a single harmonic phase. We wonder
how the hip motion is coupled to knee motion.
Starting with functional data objects kneefd and hipfd for knee and hip angle,
respectively, these commands execute a concurrent functional regression analysis
where knee angle is fit by intercept and hip angle coefficient functions:
xfdlist = list(rep(1,39), hipfd)
betafdPar = fdPar(gaitbasis, harmaccelLfd)
betalist = list(betafdPar,betafdPar)
fRegressList = fRegress(kneefd, xfdlist, betalist)
kneehatfd = fRegressList$yhatfd
Fig. 10.6 A phase-plane plot of knee angle over a gait cycle. Numbers indicate indices of times of
observation of knee angle.
betaestlist = fRegressList$betaestlist
The intercept and hip regression coefficient functions are plotted as solid lines in
Figure 10.7.
These commands compute the residual variance-covariance matrix estimate,
which we leave as is rather than converting it to a diagonal matrix.
# kneemat = eval.fd(gaittime, kneefd)
kneehatmat = eval.fd(gaittime, kneehatfd)
resmat. = gait - kneehatmat
SigmaE = cov(t(resmat.))
We also set up error sums of square functions for variation about both the model fit
and mean knee angle. Then we compare the two via a squared multiple correlation
function.
kneefinemat = eval.fd(gaitfine, kneefd)
kneemeanvec = eval.fd(gaitfine, kneemeanfd)
kneehatfinemat = eval.fd(gaitfine, kneehatfd)
resmat = kneefinemat - kneehatfinemat
resmat0 = kneefinemat -
kneemeanvec %*% matrix(1,1,ncurve)
SSE0 = apply((resmat0)ˆ2, 1, sum)
SSE1 = apply(resmatˆ2, 1, sum)
Rsqr = (SSE0-SSE1)/SSE0
Fig. 10.7 The top panel shows as a solid line the intercept term in the prediction of knee angle
from hip angle; the dashed line indicates the mean knee angle assuming no hip angle effect. The
bottom panel shows as a solid line the functional regression coefficient multiplying hip angle in
the functional concurrent linear model, and the dashed line shows the squared multiple correlation
coefficient function associated with this model. Vertical dashed lines indicate boundaries between the three phases of the gait cycle.
The R2 function is included in the second panel of Figure 10.7 as a dashed line. We
see that it tracks pretty closely the variation in the hip regression coefficient.
The following commands plot the intercept and hip regression coefficients with 95% confidence intervals, shown in Figure 10.8:
y2cMap = kneefdPar$y2cMap
fRegressList1 = fRegress(kneefd, xfdlist, betalist,
y2cMap, SigmaE)
fRegressList2 = fRegress.stderr(fRegressList1,
y2cMap, SigmaE)
betastderrlist = fRegressList2$betastderrlist
titlelist = list("Intercept", "Hip coefficient")
plotbeta(betaestlist, betastderrlist, gaitfine,
titlelist)
We see that hip angle variation is coupled to knee angle variation in the middle of
each of these three episodes, and the relation is especially strong during the middle
flexing phase. It seems logical that a strongly flexed knee is associated with a sharper
hip angle.
We can repeat these analyses to explore the relationship between knee and hip acceleration. This can be interesting because neural activation of these two muscle groups is more directly reflected in the accelerations than in the angles themselves.
Fig. 10.8 The intercept and hip regression coefficient function for the gait cycle with 95% point-
wise confidence intervals.
The concurrent linear model only relates the value of a functional response to the
current value of functional covariate(s). A more general version for a single func-
tional covariate and an intercept is
   y_i(t) = β_0(t) + ∫_{Ω_t} β_1(t,s) x_i(s) ds + ε_i(t).        (10.15)
The bivariate regression coefficient function β1 (s,t) defines the dependence of yi (t)
on covariate xi (s) at each time t. In this case xi (s) need not be defined over the same
range, or even the same continuum, as yi (t).
The set Ω_t in (10.15) contains the range of values of the argument s over which x_i is considered to influence the response y_i at time t, and the subscript t indicates that this set can change from one value of t to another. For example, when both s and t are time, using x_i(s) to predict y_i(t) when s > t may imply backwards causation.
Fig. 10.9 The solid line shows the regression function multiplying hip angle acceleration in the
prediction of knee angle acceleration, and the dashed line indicates the corresponding squared
multiple correlation function.
To avoid this nonsense, we consider only values of xi before time t. We may also
add a restriction on how far back in time the influence of xi on yi can happen. This
leads us to restrict the integral to
Ωt = {s|t − δ ≤ s ≤ t} ,
where δ > 0 specifies how much history is relevant to the prediction. Malfait and
Ramsay (2003) described this as the historical linear model.
We illustrate the estimation of (10.15) using Swedish life table data taken from
census records in Sweden. The data are the number of deaths at each age for women
born in each year from 1751 to 1914 and for ages 0 to 80. We will use the data up
until 1894 in order to allow the extrapolation problem to be considered (see Section
10.10). Figure 10.10 displays the log hazard rates for four selected years. The log
hazard rate is the natural logarithm of the ratio of the number of females who die at
a specific age to the number of females alive with that age. These data were obtained
from http://mortality.org. See also Chiou and Müller (2009) for another
approach to modeling these data.
The hazard rate is greatest for infants and for the very elderly, and in most years
attains its minimum in the early teens. The four curves indicate that the hazard rate
decreases substantially as the health of the population improves over this period.
However, there are localized features in each curve that reflect various historical
events such as outbreaks of disease, war, and so on.
Fig. 10.10 Log hazard rates as a function of age for Swedish women born in the years 1751, 1810,
1860 and 1914. These data are derived from mortality tables at http://mortality.org.
Let x_i(t) represent the log hazard rate at age t for birth year i. We propose the model

   x_{i+1}(t) = β_0(t) + ∫ β_1(s,t) x_i(s) ds + ε_i(t).        (10.16)
That is, for any year from 1752 to 1894, we model the log hazard function for that
year using as the functional covariate the log hazard curve for the preceding year.
Assume that the response curves have been smoothed and represented as functional data object NextYear, and that the covariate curves are in functional data object LastYear.
The regression function β1 has the basis function expansion
β1(s,t) = ∑_{k=1}^{K1} ∑_{ℓ=1}^{K2} bkℓ φk(s) ψℓ(t) = φ′(s) B ψ(t),    (10.17)
where the coefficients for the expansion are in the K1 by K2 matrix B. We therefore
need to define two bases for β1 , as well as a basis for the intercept function β0 .
For a bivariate function such as β1 (t, s) smoothness can be imposed by penalizing
the s and t directions separately:
PEN_{λt,λs}(β1) = λt ∫∫ [Lt β1(t,s)]² ds dt + λs ∫∫ [Ls β1(t,s)]² ds dt,    (10.18)
where linear differential operator Ls only involves derivatives with respect to s and
Lt only involves derivatives with respect to t. We can also apply a penalty to the
roughness of the intercept β0 .
The following code sets up a B-spline basis of order four with 23 basis functions.
This is used to define functional parameter objects for β0 , β1 (·,t) and β1 (s, ·). The
second derivative is penalized in each case, but the smoothing parameter values vary
as shown. The final statement assembles these three functional parameter objects
into a list object to be supplied to function linmod as an argument.
betabasis = create.bspline.basis(c(0,80), 23)
beta0Par  = fdPar(betabasis, 2, 1e-5)  # intercept beta0(t): second derivative penalized, small lambda
beta1sPar = fdPar(betabasis, 2, 1e3)   # beta1 as a function of its first argument s
beta1tPar = fdPar(betabasis, 2, 1e3)   # beta1 as a function of its second argument t
betaList  = list(beta0Par, beta1sPar, beta1tPar)
Function linmod is invoked in R for these data by the command
linmodSmooth = linmod(NextYear, LastYear, betaList)
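The estimated surface can also be inspected directly by evaluating it on a grid; the following lines are only a minimal sketch (the grid and the plotting choices are ours, not part of the fda scripts), using the beta1estbifd component returned by linmod:

agefine  = seq(0, 80, len=41)
beta1mat = eval.bifd(agefine, agefine, linmodSmooth$beta1estbifd)
persp(agefine, agefine, beta1mat, xlab="age s", ylab="age t",
      zlab="beta1(s,t)", theta=30, phi=30)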
Figure 10.11 displays the estimated regression surface β1 (s,t). The estimated in-
tercept function β0 ranged over values four orders of magnitude smaller than the re-
sponse functions, and can therefore be considered to be essentially zero. The strong
ridge one year off the diagonal, namely β1 (s − 1, s), indicates that mortality at any
age is most strongly related to mortality at the previous year for that age less one.
In other words, mortality is most strongly determined by age-specific factors like
infectious diseases in infancy, accidents and violent death in early adulthood, and
aging late in life. The height of the surface declines to near zero for large differences
between s and t for this reason as well.
As was the case for scalar response models, we have so far focused on exploratory
analyses. In the context of functional response models, it would again be useful to
gauge the extent to which the estimated relationship can be distinguished from zero.
The type of questions that we are interested in are generalizations of common
statistical tests and of common statistical models:
• Are two or more groups of functions statistically distinguishable?
Fig. 10.11 The bivariate regression coefficient function β1 (s,t) for the model (10.16) estimated
from the 143 log hazard rate functions for the Swedish life table data. The ridge in β1 (s,t) is one
year off the diagonal.
Consider the Berkeley growth study data for both boys and girls in Figure 10.12.
This plot suggests that boys generally become taller than girls. However, is this
difference statistically significant? To evaluate this, we consider the absolute value
of a t-statistic at each point:

T(t) = |x̄b(t) − x̄g(t)| / sqrt( Var[xb(t)]/nb + Var[xg(t)]/ng ),    (10.19)

where x̄b(t) and x̄g(t) are the mean height curves, Var denotes the pointwise sample variance, and nb and ng are the numbers of boys and girls.
Fig. 10.12 The heights of the first ten boys and the first ten girls in the Berkeley growth study. We use a permutation test to evaluate whether the growth curves for the two groups are different.
This is plotted in the solid line in Figure 10.13. By itself, this provides a sense of
the relative separation of the two groups of functions. However, a formal hypothesis
test requires a value or statistic to test and a probability value indicating the result
of the test. The test statistic that we use is the maximum value of the multivariate
T -test, T (t). To find a critical value of this statistic, we use a permutation test. We
perform the following procedure:
1. Randomly shuffle the labels of the curves.
2. Recalculate the maximum of T (t) with the new labels.
Repeating this many times allows a null distribution to be constructed. This pro-
vides a reference for evaluating the maximum value of the observed T (t).
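To make the procedure concrete, the following sketch carries it out by hand for the growth curves. The evaluation grid and the number of shuffles are arbitrary, and hgtmfd and hgtffd are the functional data objects for the boys and girls used below.

agegrid = seq(1, 18, len=101)
Xb = eval.fd(agegrid, hgtmfd)          # columns are the boys' height curves
Xg = eval.fd(agegrid, hgtffd)          # columns are the girls' height curves
maxT = function(X1, X2) {
  num = abs(rowMeans(X1) - rowMeans(X2))
  den = sqrt(apply(X1, 1, var)/ncol(X1) + apply(X2, 1, var)/ncol(X2))
  max(num/den)
}
Tobs  = maxT(Xb, Xg)
Xall  = cbind(Xb, Xg)
Tnull = replicate(200, {               # shuffle the curve labels and recompute
  idx = sample(ncol(Xall))
  maxT(Xall[, idx[1:ncol(Xb)]], Xall[, idx[-(1:ncol(Xb))]])
})
mean(Tnull >= Tobs)                    # permutation p-value for the maximal t-statistic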
The following code executes a permutation test and generates the graphic in Fig-
ure 10.13. It uses a default value of 200 random shuffles, which is more than ade-
quate for such a large difference as is shown, but might not suffice for more delicate
effects.
tperm.fd(hgtmfd,hgtffd)
Here hgtmfd and hgtffd are functional data objects for the males and females
in the study. It is apparent that there is little evidence for a difference up to the age of around 12, about the middle of the female growth spurt, at which point the boys
rapidly become taller. We can conclude that the main reason why boys end up taller
than girls is that they get an extra couple of years of growth on the average.
Fig. 10.13 A permutation test for the difference between girls and boys in the Berkeley growth
study. The dashed line gives the permutation 0.05 critical value for the maximum of the t-statistic
and the dotted the permutation critical value for the pointwise statistic.
In the more general case of functional linear regression, the same approach can be
applied. In this case, we define a functional version of the univariate F-statistic:
F(t) = Var[ŷ(t)] / ( (1/n) ∑ (yi(t) − ŷ(t))² ),    (10.20)
where ŷ are the predicted values from a call to fRegress. Apart from a scale
factor, this is the functional equivalent of the scalar F-statistic in multiple linear
regression. It reduces to that for scalar-response models, as discussed in Section
9.5 above. As before, we reduce this to a single number by calculating max(F(t))
and conducting a permutation test. In this case, we permute the response curves (or
values), leaving the design unchanged. A test for no effect of geographic region on
temperature profile is conducted below. Figure 10.14 reports pointwise and maximal
F-statistics and their corresponding permutation critical values for the temperature
data.
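One way to carry out this test in R is with the permutation function Fperm.fd in the fda package, described below. A minimal sketch, assuming that tempfd holds the temperature response curves and that xfdlist and betalist have already been set up for the regression on region (all three object names are hypothetical):

Fres = Fperm.fd(tempfd, xfdlist, betalist)
Fres$pval                              # permutation p-value for the maximal F-statistic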
Fig. 10.14 A permutation test for a predictive relationship between geographic region and temper-
ature profile for the Canadian weather data.
In either case, the object must have the same number of replications as the de-
pendent variable object. That is, if it is a scalar, it must be of the same length
as the dependent variable, and if it is functional, it must have the same number
of replications as the dependent variable. (Only univariate independent variables
are currently allowed in xfdlist.)
betalist For the fd, fdPar, and numeric methods, betalist must be
a list of length equal to length(xfdlist). Members of this list are func-
tional parameter objects (class fdPar) defining the regression functions to be
estimated. Even if a corresponding independent variable is scalar, its regression
coefficient must be functional if the dependent variable is functional. (If the de-
pendent variable is a scalar, the coefficients of scalar independent variables, in-
cluding the intercept, must be constants, but the coefficients of functional inde-
pendent variables must be functional.) Each of these functional parameter objects
defines a single functional data object, that is, with only one replication.
For the formula and character methods, betalist can be either a
list, as for the other methods, or NULL, in which case a list is created. If
betalist is created, it will use the bases from the corresponding component of xfdlist if that component is a functional data object, or else from the response variable. Smoothing infor-
mation (arguments Lfdobj, lambda, estimate, and penmat of function
fdPar) will come from the corresponding component of xfdlist if it is of
class fdPar (or for scalar independent variables from the response variable if it
is of class fdPar) or from optional ... arguments if the reference variable is
not of class fdPar.
wt Weights for weighted least squares.
y2cMap The matrix mapping from the vector of observed values to the coeffi-
cients for the dependent variable. This is output by function smooth.basis.
If this is supplied, confidence limits are computed, otherwise not.
SigmaE Estimate of the covariances among the residuals. This can only be esti-
mated after a preliminary analysis with fRegress.
method A character string matching either fRegress for functional regression estimation or model to create the argument lists for functional regression estimation without running it.
sep Separator for creating names for multiple variables for fRegress.fdPar
or fRegress.numeric created from single variables on the right-hand side
of the formula y. This happens with multidimensional fd objects as well as
with categorical variables.
... Optional arguments.
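To make this argument structure concrete, here is a minimal sketch of a call with a constant (intercept) covariate and one functional covariate. All object names are hypothetical, and the basis and smoothing choices are placeholders rather than recommendations.

N = 35                                  # number of replications
xfdlist  = list(const = rep(1, N),      # scalar covariate supplying the intercept
                hip   = hipfd)          # functional covariate with N replications
betalist = list(const = fdPar(betabasis, 2, 1e-2),
                hip   = fdPar(betabasis, 2, 1e-2))
gaitfit  = fRegress(kneefd, xfdlist, betalist)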
These functions return either a standard fRegress fit object or a model speci-
fication:
fRegress fit A list of class fRegress with the following components:
y The first argument in the call to fRegress (coerced to class fdPar).
xfdlist The second argument in the call to fRegress.
betalist The third argument in the call to fRegress.
xfdlist0 A list of the objects named on the right hand side of formula.
This will differ from xfdlist for any categorical or multivariate right-hand
side object.
type The type component of any fd object on the right-hand side of
formula.
nbasis A vector containing the nbasis components of variables named in
formula having such components.
xVars An integer vector, with names given by the variables on the right-hand side of formula, containing the number of variables in xfdlist contributed by each. This can exceed 1 for a multivariate object of class numeric or fd on the right-hand side, as well as for any categorical variable.
The R function linmod, used above for the Swedish mortality data, estimates the functional linear model (10.15). Its arguments are

yfdobj A functional data object for the response or dependent variable functions.
xfdobj A functional data object for the covariate or independent variable func-
tions.
betaList A list object containing three functional parameter objects. The first
is for the intercept term β0 in (10.15), the second is for the bivariate regression
function β1 in (10.15) as a function of the first argument s, and the third is for β1
as a function of the second argument t.
The function returns a list of length three with components as follows:
beta0estfd A functional data object for the estimated intercept.
beta1estbifd A bivariate functional data object for the bivariate regression
function.
yhatfdobj A functional data object for the predicted response function.
The function Fperm.fd can be called with exactly the same calling sequence as fRegress; it has additional arguments, all of which have default values:
nperm Number of permutations to use in creating the null distribution.
argvals If yfdPar is a fd object, the points at which to evaluate the point-
wise F-statistic.
q Critical upper-tail quantile of the null distribution to compare to the observed
F-statistic.
plotres Argument to plot a visual display of the null distribution displaying
the qth quantile and observed F-statistic.
... Additional plotting arguments that can be used with plot.
If yfdPar is a fd object, the maximal value of the pointwise F-statistic is calcu-
lated. The pointwise F-statistics are also returned. The default of setting q = 0.95
is, by now, fairly standard. The default nperm = 200 may be small, depending on
the amount of computing time available. If argvals is not specified and yfdPar
is a fd object, it defaults to 101 equally spaced points on the range of yfdPar.
If plotres = TRUE and yfdPar is a functional data object, a plot is pro-
duced giving the functional F-statistic along with 95th quantiles of the null distribu-
tion at each point and the 95th quantile of the null distribution of maximal F-values.
If yfdPar is scalar, a histogram is plotted with the 95th quantile marked along with
the observed statistic. The function returns a list with the following elements which
may be used to reconstruct the plot.
pval The observed p-value of the permutation test.
The function tperm.fd carries out a permutation t-test for the difference between two groups of functional data objects. Its arguments are
x1fd and x2fd Functional data objects giving the two groups of functional
observations.
nperm The number of permutations to use in creating the null distribution.
q Critical upper-tail quantile of the null distribution to compare to the observed
t-statistic.
argvals The points at which to evaluate the pointwise t-statistic.
plotres Argument to plot a visual display of the null distribution displaying
the 1-qth quantile and observed t-statistic.
If plotres=TRUE, a plot is given showing the functional t-statistic, along with
the critical values of the permutation distribution at each point and the permutation
critical value of the maximal t-statistic. It returns a list with the objects necessary to
recreate the plot:
pval The observed p-value of the permutation test.
qval The qth quantile of the null distribution.
Tobs The observed maximal t-statistic.
Tnull A vector of length nperm giving the observed values of the permutation
distribution.
Tvals The pointwise values of the observed t-statistic.
Tnullvals The pointwise values of the permutation observations.
pvals.pts Pointwise p-values of the t-statistic.
qvals.pts Pointwise qth quantiles of the null distribution.
argvals The argument values at which the pointwise t-statistic was evaluated.
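For example, the growth curve comparison shown earlier can be run without the plot and its results inspected directly; the number of permutations here is arbitrary.

tres = tperm.fd(hgtmfd, hgtffd, nperm=1000, plotres=FALSE)
tres$pval                              # permutation p-value for the maximal t-statistic
tres$qval                              # corresponding critical value from the null distribution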
The Swedish life table data consist of the log hazard rates (instantaneous risk of
death) at ages 0 to 80 for Swedish women by birth year from 1751 to 1914. We
want to develop a model for the way in which these have evolved over the years to
1894, and consider how well we can use this to forecast the hazard rate for women
born in the year 1914.
1. Smooth the data appropriately. Explore these smooths – are there clearly evident
features in how they change over time?
2. Create a functional linear model to predict the hazard curves from birth year for
years 1751 to 1894. Choose smoothing parameters by cross-validation. Provide a
plot of the error covariance. Plot the coefficient functions along with confidence
intervals.
3. Examine the residuals from your model above. Are there any indications of lack
of fit? If there are, construct an appropriately modified model. Plot the R-squared
for both the linear model and the new one. Does there appear to be evidence for
the effect of time on hazard curves?
4. Extrapolate your models to predict the hazard rate at 1914. How well does each
do? Do they give better predictions than just the mean hazard curve?
5. Because the hazard curves are ordered in time, it is also possible to consider
a functional time series model. Specifically, fit a model with the autoregressive
structure:
The concurrent linear model is closely related to the varying coefficients model. See Hastie and Tibshirani (1993), plus the large recent literature discussed in Ramsay and Silverman (2005). A theoretical coverage of more general functional response
models is given in Cuevas et al. (2002) as well as earlier papers by Cardot et al.
(1999) and Ferraty and Vieu (2001). An elegant discussion of the ways in which
the functional ANOVA can be treated is given in Brumback and Rice (1998) and
associated discussion.
Chapter 11
Functional Models and Dynamics
This chapter brings us to the study of continuous time dynamics, where functional
data analysis has, perhaps, its greatest utility by providing direct access to relation-
ships between derivatives that could otherwise be studied only indirectly. Although
dynamic systems are the subject of a large mathematical literature, they are rela-
tively uncommon in statistics. We have therefore devoted the first section of this
chapter to reviewing them and their properties. Then we address how “principal
differential analysis (PDA)” can contribute to their study from an empirical per-
spective.
Functional data offer access to estimated derivatives, which reflect rates of change.
We have already seen the interpretative advantage of looking at velocity and accel-
eration in the Berkeley growth data. The field of dynamics is the study of systems
that are characterized by relationships among derivatives. Newton’s Second Law,
F = ma,    (11.1)
Consider a straight-sided bucket of water with a leak at the bottom. Water will leak
out from the hole at a rate proportional to the amount of pressure on the bottom
of the bucket and the size of the hole (ignoring second-order effects like surface
tension and flow turbulence). Since the pressure is proportional to the height x(t) of
the water in the bucket at time t, the flow rate Dx(t) can be described as follows:

Dx(t) = −β x(t).    (11.2)
The negative sign is introduced here because water flowing out of the bucket reduces
the height.
Equations with this structure have the solution
x(t) = C e^{−β t}.
Since C = x(0), it is called the initial condition or state of this system. Since in our
example β > 0, the height of the water exhibits exponential decay.
If a hose adds water to the bucket at a rate g(t), Equation (11.2) becomes

Dx(t) = −β x(t) + α g(t).    (11.3)
The coefficient α is required to match the units of the two terms. The input function
g(t) is called a forcing function, changing the unforced behavior of the system.
Of course most buckets change their diameter with height, and there will be ad-
ditional loss from evaporation, splashing and so forth. Effects such as these would
require the coefficients to change with time:

Dx(t) = −β(t) x(t) + α(t) g(t).
To make this equation more interpretable, let us consider the situation where g(t) =
g is constant:
x(t) = C e^{−β t} + α g / β.
The height of water tends to a level α g/β that balances inflow of water with outflow
leaving the bucket. Moreover, it tends to that level at an exponential rate. As a rule
of thumb, the exponential term implies that x(t) will move approximately two thirds
of the distance to α g/β in 1/β time units.
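A quick numerical illustration of this behavior, with arbitrarily chosen values of β, α, g and C, plots the solution and marks the steady-state level and the 1/β time constant:

beta = 0.5; alpha = 1; g = 2; C = 4
tgrid = seq(0, 12, len=201)
x = C*exp(-beta*tgrid) + alpha*g/beta
plot(tgrid, x, type="l", xlab="time", ylab="water height x(t)")
abline(h = alpha*g/beta, lty=2)        # steady-state level alpha*g/beta
abline(v = 1/beta, lty=3)              # about 63% of the initial gap is closed by t = 1/beta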
Of course, relationships between x and Dx may not capture all the important infor-
mation about how a system evolves. Linear second-order dynamics are expressed
as
D²x(t) = −β0 x(t) − β1 Dx(t) + α g(t).    (11.5)
A good way to understand (11.5) is to think of a physical system described by
Newton’s Second Law: Each of the terms represents a different “force” on the sys-
tem. The first term represents position-dependent forces like a spring for which the
“stress” (force) is proportional to the “strain” (deformation). The second term is pro-
portional to the speed at which the system moves, and can be thought of in terms of
friction or viscosity, especially when β1 is positive. As before, g(t) again represents
external inputs into a system that modify its behavior, like Newton’s Second Law,
expression (11.1) above.
Let us first consider the homogeneous case α = 0 and define the discriminant

d = β1²/4 − β0.
Direct differentiation shows that solutions are given by linear combinations of ex-
ponential functions
x(t) = c1 exp(γ1t) + c2 exp(γ2t)
with
γ1 = −β1/2 + √d,    γ2 = −β1/2 − √d.

These solutions will decay exponentially if γ1 < 0 (since γ2 ≤ γ1). If d < 0, γ1 and γ2 are complex conjugates, and

x(t) = e^{−β1 t/2} [ d1 sin(t√(−d)) + d2 cos(t√(−d)) ].    (11.6)
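A short sketch, with β0 and β1 chosen arbitrarily so that d < 0 and β1 > 0, displays the damped oscillation in (11.6):

beta0 = 4; beta1 = 0.6
d = beta1^2/4 - beta0                  # discriminant; negative here
omega = sqrt(-d)                       # oscillation frequency
tgrid = seq(0, 20, len=401)
x = exp(-beta1*tgrid/2)*(1.0*sin(omega*tgrid) + 0.5*cos(omega*tgrid))
plot(tgrid, x, type="l", xlab="time", ylab="x(t)")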
Fig. 11.1 A diagram of the various dynamic regimes for a second-order differential equation for
different values of β0 and β1 .
For many purposes, we may want to generalize expression (11.5) further to consider time-varying coefficients:

D²x(t) = −β0(t) x(t) − β1(t) Dx(t) + α(t) g(t).    (11.7)
Dynamic models can include more than one state variable. The second-order system
discussed in Section 11.1.2 can be cast as a first-order system with a vector state,
with components representing the location and velocity. In this context it is less easy
to produce analytic expressions with which to analyze the stability properties of a
system. However, the rules are not very different. A multidimensional linear system
involving a k-dimensional state can be written as

Dx(t) = −B(t) x(t),

where B(t) is now a k × k matrix. It is clear that for this system, x(t) ≡ 0 results in
a stable solution.
We can specialize this system to constant coefficients and add a forcing function
to get the following:
Dx(t) = −B x(t) + u(t).    (11.8)

If u is constant, this system has a fixed point at x = B⁻¹u at which the solution does not change. We can understand the stability of this solution in terms of the
eigenvalues d1 , . . . , dk of −B. Letting
ξj(t) = e^{dj t},  j = 1, . . . , k,
the solution to (11.8) is given in terms of linear combinations of the ξ j (t). For a
general matrix B, some of the eigenvalues may be complex. For real-valued matrices
B, any complex eigenvalue will be paired with its complex conjugate. Moreover,
the imaginary parts describe the oscillations that we observed for the second-order
system, with the period of oscillation being 2π over the (positive) imaginary part of
each complex conjugate pair. Moreover, any eigenvalue with a positive real part will
explode exponentially; a complex conjugate pair of eigenvalues with a positive real
184 11 Functional Models and Dynamics
part will exhibit an exponentially increasing oscillation. A forcing term may shift
the behavior but will not change the stability properties unless the forcing term is
a function of the state vector in a way that in essence modifies the state transition
matrix B.
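These rules are easy to check numerically. The following sketch, for an arbitrarily chosen 2 by 2 coefficient matrix, computes the eigenvalues of −B and the implied period of oscillation:

B = matrix(c(0.2, -1.5,
             1.5,  0.2), 2, 2, byrow=TRUE)
ev = eigen(-B)$values                  # here a complex conjugate pair
Re(ev)                                 # negative real parts: decay toward the fixed point
2*pi/abs(Im(ev))                       # period of oscillation from the imaginary parts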
How are we to deal with higher-order multivariate dynamics? In Section 11.4 we use a model of the form given in (11.12) below.
We have seen how linear models describing relationships between derivatives result in systems whose behavior can be qualitatively characterized. We would now like
to use this theory to characterize the behavior of a system from which we have data.
How can we fit linear dynamic models to functional data? One approach is to
solve a differential equation like (11.3) for some value of the parameters and fit
this to observed data by least squares. This procedure is computationally expensive,
however, and such models rarely fit observed data well since they do not account for
unobserved external influences on a system.
Instead, we use the fact that functional data analysis already gives us derivative
information. Given repeated measurements of the same process, we can model

Dxi(t) = −β(t) xi(t) + α(t) ui(t) + εi(t),    (11.9)
where the εi (t) are error terms to allow for variation between different curves.
This expression represents a functional linear regression and could be fit with
fRegress.
However, we can view the model in a different light: when ui (t) ≡ 0 functional
linear regression estimates β (t) to minimize
PDASSE(β) = ∑_{i=1}^{N} ∫ [Dxi(t) + β(t) xi(t)]² dt = ∑_{i=1}^{N} ∫ [Lβ xi(t)]² dt.    (11.10)
That is, the model looks for a linear differential operator to represent covariation be-
tween x and Dx. This method has been labeled principal differential analysis (PDA)
because of its similarity to principal components analysis:
• Functional PCA looked for linear operators defined by β (t) to explain variation
between curves.
• PDA looks for linear operators to explain variation between derivatives but within
curves.
Naturally, we can extend the same ideas to multivariate functions and to higher
derivatives; these are all accommodated in the fda package.
When we also wish to consider inputs into a dynamic system, the PDA objec-
tive criterion is the difference between the effective input and the linear differential
operator:
PDASSEu(β) = ∑_{i=1}^{N} ∫ [Lβ xi(t) − α(t) u(t)]² dt.    (11.11)
Both β and α here are functional objects to be estimated. This creates an input-
output system which responds to changes in u(t). Our examples below do not use
forcing functions, but we provide a description of how to incorporate them into the
code.
We illustrate the use of PDA with data on the movement of lips during speech pro-
duction. Figure 11.2 presents the position of the lower lip when saying the word
“Bob” 20 times. As is clear from the data, there are distinct opening and shutting
phases of the mouth surrounding a fairly linear trend that corresponds to the vo-
calization of the vowel. Muscle tissue behaves in many ways like a spring. This
observation suggests that we consider fitting a second-order equation to these data.
The function pda.fd is the basic tool for this analysis. In a break from our
naming conventions, the equivalent Matlab function is pdacell. The arguments to
this function are similar to fRegress. We need to give it the functional data object
to be analyzed along with a list of functional parameter objects containing bases
and penalties for the β and α coefficient functions. The following code attempts to
derive a second-order homogeneous differential equation like expression (11.7) for
lipfd obtained from smoothing the lip data with no smoothing in the coefficients
β0 (t) and β1 (t):
lipfd = smooth.basisPar(liptime, lip, 6,
Lfdobj=int2Lfd(4), lambda=1e-12)$fd
names(lipfd$fdnames) = c("time(seconds)",
"replications", "mm")
lipbasis = lipfd$basis
bwtlist = list(fdPar(lipbasis,2,0),
               fdPar(lipbasis,2,0))
pdaList = pda.fd(lipfd, bwtlist)

Fig. 11.2 The lip data. These give the position of the lower lip relative to the upper during 20 enunciations of the word "Bob" by the same subject.
The definition of pda.fd provides for arguments awtlist and ufdlist,
whose absence here indicates that the forcing function α g(t) in (11.5) is zero.
We now need to analyze the result. The function
plot.pda.fd(pdaList,whichdim=3)
will plot the first two panels in Figure 11.3. In higher dimensional systems these
coefficient functions can be grouped by dimension, equation, or observation. For
the third panel we have plotted the discriminant function:
dfd = (0.25*pdaList$bwtlist[[2]]$fd^2
       - pdaList$bwtlist[[1]]$fd)      # discriminant d(t) = beta1(t)^2/4 - beta0(t)
dfd$fdnames = list('time','rep','discriminant')
From this we see that there is an initial explosive motion as the lips, previously
sealed, are opened. This is followed by a period where the motion of the lips is
largely oscillatory with a period of about 30-40 ms. This corresponds approximately
to the spring constant of flaccid muscle tissue. During the “o” phase of the word,
there is a period of damped behavior when the lips are kept open in order to enunci-
ate the vowel.
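The period quoted above can be read off the estimated coefficients: by the argument leading to (11.6), the local period is 2π/sqrt(β0(t) − β1(t)²/4) wherever that quantity is defined. A sketch using the fitted pdaList, with an arbitrary evaluation grid:

tfine  = seq(0, 0.35, len=201)
b0vals = as.vector(eval.fd(tfine, pdaList$bwtlist[[1]]$fd))   # beta0(t)
b1vals = as.vector(eval.fd(tfine, pdaList$bwtlist[[2]]$fd))   # beta1(t)
freq2  = b0vals - b1vals^2/4           # squared frequency; positive where oscillatory
period = rep(NA, length(freq2))
period[freq2 > 0] = 2*pi/sqrt(freq2[freq2 > 0])
plot(tfine, period*1000, type="l", xlab="time (s)", ylab="period (ms)")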
Fig. 11.3 Results of performing principal differential analysis on the lip data. The top two panels show the estimated β0(t) and β1(t) functional coefficients. The bottom panel shows the discriminant function. This reveals an initial explosive motion as the lips part, followed by oscillatory motion, modulated for the production of the "o".
We can also overlay our β coefficients on the bifurcation diagram in Figure 11.1.
The following code produces Figure 11.4:
pda.overlay(pdaList)
This tells much the same story. The initial impulse corresponds to explosive growth,
followed by largely stable oscillatory motion.
PDA could be effectively carried out using fRegress. However, pda.fd also works for multivariate functional observations, which cannot be handled by the current version of fRegress. Here we examine the handwriting data in Figure 1.8
where PDA provides informative results. As for the lip data, since this is a physical
system, we model the second derivative of the data.
For multidimensional systems, a PDA will have three levels of weight functions.
Each function is indexed according to the equation in which it appears (i), the vari-
able it multiplies ( j), and the derivative of that variable (k):
Lxqi(t) = D^m xqi(t) + ∑_{k=0}^{m−1} ∑_{j=1}^{d} βijk(t) D^k xqj(t).    (11.12)

Fig. 11.4 An overlay of the PDA parameters estimated for the lip data on a bifurcation diagram of a second-order linear system. Points are marked as time points from 0 to 0.35 seconds. The initial explosive growth is followed by a period close to undamped oscillations.
The following code sets up a second-order PDA for the two handwriting coordinates, assuming that xfdlist is a list containing the functional data objects for the X and Y coordinates and that fdabasis is a basis object for representing the coefficient functions:

pdaPar = fdPar(fdabasis, 2, 1)
pdaParlist = list(pdaPar, pdaPar)
bwtlist = list( list(pdaParlist, pdaParlist),
                list(pdaParlist, pdaParlist) )
pdaList = pda.fd(xfdlist, bwtlist)
For higher-dimensional systems, the analysis presented in Figure 11.4 is no longer
feasible. Instead, we consider the pointwise eigenvalues of the system that we de-
scribed in Section 11.1.3. These can be plotted as functional quantities. Nonzero
eigenvalues sometimes come in conjugate pairs. Therefore, a plot of the imagi-
nary part of the eigenvalues may be symmetrical. With nonzero imaginary parts,
the system oscillates. When the real part of any eigenvalue is positive, the system
experiences exponential growth or a growing oscillation. Otherwise it is stable or
decaying.
The function eigen.pda(pdaList) takes the result of pda.fd and pro-
duces the stability analysis given in Figure 11.5. As can be seen, there is a strong and
stable periodic solution in the data, with the real parts of the eigenvalues staying
close to zero, indicating that the writing is dominated by ellipsoidal motion.
Fig. 11.5 Stability analysis of a principal differential analysis (PDA) of the handwriting data. The
nearly constant imaginary eigenvalue indicates a constant cycle modulated locally.
Fig. 11.6 A comparison of PDA results for the unregistered lip data (solid) and the lip data with
each derivative first calculated and then registered with the same warping function (dashed).
The pda.fd function is like fRegress except that since both the response and
covariates are given by derivatives of the same function, only one functional data ob-
ject needs to be specified. It also handles multivariate functional data, but insists that
each dimension be given in a separate element of a list. This allows each dimension
to be defined with respect to different bases. The standard call is
pda.fd(xfdlist, bwtlist, awtlist, ufdlist, nfine)
We divide the description of the arguments into two cases:
x(t) is univariate:
xfdlist A functional data object defining x(t).
bwtlist A list of functional parameter objects; the jth element should specify the basis and smoothing penalty for βj(t).
x(t) is multivariate:
xfdlist A list of functional data objects, the ith entry defining xi (t).
bwtlist A list of lists of lists of functional parameter objects.
bwtlist[[i]][[j]][[k]] should define the basis and smoothing penalty
for βi jk (t).
awtlist A list of lists of functional parameter objects. awtlist[[i]][[j]] represents the coefficient of ufdlist[[i]][[j]] in equation i.
ufdlist A two-level list of functional data objects, ufdlist[[i]] repre-
sents the list of input functions that affect equation i.
Both awtlist and ufdlist default to NULL, in which case they are ignored.
Individual elements of bwtlist, awtlist and ufdlist can be set to NULL,
in which case the corresponding coefficient functions are forced to be zero. The
nfine component gives the number of evaluation points at which to perform a lin-
ear regression. It defaults to 501.
The function eigen.pda calculates the pointwise eigenvalues of the system and produces a
plot of the same format as Figure 11.5. If awtlist is present, the fixed point of
the system is also calculated at each time and plotted. Its arguments are
pdaList A list object returned by pda.fd.
plotresult Should the result be plotted? Default is TRUE.
npts Number of points to use for plotting.
For a second-order univariate principal differential analysis, the function pda.overlay plots β0(t) against β1(t) and overlays a bifurcation diagram. It requires
pdaList A list object returned by pda.fd.
nfine Number of points to use for plotting.
ncoarse Number of points use as time markers along the plot.
This function will register a given functional data object with a specified warping
function. It requires
yfd A multivariate functional data object defining the functions to be registered
with Wfd.
Wfd A functional data object defining the registration functions to be used to
register yfd. This can be the result of either landmarkreg or register.fd.
type Indicates the type of registration function.
direct Assumes Wfd is a direct definition of the registration functions. This
is produced by landmarkreg.
monotone Assumes that Wfd defines a monotone functional data object, up to shifting and scaling to make end points agree. This is produced by register.fd.
periodic Does shift registration for periodic functions. This is output from
register.fd if periodic=TRUE.
It outputs a functional data object containing the registered curves.
1. Instead of the time-varying principal differential analysis given for the handwrit-
ing data, try a constant-coefficient principal differential analysis, but include a
constant forcing term. Does your interpretation differ markedly? What does the
fixed point of the system tell you?
2. Try a PDA of the Chinese handwriting data. Do the dynamics of this system
appear to be very different from the cursive script?
3. PDA can be performed on a single time series as well, but we have to borrow
strength across times instead of across replicates. One easy way to do this is
to insist that all the βi jk (t) be constant. Try this with data on the incidence of
melanoma over a 30-year period. These data are available in the melanoma
object.
a. Smooth the data, choosing the optimal λ by gcv and plot both data and the
smooth. We observe that there are two distinct dynamics: a linear trend and a
cycle with a period of about 10 years.
b. These observations suggest that
D⁴x(t) + α D²x(t) ≈ 0.
The study of dynamics has a long history in applied mathematics. Borrelli and Cole-
man (2004) provide a good introductory overview. While linear differential equa-
tions with constant coefficients are relatively easily studied, nonlinear systems are
harder to analyze. Generalizing (11.8), a system is described by a vector of states,
and its evolution is given in terms of
Dx = f(x, u, θ ), (11.13)
where f is a vector-valued nonlinear function. Unlike (11.8), however, it is not usu-
ally possible to write down solutions to (11.13) analytically. Instead, we must rely
on numerical methods to approximate them. Despite these challenges, nonlinear
dynamic systems have proved enormously versatile in producing different forms of
qualitative behavior that mimic real-world processes, from bursts of neural firing
through epidemic processes and chemical reactions. We can examine the behavior
of these systems by extending the analysis of the stability of linear systems that we
described above and examining how the stability of fixed points and cycles changes
with elements of the parameter vector θ . This is a large field and the reader is di-
rected to the relevant literature such as Kuznetsov (2004) to learn more.
Despite the usefulness of such models, there is relatively little literature on as-
sessing their agreement with data or on estimating and performing inference for θ .
This is partly due to the numerical difficulties involved in finding solutions to (11.13)
and partly due to the idealization involved in assuming that a system evolves deter-
ministically. One way of reducing the numerical challenges in fitting these data is a
nonlinear version of PDA; if all the components of x are measured, we can smooth
the data to create x̂ and estimate θ to minimize
SISE(θ) = ∫ ( Dx̂(t) − f(x̂(t), u(t), θ) )² dt.
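A rough sketch of this idea in R, in which everything (the right-hand-side function f, the smoothed object xhatfd, the grid and the starting values) is hypothetical, discretizes the integral and hands the criterion to optim; any forcing would enter through the time argument of f.

SISE = function(theta, xhatfd, f, tgrid) {
  xhat  = eval.fd(tgrid, xhatfd)        # smoothed state values
  Dxhat = eval.fd(tgrid, xhatfd, 1)     # smoothed first derivatives
  resid = Dxhat - f(xhat, tgrid, theta) # pointwise departure from the equation
  mean(resid^2) * diff(range(tgrid))    # crude quadrature of the integral
}
thetahat = optim(c(1, 1), SISE, xhatfd=xhatfd, f=f,
                 tgrid=seq(0, 1, len=101))$par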
The idea has been rediscovered numerous times (see Bellman and Roth, 1971;
Varah, 1982; Pascual and Ellner, 2000; Chen and Wu, 2008). The statistical prop-
erties of the resulting estimates have recently been examined (Brunel, 2008). This
technique can only be used, however, when there are enough data to smooth each
component of x. More recent work has focused on using (11.13) as a smoothing
penalty and iteratively refining θ to match the data (Ramsay et al., 2007). The use
of functional data analysis in statistical inference for nonlinear systems remains an
important research area.