
PSYCHOMETRIKA, VOL. 47, NO. 4, DECEMBER 1982

WHEN THE DATA ARE FUNCTIONS

J. O. RAMSAY

MCGILL UNIVERSITY

A datum is often a continuous function x(t) of a variable such as time observed over some interval. One or more such functions are observed for each subject or unit of observation. The extension of classical data analytic techniques designed for p-variate observations to such data is discussed. The essential step is the expression of the classical problem in the language of functional analysis, after which the extension to functions is a straightforward matter. A schematic device called the duality diagram is a very useful tool for describing an analysis and for suggesting new possibilities. Least squares approximation, descriptive statistics, principal components analysis, and canonical correlation analysis are discussed within this broader framework.

Key words: continuous data, functional analysis, duality diagram.

Introduction
Sophisticated data collection hardware often produces data which are a set of continuous functions. I am sure that all of us have seen such data: EEG and EMG records, learning curves, paths in space, subject responses continuous in time, speech production measurements during vocalization, bioassay data, and so on. Consider as a further example the curves displayed in Figure 1. These indicate the height of the tongue dorsum during ten different utterances of the sound "ah-kah" by a single subject [Keller & Ostry, Note 1]. It is natural to consider each curve as a single observation, to summarize the ten curves in terms of an average curve, and to measure in some way the variation of the curves about this average.
This paper considers the extension of classical statistical techniques to include functional data. It will be an elementary and simplified treatment, which may annoy those wanting more subtlety and rigor. I must warn you, however, that a fundamental change of point of view about what data are will be required, and if you leave my address aware that an altered state of statistical consciousness is possible, I shall be content.
In dealing with functional data I will refer frequently to two lines of development. The first is the expression of traditional data analytic technology in the language of functional analysis. Much of this work has taken place in France and is not available in English. I am particularly indebted to the monographs of Cailliez and Pagès [1976] and Dauxois and Pousse [1976]. We are very fortunate to have with us for these meetings a number of those associated with this work, and in part my talk is only an introduction to tomorrow's symposium.* The second line of development, which has fascinated my colleague Suzanne Winsberg and me in recent years, is statistical applications of spline functions. I feel that splines are destined to play a fundamental role in the analysis of functional data, but at this point I can show how in only a vague way.

* "New glances at principal components and correspondence analysis" was a symposium at the 1982 Joint
Meetings of the Classification Society and Psychometric Society, Montreal, Canada.
Presented as the Presidential Address to the Psychometric Society's Annual Meeting, May, 1982. I wish to
express my gratitude to my colleagues in France, especially at the University of Grenoble, for their warm
hospitality during my sabbatical leave. Preparation of this paper was supported by Grant APA 0320 from the
Natural Sciences and Engineering Research Council of Canada.
Requests for reprints should be sent to: J. O. Ramsay, Dept. of Psychology, 1205 Dr. Penfield Ave., Montreal, Québec, Canada H3A 1B1.


FIGURE 1
The height of the tongue dorsum over a 400 millisecond interval of time during which the sound "ah-kah" was uttered. Each curve represents a single utterance. The same subject was involved in all ten replications. The average curve is represented by a dashed line and was computed by averaging the ten curves at each point in time. The time units have been arbitrarily scaled to the interval [0, 1].

Finally, this paper will be correctly perceived by many of the readers of Psychometrika as being a generalization of the pioneering work of Tucker [1958], and it is a privilege to again acknowledge the work of someone who has so often been there first.
Figure 2 offers an approach to the concept of a functional datum. In the upper left corner we have the domain of the classical data matrix: each of n subjects is paired with each of p variables, and to each pair a number $x_{ij}$ is assigned as the consequence of an experiment or data collection. As one moves down from this corner, we come to the situation where n is in effect infinity and we are discussing population characteristics.
Let us now fix the number of subjects n and allow the number of variables p to increase without limit. This process can be extended even beyond countability to the situation where the variables define a continuum. As the upper right corner of Figure 2 indicates, the data now offer a number for each point on a continuum for each subject, and it becomes natural to use function notation $x_i(t)$ to designate the value assigned to individual i at point t on this continuum. The lower right corner shows that one may even have an infinity of subjects or cases to consider, although in this talk I will confine my attention to finite n.

FIGURE 2
Possible domains for statistical observations. These domains depend on whether the number of replications or cases (n) is finite or infinite and whether the number of variables or points of observation (p) is finite or corresponds to the points on a continuum. The datum for case i is $x_{i1}, \ldots, x_{ip}$ when p is finite, and $x_i(t)$, $0 \le t \le T$, when the variables form a continuum.

When presented with a continuous function, classical statisticians have tended to respond in two ways. The first is to sample the continuum at a limited number of points $t_j$, $j = 1, \ldots, p$. In effect each point is considered as a variable in the classical multivariate problem. There are a number of disadvantages to this approach, however. Continuity and higher orders of smoothness, which are usually important aspects of functional data, are ignored. Information between sampling points is lost. Covariance parameters proliferate with murderous speed in p, so that a reduction of information does not achieve a reduction in model complexity.
The second classical response has been to postulate a family of functions which approximate the data but which depend on a limited number of parameters. The literature on the learning curve and the item characteristic curve illustrates this process nicely. Until
recently this approach ran into problems with the lack of flexibility of most parametric curve families, but the advent of spline function technology has been a great breakthrough. Nevertheless, summarizing the data in terms of a point in parameter space is not the same thing as summarizing directly as functions. This problem becomes particularly obvious when one tries to express variation in the functional data in functional rather than point terms.
The major step required to accommodate functional data is to express traditional statistical ideas in functional analytic terminology. This involves thinking of the data as defining a mapping rather than a set of points. That is, the data must be viewed as an element in a space of possible functions taking a domain space into a range space. After describing what this means first for the p-variate case and then for the functional data case, I will look at least squares estimation, principal components analysis, and canonical correlation analysis from this perspective. In each case the transition from classical multivariate data to functional data is very simple, involving essentially replacing a summation by an integral. The essential and perhaps most difficult step is the change from a static to a dynamic conceptualization of the data.

A Functional Analytic View of the Data


The best way to begin is by a reminder that a vector space is a set of entities which can be added according to the usual rules of addition and can also be multiplied by numbers or scalars. Scalar multiplication also distributes with respect to vector addition. The most familiar example of a vector space is the set of p-tuples of real numbers, but it is very important to think more generally. Literally anything that can be added and stretched or shrunk by an arbitrary factor can qualify as a vector. A most important alternative example is the set of vector-valued continuous functions of a set. The sum of any two functions will be a continuous function and hence also within the set. Since the functions are vector-valued they can be multiplied by a scalar, and since this preserves continuity the resulting function is still in the space. A vector space of functions particularly relevant to this paper is the space of real-valued functions having squares which have a finite integral over their domain.
There is a fundamental distinction between vector spaces which are finite dimensional and those which are not. Elements of the former can be represented as weighted sums of a finite number p of basis vectors, and can be put into one-to-one correspondence with p-tuples of numbers. Infinite dimensional spaces, which include most spaces of functions, may not be representable in this way or may require an infinite number of basis vectors. Function spaces contain useful finite dimensional subspaces, however, consisting of functions which are linear combinations of a finite number of special functions. Thus, the set of polynomials of degree p - 1 is a p-dimensional subspace of the space of continuous functions.
An especially useful type of vector space is one which is equipped with an inner
product. This is a real-valued function of two vectors which is symmetric and linear in its
arguments, and which produces a positive number when both arguments are the same
nonzero vector. The familiar inner or dot product of two p-tuples is an example. In spaces
of square-integrable functions the integral of the product of two functions is an inner
product.
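
To make the parallel concrete, here is a minimal Python sketch (all data and names are illustrative, not from the paper) computing both kinds of inner product: the dot product of p-tuples, and a quadrature approximation to the integral of the product of two square-integrable functions.

    import numpy as np

    # Inner product of two p-tuples: the familiar dot product.
    u = np.array([1.0, 2.0, 3.0])
    v = np.array([0.5, -1.0, 2.0])
    dot = u @ v

    # Inner product of two square-integrable functions on [0, T]:
    # the integral of their product, approximated on a grid by the
    # trapezoidal rule.
    T = 1.0
    t = np.linspace(0.0, T, 201)
    dt = t[1] - t[0]
    w = np.full(t.size, dt)              # trapezoid quadrature weights
    w[0] = w[-1] = dt / 2.0

    x = np.sin(2.0 * np.pi * t)
    y = np.cos(2.0 * np.pi * t)
    inner = np.sum(w * x * y)            # approximates the integral of x(t)y(t)
    norm_x = np.sqrt(np.sum(w * x * x))  # norm: square root of self inner product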
In the following discussion of data analysis from a functional analytic point of view,
the familiar p-dimensional vector space of p-tuples of real numbers will be juxtaposed
with the vector space of real-valued functions defined over a closed interval and for which
the integral of their squares is finite.

p-variate Data
Let X represent the set of observations $x_{ij}$, $j = 1, \ldots, p$, $i = 1, \ldots, n$. X can be viewed as a function or a mapping from one vector space into another. The two vector spaces in question are:
1. Subject space E of dimension p. In this space any subject or case, observed or hypothetical, is represented by a point corresponding to its relation to each of the p variables. In this space two vectors $e_i$ and $e_k$ will have an inner product $b_E(e_i, e_k)$, which for purposes of simplicity will be $e_i^t e_k$, and the norm $\|e\|$ of a vector e is the square root of the inner product of e with itself. The space can be spanned by a set of p orthonormal vectors $e_j$.
2. Variable space F of dimension n. In this space any variable, observed or hypothetical, is represented by the values assigned to the n subjects at that position. It also has an inner product $b_F(\cdot, \cdot)$ and a set of n orthonormal spanning vectors.
The mappings that X represents are then
1. X: E → F. Let e be any element of E. Then with X arranged as an n by p matrix the matrix product $f = Xe$ is an n-vector and thus a position in variable space.
2. $X^t$: F → E. For any element $f \in F$ the product $X^t f$ is an element of E. The mapping $X^t$ is called the transpose mapping associated with X. This has a broader meaning in functional analytic terms:* for any elements e of E and f of F a mapping and its transpose satisfy $b_F(f, Xe) = b_E(X^t f, e)$.

It can be useful to think of spaces E and F as themselves spaces of functions. For example,
F can be considered the space of possible mappings of n individuals into the real number
line. One then has a complete expression of the problem in function space terminology.
As unfamiliar as this may be, it can pay handsome dividends when considering some new
problems. However, the main temptation to be resisted at this point is to consider a
column or row of X as an element of F or E, respectively; X must be viewed as a
mapping rather than a set of vectors.
In addition to the two mappings above associated with X there are two other important mappings determined by X. Both of these are called operators since they are mappings from a space into itself.
3. V: E → E. The fact that X maps a vector from E to F and $X^t$ maps it back again means that the composite $X^t \circ X$ maps E into itself. Denoting the matrix $X^t X$ by V, this mapping corresponds to the matrix product Ve. When the matrix X has zero column means, the matrix $n^{-1}V$ is the variance-covariance matrix, and the corresponding operator is also called the variance-covariance operator. Figure 3 shows an example for n = 3 and p = 2 of how a particular vector is transformed by V.
4. W: F → F. Along with operator V for E goes a corresponding operator for F which results from applying $X^t$ to an element f and then X to the resulting image of f in E. In matrix terms $W = XX^t$.

* When one or both spaces are equipped with something other than the identity metric it is important to
distinguish between the transpose of a mapping and its adjoint. The appendix discusses this.
384 PSYCHOMETRIKA

The mappings that result from the data X can be summarized neatly in the following diagram:

$$V \circlearrowleft E \;\underset{X^t}{\overset{X}{\rightleftarrows}}\; F \circlearrowright W. \qquad (1)$$

FIGURE 3
Two views of the transformation of the vector $e = (2, 5)^t$ by the operator $V = X^t X$, where X is a 3 by 2 data matrix.

This is a simplified version of what Jean-Pierre Pagès [Pagès & Tenenhaus, Note 2] will describe for us tomorrow* as the duality diagram. It is a very important symbolic tool for the presentation of data analytic problems in functional analytic terms. In this talk I will ignore any considerations involving metrics for spaces E and F. The Appendix explains how the duality diagram is extended to accommodate metrics other than the identity by the use of the dual spaces for E and F.
We may now say that the data determine a subspace of F in the following way. Any vector e in E is mapped by X into F. If we choose the p orthonormal vectors $e_j$ in E and map them into F, the result will be p vectors in F, and the image of any e in E can be represented by a linear combination of these p images. Thus the subspace of F is in effect the image of E under the mapping X, and will usually be of dimension min {p, n}. If an arbitrary element f within F does not lie within this subspace, its image under operator W certainly will, since W is the result of first mapping f into E and then its image back into F.
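
A small numerical sketch may help fix these four mappings (illustrative data; the identity metric is assumed, as in the text). It represents X as an n by p matrix and checks the defining identity of the transpose, $b_F(f, Xe) = b_E(X^t f, e)$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 3, 2
    X = rng.normal(size=(n, p))   # X: E -> F, an n by p data matrix

    e = np.array([2.0, 5.0])      # an element of subject space E
    f = rng.normal(size=n)        # an element of variable space F

    V = X.T @ X                   # operator V = X^t o X: E -> E
    W = X @ X.T                   # operator W = X o X^t: F -> F

    # Transpose identity: b_F(f, Xe) = b_E(X^t f, e).
    assert np.isclose(f @ (X @ e), (X.T @ f) @ e)

    # The image of any f under W lies in the image of E under X,
    # here a subspace of F of dimension min(p, n) = 2.
    Wf = W @ f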

Functional Data
The next task is to extend the functional analytic discussion of the data given above to the situation in which the datum for individual or case i is a function $x_i(t)$, $0 \le t \le T$. As Figure 2 suggests, one may imagine this arising from allowing the number of variables to become so large that the index j, $j = 1, \ldots, p$, can be treated as a continuum and is renamed t. The continuum will be referred to as time in this paper since this is so often the case in practice. In general, where summation over j takes place in the classical case, integration over t will now be required. The data will still be indicated by X, but it will no longer be worthwhile thinking of X as a matrix. Nevertheless, X defines mappings among the following spaces.
1. Subject space E of infinite dimension. Any subject, observed or hypothetical, can be associated with a function e(t). Since an infinite variety of functions is possible the space has infinite dimensionality. An inner product $b_E(e_j, e_k)$ for functions $e_j(t)$ and $e_k(t)$ is given by

$$b_E(e_j, e_k) = \int_0^T e_j(t)\, e_k(t)\, dt,$$

and again the squared norm of a function is the inner product of the function with itself; that is, the integral of its square. If attention is confined to functions having finite norms, then the resulting space is a Hilbert space. It is fundamental to Hilbert spaces that any element can be represented in them by a weighted sum of a countable number of orthonormal functions, so that although the dimensionality is infinite, it is at least countable. E in the p-variate case is also a Hilbert space.
2. Time space F of dimensionality n. This space has exactly the same characteristics as F in the p-variate case. Any point in time, observed or hypothetical, is represented in this space by the values of the n functions at that time point.
The mappings that X represents are now
1. X: E → F. Let e(t) be any function in space E. Then the vector f whose ith element is given by

$$f_i = \int_0^T x_i(t)\, e(t)\, dt \qquad (2)$$

is an element in n-space and hence in F. Note that this is formally equivalent to
what one would obtain in the p-variate case if the summation for the ith element in the product Xe were replaced by an integration. Thus, it may be helpful to imagine X in the functional data case as represented by a matrix with completely dense rows.
2. $X^t$: F → E. Let f be any vector in F. Then the function

$$X^t: f \to X^t f = e(t) = \sum_i f_i\, x_i(t) \qquad (3)$$

is an element of E. Moreover, it is easy to see that $b_F(Xe, f) = b_E(e, X^t f)$ for any $e \in E$ and $f \in F$, so that one is justified in using the notation $X^t$ for this mapping.
3. V: E → E. Again the consequence of mapping a function e(t) into F by (2) and then mapping the resulting n-vector back into E by (3) can be described as an operator $V = X^t \circ X$. It is analogous to a symmetric matrix with order so large that it appears to have completely dense rows and columns. Explicitly,

$$X^t \circ X: e(t) \to Ve(t) = \sum_i x_i(t) \int_0^T x_i(u)\, e(u)\, du = \int_0^T \Bigl[\sum_i x_i(t)\, x_i(u)\Bigr] e(u)\, du. \qquad (4)$$

Note that V is a member of that general class of integral transforms representable by $\int K(t, u)\, e(u)\, du$, where the function K(t, u) is called the kernel of the transform. In this case $K(t, u) = \sum_i x_i(t)\, x_i(u)$.
4. W: F → F. The consequence of mapping a vector f in F into E and back again is a vector whose ith element is

$$f_i \to \int_0^T x_i(t) \sum_j f_j\, x_j(t)\, dt. \qquad (5)$$

These mappings are still described schematically by the duality diagram (1). The image of E in F determined by the transformation X is now in general of dimension n and thus coincides with F itself. This does not mean, however, that W maps a vector f into itself, and a central problem is now to study the consequences of any or all of these mappings.
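
Numerically, the passage from sums to integrals amounts to weighting matrix products by the spacing of a sampling grid. The sketch below (synthetic curves standing in for functional data; names are illustrative) discretizes [0, T] at m equally spaced points so that (2) through (4) become ordinary array operations.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, T = 10, 200, 1.0
    t = np.linspace(0.0, T, m)
    dt = T / m                                 # simple quadrature weight

    # Synthetic functional data: n curves sampled at the m time points.
    phase = rng.normal(0.0, 0.1, size=(n, 1))
    Xf = np.sin(2.0 * np.pi * (t + phase))     # row i plays the role of x_i(t)

    e = np.cos(2.0 * np.pi * t)                # a function e(t) in E

    f = Xf @ e * dt                            # (2): f_i = integral of x_i(t)e(t) dt
    e_back = Xf.T @ f                          # (3): sum_i f_i x_i(t), back in E

    K = Xf.T @ Xf                              # kernel K(t, u) = sum_i x_i(t)x_i(u)
    Ve = K @ e * dt                            # (4): (Ve)(t) as a quadrature sum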

Least Squares Approximation in Functional Terms

The key idea underlying the most commonly used classical statistical procedures is the approximation of a set of points by points lying within a subspace of reduced dimensionality. For example, multiple regression is the process of representing an element f in F by its image $\hat{f}$ in the k-dimensional subspace spanned by predictor variables $f_j$, $j = 1, \ldots, k$. This image $\hat{f}$ is the result of a projection of f onto this subspace. Thus, we may consider $\hat{f}$ to be the consequence of applying a projection operator P to f, where P maps any vector into its least squares image in the subspace. Projection operators have two important properties: $P \circ P = P$ and $P^t = P$. Implicit in the use of the term "operator" is the uniqueness of the least squares estimate, which is the case when the vector being approximated is (a) an element in a Hilbert space, and (b) the approximation is a member of either a closed subspace or a convex subset.
Approximation problems in classical p-variate data analysis are usually expressed in terms of the variable space F. However, it can be very useful in the functional data case to consider the problem of approximating a function e(t) in E by its projection $\hat{e}(t)$ on a finite-dimensional subspace $\hat{E}$. This process is familiar to electrical engineers, who apply filters to input signals to eliminate unwanted components. This is also what the numerical analyst does in approximating a complicated function in terms of a linear combination of simpler functions. When the goal of the approximation is the minimization of $\|e(t) - \hat{e}(t)\|^2 = \int [e(t) - \hat{e}(t)]^2\, dt$, the corresponding mapping is a projection P.
This can be represented by the duality diagram

$$E \;\underset{P}{\overset{P}{\rightleftarrows}}\; \hat{E}. \qquad (6)$$
One set of approximating functions with spectacular properties are piecewise polynomials or splines, which are not only very flexible with only a modest number of parameters but remarkably easy to handle computationally. Winsberg and I [Winsberg & Ramsay, 1980, 1981; Winsberg & Ramsay, Note 3] have been working with monotone splines, and she will report on another application in these meetings [Winsberg & Ramsay, Note 4]. Monotone spline approximation involves projecting function space onto a cone and has turned out to be not at all difficult in a wide variety of problems. A comprehensive treatment of splines and the various spaces associated with them is to be found in Schumaker [1981].
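
Once a basis for the approximating subspace is chosen, the projection P is ordinary linear algebra. The sketch below is illustrative only; it uses a plain polynomial basis rather than the monotone splines discussed above, and assumes the identity metric on the sampling grid.

    import numpy as np

    t = np.linspace(0.0, 1.0, 100)
    e = np.exp(t) + 0.1 * np.sin(40.0 * t)   # a function with a rough component

    # Basis for a 4-dimensional subspace: polynomials of degree 3.
    B = np.vander(t, 4, increasing=True)     # columns 1, t, t^2, t^3

    # Projection operator P = B (B^t B)^{-1} B^t.
    P = B @ np.linalg.solve(B.T @ B, B.T)
    e_hat = P @ e                            # least squares approximation of e

    assert np.allclose(P @ P, P)             # P o P = P
    assert np.allclose(P.T, P)               # P^t = P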

Principal Components Analysis

This very important technique is usually motivated in classical statistics by the problem of either approximating the variance-covariance matrix $n^{-1}X^tX$ of rank p by a matrix of reduced rank k, or of approximating the data matrix X by a matrix $\hat{X}$ of the same dimensions but reduced rank k. In either case the solution can be expressed in terms of either the eigenanalysis of $X^tX$ or the singular value decomposition of X. Counterparts of eigenanalysis or the singular value decomposition for Hilbert spaces of functions also play the critical role in the principal components analysis of functional data.
From a functional analytic point of view the problem is one of the description of a mapping V: E → E or alternatively of X: E → F. Thus, we seek mappings $\hat{V}$ or $\hat{X}$ which are in some sense close to those they approximate. One way in which the concept "close" can be expressed is by saying that the consequences of the approximating mapping should be as close as possible to those of the original. In this way, the problem of approximating a mapping can be reduced to that of approximation in the range space of the mapping, and hence can be put into the least squares terminology discussed in the previous section.
The consequences of the mapping X can be summarized by describing what happens to a vector e in E of unit norm when it is mapped into F, whereupon it has norm $\|Xe\|$. Now the image of the unit hypersphere in E will be a hyperellipsoid in F because the mapping X is linear. Figure 4 displays the image of the unit circle resulting from the 3 by 2 data matrix displayed in Figure 3. Thus, the location of an element $e_1$ in E whose image in F has the largest norm $\|Xe_1\|$ can be seen as a best one-dimensional approximation to this hyperellipsoid. The location of an element $e_2$ orthogonal to $e_1$ and having the image with the largest norm provides a best two-dimensional approximation. This process can be continued until either the patience of the data analyst or the dimensionality of E is exhausted.
Because of the way in which the transpose was defined above,

$$b_F(Xe, Xe) = b_E(e, X^t \circ Xe) = b_E(e, Ve). \qquad (7)$$

FIGURE 4
Two views of the transformation of the unit circle by the mapping X, where X is the 3 by 2 data matrix of Figure 3.

Thus, the search for a vector e of unit norm yielding maximum $\|Xe\|$ is equivalent to the search for the element maximizing $b_E(e, Ve)$. Any inner product satisfies the Cauchy-Schwarz inequality, $b(e_j, e_k)^2 \le b(e_j, e_j)\, b(e_k, e_k)$, and it follows from this that $b_E(e, Ve)$ is maximized when

$$Ve = \lambda e. \qquad (8)$$

This is of course an eigenequation. From (7) we have that $\|Xe\|^2 = \lambda$ and thus that λ is a measure of variation in F. One may consider Xe as defining the dominant direction of variation in the image of E in F, or in the image of F under mapping W. Under fairly general conditions it can be shown that operator V can be expressed as $\sum_j \lambda_j e_j(u) e_j(t)$ with at most a countable number of terms, where $(\lambda_j, e_j)$ is the jth solution to (8).
The identification of solutions to this equation is called spectral analysis, and is one of the central topics in functional analysis. In the finite p case e is an eigenvector of the symmetric positive definite matrix V. In the functional data case e(t) is an eigenfunction of the operator V. In either case λ is an eigenvalue of V. Under very general conditions which are reasonable for practical work it can be shown that the number of solutions to the eigenequation is at most countable, that the eigenvalues are nonnegative and distinct, and that the largest eigenvalue is finite.
There are various ways of tackling the problem of computing the eigenfunctions $e_j(t)$. When n is not too large one may perform the matrix eigenanalysis of the matrix W instead. The required eigenfunctions are then simply $X^t f_i$, $i = 1, \ldots, n$, where the $f_i$ are the eigenvectors of W. Techniques for dealing directly with the operator V also exist, some of which involve discrete approximations for which an adequate convergence theory exists. A simple form of discrete approximation to operator V involves choosing a sufficiently large number of equally spaced points in [0, T] and approximating the integrals in (4) by the corresponding sums divided by the number of points. In this form the problem becomes identical with the classical multivariate procedure for principal components analysis. However, more sophisticated quadrature procedures will require far fewer points in general.
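
The discrete approximation just described is easy to carry out; the sketch below (synthetic centered curves; illustrative names) forms the kernel of (4) on a grid of m equally spaced points, divides by m, and extracts eigenvalues and eigenfunctions.

    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 10, 200
    t = np.linspace(0.0, 1.0, m)

    # Synthetic curves: random amplitude and vertical shift about a mean shape.
    amp = rng.normal(1.0, 0.2, size=(n, 1))
    shift = rng.normal(0.0, 0.3, size=(n, 1))
    Xf = amp * np.sin(2.0 * np.pi * t) + shift
    Xf = Xf - Xf.mean(axis=0)                 # center at each time point

    # Discretized operator V: sums over the grid divided by m stand in
    # for the integrals in (4).
    K = Xf.T @ Xf / m

    lam, vecs = np.linalg.eigh(K)             # eigenanalysis of the kernel matrix
    order = np.argsort(lam)[::-1]
    lam, vecs = lam[order], vecs[:, order]

    # Scale eigenvectors so each eigenfunction has unit squared norm
    # (the integral of its square is approximately one).
    eigfuns = vecs.T * np.sqrt(m)

    share = lam[:2].sum() / lam.sum()         # variation carried by two leading terms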
An important extension of principal components analysis is when E is mapped into yet another space G according to the following diagram:

$$G \;\underset{U^t}{\overset{U}{\rightleftarrows}}\; E \;\underset{X^t}{\overset{X}{\rightleftarrows}}\; F. \qquad (9)$$

A particularly important example arises when G is a subspace or a convex subset of E. For example, G may consist of the space spanned by a set of spline functions, or the cone consisting of convex combinations of monotone splines. In this case both U and $U^t$ will be the projection operator P. What is essential is that one may again define principal components analysis as the extremal problem max $\|X \circ U g\|$ subject to $\|g\| = 1$.
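
As an illustrative sketch of this constrained problem (a polynomial subspace standing in for a spline space; identity metric), composing with the projection reduces it to an ordinary eigenanalysis:

    import numpy as np

    rng = np.random.default_rng(6)
    n, m = 10, 100
    t = np.linspace(0.0, 1.0, m)
    Xf = rng.normal(size=(n, 1)) * np.sin(2.0 * np.pi * t) \
        + rng.normal(0.0, 0.1, size=(n, m))
    Xf = Xf - Xf.mean(axis=0)

    B = np.vander(t, 5, increasing=True)          # basis spanning the subspace G
    U = B @ np.linalg.solve(B.T @ B, B.T)         # projection onto G (U = U^t = P)

    XU = Xf @ U                                   # composed mapping X o U on the grid
    K = XU.T @ XU / m                             # its discretized operator
    lam, vecs = np.linalg.eigh(K)
    lead = vecs[:, np.argmax(lam)] * np.sqrt(m)   # leading constrained eigenfunction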

Descriptive Statistics for Functional Data

It is now possible to discuss the problem of descriptive statistics for functions in more detail. The location of a set of functions can be summarized by the point-wise average over subjects. Alternatively, if the functions have a large noise component or other undesirable nonsmooth components, it may be preferable to first approximate them by suitable splines and then take the average of the approximations. The original functions can then be centered by subtracting the average function from each of them.
Summarizing dispersion is more complicated. Just as one cannot summarize the spread of a p-variate distribution in terms of the spread for each variable alone, so it is that measuring the spread of the functions point-wise will be of little value. Instead, p-variate spread is captured in the variance-covariance matrix $n^{-1}X^tX$. Analogously, it is the variance-covariance operator $n^{-1}V$ in the functional data case which contains the
essential information on the dispersion of functions. The kernel of this operator, $K(u, t) = n^{-1}\sum_i x_i(t)\, x_i(u)$, defines a variance-covariance surface, and it can be highly instructive to plot this surface using either contour plotting or perspective plotting techniques. The correlation surface defined by $K(u, t)/[K(u, u)\, K(t, t)]^{1/2}$ may also be plotted. The eigenfunctions $e_j(t)$ corresponding to dominant eigenvalues indicate the dominant types of variation from the average function, just as the eigenvectors in p-variate analysis indicate the dominant directions of variation. Thus they should also be displayed.

FIGURE 5
The kernel of the variance-covariance operator $K(t, u) = \sum_i x_i(t)\, x_i(u)$ computed from the functions in Figure 1, plotted as a surface. The peaks correspond to times of about 0.4 and 0.8 when the tongue dorsum was decelerating. The hollows occur at the points (0.4, 0.8) and (0.8, 0.4) and indicate that covariance between points where the tongue dorsum is decelerating is low. The height of the surface along the diagonal is the variance function.
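
With the curves sampled on a grid, the average function and both surfaces are simple array computations, as the following illustrative sketch shows.

    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 10, 100
    t = np.linspace(0.0, 1.0, m)
    curves = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.2, size=(n, m))

    mean_curve = curves.mean(axis=0)       # point-wise average over subjects
    centered = curves - mean_curve         # subtract the average function

    K = centered.T @ centered / n          # variance-covariance surface K(t, u)
    var_fun = np.diag(K)                   # variance function along the diagonal
    R = K / np.sqrt(np.outer(var_fun, var_fun))   # correlation surface

    # K and R can then be displayed with contour or perspective plots,
    # for example with matplotlib's contour() or plot_surface().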
The family of techniques known as time series analysis can be linked to the analysis of functional data at this point. These procedures are based on the concept of stationarity of covariance structure, which implies that the covariance surface K(u, t) can be defined as a function of |u - t| alone [Doob, 1953]. It can be shown that the eigenfunctions in this case are periodic and hence representable by at most a countable combination of sines and cosines.
FIGURE 6
The correlation surface $K(t, u)/[K(t, t)\, K(u, u)]^{1/2}$ computed from the functions in Figure 1. The hollows at (0.4, 0.8) and (0.8, 0.4) indicate that tongue dorsum movement is uncorrelated with itself at points where the dorsum is decelerating.

The descriptive analysis of functional data can be illustrated by the analysis of the curves presented in Figure 1, in which the pointwise average curve is indicated by a dashed line. Note that the tongue dorsum is strongly decelerating at t = 0.4 and t = 0.8 and reaches its maximum height on the average at t = 0.5. Figure 5 displays the variance-covariance surface for these curves. The height of the surface along its diagonal running from lower right to upper left indicates the variability of the curves at each point in time. The twin peaks correspond to t = 0.4 and t = 0.8 when the tongue dorsum is decelerating. The correlation surface is displayed in Figure 6, and again the deep wells in the surface
correspond to the near zero correlation between tongue dorsum height at the two points of deceleration. It appears that the tongue starts up with precision but has some difficulty slowing down. The first two eigenvalues account for 95% of the total, and Figure 7 displays the first two eigenfunctions. These may be summarized by saying that the two dominant modes of variation are a general vertical shift of the function, and an initial overshooting prior to maximum height followed by an undershooting at minimal height.

FIGURE 7
The first two eigenfunctions of the variance-covariance operator K(t, u) computed from the data in Figure 1, plotted against time. These correspond to eigenvalues of 0.17 and 0.12, respectively, and account for 95% of the total of all eigenvalues. The first eigenfunction indicates that the dominant form of deviation from the mean function is a vertical shift, while the second eigenfunction indicates that a second component of variation is under- or over-shooting of the tongue dorsum at points where it is decelerating.

Canonical Correlation Analysis

The description of canonical correlation analysis in functional analytic terms has a great deal to offer a number of problems, especially since so many familiar techniques in multivariate data analysis can be embedded within it. Let us now suppose that we have two subject spaces $E_1$ and $E_2$. In each a subject is located in terms of his relation to a set of variables or points in time, with two different sets of variables or times being involved. Let the two data-determined mappings from $E_1$ and $E_2$ into F be denoted by $X_1$ and $X_2$, respectively. Then there are two subspaces in F to be considered. The first, $F_1$, is the image of F under the operator $W_1 = X_1 \circ X_1^t$, and the second, $F_2$, is its image under $W_2 = X_2 \circ X_2^t$. Corresponding to these two subspaces are the projections $P_1$ and $P_2$ taking any element of F into its least squares approximation in the subspaces $F_1$ and $F_2$, respectively.
This situation can be summed up in the following simplified duality diagram:

$$V_1 \circlearrowleft E_1 \;\underset{X_1^t}{\overset{X_1}{\rightleftarrows}}\; F \;\underset{X_2}{\overset{X_2^t}{\rightleftarrows}}\; E_2 \circlearrowright V_2, \qquad W_1 = X_1 \circ X_1^t, \quad W_2 = X_2 \circ X_2^t. \qquad (10)$$

Canonical analysis can be expressed as the description of $F_1 \cap F_2$. In order for the analysis to be nontrivial one must impose the additional requirement that these subspaces be closed, which in practice means finite dimensional. An arbitrary element f in F will be mapped into this intersection space if it is first projected into $F_1$ by projection $P_1$ and then this image is projected into $F_2$ by projection $P_2$. This means that the operator $P_1 \circ P_2$ (or for that matter $P_2 \circ P_1$) provides the required mapping.
In general terms canonical analysis reduces to the spectral analysis of the operator $P_1 \circ P_2$. In the case where $E_1$ and $E_2$ are of finite dimensions p and q, respectively, this reduces to the eigenanalysis of $(X_1 V_1^{-1} X_1^t)(X_2 V_2^{-1} X_2^t)$.
When one expresses the canonical correlation problem as one involving the spectral analysis of the product of operators, it is obvious that it may be generalized to any number of sets of variables or functions. Thus canonical correlation analysis for k-way tables involves the eigenanalysis of the operator $P_1 \circ P_2 \circ \cdots \circ P_k$.
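
In the finite dimensional case the spectral description can be checked directly: the nonzero eigenvalues of $P_1 \circ P_2$ are the squared canonical correlations. A sketch under the identity metric, with illustrative data:

    import numpy as np

    rng = np.random.default_rng(4)
    n, p, q = 50, 3, 4
    X1 = rng.normal(size=(n, p))
    X2 = 0.5 * X1 @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))
    X1 = X1 - X1.mean(axis=0)              # center both data matrices
    X2 = X2 - X2.mean(axis=0)

    def proj(X):
        # Projection onto the image of X in F: P = X (X^t X)^{-1} X^t.
        return X @ np.linalg.solve(X.T @ X, X.T)

    P1, P2 = proj(X1), proj(X2)

    # Nonzero eigenvalues of P1 o P2 are the squared canonical correlations.
    eigvals = np.clip(np.linalg.eigvals(P1 @ P2).real, 0.0, 1.0)
    canonical_corrs = np.sqrt(np.sort(eigvals)[::-1][: min(p, q)])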
When the data for each subject are a pair of functions, the subspaces involved will not normally be closed subspaces of F. In practical terms this means that two sets of functions may correlate highly in terms of arbitrarily small and kinky components. The way around this difficulty is to project each function space E onto an appropriate subspace or convex subset $\hat{E}$ of finite dimensionality (and less than n). This may be viewed as a process of filtering the data. Using the fact that a projection is its own transpose, this results in the following duality diagram:

$$V_1 \circlearrowleft \hat{E}_1 \;\underset{P_1}{\overset{P_1}{\rightleftarrows}}\; E_1 \;\underset{X_1^t}{\overset{X_1}{\rightleftarrows}}\; F \;\underset{X_2}{\overset{X_2^t}{\rightleftarrows}}\; E_2 \;\underset{P_2}{\overset{P_2}{\rightleftarrows}}\; \hat{E}_2 \circlearrowright V_2. \qquad (11)$$

Now it is in general the case that the operators $V_k = P_k \circ X_k^t \circ X_k \circ P_k$, k = 1, 2, will have an inverse and the analysis may proceed as above.
In the functional data case it is natural to consider interchanging the roles of E and F in the duality diagram. When the subjects are partitioned into two or more groups one may be interested in determining pairs of functions, each being a weighted sum of functions within a particular group, which have maximal intercorrelation subject to the usual orthogonality conditions. Although such an analysis would seldom be of much interest in the p-variate case, where the ordering of variables is usually arbitrary, the ordering of points of time could make such an analysis very useful.

Conclusion
I have tried to indicate in a very general way how statistical concepts can be extended to include functional data. This requires the point of view from which all data are functional, and finally there is no essential distinction between data domains which are finite sets and those which are continua. Functional analytic terminology also permits a natural and rigorous treatment of a number of other familiar and not so familiar situations, including dual scaling or correspondence analysis, vector-valued continuous functions as data, and tables of data with an arbitrary number of modes. At least one branch of statistics is already expressed in functional terms: Bayesian inference is essentially the mapping of the space of density functions defined on a parameter space into itself using a nonlinear operator determined by the data. In this sense operator V is only a special case.
Unfortunately we in North America are handicapped by the fact that a course in functional analysis is seldom a part of the preparation of an applied statistician. As was the case for linear algebra, we pragmatic souls demand a demonstration of practical relevance before making such a commitment. There are now a number of elementary introductions to functional analysis, including Aubin [1979] and Kreyszig [1978]. However, the classic work by Dieudonné [1960] still has few rivals. A very elementary treatment of this material in a statistical context for the finite p case is a wonderful book by Cailliez and Pagès [1976] which is, alas, still only in French.
I would like to conclude with a quote from Dieudonné [1960].

The student should as soon as possible become familiar with the idea that a function f is a single object, which may itself "vary" and is in general to be thought of as a "point" in a large "functional space"; indeed, it may be said that one of the main differences between the classical and the modern concepts of analysis is that, in classical mathematics, when one writes f(x), f is visualized as "fixed" and x as "variable", whereas nowadays both f and x are considered as "variables" ... (p. 1)

Functional analysis has already revolutionized numerical analysis, so that any issue of a major journal now has a number of papers using this technology. I claim that this is about to happen in statistics, and I hope that my talk leads you to speculate on this possibility.

Appendix
The inner product function used in the text is too simple for many applications. For example, the use of standardized data is equivalent to the use of the inner product

$$b_E(e_j, e_k) = \sum_m \frac{e_{mj}\, e_{mk}}{s_m^2}.$$

Alternatively, one can say that $b_E(e_j, e_k) = e_j^t M e_k$, where M is a diagonal matrix containing reciprocals of variances in its diagonal. Matrix M is called the metric for E, and a more general treatment requires the consideration of metrics M and N for spaces E and F, respectively. The use of metrics requires the concept of a dual space, and in this appendix the duality diagram (1) is extended to include these dual spaces.
For any vector e there are various possible real-valued functions that can be computed. Weighted sums of elements and the maximal element are two examples. The set of possible continuous real-valued functions of e which are linear in e is itself a vector space, since it is closed under weighted summation and has the other vector space properties. This space is denoted by E* and is referred to as the dual of E. For example, if E consists of the space of column vectors with p elements, then the dual of E is the space of row vectors of size p. It is usual to use the notation $\langle e, e^* \rangle$ for the real number which results when a function e* in E* is applied to an element e of E. For example, if e* is a row vector and e a column vector then the matrix product e*e is denoted by $\langle e, e^* \rangle$. The symmetry of the notation corresponds to the symmetry of the relation between E and E*, since the dual of E* turns out to be E.

There is a one-to-one correspondence between the two spaces induced by the equation

$$b_E(e, e_0) = \langle e_0, e^* \rangle, \qquad (A1)$$

where $e_0$ is any fixed nonzero element of E. This equation defines a mapping M: E → E* for which to any $e \in E$ is associated the element $e^* = Me$ in E* which satisfies (A1). In the p-dimensional case both E and E* can be represented by p-tuples of numbers, and $e^* = Me$, where M is the metric matrix. Thus M is used to denote both a metric and the mapping from E to E*. If E* is equipped with the inner product $b_{E^*}(e_j^*, e_k^*) = e_j^{*t} M^{-1} e_k^*$ and thus with metric $M^{-1}$, then the two inner products satisfy $b_{E^*}(e_j^*, e_k^*) = b_{E^*}(Me_j, Me_k) = b_E(e_j, e_k)$.
It is usual to postulate two further properties of E. One of these is completeness, implying that convergent sequences of vectors converge to a vector in the space. Intuitively this implies that the space has no "holes" in it; the real numbers are a complete set since they have no gaps between them, while the rationals are not because there are sequences of rationals converging to irrational numbers. A complete vector space with an inner product is called a Hilbert space. The other property used in practice is separability. This says that the space contains a dense countable subset: every neighborhood of any element contains a member of this dense countable subset. For example, every neighborhood of a real number contains a rational number. Intuitively, this condition implies that the space is "evenly and thinly spread."
Consider now a continuous linear mapping T: E → F from Hilbert space E into another Hilbert space F. Then there are two other mappings automatically associated with T:
1. T*: F → E: the adjoint mapping defined by the following equation

$$b_F(f, Te) = b_E(T^*f, e) \qquad (A2)$$

for any $e \in E$ and $f \in F$.
2. $T^t$: F* → E*: the transpose mapping defined by the following equation

$$b_F(f, Te) = b_{E^*}(T^t f^*, e^*), \qquad (A3)$$

where $f^* = Nf$ and $e^* = Me$.
In the special case where both E and F are equipped with identity metrics, the two mappings are equivalent. The transpose mapping has the advantage of being invariant with respect to changes in metric. In the finite dimensional case with E equipped with metric matrix M, F with metric matrix N, and the mapping T represented by matrix T, the transpose mapping is represented by $T^t$ and the adjoint by $M^{-1}T^tN$.
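
A quick numerical check of this last statement, with illustrative diagonal metrics of the kind arising from standardization:

    import numpy as np

    rng = np.random.default_rng(5)
    p, n = 3, 4
    T = rng.normal(size=(n, p))                  # mapping T: E (dim p) -> F (dim n)
    M = np.diag(rng.uniform(0.5, 2.0, size=p))   # metric for E
    N = np.diag(rng.uniform(0.5, 2.0, size=n))   # metric for F

    e = rng.normal(size=p)
    f = rng.normal(size=n)

    T_adj = np.linalg.inv(M) @ T.T @ N           # adjoint: M^{-1} T^t N

    # Defining property (A2): b_F(f, Te) = b_E(T* f, e),
    # with b_F(f, g) = f^t N g and b_E(d, e) = d^t M e.
    assert np.isclose(f @ N @ (T @ e), (T_adj @ f) @ M @ e)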
Turning now to the data analytic application of this theory, the data matrix X logically represents a mapping from E* to F. This arises because of the following two considerations:
1. The ith individual is represented by a vector $e_i$ in E specifying his relationship with the p variables or with each point in time. If the dual space E* is spanned by a set of orthonormal basis vectors $e_j^*$, $j = 1, \ldots, p$, then $\langle e_i, e_j^* \rangle = x_{ij}$, the score of individual i on variable j. Thus associated with each variable j is the orthonormal basis vector $e_j^*$.
2. The jth variable is also represented by a vector $f_j$ in F specifying the relationship of this variable with the n individuals. Again, if $f_i^*$, $i = 1, \ldots, n$, is the orthonormal spanning system of vectors for F*, then $\langle f_j, f_i^* \rangle = x_{ij}$, and associated with each individual i is the orthonormal basis vector $f_i^*$.

Thus in both E* and F are to be found vectors corresponding to any variable, and X: E* → F maps $e_j^*$ into $f_j$. Similarly $X^t$: F* → E associates individual vectors. These relations are summed up in the complete version of the duality diagram:

$$\begin{array}{ccc} E^* & \xrightarrow{\;X\;} & F \\ M \uparrow & & \downarrow N \\ E & \xleftarrow{\;X^t\;} & F^* \end{array} \qquad (A4)$$

The simplified version of the duality diagram given in (1) was possible because, when M and N are the identity mappings, each space and its dual have the same inner product and metric and are thus in every way identical. The mapping V: E* → E is constructed by going around the diagram the long way: $V = X^t \circ N \circ X$. Similarly $W = X \circ M \circ X^t$. The principal components analysis problem reduces then to the eigenanalysis of $V \circ M$: E → E, represented in the finite dimensional case by the matrix $X^t N X M$.
Aside from the ease with which the duality diagram permits one to extend data analysis to functional data, it also describes techniques such as principal components analysis and canonical correlation analysis in a manner that is free of both basis and metric for either E or F. In this sense it can be considered a fundamental algebraic advance over matrix analysis, and it is to be hoped that it will become a standard part of statistical language.

REFERENCE NOTES
1. Keller, E. & Ostry, D. J. Computerized measurement of tongue dorsum movements with pulsed echo ultrasound. Manuscript submitted for publication to Journal of the Acoustical Society of America, 1982.
2. Pagès, J. P. & Tenenhaus, M. Geometry and duality diagram. An example of application: The analysis of qualitative variables. Paper presented at the Psychometric Society Annual Meeting, Montreal, Canada, 1982.
3. Winsberg, S. & Ramsay, J. O. Monotone spline transformations for dimension reduction. Submitted for publication in Psychometrika.
4. Winsberg, S. & Ramsay, J. O. Monotone spline transformations for ordered categorical data. Paper presented at the Psychometric Society Annual Meeting, Montreal, Canada, 1982.

REFERENCES
Aubin, J.-P. Applied Functional Analysis. New York: Wiley, 1979.
Cailliez, F. & Pagès, J.-P. Introduction à l'Analyse des Données. Paris: Société de Mathématiques Appliquées et de Sciences Humaines, 9 rue Duban, 75016 Paris, 1976.
Dauxois, J. & Pousse, A. Les analyses factorielles en calcul des probabilités et en statistique: Essai d'étude synthétique. Thèse d'état, l'Université Paul-Sabatier de Toulouse, France, 1976.
Dieudonné, J. Foundations of Modern Analysis. New York: Academic Press, 1960.
Doob, J. L. Stochastic Processes. New York: Wiley, 1953.
Kreyszig, E. Introductory Functional Analysis with Applications. New York: Wiley, 1978.
Schumaker, L. Spline Functions: Basic Theory. New York: Wiley, 1981.
Tucker, L. R. Determination of parameters of a functional relationship by factor analysis. Psychometrika, 1958, 23, 19-23.
Winsberg, S. & Ramsay, J. O. Monotonic transformations to additivity using splines. Biometrika, 1980, 67, 669-674.
Winsberg, S. & Ramsay, J. O. Analysis of pairwise preference data using integrated B-splines. Psychometrika, 1981, 46, 171-186.
Final version received 7/12/82
