STATISTICS IN TRANSITION new series, Spring 2015, Vol. 16, No. 1, pp. 97–110
CLASSIFICATION PROBLEMS BASED ON REGRESSION MODELS
FOR MULTIDIMENSIONAL FUNCTIONAL DATA
Tomasz Górecki 1, Mirosław Krzyśko 2, Waldemar Wołyński 3
ABSTRACT
Data in the form of a continuous vector function on a given interval are referred
to as multivariate functional data. These data are treated as realizations of
multivariate random processes. We use multivariate functional regression
techniques for the classification of multivariate functional data. The approaches
discussed are illustrated with an application to two real data sets.
Key words: multivariate functional data, functional data analysis, multivariate
functional regression, classification.
1. Introduction
Much attention has been paid in recent years to methods for representing data
as functions or curves. Such data are known in the literature as functional data
(Ramsay and Silverman (2005)). Applications of functional data can be found in
various fields, including medicine, economics, meteorology and many others. In
many applications there is a need to use statistical methods for objects
characterized by multiple features observed at many time points (doubly
multivariate data). Such data are called multivariate functional data. The
pioneering theoretical work was that of Besse (1979), in which random variables
take values in a general Hilbert space. Saporta (1981) presents an analysis of
multivariate functional data from the point of view of factorial methods (principal
components and canonical analysis). In this paper we focus on the problem of
classification via regression for multivariate functional data. Functional regression
models have been extensively studied; see for example James (2002), Müller and
Stadtmüller (2005), Reiss and Ogden (2007), Matsui et al. (2008) and Li et al.
(2010). Various basic classification methods have also been adapted to functional
data, such as linear discriminant analysis (Hastie et al. (1995)), logistic regression
(Rossi et al. (2002)), penalized optimal scoring (Ando (2009)), k-NN (Ferraty and
Vieu (2003)), SVM (Rossi and Villa (2006)), and neural networks (Rossi et al.
(2005)). Moreover, the combining of classifiers has been extended to functional
data (Ferraty and Vieu (2009)).

1 Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poland. E-mail: [email protected].
2 Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poland. E-mail: [email protected].
3 Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poland. E-mail: [email protected].
In the present work we adapt multivariate regression models to the
classification of multivariate functional data. We focus on the binary
classification problem. There exist several techniques for extending the binary
problem to multi-class classification problems. A brief overview can be found in
Krzyśko and Wołyński (2009). The accuracy of the proposed methods is
demonstrated using biometrical examples, and the promising results obtained
encourage further research.
2. Classification problem
The classical classification problem involves determining a procedure by
which a given object can be assigned to one of $L$ populations based on
observation of $p$ features of that object.

The object being classified can be described by a random pair $(X, Y)$, where
$X \in \mathbb{R}^p$ and $Y \in \{0, 1, \dots, L-1\}$. The optimum Bayesian
classifier then takes the form (Anderson (1984)):
$$d(x) = \arg\max_{l} P(Y = l \mid X = x).$$

We shall further consider only the case $L = 2$. Here $Y \in \{0, 1\}$.
We note that
$$P(Y = 1 \mid X = x) = E(Y \mid X = x) = m(x),$$
where $m(x)$ is the regression function of the random variable $Y$ with respect to the
random vector $X$. Hence
$$d(x) = \begin{cases} 1, & m(x) > \tfrac{1}{2}, \\ 0, & \text{otherwise.} \end{cases}$$
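As an illustrative sketch (not code from the paper), the plug-in rule above can be written as follows; `m_hat` stands for any fitted estimate of the regression function, and the function name is ours:

```python
import numpy as np

def classify_via_regression(m_hat, X):
    """Assign class 1 when the estimated regression function exceeds 1/2.

    m_hat : callable mapping an (n, p) array to estimates of E(Y | X = x).
    X     : (n, p) array of observations to classify.
    """
    scores = np.asarray(m_hat(X))
    return (scores > 0.5).astype(int)

# Toy check: a linear stand-in for the regression function on two points.
m_hat = lambda X: X @ np.array([0.3, 0.4])
labels = classify_via_regression(m_hat, np.array([[1.0, 1.0], [0.2, 0.1]]))
```

Any of the regression estimators of Section 4 can be plugged in as `m_hat`.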
3. Functional data
We now assume that the object being classified is described by a $d$-dimensional
random process $X = (X_1, \dots, X_d)' \in L_2^d(I)$, where $L_2(I)$ is the
Hilbert space of square-integrable functions on the interval $I$.

Let $x$ be a realization of the random process $X$. Moreover, assume that the
$k$th component of the vector $x$ can be represented by a finite number of
orthonormal basis functions $\{\varphi_b\}$:
$$x_k(t) = \sum_{b=0}^{B_k} \alpha_{kb}\, \varphi_b(t), \quad t \in I, \quad k = 1, \dots, d, \qquad (1)$$
where $\alpha_{k0}, \dots, \alpha_{kB_k}$ are the unknown coefficients.

Let
$$\boldsymbol{\alpha} = (\alpha_{10}, \dots, \alpha_{1B_1}, \dots, \alpha_{d0}, \dots, \alpha_{dB_d})'$$
and
$$\boldsymbol{\Phi}(t) = \begin{pmatrix} \boldsymbol{\varphi}_1'(t) & \dots & \boldsymbol{0}' \\ \vdots & \ddots & \vdots \\ \boldsymbol{0}' & \dots & \boldsymbol{\varphi}_d'(t) \end{pmatrix},$$
where $\boldsymbol{\varphi}_k(t) = (\varphi_0(t), \dots, \varphi_{B_k}(t))'$, $k = 1, \dots, d$.
Then, the vector of continuous functions $x$ at point $t$ can be represented as
$$x(t) = \boldsymbol{\Phi}(t)\, \boldsymbol{\alpha}. \qquad (2)$$

We can estimate the vector $\boldsymbol{\alpha}$ on the basis of $n$ independent realizations
$x_1, \dots, x_n$ of the random process $X$ (functional data).

Typically data are recorded at discrete moments in time. Let $x_{kj}$ denote an
observed value of the feature $X_k$, $k = 1, \dots, d$, at the $j$th time point $t_j$, where
$j = 1, \dots, J$. Then our data consist of the pairs $(t_j, x_{kj})$. These discrete data
can be smoothed by continuous functions $x_k$, where $I$ is a compact set such that
$t_j \in I$ for $j = 1, \dots, J$.

Details of the process of transformation of discrete data to functional data can
be found in Ramsay and Silverman (2005) or in Górecki et al. (2014).
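The smoothing step can be sketched as a least-squares projection of the discrete observations onto an orthonormal Fourier basis; the function names and the choice of basis are ours, for illustration only:

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Orthonormal Fourier basis on [0, 1] evaluated at time points t."""
    t = np.asarray(t)
    cols = [np.ones_like(t)]  # constant basis function phi_0
    for k in range(1, n_basis):
        freq = (k + 1) // 2
        if k % 2 == 1:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * freq * t))
        else:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * freq * t))
    return np.column_stack(cols)

def smooth_to_coefficients(t, x_obs, n_basis=5):
    """Least-squares projection of discrete observations onto the basis."""
    Phi = fourier_basis(t, n_basis)           # (J, n_basis) design matrix
    alpha, *_ = np.linalg.lstsq(Phi, x_obs, rcond=None)
    return alpha                              # estimated basis coefficients

t = np.linspace(0, 1, 100)
x_obs = 1.0 + np.sqrt(2) * np.sin(2 * np.pi * t)  # noiseless toy signal
alpha = smooth_to_coefficients(t, x_obs)
```

For a $d$-dimensional process, the same projection is applied to each component and the coefficient vectors are stacked into $\boldsymbol{\alpha}$.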
4. Regression analysis for functional data
We now consider the problem of the estimation of the regression function
$m(x) = E(Y \mid X = x)$.

Let us assume that we have an $n$-element training sample
$$(x_1, y_1), \dots, (x_n, y_n), \qquad (3)$$
where $x_i \in L_2^d(I)$ and $y_i \in \{0, 1\}$, $i = 1, \dots, n$.

Analogously as in Section 3, we assume that the functions $x_i$ are obtained as
the result of a process of smoothing independent discrete data pairs
$(t_j, x_{ikj})$, $i = 1, \dots, n$, $k = 1, \dots, d$, $j = 1, \dots, J$.
Thus the functions $x_i$ at point $t$ have the following representation:
$$x_i(t) = \boldsymbol{\Phi}(t)\, \boldsymbol{\alpha}_i, \quad i = 1, \dots, n. \qquad (4)$$
Thus the functions at point have the following representation:
(4)
4.1. Multivariate linear regression. We take the following model for the
regression function:
$$m(x) = \beta_0 + \int_I x'(t)\, \boldsymbol{\beta}(t)\, dt.$$

We seek the unknown parameters in the regression function by minimizing
the sum of squares
$$S = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \int_I x_i'(t)\, \boldsymbol{\beta}(t)\, dt \Big)^2.$$

We assume that the functions $x_i$, $i = 1, \dots, n$, have the representation (4).
We adopt an analogous representation for the $d$-dimensional weighting function
$\boldsymbol{\beta}$, namely
$$\boldsymbol{\beta}(t) = \boldsymbol{\Phi}(t)\, \boldsymbol{\beta}, \qquad (5)$$
where $\boldsymbol{\beta}$ on the right-hand side denotes the vector of basis coefficients of the
weighting function. Then, by the orthonormality of the basis functions,
$$\int_I x_i'(t)\, \boldsymbol{\beta}(t)\, dt = \boldsymbol{\alpha}_i' \int_I \boldsymbol{\Phi}'(t)\boldsymbol{\Phi}(t)\, dt\, \boldsymbol{\beta} = \boldsymbol{\alpha}_i'\boldsymbol{\beta}.$$
Hence
$$S = \sum_{i=1}^{n} (y_i - \beta_0 - \boldsymbol{\alpha}_i'\boldsymbol{\beta})^2.$$

We define
$$\boldsymbol{y} = (y_1, \dots, y_n)', \quad \boldsymbol{\gamma} = (\beta_0, \boldsymbol{\beta}')', \quad \boldsymbol{A} = \begin{pmatrix} 1 & \boldsymbol{\alpha}_1' \\ \vdots & \vdots \\ 1 & \boldsymbol{\alpha}_n' \end{pmatrix}.$$
Then
$$S = (\boldsymbol{y} - \boldsymbol{A}\boldsymbol{\gamma})'(\boldsymbol{y} - \boldsymbol{A}\boldsymbol{\gamma}).$$

Minimizing the above sum of squares leads to the choice of a vector $\hat{\boldsymbol{\gamma}}$
satisfying
$$\boldsymbol{A}'\boldsymbol{A}\hat{\boldsymbol{\gamma}} = \boldsymbol{A}'\boldsymbol{y}. \qquad (6)$$
Provided the matrix $\boldsymbol{A}'\boldsymbol{A}$ is non-singular, equation (6) has the unique solution
$$\hat{\boldsymbol{\gamma}} = (\boldsymbol{A}'\boldsymbol{A})^{-1}\boldsymbol{A}'\boldsymbol{y}. \qquad (7)$$

In the case of functional data we may use the smoothed least squares method
(Ramsay and Silverman (2005)), that is, we minimize the sum of squares in the
form
$$S_\lambda = (\boldsymbol{y} - \boldsymbol{A}\boldsymbol{\gamma})'(\boldsymbol{y} - \boldsymbol{A}\boldsymbol{\gamma}) + \lambda \int_I [L\boldsymbol{\beta}(t)]'[L\boldsymbol{\beta}(t)]\, dt,$$
where $L$ denotes a linear differential operator and $\lambda \ge 0$ is a smoothing
parameter. Using the representation (5) of the weighting function, the penalty
term reduces to a quadratic form $\boldsymbol{\gamma}'\boldsymbol{R}\boldsymbol{\gamma}$, where $\boldsymbol{R}$ is the penalty matrix built
from inner products of the basis functions transformed by $L$. Minimizing the
above sum of squares leads to the choice of a vector $\hat{\boldsymbol{\gamma}}$ satisfying the equation
$$(\boldsymbol{A}'\boldsymbol{A} + \lambda\boldsymbol{R})\hat{\boldsymbol{\gamma}} = \boldsymbol{A}'\boldsymbol{y},$$
which, provided the matrix $\boldsymbol{A}'\boldsymbol{A} + \lambda\boldsymbol{R}$ is non-singular, has the unique solution
$$\hat{\boldsymbol{\gamma}} = (\boldsymbol{A}'\boldsymbol{A} + \lambda\boldsymbol{R})^{-1}\boldsymbol{A}'\boldsymbol{y}. \qquad (8)$$

From this we obtain the following form for the estimator of the regression
function for the multivariate functional data:
$$\hat{m}(x) = \hat{\beta}_0 + \boldsymbol{\alpha}'\hat{\boldsymbol{\beta}},$$
where $\hat{\boldsymbol{\gamma}} = (\hat{\beta}_0, \hat{\boldsymbol{\beta}}')'$ is given by formula (7) or (8).
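A minimal numerical sketch of solutions (7) and (8), assuming the design matrix of basis coefficients has already been computed; the identity penalty matrix below is only a stand-in for the roughness matrix induced by the operator $L$, and the function name is ours:

```python
import numpy as np

def fit_penalized_ls(A, y, lam=0.0, R=None):
    """Solve (A'A + lam*R) gamma = A'y, the (smoothed) least squares problem.

    A   : (n, q) design matrix (intercept column plus basis coefficients).
    y   : (n,) response vector of 0/1 class labels.
    lam : smoothing parameter; lam = 0 gives ordinary least squares (7).
    R   : (q, q) roughness penalty matrix; identity used as a stand-in.
    """
    q = A.shape[1]
    if R is None:
        R = np.eye(q)
    return np.linalg.solve(A.T @ A + lam * R, A.T @ y)

rng = np.random.default_rng(0)
alphas = rng.normal(size=(50, 4))             # stand-in basis coefficients
A = np.column_stack([np.ones(50), alphas])    # prepend intercept column
y = (alphas[:, 0] > 0).astype(float)          # toy 0/1 labels
gamma_hat = fit_penalized_ls(A, y, lam=0.1)   # smoothed solution (8)
```

The fitted value $\hat{\beta}_0 + \boldsymbol{\alpha}'\hat{\boldsymbol{\beta}}$ is then thresholded at 1/2 as in Section 2.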
4.2. Functional logistic regression. We adopt the following logistic regression
model for functional data:
$$P(Y = 1 \mid X = x) = \frac{\exp\big(\beta_0 + \int_I x'(t)\boldsymbol{\beta}(t)\,dt\big)}{1 + \exp\big(\beta_0 + \int_I x'(t)\boldsymbol{\beta}(t)\,dt\big)}. \qquad (9)$$

Using the representation of the function $x$ given by (2) and of the weighting
function $\boldsymbol{\beta}$ given by (5), we reduce (9) to a standard logistic regression model in
the form
$$P(Y = 1 \mid X = x) = \frac{\exp(\beta_0 + \boldsymbol{\alpha}'\boldsymbol{\beta})}{1 + \exp(\beta_0 + \boldsymbol{\alpha}'\boldsymbol{\beta})}.$$

To estimate the unknown parameters of the model, we use the training sample (3)
and the analogous representation for the functions $x_i$, $i = 1, \dots, n$, given
by (4). Thus we obtain the following form for the estimator of the regression
function:
$$\hat{m}(x) = \frac{\exp(\hat{\beta}_0 + \boldsymbol{\alpha}'\hat{\boldsymbol{\beta}})}{1 + \exp(\hat{\beta}_0 + \boldsymbol{\alpha}'\hat{\boldsymbol{\beta}})}.$$
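A sketch of the reduced logistic model fitted by plain gradient ascent on the log-likelihood; the paper does not prescribe an estimation algorithm, so this choice, and the function names, are ours:

```python
import numpy as np

def fit_logistic(A, y, lr=0.1, n_iter=2000):
    """Fit logistic regression on basis coefficients by gradient ascent.

    A : (n, q) design matrix (intercept column plus basis coefficients).
    y : (n,) vector of 0/1 labels.
    """
    gamma = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ gamma))    # model probabilities
        gamma += lr * A.T @ (y - p) / len(y)    # log-likelihood gradient step
    return gamma

def predict_proba(A, gamma):
    """Estimated P(Y = 1 | X = x) for each row of A."""
    return 1.0 / (1.0 + np.exp(-A @ gamma))

rng = np.random.default_rng(1)
alphas = rng.normal(size=(100, 3))              # stand-in basis coefficients
A = np.column_stack([np.ones(100), alphas])
y = (alphas[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(float)
gamma_hat = fit_logistic(A, y)
acc = np.mean((predict_proba(A, gamma_hat) > 0.5) == y)
```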
4.3. Local linear regression smoothers. We consider the problem of
nonparametric estimation of the regression function $m(x) = E(Y \mid X = x)$
from the sample (3).

Let $x_0$ be a fixed and known point in the space $L_2^d(I)$. Using a Taylor
series, we can approximate $m(x)$, where $x$ is close to the point $x_0$, as follows:
$$m(x) \approx \beta_0 + \langle x - x_0, \boldsymbol{\beta} \rangle, \qquad (10)$$
where $\beta_0 = m(x_0)$ and $\boldsymbol{\beta}$ plays the role of the derivative of $m$ at the point $x_0$.

This is a local polynomial regression problem in which we use the data to
estimate the polynomial which best approximates $m$ in a small neighborhood
around the point $x_0$, i.e. we minimize with respect to $\beta_0$ and $\boldsymbol{\beta}$ the function
$$\sum_{i=1}^{n} \big( y_i - \beta_0 - \langle x_i - x_0, \boldsymbol{\beta} \rangle \big)^2 K_h(\lVert x_i - x_0 \rVert).$$
This is a weighted least squares problem where the weights are given by the
kernel functions $K_h(\lVert x_i - x_0 \rVert)$.

Analogously as in the previous sections, suppose that the vector functions $x$
and $\boldsymbol{\beta}$ are in the same space, i.e. have the representations (2) and (5). Then
$$\langle x_i - x_0, \boldsymbol{\beta} \rangle = (\boldsymbol{\alpha}_i - \boldsymbol{\alpha}_0)'\boldsymbol{\beta}.$$

The least squares problem is then to minimize the weighted sum-of-squares
function
$$\sum_{i=1}^{n} \big( y_i - \beta_0 - (\boldsymbol{\alpha}_i - \boldsymbol{\alpha}_0)'\boldsymbol{\beta} \big)^2 K_h(\lVert \boldsymbol{\alpha}_i - \boldsymbol{\alpha}_0 \rVert)$$
with respect to the parameters $\beta_0$ and $\boldsymbol{\beta}$.

It is convenient to define the following vectors and matrices:
$$\boldsymbol{y} = (y_1, \dots, y_n)', \quad \boldsymbol{\gamma} = (\beta_0, \boldsymbol{\beta}')', \quad \boldsymbol{X}_0 = \begin{pmatrix} 1 & (\boldsymbol{\alpha}_1 - \boldsymbol{\alpha}_0)' \\ \vdots & \vdots \\ 1 & (\boldsymbol{\alpha}_n - \boldsymbol{\alpha}_0)' \end{pmatrix},$$
$$\boldsymbol{W}_0 = \mathrm{diag}\big(K_h(\lVert \boldsymbol{\alpha}_1 - \boldsymbol{\alpha}_0 \rVert), \dots, K_h(\lVert \boldsymbol{\alpha}_n - \boldsymbol{\alpha}_0 \rVert)\big).$$

The least squares problem is then to minimize the function
$$S = (\boldsymbol{y} - \boldsymbol{X}_0\boldsymbol{\gamma})'\boldsymbol{W}_0(\boldsymbol{y} - \boldsymbol{X}_0\boldsymbol{\gamma}).$$
The solution is
$$\hat{\boldsymbol{\gamma}} = (\boldsymbol{X}_0'\boldsymbol{W}_0\boldsymbol{X}_0)^{-1}\boldsymbol{X}_0'\boldsymbol{W}_0\boldsymbol{y},$$
provided $\boldsymbol{X}_0'\boldsymbol{W}_0\boldsymbol{X}_0$ is a non-singular matrix.

As in the case of the multivariate functional linear regression model we can
also include an additional smoothing component. Then, we seek the unknown
parameter $\boldsymbol{\gamma}$ by minimizing the sum of squares
$$S_\lambda = (\boldsymbol{y} - \boldsymbol{X}_0\boldsymbol{\gamma})'\boldsymbol{W}_0(\boldsymbol{y} - \boldsymbol{X}_0\boldsymbol{\gamma}) + \lambda\boldsymbol{\gamma}'\boldsymbol{R}\boldsymbol{\gamma}.$$
Provided the matrix $\boldsymbol{X}_0'\boldsymbol{W}_0\boldsymbol{X}_0 + \lambda\boldsymbol{R}$ is non-singular, we have the unique
solution
$$\hat{\boldsymbol{\gamma}} = (\boldsymbol{X}_0'\boldsymbol{W}_0\boldsymbol{X}_0 + \lambda\boldsymbol{R})^{-1}\boldsymbol{X}_0'\boldsymbol{W}_0\boldsymbol{y}.$$

The value $m(x_0)$ is then estimated by the fitted intercept parameter (i.e. by
$\hat{\beta}_0$), as this defines the position of the estimated local polynomial curve at the
point $x_0$. By varying the value of $x_0$, we can build up an estimate of the function
$m$ over the range of the data. We have
$$\hat{m}(x_0) = \boldsymbol{e}_1'\hat{\boldsymbol{\gamma}},$$
where the vector $\boldsymbol{e}_1$ is of the length of $\boldsymbol{\gamma}$ and has a $1$ in the first
position and $0$s elsewhere.
4.4. Nadaraya-Watson kernel estimator. In Section 4.3 we approximated the
regression function $m$ using a Taylor series. In the approximation (10) let us
take into account only the first term, i.e.
$$m(x) \approx \beta_0.$$
Then the weighted sum of squares becomes
$$\sum_{i=1}^{n} (y_i - \beta_0)^2 K_h(\lVert \boldsymbol{\alpha}_i - \boldsymbol{\alpha}_0 \rVert).$$
Minimizing the above sum of squares leads to the kernel estimator of the
regression function $m$ of the form
$$\hat{m}(x_0) = \sum_{i=1}^{n} w_i y_i,$$
where
$$w_i = \frac{K_h(\lVert \boldsymbol{\alpha}_i - \boldsymbol{\alpha}_0 \rVert)}{\sum_{j=1}^{n} K_h(\lVert \boldsymbol{\alpha}_j - \boldsymbol{\alpha}_0 \rVert)}.$$
This gives us the well-known kernel estimator proposed by Nadaraya (1964) and
Watson (1964).
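A corresponding sketch of the Nadaraya-Watson estimator on basis coefficients, again with a Gaussian kernel chosen for illustration:

```python
import numpy as np

def nadaraya_watson(alphas, y, alpha0, h=1.0):
    """Nadaraya-Watson estimate of m(x0) from basis-coefficient vectors."""
    dists = np.linalg.norm(alphas - alpha0, axis=1)
    k = np.exp(-0.5 * (dists / h) ** 2)    # Gaussian kernel values
    return np.sum(k * y) / np.sum(k)       # locally weighted average

# Toy check: the estimate at a point is pulled toward nearby responses.
alphas = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
m0 = nadaraya_watson(alphas, y, np.array([1.0]), h=0.5)
```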
5. Examples
Experiments were carried out on two labelled data sets, both of which originate
from Olszewski (2001). The ECG data set uses two electrodes (Figure 1) to
collect data during one heartbeat. Each
heartbeat is described by a multivariate time series (MTS) sample with two
variables and an assigned classification of normal or abnormal. Abnormal
heartbeats are representative of a cardiac pathology known as supraventricular
premature beat. The ECG data set contains 200 MTS samples, of which 133 are
normal and 67 are abnormal. The length of an MTS sample is between 39 and
152.
Figure 1. Variables of the extended ECG data set.
The Wafer data set uses six vacuum-chamber sensors (Figure 2) to collect data
while monitoring an operational semiconductor fabrication plant. Each wafer is
described by an MTS sample with six variables and an assigned classification of
normal or abnormal. The data set used here contains 327 MTS samples, of which
200 are normal and 127 are abnormal. The length of an MTS sample is between
104 and 198.
The multivariate samples in the data sets are of different lengths. For each
data set, the multivariate samples are extended to the length of the longest
multivariate sample in the set (Rodriguez et al. (2005)). We extend all variables to
the same length. A short univariate instance $(x_1, \dots, x_m)$ of length $m$ is
extended to a long instance $(\tilde{x}_1, \dots, \tilde{x}_{\tilde{m}})$ of length $\tilde{m}$ by setting
$$\tilde{x}_i = x_{\lceil i m / \tilde{m} \rceil}, \quad i = 1, \dots, \tilde{m}.$$
Some of the values in a data sample are duplicated in order to extend the
sample. For instance, if we wanted to extend a data sample of length 75 to a
length of 100, one out of every three values would be duplicated. In this way, all
of the values in the original data sample are contained in the extended data
sample.
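The extension scheme can be sketched as follows (the function name is ours):

```python
import math

def extend_sample(x, target_len):
    """Extend a univariate series to target_len by duplicating values,
    keeping every original value and preserving their order."""
    m = len(x)
    return [x[math.ceil(i * m / target_len) - 1]
            for i in range(1, target_len + 1)]

# A length-3 sample extended to length 4: the last value is duplicated.
extended = extend_sample([10, 20, 30], 4)
```

Extending a length-75 sample to length 100 with this function keeps all 75 original values and duplicates one out of every three, as described above.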
For the classification process, we used the classifiers described above. For
each data set we calculated the classification error rate using the leave-one-out
cross-validation method (LOO CV). Table 1 contains the results of the
classification error rates (in %).
Table 1. Classification error rates (in %)

Model                                        ECG     Wafer
Multivariate functional linear regression    11.50    0.59
Functional logistic regression               11.50    0.17
Local linear regression smoothers            16.50    0.67
Nadaraya-Watson kernel estimator             20.50   10.64
Figure 2. Variables of the extended Wafer data set.
From Table 1 we see that the ECG data set is difficult to recognize: none of
the four regression methods deals with it well. In contrast, the Wafer data set is
easily recognizable, and for this data set the best results by far are given by
functional logistic regression. We also see a big difference between the local
linear regression smoother and the Nadaraya-Watson kernel estimator.
6. Conclusion
This paper has developed and analyzed regression-based methods of
classification for multivariate functional data. These methods were applied to two
biometrical multivariate time series. In the case of these examples it was shown
that the use of multivariate functional regression methods for classification gives
good results. Of course, the performance of the algorithms needs to be further
evaluated on additional real and artificial data sets.
In a similar way, we can extend other regression methods, such as partial least
squares regression PLS (Wold (1985)), least absolute shrinkage and selection
operator LASSO (Tibshirani (1996)), or least-angle regression LARS (Efron
et al. (2004)), to the multivariate functional case. This will be the direction of our
future research.
REFERENCES
ANDERSON, T. W., (1984). An Introduction to Multivariate Statistical Analysis.
Wiley, New York.
ANDO, T., (2009). Penalized optimal scoring for the classification of multi-dimensional functional data. Statistical Methodology 6, 565–576.
BESSE, P., (1979). Étude descriptive d'un processus. Ph.D. thesis, Université Paul Sabatier.
EFRON, B., HASTIE, T., JOHNSTONE, I., TIBSHIRANI, R., (2004). Least Angle Regression. Annals of Statistics 32(2), 407–499.
FERRATY, F., VIEU, P., (2003). Curve discrimination. A nonparametric functional approach. Computational Statistics & Data Analysis 44, 161–173.
FERRATY, F., VIEU, P., (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer, New York.
FERRATY, F., VIEU, P., (2009). Additive prediction and boosting for functional data. Computational Statistics & Data Analysis 53(4), 1400–1413.
GÓRECKI, T., KRZYŚKO, M., (2012). Functional Principal Components Analysis. In: J. Pociecha and R. Decker (Eds.): Data analysis methods and its applications. C. H. Beck, Warszawa, 71–87.
GÓRECKI, T., KRZYŚKO, M., WASZAK, Ł., WOŁYŃSKI, W., (2014). Methods of reducing dimension for functional data. Statistics in Transition new series 15, 231–242.
HASTIE, T. J., TIBSHIRANI, R. J., BUJA, A., (1995). Penalized discriminant analysis. Annals of Statistics 23, 73–102.
JAMES, G. M., (2002). Generalized linear models with functional predictors. Journal of the Royal Statistical Society 64(3), 411–432.
JACQUES, J., PREDA, C., (2014). Model-based clustering for multivariate functional data. Computational Statistics & Data Analysis 71, 92–106.
KRZYŚKO, M., WOŁYŃSKI, W., (2009). New variants of pairwise classification. European Journal of Operational Research 199(2), 512–519.
MATSUI, H., ARAKI, Y., KONISHI, S., (2008). Multivariate regression modeling for functional data. Journal of Data Science 6, 313–331.
MÜLLER, H. G., STADTMÜLLER, U., (2005). Generalized functional linear models. Annals of Statistics 33, 774–805.
NADARAYA, E. A., (1964). On Estimating Regression. Theory of Probability and its Applications 9(1), 141–142.
OLSZEWSKI, R. T., (2001). Generalized Feature Extraction for Structural
Pattern Recognition in Time-Series Data. Ph.D. Thesis, Carnegie Mellon
University, Pittsburgh, PA.
RAMSAY, J. O., SILVERMAN, B. W., (2005). Functional Data Analysis.
Springer, New York.
REISS, P. T., OGDEN, R. T., (2007). Functional principal component regression and functional partial least squares. Journal of the American Statistical Association 102(479), 984–996.
RODRIGUEZ, J. J., ALONSO, C. J., MAESTRO, J. A., (2005). Support vector machines of interval-based features for time series classification. Knowledge-Based Systems 18, 171–178.
ROSSI, F., DELANNAY, N., CONAN-GUEZ, B., VERLEYSEN, M., (2005). Representation of functional data in neural networks. Neurocomputing 64, 183–210.
ROSSI, F., VILLA, N., (2006). Support vector machines for functional data classification. Neurocomputing 69, 730–742.
ROSSI, N., WANG, X., RAMSAY, J. O., (2002). Nonparametric item response function estimates with EM algorithm. Journal of Educational and Behavioral Statistics 27, 291–317.
SAPORTA, G., (1981). Méthodes exploratoires d'analyse de données temporelles, thèse de doctorat d'état ès sciences mathématiques soutenue le 10 juin 1981, Université Pierre et Marie Curie.
SHMUELI, G., (2010). To explain or to predict? Statistical Science 25(3), 289–310.
TIBSHIRANI, R., (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1), 267–288.
WATSON, G. S., (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26(4), 359–372.
WOLD, H., (1985). Partial least squares. In: S. Kotz and N. L. Johnson (Eds.): Encyclopedia of statistical sciences vol. 6, Wiley, New York, 581–591.