COMPUTATIONAL INTELLIGENCE
AND ITS APPLICATIONS
Evolutionary Computation, Fuzzy Logic, Neural Network
and Support Vector Machine Techniques
Editors
H. K. Lam
King’s College London, UK
S. H. Ling • H. T. Nguyen
University of Technology, Australia
Distributed by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN-13 978-1-84816-691-2
ISBN-10 1-84816-691-5
Printed in Singapore.
Preface
The chapters are grouped into four parts: (1) Evolutionary computation and its applications, (2) Fuzzy
logics and their applications, (3) Neural networks and their applications
and (4) Support vector machines and their applications.
Chapter 1 compares three machine learning methods, support vector
machines, AdaBoost and soft margin AdaBoost algorithms, to solve the
pose estimation problem. Experimental results show that both the support
vector machine-based method and the soft margin AdaBoost-based method
are able to classify frontal and pose images more reliably than the original
AdaBoost-based method.
Chapter 2 proposes a particle swarm optimization for polynomial
modeling in a dynamic environment. The performance of the proposed
particle swarm optimization is evaluated by polynomial modeling based
on a set of dynamic benchmark functions. Results show that the proposed
particle swarm optimization can find significantly better polynomial models
than genetic programming.
Chapter 3 deals with the problem of restoration of color-quantized
images. A restoration algorithm based on particle swarm optimization
with multi-wavelet mutation is proposed to handle the problem. Simulation
results show that it can remarkably improve the quality of a half-toned
color-quantized image in terms of both signal-to-noise ratio improvement
and convergence rate, and that the subjective quality of the restored images
is also improved.
Chapter 4 deals with non-invasive hypoglycemia detection for Type 1
diabetes mellitus (T1DM) patients based on physiological parameters
of electrocardiogram signals. An evolved fuzzy inference model is
developed for the classification of hypoglycemia, with its rules and
membership functions optimized by a hybrid particle swarm optimization
method with wavelet mutation.
Chapter 5 studies the limit cycle behavior of the weights of a perceptron.
It is proposed that a perceptron exhibiting limit cycle behavior can
be employed to solve a recognition problem when the downsampled sets
of bounded training feature vectors are linearly separable. Numerical
computer simulation results show that the perceptron exhibiting the limit
cycle behavior can achieve better recognition performance than a
multi-layer perceptron.
Chapter 6 presents an alternative information theoretic criterion
(minimum description length) for determining the optimal architecture of
neural networks according to the equilibrium between the model parameters
and the model errors. For verification, the proposed method is applied to
modeling various data sets with neural networks.
H.K. Lam
S.H. Ling
H.T. Nguyen
Contents
Preface v
Index 305
PART 1
Chapter 1
Contents
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Pose Detection Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Procedure of pose detection . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Eigen Pose Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Maximal Margin Algorithms for Classification . . . . . . . . . . . . . . . . . . 9
1.3.1 Support vector machines . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.3 Soft margin AdaBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.1. Introduction
in a wide variety of domains, and for diverse base learners. For instance,
Viola and Jones demonstrated that AdaBoost can achieve both good speed
and performance in face detection [9]. However, research also showed that
AdaBoost often places too much emphasis on misclassified examples which
may just be noise. Hence, it can suffer from overfitting, particularly with
a highly noisy data set. The soft margin AdaBoost algorithm [10]
was introduced, using a regularization term to achieve a soft margin,
which allows mislabeled samples to exist in the training data
set and improves the generalization performance of the original AdaBoost.
Researchers have pointed out the equivalence between the mathematical
programs underlying both SVMs and boosting, and formalized a
correspondence. For instance, in [11], a given hypothesis set in boosting can
correspond to the choice of a particular kernel in SVMs and vice versa. Both
SVMs and boosting are maximal margin algorithms, for which the generalization
performance of a hypothesis f can be bounded in terms of the margin with
respect to the training set. In this chapter, we will compare these maximal
margin algorithms, SVM, AdaBoost and SMA, in one practical pattern
recognition problem – pose estimation. We will especially analyze their
generalization performance in terms of the margin distribution.
This chapter is directed toward a pose detection system that can
classify the frontal images (within ±25°) from pose images (greater angles),
under different scale, lighting or illumination conditions. The method uses
Principal Component Analysis (PCA) to generate a representation of the
facial image’s appearance, which is parameterized by geometrical variables
such as the angle of the facial image. The maximal margin algorithms are
then used to generate a statistical model, which captures variation in the
appearance of the facial angle.
All the experiments were evaluated on the CMU PIE database [12],
Weizmann database [13], CSIRO front database and CMU profile face
testing set [14]. It is demonstrated that both the SVM-based model and
SMA-based model are able to reliably classify frontal and pose images better
than the original AdaBoost-based model. This observation leads us to
discuss the generalization performance of these algorithms based on their
margin distribution graphs.
We chose 3003 facial images from the CMU PIE database under 13 different
poses to generate the EPS. The details of the PIE data set can be seen in
[12], where the facial images are from 68 persons, and each person was
photographed using 13 different poses, 43 different illumination conditions
and 4 different expressions. Figure 1.2 shows the pose variations.
A mean pose image and set of orthonormal eigen poses are produced.
The first eight eigen poses are shown in Fig. 1.3. Figure 1.4 shows the
(Figure: block diagram of the pose detection system. Training images are preprocessed (cropped and normalized), compressed by PCA and projected onto the eigen pose space (EPS) to form training samples (x, y) for the maximal margin algorithms, which produce the decision function f(x); real images are preprocessed and projected onto the EPS for pose detection.)
Fig. 1.2. The pose variation in the PIE database. The pose varies from full left profile
to full frontal and on to full right profile. The nine cameras in the horizontal sweep are
each separated by about 22.5°. The four other cameras include one above and one below
the central camera, and two in the corners of the room, which are typical locations for
surveillance cameras.
Fig. 1.5. Reconstructed image (Weizmann) using the eigen poses of the PIE data set.
Top: original image. Bottom: sample (masked and relocated) and its reconstruction
using the given numbers of eigenvectors. The number of eigenvectors is above each
reconstructed image.
For a linear SVM, the kernel k is just the inner product in the input space,
i.e. $k(x, x_i) = \langle x, x_i \rangle$, and the corresponding decision function is (1.1). For
nonlinear SVMs, a number of kernel functions have been found to provide
good performance, such as polynomials and the radial basis function (RBF).
The corresponding decision function for a nonlinear SVM is

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i k(x, x_i) + b \right). \qquad (1.2)$$
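As a small illustration of the decision function (1.2), the Python sketch below evaluates a nonlinear SVM with an RBF kernel; the support vectors, labels, Lagrange multipliers and bias are made-up placeholder values, not quantities trained in this chapter.

import numpy as np

def rbf_kernel(x, xi, gamma=0.5):
    """RBF kernel k(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Nonlinear SVM decision function of Eq. (1.2):
    f(x) = sign(sum_i y_i * alpha_i * k(x, x_i) + b)."""
    s = sum(y * a * kernel(x, xi) for xi, y, a in zip(support_vectors, labels, alphas))
    return np.sign(s + b)

# Toy example with invented support vectors and multipliers.
sv = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
y = [+1, -1]
alpha = [0.7, 0.7]
print(svm_decision(np.array([0.2, 0.9]), sv, y, alpha, b=0.0))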
1.3.2. Boosting
Boosting is a general method for improving the accuracy of a learning
algorithm. In recent years, many researchers have reported significant
improvements in the generalization performance using boosting methods
with learning algorithms such as C4.5 [20] or CART [21] as well as with
neural networks [22].
A number of popular and successful boosting methods can be seen as
gradient descent algorithms, which implicitly minimize some cost function
of the margin [23–25]. In particular, the popular AdaBoost algorithm [8]
can be viewed as a procedure for producing voted classifiers which minimize
the sample average of an exponential cost function of the training margins.
The aim of boosting algorithms is to provide a hypothesis which is a
voted combination of classifiers of the form sign (f (x)), with
$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \qquad (1.3)$$
where αt ∈ R are the classifier weights, ht are base classifiers from some
class F and T is the number of base classifiers chosen from F. Boosting
algorithms take the approach of finding voted classifiers which minimize
the sample average of some cost function of the margin.
Nearly all the boosting algorithms iteratively construct the combination
one classifier at a time. So we will denote the combination of the first t
classifiers by ft , while the final combination of T classifiers will simply be
denoted by f .
In [17], Schapire et al. show that boosting is good at finding classifiers
with large margins in that it concentrates on those examples whose margins
are small (or negative) and forces the base learning algorithm to generate
good classifications for those examples. Thus, boosting can effectively find
a large margin hyperplane.
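To make the voted combination of (1.3) concrete, the following sketch trains a toy AdaBoost ensemble of decision stumps with the standard exponential-loss weight update; the stump base learner, the data and the parameter choices are illustrative assumptions, not the configuration used in this chapter.

import numpy as np

def train_stump(X, y, w):
    """Best threshold classifier on a single feature under sample weights w."""
    best = None
    for thr in np.unique(X):
        for sign in (+1, -1):
            pred = sign * np.where(X >= thr, 1, -1)
            err = np.sum(w[pred != y])
            if best is None or err < best[0]:
                best = (err, thr, sign)
    err, thr, sign = best
    return err, (lambda x, t=thr, s=sign: s * np.where(x >= t, 1, -1))

def adaboost(X, y, T=10):
    m = len(y)
    w = np.full(m, 1.0 / m)                     # sample weights
    alphas, stumps = [], []
    for _ in range(T):
        err, h = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # classifier weight alpha_t
        w *= np.exp(-alpha * y * h(X))          # emphasise misclassified samples
        w /= w.sum()
        alphas.append(alpha); stumps.append(h)
    # voted combination f(x) = sum_t alpha_t h_t(x), cf. Eq. (1.3)
    return lambda x: np.sign(sum(a * h(x) for a, h in zip(alphas, stumps)))

X = np.array([0.1, 0.3, 0.45, 0.6, 0.8, 0.9])
y = np.array([-1, -1, -1, 1, 1, 1])
f = adaboost(X, y)
print(f(X))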
where w is the sample weight and t the training iteration index (cf. [10] for
a detailed description). The soft margin of a sample si is then defined as
Fig. 1.6. Normalized and cropped face region for nine different pose angles. The degree
of the pose is above the image. The subwindows only include eyes and part of nose for
the ±85° pose images.
Fig. 1.7. Training and testing error graphs of original AdaBoost and SMA when training
set size m = 500, and PC-dimension of the feature vector n = 30. The testing error of
AdaBoost overfits to 8.19% while the testing error of SMA converges to 6.06%.
Fig. 1.8. Margin distribution graphs of SVM, AdaBoost and SMA on training set when
training sample size m = 500, PC-dimension is 30, 50, 80 respectively. For SVM and
SMA, although about 1 ∼ 2% of the training data are classified incorrectly, the margins
of more than 90% of the points are bigger than those of AdaBoost.
So the “hard margin” condition needs to be relaxed. For SVM, the original
SVM algorithm [19] had poor generalization performance on noisy data
as well. After the “soft margin” was introduced [31], SVMs tended to
find a smoother classifier and converge to a margin distribution in which
some examples may have negative margins, and have achieved much better
generalization results. The SMA algorithm is similar, which tends to find a
smoother classifier because of the balancing influence of the regularization
term. So, SVM and SMA are called “soft margin” algorithms.
Fig. 1.9. The testing error of SVM and SMA versus the PC-dimension of the feature
vector. As the PC-dimension increases, the testing errors decrease. The testing error is
as low as 1.80% (SVM) and 1.72% (SMA) when the PC-dimension is 80.
Figure 1.9 shows the testing errors of SVM and SMA versus the
PC-dimension of the feature vector. The testing error is related
to the PC-dimension of the feature vector: the performance in pose
detection is better for a higher dimensional PCA representation. In
particular, the testing error is as low as 1.80% and 1.72% when the
PC-dimension is 80. On the other hand, a low dimensional PCA
representation can already provide satisfactory performance; for instance,
the testing errors are 2.27% and 3.02% when the PC-dimension is only 30.
Fig. 1.10. The number of support vectors versus the dimension of the feature vectors.
As the dimension increases, the number of support vectors increases as well.
Fig. 1.11. Test images correctly classified as profile. Notice that these images include
different facial appearances, expressions, significant shadows or sunglasses.
The proportion of support vectors to the size of the whole training set is less
than 5.5%. Figure 1.11 shows some examples of the correctly classified pose
images, which include different scale, lighting or illumination conditions.
1.5. Conclusion
The main strength of the present method is the ability to estimate the
pose of the face efficiently by using the maximal margin algorithms. The
experimental results and the margin distribution graphs show that the “soft
margin” algorithms allow higher training errors to avoid the overfitting
problem of “hard margin” algorithms, and achieve better generalization
performance. The experimental results show that SVM- and SMA-based
techniques are very effective for the pose estimation problem. The testing
error on more than 20000 testing images was as low as 1.73%, where the
images cover different facial features such as beards, glasses and a great
deal of variability including shape, color, lighting and illumination.
In addition, because the only prior knowledge of the system is the
eye locations, the performance is extremely good, even when some facial
features such as the nose or the mouth become partially or wholly
occluded. For our current interest in improving the performance of our
face recognition system (SQIS), as the eye location is already automatically
determined, this new pose detection method can be directly incorporated
into the SQIS system to improve its performance.
References
[6] Y. Li, S. Gong, and H. Liddell, Support vector regression and classification
based multi-view face detection and recognition, Proceedings of IEEE
International Conference On Automatic Face and Gesture Recognition. pp.
300–305, (2000).
[7] Y. Guo, R.-Y. Qiao, J. Li, and M. Hedley, A new algorithm for face pose
classification, Image & Vision Computing New Zealand IVCNZ. (2002).
[8] Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line
learning and an application to boosting, Journal of Computer and System
Sciences. 55(1), 119–139, (1997).
[9] P. Viola and M. Jones, Rapid object detection using a boosted cascade
of simple features, IEEE Conference on Computer Vision and Pattern
Recognition. (2001).
[10] G. Rätsch, T. Onoda, and K.-R. Müller, Soft margins for AdaBoost, Machine
Learning. 42(3), 287–320, (2001).
[11] G. Rätsch, B. Schölkopf, S. Mika, and K.-R. Müller. SVM and boosting:
One class. Technical report, NeuroCOLT, (2000).
[12] T. Sim, S. Baker, and M. Bsat, The CMU Pose, Illumination, and Expression
(PIE) database, Proceedings of the IEEE International Conference on
Automatic Face and Gesture Recognition. (2002).
[13] Y. Moses, S. Ullman, and S. Edelman, Generalization to novel images in
upright and inverted faces, Perception. 25, 443–462, (1996).
[14] H. Schneiderman and T. Kanade, A statistical method for 3D object
detection applied to faces and cars, Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition. 1, 746–751, (2000).
[15] H. Murase and S. K. Nayar, Visual learning and recognition of 3D objects
from appearance, International Journal of Computer Vision. 14(1), 5–24,
(1995).
[16] M. Anthony and P. Bartlett, A Theory of Learning in Artificial Neural
Networks. (Cambridge University Press, 1999).
[17] R. Schapire, Y. Freund, P. L. Bartlett, and W. S. Lee, Boosting the
margin: A new explanation for the effectiveness of voting methods, Annals
of Statistics. 26(5), 1651–1686, (1998).
[18] V. Vapnik, The Nature of Statistical Learning Theory. (Springer, NY, 1995).
[19] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for
optimal margin classifiers, Proceedings of the 5th Annual ACM Workshop
on Computational Learning Theory. pp. 144–152, (1992).
[20] J. R. Quinlan, C4.5: Programs for Machine Learning. (Morgan Kaufmann,
San Mateo, CA, 1993).
[21] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and
Regression Trees. (Wadsworth International Group, Belmont, CA, 1984).
[22] Y. Bengio, Y. LeCun, and D. Henderson. Globally trained handwritten word
recognizer using spatial representation, convolutional neural networks and
hidden Markov models. In eds. J. Cowan, G. Tesauro, and J. Alspector,
Advances in Neural Information Processing Systems, vol. 5, pp. 937–944.
(Morgan Kaufmann, San Mateo, CA, 1994).
Chapter 2
Contents
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 PSO for Polynomial Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 PSO vs. GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.1. Introduction
The PSO performs global exploration based on the position of the best
particle among all the particles in the swarm, and local exploration based
on each particle's best previously recorded position. Each particle converges
gradually toward the position of the best particle in the swarm and toward
its own best position recorded so far. Kennedy and
Eberhart [2] demonstrated that PSO can solve hard optimization problems
with satisfactory results.
However, to the authors' knowledge, there has been no research applying
PSO to polynomial modeling in a dynamic environment. This may be
attributed to the fact that polynomials consist of arithmetic operations
and polynomial variables, which are difficult to represent with the commonly
used continuous version of PSO (i.e. the original PSO [1]). In fact, recent
literature shows that PSO has been applied to solve various combinatorial
problems with satisfactory results [3–7], and the particles of a PSO-based
algorithm can represent arithmetic operations and polynomial variables
in polynomial models. These reasons motivated the authors to apply a
PSO-based algorithm to generating polynomial models.
In this chapter, a PSO has been proposed for polynomial modeling in
a dynamic environment. The basic operations of the proposed PSO are
identical to those of the original PSO [1] except that elements of particles of
the PSO are represented by arithmetic operations and polynomial variables
in polynomial models. To evaluate the performance of PSO-based polynomial
modeling in a dynamic environment, we compared the PSO with the GP,
which is a commonly used method for polynomial modeling [8–11]. The
comparison is based on benchmark functions whose optima move dynamically,
so that polynomial modeling in dynamic environments can be evaluated. This
evaluation indicates that the proposed PSO significantly outperforms the
GP in polynomial modeling in a dynamic environment.
$$y = a_0 + \sum_{i_1=1}^{m} a_{i_1} x_{i_1} + \sum_{i_1=1}^{m}\sum_{i_2=1}^{m} a_{i_1 i_2} x_{i_1} x_{i_2} + \sum_{i_1=1}^{m}\sum_{i_2=1}^{m}\sum_{i_3=1}^{m} a_{i_1 i_2 i_3} x_{i_1} x_{i_2} x_{i_3} + \ldots + a_{12\ldots m} \prod_{i_m=1}^{m} x_{i_m} \qquad (2.2)$$
where $f_i^t$ is the polynomial model represented by the i-th particle $P_i^t$ at
the t-th generation, $(x^D(j), y^D(j))$ is the j-th training data set, and $N_D$ is
the number of training data sets used for developing the polynomial model.
The velocity $v_{i,k}^t$ (corresponding to the flight velocity in a search space) and
the k-th element of the i-th particle at the t-th generation $p_{i,k}^t$ are calculated
by the following formulas:

$$v_{i,k}^t = K\left(v_{i,k}^{t-1} + \varphi_1 \times \mathrm{rand}() \times \left(pbest_{i,k} - p_{i,k}^{t-1}\right) + \varphi_2 \times \mathrm{rand}() \times \left(gbest_k - p_{i,k}^{t-1}\right)\right) \qquad (2.4)$$

$$p_{i,k}^t = p_{i,k}^{t-1} + v_{i,k}^t \qquad (2.5)$$

where
$pbest_i = [pbest_{i,1}, pbest_{i,2}, \ldots, pbest_{i,N_p}]$,
$gbest = [gbest_1, gbest_2, \ldots, gbest_{N_p}]$,
$k = 1, 2, \ldots, N_p$,
where the best previous position of a particle recorded so far from the
previous generation is represented as $pbest_i$; the position of the best particle
among all the particles is represented as $gbest$; rand() returns a uniform
random number in the range [0, 1]; $w$ is an inertia weight factor; $\varphi_1$
and $\varphi_2$ are acceleration constants; and $K$ is a constriction factor derived
from the stability analysis of (2.6), which ensures that the system converges
but not prematurely [16]:

$$p_{i,j}^t = p_{i,j}^{t-1} + v_{i,j}^t \qquad (2.6)$$

Mathematically, $K$ is a function of $\varphi_1$ and $\varphi_2$ as reflected in the following
equation:
where ∆d = 1% or 10%.
Hence the optimal value Vmin of each dynamic function can be
represented by:
For each test run a different random seed was used. The severity was
chosen relative to the extension (in one dimension) of the initialization
area of each dynamic benchmark function. The different severities
were chosen as 1% and 10% of the range of each dynamic benchmark
function. The benchmark functions were periodically changed every 5000
computational evaluations.
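For illustration, the sketch below performs one velocity and position update following (2.4) and (2.5) for a continuous particle; the constriction factor K = 0.729 and the acceleration constants are common placeholder settings rather than the chapter's actual configuration (the chapter's own particles encode arithmetic operations and polynomial variables rather than real numbers).

import numpy as np

rng = np.random.default_rng(0)

def pso_step(p, v, pbest, gbest, K=0.729, phi1=2.05, phi2=2.05):
    """One PSO update of Eqs. (2.4)-(2.5) for a single particle.

    p, v   : current position and velocity (1-D arrays)
    pbest  : the particle's best previous position
    gbest  : the best position found by the whole swarm
    """
    r1 = rng.random(p.shape)
    r2 = rng.random(p.shape)
    v_new = K * (v + phi1 * r1 * (pbest - p) + phi2 * r2 * (gbest - p))  # Eq. (2.4)
    p_new = p + v_new                                                    # Eq. (2.5)
    return p_new, v_new

p = np.array([0.5, -1.2, 3.0])
v = np.zeros(3)
p, v = pso_step(p, v, pbest=np.array([0.4, -1.0, 2.5]), gbest=np.array([0.0, 0.0, 0.0]))
print(p, v)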
The solution qualities of the GP are more unstable than those obtained by
the PSO in every 5000 computational evaluations. Finally, the PSO can
reach smaller MAEs than the ones obtained by the GP. For the Rastrigin
function with ∆d = 10%, Fig. 2.2a shows that the PSO kept progressing
in every 5000 computational evaluations even when the optimum moved,
while the GP may obtain larger MAEs after the optimum moves at every
5000 computational evaluations. Finally, the PSO can obtain a smaller
MAE than the one obtained by the GP. For the Griewank function,
similar characteristics can be found in Fig. 2.2b. Similar results can be
found on solving the benchmark functions with ∆d = 10%. Therefore,
it can be concluded that the PSO is more effective at adapting to smaller
MAEs than the GP while the optima of the benchmark functions move
periodically every 5000 computational evaluations. Also, the PSO can
converge to smaller MAEs than the ones obtained by the GP on the
benchmark functions. For the benchmark functions with ∆d = 1%, similar
characteristics can be found.
Solely from the figures, it is difficult to compare the qualities and
stabilities of solutions found by both methods. Tables 2.3 and 2.4
summarize the results for the benchmark functions. They show that the
minimums, the maximums and the means of MAEs found by the PSO
are smaller than those found by the GP in all the benchmark functions
with ∆d = 1% and 10% respectively. Furthermore, the variances of the
MAEs found by the PSO are much smaller than those found by the GP. These
results indicate that the PSO can not only produce smaller MAEs but
also more stable MAEs for polynomial modeling based on the
benchmark functions with the dynamic environments. The t-test is then
used to evaluate the significance of the performance difference between the
PSO and the GP. Table 2.4 shows that all t-values between the PSO and the
GP for all benchmark functions with ∆d = 1% and 10% are higher than 2.15.
Based on the normal distribution table, a t-value higher than 2.15 corresponds
to a 98% confidence level. Therefore, the performance of the
PSO is significantly better than that of the GP at the 98% confidence level
in polynomial modeling based on the benchmark functions with ∆d = 1%
and 10%.
Since maintaining population diversity in population-based algorithms
like GP or PSO is a key in preventing premature convergence and stagnation
in local optima [19, 20], it is essential to study population diversities of
the two algorithms along the search. Various diversity measures, which
involve calculation of distance between two individuals in both genetic
Table 2.3. Experimental results obtained by the GP and the PSO for the benchmark
functions with ∆d = 1%.
programming [21, 22] and genetic algorithms [23], have been widely studied.
However, those distance measures can be applied only to tree-based
representations or to string-based representations, and so cannot be used
to measure the population diversities of both algorithms here, since the
representations of the two algorithms are not identical. Tree-based
representation of individuals is used in the GP. String-based representation
of individuals is used in the PSO. Neither of the two types of distance
measures can therefore be applied to both algorithms, whose types of
representation are different.
To investigate population diversities of the algorithms, we measure the
distance between two individuals by counting the number of different terms
of the polynomials represented by the two individuals. If the terms in both
polynomials are all identical, the distance between two polynomials is zero.
The distance between two polynomials is larger with a greater number
of different terms in the two polynomials. However, the same distance
measure is used for the polynomials of each method, and hence the results
can be compared. For example, $f_1$ and $f_2$ are two polynomials given by
$f_1 = x_1 + x_2 + x_1 x_3 + x_4^2 + x_1 x_3 x_5$ and $f_2 = x_1 + x_2 + x_1 x_5 + x_4 + x_1 x_3 x_5$.
Table 2.4. Experimental results obtained by the GP and the PSO for the benchmark
functions with ∆d = 10%.
$$\sigma_g = \frac{2}{N_p^2} \sum_{i=1}^{N_p} \sum_{j=i+1}^{N_p} d\big(s_g(i), s_g(j)\big) \qquad (2.9)$$
where sg (i) and sg (j) are the i-th and the j-th individuals in the population
at the g-th generation, and d is the distance measure between the two
individuals.
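As a rough sketch of this diversity measure, the code below represents a polynomial as the set of its terms so that the distance d counts differing terms, and then averages the pairwise distances over the population as in (2.9); the set-of-strings representation is an assumption made purely for illustration.

from itertools import combinations

def term_distance(f1, f2):
    """Number of terms present in one polynomial but not the other."""
    return len(f1 ^ f2)   # symmetric difference of the term sets

def diversity(population):
    """Population diversity of Eq. (2.9):
    sigma_g = (2 / Np^2) * sum_{i<j} d(s_i, s_j)."""
    Np = len(population)
    total = sum(term_distance(a, b) for a, b in combinations(population, 2))
    return 2.0 * total / (Np ** 2)

# f1 = x1 + x2 + x1*x3 + x4^2 + x1*x3*x5, f2 = x1 + x2 + x1*x5 + x4 + x1*x3*x5
f1 = {"x1", "x2", "x1x3", "x4^2", "x1x3x5"}
f2 = {"x1", "x2", "x1x5", "x4", "x1x3x5"}
print(term_distance(f1, f2))        # 4 differing terms
print(diversity([f1, f2, {"x1"}]))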
The diversities of the populations along the generations were recorded
for the two algorithms. Figure 2.3a shows the diversities of the Sphere
functions with ∆d = 10%. It can be found from the figure that the
diversities of the GP are the highest in the early generations. The diversities
of the PSO are smaller than the ones of the GP in the early generations.
However, diversities of the PSO can be kept until the late generations,
while the ones of the GP saturated to low levels in the mid generations.
Figures 2.3b, 2.4a and 2.4b show the diversities of the two algorithms
for Rosenbrock function, Rastrigin function and Griewank function with
∆d = 10% respectively. The figures show similar characteristics to the one
of the Sphere function in that the diversities of populations of the PSO can
be kept along the generations, while the GP saturated to a low value after
the early generations. For the benchmark functions with ∆d = 1%, similar
characteristics can be found.
Fig. 2.3. Population diversities for Sphere and Rosenbrock functions.
2.4. Conclusion
In this chapter, a PSO has been proposed for polynomial modeling which
aims at generating explicit models in the form of Kolmogorov–Gabor
polynomials.
Fig. 2.4. Population diversities for Rastrigin and Griewank functions.
It was shown that the PSO performs significantly better than the GP in
polynomial modeling. Enhancing the effectiveness of the proposed PSO for
polynomial modeling in dynamic environments will be considered in future
work, by incorporating techniques that have been shown to obtain
satisfactory results on dynamic optimization problems [24, 25].
References
[1] R. Eberhart and J. Kennedy, A new optimizer using particle swarm theory,
Proceedings of the Sixth IEEE International Symposium on Micro Machine
and Human Science. pp. 39–43, (1995).
[2] J. Kennedy and R. Eberhart, Swarm Intelligence. (Morgan Kaufmann,
2001).
[3] Z. Lian, Z. Gu, and B. Jiao, A similar particle swarm optimization
algorithm for permutation flowshop scheduling to minimize makespan,
Applied Mathematics and Computation. 175, 773–785, (2006).
[4] C. J. Liao, C. T. Tseng, and P. Luran, A discrete version of particle swarm
optimization for flowshop scheduling problems, Computer and Operations
Research. 34, 3099–3111, (2007).
[5] Y. Liu and X. Gu, Skeleton network reconfiguration based on topological
characteristics of scale free networks and discrete particle swarm
optimization, IEEE Transactions on Power Systems. 22(3), 1267–1274,
(2007).
[6] Q. K. Pan, M. F. Tasgetiren, and Y. C. Liang, A discrete particle
swarm optimization algorithm for the no-wait flowshop scheduling problem,
Computer and Operations Research. 35, 2807–2839, (2008).
[7] C. T. Tseng and C. J. Liao, A discrete particle swarm optimization for
lot-streaming flowshop scheduling problem, European Journal of Operational
Research. 191, 360–373, (2008).
[8] H. Iba, Inference of differential equation models by genetic programming,
Information Sciences. 178, 4453–4468, (2008).
[9] J. Koza, Genetic Programming: On the Programming of Computers by
Means of Natural Selection. (MIT Press, Cambridge, MA, 1992).
[10] K. Rodriguez-Vazquez, C. M. Fonseca, and P. J. Fleming, Identifying
the structure of nonlinear dynamic systems using multiobjective genetic
programming, IEEE Transactions on Systems, Man and Cybernetics – Part
A. 34(4), 531–545, (2004).
[11] N. Wagner, Z. Michalewicz, M. Khouja, and R. R. McGregor, Time series
forecasting for dynamic environments: the DyFor genetic program model,
IEEE Transactions on Evolutionary Computation. 11(4), 433–452, (2007).
[12] D. Gabor, W. Wides, and R. Woodcock, A universal nonlinear filter
predictor and simulator which optimizes itself by a learning process,
Proceedings of IEEE. 108-B, 422–438, (1961).
Chapter 3
Contents
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Color Quantization With Half-toning . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Formulation of Restoration Algorithm . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 PSO with multi-wavelet mutation (MWPSO) . . . . . . . . . . . . . . 42
3.3.2 The fitness function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.3 Restoration with MWPSO . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.4 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Result and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
∗ The work described in this chapter was substantially supported by a grant from
the Research Grants Council of the Hong Kong Special Administrative Region, China
(Project No. PolyU 5224/08E).
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.1. Introduction
Figure 3.1 shows the system that performs color quantization with error
diffusion. The input image is scanned in a row-by-row fashion from top to
bottom and from left to right. The relationship between the original image
$\vec{O}(i,j)$ and the encoded image $\vec{Y}(i,j)$ is described by

$$U(i,j)_c = O(i,j)_c - \sum_{(k,l)\in\Omega} H(k,l)_c\, E(i-k, j-l)_c \qquad (3.1)$$

$$\vec{E}(i,j) = \vec{Y}(i,j) - \vec{U}(i,j) \qquad (3.2)$$

$$\vec{Y}(i,j) = Q_c\big[\vec{U}(i,j)\big] \qquad (3.3)$$

where $\vec{U}(i,j) = (U(i,j)_r, U(i,j)_g, U(i,j)_b)$ is a state vector of the system,
$\vec{E}(i,j)$ is the quantization error of the pixel at position $(i,j)$ and $H(k,l)_c$ is
a coefficient of the error diffusion filter for the $c$ color component. $\Omega$ is the
corresponding causal support region of the filter.
The operator Qc [·] performs a 3D vector quantization. Specifically,
the 3D vector $\vec{U}(i,j)$ is compared with a set of representative color vectors
stored in a previously generated color palette $V = \{\hat{v}_i : i = 1, 2, \ldots, N_c\}$.
The best-matched vector in the palette is selected based on the minimum
Euclidean distance criterion. In other words, a state vector $\vec{U}(i,j)$ is
represented by the color $\hat{v}_k$ if and only if $\|\vec{U}(i,j) - \hat{v}_k\| \le \|\vec{U}(i,j) - \hat{v}_l\|$
for all $l = 1, 2, \ldots, N_c$; $l \ne k$. Once the best-matched vector is selected
from the color palette, its index is recorded and the quantization error
$\vec{E}(i,j) = \hat{v}_k - \vec{U}(i,j)$ is diffused to pixel $(i,j)$'s neighborhood as described
in (3.1). Note that in order to handle the boundary pixels, $\vec{E}(i,j)$ is defined
to be zero when (i, j) falls outside the image. Without loss of generality,
in this chapter, we use a typical Floyd–Steinberg error diffusion kernel as
H(i, j)c to perform the half-toning. The recorded indices of the color palette
will be used again in future to reconstruct the color-quantized image of the
restored image.
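The following sketch, a simplified illustration of (3.1)-(3.3) rather than the chapter's implementation, quantizes each pixel to the nearest palette colour and diffuses the quantization error to unprocessed neighbours with the Floyd-Steinberg weights; the tiny palette and random test image are placeholders.

import numpy as np

# Floyd-Steinberg kernel: offset (di, dj) -> weight H(k, l)
FS_KERNEL = {(0, 1): 7/16, (1, -1): 3/16, (1, 0): 5/16, (1, 1): 1/16}

def quantize_with_halftoning(image, palette):
    """Colour quantization with error diffusion, cf. Eqs. (3.1)-(3.3)."""
    work = image.astype(float).copy()
    h, w, _ = work.shape
    indices = np.zeros((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            u = work[i, j]                                     # state vector U(i, j)
            k = np.argmin(np.sum((palette - u) ** 2, axis=1))  # nearest palette colour
            indices[i, j] = k
            e = palette[k] - u                                 # quantization error E(i, j)
            for (di, dj), weight in FS_KERNEL.items():
                if 0 <= i + di < h and 0 <= j + dj < w:
                    work[i + di, j + dj] -= weight * e         # diffuse error, Eq. (3.1)
    return indices

palette = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0], [0, 0, 255]], dtype=float)
image = np.random.default_rng(1).integers(0, 256, size=(8, 8, 3))
print(quantize_with_halftoning(image, palette))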
gradually toward pbestp and gbest. A suitable selection of the inertia weight
w provides a balance between the global and local explorations. Generally,
w can be dynamically set with the following equation:
$$w = w_{max} - \frac{w_{max} - w_{min}}{T} \times t \qquad (3.6)$$

where $t$ is the current iteration number, $T$ is the total number of iterations,
and $w_{max}$ and $w_{min}$ are the upper and lower limits of the inertia weight,
set to 1.2 and 0.1 respectively in this chapter.
In (3.4), the particle velocity is limited by a maximum value $v_{max}$. The
parameter $v_{max}$ determines the resolution of the region between the present
position and the target position to be searched. This limit enhances the
local exploration of the problem space, affecting the incremental changes of
learning.
Before generating a new X(t), the mutation operation is performed:
every particle of the swarm will have a chance to mutate, governed by a
probability of mutation µm ∈ [0, 1], which is defined by the user. For
each particle, a random number between zero and one will be generated;
if µm is larger than the random number, this particle will be selected for
the mutation operation. Another parameter, called the element probability
Nm ∈ [0, 1], is then defined by the user to control the number of elements
in the particle that will mutate in each iteration step. For instance, if
xp (t) = [xp1 (t), xp2 (t), . . . , xpκ (t)] is the selected p-th particle, the expected
number of elements that undergo mutation is given by
Expected number of mutated elements = Nm × κ (3.7)
We propose a multi-wavelet mutation operation for realizing the
mutation. The exact elements for doing mutation in a particle are randomly
selected. The resulting particle is given by x̄p (t) = [x̄p1 (t), x̄p2 (t), . . . , x̄pκ (t)].
If the j-th element is selected for mutation, the operation is given by
$$\bar{x}_j^p(t) = \begin{cases} x_j^p(t) + \sigma \times \big(para_{max}^j - x_j^p(t)\big) & \text{if } \sigma > 0 \\ x_j^p(t) + \sigma \times \big(x_j^p(t) - para_{min}^j\big) & \text{if } \sigma \le 0 \end{cases} \qquad (3.8)$$

$$a = e^{-\ln(g)\times\left(1-\frac{t}{T}\right)^{\zeta_{wm}} + \ln(g)} \qquad (3.10)$$
where ζwm is the shape parameter of the monotonic increasing function, g
is the upper limit of the parameter a.
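A minimal sketch of the multi-wavelet mutation, assuming the Morlet-type mother wavelet ψ(x) = exp(-x²/2)cos(5x) used in Section 4.2.2 (Eq. (4.14)): σ is drawn by dilating the wavelet with the parameter a of (3.10), and the selected element is mutated according to (3.8). The parameter values are illustrative placeholders.

import math
import random

def dilation(t, T, zeta_wm=1.0, g=10000.0):
    """Dilation parameter of Eq. (3.10): a = exp(-ln(g) * (1 - t/T)^zeta + ln(g))."""
    return math.exp(-math.log(g) * (1.0 - t / T) ** zeta_wm + math.log(g))

def wavelet_sigma(a):
    """sigma = (1/sqrt(a)) * psi(phi/a), with a Morlet-type mother wavelet
    psi(x) = exp(-x^2/2) * cos(5x) and phi drawn from [-2.5a, 2.5a] (cf. Eq. (4.14))."""
    phi = random.uniform(-2.5 * a, 2.5 * a)
    x = phi / a
    return (1.0 / math.sqrt(a)) * math.exp(-x * x / 2.0) * math.cos(5.0 * x)

def mutate_element(xj, lo, hi, t, T):
    """Multi-wavelet mutation of a single selected element, Eq. (3.8)."""
    sigma = wavelet_sigma(dilation(t, T))
    if sigma > 0:
        return xj + sigma * (hi - xj)   # push towards the upper bound para_max
    return xj + sigma * (xj - lo)       # push towards the lower bound para_min

random.seed(0)
print(mutate_element(0.3, lo=0.0, hi=1.0, t=10, T=100))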
where $E_{best}$ is the fitness value of the best estimate $gbest$ so far, then $gbest$
will be updated by $X_{new}^p(t)$. In summary, we have

$$X_{cur}^p(t) = \begin{cases} X_{new}^p(t) & \text{if } E_{new}^p < E_{cur}^p \\ X_{cur}^p(t-1) & \text{otherwise} \end{cases} \qquad (3.14)$$

and

$$gbest = \begin{cases} X_{new}^p(t) & \text{if } E_{new}^p < E_{best} \\ gbest & \text{otherwise} \end{cases} \qquad (3.15)$$
Fig. 3.3. Original images used for testing the proposed restoration algorithm.
Fig. 3.16. Comparisons between SA and MWPSO for Lena with 256- and 128-color
palette size.
Fig. 3.17. Comparisons between SA and MWPSO for Baboon with 256- and 128-color
palette size.
Fig. 3.18. Comparisons between SA and MWPSO for Fruit with 256- and 128-color
palette size.
Fig. 3.19. Comparisons between SA and MWPSO for Lena with 64- and 32-color palette
size.
Fig. 3.20. Comparisons between SA and MWPSO for Baboon with 64- and 32-color
palette size.
Fig. 3.21. Comparisons between SA and MWPSO for Fruit with 64- and 32-color palette
size.
$$SNRI = 10 \log \frac{\displaystyle\sum_{(i,j)} \left\| \vec{O}(i,j) - \vec{Y}(i,j) \right\|^2}{\displaystyle\sum_{(i,j)} \left\| \vec{O}(i,j) - \vec{X}_{best}(i,j) \right\|^2} \qquad (3.16)$$

where $\vec{O}(i,j)$, $\vec{Y}(i,j)$ and $\vec{X}_{best}(i,j)$ are the $(i,j)$-th pixels of the original, the
half-toned color-quantized and the optimally restored images respectively.
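As a small worked example of (3.16), the sketch below computes the SNRI (in dB) of a restored image relative to the half-toned colour-quantized image; the random arrays stand in for real images.

import numpy as np

def snri(original, quantized, restored):
    """Signal-to-noise ratio improvement of Eq. (3.16), in dB."""
    num = np.sum((original - quantized) ** 2)   # energy of the quantization error
    den = np.sum((original - restored) ** 2)    # energy of the residual after restoration
    return 10.0 * np.log10(num / den)

rng = np.random.default_rng(0)
o = rng.random((16, 16, 3))
q = o + 0.10 * rng.standard_normal(o.shape)   # heavily degraded version
r = o + 0.02 * rng.standard_normal(o.shape)   # better restored version
print(f"SNRI = {snri(o, q, r):.2f} dB")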
From Figs 3.16 to 3.21, we can see that the final results achieved by SA
and MWPSO are similar: both of them converge to some similar values. It
means that both algorithms can reach the same optimal point. However,
we can see that the proposed MWPSO exhibits a higher convergence rate.
Also, from Tables 3.1 to 3.3, MWPSO converges to smaller fitness values
in all experiments and offers higher SNRI in most of the experiments.
3.5. Conclusion
References
PART 2
Chapter 4
Contents
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2 Hypoglycemia Detection System: Evolved Fuzzy Inference System Approach 66
4.2.1 Fuzzy inference system . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2.2 Particle swarm optimization with wavelet mutation . . . . . . . . . . . 71
4.2.3 Choosing the HPSOWM parameters . . . . . . . . . . . . . . . . . . . 74
4.2.4 Fitness function and training . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1. Introduction
Fig. 4.2. The concerned points: Q, R, Tp and Te, which are used to find the ECG
parameters.
QT is the interval between the Q and Tp points. $QT_c$ is $\frac{QT}{\sqrt{RR}}$, in which $RR$ is the interval between
R peaks. Heart rate is $\frac{60}{RR}$.
4.2.1.1. Fuzzification
The first step is to take the inputs and determine the degree of membership
to which they belong to each of the appropriate fuzzy sets via membership
functions. In this study, there are four inputs, HR, QTc , ∆HR, and ∆QTc .
The degree of the membership function is shown in Fig. 4.3. For input HR,
a bell-shaped function $\mu_{N_{HR}^k}(HR(t))$ is given when $m_{HR}^k \neq \max\{m_{HR}\}$ or
$\min\{m_{HR}\}$:

$$\mu_{N_{HR}^k}(HR(t)) = e^{-\left(HR(t) - m_{HR}^k\right)^2 / \left(2\varsigma_{HR}^k\right)}, \qquad (4.1)$$

where

$$m_{HR} = \left[\, m_{HR}^1 \;\; m_{HR}^2 \;\cdots\; m_{HR}^k \;\cdots\; m_{HR}^{m_f} \,\right],$$
4.2.1.3. Defuzzification
Defuzzification is the process of translating the output of the fuzzy rules
into a crisp value. The presence of hypoglycemia $h(t)$ is given by:

$$h(t) = \begin{cases} -1 & \text{if } y(t) < 0 \\ +1 & \text{if } y(t) \ge 0 \end{cases}, \qquad (4.5)$$

where

$$y(t) = \sum_{\gamma=1}^{n_r} m_\gamma(t)\, w_\gamma, \qquad (4.6)$$

and

$$m_\gamma(t) = \frac{\mu_{N_z^k}(z(t))}{\sum_{\gamma=1}^{n_r} \mu_{N_z^k}(z(t))} \qquad (4.7)$$

where

$$\mu_{N_z^k}(z(t)) = \big(\mu_{N_{HR}^k}(HR(t))\big) \times \big(\mu_{N_{QT_c}^k}(QT_c(t))\big) \times \big(\mu_{N_{\Delta HR}^k}(\Delta HR(t))\big) \times \big(\mu_{N_{\Delta QT_c}^k}(\Delta QT_c(t))\big) \qquad (4.8)$$
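A compact sketch of the inference chain (4.1) and (4.5)-(4.8): each rule's firing strength is the product of the four Gaussian-type memberships, the strengths are normalized and combined with the rule weights, and the sign of the output gives the hypoglycemia decision. The membership centres, widths and rule weights below are invented placeholders, and the membership width is treated here as a simple Gaussian parameter.

import numpy as np

def gauss(x, m, s):
    """Bell-shaped membership of Eq. (4.1) (s taken as a width parameter)."""
    return np.exp(-(x - m) ** 2 / (2.0 * s ** 2))

def efis_output(inputs, centres, widths, rule_weights):
    """Evolved-fuzzy-inference output following Eqs. (4.5)-(4.8).

    inputs       : [HR, QTc, dHR, dQTc] at time t
    centres      : (n_rules, 4) membership centres m
    widths       : (n_rules, 4) membership widths
    rule_weights : (n_rules,) output weights w_gamma
    """
    # Eq. (4.8): rule firing strength is the product of the four memberships
    firing = np.prod(gauss(np.asarray(inputs), centres, widths), axis=1)
    m_gamma = firing / firing.sum()          # Eq. (4.7): normalized firing strengths
    y = np.dot(m_gamma, rule_weights)        # Eq. (4.6): weighted sum
    return 1 if y >= 0 else -1               # Eq. (4.5): hypoglycemia decision h(t)

centres = np.array([[70.0, 0.40, 0.0, 0.0], [95.0, 0.46, 10.0, 0.03]])
widths = np.full_like(centres, 8.0)
weights = np.array([-1.0, +1.0])
print(efis_output([92.0, 0.45, 8.0, 0.02], centres, widths, weights))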
$$v_j^p(t) = k \cdot \Big\{ w \cdot v_j^p(t-1) + \varphi_1 \cdot r_1 \cdot \big(\tilde{x}_j^p - x_j^p(t-1)\big) + \varphi_2 \cdot r_2 \cdot \big(\hat{x}_j - x_j^p(t-1)\big) \Big\} \qquad (4.9)$$

and

$$x_j^p(t) = x_j^p(t-1) + v_j^p(t) \qquad (4.10)$$

where $\tilde{x}^p = [\tilde{x}_1^p, \tilde{x}_2^p, \ldots, \tilde{x}_\kappa^p]$ and $\hat{x} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_\kappa]$, $j = 1, 2, \ldots, \kappa$. The
best previous position of a particle is recorded and represented as $\tilde{x}^p$; the
position of the best particle among all the particles is represented as $\hat{x}$; $w$ is
an inertia weight factor; $\varphi_1$ and $\varphi_2$ are acceleration constants; $r_1$ and $r_2$
are uniform random numbers in the range [0, 1]; $k$ is a constriction factor
derived from the stability analysis of (4.10) to ensure that the system converges,
but not prematurely [57]. Mathematically, $k$ is a function of $\varphi_1$ and $\varphi_2$ as
reflected in the following equation:

$$k = \frac{2}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|} \qquad (4.11)$$
$$\sigma = \psi_{a,0}(\varphi) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{\varphi}{a}\right) = \frac{1}{\sqrt{a}}\, e^{-\left(\frac{\varphi}{a}\right)^2 / 2} \cos\!\left(5\left(\frac{\varphi}{a}\right)\right) \qquad (4.14)$$
The amplitude of $\psi_{a,0}(\varphi)$ will be scaled down as the dilation parameter
$a$ increases. This property is used in the mutation operation in order
to enhance the searching performance. According to (4.14), if $\sigma$ is positive
and approaching one, the mutated element of the particle will tend to
the maximum value of $x_j^p(t)$. Conversely, when $\sigma$ is negative ($\sigma \le 0$) and
approaching $-1$, the mutated element of the particle will tend to the
minimum value of $x_j^p(t)$. A larger value of $|\sigma|$ gives a larger searching
space for $x_j^p(t)$, while a small $|\sigma|$ gives a smaller searching space
for fine-tuning. As over 99% of the total energy of the mother wavelet
function is contained in the interval $[-2.5, 2.5]$, $\varphi$ can be randomly generated
from $[-2.5 \times a,\, 2.5 \times a]$. The value of the dilation parameter $a$ is set to
vary with the value of $t/T$ in order to meet the fine-tuning purpose, where $T$
is the total number of iterations and $t$ is the current iteration number.
In order to perform a local search when $t$ is large, the value of $a$ should
increase as $t/T$ increases so as to reduce the significance of the mutation.
Hence, a monotonic increasing function governing $a$ and $t/T$ is proposed in
the following form:

$$a = e^{-\ln(g)\times\left(1-\frac{t}{T}\right)^{\zeta_{wm}} + \ln(g)} \qquad (4.15)$$
Fig. 4.4. Effect of the shape parameter $\zeta_{wm}$ on the dilation parameter $a$ with respect to $t/T$ (curves shown for $\zeta$ = 0.2, 0.5, 1, 2 and 5).
(i) Increasing swarm size (θ) will increase the diversity of the search space,
and reduce the probability that HPSOWM prematurely converges to
a local optimum. However, it also increases the time required for the
population to converge to the optimal region in the search space.
(ii) Increasing the probability of mutation (µm ) tends to transform the search
into a random search such that when µm = 1, all elements of particles
will mutate. This probability gives us an expected number (µm × θ × κ)
of elements of particles that undergo the mutation operation. In other
words, the value of µm depends on the desired number of elements of
particles that undergo the mutation operation. Normally, when the
dimension is very low (the number of elements of particles is less than 5),
µm is set at 0.5 to 0.8. When the dimension is around 5 to 10, µm is set
Fig. 4.5. Effect of the parameter $g$ on the dilation parameter $a$ with respect to $t/T$ (curves shown for $g$ = 100, 1000, 10000 and 100000).
$$\eta = \frac{N_{TN}}{N_{TN} + N_{FP}} \qquad (4.17)$$

where $N_{TP}$ is the number of true positives, which is the number of sick people
correctly diagnosed as sick; $N_{FN}$ is the number of false negatives
Fifteen children with T1DM (14.6 ± 1.5 years) volunteered for the
10-hour overnight hypoglycemia study at the Princess Margaret Hospital
for Children in Perth, Western Australia, Australia. Each patient is
monitored overnight for the natural occurrence of nocturnal hypoglycemia.
Data are collected with approval from Women’s and Children’s Health
Service, Department of Health, Government of Western Australia and with
informed consent. A comprehensive patient information and consent form
is formulated and approved by the Ethics Committee. The consent form
includes the actual consent and a revocation of consent page. Each patient
receives this information consent form at least two weeks prior to the
start of the studies. He/she has the opportunity to raise questions and
concerns with any medical advisors, researchers and the investigators. For
the children participating in this study, the parent or guardian signed the
relevant forms.
(Table: training and testing sensitivity and specificity, and area under the ROC curve, for each method.)
4.4. Conclusion
(Figure: ROC curves, sensitivity versus 1 − specificity, for EFIS and EMR.)
References
[18] G. A. F. Seber and A. J. Lee, Linear regression analysis. (John Wiley &
Sons, New York, 2003).
[19] S. M. Winkler, M. Affenzeller, and S. Wagner, Using enhanced genetic
programming techniques for evolving classifiers in the context of medical
diagnosis, Genetic Programming and Evolvable Machines. 10(2), 111–140,
(2009).
[20] T. S. Subashini, V. Ramalingam, and S. Palanivel, Breast mass classification
based on cytological patterns using RBFNN and SVM, Source Expert
Systems with Applications: An International Journal Archive. 36(3),
5284–5290, (2009).
[21] H. F. Gray and R. J. Maxwell, Genetic programming for classification and
feature selection: analysis of 1H nuclear magnetic resonance spectra from
human brain tumour biopsies, NMR in Biomedicine. 11(4), 217–224, (1998).
[22] C. M. Fira and L. Goras, An ECG signals compression method and its
validation using NNs, IEEE Transactions on Biomedical Engineering. 55
(4), 1319–1326, (2008).
[23] W. Jiang, S. G. Kong, and G. D. Peterson, ECG signal classification using
block-based neural networks, IEEE Transactions on Neural Networks. 18
(6), 1750–1761, (2007).
[24] A. H. Khandoker, M. Palaniswami, and C. K. Karmakar, Support vector
machines for automated recognition of obstructive sleep apnea syndrome
from ECG recordings, IEEE Transactions on Information Technology in
Biomedicine. 13(1), 37–48, (2009).
[25] J. S. Jang and C. T. Sun, Neuro-Fuzzy and Soft Computing: A
Computational Approach to Learning and Machine Intelligence. (Prentice
Hall, Upper Saddle River, NJ, 1997).
[26] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning. 20
(3), 273–297, (1995).
[27] A. K. Jain, J. Mao, and K. M. Mohiuddin, Artificial neural networks: a
tutorial, IEEE Computer Society. 29(3), 31–44, (1996).
[28] K. H. Chon and R. J. Cohen, Linear and nonlinear ARMA model
parameter estimation using an artificial neural network, IEEE Transactions
on Biomedical Engineering. 44(3), 168–174, (1997).
[29] W. W. Melek, Z. Lu, A. Kapps, and B. Cheung, Modeling of dynamic
cardiovascular responses during G-transition-induced orthostatic stress in
pitch and roll rotations, IEEE Transactions on Biomedical Engineering. 49
(12), 1481–1490, (2002).
[30] S. Wang and W. Min, A new detection algorithm (NDA) based on fuzzy
cellular neural networks for white blood cell detection, IEEE Transactions
on Information Technology in Biomedicine. 10(1), 5–10, (2006).
[31] Y. Hata, S. Kobashi, K. Kondo, Y. Kitamura, and T. Yanagida, Transcranial
ultrasonography system for visualizing skull and brain surface aided by fuzzy
expert system, IEEE Transactions on Systems, Man, and Cybernetics - Part
B. 35(6), 1360–1373, (2005).
[32] S. Kobashi, Y. Fujiki, M. Matsui, and N. Inoue, Genetic programming
for classification and feature selection: analysis of 1H nuclear magnetic
PART 3
Chapter 5
Contents
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 Global Boundness Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4 Limit Cycle Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.5 Application of Perceptron Exhibiting Limit Cycle Behavior . . . . . . . . . . 97
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.1. Introduction
5.2. Notations
Define w(n) ≡ [w0 (n), w1 (n), . . . , wd (n)]T ∀n ≥ 0 and denote the output
of the perceptron as y(n) ∀n ≥ 0, then y(n) = Q(wT (n)x(n)) ∀n ≥ 0.
Denote the desired output of the perceptron corresponding to x(n) as t(n)
∀n ≥ 0. Assume that the perceptron training algorithm [9] is employed for
the training, so the updated rule for the weights of the perceptron is as
follows:
$$w(n+1) = w(n) + \frac{t(n) - y(n)}{2}\, x(n) \quad \forall\, n \ge 0. \qquad (5.1)$$
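A minimal sketch of the training rule (5.1): since t(n) and y(n) are both ±1, the factor (t(n) − y(n))/2 is −1, 0 or +1, so the weights change only when a feature vector is misclassified. The training vectors below are placeholders.

import numpy as np

def Q(v):
    """Hard-limit activation: +1 if v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1

def train_perceptron(X, t, w0, epochs=10):
    """Perceptron training rule of Eq. (5.1):
    w(n+1) = w(n) + ((t(n) - y(n)) / 2) * x(n)."""
    w = np.array(w0, dtype=float)
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = Q(np.dot(w, x))
            w = w + ((target - y) / 2.0) * x
    return w

# Feature vectors are augmented with a leading 1 for the bias weight w0.
X = np.array([[1.0, 0.2, 0.1], [1.0, 0.9, 0.8], [1.0, 0.1, 0.4], [1.0, 0.8, 0.9]])
t = np.array([-1, 1, -1, 1])
print(train_perceptron(X, t, w0=np.zeros(3)))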
Lemma 5.1. Assume that there are two perceptrons with the initial weights
w(0) and w∗ (0). Suppose that the set of the bounded training feature vectors
and the corresponding set of desired outputs of these two perceptrons are the
same, then $(w(k) - w^*(k))^T\, \dfrac{Q\big((w^*(k))^T x(k)\big) - Q\big((w(k))^T x(k)\big)}{2}\, x(k) \le 0$ $\;\forall k \ge 0$.
The importance of Lemma 5.2 is for deriving the result in Theorem 5.1
stated below, which describes the main result on the boundedness condition
of the weights of the perceptron.
Theorem 5.1. If ∃w∗ (0) ∈ Rd+1 and ∃B̃ ≥ 0 such that ∥w∗ (k)∥ ≤ B̃
∀k ≥ 0, then ∃B ′′ ≥ 0 such that ∥w(k)∥ ≤ B ′′ ∀k ≥ 0 and ∀w(0) ∈ Rd+1 .
In general, it is not known what exact initial weights will lead to the bounded behavior. However, by
Theorem 5.1, we can conclude that it is not necessary to know the exact
initial weights which lead to the bounded behavior. This is because once
there exists a nonempty set of initial weights that leads to the bounded
behavior, then all initial weights will lead to the bounded behavior. The
result implies that engineers can employ arbitrary initial weights for the
training and the boundedness condition is independent of the choice of
the initial weights. This phenomenon is counter-intuitive to the general
understanding of symbolic dynamical systems because the system state
vectors of general symbolic dynamical systems may be bounded for some
initial system state vectors, but exhibit an unbounded behavior for other
initial system state vectors. It is worth noting that the weights of the
perceptron may exhibit complex behaviors, such as limit cycle or chaotic
behaviors.
$$q_j(n) \equiv \begin{cases} t(j) & j \ne \operatorname{mod}(n, N) \\ Q\big(w^T(n)\, x(n)\big) & \text{otherwise} \end{cases}$$

$\forall n \ge 0$ and for $j = 0, 1, \ldots, N-1$. If $\exists w^*(0) \in R^{d+1}$ and $\exists B \ge 0$ such that
$\|w^*(k)\| \le B$ $\forall k \ge 0$, then $\displaystyle\sum_{\forall n \ge 0} \big(t(j) - q_j(n)\big) = 0$ for $j = 0, 1, \ldots, N-1$.
Fig. 5.1. A block diagram for modeling the dynamics of the weights of the perceptron.
Lemma 5.3. Suppose that $q_1$ and $q_2$ are co-prime and $M$ and $N$ are
positive integers such that $q_1 M = q_2 N$. Then $w^*(n)$ is periodic with period
$M$ if and only if
$$\sum_{j=0}^{M-1} \frac{t(kM+j) - Q\big((w^*(kM+j))^T x(kM+j)\big)}{2}\, x(kM+j) = 0$$
for $k = 0, 1, \ldots, q_1 - 1$.
Although the range of the number of updates for the weights of the
perceptron to reach the limit cycle is equal to $\|C_{nq_2N+kN+j}\|_1$, it can be
reflected through $\|XC_{nq_2N+kN+j}\|_2$. Hence, Theorem 5.2 provides an idea
of the range of the number of updates for the weights of the perceptron to
reach the limit cycle, which is useful for the estimation of the computational
effort of the training algorithm. In order to estimate the bounds for
$\|C_{nq_2N+kN+j}\|_1$, denote $m'$ as the number of the differences between the
output of the perceptron based on w(0) and that based on w∗ (0), that is
$$m' = \sum_{\forall n} \left| \frac{Q\big((w^*(n))^T x(n)\big) - Q\big((w(n))^T x(n)\big)}{2} \right|.$$
Then we have the following result:
$$c \equiv \sum_{j=0}^{q_1 M - 2} \sum_{k=j+1}^{q_1 M - 1} (w^*(k) - w(k))^T\, \frac{Q\big((w^*(j))^T x(j)\big) - y(j)}{2}\, x(j), \text{ then}$$

$$m' \ge \frac{c + \displaystyle\sum_{k=0}^{q_1 - 1} \sum_{j=0}^{M-1} \big\| w^*(kM+j) - w(kM+j) \big\|_2}{\displaystyle\sum_{p=0}^{q_1 - 1} \sum_{i=0}^{M-1} \big\| w^*(pM+i) - w(pM+i) \big\|_K}.$$
generates 100 bounded feature vectors for transmission and the channel is
corrupted by an additive white Gaussian noise with zero mean and variance
equal to 0.5. These bounded feature vectors are used for testing. Figure
5.2 shows the distribution of these bounded testing feature vectors. On the
other hand, the 16 noise-free bounded training feature vectors are trained
using the conventional perceptron training algorithm.
5.6. Conclusion
when the downsampled sets of bounded training feature vectors are linearly
separable.
References
Chapter 6
Yi Zhao
Harbin Institute of Technology Shenzhen Graduate School
Shenzhen, China
[email protected]
The artificial neural network (ANN) is well known for its strong capability
to handle nonlinear dynamical systems, and this modeling technique
has been widely applied to the nonlinear time series prediction problem.
However, this application needs caution, as overfitting is a
serious problem endemic to neural networks. The conventional way
of avoiding overfitting is to avoid fitting the data too precisely, but
this cannot determine the exact model size directly. In this chapter,
we employ an alternative information theoretic criterion (minimum
description length) to determine the optimal architecture of neural
networks according to the equilibrium between the model parameters
and the model errors. When applied to various time series, we find that the
model with the optimal architecture both generalizes well and accurately
captures the underlying dynamics. To further examine the dynamical
character of the model residual of the optimal neural network, the
surrogate data method from the field of nonlinear dynamics is then
described and used to analyze the model residual, so as to determine whether
there is significant deterministic structure not captured by the optimal
neural network. Finally, a diploid model is proposed to improve the
prediction precision under the condition that the prediction error is
considered to be deterministic. The systematic framework composed
of the preceding modules is validated in sequence, and illustrated with
several computational and experimental examples.
Contents
6.1. Introduction
learning [12–14]. As Zhao and Small discussed [15], the validation set
required in the method of early stopping should be representative of all
points in the training set, and the training algorithm cannot converge too
fast. The weight decay method modifies the performance function from the
mean sum of squares of the model errors to the sum of the mean sum of
squares of the model errors and that of the network parameters. The key
problem of this method is that it is difficult to find the equilibrium between
these two parts.
For Bayesian learning, the optimal regularization parameter, i.e. the
balance mentioned above, is determined in an automatic way [14]. The
promising feature of Bayesian learning is that it can measure how many
network parameters are effectively used for the given application. Although
it gives an indicator of wasteful parameters, this method cannot build a
compact (or smaller) optimal neural network.
In this chapter, we describe an alternative approach, which estimates exactly the optimal structure of the neural network for time series prediction. We focus on feedforward multi-layer neural networks, which have been proven capable of modeling nonlinear functions. The criterion is a modification of, and competitive with, the original minimum description length (MDL) criterion [16].
This method is based on the well-known principle of minimum description length, rooted in the theory of algorithmic complexity [17]. J. Rissanen formulated model selection as a problem in data compression in a series of papers starting with [18]. Judd and Mees applied this principle to the selection of local linear models [19], and Small and Tse then extended the analysis to the selection of radial basis function models [20]. Zhao and Small generalized these results to neural network architectures [15]. Several other modifications of the minimum description length method, adapted to particular requirements, can be found in the literature [21–24].
Another technique, the surrogate data method, is also described in this chapter. Surrogate data tests are examples of Monte Carlo hypothesis tests [25]. The standard surrogate data method, suggested and implemented by Theiler et al., has been widely applied in the literature [26]. This method aims to determine whether the given time series has a statistically significant deterministic component, or is merely consistent with independent and identically distributed (i.i.d.) noise. We will introduce this idea at length in Section 6.4.
Here b0, bi, vi and wi,j are parameters, k represents the number of neurons in the hidden layer, and f(·) is the activation function of the neurons. As shown in Fig. 6.1, there is one hidden layer, although multiple hidden layers are also possible.
In Fig. 6.1, the hidden-layer activations are a = f(WP + b) and the network output is y = Vf(WP + b) + b0.
Fig. 6.1. The multilayer network is composed of the input, hidden and output layers
[27].
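As a small illustration, the forward pass of the one-hidden-layer network of Fig. 6.1 can be written as follows; the choice of tanh for the activation f(·) and the layer sizes are assumptions made only for this sketch.

```python
import numpy as np

def mlp_forward(P, W, b, V, b0):
    """Forward pass of the one-hidden-layer network of Fig. 6.1:
    hidden activations a = f(W P + b), output y = V a + b0, with f = tanh assumed."""
    a = np.tanh(W @ P + b)        # hidden layer, S neurons
    return V @ a + b0             # scalar output

# Illustrative sizes: R inputs, S hidden neurons.
R, S = 3, 5
rng = np.random.default_rng(1)
W, b = rng.normal(size=(S, R)), rng.normal(size=S)
V, b0 = rng.normal(size=S), 0.1
print(mlp_forward(rng.normal(size=R), W, b, V, b0))
```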
Intuitively, the typical behavior of E(k) and M(k) is that, as the model size increases, M(k) increases and E(k) decreases, corresponding respectively to more model parameters and more modeling ability (i.e. less prediction error). The minimum description length principle states that the optimal model is the one that minimizes D(k).
Let {y_i}_{i=1}^N be a time series of N measurements, and

M(k) = L(Λ_k) = Σ_{i=1}^k ln(c/δ_i),   (6.3)

E(k) = N/2 + (N/2) ln(2π/N) + (N/2) ln( Σ_{i=1}^N e_i² ).   (6.5)
2 N i=1
For the multilayer neural network described in the previous section, its parameters are completely denoted by Λ_k = {b0, b_i, v_i, w_{i,j} | i = 1, ..., k, j = 1, ..., d}. Of these parameters, the weights v_i and the bias b0 are linear; the remaining parameters w_{i,j} and b_i (i = 1, ..., k, j = 1, ..., d) are nonlinear. Fortunately, the activation function f(·) is approximately linear in the region of interest, so we suppose that the precision of the nonlinear parameters is similar to that of the linear ones and employ the linear parameters to determine the precision of the model, δ_i.
To account for the contribution of all linear and nonlinear parameters to M(k), instead of the contribution of only the linear ones, we define n_p(i) as the effective number of parameters associated with the ith neuron contributing to the description length of the neural network [15]. M(k) is then updated to

M(k) = Σ_{i=1}^k n_p(i) ln(γ/δ_i).   (6.6)
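A minimal sketch of how the criterion can be assembled and used for model selection is shown below. It only combines Equations (6.3)–(6.6); estimating the per-neuron precisions δ_i and the effective parameter counts n_p(i) follows [15] and is assumed to be done elsewhere.

```python
import numpy as np

def description_length(residuals, n_params_eff, gamma, delta):
    """Sketch of the MDL criterion D(k) = M(k) + E(k) from Eqs. (6.3)-(6.6).
    residuals    : model errors e_i on the training data
    n_params_eff : effective parameter counts n_p(i), one per neuron (length k)
    gamma, delta : scale constant and per-neuron parameter precisions delta_i (length k)
    """
    N = len(residuals)
    E = N / 2 + (N / 2) * np.log(2 * np.pi / N) + (N / 2) * np.log(np.sum(residuals ** 2))
    M = np.sum(n_params_eff * np.log(gamma / delta))
    return M + E

# Model selection: train networks with k = 1..k_max neurons and keep the k minimizing D(k);
# the training loop and the estimation of n_p(i) and delta_i are assumed to exist elsewhere.
```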
A. Computational experiments
Consider a reconstruction of the Rössler system with dynamic noise. The equations of the Rössler system are ẋ(t) = −y(t) − z(t), ẏ(t) = x(t) + a·y(t), ż(t) = b + z(t)[x(t) − c], with parameters a = 0.1, b = 0.1, c = 18 used to generate the chaotic data [37]. Here we set the iteration step ts to 0.25. By dynamic noise we mean that system noise is added to the state of the system at each iteration step.
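A short sketch of generating such data is given below; the initial condition and integration tolerances are illustrative, and only the noise-free case is shown (dynamic noise would additionally perturb the state between successive samples).

```python
import numpy as np
from scipy.integrate import solve_ivp

def rossler(t, s, a=0.1, b=0.1, c=18.0):
    x, y, z = s
    return [-y - z, x + a * y, b + z * (x - c)]

# Integrate and sample every ts = 0.25 time units.
ts, n_samples = 0.25, 2000
t_eval = np.arange(n_samples) * ts
sol = solve_ivp(rossler, (0.0, t_eval[-1]), [1.0, 1.0, 1.0], t_eval=t_eval, rtol=1e-8)
x_series = sol.y[0]          # x-component used for modeling and prediction
```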
Fig. 6.2. Description Length (solid line) of neural networks for modeling the Rössler
system (left panel) has the minimum point at five, and the fitted curve (dashed line)
attains the minimum at the same point. In the right panel the solid line is the mean
square error of training set and the dotted line is that of testing data.
We observe that both the DL and fitted curves denote that the optimal
number of neurons is five. That is, the neural network with five neurons is
the optimal model according to the principle of the minimum description
length. Mean square error of the testing set gives little help in indicating
the appearance of overfitting.
As a comparison, we chose three other networks with different numbers of neurons to perform a free-run prediction on the testing set. In free-run prediction, each predicted value is based on the current and previous predicted values, in contrast to so-called one-step prediction. The predicted x-component data are converted into three vectors, x(t), x(t + 3) and x(t + 5) (t ∈ [1, 395]), to construct the phase space shown
in Fig. 6.3. The network with five neurons exactly captures the dynamics
of the Rössler system but the neural network with more neurons is apt to
overfit. As the test data is chaotic Rössler data, the corresponding attractor
should be full of reconstructed trajectories, as shown in Fig. 6.3(b).
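The following sketch illustrates the two ingredients used here, free-run (iterated) prediction and the delay embedding (x(t), x(t+3), x(t+5)), for a generic one-step predictor; the window length and the model interface are assumptions of the sketch.

```python
import numpy as np

def free_run_prediction(model, history, n_steps, d):
    """Free-run (iterated) prediction: each new value is predicted from the current and
    previously *predicted* values, in contrast to one-step prediction from true data.
    `model(window) -> next value`; d is the length of the input window."""
    buf = list(history[-d:])
    out = []
    for _ in range(n_steps):
        nxt = model(np.asarray(buf[-d:]))
        out.append(nxt)
        buf.append(nxt)
    return np.array(out)

def delay_embed(x, lags=(0, 3, 5)):
    """Build the reconstructed phase-space vectors (x(t), x(t+3), x(t+5)) as in Fig. 6.3."""
    m = max(lags)
    return np.column_stack([x[l:len(x) - m + l] for l in lags])
```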
B. Experimental data
Figure 6.4 shows the description length and the mean square error for this application.
Fig. 6.4. Description length (solid line) and the fitted curve (dashed line) of the ECG data (left panel) locate the minima at 11 and 7 neurons respectively. The right panel shows the mean square error of the training set (solid line) and testing set (dotted line).
The DL curve suggests the optimal number of neurons is 11, but the
fitted curve estimates that the optimal number of neurons is 7. In this
experiment the mean square error (MSE) of testing data cannot give
valuable information regarding the possibility of overfitting either. We
thus deliberately select neural networks with both 7 and 11 neurons for
verification. As in the previous case, another 2 neural networks with 5
neurons and 17 neurons are also used for comparison. All the free-run
predictions obtained by these models are illustrated in Fig. 6.5.
Networks with 7 neurons predict the testing data accurately, but the prediction of the network with 11 neurons degrades over time due to overfitting. We thus confirm that the fitted curve is robust against fluctuations. The neural network with 7 neurons provides an adequate fit to both the training and the novel data, whereas networks with more neurons, such as 11, overfit. Although the DL curve alone may suggest the wrong optimal number of neurons, the fitted curve reflects the true tendency hidden in the original DL estimate. For the computational data, both the original DL curve and the fitted curve provide the same or similar estimates, whereas for the practical data the nonlinear curve fitting is shown to be necessary and effective.
Fig. 6.5. Free-run prediction (dotted line) and actual ECG data (solid line) for 4 neural
networks with 5, 7, 11 and 17 neurons.
The surrogate data method has been widely applied in the literature. It was
proposed to analyze whether the observed data is consistent with the given
null hypothesis. Here, the observed data is the previous model residual.
Hence, we employ the surrogate data method to answer the question raised in the last section. The null hypothesis, denoted NH0, is that the data are consistent with random noise.
In fact, this technique can also be used prior to modeling, to determine whether the data come from deterministic dynamics or are merely random noise. In the latter case it is pointless to attempt prediction, and stochastic-process tools may be more suitable for modeling such data. Consequently, applying the surrogate data method before modeling provides indirect evidence of the predictability of the data.
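A minimal sketch of such a pre-modeling surrogate test under NH0 is given below; random shuffling generates surrogates consistent with i.i.d. noise, and the discriminating statistic (the chapter uses the correlation dimension) is left as a user-supplied function.

```python
import numpy as np

def shuffle_surrogates(x, n_surrogates=50, rng=None):
    """Surrogates consistent with NH0 (i.i.d. noise): random permutations of the data,
    which destroy temporal/deterministic structure but keep the amplitude distribution."""
    rng = rng or np.random.default_rng()
    return np.array([rng.permutation(x) for _ in range(n_surrogates)])

def surrogate_test(x, statistic, n_surrogates=50, rng=None):
    """Compare the test statistic of the data with its distribution over surrogates.
    If the data value lies well outside the surrogate spread, NH0 can be rejected."""
    surr_vals = np.array([statistic(s) for s in shuffle_surrogates(x, n_surrogates, rng)])
    return statistic(x), surr_vals.mean(), surr_vals.std()
```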
If the test statistic value for the data is distinct from the ensemble of values
estimated for surrogates, then one can reject the given null hypothesis as
being a likely origin of the data. If the test statistic value for the data
is consistent with that for surrogates, then one may not reject the null
hypothesis. Consequently, surrogate data provides a rigorous way to apply statistical hypothesis testing to exclude experimental time series from certain families of dynamics. One can apply it to determine whether an observed time series is consistent with a given class of dynamics.
The procedure loops as follows: collect the time series, model it using a neural network with n neurons (initializing n = 1), and increment n until it reaches an adequate value.
Fig. 6.6. Systematic integration of neural networks, MDL, and the surrogate data
method to capture the dynamics of the observed time series.
Fig. 6.7. Analysis of the prediction error for the Rössler system (left panel) and human
pulse data (right panel). Stars are the correlation dimension of the original model residual
of the optimal model; the solid line is the mean of correlation dimension of 50 surrogates
at every embedding dimension; two dashed lines denote the mean plus one standard
deviation (the upper line) and the mean minus one standard deviation (the lower line);
two dotted lines are the maximum and minimum correlation dimension among these
surrogates.
From Fig. 6.8 we can observe that the correlation dimension of this new error is even further away from the maximal boundary of the correlation dimension of the surrogates between de = 2 and de = 6. So we can reject the given hypothesis that the original error is i.i.d. noise, which is consistent with our expectation. Again, numerical problems with the GKA are evident for de ≥ 7.
We notice that the surrogate data method can reveal the existence of deterministic structure in the model residual even if it is weak in comparison with the original signal. Conversely, if there is no difference between the model prediction and the data, the surrogate data method exhibits the corresponding consistency.
We apply the surrogate data method to the residual of the optimal
model (i.e. the prediction error). It estimates correlation dimension
for this prediction error and its surrogates under the given hypothesis
(NH0): the prediction error is consistent with i.i.d. noise. According to the results, we cannot reject the hypothesis that the prediction error is i.i.d. noise. We conclude that, with the test statistic at our disposal, the predictions achieved by the optimal neural network and the original data are indistinguishable.
Combination of neural networks, minimum description length and the
surrogate data method provides a comprehensive framework to handle time
series prediction. We feel that this technique is important and will be
applicable to a wide variety of real-world data.
Fig. 6.8. Application of the surrogate data method to the prediction error contaminated with deterministic dynamics. All curve and symbol conventions are the same as in the previous figure.
In the previous section, all predictions were made with a single model. However, the observed data may arise from several different dynamics, and a single model is then not adequate for such complicated data. Section 6.4.4 gave an example of this case. A single model usually captures only one dominating dynamic and fails to respond to other, relatively weak, dynamics also hidden in the data. Thus, integrated model structures are developed to model data containing multiple dynamics.
It has been well verified that prediction can be improved significantly by combining different methods [43, 44]. Several combining schemes have been proposed to enhance the ability of the neural network. Horn described a combined system of two feedforward neural networks to predict future values [45]. Lee et al. proposed an integrated model combining backpropagation neural networks and the self-organizing feature map (SOFM) for bankruptcy prediction [46]. In addition, the combination of the autoregressive integrated moving average (ARIMA) model and neural networks is often mentioned as a common practice for improving the prediction accuracy of practical time series [47, 48]. Both show that the prediction error of the combined model is smaller than that of a single model, which demonstrates the advantage of the combined approach. Inspired by these works, we develop the idea of a diploid model for predicting complicated data.
We construct the diploid neural network in series, which we define as the series-wound model (Fig. 6.9). The principle of the series-wound diploid model is that, after the prediction of model I, its prediction error is used as the input data for model II so as to compensate the error of model I. The final result is the sum of the outputs of both models.
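A sketch of this series-wound construction is given below; the delay-window length, the way model II's inputs are formed from the error series, and the fit/predict interface of the two sub-models are illustrative assumptions.

```python
import numpy as np

def embed(series, d):
    """Delay-embed a series into input windows of length d and next-value targets."""
    X = np.column_stack([series[i:len(series) - d + i] for i in range(d)])
    return X, series[d:]

def fit_series_wound(model_1, model_2, series, d=5):
    """Series-wound diploid model (Fig. 6.9), as a sketch: model I predicts the series,
    model II is fed model I's prediction error and learns to predict that error; the
    final prediction is the sum of the two outputs. The window length d is illustrative."""
    X, y = embed(series, d)
    model_1.fit(X, y)
    err = y - model_1.predict(X)              # prediction error of model I
    Xe, ye = embed(err, d)                    # model II works on the error series
    model_2.fit(Xe, ye)
    return model_1, model_2

def predict_series_wound(model_1, model_2, x_window, e_window):
    """One-step prediction: sum of model I's output and model II's error compensation."""
    return model_1.predict(x_window[None, :])[0] + model_2.predict(e_window[None, :])[0]
```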
6.6. Conclusion
Acknowledgments
References
Chapter 7
Yiguang Liu¹, Zhisheng You¹, Bingbing Liu² and Jiliu Zhou¹
¹Video and Image Processing Lab, School of Computer Science & Engineering, Sichuan University, China 610065, [email protected]
²Data Storage Institute, A*STAR, Singapore 138632
Contents
7.1 A Simple Recurrent Neural Network for Computing the Largest and Smallest
Eigenvalues and Corresponding Eigenvectors of a Real Symmetric Matrix . . 128
7.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.1.2 Analytic solution of RNN . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.1.3 Convergence analysis of RNN . . . . . . . . . . . . . . . . . . . . . . . 131
7.1.4 Steps to compute λ1 and λn . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1.5 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1.1. Preliminaries
All eigenvalues of A are denoted as λ1 ≥ λ2 ≥ · · · ≥ λn , and their
corresponding eigenvectors and eigensubspaces are denoted as µ1 , · · · , µn
and V1, ..., Vn. ⌈x⌉ denotes the smallest integer not less than x, and ⌊x⌋ denotes the largest integer not greater than x.
when z_i(t) ≠ 0,

λ_i − (1/z_i(t)) dz_i(t)/dt = Σ_{j=1}^n z_j²(t).

Therefore

λ_i − (1/z_i(t)) dz_i(t)/dt = λ_r − (1/z_r(t)) dz_r(t)/dt   (r = 1, ..., n),

so

z_i(t)/z_r(t) = [z_i(0)/z_r(0)] exp[(λ_i − λ_r) t]   (7.6)

for t ≥ 0. From Equation (7.2), we can also get

d/dt [1/z_r²(t)] = −2λ_r [1/z_r²(t)] + 2 Σ_{j=1}^n [z_j(t)/z_r(t)]².   (7.7)
i.e.

exp(2λ_r t)/z_r²(t) − 1/z_r²(0) = 2 Σ_{j=1}^n [z_j(0)/z_r(0)]² ∫_0^t exp(2λ_j τ) dτ,

so

x(t) = Σ_{i=1}^n z_i(t) S_i
     = z_r(t) Σ_{i=1}^n [z_i(t)/z_r(t)] S_i
     = Σ_{i=1}^n z_r(0) exp[(λ_i − λ_r) t] [z_i(0)/z_r(0)] exp(λ_r t) S_i / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ)
     = Σ_{i=1}^n z_i(0) exp(λ_i t) S_i / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ).   (7.9)
Theorem 7.2. For nonzero initial vector x (0) ∈ Vi , the equilibrium vector
ξ may be trivial or ξ ∈ Vi . If ξ ∈ Vi , ξ T ξ is the eigenvalue λi .
Proof: Since x (0) ∈ Vi , Vi ⊥Vj (1 ≤ j ≤ n, j ̸= i), the projections of
x (0) onto Vj (1 ≤ j ≤ n, j ̸= i) are zero. Therefore, from Equation (7.9),
it follows that
x(t) = Σ_{k=1}^n z_k(0) exp(λ_k t) S_k / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ)
     = z_i(0) exp(λ_i t) S_i / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ)
     ∈ V_i,
where z_i(0) ≠ 0 (1 ≤ i ≤ n).
x(t) = { z_1(0) S_1 + Σ_{i=2}^n z_i(0) exp[(λ_i − λ_1) t] S_i } / √( exp(−2λ_1 t) + (z_1²(0)/λ_1)[exp(2λ_1 t) − 1] exp(−2λ_1 t) + 2 Σ_{j=2}^n z_j²(0) ∫_0^t exp(2λ_j τ − 2λ_1 t) dτ ).

ξ = lim_{t→∞} x(t)
  = lim_{t→∞} { z_1(0) S_1 + Σ_{i=2}^n z_i(0) exp[(λ_i − λ_1) t] S_i } / √( exp(−2λ_1 t) + (z_1²(0)/λ_1)[exp(2λ_1 t) − 1] exp(−2λ_1 t) + 2 Σ_{j=2}^n z_j²(0) ∫_0^t exp(2λ_j τ − 2λ_1 t) dτ )
  = √(λ_1/z_1²(0)) z_1(0) S_1
  ∈ V_1.   (7.10)
and

ξᵀξ = √(λ_1/z_1²(0)) z_1(0) S_1ᵀ S_1 √(λ_1/z_1²(0)) z_1(0) = λ_1.   (7.11)

From Equations (7.10) and (7.11), the theorem is proved.
dx(t)/dt = −A x(t) − xᵀ(t) x(t) x(t).   (7.12)

For RNN (7.12), from Theorem 7.3 we know that if |x(0)| ≠ 0, x(0) ∉ V_i (i = 1, ..., n) and the equilibrium vector satisfies ∥ξ∥ ≠ 0, then ξ belongs to the eigenspace corresponding to the largest eigenvalue of −A. Since the eigenvalues of −A are −λ_n ≥ −λ_{n−1} ≥ ⋯ ≥ −λ_1, it follows that ξ ∈ V_n and ξᵀξ = −λ_n, i.e. λ_n = −ξᵀξ.

λ_n = −ξᵀξ + ⌈λ_1⌉.   (7.13)
Proof: Let λ′_n denote the smallest eigenvalue of (A − ⌈λ_1⌉I). Since A is positive definite, the smallest eigenvalue of (A − ⌈λ_1⌉I) is negative; therefore ξ is nonzero when A is replaced by −(A − ⌈λ_1⌉I), and from Theorem 7.4 it follows that

λ′_n = −ξᵀξ.   (7.14)

λ_1 = ξᵀξ + ⌊λ_n⌋.   (7.16)
ξ = lim_{t→∞} Σ_{i=1}^n z_i(0) exp(λ_i t) S_i / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ)
  = Σ_{i=1}^n z_i(0) lim_{t→∞} exp(λ_i t) S_i / √(1 + 2 Σ_{j=1}^n z_j²(0) lim_{t→∞} ∫_0^t exp(2λ_j τ) dτ)
  = 0.   (7.17)
ξ = lim_{t→∞} x(t)
  = lim_{t→∞} { Σ_{i=p}^k z_i(0) exp(λ_i t) S_i + Σ_{i=k+1}^n z_i(0) exp(λ_i t) S_i } / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ)
  = { z_p(0) S_p + lim_{t→∞} Σ_{i=p+1}^k z_i(0) exp[(λ_i − λ_p) t] S_i + lim_{t→∞} Σ_{i=k+1}^n z_i(0) exp[(λ_i − λ_p) t] S_i } / √( lim_{t→∞} { exp(−2λ_p t) + [z_p²(0)/λ_p][1 − exp(−2λ_p t)] + 2 Σ_{j=p+1}^n z_j²(0) ∫_0^t exp(2λ_j τ − 2λ_p t) dτ } )
  = z_p(0) √(λ_p) S_p / √(z_p²(0)).   (7.18)
2) If k + 1 ≤ p ≤ n, it follows that

ξ = lim_{t→∞} x(t)
  = lim_{t→∞} Σ_{i=p}^n z_i(0) exp(λ_i t) S_i / √(1 + 2 Σ_{j=1}^n z_j²(0) ∫_0^t exp(2λ_j τ) dτ)
  = 0.   (7.19)
From Equations (7.17), (7.18) and (7.19), we know that the theorem is
correct.
7.1.5. Simulation
We provide three examples to evaluate the method composed of RNN (7.2) and the five steps for computing the smallest eigenvalue and the largest eigenvalue.
B = [ 0.4116  0.3646  0.5513  0.6659  0.5836  0.6280  0.4079
      0.3646  0.4668  0.3993  0.4152  0.4059  0.4198  0.3919
      0.5513  0.3993  0.9862  0.4336  0.6732  0.2326  0.8687
      0.6659  0.4152  0.4336  0.5208  0.5422  0.5827  0.5871
      0.5836  0.4059  0.6732  0.5422  0.3025  0.8340  0.4938
      0.6280  0.4198  0.2326  0.5827  0.8340  0.9771  0.3358
      0.4079  0.3919  0.8687  0.5871  0.4938  0.3358  0.1722 ].
Fig. 7.3. The trajectories of the components of x when A is replaced with −B.
Fig. 7.4. The trajectories of λn (t) = −xT (t)x(t) when A is replaced with −B.
The trajectories of the components of x(t) and of λ_n(t) = −xᵀ(t)x(t) = −Σ_{i=1}^7 x_i²(t) are shown in Figs 7.3 and 7.4. By using Matlab to directly compute the eigenvalues of B, we get the "ground truth" largest eigenvalue λ′_1 = 3.6862 and the smallest eigenvalue λ′_n = −0.4997. Comparing the results obtained by the RNN (7.2) approach with the ground truth, the absolute differences are

λ′_1 − λ_1 = 0.0002,

and

λ′_n − λ_n = 0.0000.
This verifies that the RNN (7.2) can compute the largest eigenvalue and
the smallest eigenvalue of B successfully.
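For readers who wish to reproduce this kind of check numerically, the sketch below integrates the flow dx/dt = Ax − (xᵀx)x, assumed here to be the form of RNN (7.2), and compares the resulting eigenvalue estimates with a direct eigendecomposition. The test matrix is an arbitrary random symmetric matrix rather than B, and the shift of Eq. (7.13) would be needed if the matrix were positive definite.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rnn_equilibrium(A, x0, t_end=200.0):
    """Integrate dx/dt = A x - (x^T x) x (assumed form of RNN (7.2)) and return the
    equilibrium vector; xi^T xi then estimates the largest eigenvalue of A (if positive)."""
    f = lambda t, x: A @ x - (x @ x) * x
    sol = solve_ivp(f, (0.0, t_end), x0, rtol=1e-9, atol=1e-12)
    return sol.y[:, -1]

rng = np.random.default_rng(3)
B = rng.random((7, 7)); B = (B + B.T) / 2        # a random symmetric test matrix (illustrative)
xi = rnn_equilibrium(B, rng.normal(size=7))
print("largest eigenvalue:", xi @ xi, "vs", np.linalg.eigvalsh(B)[-1])
# A -> -B gives the smallest eigenvalue, assuming the smallest eigenvalue of B is negative.
xi_n = rnn_equilibrium(-B, rng.normal(size=7))
print("smallest eigenvalue:", -(xi_n @ xi_n), "vs", np.linalg.eigvalsh(B)[0])
```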
Example 2: Use Matlab to generate the positive definite matrix

C = [ 0.6495  0.5538  0.5519  0.5254
      0.5538  0.8607  0.7991  0.5583
      0.5519  0.7991  0.8863  0.4508
      0.5254  0.5583  0.4508  0.6775 ].

The ground-truth eigenvalues of C, computed directly by Matlab, are λ′ = (0.0420, 0.1496, 0.3664, 2.5160). From step 1 we get λ_1 = 2.5159; the trajectories of x(t) and of xᵀ(t)x(t), which approaches λ_1, are shown in Figs 7.5 and 7.6. For step 2, the trajectories of x(t) and λ_n = −xᵀ(t)x(t) are shown in Figs 7.7 and 7.8. The equilibrium vector is ξ_n = (−0.0010, −0.0028, 0.0025, 0.0015)ᵀ and λ_n = −ξ_nᵀξ_n = −1.7615 × 10⁻⁵,
Fig. 7.8. The trajectories of −xT (t)x(t) when A is replaced with −C.
dy(t)/dt = A z(t) − Σ_{j=1}^n [y_j²(t) + z_j²(t)] y(t),
dz(t)/dt = −A y(t) − Σ_{j=1}^n [y_j²(t) + z_j²(t)] z(t).   (7.22)

Denote

x(t) = y(t) + i z(t),   (7.23)

where i is the imaginary unit; Equation (7.22) is then equivalent to

dy(t)/dt + i dz(t)/dt = −Ai[y(t) + iz(t)] − Σ_{j=1}^n [y_j²(t) + z_j²(t)][y(t) + iz(t)],

i.e.

dx(t)/dt = −A x(t) i − xᵀ(t) x̄(t) x(t),   (7.24)

where x̄(t) denotes the complex conjugate of x(t). From Equations (7.21), (7.22) and (7.24), analyzing the convergence properties of Equation (7.24) is equivalent to analyzing those of RNN (7.21).
7.2.1. Preliminaries
Aξ = λ̄ i ξ.   (7.29)
for all t ≥ 0.
Proof: From Lemma 7.4, we know that S_1, S_2, ..., S_n construct an orthonormal basis of Cⁿ. Since x_k(t) = y_k(t) + i z_k(t), we thus have

x(t) = Σ_{k=1}^n (z_k(t) + i y_k(t)) S_k.   (7.32)
therefore

d z_k(t)/dt = λ_k z_k(t) − Σ_{j=1}^n [z_j²(t) + y_j²(t)] z_k(t),   (7.33)

d y_k(t)/dt = λ_k y_k(t) − Σ_{j=1}^n [z_j²(t) + y_j²(t)] y_k(t).   (7.34)

λ_k − (1/z_k(t)) dz_k(t)/dt = Σ_{j=1}^n [z_j²(t) + y_j²(t)] = λ_r − (1/z_r(t)) dz_r(t)/dt,

i.e.

d/dt [1/z_k²(t)] + 2λ_k [1/z_k²(t)] = 2 Σ_{j=1}^n [z_j²(t)/z_k²(t) + y_j²(t)/z_k²(t)].   (7.38)
Substituting Equations (7.35) and (7.37) into Equation (7.38), it follows that

d/dt [ exp(2λ_k t)/z_k²(t) ] = 2 Σ_{j=1}^n [ (z_j²(0) + y_j²(0))/z_k²(0) ] exp(2λ_j t),   (7.39)

so

z_k²(t′) = exp(2λ_k t′) z_k²(0) / ( 1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] ∫_0^{t′} exp(2λ_j t) dt ),

therefore

z_k(t′) = exp(λ_k t′) z_k(0) / √( 1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] ∫_0^{t′} exp(2λ_j t) dt ),   (7.40)

or

z_k(t′) = −exp(λ_k t′) z_k(0) / √( 1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] ∫_0^{t′} exp(2λ_j t) dt ).
when the equilibrium vector ξ exists, there exists the following relationship

ξ = lim_{t→∞} x(t).   (7.45)

ξ = lim_{t→∞} [z_m(0) + i y_m(0)] S_m / √( exp(−2λ_m t) + (1/λ_m)[z_m²(0) + y_m²(0)][1 − exp(−2λ_m t)] )
  ∈ V_m
λ1 ≥ 0.
ξ = lim_{t→∞} Σ_{k=1}^n [z_k(0) + i y_k(0)] exp(λ_k t) S_k / √(1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j τ) dτ)
  = lim_{t→∞} Σ_{k=1}^n [z_k(0) + i y_k(0)] S_k / √(1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] t)
  = 0,   (7.50)

and

ξᵀ ξ̄ i = 0 = λ_1 i,   (7.51)
ξ = lim_{t→∞} { [z_1(0) + i y_1(0)] S_1 + Σ_{k=2}^n [z_k(0) + i y_k(0)] exp[(λ_k − λ_1) t] S_k } / √( exp(−2λ_1 t) + 2[z_1²(0) + y_1²(0)] ∫_0^t exp[2λ_1(τ − t)] dτ + 2 Σ_{j=2}^n [z_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j τ − 2λ_1 t) dτ ),

thus

ξ = √( λ_1 / [z_1²(0) + y_1²(0)] ) [z_1(0) + i y_1(0)] S_1
  ∈ V_1.   (7.52)
and

ξᵀ ξ̄ i = √( λ_1 / [z_1²(0) + y_1²(0)] ) [z_1(0) + i y_1(0)] S_1ᵀ √( λ_1 / [z_1²(0) + y_1²(0)] ) [z_1(0) − i y_1(0)] S̄_1 i = λ_1 i.   (7.53)

If λ_2 i exists, from the notation λ_1 i, λ_2 i, ..., λ_n i we know that λ_2 i is the complex conjugate of λ_1 i, i.e.

λ_2 i = −λ_1 i,   (7.54)

and from Equation (7.54) and Lemma 7.2 it easily follows that

A ξ̄ = λ_2 i ξ̄,   (7.55)

so ξ̄ is the eigenvector corresponding to λ_2 i.
From Equations (7.50), (7.51), (7.52), (7.53), (7.54) and (7.55), it is concluded that the theorem is correct.
Since λ_1 ≥ 0 and λ_1^r ≥ 0, the supposition λ_1^r ≥ ⋯ ≥ λ_q^r ≥ 0 > λ_{q+1}^r ≥ ⋯ ≥ λ_n^r is reasonable. Suppose the first nonzero component of x(0) is x_p(0) ≠ 0 (1 ≤ p ≤ n). Then there are two cases:
1) If 1 ≤ p ≤ q, it follows that

ξ = lim_{t→∞} x(t)
  = lim_{t→∞} { Σ_{k=p}^q [y_k(0) + i z_k(0)] exp(λ_k^r t) S_k + Σ_{k=q+1}^n [y_k(0) + i z_k(0)] exp(λ_k^r t) S_k } / √(1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^r τ) dτ)
  = { [y_p(0) + i z_p(0)] S_p + lim_{t→∞} Σ_{k=p+1}^n [y_k(0) + i z_k(0)] exp[(λ_k^r − λ_p^r) t] S_k } / √( lim_{t→∞} { exp(−2λ_p^r t) + [z_p²(0) + y_p²(0)][1 − exp(−2λ_p^r t)]/λ_p^r + 2 Σ_{j=p+1}^n [z_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^r τ − 2λ_p^r t) dτ } )
  = [y_p(0) + i z_p(0)] √(λ_p^r) S_p / √(z_p²(0) + y_p²(0)),
so

|ξ|² = ξᵀ ξ̄ = { [y_p(0) + i z_p(0)] √(λ_p^r) / √(z_p²(0) + y_p²(0)) } S_pᵀ { [y_p(0) − i z_p(0)] √(λ_p^r) / √(z_p²(0) + y_p²(0)) } S̄_p = λ_p^r.   (7.56)
2) If q + 1 ≤ p ≤ n, it follows that

ξ = lim_{t→∞} x(t)
  = lim_{t→∞} Σ_{k=p}^n [y_k(0) + i z_k(0)] exp(λ_k^r t) S_k / √(1 + 2 Σ_{j=1}^n [z_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^r τ) dτ)
  = 0.   (7.57)

Since the two sets {λ_1, λ_2, ..., λ_n} and {λ_1^r, λ_2^r, ..., λ_n^r} are the same, it follows from Equations (7.56) and (7.57) that |ξ|² equals some λ_k > 0 (1 ≤ k ≤ n) or ξ is zero; thus the theorem is proved.
7.2.4. Simulation
In order to evaluate RNN (7.21), a numerical simulation is provided, and
Matlab is used to generate the simulation.
Example: For Equation (7.24), A is randomly generated as
A = [  0       −0.0511   0.2244  −0.0537   0.0888  −0.1191   0.1815
       0.0511   0         0.1064   0.2901  −0.0073   0.0263   0.2322
      −0.2244  −0.1064    0        0.1406  −0.0206   0.1402  −0.0910
       0.0537  −0.2901   −0.1406   0       −0.2708  −0.0640  −0.2373
      −0.0888   0.0073    0.0206   0.2708   0        0.1271   0.1094
       0.1191  −0.0263   −0.1402   0.0640  −0.1271   0       −0.1200
      −0.1815  −0.2322    0.0910   0.2373  −0.1094   0.1200   0 ],

and the initial value is

x(0) = (0.0511 − 0.2244i, −0.1064i, 0.1064, 0.2901 + 0.1406i, −0.0073 − 0.0206i, 0.0263 + 0.1402i, 0.2322 − 0.0910i)ᵀ.
The resulting equilibrium vector is

ξ = (−0.1355 − 0.1563i, 0.0716 − 0.2914i, 0.2410 − 0.0716i, 0.2840 + 0.3073i, 0.1556 − 0.2338i, 0.1076 + 0.1511i, 0.3334 − 0.1507i)ᵀ,

λ_1^s i = ξᵀ ξ̄ i = 0.6182 i.
μ_1 = (−0.2630 − 0.0084i, −0.2105 − 0.3184i, 0.1411 − 0.2869i, 0.5322, −0.0841 − 0.3471i, 0.2340 + 0.0299i, 0.1470 − 0.4415i)ᵀ,
μ_3 = (0.5318, 0.0207 − 0.4494i, −0.0997 + 0.3784i, 0.2835 − 0.1898i, −0.2272 − 0.0857i, −0.1797 − 0.3099i, 0.2063 + 0.1246i)ᵀ,
μ_5 = (0.1852 + 0.2252i, 0.0424 + 0.0809i, 0.2436 + 0.3907i, −0.2443 + 0.1977i, −0.2995 + 0.1881i, 0.4992, 0.0769 − 0.4644i)ᵀ,
μ_7 = (0.3549, −0.5360, −0.2557, −0.0568, 0.6138, 0.3656, 0.0880)ᵀ,
μ_2 = μ̄_1, μ_4 = μ̄_3, μ_6 = μ̄_5, and the eigenvalues are λ_1 i = 0.6184i, λ_2 i = −0.6184i, λ_3 i = 0.3196i, λ_4 i = −0.3196i, λ_5 i = 0.0288i, λ_6 i = −0.0288i and λ_7 i = 0.
When λs1 i is compared with λ1 i, the difference is
Δλ_1 = λ_1^s i − λ_1 i = 0.6182i − 0.6184i = −0.0002i.
The modulus value of ∆λ1 is very small, which means that the eigenvalue
computed by RNN (7.21) is very close to the true value. This verifies the
validity of RNN (7.21) for computing eigenvalues. On the other side, it
ξ ∈ V1 .
dv(t)/dt = [A′ − B(t)] v(t)   (7.58)

where t ≥ 0, v(t) ∈ R^{2n},

A′ = ( I₀   A
      −A   I₀ ),   B(t) = ( IUₙv(t)   −IUₙv(t)
                            IUₙv(t)    IUₙv(t) ).
dx(t)/dt + i dy(t)/dt = −Ai[x(t) + iy(t)] − Σ_{j=1}^n [x_j(t) + y_j(t)i][x(t) + iy(t)],

that is,

dz(t)/dt = −A z(t) i − Σ_{j=1}^n z_j(t) z(t).   (7.61)

From Equations (7.58), (7.59) and (7.61), analyzing the convergence properties of Equation (7.61) is equivalent to analyzing those of RNN (7.58).
Actually, RNN (7.58) can easily be transformed into another neural network, with only the matrix B(t) changed to

B(t) = ( −IUₙv(t)    IUₙv(t)
         −IUₙv(t)   −IUₙv(t) ),

while all the other components remain the same as in RNN (7.58). Similarly to what is discussed in this section, the theoretical analysis and numerical evaluation of this new form of neural network can be found in detail in [25].
7.3.1. Preliminaries
We use preliminaries similar to those of Section 7.2. We therefore only state the lemmas here; please refer back to Lemmas 7.2, 7.3 and 7.4 for their proofs.
for t ≥ 0.
Proof: By using the ideas in proving Theorem 7.9, this theorem can
be proved.
Σ_{l=1}^n x_l(0) ∫_0^t exp(λ_l τ) dτ < 0.

Let F(t) = 1 + Σ_{l=1}^n x_l(0) ∫_0^t exp(λ_l τ) dτ. Evidently F(t) < 1 and Ḟ(t) = Σ_{l=1}^n x_l(0) exp(λ_l t) < 0; that is, there must exist a time t_g at which

1 + Σ_{l=1}^n x_l(0) ∫_0^{t_g} exp(λ_l τ) dτ = 0,

i.e. at time t_g RNN (7.58) enters an overflow state. This theorem is proved.
Theorem 7.15. If z(0) ∈ V_m and x(0) > 0, then, when ξ ≠ 0, i Σ_{k=1}^n ξ_k = λ_m i.

Proof: Lemma 7.3 gives V_m ⊥ V_k (1 ≤ k ≤ n, k ≠ m), so z_k(0) = 0 (1 ≤ k ≤ n, k ≠ m). Using Theorems 7.1 and 7.8 gives

Σ_{k=1}^n ξ_k = lim_{t→∞} Σ_{k=1}^n z_k(0) exp(λ_k t) / ( 1 + Σ_{l=1}^n z_l(0) ∫_0^t exp(λ_l τ) dτ )
             = lim_{t→∞} z_m(0) exp(λ_m t) / ( 1 + z_m(0) ∫_0^t exp(λ_m τ) dτ ).   (7.69)

When λ_m ≤ 0, it easily follows that Σ_{k=1}^n ξ_k = 0, i.e. ξ = 0, which contradicts ξ ≠ 0. So λ_m > 0. In this case, using Equation (7.69), it follows that
Σ_{k=1}^n ξ_k = lim_{t→∞} z_m(0) / ( exp(−λ_m t) + (1/λ_m) z_m(0)[1 − exp(−λ_m t)] ) = λ_m,

i.e. i Σ_{k=1}^n ξ_k = λ_m i. This theorem is proved.
Theorem 7.16. If z(0) ∉ V_m (m = 1, ..., n) and x_l(0) ≥ 0 (l = 1, 2, ..., n), then ξ ∈ V_1 and Σ_{k=1}^n ξ_k i = λ_1 i.
Proof: Since z(0) ∉ V_m and by Equation (7.62), z_l(0) ≠ 0. From x_l(0) ≥ 0 and Theorem 7.2, it can be concluded that no overflow occurs. Using Theorems 7.1 and 7.8 gives

ξ = lim_{t→∞} [ z_1(0) exp(λ_1 t) S_1 + z_2(0) exp(λ_2 t) S_2 + ⋯ + z_n(0) exp(λ_n t) S_n ] / ( 1 + Σ_{l=1}^n z_l(0) ∫_0^t exp(λ_l τ) dτ )
  = lim_{t→∞} [ z_1(0) S_1 + z_2(0) exp[(λ_2 − λ_1) t] S_2 + ⋯ + z_n(0) exp[(λ_n − λ_1) t] S_n ] / ( exp(−λ_1 t) + (1/λ_1) z_1(0)[1 − exp(−λ_1 t)] + Σ_{l=2}^n z_l(0) ∫_0^t exp(λ_l τ − λ_1 t) dτ )
  = λ_1 S_1 ∈ V_1,

and obviously Σ_{k=1}^n ξ_k i = λ_1 i.
Since Σ_{j=1}^n ξ̄_j equals the conjugate of Σ_{j=1}^n ξ_j, which is λ_1 (real), and λ_2 = −λ_1, Equation (7.70) gives

A ξ̄ i = −λ_2 ξ̄,

i.e.

A ξ̄ = λ_2 i ξ̄.
Proof: Since z(0) ∈ V_{l1} ⊕ V_{l2} ⊕ ⋯ ⊕ V_{lp}, there exists at least one component z_k ≠ 0 for k ∈ {l_1, l_2, ..., l_p}, and z_{k′} = 0 for k′ ∉ {l_1, l_2, ..., l_p}. Let λ_l denote λ_{l1} = λ_{l2} = ⋯ = λ_{lp}. Using Theorem 7.1 and Equation (7.63) gives

Σ_{k=1}^n ξ_k = lim_{t→∞} Σ_{k=1}^n z_k(0) exp(λ_k t) / ( 1 + Σ_{l=1}^n z_l(0) ∫_0^t exp(λ_l τ) dτ )
  = lim_{t→∞} [ Σ_{k∈{l1,...,lp}} z_k(0) exp(λ_k t) + Σ_{k∉{l1,...,lp}} z_k(0) exp(λ_k t) ] / [ 1 + Σ_{k∈{l1,...,lp}} z_k(0) ∫_0^t exp(λ_k τ) dτ + Σ_{k∉{l1,...,lp}} z_k(0) ∫_0^t exp(λ_k τ) dτ ]
  = lim_{t→∞} Σ_{k∈{l1,...,lp}} z_k(0) exp(λ_k t) / [ 1 + Σ_{k∈{l1,...,lp}} z_k(0) ∫_0^t exp(λ_k τ) dτ ].   (7.71)

If λ_l = 0, Equation (7.71) gives

Σ_{k=1}^n ξ_k = lim_{t→∞} Σ_{k∈{l1,...,lp}} z_k(0) / [ 1 + Σ_{k∈{l1,...,lp}} z_k(0) t ] = 0 = λ_l.

From Equations (7.71) and (7.72), this theorem is proved.
7.3.4. Simulation
In order to evaluate RNN (7.58), we provide two examples here. One will
demonstrate the effectiveness of RNN (7.58), and the other will illustrate
that the convergence behavior remains good for large dimension matrices.
Example 1: Let

A = [  0       −0.2564   0.2862  −0.0288   0.4294
       0.2564   0        −0.1263  −0.1469  −0.1567
      −0.2862   0.1263    0       −0.2563  −0.1372
       0.0288   0.1469    0.2563   0       −0.0657
      −0.4294   0.1567    0.1372   0.0657   0 ],

and

λ_1 i = i Σ_{k=1}^5 ξ_k = 0.6311 i.
Fig. 7.14. The trajectories of the real and imaginary parts of Σ_{i=1}^5 z_i(t).
S_1 = (0.6315, −0.2274 − 0.2944i, −0.2217 + 0.3406i, 0.0143 + 0.1155i, 0.0130 + 0.5329i)ᵀ,  S_2 = S̄_1,
S_3 = (0.0024 + 0.2991i, 0.1267 + 0.2947i, −0.0802 + 0.5391i, 0.6904, −0.0284 − 0.1817i)ᵀ,  S_4 = S̄_3,
S_5 = (0.1533, 0.7193, −0.2749, −0.1397, 0.6033)ᵀ.
When λ_1 i is compared with λ̄_1 i, it can be seen that their difference is trivial. Therefore, RNN (7.58) is effective for computing the largest-modulus eigenvalues λ̄_1 i and λ̄_2 i. Since ξᵀS̄_2 = ξᵀS̄_3 = ξᵀS̄_4 = ξᵀS̄_5 = 0 and ξᵀS̄_1 = 0.2513 + 0.8327i, ξ is equivalent to S_1. As ξ̄ᵀS̄_1 = ξ̄ᵀS̄_3 = ξ̄ᵀS̄_4 = ξ̄ᵀS̄_5 = 0 and ξ̄ᵀS̄_2 = 0.2513 − 0.8327i, ξ̄ is equivalent to S_2. Therefore the largest-modulus eigenvalues (λ̄_1 i, λ̄_2 i) and their corresponding eigenvectors are computed.
The above results are obtained with the initial vector z(0) = (0.1520i, 0.2652i, 0.1847i, 0.0139i, 0.2856i)ᵀ. When the initial vector is replaced with other random values, the results change only trivially and remain very close to the corresponding ground-truth values.
Example 2: Let A ∈ R^{85×85}, A(j, i) = −A(i, j) = −(sin(i) + sin(j))/2, and z(0) = (sin(1)i, sin(2)i, ..., sin(85)i)ᵀ. The calculated largest
modulus eigenvalue is 19.6672i which is very close to the corresponding
Fig. 7.16. The trajectories of the real and imaginary parts of Σ_{i=1}^{85} z_i(t).
ground truth value 19.6634i. Convergence behaviors are shown in Figs 7.15
and 7.16. From these figures, we can see that the iteration numbers do
not increase with the dimension, and that this method is still effective even
when the dimension number is very large.
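The ground-truth value can be checked independently with a few lines of linear algebra; the convention used below for the upper triangle of A is an assumption, since the text only specifies A(j, i) = −A(i, j).

```python
import numpy as np

# Independent check for Example 2: A(j, i) = -A(i, j) = -(sin(i) + sin(j))/2.
# Here the upper triangle (i < j) is taken as A(i, j) = (sin(i) + sin(j))/2, an assumption.
n = 85
i, j = np.meshgrid(np.arange(1, n + 1), np.arange(1, n + 1), indexing="ij")
A = np.triu((np.sin(i) + np.sin(j)) / 2, k=1)
A = A - A.T                                   # anti-symmetric matrix
eig = np.linalg.eigvals(A)                    # eigenvalues are (numerically) purely imaginary
print("largest modulus eigenvalue:", np.max(np.abs(eig.imag)), "i")
```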
synchronously. For the standard power method [10, 17], by contrast, which is serial in nature, high performance can be achieved on a serial computing platform. To compare RNN (7.58) with the standard power method, let A ∈ R^{15×15}, A(j, i) = −A(i, j) = −(cos(i) + sin(j))/2 for i ≠ j and A(i, i) = 0. The simulated results corresponding to RNN (7.58) and the standard power method are shown in Figs 7.17, 7.18, 7.19 and 7.20, respectively. From Fig. 7.17, we can see that RNN (7.58) quickly reaches a unique equilibrium state, while the standard power method falls into a cyclic procedure, as can be seen in Fig. 7.19. The eigenvalue calculated by RNN (7.58) is 4.0695i, and that by the standard power method is 4.0693i, both of which are very close to the ground-truth value 4.0693i. The convergence behaviors of these two methods are shown in Figs 7.18 and 7.20, respectively. From these figures we can also see that the convergence rates of the two methods are similar: after around five iterations, the calculated values approach the ground-truth eigenvalue.
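For reference, a generic version of the standard power method used in this comparison is sketched below; as noted above, for an anti-symmetric matrix the dominant eigenvalues ±λ₁i have equal modulus, so the iterates cycle rather than converge to a single eigenvector (cf. Fig. 7.19).

```python
import numpy as np

def power_method(A, n_iter=100, rng=None):
    """Generic power method: repeatedly apply A and normalize. With a single dominant
    eigenvalue the Rayleigh quotient converges to it; for an anti-symmetric matrix the
    conjugate pair +/-lambda*i has equal modulus, so the iterates keep cycling."""
    rng = rng or np.random.default_rng()
    v = rng.normal(size=A.shape[0]) + 1j * rng.normal(size=A.shape[0])
    estimates = []
    for _ in range(n_iter):
        w = A @ v
        estimates.append(np.vdot(v, w) / np.vdot(v, v))   # Rayleigh quotient per iteration
        v = w / np.linalg.norm(w)
    return np.array(estimates)
```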
Fig. 7.17. The trajectories of the real parts of zi (t) for 1 ≤ i ≤ 15 by RNN (7.58).
Fig. 7.18. The trajectory of Σ_{i=1}^{15} z_i(t) by RNN (7.58).
Fig. 7.19. The trajectories of the components of z(t) by the standard power method.
References [13, 14] involve a neural network model which is different from RNN (7.58), and the scheme for extracting eigen-parameters in these two references would fail when the matrix has repeated eigenvalues and the initial state is not specially restricted. References [8, 21, 22] are only suitable for a real symmetric matrix. RNN (7.58) is more concise than the model in reference [23], since RNN (7.58) does not include the computation of |z(t)|². Briefly speaking, compared with the published methods, ours offers some novelties in several respects.
Fig. 7.20. The trajectory of the variable which converges to the largest modulus
eigenvalue by the standard power method.
Let z(t) = x(t) + i y(t), with i denoting the imaginary unit; it follows easily from Equation (7.74) that

dx(t)/dt + i dy(t)/dt = −Ai[x(t) + iy(t)] − Σ_{j=1}^n [x_j²(t) + y_j²(t)][x(t) + iy(t)],
i.e.

dz(t)/dt = −A z(t) i − zᵀ(t) z̄(t) z(t),   (7.76)
where z̄ (t) denotes the complex conjugate vector of z (t). Obviously,
Equation (7.76) is a complex differential system. A set of ordinary
differential equations is just a model which may approximate the real
behavior of some neural networks, although there are differences between
them. In the following subsections, we will discuss the convergence
properties of Equation (7.76) instead of RNN (7.73).
7.4.1. Analytic expression of |z(t)|²

All eigenvalues of A are denoted as λ_1^R + λ_1^I i, λ_2^R + λ_2^I i, ..., λ_n^R + λ_n^I i, and the corresponding eigenvectors are denoted as μ_1, ..., μ_n.
For a general real matrix A, there are two cases for μ_1, ..., μ_n:
1) when A is rank deficient, some of λ_1^R + λ_1^I i, λ_2^R + λ_2^I i, ..., λ_n^R + λ_n^I i may be zero; when λ_j^R + λ_j^I i = 0, μ_j can be chosen arbitrarily so that μ_1, ..., μ_n still construct a basis of Cⁿ;
2) when A is of full rank, μ_1, ..., μ_n are determined by A. Although they may not be orthogonal to each other, they still construct a basis of Cⁿ.
Let S_k = μ_k / |μ_k|; then S_1, ..., S_n construct a normalized basis of Cⁿ.
Theorem 7.18. Let z_k(t) = x_k(t) + i y_k(t) denote the projection of z(t) onto S_k. The analytic expression of |z(t)|² is

|z(t)|² = Σ_{k=1}^n exp(2λ_k^I t)[x_k²(0) + y_k²(0)] / ( 1 + 2 Σ_{j=1}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^I τ) dτ )   (7.77)

for t ≥ 0.
Proof: The proof of this theorem is similar to that of Theorem 7.9 and is omitted here.
Thus,

|ξ| = lim_{t→∞} |z(t)| = lim_{t→∞} √( Σ_{k=1}^n exp(2λ_k^I t)[x_k²(0) + y_k²(0)] / ( 1 + 2 Σ_{j=1}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^I τ) dτ ) ).   (7.80)
Since each eigenvalue is real, so
Theorem 7.20. Denote λ_m^I = max_{1≤k≤n} λ_k^I. If λ_m^I > 0, then ξᵀ ξ̄ = λ_m^I.
Proof: Using Equation (7.78) and Theorem 7.1 gives

ξᵀ ξ̄ = lim_{t→∞} |z(t)|² = lim_{t→∞} Σ_{k=1}^n exp(2λ_k^I t)[x_k²(0) + y_k²(0)] / ( 1 + 2 Σ_{j=1}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^I τ) dτ ),
i.e.

ξᵀ ξ̄ = lim_{t→∞} { exp(2λ_m^I t)[x_m²(0) + y_m²(0)] + Σ_{k=1,k≠m}^n exp(2λ_k^I t)[x_k²(0) + y_k²(0)] } / { 1 + 2[x_m²(0) + y_m²(0)] ∫_0^t exp(2λ_m^I τ) dτ + 2 Σ_{j=1,j≠m}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^I τ) dτ }
     = lim_{t→∞} { [x_m²(0) + y_m²(0)] + Σ_{k=1,k≠m}^n exp[2(λ_k^I − λ_m^I) t][x_k²(0) + y_k²(0)] } / { exp(−2λ_m^I t) + (1/λ_m^I)[x_m²(0) + y_m²(0)][1 − exp(−2λ_m^I t)] + 2 Σ_{j=1,j≠m}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^I τ − 2λ_m^I t) dτ }
     = λ_m^I.
This theorem is proved. From this theorem, we know that when the
maximal imaginary part of eigenvalues is positive, RNN (7.73) will converge
to a nonzero equilibrium vector. In addition, the square modulus of the
vector is equal to the largest imaginary part.
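The following sketch illustrates this behavior by integrating Equation (7.76) for a random real matrix and comparing |ξ|² with the largest imaginary part of the eigenvalues; the matrix, the initial state and the integration horizon are purely illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def largest_imag_part(A, t_end=300.0, rng=None):
    """Integrate dz/dt = -A z i - (z^T z_bar) z of Eq. (7.76); by Theorem 7.20, the squared
    modulus of the equilibrium vector equals the largest imaginary part of the eigenvalues
    of A, provided that largest imaginary part is positive."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    z0 = rng.normal(size=n) + 1j * rng.normal(size=n)
    def f(t, u):                       # split real/imaginary parts for the real-valued solver
        z = u[:n] + 1j * u[n:]
        dz = -1j * (A @ z) - np.vdot(z, z).real * z
        return np.concatenate([dz.real, dz.imag])
    sol = solve_ivp(f, (0.0, t_end), np.concatenate([z0.real, z0.imag]), rtol=1e-9)
    z_inf = sol.y[:n, -1] + 1j * sol.y[n:, -1]
    return np.vdot(z_inf, z_inf).real   # z^T z_bar = |z|^2

A = np.random.default_rng(4).normal(size=(7, 7))        # a random real matrix (illustrative)
print(largest_imag_part(A), "vs", np.max(np.linalg.eigvals(A).imag))
```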
Theorem 7.21. If A is replaced by A′ = ( A  0 ; 0  A ), then

|z(t)|² = Σ_{k=1}^n exp(2λ_k^R t)[x_k²(0) + y_k²(0)] / ( 1 + 2 Σ_{j=1}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^R τ) dτ ).
Proof: When A′ = ( A  0 ; 0  A ), in a similar way to the derivation of Equation (7.76), RNN (7.73) is transformed into

dz(t)/dt = A z(t) − zᵀ(t) z̄(t) z(t).   (7.82)

Using the notation of x_k(t) and y_k(t), we have

d x_k(t)/dt = λ_k^R x_k(t) − λ_k^I y_k(t) − Σ_{j=1}^n [x_j²(t) + y_j²(t)] x_k(t),   (7.83)

d y_k(t)/dt = λ_k^I x_k(t) + λ_k^R y_k(t) − Σ_{j=1}^n [x_j²(t) + y_j²(t)] y_k(t).   (7.84)
d/dt [x_k²(t) + y_k²(t)] = 2λ_k^R [x_k²(t) + y_k²(t)] − 2 Σ_{j=1}^n [x_j²(t) + y_j²(t)][x_k²(t) + y_k²(t)].   (7.85)

The remaining steps follow the proof of Theorem 7.9.
This theorem provides a way to extract the maximal real part of all eigenvalues by rearranging the connection weights.
Theorem 7.22. Let λ_m^R = max_{1≤k≤n} λ_k^R. When A is replaced by A′ = ( A  0 ; 0  A ), if λ_m^R ≤ 0, then ξᵀ ξ̄ = 0; if λ_m^R > 0, then ξᵀ ξ̄ = λ_m^R.
ξᵀ ξ̄ = 0;   (7.87)

if λ_m^R = 0, it follows that

ξᵀ ξ̄ = lim_{t→∞} { exp(2λ_m^R t)[x_m²(0) + y_m²(0)] + Σ_{k=1,k≠m}^n exp(2λ_k^R t)[x_k²(0) + y_k²(0)] } / { 1 + 2[x_m²(0) + y_m²(0)] ∫_0^t exp(2λ_m^R τ) dτ + 2 Σ_{j=1,j≠m}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^R τ) dτ }
     = lim_{t→∞} { [x_m²(0) + y_m²(0)] + Σ_{k=1,k≠m}^n exp(2λ_k^R t)[x_k²(0) + y_k²(0)] } / { 1 + 2[x_m²(0) + y_m²(0)] t + 2 Σ_{j=1,j≠m}^n [x_j²(0) + y_j²(0)] ∫_0^t exp(2λ_j^R τ) dτ }
     = 0,   (7.88)

and if λ_m^R > 0, following the deductive procedure of Theorem 7.3, it follows that

ξᵀ ξ̄ = λ_m^R.   (7.89)
λ̄_m^I = ξᵀ ξ̄ = 0.5397. When λ_m^I is compared with λ̄_m^I, it can easily be seen that they are very close. The trajectories of |z_k(t)| (k = 1, 2, ..., 7) and of zᵀ(t)z̄(t), which approaches λ_m^I, are shown in Figs 7.21 and 7.22.
When A′ = ( A  0 ; 0  A ), the computed maximum real part gives

Δλ_m^R = |λ_m^R − λ̄_m^R| = |2.9506 − 2.9504| = 0.0002,
The calculated results are λ̄_m^I = 0.3244 and λ̄_m^R = 4.6944. The convergence behaviors of |z_k(t)| and zᵀ(t)z̄(t) in searching for λ̄_m^I and λ̄_m^R are shown in Figs 7.25 to 7.28. Comparing λ̄_m^I and λ̄_m^R with the corresponding true values λ_m^I = 0.3246 and λ_m^R = 4.7000, we find that each pair is very close. From Figs 7.25 to 7.28, we can also see that the system reaches the equilibrium state quickly even though the dimensionality has reached 50.
In order to further show the effectiveness of this approach when the dimensionality becomes large, let n = 100. The corresponding convergence behaviors are shown in Figs 7.29 to 7.32. λ̄_m^R = 7.3220 remains very close to λ_m^R = 7.3224, and λ̄_m^I = 0.2541 is very close to λ_m^I = 0.2540. From Figs 7.25 to 7.32, we can see that the iteration number at which the system enters the equilibrium state is not sensitive to the dimensionality.
7.5. Conclusions
References
[1] N. Li, A matrix inverse eigenvalue problem and its application, Linear
Algebra And Its Applications. 266(15), 143–152, (1997).
[2] F.-L. Luo, R. Unbehauen, and A. Cichocki, A minor component analysis
algorithm, Neural Networks. 10(2), 291–297, (1997).
[3] C. Ziegaus and E. Lang, A neural implementation of the JADE algorithm (nJADE) using higher-order neurons, Neurocomputing. 56, 79–100, (2004).
[4] F.-L. Luo, R. Unbehauen, and Y.-D. Li, A principal component analysis
algorithm with invariant norm, Neurocomputing. 8(2), 213–221, (1995).
[5] J. Song and Y. Yam, Complex recurrent neural network for computing the inverse and pseudo-inverse of the complex matrix, Applied Mathematics and Computation. 93(2–3), 195–205, (1998).
[23] Y. Liu, Z. You, and L. Cao, A functional neural network for computing
the largest modulus eigenvalues and their corresponding eigenvectors of an
anti-symmetric matrix, Neurocomputing. 67, 384–397, (2005).
[24] A neural network for computing eigenvectors and eigenvalues, Biological
Cybernetics. 65, 211–214, (1991).
[25] Y. Liu, Z. You, and L. Cao, A functional neural network computing some
eigenvalues and eigenvectors of a special real matrix, Neural Networks. (18),
1293–1300, (2005).
[26] Y. Nakamura, K. Kajiwara, and H. Shiotani, On an integrable discretization of the Rayleigh quotient gradient system and the power method with a shift, Journal of Computational and Applied Mathematics. 96, 77–90, (1998).
[27] J. M. Vegas and P. J. Zufiria, Generalized neural networks for spectral
analysis: dynamics and liapunov functions, Neural Networks. 17, 233–245,
(2004).
Chapter 8
Contents
8.1. Introduction
the so-called “Teach Method”, are suitable for large batch operations that
do not require any changes of the assembly process, e.g. the production
of high-volume products as needed for the automotive industry. The teach
method is based on the assumption that a specific screw insertion process
will have a unique signature based on the measured “Torque-Insertion
Depth" curve. By comparing signals acquired on-line during the screw insertion process with signals recorded during a controlled insertion phase (the teaching phase), differences can easily be flagged, and a match between the on-line signals and the signals from the teaching phase indicates a correct insertion. Such methods have usually no adaptive or learning
capabilities and require a labor-intensive set-up before production can
commence. Improvements to the standard teach method were proposed and
implemented, e.g. the “Torque-Rate” approach [10]. For this approach,
the measured insertion signals need to fall within a priori defined torque-rate levels, and the approach has been shown to be capable of coping with a number of frequently occurring faults such as stripped threads, excessive yielding of bolts, crossed threads, presence of foreign materials in the thread, insufficient thread cut and burrs. Despite these improvements, the approach has no
generalization capabilities and thus cannot cope with unknown, new cases
that have not been part of the original training set. In a number of cases,
the required tools such as screwdrivers can be specifically set-up to carry
out the required fastening process to join components by means of screws
or bolts and are economical if the components and the screw fastening
requirements do not change. However, for small-batch production this
approach is prohibitive because a “re-teaching” is required every time the
components or fastening requirements change. Where smaller production
runs are the norm, e.g. for wind turbine manufacturing, the industry is
still relying on human operators to a great extent. These operators insert
and fasten screws and bolts manually or by means of handheld power tools.
With a view to creating flexible manufacturing approaches, this book chapter discusses novel, intelligent monitoring methods for screw insertion and bolt tightening that are capable of adapting to uncertainty and to changes in the components involved.
This book chapter investigates novel neural-network-based systems for
the monitoring of the insertion of self-tapping screws. Section 8.2 provides
an overview of the screw insertion process. Section 8.3 describes the
methodology behind the process. Section 8.4 discusses the results from
simulations. Section 8.5 presents an in-depth experimental study and
discusses the results obtained. Conclusions are drawn in Section 8.6.
8.2. Background
Screw insertion can be divided into two categories: the first describes the
insertion of self-tapping screws into holes that have not been threaded; the
second deals with the insertion of screws or bolts into pre-threaded holes or
fixed nuts. The advantage of the first approach lies in its simplicity when
preparing the components for joining – a hole in each of the components
is the only requirement. The latter approach requires a more complex preparatory step: in addition to drilling holes, threads have to be cut or nuts attached. Industry is progressively employing more of the former approach, as it is more cost-effective, more generally applicable and allows for more rapid insertion of screws than is possible with the other approach.
However, the downside is that the approach based on self-tapping screws is
in need of a more advanced control policy to ensure that the screw thread
cuts an appropriate groove through at least one of the components during
the insertion process. Owing to its complexity, the process of inserting
self-tapping screws is usually carried out by human operators who can
ensure that an appropriate and precise screw advancement is guaranteed,
making good use of the multitude of tactile and force sensors as well as
feedback control loops they have available.
In order to develop an advanced monitoring strategy for the insertion
of self-tapping screws, a proper theoretical understanding of the underlying
behavior of the insertion process is needed [11]. At King’s College
London, mathematical models, based on an equilibrium analysis of
forces and describing the insertion of self-tapping screws into metal and
plastic components, have been created to identify the importance of the
torque-vs.-insertion-depth profile as a good representation for the five
distinct stages of the insertion process, Fig. 8.2 [12]. Existing knowledge
on screw advance and screw tightening of screws in pre-tapped holes is
included in the proposed model for the relevant stages of the overall process
[9, 10, 13–15]. The key points of this profile – representative for the five
main stages occurring during the insertion – are defined as shown below.
(1) Screw engagement represents the stage of the insertion process lasting
from the moment where the tip of the screw touches the tap plate (on
the top component) until the moment when the cutting portion of
the conical screw taper has been completely inserted into the hole,
Fig. 8.1. (left) Notation for self-tapping screw insertions; (right) Key stages of the screw
insertion in a two-plate joint [1].
Fig. 8.2. Average experimental torque profile of an insertion on two-joint plate and the
corresponding theoretical prediction [1].
T₁ = ( cos β / (64πL_t √3) )(D_s² − D_h²) σ_uts ( D_f D_s P + πμL_t(D_s + D_h) ) { θ + 4πL_t h / (P D_s) }   (8.1)

T₃ = ( cos β / (32L_t √3) )(D_s² − D_h²) σ_uts { (2R_f μL_t − D_f D_s P/π) θ + (2t/P)(2R_f μπL_t − D_f D_s P) }   (8.3)

T₄ = ( cos β / (32P √3) ) π t μ σ_uts (D_s³ − D_s D_h² + D_h D_s² − D_h³)   (8.4)

T₅ = μEP (D_sh³ − D_s³)( θ − (π/4)(D_sh² − D_s²) ) / (24 l)   (8.5)
where T1 to T5 are the torques required for stages one to five respectively, θ
is the angle of rotation of the screw, Ls is the screw length, Lt is the taper
length, Ds is the major screw diameter, Dm is the minor screw diameter, Dh
is the lower hole diameter, t is the lower plate thickness, l is the total plate
thickness, Dsh is the screw head diameter, P is the pitch of the thread, β
is the thread angle, µ is the friction coefficient between the screw and plate
material and σuts is the ultimate tensile strength of the plate material. All
the parameters are constant and depend on the screw and hole dimensions,
as well as on the friction and material properties.
Realistically modeling a common manufacturing process, the process of
inserting self-tapping screws can be experimentally investigated in a test
environment with two plates (representing two manufacturing components)
to be united [11]. As part of the experimental study, holes are drilled into
the plates to be joined; the latter are then clamped together such that the
hole centers of the top plate (also called near plate) are in line with the
centers of the holes of the second plate (also called tap plate), Fig. 8.1. A
self-tapping screw is then inserted into a hole during an experiment and
screwed in, applying appropriate axial force and torque values. Ngemoh
showed that the axial insertion force does not influence the advancement
of the screw very much, even if it varies over quite a wide range as long
as it does not go beyond a maximum value whereby the threads of the
screw and/or the hole are in danger of being destroyed [11]. During this
insertion process a helical groove is cut into the far plate’s hole and forces as
described above are experienced across the five main stages of the insertion
process until the screw head is finally tightened against the near plate [11].
Based on this modeling and on practical knowledge of the screw insertion and fastening process, monitoring strategies can be created [12, 19]. The signature obtained when recording the torque vs. insertion depth throughout the insertion process provides important clues on the final outcome of the overall process and its likelihood of success. The resultant torques can be estimated best when the signature is closest to its ideal one. This book chapter presents monitoring methods that are capable of predicting to what extent a pre-set final clamping torque is achieved, by employing artificial intelligence in the form of an artificial neural network that uses
8.3. Methodology
Table 8.1. Simulation and experimental study. The table shows the parameters for all
screws used in this study. Results with regards to the number of successful/unsuccessful
insertion signals per training/test set are shown. Generalization capabilities are
investigated by training the ANN on two hole diameters (*) and testing it on
intermediate hole diameters (see also Sections 8.4 and 8.5) [1].
                             Simulation Study                              Experimental Study
Section                      8.4.1   8.4.2                8.4.3                                  8.5.1    8.5.2             8.5.3
Plate material               Polycarbonate  Polycarbonate  Polycarbonate / Aluminium / Mild Steel  Acrylic  Acrylic           Acrylic / Acrylic / Brass
Screw type                   8       8                    4, 6, 8                                4        6                 4, 6, 8
Far plate thickness (mm)     6.0     6.0                  3.0, 3.0, 3.0                          3.0      6.0               3.0, 5.0, 5.0
Hole diameter (mm)           3.8     3.7*, 3.8, 3.9, 4.0  2.7, 3.2, 3.9                          1.0      1.0*, 1.5, 2.0*   1.5, 2.5, 3.5
Training set: successful     10      12                   6, 6, 6                                6        10                5, 5, 5
Training set: unsuccessful   8       8                    8                                      8        8                 16
Test set: successful         5       14                   4, 4, 4                                6        12                4, 4, 4
Test set: unsuccessful       8       8                    8                                      8        8                 16
Output                       binary  binary                                                      binary   binary
The ability of the network to handle multiple insertion cases and to cope
with cases not seen during training is investigated. Four insertion cases,
corresponding to four different hole diameters, 3.7 mm, 3.8 mm, 3.9 mm
and 4.0 mm, are examined (Column 2 of Table 8.1). The network is only
trained on insertion signals from the smallest (3.7 mm, 6 signals) and the
widest hole (4.0 mm, 6 signals) as well as 8 signals representing unsuccessful
insertions. However, during testing the network has, in addition, been
presented with signals from insertions into the 2 middle-sized holes (3.8
and 3.9 mm). The test set consists of 22 signals including 14 successful
signals (2 from a 3.7 mm hole, 2 from a 4.0 mm hole, 5 from a 3.8 mm
hole and 5 from a 3.9 mm hole) and 8 unsuccessful signals. Hence, the
network is required to interpolate from cases learnt during training. All
the signals are correctly classified after a relatively modest training period
of four cycles (including initialization), Fig. 8.4.
Fig. 8.3. Output values as training evolves (Simulated insertion signals). Single case
experiment [1].
Fig. 8.4. Output values as training evolves (Simulated insertion signals). Variation of
hole diameter experiment [1].
sum squared error (SSE) of the network reduces quickly to relatively low
values indicating a good and steady training behavior. Figure 8.5 shows
the ANN output in response to the test set after 20 training cycles. Signals
1 to 4, 5 to 8 and 9 to 12 are correctly classified as the insertions of Classes
A, B and C, respectively, and the remaining signals are correctly classified
as unsuccessful insertions.
Fig. 8.5. Activation output on test set after 20 training cycles. Four-output
classification experiment using simulated signals [1].
As in the simulation study, a network with 60 input nodes, 15 hidden-layer nodes and 1 output node is used. After initialization, the network is trained over 22 cycles, Fig. 8.6. After a modest training period (eight cycles including initialization), the network output clearly differentiates between successful and unsuccessful insertions.
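A minimal sketch of such a network, using an off-the-shelf implementation (scikit-learn is assumed to be available), is given below; the training data shapes, labels and solver settings are illustrative and do not reproduce the training procedure used in the chapter.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Sketch of the monitoring network: 60 input nodes (the sampled torque-vs-insertion-depth
# signature), 15 hidden nodes, 1 output (successful / unsuccessful insertion).
rng = np.random.default_rng(5)
X_train = rng.random((18, 60))          # 18 signatures, each resampled to 60 points (illustrative)
y_train = rng.integers(0, 2, 18)        # 1 = successful insertion, 0 = unsuccessful (illustrative)

net = MLPClassifier(hidden_layer_sizes=(15,), activation="logistic",
                    solver="lbfgs", max_iter=2000)
net.fit(X_train, y_train)
print(net.predict(X_train[:4]))
```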
Fig. 8.6. Output values as training evolves (insertion signals recorded from electric
screwdriver). Single case experiment [1].
Fig. 8.7. Insertion signals from screwdriver. Variation of hole diameter [1].
8.6. Conclusions
Fig. 8.8. Output values as training evolves (real insertion signals). Variation of hole
diameter experiment [1].
This capability cannot be achieved with standard methods such as the teach method,
and it is particularly useful where insertions have to be logged, as is the case where
high safety standards are required.
Fig. 8.9. Insertion signals from screwdriver. Four-output classification experiment [1].
Fig. 8.10. Network activation output for the test set after 200 training cycles.
Four-output classification experiment using real insertion signals. All signals are
correctly attributed [1].
Acknowledgments
This work was funded by CONACYT. I also thank Allen Jiang for compositing this chapter.
References
PART 4
Chapter 9
Over the past decade or so, support vector machines have established
themselves as a very effective means of tackling many practical
classification and regression problems. This chapter relates the theory
behind both support vector classification and regression, including an
example of each applied to real-world problems. Specifically, a classifier
is developed which can accurately estimate the risk of developing heart
disease simply from the signal derived from a finger-based pulse oximeter.
The regression example shows how SVMs can be used to rapidly and
effectively recognize hand-written characters from the so-called graffiti
character set.
Contents
9.1. Introduction
This chapter relates the rise in interest, largely over the past decade, of
supervised learning machines that employ a hypothesis space of linear
discriminant functions in a higher dimensional feature space, trained and
optimized using principles drawn from statistical learning theory. Vladimir Vapnik
is largely credited with introducing this learning methodology that since
its inception has outperformed many other systems in a broad selection of
applications. We refer, of course, to support vector machines.
Fig. 9.1. Maximum-margin classifier: separating hyperplane ⟨w, xi⟩ + b = 0, margin planes ⟨w, xi+⟩ + b = +1 (Class “+1”) and ⟨w, xi−⟩ + b = −1 (Class “−1”), and margin width 2/∥w∥.
A good separating plane is one that generalizes well; this implies a plane which has a higher probability of correctly classifying new, unseen data. Hence, an optimal classifier is one which finds the best such plane.
Fig. 9.2. Soft-margin classifier, with errors represented by the slack variables ξ.
The soft-margin classifier shown in Fig. 9.2 achieves this by maximizing the margin whilst at the same time allowing the margin constraints to be violated according to the preset slack variables ξi. This leads to the minimization of ½∥w∥² + C ∑_{i=1}^{m} ξi subject to yi(⟨w, xi⟩ + b) ≥ 1 − ξi and ξi ≥ 0 for i = 1, …, m. Lagrangian duality theory [4] is typically applied to solve the minimization problems presented by linear inequalities.
Thus, one can form the primal Lagrangian,

L(w, b, ξ, α, β) = ½∥w∥² + C ∑_{i=1}^{m} ξi − ∑_{i=1}^{m} βi ξi − ∑_{i=1}^{m} αi [yi(⟨w, xi⟩ + b) − 1 + ξi],   (9.3)
and

0 = ∑_{i=1}^{m} yi αi,   (9.5)
these are then re-substituted into the primal, which then yields

L(w, b, ξ, α, β) = ∑_{i=1}^{m} αi − ½ ∑_{i,j=1}^{m} yi yj αi αj ⟨xi, xj⟩.   (9.6)
We note that this result is the same as for the maximum-margin classifier.
The only difference is the constraint αi + βi = C, where both αi and βi ≥ 0, hence
0 ≤ αi, βi ≤ C. Thus, the value C sets an upper limit on the size
of the Lagrangian optimization variables αi and βi; this limit is sometimes
referred to as the box constraint. The selection of the value of C results in
a trade-off between accuracy of data fit and regularization. The optimum
choice of C will depend on the underlying data and nature of the problem
and is usually found by experimental cross-validation (whereby the data
is divided into a training set and testing or hold-out set and the classifier
is tested on the hold-out set alone). This is best performed on a number
of different sets (sometimes referred to as folds) of unseen data, leading
to so-called “multi-folded cross-validation”. Quadratic Programming (QP)
algorithms are typically employed to solve these equations and calculate the
Lagrangian multipliers. Many such algorithms are available for download online; see the
website referred to in the excellent book by Cristianini and Shawe-Taylor [4] for an
up-to-date listing.
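As a concrete illustration of selecting the box constraint by multi-folded cross-validation, the sketch below scores a linear soft-margin SVM over several values of C with scikit-learn. It is an assumption-laden example on synthetic stand-in data rather than the chapter's code or data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a two-class training set (replace with real features/labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)

# Multi-folded cross-validation over the box constraint C for a soft-margin linear SVM:
# each value of C is scored on five hold-out folds and the mean accuracy is reported.
for C in (0.01, 0.1, 1, 10, 100):
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C = {C:>6}: mean hold-out accuracy = {scores.mean():.3f}")
```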
Fig. 9.3. Support vector regression showing the linear ε-insensitive loss function “tube”
with width 2ε. Only those points lying outside the tube are considered support vectors.
SVMs can also lend themselves easily to the task of regression with only
a simple extension to the theory. In so doing, they allow for real-valued
targets to be estimated by modeling a linear function (see Fig. 9.3) in
feature space (see Section 9.1.5). The same concept of maximizing the
margin is retained, but it is extended by a so-called loss function, which
can take on different forms. The loss function can be linear or quadratic
or, more usefully, can include an insensitive region: for training data
points that lie within this insensitive range, no error is deemed to
have occurred. In a manner similar to the soft-margin classifier, errors are
accounted for by the inclusion of slack variables that tolerate data points
that violate this constraint to a limited extent. This then leads to a modified
set of constraints requiring the minimization of ½∥w∥² + C ∑_{i=1}^{m} (ξi + ξi∗)
subject to yi − ⟨w, xi⟩ − b ≤ ε + ξi and ⟨w, xi⟩ + b − yi ≤ ε + ξi∗ with ξi, ξi∗ ≥ 0
for i = 1, . . . , m. The solution for the above QP problem is provided once
again by the use of Lagrangian duality theory, however, we have omitted a
full derivation for the sake of brevity (please see the tutorial by Smola and
Schölkopf [5] and the references therein). The above Fig. 9.3 shows only the
linear loss function whereby the points that are further than |ε| from the
optimal line are apportioned weight in a linearly increasing fashion. There
are, however, numerous different types of loss functions in the literature,
including, among others, the quadratic loss function and Huber's loss function, the
latter being in fact a hybrid or piecewise combination of quadratic and linear functions.
Whilst most of them include an insensitive region to facilitate sparseness
of the solution, this is not always the case. Figure 9.4 shows the quadratic and
Huber loss functions.
Fig. 9.4. Quadratic (a) and Huber (b) type loss functions with ε-insensitive regions.
f(xj) = ∑_{i=1}^{nSVs} (αi − αi∗) K(xi, xj) + b,   (9.8)
where nSVs denotes the number of support vectors, yi are the labels, αi
and αi∗ are the Lagrangian multipliers, b is the bias, xi are the support vectors
previously identified through the training process and xj is the test data
vector. The use of kernel functions transforms a simple linear classifier
into a powerful and general nonlinear classifier (or regressor). There are
a number of different kernel functions available; here are a few popular
types [4]:
Gaussian radial basis function: K(xi, xj) = exp(−∥xi − xj∥² / (2σ²))   (9.11)

Spline function: K(xi, xj) = ∏_{k=1}^{10} (1 + xik xjk + ½ xik xjk min(xik, xjk) − ⅙ min(xik, xjk)³)   (9.12)
where c is an arbitrary constant parameter (often c is simply set to zero or
unity), d is the degree of the polynomial Kernel and σ controls the width
of the Gaussian radial basis function. When experimenting with training
and cross-validation it is common practice to perform a grid search over
these Kernel parameters along with the box constraint, C, for the lowest
classification error within a given training set.
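A hedged sketch of such a grid search, again using scikit-learn on placeholder data rather than the chapter's data set, might look as follows; here gamma plays the role of the GRBF width parameter and the best setting is chosen by cross-validated accuracy.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training set (replace with real features/labels).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 4))
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)

# Grid search over the GRBF width (gamma, acting as 1/(2*sigma^2)) and the box constraint C.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```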
Fig. 9.5. Kernel functions generate implicit nonlinear mappings, enabling SVMs to
operate on a linear separating hyperplane in higher dimensional feature space.
Fig. 9.6. The Photoplethysmograph device used to acquire the DVP waveform attached
to the index finger.
The measured light absorption varies with blood density and hence with blood vessel
diameter during the cardiac cycle. Typically the DVP waveform (see Fig. 9.7) exhibits
an initial systolic peak followed by a diastolic peak. The time between these two peaks,
called the peak-to-peak time (PPT), is related to the time taken for the pressure wave
to propagate from the heart to the peripheral blood vessels and back again. Thus PPT
is related to pulse wave velocity (PWV) in the large arteries. This has led to the
development of the so-called stiffness index (SI), which is simply the subject height
divided by the PPT and can be used as a crude estimate of PWV. In older subjects,
however, and in subjects with premature arterial stiffening, the systolic and diastolic
peaks in the DVP become difficult to distinguish and SI cannot be used to estimate
arterial stiffness.
A group of 461 subjects were recruited from the local area of South East
London. None of the subjects had a previous history of cardiovascular
disease or were receiving heart medication. The subjects ranged from 16 to
81 years of age, with an average age of 50 and standard deviation of 13.6
years. The DVP waveform was measured for each of the subjects along
with their PWV and a number of other basic physiological measures such
as systolic and diastolic blood pressure, their height and weight. There are
various ways in which features can be derived from the DVP waveform.
Previous work in this field [13] has led to the selection of what we have
Fig. 9.7. Digital Volume Pulse waveforms as extracted from the Photoplethysmograph
device and their derivatives. Labeled to show various features, e.g. PPT, RI and CT.
Earlier investigators observed that subjects with hypertension or arterial heart disease
exhibited an “increase in the crest time” compared to healthy subjects.
The Crest Time (CT) is the time from the foot of the DVP waveform to
its peak (as shown in Figs 9.7c and d), and it has proved to be a useful
feature for the classifier. Another feature is the Peak-to-Peak Time (PPT), defined as the
time between the first peak and the second peak or inflection point of
the DVP waveform (see Figs 9.7c and d). As mentioned previously in
the introduction, the second peak/inflection point on the DVP is generally
accepted to be due to reflected waves. So its timing would be related to
arterial stiffness and PWV. The definition of PPT depends on the DVP
waveform as its contour varies with subjects. When there is a second peak
as is the case with “Waveform A” in Fig. 9.7a, PPT is defined as time
between the two maxima. Hence, the time between the two positive to
negative zero-crossings of the derivative can be considered to be the PPT.
In some DVP waveforms, however, the second peak is not distinct as in
“Waveform B” in Fig. 9.7b. When this occurs, the time between the peak
of the waveform and the inflection point on the downward going slope of
the waveform (which is a local maximum of the first derivative, as shown in
Fig. 9.7d) is defined as the PPT. Another measurement extracted from the
DVP and used in the classifier is the so-called Reflection Index (RI), which is
the ratio of the relative amplitudes of the first and second peaks (see Figs 9.7a
and b). These four features (PPT, CT, RI and SI) were empirically
found [14] to be amongst the best physiologically motivated features for
successful classification of PWV.
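The feature definitions above translate fairly directly into signal-processing code. The sketch below is an illustrative assumption about how CT, PPT and RI could be extracted from a single, foot-aligned DVP pulse with NumPy/SciPy; the function name, arguments, peak-picking details and RI normalization are not taken from the chapter.

```python
import numpy as np
from scipy.signal import find_peaks

def dvp_features(dvp, fs):
    """Extract CT, PPT and RI from one foot-aligned DVP pulse (illustrative only).

    dvp : 1-D array containing a single DVP pulse, starting at the waveform foot.
    fs  : sampling frequency in Hz.
    """
    systolic_idx = int(np.argmax(dvp))
    ct = systolic_idx / fs                      # Crest Time: foot to (systolic) peak

    peaks, _ = find_peaks(dvp)                  # local maxima of the pulse
    if len(peaks) >= 2:
        # Distinct second peak: PPT is the time between the two maxima.
        second_idx = int(peaks[1])
    else:
        # No distinct second peak: use the inflection point on the falling slope,
        # i.e. a local maximum of the first derivative after the systolic peak.
        d1 = np.gradient(dvp)
        d1_peaks, _ = find_peaks(d1[systolic_idx:])
        second_idx = systolic_idx + (int(d1_peaks[0]) if len(d1_peaks) else len(dvp) - 1)
    ppt = (second_idx - systolic_idx) / fs

    # Reflection Index: relative amplitude of the second feature w.r.t. the first peak
    # (the exact normalization used in the chapter may differ).
    ri = float(dvp[second_idx]) / float(dvp[systolic_idx])
    return ct, ppt, ri
```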
A = VΣV⁻¹.   (9.13)
9.2.4. Results
Table 9.1. SVM classification rates % and (SD) for various data sets using GRBF kernel.
Data sets
Physiological Eigenvalues Combinations
P1 P2 Σ1 Σ2 Σ1+P1 Σ1+P2
Total 84.0 (0.5) 84.0 (0.5) 85.1 (1.4) 85.3 (1.6) 86.1 (0.9) 87.5 (0.0)
Sens 90.3 (2.0) 88.2 (2.1) 90.3 (3.3) 90.3 (4.0) 86.7 (1.4) 87.5 (0.0)
Spec 77.4 (0.5) 79.9 (1.0) 80.2 (2.8) 80.5 (3.4) 85.3 (2.5) 87.5 (0.0)
The constraint factor ν lies between the limits of zero and one, thus simplifying the grid search somewhat.
Experimentation revealed that the Gaussian RBF kernel performed as well
or better than the others and hence the results in Table 9.1 are based
on this kernel (the performance of the other kernels is compared below in
Table 9.2). After performing thorough model hyper-parameter grid searches
(see Fig. A.1 in Appendix A), results were averaged from a “block” of
nine individual results obtained from a range of both constraint factor,
ν, and GRBF width, γ. This technique was applied to all the results to
mitigate against over-training of the model parameters, ensuring a more
general classifier and more realistic results. As shown in Table 9.1, the
SVM method using the physiological feature set P1 alone gives a fairly
high degree of classification accuracy, with a high sensitivity of 90.3%,
although the specificity is lower at 77.4%. Hence, the overall average
successful classification rate is 84.0%.
The results for the signal subspace-based features are better still, showing
a distinct improvement in the specificity when compared with those of
the physiologically motivated features. It is readily possible to achieve
sensitivities in the region of 90%, with specificities of over 80%, bringing
overall classification to 85.3%. Finally, combinations of both feature sets
were tested and it was found that two pairs gave good results whilst the
other two combinations were less effective. In fact, the combinations of
Σ1+P1 and Σ1+P2 gave the best results overall, the latter giving an
overall classification rate of 87.5%, with an equal rate for both sensitivity
and specificity. This is the best classification rate achieved in this
study so far and is a very high classification rate for PWV based solely on
features extracted from the DVP waveform. Further tests were performed
with different kernel functions to compare with those of the GRBF kernel;
linear and second and third order polynomial kernels were tested but the
GRBF kernel consistently outperformed them for all of the data sets (see
Table 9.2). For a complete treatment of this work please see Alty et al. [6].
Table 9.2. Overall SVM classification rates % and (SD) for various Kernel functions for
each data set.
Data sets
Physiological Eigenvalues Combinations
Kernel P1 P2 Σ1 Σ2 Σ1+P1 Σ1+P2
Linear 84.0 (0.5) 83.3 (0.0) 82.3 (0.9) 80.9 (0.5) 84.4 (0.0) 85.8 (0.5)
Poly2 83.9 (0.8) 84.0 (0.5) 83.9 (1.4) 84.5 (1.1) 86.0 (1.2) 86.8 (0.5)
Poly3 83.7 (0.9) 83.9 (1.1) 85.0 (0.9) 84.6 (1.4) 86.1 (0.9) 86.5 (0.0)
GRBF 84.0 (0.5) 84.0 (0.5) 85.1 (1.4) 85.3 (1.6) 86.1 (0.9) 87.5 (0.0)
Table 9.3. Hand-written one-stroke digits and commands. The dot denotes the
starting point of the character.
Class No. Class Name Strokes Class No. Class Name Strokes
1 0(a) 9 6
2 0(b) 10 7
3 1 11 8(a)
4 2 12 8(b)
5 3 13 9
6 4 14 Backspace
7 5(a) 15 Carriage Return
8 5(b) 16 Space
Fig. 9.8. Written trace of digit “2” (solid line) and 10 feature points characterized by
coordinate (ϕk , βk ) (indicated by dots).
yij ∈ ℜ10 will be produced. All output vectors yij are fed to the class
determiner simultaneously for determination of the most likely input graffiti
indicated by an integer in the output. The integer from the output of the
class determiner is in the range 1 to 16, indicating the class label of the
possible input graffiti as shown in Table 9.3.
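To make this one-sub-recognizer-per-class arrangement concrete, the following sketch (an assumed structure, not the chapter's implementation, and agnostic about the exact feature and output dimensionalities) trains 16 SVR sub-recognizers and lets the class determiner return the index of the largest output.

```python
import numpy as np
from sklearn.svm import SVR

class GraffitiRecognizer:
    """One SVR sub-recognizer per graffiti class, with an argmax class determiner."""

    def __init__(self, n_classes=16, C=0.1, epsilon=0.05):
        # One epsilon-insensitive SVR per class (kernel and parameters are illustrative).
        self.models = [SVR(kernel="rbf", C=C, epsilon=epsilon) for _ in range(n_classes)]

    def fit(self, X, labels):
        # Sub-recognizer j is trained to output 1 for samples of class j and 0 otherwise.
        labels = np.asarray(labels)
        for j, model in enumerate(self.models, start=1):
            model.fit(X, (labels == j).astype(float))
        return self

    def predict(self, X):
        # Class determiner: evaluate all sub-recognizers and take the largest output.
        outputs = np.column_stack([m.predict(X) for m in self.models])
        return outputs.argmax(axis=1) + 1   # class labels run from 1 to 16

# Toy usage with random placeholder feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(160, 20))
y = rng.integers(1, 17, size=160)
print(GraffitiRecognizer().fit(X, y).predict(X[:5]))
```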
The results obtained under different settings are shown in the tables in the appendices
at the end of this chapter. The value of ε is chosen to be zero for the quadratic loss
function. It is found experimentally that the recognition performance of
the SVR-based graffiti recognizers with the ε-insensitive loss function and
ε = 0.01 or ε = 0.1 does not match that obtained with ε = 0.05. Thus, the tables
(in the appendices) showing the recognition rate for ε = 0.01 and ε = 0.1
are omitted. Referring to the tables, the training and testing recognition
rates for each sub-recognizer j, j = 1, 2, · · · , 16, and the average training
and testing recognition rates of the SVR-based recognizer are illustrated.
The worst recognition rates given by the sub-recognizers and the best
average recognition rate are highlighted in bold under different settings
and parameters.
Table 9.5. Recognition performance of SVR-based graffiti recognizers with the average
recognition rate over 99% for both training and testing data.
Case Kernel♭ Loss Function♯ c, σ C Average (%)† Worst (%)∗
1 Linear Quadratic 0 0.1 99.2500/99.0000 93(5)/96(1,11,12)
2 Linear ϵ-insensitive 0 0.01 99.0000/99.0000 92(5)/96(1,11)
3 Spline Quadratic – 0.1 99.2500/99.0000 93(5)/96(1,11,12)
4 Spline ϵ-insensitive – 0.1 99.5625/99.1250 95(5)/94(1)
5 Poly Quadratic 1 0.1 99.2500/99.0000 93(5)/96(1,11,12)
6 Poly Quadratic 2 0.1 99.3750/99.0000 94(5)/94(12)
7 Poly ϵ-insensitive 1 0.1 99.6250/99.0000 96(5)/94(1)
8 Poly ϵ-insensitive 2 0.01 99.3750/99.2500 94(5)/96(1,11)
9 Poly ϵ-insensitive 2 0.1 99.8125/99.0000 98(5)/94(8)
10 RB Quadratic 2 1 99.3125/99.0000 93(5)/96(1,11,12)
11 RB Quadratic 5 10 99.3750/99.0000 94(5)/94(1)
12 RB Quadratic 10 10 99.2500/99.0000 93(5)/96(1,11,12)
13 RB ϵ-insensitive 1 0.01 99.1875/99.0000 93(5)/96(1,11)
14 RB ϵ-insensitive 1 0.1 99.5625/99.1250 96(5)/96(1,11)
15 RB ϵ-insensitive 2 0.01 99.1250/99.0000 92(5)/96(1,11)
16 RB ϵ-insensitive 2 0.1 99.1875/99.0000 93(5)/96(1,11)
17 RB ϵ-insensitive 5 0.01 99.1250/99.0000 92(5)/96(1,11)
18 RB ϵ-insensitive 5 0.1 99.0625/99.0000 92(5)/96(1,11)
19 RB ϵ-insensitive 5 1 99.2500/99.1250 93(5)/96(1,11)
20 RB ϵ-insensitive 10 0.01 99.1250/99.0000 92(5)/96(1,11)
21 RB ϵ-insensitive 10 0.1 99.0625/99.0000 92(5)/96(1,11)
22 RB ϵ-insensitive 10 1 99.0625/99.0000 92(5)/96(1,11)
23 RB ϵ-insensitive 10 10 99.5000/99.0000 95(5)/94(1)
♭ Poly and RB stand for polynomial and radial basis functions, respectively.
♯ When ϵ-insensitive loss function is employed, ε = 0.05 is used.
† Average recognition rate in the format of A/B. A is for the training data and B is for the
testing data.
∗ The worst recognition rate in the format of A(a)/B(b). A is for the training data and B is
for the testing data. a and b denote the class labels of the graffiti.
9.4. Conclusion
Acknowledgements
References
Fig. A.1. Grid search for optimal hyperparameters during training, showing values of
γ and ν versus overall classification rate.
Table B.1. Training and testing results with linear kernel function, quadratic loss
function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/88 98/78 98/76
2 100/100 100/100 100/100 99/94 98/70 98/64
3 100/100 100/100 100/100 98/96 93/90 92/88
4 97/100 99/100 99/100 99/100 96/100 96/100
5 93/100 93/100 95/100 98/100 100/100 100/100
6 100/98 100/98 100/98 98/94 94/86 93/84
7 98/100 98/100 100/100 98/100 90/100 86/96
8 100/98 100/98 100/94 100/86 99/76 99/76
9 100/100 100/100 100/100 100/92 94/86 94/82
10 99/100 99/100 99/100 97/80 75/36 55/34
11 99/96 99/96 99/98 99/100 97/98 97/98
12 100/96 100/96 100/96 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 97/66 82/40
15 100/100 100/100 100/100 100/100 99/96 96/96
16 100/100 100/100 100/100 100/92 100/78 100/76
Average 99.1250/ 99.2500/ 99.5000/ 99.1250/ 95.6250/ 92.8750/
98.8750 99.0000 98.8750 94.8750 84.7500 81.6250
Table B.2. Training and testing results with linear kernel function, ϵ-insensitive loss
function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/90 97/70 100/98 100/98 100/98
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 31/22 100/100 100/100 100/100
4 97/100 99/100 97/100 100/100 100/100 100/100
5 92/100 96/100 97/98 98/100 98/100 98/100
6 99/98 86/96 100/98 100/98 100/98 100/98
7 98/100 99/100 69/98 91/94 91/94 91/94
8 100/98 100/98 98/70 99/74 99/74 99/74
9 100/100 100/100 97/100 100/98 100/98 100/98
10 99/100 99/100 98/96 100/100 100/100 100/100
11 99/96 99/98 91/80 99/98 99/98 99/98
12 100/98 99/84 100/96 100/96 100/96 100/96
13 100/98 100/100 99/88 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/98 100/94 100/94 100/94
Average 99.0000/ 98.5625/ 92.1250/ 99.1875/ 99.1875/ 99.1875/
99.0000 97.8750 88.3750 96.8750 96.8750 96.8750
Table C.1. Training and testing results with spline kernel function, quadratic loss
function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/94 100/92 96/96
2 100/100 100/100 100/100 98/96 100/100 100/100
3 100/100 100/100 100/100 99/98 100/100 100/100
4 98/100 99/100 99/100 99/100 100/100 100/90
5 93/100 93/100 96/100 98/100 97/100 66/46
6 100/98 100/98 100/98 100/94 97/94 4/20
7 98/100 98/100 99/100 99/100 99/98 67/98
8 100/98 100/98 100/94 100/86 99/74 100/74
9 100/100 100/100 100/100 100/96 97/92 100/84
10 98/100 99/100 99/100 98/84 95/68 99/98
11 99/96 99/96 99/98 99/100 99/98 100/96
12 100/96 100/96 100/94 100/96 100/96 99/94
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/100 100/94 100/98 100/78
Average 99.1250/ 99.2500/ 99.5000/ 99.3750/ 98.9375/ 89.4375/
98.8750 99.0000 98.7500 96.1250 94.3750 85.8750
Table C.2. Training and testing results with spline kernel function, ϵ-insensitive loss
function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/94 100/100 100/100 100/100 100/100
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 100/100 100/100 100/100 100/100
4 98/100 99/100 100/100 100/100 100/100 100/100
5 93/100 95/100 97/100 97/100 97/100 97/100
6 100/98 100/100 100/98 100/98 100/98 100/98
7 98/100 100/100 100/100 96/98 96/98 96/98
8 100/98 100/98 100/88 100/80 100/80 100/80
9 100/100 100/100 100/98 100/96 100/96 100/96
10 99/100 100/100 100/100 100/100 100/100 100/100
11 99/96 99/98 99/100 99/100 99/100 99/100
12 100/96 100/96 100/96 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/98 100/94 100/94 100/94
Average 99.1875/ 99.5625/ 99.7500/ 99.5000/ 99.5000/ 99.5000/
98.8750 99.1250 98.6250 97.6250 97.6250 97.6250
Table D.1. Training and testing results with polynomial function (c = 1), quadratic
loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/90 98/78 98/76
2 100/100 100/100 100/100 98/90 98/70 98/64
3 100/100 100/100 100/100 98/96 93/90 92/88
4 98/100 99/100 99/100 99/100 96/100 96/100
5 93/100 93/100 95/100 98/100 100/100 100/100
6 100/98 100/98 100/98 99/94 94/86 93/84
7 98/100 98/100 99/100 98/100 90/100 86/96
8 100/98 100/98 100/94 100/86 99/76 99/76
9 100/100 100/100 100/100 100/94 94/86 94/82
10 98/100 99/100 99/100 97/80 75/36 55/34
11 99/96 99/96 99/98 99/100 97/98 97/98
12 100/96 100/96 100/94 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 97/66 82/40
15 100/100 100/100 100/100 100/100 99/96 96/96
16 100/100 100/100 100/100 100/92 100/78 100/76
Average 99.1250/ 99.2500/ 99.4375/ 99.1250/ 95.6250/ 92.8750/
98.8750 99.0000 98.7500 94.8750 84.7500 81.6250
Table D.2. Training and testing results with polynomial function (c = 2), quadratic
loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/98 100/96 100/96 100/94 100/86 99/68
2 100/100 100/100 99/96 98/96 100/100 100/96
3 99/100 100/100 100/100 98/96 100/100 100/100
4 98/100 99/100 100/100 99/100 99/100 99/100
5 94/100 94/100 96/100 100/100 98/100 92/96
6 100/98 100/100 100/98 97/94 97/92 94/90
7 99/100 99/100 99/100 98/100 98/96 97/96
8 100/98 100/98 100/94 100/86 99/76 76/20
9 100/100 100/100 100/100 99/94 97/92 95/86
10 98/100 99/100 98/100 96/72 96/72 93/64
11 99/96 99/96 99/98 99/100 99/98 98/96
12 100/94 100/94 100/92 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/96
14 100/100 100/100 100/100 100/100 100/100 100/88
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/100 100/92 100/98 100/84
Average 99.1875/ 99.3750/ 99.4375/ 99.0000/ 98.9375/ 96.4375/
98.8750 99.0000 98.3750 95.0000 94.1250 86.0000
Table D.3. Training and testing results with polynomial function (c = 5), quadratic
loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/92 100/92 100/98 100/92 100/94 100/68
2 97/98 98/86 98/88 99/100 100/100 100/100
3 96/100 99/100 99/100 100/100 100/100 100/100
4 98/100 99/100 97/100 100/100 100/100 99/80
5 91/100 96/100 97/100 99/100 99/100 73/68
6 100/100 100/100 100/96 100/94 96/94 97/92
7 96/88 98/92 98/98 98/96 99/98 90/72
8 100/98 100/96 100/90 100/82 94/60 15/8
9 100/100 100/100 99/98 98/94 97/90 93/92
10 98/98 98/100 98/90 98/76 99/78 86/86
11 99/90 99/90 98/94 99/100 99/98 100/94
12 99/90 99/90 100/82 100/94 100/96 100/96
13 100/98 100/98 100/100 100/100 100/100 100/98
14 100/100 100/100 100/100 100/100 100/100 100/100
15 99/100 99/100 99/100 100/100 100/100 100/100
16 100/100 100/100 100/94 100/98 100/96 100/84
Average 98.3125/ 99.0625/ 98.9375/ 99.4375/ 98.9375/ 90.8125/
97.0000 96.5000 95.5000 95.3750 94.0000 83.6250
Table D.4. Training and testing results with polynomial function (c = 10), quadratic
loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 99/78 99/68 100/76 100/96 100/96 100/80
2 98/82 98/70 98/78 100/94 100/100 100/100
3 92/98 98/100 100/100 100/100 100/100 100/100
4 96/100 98/100 100/100 100/100 100/100 100/88
5 94/100 97/100 92/96 98/100 96/100 98/84
6 100/98 100/96 100/96 100/94 96/92 98/94
7 99/96 96/84 98/96 100/98 98/98 76/64
8 100/98 100/92 100/90 98/52 98/78 76/50
9 100/98 99/98 100/98 96/84 99/96 94/96
10 98/98 98/100 98/86 96/80 100/88 84/84
11 99/98 99/98 99/98 97/88 98/98 100/94
12 100/100 98/82 100/76 100/94 100/94 100/96
13 100/98 95/96 100/100 100/100 99/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 99/100 99/100 99/100 100/100 100/100 100/100
16 100/82 100/96 100/100 100/96 100/98 100/80
Average 98.3750/ 98.3750/ 99.0000/ 99.0625/ 99.0000/ 95.3750/
95.2500 92.5000 93.1250 92.2500 96.1250 88.1250
Table D.5. Training and testing results with polynomial function (c = 1),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/94 100/96 100/98 100/98 100/98
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 100/100 100/100 100/100 100/100
4 98/100 99/100 100/100 100/100 100/100 100/100
5 93/100 96/100 98/100 97/100 97/100 97/100
6 100/98 100/100 100/98 100/98 100/98 100/98
7 98/100 100/100 100/100 94/98 94/98 94/98
8 100/98 100/98 100/88 99/80 99/80 99/80
9 100/100 100/100 100/98 100/96 100/96 100/96
10 99/100 100/100 100/100 100/100 100/100 100/100
11 99/96 99/96 99/98 99/100 99/100 99/100
12 100/96 100/96 100/96 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/98 100/94 100/94 100/94
Average 99.1875/ 99.6250/ 99.8125/ 99.3125/ 99.3125/ 99.3125/
98.8750 99.0000 98.2500 97.5000 97.5000 97.5000
Table D.6. Training and testing results with polynomial function (c = 2),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/100 100/100 100/100 100/100
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 100/100 100/100 100/100 100/100
4 99/100 100/100 100/100 100/100 100/100 100/100
5 94/100 98/100 98/100 97/100 97/100 97/100
6 100/100 100/100 100/98 100/98 100/98 100/98
7 99/100 100/100 96/100 93/98 93/98 93/98
8 100/98 100/94 100/82 99/74 99/74 99/74
9 100/100 100/100 100/94 100/94 100/94 100/94
10 99/100 100/100 100/100 100/100 100/100 100/100
11 99/96 99/98 99/100 99/100 99/100 99/100
12 100/98 100/96 100/98 100/98 100/98 100/98
13 100/100 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/98 100/98 100/98 100/98
Average 99.3750/ 99.8125/ 99.5625/ 99.2500/ 99.2500/ 99.2500/
99.2500 99.0000 98.1250 97.5000 97.5000 97.5000
Table D.7. Training and testing results with polynomial function (c = 5),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/92 100/96 100/96 100/96 100/96 100/96
2 98/88 100/90 100/88 100/88 100/88 100/88
3 99/100 100/100 100/100 100/100 100/100 100/100
4 98/100 100/100 100/100 100/100 100/100 100/100
5 96/100 97/100 99/100 99/100 99/100 99/100
6 100/100 100/98 100/98 100/98 100/98 100/98
7 98/92 98/94 98/98 98/98 98/98 98/98
8 100/96 100/90 100/76 100/76 100/76 100/76
9 100/100 100/96 98/88 98/88 98/88 98/88
10 98/100 100/98 100/94 100/94 100/94 100/94
11 99/90 96/78 97/90 97/90 97/90 97/90
12 99/90 100/90 100/96 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 99/100 99/100 100/100 100/100 100/100 100/100
16 100/100 100/96 100/98 100/98 100/98 100/98
Average 99.0000/ 99.3750/ 99.5000/ 99.5000/ 99.5000/ 99.5000/
96.6250 95.3750 95.1250 95.1250 95.1250 95.1250
Table D.8. Training and testing results with polynomial function (c = 10),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/74 97/66 98/76 98/76 98/76 98/76
2 99/66 99/72 94/60 93/60 94/60 93/60
3 95/98 98/98 98/98 98/98 98/98 98/98
4 98/100 98/100 99/100 99/100 99/100 99/100
5 98/100 100/100 99/98 99/98 99/98 99/98
6 100/96 100/96 100/96 100/96 100/96 100/96
7 97/90 97/96 97/98 97/98 97/98 97/98
8 94/94 100/90 100/82 100/82 100/82 100/82
9 99/96 96/88 92/80 92/80 92/80 92/80
10 98/98 99/84 99/84 99/84 99/84 99/84
11 99/98 99/100 99/100 99/100 99/100 99/100
12 99/90 100/98 100/100 100/100 100/100 100/100
13 93/90 96/98 96/96 96/96 96/96 96/96
14 99/86 100/94 100/96 100/96 100/96 100/96
15 99/100 100/100 100/98 100/98 100/98 100/98
16 98/72 100/92 99/86 99/86 99/86 99/86
Average 97.8125/ 98.6875/ 98.1250/ 98.0625/ 98.1250/ 98.0625/
90.5000 92.0000 90.5000 90.5000 90.5000 90.5000
Table E.1. Training and testing results with radial basis kernel function (σ = 1),
quadratic loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/90 99/78 100/90
2 100/100 100/100 100/100 98/90 98/68 100/100
3 100/100 100/100 100/100 98/96 95/90 100/100
4 97/100 99/100 99/100 99/100 100/100 100/100
5 93/100 93/100 95/100 99/100 100/100 99/100
6 100/98 100/98 100/98 99/94 96/96 100/98
7 98/100 98/100 99/100 98/100 90/90 100/98
8 100/98 100/98 100/94 100/86 99/76 100/80
9 100/100 100/100 100/100 100/92 95/90 100/96
10 99/100 99/100 99/100 97/78 82/60 100/88
11 99/96 99/96 99/98 99/100 97/92 100/100
12 100/96 100/96 100/96 100/96 100/92 100/92
13 100/98 100/98 100/100 100/100 100/98 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 99/100 100/100
16 100/100 100/100 100/100 100/94 100/96 100/100
Average 99.1250/ 99.2500/ 99.4375/ 99.1875/ 96.8750/ 99.9375/
98.8750 98.8750 98.8750 94.7500 89.1250 96.3750
Table E.2. Training and testing results with radial basis kernel function (σ = 2),
quadratic loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/92 100/84 100/100
2 100/100 100/100 100/100 100/100 98/80 100/94
3 100/100 100/100 100/100 100/100 95/94 97/92
4 97/100 98/100 99/100 100/100 98/100 100/100
5 93/100 93/100 93/100 97/100 100/100 97/100
6 100/98 100/98 100/98 100/98 97/92 99/96
7 98/100 98/100 99/100 100/100 95/100 81/72
8 100/98 100/98 100/98 100/90 99/76 98/72
9 100/100 100/100 100/100 100/100 98/88 99/88
10 99/100 99/100 99/100 99/100 92/64 79/82
11 99/96 99/96 99/96 99/98 99/100 100/98
12 100/96 100/96 100/96 100/96 100/96 100/98
13 100/98 100/98 100/100 100/100 100/100 99/100
14 100/100 100/100 100/100 100/100 100/96 100/98
15 100/100 100/100 100/100 100/100 99/100 100/100
16 100/100 100/100 100/100 100/98 100/88 100/88
Average 99.1250/ 99.1875/ 99.3125/ 99.6875/ 98.1250/ 96.8125/
98.8750 98.8750 99.0000 98.2500 91.1250 92.3750
Table E.3. Training and testing results with radial basis kernel function (σ = 5),
quadratic loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/94 100/92 100/90
2 100/100 100/100 100/100 100/100 100/98 100/100
3 100/100 100/100 100/100 100/100 100/100 100/100
4 97/100 97/100 98/100 99/100 99/100 100/100
5 93/100 93/100 93/100 94/100 98/100 98/100
6 100/98 100/98 100/98 100/98 100/94 100/96
7 98/100 98/100 98/100 99/100 100/100 99/94
8 100/98 100/98 100/98 100/98 100/90 100/74
9 100/100 100/100 100/100 100/100 100/96 100/96
10 99/100 99/100 99/100 99/100 98/100 99/98
11 99/96 99/96 99/96 99/96 99/98 99/96
12 100/96 100/96 100/96 100/98 100/96 100/98
13 100/98 100/98 100/98 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/100 100/100 100/94 100/100
Average 99.1250/ 99.1250/ 99.1875/ 99.3750/ 99.6250/ 99.6875/
98.8750 98.8750 98.8750 99.0000 97.3750 96.3750
Table E.4. Training and testing results with radial basis kernel function (σ = 10),
quadratic loss function and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/96 100/96 100/96
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 100/100 100/100 100/100 100/100
4 97/100 97/100 97/100 99/100 99/100 99/100
5 93/100 93/100 93/100 93/100 95/100 97/100
6 100/98 100/98 100/98 100/98 100/98 100/96
7 98/100 98/100 98/100 98/100 100/100 98/98
8 100/98 100/98 100/98 100/98 100/94 100/90
9 100/100 100/100 100/100 100/100 100/100 100/100
10 99/100 99/100 99/100 99/100 99/100 98/98
11 99/96 99/96 99/96 99/96 99/98 99/98
12 100/96 100/96 100/96 100/96 100/96 100/94
13 100/98 100/98 100/98 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/100 100/100 100/100 100/98
Average 99.1250/ 99.1250/ 99.1250/ 99.2500/ 99.5000/ 99.4375/
98.8750 98.8750 98.8750 99.0000 98.8750 98.0000
Table E.5. Training and testing results with radial basis kernel function (σ = 1),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/94 100/98 100/98 100/98
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 99/100 100/100 100/100 100/100 100/100
4 98/100 99/100 100/100 100/100 100/100 100/100
5 93/100 96/100 98/100 97/100 97/100 97/100
6 100/98 100/98 100/98 100/98 100/98 100/98
7 98/100 100/100 100/100 94/98 94/98 94/98
8 100/98 100/98 100/86 99/74 99/74 99/74
9 100/100 100/100 100/98 100/96 100/96 100/96
10 99/100 100/100 100/100 100/100 100/100 100/100
11 99/96 99/96 99/98 99/100 99/100 99/100
12 100/98 100/98 100/98 100/96 100/96 100/96
13 100/98 100/100 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/98 100/94 100/94 100/94
Average 99.1875/ 99.5625/ 99.8125/ 99.3125/ 99.3125/ 99.3125/
99.0000 99.1250 98.1250 97.1250 97.1250 97.1250
Table E.6. Training and testing results with radial basis kernel function (σ = 2),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/98 100/98 100/98
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 99/100 100/100 100/100 100/100
4 98/100 98/100 100/100 100/100 100/100 100/100
5 92/100 93/100 98/100 97/100 97/100 97/100
6 100/98 100/98 100/98 100/98 100/98 100/98
7 98/100 98/100 100/100 94/98 94/98 94/98
8 100/98 100/98 100/94 99/74 99/74 99/74
9 100/100 100/100 100/100 100/96 100/96 100/96
10 99/100 99/100 100/100 100/100 100/100 100/100
11 99/96 99/96 99/98 99/100 99/100 99/100
12 100/98 100/98 100/96 100/96 100/96 100/96
13 100/98 100/98 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/98 100/94 100/94 100/94
Average 99.1250/ 99.1875/ 99.7500/ 99.3125/ 99.3125/ 99.3125/
99.0000 99.0000 98.7500 97.1250 97.1250 97.1250
Table E.7. Training and testing results with radial basis kernel function (σ = 5),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/96 100/98 100/98
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 100/100 100/100 100/100 100/100
4 98/100 98/100 99/100 100/100 100/100 100/100
5 92/100 92/100 93/100 98/100 97/100 97/100
6 100/98 99/98 100/98 100/98 100/98 100/98
7 98/100 98/100 98/100 100/100 94/98 94/98
8 100/98 100/98 100/98 100/92 99/74 99/74
9 100/100 100/100 100/100 100/100 100/96 100/96
10 99/100 99/100 99/100 100/100 100/100 100/100
11 99/96 99/96 99/96 99/98 99/100 99/100
12 100/98 100/98 100/98 100/96 100/96 100/96
13 100/98 100/98 100/100 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/100 100/98 100/94 100/94
Average 99.1250/ 99.0625/ 99.2500/ 99.8125/ 99.3125/ 99.3125/
99.0000 99.0000 99.1250 98.6250 97.1250 97.1250
Table E.8. Training and testing results with radial basis kernel function (σ = 10),
ϵ-insensitive loss function (ϵ = 0.05) and various values of C.
Recognition rate (%) (Training/Testing phase)
Class No. C = 0.01 C = 0.1 C=1 C = 10 C = 100 C = Inf
1 100/96 100/96 100/96 100/94 100/94 100/98
2 100/100 100/100 100/100 100/100 100/100 100/100
3 100/100 100/100 100/100 99/100 100/100 100/100
4 98/100 98/100 98/100 99/100 100/100 100/100
5 92/100 92/100 92/100 95/100 98/100 97/100
6 100/98 99/98 99/98 100/98 100/98 100/98
7 98/100 98/100 98/100 100/100 99/100 94/98
8 100/98 100/98 100/98 100/98 100/86 99/74
9 100/100 100/100 100/100 100/100 100/98 100/96
10 99/100 99/100 99/100 100/100 100/100 100/100
11 99/96 99/96 99/96 99/96 99/98 99/100
12 100/98 100/98 100/98 100/98 100/96 100/96
13 100/98 100/98 100/98 100/100 100/100 100/100
14 100/100 100/100 100/100 100/100 100/100 100/100
15 100/100 100/100 100/100 100/100 100/100 100/100
16 100/100 100/100 100/100 100/100 100/98 100/94
Average 99.1250/ 99.0625/ 99.0625/ 99.5000/ 99.7500/ 99.3125/
99.0000 99.0000 99.0000 99.0000 98.0000 97.1250
Chapter 10
∗Weidong Chen, †‡Steven W. Su, †Yi Zhang, §Ying Guo, †Nghir Nguyen,
#‡Branko G. Celler and †Hung T. Nguyen
Faculty of Engineering and Information Technology,
University of Technology,
Sydney, Australia
[email protected]
∗ Department of Automation, Shanghai Jiao Tong University, Shanghai, China.
† Centre for Health Technologies, Faculty of Engineering and Information Technology,
University of Technology, Sydney, Australia.
‡ Human Performance Group, Biomedical Systems Lab, School of Electrical Engineering
and Telecommunications, University of New South Wales, UNSW Sydney N.S.W. 2052
Australia.
# College of Health and Science, University of Western Sydney, UWS Penrith NSW 2751
Australia.
§ Autonomous Systems Lab, CSIRO ICT Center, Sydney, Australia.
Contents
10.1. Introduction
Let {ui, yi}_{i=1}^{N} be a set of input–output data points (ui ∈ U ⊆ R^d,
yi ∈ Y ⊆ R, and N is the number of points). The goal of support vector
regression is to find a function f(u) of the form

f(u) = w · ϕ(u) + b,   (10.1)

where ϕ(u) is the nonlinear mapping of u into a high-dimensional feature space.
The weight vector w and bias b define the hyperplane ⟨w · ϕ(u)⟩ + b = 0. The hyperplane is estimated
Fig. 10.1. Exercise protocol: walking speed switched between Va and Vb over 3, 4 and 5 minute intervals, with transition instants t1, t2, t3 and t4.
Fig. 10.2. Acceleration signals Ax, Ay and Az (in g) versus sample number.
½∥w∥² + C (1/N) ∑_{i=1}^{N} Lε(yi, f(ui)).   (10.2)
Fig. 10.3. Roll, pitch and yaw angles provided by the Micro IMU.
The first term is called the regularized term. The second term is the
empirical error measured by ε-insensitivity loss function which is defined
as:
Lε(yi, f(ui)) = |yi − f(ui)| − ε   if |yi − f(ui)| > ε,   and   Lε(yi, f(ui)) = 0   if |yi − f(ui)| ≤ ε.   (10.3)
This defines an ε tube. The radius ε of the tube and the regularization
constant C are both determined by the user.
The selection of parameter C depends on knowledge of the application domain.
Theoretically, a small value of C will under-fit the training data because the weight
placed on the training data is too small, thus resulting in large values of MSE (mean
square error) on the test sets. However, when C is too large, SVR will over-fit the
training set, so that the ½∥w∥² term loses its meaning and the objective reverts to
minimizing the empirical risk only. Parameter ε controls the width of the ε-insensitive
zone. Generally, the larger the ε, the fewer the support vectors and thus the sparser
the representation of the solution. However, if ε is too large, it can deteriorate the
accuracy on the training data.
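The following minimal sketch, using scikit-learn's SVR on synthetic placeholder data rather than the measured heart-rate data, illustrates the qualitative effect of ε described above: widening the insensitive tube leaves fewer points outside it and hence fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic speed-versus-time-constant style data (placeholders for the measured values).
rng = np.random.default_rng(0)
speed = np.linspace(1.0, 4.0, 40).reshape(-1, 1)
tau = 10 + 8 * np.maximum(speed.ravel() - 3.0, 0.0) ** 2 + rng.normal(scale=1.0, size=40)

# A wider epsilon tube tolerates larger deviations without penalty, so fewer points
# become support vectors; C weights the empirical (training) error.
for eps in (0.5, 2.0, 5.0):
    model = SVR(kernel="rbf", C=100.0, epsilon=eps).fit(speed, tau)
    print(f"epsilon = {eps}: number of support vectors = {len(model.support_)}")
```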
f(u) = ∑_{i=1}^{N} βi ⟨ui, u⟩ + b.   (10.6)
For nonlinear SVR, there are a number of kernel functions which have
been found to provide good generalization capabilities, such as polynomials,
the radial basis function (RBF) and the sigmoid. Here we present the polynomial and
RBF kernel functions as follows:
Polynomial kernel: k(u, u′) = ((u · u′) + h)^p.

RBF kernel: k(u, u′) = exp(−∥u − u′∥² / (2σ²)).
Details about SVR, such as the selection of radius ε of the tube, kernel
function and the regularization constant C, can be found in [18] [21].
10.3. Experiment
A 41-year-old healthy male joined the study. He was 178 cm tall and weighed 79 kg.
Experiments were performed in the afternoon, and the subject was allowed to have a
light meal one hour before the measurements. After walking for about 10 minutes on
the treadmill to get acquainted with this kind of exercise, the subject walked through
six sets of the exercise protocol (see Fig. 10.1) to test the step response. The values
of walking speed Va and Vb were designed to vary the exercise intensity and are listed
in Table 10.1. To properly identify time constants for the onset and offset of exercise,
the recorded data should be precisely synchronized. Therefore, time instants t1, t2, t3,
and t4 should be identified and marked accurately. In this study,
Original signals of the IMU, ECG and SpO2 are shown in Figs 10.3, 10.5 and
10.6 respectively. It is well known that, even in the absence of external
interference, the heart rate can vary substantially over time under the
influence of various internal or external factors [12]. As mentioned before,
in order to reduce the variance, the designed experimental protocol was
repeated three times. The experimental data from these repeated experiments
have been synchronized and averaged.
A typical measured heart rate response is shown in Fig. 10.7. Paper
[12] found that the heart rate response to exercise can be approximated as a
first order process from a control application point of view. Therefore we
established a first order model for each of the six averaged step responses by using
the Matlab System Identification Toolbox [22].
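The chapter performs this identification with the Matlab System Identification Toolbox; purely as an illustrative alternative, the sketch below fits the same first-order step-response form, y(t) = K(1 − exp(−t/T)), to a synthetic averaged response with SciPy to recover K and T.

```python
import numpy as np
from scipy.optimize import curve_fit

def first_order_step(t, K, T):
    """Step response of a first-order process: y(t) = K * (1 - exp(-t / T))."""
    return K * (1.0 - np.exp(-t / T))

# Synthetic placeholder for one averaged heart-rate step response (rise above rest).
t = np.arange(0.0, 200.0, 1.0)
rng = np.random.default_rng(0)
y = first_order_step(t, K=12.0, T=18.0) + rng.normal(scale=0.5, size=t.size)

(K_hat, T_hat), _ = curve_fit(first_order_step, t, y, p0=(10.0, 10.0))
print(f"identified steady state gain K = {K_hat:.2f}, time constant T = {T_hat:.2f} s")
```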
Table 10.2 shows the identified steady state gain (K) and time constant
(T ) by using averaged data of three sets of experimental data. A typical
curve fitting result is shown in Fig. 10.8. From Table 10.2, we can clearly
Fig. 10.9. SVM regression results for time constant at onset of exercise.
see that both steady state gain and time constant vary when walking speed
Va and Vb change. Furthermore, time constants of the offset of exercise are
noticeably bigger than those of the onset of exercise. However, it should be
pointed out that the variation of the time constants does not depend distinctly
on walking speed when the walking speed is less than three miles per hour. Overall,
experimental results indicate that heart rate dynamics at the onset and
offset of exercise exhibit high nonlinearity when walking speed is higher
than three miles/hour.
Fig. 10.10. SVM regression results for time constant at offset of exercise.
Table 10.2. The identified time constants and steady state gains
by using averaged data.
Sets Onset Offset
DC gain Time constant DC gain Time constant
1 9.2583 9.4818 7.9297 27.358
2 11.264 10.193 9.8561 27.365
3 10.006 13.659 8.9772 26.741
4 12.807 18.618 12.087 30.865
5 17.753 38.192 17.953 48.114
6 32.911 55.974 25.733 81.693
Fig. 10.11. SVM regression results for steady state gain at onset of exercise.
The plus markers denote the points of input–output data, and the circled plus markers
are the support points. It should be emphasized that ε-insensitive SVR uses fewer than
30% of the total points to describe the nonlinear relationship sparsely and efficiently.
It can be seen that the time constant at the offset of exercise is bigger
than that at the onset of exercise. It can also be observed that the time
constant at the onset of exercise is more accurately identified than that at
the offset of exercise. This is indicated by the ε tube (or the width of the
ε-insensitive zone). It is probable that the recovery stage is influenced
by more exercise-unrelated factors than the onset of exercise.
The SVM regression results for DC gain at the onset and offset of
exercise are shown in Figs 10.11 and 10.12. It can be observed that the DC
gain for recovery stage is less than that at the onset of exercise, especially
when walking speed is greater than three miles/hour.
Fig. 10.12. SVM regression results for steady state gain at offset of exercise.
10.5. Conclusion
This study aims to capture the nonlinear behavior of heart rate response to
treadmill walking exercises by using support vector machine-based analysis.
We identified both steady state gain and the time constant under different
walking speeds by using the data from a healthy middle-aged male subject.
Both the steady state gain and the time constant vary with walking speed.
The time constant for the recovery stage is longer than
that at the onset of exercise as predicted. In this study, these nonlinear
behaviors have been quantitatively described by using an effective machine
learning-based approach, named SVM regression. Based on the established
model, we have already developed a new switching control approach which
will be reported elsewhere. We believe this integrated modeling and
control approach can be applied to a broad range of process control problems [6]. In
the next step of this study, we are planning to recruit more subjects to test
the established nonlinear modeling and control approach further.
References
Chapter 11
†Yi Zhang, †‡Steven W. Su, #‡Branko G. Celler and †Hung T. Nguyen
[email protected]
Contents
‡ Human Performance Group, Biomedical Systems Lab, School of Electrical Engineering
and Telecommunications, University of New South Wales, Sydney NSW 2052 Australia.
# College of Health and Science, University of Western Sydney, Penrith NSW 2751
Australia.
11.1. Introduction
In this study, one of the most popular MPC algorithms, Dynamic Matrix
Control (DMC), is selected to control the heart rate response based on the previously
established SVM-based nonlinear time-variant model. The major benefit of
using a DMC controller is its simplicity and efficiency in computing the
optimal control action, which is essentially a least-squares optimization
problem.
11.2. Background
The basic structure of the MPC controller consists of two main elements as
depicted in Fig. 11.2. The first element is the prediction model of the process
to be controlled. Further, if any measurable disturbance or noise exists in
the process it can be added to the model of the system for compensation.
The second element is the optimizer that calculates the future control action
to be executed in the next step while taking into account the cost function
and constraints [14].
DMC uses a linear finite step response model of the process to predict
the process variable profile, ŷ(k + j) over j sampling instants ahead of the
current time, k:
ŷ(k + j) = y0 + ∑_{i=1}^{j} Ai Δu(k + j − i) + ∑_{i=j+1}^{N−1} Pi Δu(k + j − i),

where the first sum is the effect of the current and future moves and the second sum is the effect of the past moves.
ŷ(k + j) = y0 + ∑_{i=j+1}^{N−1} Pi Δu(k + j − i) + d(k + j),   (11.2)
where the term d(k + j) combines the unmeasured disturbances and the
inaccuracies due to plant–model mismatch. Since future values of the
disturbances are not available, d(k + j) over future sampling instants is
assumed to be equal to the current value of the disturbance, or
d(k + j) = d(k) = y(k) − y0 − ∑_{i=1}^{N−1} Hi Δu(k − i),   (11.3)
Rsp(k + j) − y0 − ∑_{i=j+1}^{N−1} Pi Δu(k + j − i) − d(k + j) = ∑_{i=1}^{j} Ai Δu(k + j − i),   (11.5)

where the left-hand side is the predicted error based on past moves, e(k + j), and the right-hand side is the effect of the current and future moves to be determined.
ē = A Δū,   (11.7)

where ē is the vector of predicted errors over the next P sampling instants,
A is the dynamic matrix and Δū is the vector of controller output moves
to be determined.
An exact solution to (11.7) is not possible since the number of equations
exceeds the degrees of freedom (P > M). Hence, the control objective is
posed as a least squares optimization problem with a quadratic performance
objective function.
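For illustration only, the unregularized least-squares solution of (11.7) can be computed in a few lines; the numbers below are toy placeholders, and practical DMC implementations normally add a move-suppression (regularization) term that is omitted here to keep the sketch minimal.

```python
import numpy as np

def dmc_control_moves(A, e):
    """Least-squares solution of e = A * du for the vector of future control moves du.

    A : dynamic matrix (P x M, with P > M) built from the step-response coefficients.
    e : vector of predicted errors over the next P sampling instants.
    """
    du, *_ = np.linalg.lstsq(A, e, rcond=None)
    return du

# Toy numbers only: prediction horizon P = 5, control horizon M = 2.
A = np.array([[1.0, 0.0],
              [1.8, 1.0],
              [2.4, 1.8],
              [2.8, 2.4],
              [3.0, 2.8]])            # placeholder step-response dynamic matrix
e = np.array([4.0, 3.6, 3.0, 2.5, 2.0])
print("control moves:", dmc_control_moves(A, e))
```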
Fig. 11.3. Block diagram for double model predictive switching control system.
If uonset is the controller output for the onset of exercise, uoffset is the controller output
for the offset of exercise, yonset is the measured output of the process for the onset
of exercise and yoffset is the measured output of the process for the offset of
exercise, then:
If ΔR(k) > 0 then
umeas = uonset.   (11.17)
And if Δu(k) > 0
ymeas = yonset.   (11.18)
Otherwise
ymeas = yoffset.   (11.19)
If ΔR(k) = 0
And if Δu(k) > 0
umeas = uonset.   (11.20)
Otherwise
umeas = uoffset.
From this point of view, ymeas is the actual measured output of the
control system and umeas is the actual controller output after which the
controller is selected by the above conditions.
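Read literally, these switching conditions amount to a small piece of selection logic. The sketch below encodes that reading; the branch not fully specified in the text (ΔR(k) = 0 with Δu(k) ≤ 0, and the choice of ymeas in that case) is filled in by symmetry and marked as an assumption.

```python
def select_measured_signals(delta_R, delta_u, u_onset, u_offset, y_onset, y_offset):
    """Sketch of the switching rule in Eqs. (11.17)-(11.20) as read above."""
    if delta_R > 0:
        u_meas = u_onset                                   # Eq. (11.17)
        y_meas = y_onset if delta_u > 0 else y_offset      # Eqs. (11.18)/(11.19)
    else:
        u_meas = u_onset if delta_u > 0 else u_offset      # Eq. (11.20) and inferred branch
        y_meas = y_onset if delta_u > 0 else y_offset      # assumption: same test for y_meas
    return u_meas, y_meas
```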
Fig. 11.4. Simulation results for machine learning-based double nonlinear model
predictive switching control for cardio-respiratory response to exercise with all tuning
parameters (I).
Fig. 11.5. Simulation results with added noise for machine learning-based double
nonlinear model predictive switching control of the cardio-respiratory response to
exercise with all tuning parameters.
11.3.4. Simulation
Fig. 11.6. Simulation results for machine learning-based double nonlinear model
predictive switching control for cardio-respiratory response to exercise with all tuning
parameters (II).
References
Chapter 12
∗Davood Dehestani, †Ying Guo, ∗Sai Ho Ling, ∗Steven W. Su and ∗Hung T. Nguyen
Faculty of Engineering and Information Technology,
University of Technology,
Sydney, Australia
[email protected]
Contents
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
12.2 General Introduction on HVAC System . . . . . . . . . . . . . . . . . . . . . 289
12.3 HVAC Parameter Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
∗ Faculty of Engineering and Information Technology, University of Technology, Sydney,
Australia
† Autonomous System Lab, CSIRO ICT Center, Australia
12.1. Introduction
Few energy systems are as commonly used in both industrial and domestic settings
as HVAC systems. Moreover, HVAC systems usually consume the largest portion of
energy in a building, whether industrial or domestic. It is reported in [1] that the
air-conditioning of buildings accounts for 28% of the total energy end use of the
commercial sector. From 15% to 30% of the energy waste in commercial buildings is
due to performance degradation, improper control strategies and malfunctions of
HVAC systems. Regular checks and maintenance are usually the key to limiting this
waste. However, due to the high cost of maintenance, preventive maintenance plays
an important role. A cost-effective strategy is the development of fault detection
and isolation (FDI).
Several strategies have been employed as FDI modules in HVAC systems.
These strategies can be broadly classified into two categories: model-based
strategies and signal processing-based strategies [2–4]. Model-based
techniques use either a mathematical model or a knowledge model to detect
and isolate the faulty modes. These techniques include, but are not limited
to, observer-based approaches [5], parity-space approaches [6], and
parameter identification-based methods [7]. Henao [8] reviewed fault
detection based on signal processing. This procedure involves mathematical
or statistical operations that are performed directly on the measurements
to extract the features of faults.
Intelligent methods such as genetic algorithms (GA), neural networks
(NN) and fuzzy logic have been applied to fault detection during the last
decade. Neural networks have been used for fault detection in a range of
systems, including HVAC systems [9]. Lo [10] proposed an intelligent technique
based on a fuzzy-genetic algorithm (FGA) for automatically detecting faults
in HVAC systems. However, many intelligent methods such as NN
often require large data sets for training, and some of them are not fast
enough to realize real-time fault detection and isolation. This chapter
investigates methods with real-time operation capability that require less
data. Support vector machines (SVM) have been extensively studied in the data
mining and machine learning communities for the last two decades. SVM
is capable of both classification and regression, and it is easy to formulate a
fault detection and isolation problem as a classification problem.
SVM can be treated as a special neural network: an SVM model
is equivalent to a two-layer perceptron neural network. By using a kernel
function, SVM provides an alternative training method for multi-layer perceptron
classifiers in which the weights of the network are identified by solving a
quadratic programming problem under linear constraints, rather than by
solving a non-convex unconstrained minimization problem as in standard
neural network training.
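To make the two-layer view concrete, the trained SVM decision function can be written, in standard SVM notation (the coefficients, labels and kernel below are the usual dual quantities, not symbols defined later in this chapter), as a hidden layer of kernel evaluations followed by a single weighted output unit:
\[
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i \in SV} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big),
\]
where the support vectors x_i play the role of hidden units, the kernel K acts as their activation function, and the coefficients α_i y_i together with the bias b form the output layer.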
Liang [2] studied FDI for HVAC systems by using standard (offline) SVM.
In this chapter, incremental (online) SVM is applied. Training an SVM
requires solving a quadratic programming (QP) problem; however, standard
numerical techniques for QP are infeasible for very large data sets, which is
the situation for fault detection and isolation in HVAC systems. By using
online SVM, large-scale classification problems can be implemented in a
real-time configuration under limited hardware and software resources.
Furthermore, this chapter also provides a potential approach for the
implementation of FDI under an unsupervised learning framework.
Based on the model structure given in [2], we constructed an HVAC
model by using Matlab/Simulink and identified the variables that are most
sensitive to commonly encountered HVAC faults. Finally, the effectiveness
of the proposed online FDI approach has been verified and illustrated by
using the Simulink simulation platform.
When the system starts to work, fresh air passes through the cooling coil
section of a heat exchanger so that heat is exchanged between the fresh air
and the cooling water. The cooled fresh air is forced into the room by a supply fan.
After just a few minutes, the return damper opens to allow room air to
come back to the AHU. The mixed air then passes through the cooling coil section,
which decreases its temperature and humidity. The trade-off among exhaust, fresh
and return air is decided by the control unit, and the temperature of the
room is regulated by adjusting the flow rate of cooling water with a
control valve. Figure 12.2 shows the block diagram of the HVAC system
with a simple PI controller. The model consists of eight variables, of which
six are defined as state variables.
Two pressures (air supply pressure Ps and room air pressure Pa) and
four temperatures (wall temperature Tw, cooling coil temperature Tcc, air
supply temperature Ts and room air temperature Ta) are taken as the six
states. The cooling water flow rate fcc is the control signal. All six
state variables, together with the cooling water outlet temperature
Twaterout, are considered system outputs, but only one of them (room
temperature Ta) acts as the feedback signal.
X = [Pa ; Ps ; Tw ; Tcc ; Ts ; Ta ]
U = [fcc ]
Y = [Pa ; Ps ; Tw ; Tcc ; Ts ; Ta ]
For the control of HVAC systems, the most popular method is
proportional-integral (PI) control with the flow rate fcc serving as the
control input. In this study, we therefore select a PI controller and tune it
by using the Ziegler–Nichols method (reaction curve method).
In order to simulate the environmental disturbances of a real application,
two disturbances are considered in the model: the outdoor temperature and the
outdoor heating (or cooling) load. The outdoor heating/cooling load disturbs
the system but cannot be measured directly. Although it can be estimated
from the supply/return air temperature and humidity via a load observer
[11], for convenience it is assumed that the two disturbances are
sinusoidal functions.
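The chapter's controller is implemented in Matlab/Simulink; purely as an illustration of the same idea, a discrete PI controller with Ziegler–Nichols reaction-curve gains and sinusoidal disturbances can be sketched in Python as below. The process gain, dead time, time constant, sampling period and disturbance amplitudes are placeholder assumptions, not values from Table 12.1.

import numpy as np

# Ziegler-Nichols reaction-curve (open-loop) PI tuning from an assumed
# first-order-plus-dead-time fit: gain K, dead time L, time constant T.
K, L, T = 2.0, 60.0, 600.0          # placeholder process parameters
Kp = 0.9 * T / (K * L)              # Z-N proportional gain for PI control
Ti = L / 0.3                        # Z-N integral time for PI control
dt = 1.0                            # assumed sampling period in seconds

def pi_controller():
    integral = 0.0
    def step(setpoint, measurement):
        nonlocal integral
        error = setpoint - measurement
        integral += error * dt
        return Kp * (error + integral / Ti)   # control signal, e.g. the flow rate fcc
    return step

def disturbances(t):
    # Assumed sinusoidal outdoor temperature and heating load over one day.
    outdoor_temp = 25.0 + 5.0 * np.sin(2 * np.pi * t / 86400.0)
    heat_load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 86400.0)
    return outdoor_temp, heat_load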
Table 12.1 shows some major parameters used in this simulation. Without
loss of generality, the simulation runs in cooling mode. It is easy to consider
All initial temperature values are set to morning-time values. The mathematical
model is simulated for 12 hours (from 8 am to 8 pm). The output temperature
for the normal condition (without fault) is shown in Fig. 12.3. The first
half hour of simulation is the transient mode of the system. We can see that
the room temperature stays at the desired value (set point), while the other
temperatures vary with a noisy profile as they act to keep the room temperature
at the set point. It is clear that the wall temperature is a function of the
outside temperature, as it follows the outdoor temperature profile, whereas
the other internal temperatures follow an inverse of the outdoor temperature
profile.
Figure 12.4 shows the control signal during the simulation time. This flow
rate is set by a control valve whose signal is generated by the PI controller.
The cooling water flow rate reaches its maximum value at noon, when the
heat load and outdoor temperature are highest. Some fluctuation in the
control signal and other variables at the beginning of the simulation is
related to the transient behavior of the system.
The general fault profile over one day is as follows: the amplitude of each
fault gradually increases for six hours to reach its maximum value, stays at
that level for four hours, and finally returns gradually to the normal
condition during the last six hours.
This fault profile has been applied to each fault with some minor
changes. For the air supply fan fault, the profile is used with a gain of 10.
For the damper fault, it is used with a gain of −2 and a shift of +4. For the
pipe fault, it is used with a gain of −0.3 and a shift of +1. The most sensitive
parameter has been identified for each fault. For the air supply fan fault,
the most sensitive parameter is the air supply pressure, which changes between
0 and 10 Pa during the fault period. The mixed air ratio is selected as the
indicator of the damper fault, and it decreases from four to two. The cooling
water flow rate decreases from fcc to 0.7 fcc in the cooling coil tube fault.
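A minimal sketch of how such a daily fault profile with per-fault gain and shift could be generated is given below; the trapezoidal base shape follows the six-hour rise, four-hour plateau and six-hour fall described above, while the time grid and variable names are illustrative assumptions.

import numpy as np

def base_fault_profile(hours):
    """Unit trapezoidal profile over a 16-hour window:
    rises linearly for 6 h, holds for 4 h, falls for 6 h."""
    t = np.asarray(hours, dtype=float)
    profile = np.zeros_like(t)
    rise = (t >= 0) & (t < 6)
    hold = (t >= 6) & (t < 10)
    fall = (t >= 10) & (t <= 16)
    profile[rise] = t[rise] / 6.0
    profile[hold] = 1.0
    profile[fall] = (16.0 - t[fall]) / 6.0
    return profile

hours = np.linspace(0, 16, 161)
fan_fault    = 10.0 * base_fault_profile(hours)          # supply pressure rises from 0 to 10 Pa
damper_fault = -2.0 * base_fault_profile(hours) + 4.0    # mixed air ratio falls from 4 to 2
pipe_fault   = -0.3 * base_fault_profile(hours) + 1.0    # flow factor falls from 1.0 to 0.7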
The main advantages of SVM include the use of the kernel trick (there is no
need to know the nonlinear mapping function explicitly), the globally optimal
solution (a convex quadratic problem) and the generalization capability obtained
by optimizing the margin [14]. However, for very large data sets, standard
numerical techniques for QP become infeasible. An online alternative, which
formulates the (exact) solution for ℓ + 1 training data points in terms of that for
ℓ data points and one new data point, is provided by the online incremental method.
Training an SVM incrementally on new data by discarding all previous
data except their support vectors gives only approximate results [15].
Cauwenberghs and Poggio [16] considered incremental learning as an exact online
method to construct the solution recursively, one point at a time. The key is
to retain the Kuhn–Tucker (KT) conditions on all previous data while
adiabatically adding a new data point to the solution. Leave-one-out is
a standard procedure for predicting the generalization power of a trained
classifier, from both a theoretical and an empirical perspective [17].
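The approximate incremental scheme of [15] (retain only the current support vectors and retrain whenever a new batch arrives) is easy to sketch; the exact adiabatic update of [16] is considerably more involved and is not reproduced here. The snippet below uses scikit-learn's SVC purely for illustration and is not the authors' implementation.

import numpy as np
from sklearn.svm import SVC

def incremental_svm(batches, C=10.0, gamma="scale"):
    """Approximate incremental SVM in the spirit of [15].
    batches : list of (X_new, y_new) pairs, with labels y in {+1, -1};
              each X_new is a 2-D numpy array of feature vectors.
    """
    X_keep = np.empty((0, batches[0][0].shape[1]))
    y_keep = np.empty((0,))
    clf = None
    for X_new, y_new in batches:
        X_train = np.vstack([X_keep, X_new])
        y_train = np.concatenate([y_keep, y_new])
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        sv = clf.support_                  # indices of the support vectors
        X_keep, y_keep = X_train[sv], y_train[sv]   # discard non-support vectors
    return clf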
Given n data points S = {xi, yi} with yi ∈ {+1, −1}, where xi represents the
condition attributes and yi is the class label (+1 for the normal condition and
−1 for the faulty condition), the decision hyperplane of the SVM can be defined
by the pair (w, b), where w is a weight vector and b a bias.
Let w0 and b0 denote the optimal values of the weight vector and bias.
Correspondingly, the optimal hyperplane can be written as:
\[
\mathbf{w}_0^{T}\mathbf{x} + b_0 = 0.   (12.1)
\]
To find the optimum values of w and b, it is necessary to solve the
following optimization problem:
\[
\min_{\mathbf{w},\,b,\,\xi} \;\; \tfrac{1}{2}\mathbf{w}^{T}\mathbf{w} + C \sum_{i} \xi_i ,   (12.2)
\]
subject to
\[
y_i\big(\mathbf{w}^{T}\phi(\mathbf{x}_i) + b\big) \ge 1 - \xi_i ,   (12.3)
\]
where the ξi are slack variables, C is the user-specified penalty parameter of
the error term (C > 0) and φ is the nonlinear mapping to the feature space.
SVM converts the original nonlinear separation problem into a linear one
by mapping the input vectors onto a higher-dimensional feature space. In that
feature space, the two-class separation problem reduces to finding the optimal
hyperplane that linearly separates the two transformed classes.
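For reference, the QP that is actually solved in practice is the dual of (12.2)–(12.3); writing it down makes the role of the kernel K(xi, xj) = φ(xi)^T φ(xj) explicit. This is standard SVM theory rather than a result specific to this chapter:
\[
\max_{\alpha} \;\; \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_{i} \alpha_i y_i = 0 .
\]
The box constraints 0 ≤ αi ≤ C are exactly the linear constraints referred to earlier, and only the training points with αi > 0 (the support vectors) enter the final decision function.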
There are huge amounts of data generated before a fault happens, as most
HVAC systems are rather reliable. For large data sets, standard SVM
training becomes computationally infeasible.
Fig. 12.8. Schematic of the label generation algorithm for the training system of each fault.
For each fault, all labels should be combined with a proper logic to generate
a single label, since one fault needs just one label for training. A simple
combination of labels can be used to generate the final label, but for really
complex HVAC systems we recommend using fuzzy logic membership functions and
a set of rules to generate the final label. In this chapter, a specific fault
is declared when the errors of all of its sensitive parameters exceed their
thresholds.
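A minimal sketch of this per-fault label generation (threshold the residual of each sensitive parameter, then require all thresholds to be exceeded) is given below; the parameter names and threshold values are illustrative assumptions rather than the chapter's tuned settings.

def fault_label(residuals, thresholds):
    """Return -1 (faulty) if every sensitive-parameter residual exceeds its
    threshold, otherwise +1 (normal); matches the SVM label convention above."""
    exceeded = all(abs(residuals[name]) > thresholds[name] for name in thresholds)
    return -1 if exceeded else +1

# Illustrative threshold for the air supply fan fault (assumed value)
fan_thresholds = {"supply_pressure": 2.0}
print(fault_label({"supply_pressure": 5.3}, fan_thresholds))   # -> -1 (faulty)
print(fault_label({"supply_pressure": 0.4}, fan_thresholds))   # -> +1 (normal)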
Since the SVM classifier presented in the previous section can only deal
with two-class cases, a multi-layer SVM framework has to be designed
for the FDI problem with various faulty conditions. In order to use online
SVM classification to achieve better isolation performance, three
faulty models are used in the isolation section. A four-layer SVM classifier
is designed, in which the normal condition and three different HVAC fault
conditions are all taken into consideration. Furthermore, it should be pointed out that
other unknown faulty conditions can be placed in the upper layer of the FDI
system. The kernel function must be properly selected for the SVM classifier
in order to achieve high classification accuracy. In general, the linear function,
polynomial function, radial basis function (RBF), sigmoid function and
Gaussian function can be adopted as the kernel function. In this chapter, the
Gaussian function is used as it shows excellent performance in the simulation.
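One way to realize such a layered classifier is a cascade of binary Gaussian-kernel SVMs, each layer separating one condition from those that remain; the sketch below follows that reading of the four-layer design and uses scikit-learn's SVC for illustration, so the condition ordering and variable names are assumptions rather than the authors' code.

from sklearn.svm import SVC

CONDITIONS = ["normal", "fan_fault", "damper_fault", "pipe_fault"]

def train_layered_svm(X, labels):
    """Train one binary Gaussian-kernel SVM per layer.
    X      : numpy array of feature vectors (one row per sample)
    labels : list of condition names, one per row of X
    Layer k separates CONDITIONS[k] (+1) from all later conditions (-1)."""
    layers = []
    X_rem, y_rem = X, list(labels)
    for cond in CONDITIONS[:-1]:
        y_bin = [+1 if lab == cond else -1 for lab in y_rem]
        clf = SVC(kernel="rbf").fit(X_rem, y_bin)
        layers.append((cond, clf))
        keep = [i for i, lab in enumerate(y_rem) if lab != cond]
        X_rem = X_rem[keep]
        y_rem = [y_rem[i] for i in keep]
    return layers

def classify(layers, x):
    # Pass a single feature vector (numpy array) through the cascade.
    for cond, clf in layers:
        if clf.predict(x.reshape(1, -1))[0] == +1:
            return cond
    return CONDITIONS[-1]   # last remaining condition (or an unmodelled fault)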
In this research, two tests are conducted systematically. The diagnosis
results and corresponding characteristics of the SVM classifiers are shown in
Figs 12.9, 12.10 and 12.11. Test 1 is designed to investigate the SVM classifier
performance on known incipient faults. The steady-state data are used to
build the four-layer SVM classifier: as mentioned in previous sections, data
within the threshold under the normal condition indicate fault free,
and data beyond the threshold indicate faults one to three. For each
normal/faulty condition, two days of data (20 hours of data per day, between
2 am and 10 pm) are used. Therefore, a total of 4 × 40 hours of samples
are collected. Half of the data for each condition are used as the training
data, whilst the rest are used as the testing data for fault diagnosis.
12.10. Conclusion
This chapter focuses on the fault detection and isolation of HVAC systems
under real-time working conditions. An online SVM FDI classifier has been
developed which can be trained during the operation of the HVAC system.
Unlike the offline method, the proposed approach can even detect new,
previously unknown faults and use them for training the classifier in real time.
Furthermore, this online approach can train the FDI module more efficiently by
discarding unnecessary data (left-out vectors) and using only the data
with high priority for classification. Due to these properties, the
proposed algorithm can be implemented in a semi-unsupervised learning
framework. The simulation study indicates that the proposed approach can
efficiently detect and isolate typical HVAC faults. In the next step, we will
validate the proposed approach by using real experimental data.
References
[15] N. A. Syed, H. Liu, and K. K. Sung, Incremental learning with support vector
machines, Proceedings of the International Joint Conference on Artificial
Intelligence (IJCAI-99). 11(7), 143–148, (1999).
[16] G. Cauwenberghs and T. Poggio, Incremental and decremental support
vector machine learning, Advances in Neural Information Processing
Systems. 13(13), 409–415, (2001).
[17] V. Vapnik, The Nature of Statistical Learning Theory. (Springer–Verlag,
New York, 1995).