Learning With Fractional Orthogonal Kernel Classifiers in Support Vector Machines
Theory, Algorithms and Applications
Industrial and Applied Mathematics
Editors-in-Chief
G. D. Veerappa Gowda, TIFR Centre For Applicable Mathematics, Bengaluru,
Karnataka, India
S. Kesavan, Institute of Mathematical Sciences, Chennai, Tamil Nadu, India
Fahima Nekka, Université de Montréal, Montréal, QC, Canada
Editorial Board
Akhtar A. Khan, Rochester Institute of Technology, Rochester, USA
Govindan Rangarajan, Indian Institute of Science, Bengaluru, India
K. Balachandran, Bharathiar University, Coimbatore, Tamil Nadu, India
K. R. Sreenivasan, NYU Tandon School of Engineering, Brooklyn, USA
Martin Brokate, Technical University, Munich, Germany
M. Zuhair Nashed, University of Central Florida, Orlando, USA
N. K. Gupta, Indian Institute of Technology Delhi, New Delhi, India
Noore Zahra, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
Pammy Manchanda, Guru Nanak Dev University, Amritsar, India
René Pierre Lozi, University Côte d’Azur, Nice, France
Zafer Aslan, İstanbul Aydın University, İstanbul, Turkey
The Industrial and Applied Mathematics series publishes high-quality research-level monographs, lecture notes, textbooks, and contributed volumes focusing on areas where mathematics is used in a fundamental way, such as industrial mathematics, bio-mathematics, financial mathematics, applied statistics, operations research, and computer science.
Jamal Amani Rad · Kourosh Parand ·
Snehashish Chakraverty
Editors
Mathematics Subject Classification: 68T05, 65-04, 33C45, 65Lxx, 65Mxx, 65Nxx, 65Rxx
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
In recent years, machine learning has been applied in different areas of science and engineering, including computer science, medical science, cognitive science, psychology, and so on. In these fields, non-specialists in machine learning utilize it to address their problems.
One of the most popular families of algorithms in machine learning is support
vector machine (SVM) algorithms. Traditionally, these algorithms are used for binary
classification problems. But recently, the SVM algorithms have been utilized in
various areas including numerical analysis, computer vision, and so on. Therefore,
the popularity of SVM algorithms has risen in recent years.
The main part of the SVM algorithms is the kernel function and the performance
of a given SVM is related to the power of the kernel function. Different kernels
provide different capabilities to the SVM algorithms; therefore, understanding the
properties of the kernel functions is a crucial part of utilizing the SVM algorithms.
Up until now, various kernel functions have been developed by researchers. One of
the most significant families of the kernel functions is the orthogonal kernel function
which has been attracting much attention. The computational power of this family
of kernel functions has been illustrated by researchers in the last few years. But despite their computational power, orthogonal kernel functions have not been widely used in real application problems. This issue has various reasons, some of which are summarized in the following:
1. The mathematical complexity of the formulation of orthogonal kernel functions.
2. Lack of a simple and comprehensive resource for expressing the orthogonal
kernels and their properties.
3. Implementation difficulties of these kernels and lack of a convenient package
that implements these kernels.
For the purpose of solving the aforementioned issues, in this book, we present a simple and comprehensive tutorial on orthogonal kernel functions together with a Python package, named ORSVM, that implements these kernel functions. We chose Python as the language of the ORSVM package because Python is open source, very popular, and easy to learn, and there are many tutorials for it. Moreover:
1. Python has a lot of packages for manipulating data, and they can be used alongside ORSVM for solving a machine learning problem.
2. Python is a multi-platform language and can run on different operating systems.
In addition to the existing orthogonal kernels, we aim to introduce some new kernel functions called fractional orthogonal kernels. The name fractional comes from the order of the functions being a positive real number instead of an integer. In fact, the fractional orthogonal kernels are extensions of the integer-order orthogonal functions. All fractional orthogonal kernels introduced in this book are implemented in the ORSVM package, and their performance is illustrated by testing on some real datasets.
This book contains 12 chapters, together with an appendix at the end of the book that covers the programming preliminaries. Chapter 1 includes the fundamental concepts of machine learning. In this chapter, we explain the definitions of pattern and similarity, and then a geometrical intuition of the SVM algorithm is presented. At the end of this chapter, a historical review of the SVM and the current applications of SVM are discussed.
In Chap. 2, we present the basics of SVM and least-square SVM (LS-SVM). The
mathematical background of SVM is presented in this chapter in detail. Moreover,
Mercer’s theorem and kernel trick are discussed too. In the last part of this chapter,
function approximation using the SVM is illustrated.
In Chap. 3, the discussion is about Chebyshev polynomials. At first, the properties
of Chebyshev polynomials and fractional Chebyshev functions are explained. After
that, a review of Chebyshev kernel functions is presented and the fractional Cheby-
shev kernel functions are introduced. In the final section of this chapter, the perfor-
mance of fractional Chebyshev kernels on real datasets is illustrated and compared
with other state-of-the-art kernels.
In Chap. 4, the Legendre polynomials are considered. In the beginning, the proper-
ties of the Legendre polynomials and fractional Legendre functions are explained. In
the next step after reviewing the Legendre kernel functions, the fractional Legendre
kernel functions are introduced. Finally, the performance of fractional Legendre
kernels is illustrated by applying them to real datasets.
Another orthogonal polynomial series, the Gegenbauer polynomials, is discussed in Chap. 5. Similar to the previous chapters, this chapter includes the properties of the Gegenbauer polynomials, the properties of the fractional Gegenbauer functions, a review of Gegenbauer kernels, the introduction of fractional Gegenbauer kernels, and an illustration of the performance of fractional Gegenbauer kernels on real datasets.
In Chap. 6, we focus on Jacobi polynomials. This family of polynomials is the
general form of the previous polynomials which are presented in Chaps. 3–5. There-
fore, the relations between the polynomials and the kernels are discussed in this
chapter. In addition to the relations between the polynomials, other parts of this
chapter are similar to the three previous chapters.
Chapter 1
Introduction to SVM

Hadi Veisi
Abstract In this chapter, a review of the machine learning (ML) and pattern recog-
nition concepts is given, and basic ML techniques (supervised, unsupervised, and
reinforcement learning) are described. Also, a brief history of ML development from
the primary works before the 1950s (including Bayesian theory) up to the most recent
approaches (including deep learning) is presented. Then, an introduction to the sup-
port vector machine (SVM) with a geometric interpretation is given, and its basic
concepts and formulations are described. A history of SVM progress (from Vap-
nik’s primary works in the 1960s up to now) is also reviewed. Finally, various ML
applications of SVM in several fields such as medical, text classification, and image
classification are presented.
H. Veisi (B)
Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
J. A. Rad et al. (eds.), Learning with Fractional Orthogonal Kernel Classifiers in Support
Vector Machines, Industrial and Applied Mathematics,
https://doi.org/10.1007/978-981-19-6553-1_1
Machine learning now enables computers to understand a speech lecture using speech recognition and natural language understanding, to buy a stock of a company using algorithmic trading methods, and to drive a car automatically in self-driving cars. The term machine learning was coined in 1959 by Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, who defined it as “it gives computers the ability to learn without being explicitly programmed” Samuel (2000). In 1997, Tom Mitchell, an American computer scientist and a former Chair of the Machine Learning Department at Carnegie Mellon University (CMU), gave a mathematical and relational definition: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E” Mitchell (1997). So, if you want your ML program to predict the growth of a stock (task T) to decide whether to buy it, you can run a machine learning algorithm with data about past price patterns of this stock (experience E, which is called training data) and, if it has successfully “learned”, it will then do better at predicting future prices (performance measure P). The primary works in ML date back to the 1950s, and the field has seen several waves of improvement during the last 70 years. Here is a short history of ML:
• Before the 1950s: Several ML-related theories had been developed, including Bayesian theory Bayes (1991), Markov chains Gagniuc (2017), regression, and estimation theories Fisher (1922). Also, Donald Hebb in 1949 Hebb (1949) presented his model of brain neuron interactions, which is the basis of learning in McCulloch-Pitts neural networks McCulloch and Pitts (1943).
• The 1950s: In this decade, ML pioneers proposed the primary ideas and algorithms for machine learning. The Turing test, originally called the imitation
game, was proposed by Alan Turing as a test of a machine’s ability to exhibit
intelligent behavior equivalent to, or indistinguishable from, that of a human.
Arthur Samuel of IBM developed a computer program for playing checkers Samuel
(1959). Frank Rosenblatt extended Hebb’s learning model of brain cell interaction
with Arthur Samuel’s ideas and created the perceptron Rosenblatt (1958).
• The 1960s: Bayesian methods are introduced for probabilistic inference
Solomonoff (1964). The primary idea of Support Vector Machines (SVMs) is
given by Vapnik and Lerner Vapnik (1963). Widrow and Hoff developed the delta learning rule for neural networks, which was the precursor of the backpropagation algorithm Widrow and Hoff (1960). Sebestyen Sebestyen (1962) and Nilsson Nilsson (1965)
proposed the nearest neighbor idea. Donald Michie used reinforcement learning
to play Tic-tac-toe Michie (1963). The decision tree was introduced by Morgan
and Sonquist Morgan and Sonquist (1963).
• The 1970s: The quiet years, also known as the AI Winter, caused by pessimism about machine learning effectiveness due to the limitation of the ML methods to solving only linearly separable problems Minsky and Papert (1969).
• The 1980s: The birth of brilliant ideas resulted in renewed enthusiasm. Backpropagation, publicized by Rumelhart et al. (1986), caused a resurgence in machine learning. Hopfield popularized his recurrent neural networks Hopfield (1984). Watkins developed Q-learning in reinforcement learning Watkins (1989).
Machine learning techniques are classified into the following three categories, depending on the nature of the learning data or learning process:
1. Supervised learning: In this type of learning, there is a supervisor to teach the machine a concept. This means that the algorithm learns from labeled data (i.e., training data), which include example data and related target responses, i.e., input and output pairs. If we regard the ML algorithm as a system (e.g., a face identification system), in the training phase of the system we provide both the input sample (e.g., an image of a face) and the corresponding output (e.g., the ID of the person to whom the face belongs) to the system. The collection of labeled data requires skilled human agents (e.g., a translator to translate a text from one language to another) or a physical experiment (e.g., determining whether there is rock or metal near the sonar system of a submarine), which is costly and time-consuming. The supervised learning methods can be divided into classification and regression. When the number of classes of the data is limited (i.e., the output label of the data is a discrete variable) the learning is called classification (e.g., classifying an email into spam and not-spam classes), and when the output label is a continuous variable (e.g., the price of a stock index) the topic is called regression. Examples
of the most widely used supervised learning algorithms are SVM Boser et al.
(1992), Vapnik (1995), Cortes and Vapnik (1995), artificial neural networks (e.g.,
multi-layer perceptron Rumelhart et al. (1986), LSTM Hochreiter and Schmid-
huber (1997), Gers et al. (1999)), linear and logistic regression Cramer (2002),
Naïve Bayes Hand and Yu (2001), decision trees Morgan and Sonquist (1963),
and K-Nearest Neighbor (KNN) Sebestyen (1962), Nilsson (1965).
2. Unsupervised learning: In this case, there is no supervision in the learning, and the ML algorithm works on unlabeled data. It means that the algorithm learns from plain examples without any associated response, leaving the algorithm to determine the data patterns based on the similarities in the data. This type of algorithm tends to restructure the data and cluster them. From a system viewpoint, this kind of learning receives sample data as the input (e.g., human faces) without the corresponding output and groups similar samples in the same clusters. The categorization of unlabeled data is commonly called clustering. Also, association, as another type of unsupervised learning, refers to methods which can discover rules that describe large portions of data, such as people who buy product X also tend to buy another product Y. Dimensionality reduction is another form of unsupervised learning.
[Figure: categories of machine learning techniques: supervised (classification), unsupervised (clustering, association, dimensionality reduction), and reinforcement learning]
In generative models, the joint distribution P(Class, Data) is learned and the prediction is performed according to this distribution. On the other hand, discriminative models make predictions by estimating the conditional probability P(Class|Data).
Examples of generative methods are deep generative models (DGMs) such as
Variational Autoencoder (VAE) and GANs, Naïve Bayes, Markov random fields,
and Hidden Markov Models (HMM). SVM is a discriminative method that learns
the decision boundary like some other methods such as logistic regression, traditional
neural networks such as multi-layer perceptron (MLP), KNN, and Conditional Ran-
dom Fields (CRFs).
In ML, we seek to design and build machines that can learn and recognize patterns, a task also called pattern recognition. To do this, the data need to have a regularity or arrangement, called a pattern, to be learned by ML algorithms. The data may be created by humans, such as stock prices or a signature, or have a natural origin, such as speech signals or DNA. Therefore, a pattern includes elements that are repeated in a predictable manner. The patterns in natural data, e.g., speech signals, are often chaotic and stochastic and do not repeat exactly. There are various types of natural patterns, which include spirals, meanders, waves, foams, tilings, cracks, and those created by symmetries of rotation and reflection. Some types of patterns, such as a geometric pattern in an image, can be directly observed, while abstract patterns in a huge amount of data or a language may become observable only after analyzing the data using pattern discovery methods. In both cases, the underlying mathematical structure of a pattern can be analyzed by machine learning techniques, which are mainly empowered by mathematical tools. The techniques can learn the patterns to predict or recognize them, or can search them to find the regularities. Accordingly, if a dataset lacks any regularities and repeatable templates, the modeling results of ML techniques will not be promising.
Support Vector Machine (SVM), also known as the support vector network, is a supervised learning approach used for classification and regression. Given a set of labeled training examples belonging to two classes, the SVM training algorithm builds a decision boundary between the samples of these classes. SVM does this in such a way that it optimally discriminates between the two classes by maximizing the margin between the two data categories. For data samples in an N-dimensional space, SVM constructs an (N − 1)-dimensional separating hyperplane to discriminate the two classes. To describe
(a) Samples of data in the vector space. (b) Several possible separating lines.
(a) Support vectors and optimum decision boundary. (b) Decision boundary equation and maximum margin.
These two equations can be integrated into a single inequality, as in Eq. 1.2, by introducing a supplementary variable $y_i$, which is equal to +1 for positive samples and to −1 for negative samples. This inequality is considered as an equality, i.e., $y_i(w \cdot x_i + w_0) - 1 = 0$, to define the main constraint of the problem, which means that for the examples lying on the margins (i.e., the support vectors) this expression is constrained to 0. This equation is equivalent to a line that is the answer to our problem. This decision boundary line in this example becomes a hyperplane in the general $N$-dimensional case.
$$y_i(w \cdot x_i + w_0) - 1 \ge 0. \tag{1.2}$$
To find the maximum margin that separates positive and negative examples, we need to know the width of the margin. To calculate the width of the margin, $(x_+ - x_-)$ needs to be projected onto the unit normal $\frac{w}{\|w\|}$. Therefore, the width is computed as $(x_+ - x_-) \cdot \frac{w}{\|w\|}$, in which, by using $y_i(w \cdot x_i + w_0) - 1 = 0$ to substitute $w \cdot x_+ = 1 - w_0$ and $w \cdot x_- = -(1 + w_0)$, the final value for the width is obtained as $\frac{2}{\|w\|}$ (see Fig. 1.3). Finally, maximizing this margin is equivalent to Eq. 1.3, which is a quadratic program:
$$\min_{w, w_0} \frac{1}{2}\|w\|^2, \qquad y_i(w \cdot x_i + w_0) - 1 \ge 0. \tag{1.3}$$
Fig. 1.4 Transforming data from a nonlinear space into a linear higher dimensional space
Using Lagrange multipliers, this optimization problem can be converted into Eq. 1.4, an equation in which the problem depends only on dot products of pairs of data samples. Also, $\alpha_i = 0$ for the training examples that are not support vectors, which means these examples do not affect the decision boundary. Another interesting fact about this optimization problem is that it is a convex problem, and it is guaranteed to always find a global optimum.
$$L(w, w_0) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w \cdot x_i + w_0) - 1 \right]. \tag{1.4}$$
The above-described classification problem and its solution using the SVM
assumes the data is linearly separable. However, in most real-life applications, this
assumption is not correct and most problems are not classified simply using a linear
boundary. The SVM decision boundary is originally linear Vapnik (1963) but has
been extended to handle nonlinear cases as well Boser et al. (1992). To do this, SVM
proposes a method called kernel trick in which an input vector is transformed using a
nonlinear function like φ(.) into a higher dimensional space. Then, in this new space,
the maximum-margin linear boundary is found. It means that a nonlinear problem
is converted into a linearly separable problem in the new higher dimensional space
without affecting the convexity of the problem. A simple example of this technique
is given in Fig. 1.4 in which one-dimensional data samples, xi , are transformed into
two-dimensional space using (xi , xi × xi ) transform. In this case, the dot product of
two samples, i.e., xi · xj, in the optimization problem is replaced with φ(xi) · φ(xj). In practice, if we have a function K such that K(xi, xj) = φ(xi) · φ(xj), then we do not need to know the transformation φ(·); only the function K(·) (which is called the kernel function) is required. Some common kernel functions are the linear, polynomial, sigmoid, and radial basis function (RBF) kernels.
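To make the transform above concrete, the following minimal sketch (assuming NumPy and scikit-learn, which this chapter does not prescribe) fits a linear SVM on data explicitly mapped by (xi, xi × xi) and then reaches the same separation with a degree-2 polynomial kernel on the raw one-dimensional data:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data that is not linearly separable: the negative class sits between
# the two halves of the positive class.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Explicit transform phi(x) = (x, x*x): the classes become linearly separable.
phi = np.hstack([x, x * x])
linear_on_phi = SVC(kernel="linear", C=10.0).fit(phi, y)
print(linear_on_phi.score(phi, y))          # expected: 1.0

# Kernel trick: a degree-2 polynomial kernel on the raw 1-D data gives the
# same separation without ever forming phi(x) explicitly.
poly_on_x = SVC(kernel="poly", degree=2, coef0=1.0, C=10.0).fit(x, y)
print(poly_on_x.score(x, y))                # expected: 1.0
```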
Although the kernel trick is a clever method to handle nonlinearity, the SVM still assumes that the data is linearly separable in this transformed space. This assumption is not true in most real-world applications. Therefore, another type of SVM was proposed, called the soft-margin SVM Cortes and Vapnik (1995). The SVM method described up to now is known as the hard-margin SVM. As given, hard-margin SVMs assume the data is linearly separable without any errors, whereas soft-margin SVMs allow some misclassification and result in a more robust decision on nonlinearly separable data. Today, soft-margin SVMs are the most common SVM techniques in ML; they utilize so-called slack variables in the optimization problem to control the amount of misclassification.
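As a hedged illustration of the slack-variable idea (again with scikit-learn and synthetic data that are not part of the text), the parameter C below weights the penalty on margin violations; varying it changes how many support vectors and how much misclassification the soft-margin SVM accepts:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: no hard-margin separator exists.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(+1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Large C tolerates few violations (narrow margin); small C accepts more
# violations and yields a wider, smoother margin.
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=C).fit(X, y)
    print(C, model.n_support_.sum(), model.score(X, y))
```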
Vladimir Vapnik, the Russian statistician, is the main originator of the SVM tech-
nique. The primary work on the SVM algorithm was proposed by Vapnik (1963) as
the Generalized Portrait algorithm for pattern recognition. However, it was not the
first algorithm for pattern recognition, and Fisher in 1936 had proposed a method
for this purpose Fisher (1936). Also, Frank Rosenblatt had proposed the perceptron linear classifier, which was an early feedforward neural network Rosenblatt
(1958). One year after the primary work of Vapnik and Lerner, in 1964, Vapnik fur-
ther developed the Generalized Portrait algorithm Vapnik and Chervonenkis (1964).
In this year, a geometrical interpretation of the kernels was introduced in Aizerman
et al. (1964) as inner products in a feature space. The kernel theory which is the
main concept in the development of SVM and is called “kernel trick” was previously
proposed in Aronszajn (1950). In 1965, a large margin hyperplane in the input space
was introduced in Cover (1965) which is another key idea of the SVM algorithm.
At the same time, a similar optimization concept was used in pattern recognition
by Mangasarian (1965). Another important research that defines the basic idea of
the soft-margin concept in SVM was introduced by Smith (1968). This idea was
presented as the use of slack variables to overcome the problem of noisy samples
that are not linearly separable. In the history of SVM development, the breakthrough
work is the formulation of statistical learning framework or VC theory proposed
by Vapnik and Chervonenkis (1974) which presents one of the most robust predic-
tion methods. It is not surprising to say that the rise of SVM began in this decade
and this reference has been translated from Russian into other languages such as
German Vapnik and Chervonenkis (1979) and English Vapnik (1982). The use of
polynomial kernel in SVM was proposed by Poggio (1975) and the improvement
of kernel techniques for regression was presented by Wahba (1990). Studying the
connection between neural networks and kernel regression was done by Poggio and
Girosi (1990). The improvement of the previous work on slack variables in Smith
(1968) was done by Bennett and Mangasarian (1992). Another main milestone in the
development of SVM came in 1992, when SVM was presented in its current form
by Boser et al. (1992). In this work, the optimal margin classifier of linear classifiers
(from Vapnik (1963)) was extended to nonlinear cases by utilizing the kernel trick
to maximum-margin hyperplanes Aizerman et al. (1964). In 1995, soft margin of
SVM classifiers to handle noisy and not linearly separable data was introduced using
slack variables by Cortes and Vapnik (1995). In 1996, the algorithm was extended to
the case of regression Drucker et al. (1996) which is called Support Vector Regres-
sion (SVR). The use of SVM in various applications grew rapidly after 1995. Also, the theoretical aspects of SVM have been
studied, and it has been extended to domains other than classification. The statis-
tical bounds on the generalization of hard margin were given by Bartlett (1998) and
it was presented for soft margin and the regression case in 2000 by Shawe-Taylor
and Cristianini (2000). The SVM was originally developed for supervised learning
which has been extended to the unsupervised case in Ben-Hur et al. (2001) called
support vector clustering. Another improvement of SVM was its extension from the
binary classification into multiclass SVM by Duan and Keerthi in Duan and Keerthi
(2005) by distinguishing between one of the labels and the rest (one-versus-all) or
between every pair of classes (one-versus-one). In 2011, SVM was analyzed as a
graphical model and it was shown that it admits a Bayesian interpretation using data
augmentation technique Polson and Scott (2011). Accordingly, a scalable version of
the Bayesian SVM was developed in Wenzel et al. (2017) enabling the application
of Bayesian SVMs in big data applications. A summary of the research related to SVM development is given in Table 1.1.
Today, machine learning algorithms are on the rise and are widely used in real
applications. Every year new techniques are proposed that overcome the current
leading algorithms. Some of them are only small advances or combinations of existing algorithms, while others are newly created and lead to astonishing progress. Although deep learning techniques are dominant in many real applications such as image processing (e.g., for image classification) and sequential data modeling (e.g., in natural language processing tasks such as machine translation), these techniques require a huge amount of training data for successful modeling. Large-scale labeled
datasets are not available in many applications in which other ML techniques (called
classical ML methods) such as SVM, decision tree, and Bayesian family methods
have higher performance than deep learning techniques. Furthermore, there is another
fact in ML. Each task in ML applications can be solved using various methods;
however, there is no single algorithm that will work well for all tasks. This fact is
known as the No Free Lunch Theorem in ML Wolpert (1996). Each task that we
want to solve has its idiosyncrasies and there are various ML algorithms to suit the
problem. Among the non-deep learning methods, SVM, as a well-known machine
learning technique, is widely used in various classification and regression tasks today
due to its high performance and reliability across a wide variety of problem domains
and datasets Cervantes et al. (2020). Generally, SVM can be applied to any ML task
in any application such as computer vision and image processing, natural language
processing (NLP), medical applications, biometrics, and cognitive science. In the
following, some common applications of SVM are reviewed.
References
Aggarwal, C.C., Zhai, C.: A survey of text classification algorithms. In: Aggarwal, C.C., Zhai, C.
(eds.) Mining Text Data, 163–222. Springer, Boston (2012)
Aizerman, M.A., Braverman, E.M., Rozonoer, L.I.: Theoretical foundations of the potential function
method in pattern recognition learning. Autom. Remote. 25, 821–837 (1964)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404 (1950)
Assael, Y.M., Shillingford, B., Whiteson, S., De Freitas, N.: Lipnet: End-to-end Sentence-Level
Lipreading (2016). arXiv:1611.01599
Azar, A.T., El-Said, S.A.: Performance analysis of support vector machines classifiers in breast
cancer mammography recognition. Neural. Comput. Appl. 24, 1163–1177 (2014)
Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the
weights is more important than the size of the network. IEEE Trans. Inf. Theory 44, 525–536
(1998)
Bayes, T.: An essay towards solving a problem in the doctrine of chances. MD Comput. 8, 157
(1991)
Ben-Hur, A., Horn, D., Siegelmann, H.T., Vapnik, V.: Support vector clustering. J. Mach. Learn.
Res. 2, 125–137 (2001)
Bennett, K.P., Mangasarian, O.L.: Robust linear programming discrimination of two linearly insep-
arable sets. Optim. Methods. Softw. 1, 23–34 (1992)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Singapore (2006)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In:
COLT92: 5th Annual Workshop Computers Learning Theory, PA (1992)
Byvatov, E., Schneider, G.: Support vector machine applications in bioinformatics. Appl. Bioinf.
2, 67–77 (2003)
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., Lopez, A.: A comprehensive survey on
support vector machine classification: applications, challenges and trends. Neurocomputing 408,
189–215 (2020)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications
in pattern recognition. IEEE Trans. Elect. Comput. 3, 326–334 (1965)
Cramer, J.S.: The origins of logistic regression (Technical report). Tinbergen Inst. 167–178 (2002)
Drucker, H., Burges, C.J., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines.
Adv. Neural Inf. Process Syst. 9, 155–161 (1996)
Duan, K. B., Keerthi, S. S.: Which is the best multiclass SVM method? An empirical study. In:
International Workshop on Multiple Classifier Systems, pp. 278–285. Springer, Heidelberg (2005)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Ferrucci, D., Levas, A., Bagchi, S., Gondek, D., Mueller, E.T.: Watson: beyond jeopardy! Artif.
Intell. 199, 93–105 (2013)
Fisher, R.A.: The goodness of fit of regression formulae, and the distribution of regression coeffi-
cients. J. R. Stat. Soc. 85, 597–612 (1922)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188
(1936)
Fukushima, K.: Neocognitron: a hierarchical neural network capable of visual pattern recognition.
Neural Netw. 1, 119–130 (1988)
Gagniuc, P.A.: Markov Chains: from Theory to Implementation and Experimentation. Wiley, Hobo-
ken, NJ (2017)
Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. In:
International Conference on Artificial Neural Networks, Edinburgh (1999)
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning. MIT Press, England (2016)
Hand, D.J., Yu, K.: Idiot’s Bayes-not so stupid after all? Int. Stat. Rev. 69, 385–398 (2001)
Hebb, D.: The Organization of Behavior. Wiley, New York (1949)
Hinton, G.E.: Analyzing Cooperative Computation. 5th COGSCI, Rochester (1983)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput.
18, 1527–1554 (2006)
Ho, T.K.: Random decision forests. In: Proceedings of 3rd International Conference. on Document
Analysis and Recognition, Montreal (1995)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
Hopfield, J.J.: Neurons with graded response have collective computational properties like those of
two-state neurons. Proc. Natl. Acad. Sci. 81, 3088–3092 (1984)
Joachims, T.: Transductive inference for text classification using support vector machines. In: ICML
99: Proceedings of the Sixteenth International Conference on Machine Learning, 200–209 (1999)
Khan, A., Sohail, A., Zahoora, U., Qureshi, A.S.: A survey of the recent architectures of deep
convolutional neural networks. Artif. Intell. Rev. 53, 5455–5516 (2020)
Li, B., He, J., Huang, J., Shi, Y.Q.: A survey on image steganography and steganalysis. J. Inf. Hiding
Multimedia Signal Process. 2, 142–172 (2011)
Li, S., Zhou, W., Yuan, Q., Geng, S., Cai, D.: Feature extraction and recognition of ictal EEG using
EMD and SVM. Comput. Biol. Med. 43, 807–816 (2013)
Li, Y., Li, J., Pan, J.S.: Hyperspectral image recognition using SVM combined deep learning. J.
Internet Technol. 20, 851–859 (2019)
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep neural network
architectures and their applications. Neurocomputing 234, 11–26 (2017)
Mangasarian, O.L.: Linear and nonlinear separation of patterns by linear programming. Oper. Res.
13, 444–452 (1965)
McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull.
Math. Biol. 5, 115–133 (1943)
Melgani, F., Bazi, Y.: Classification of electrocardiogram signals with support vector machines and
particle swarm optimization. IEEE Trans. Inf. Technol. Biomed. 12, 667–677 (2008)
Michie, D.: Experiments on the mechanization of game-learning Part I. Characterization of the
model and its parameters. Comput. J. 6, 232–236 (1963)
Minsky, M., Papert, S.A.: Perceptrons: An Introduction to Computational Geometry. MIT Press,
England (1969)
Miranda, E., Aryuni, M., Irwansyah, E.: A survey of medical image classification techniques. In:
International Conference on Information Management and Technology, pp. 56–61 (2016)
Mitchell, T.M.: Machine Learning. McGraw-Hill Higher Education, New York (1997)
Morgan, J.N., Sonquist, J.A.: Problems in the analysis of survey data, and a proposal. J. Am. Stat.
Assoc. 58, 415–434 (1963)
Müller, K.R., Smola, A.J., Rätsch, G., Schölkopf, B., Kohlmorgen, J., Vapnik, V.: Predicting time
series with support vector machines. In: International Conference on Artificial Neural Networks,
pp. 999–1004. Springer, Heidelberg (1997)
Nilsson, N.J.: Learning Machines. McGraw-Hill, New York (1965)
Pan, Z., Yu, W., Yi, X., Khan, A., Yuan, F., Zheng, Y.: Recent progress on generative adversarial
networks (GANs): a survey. IEEE Access 7, 36322–36333 (2019)
Pavlidis, P., Weston, J., Cai, J., Grundy, W N.: Gene functional classification from heterogeneous
data. In: Proceedings of the Fifth Annual International Conference on Computational Biology,
pp. 249–255 (2001)
Poggio, T.: On optimal nonlinear associative recall. Biol. Cybern. 19, 201–209 (1975)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78, 1481–1497 (1990)
Polson, N.G., Scott, S.L.: Data augmentation for support vector machines. Bayesian Anal. 6, 1–23
(2011)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in
the brain. Psychol. Rev. 65, 386–408 (1958)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors.
Nature 323, 533–536 (1986)
Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3,
210–229 (1959)
Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 44,
206–226 (2000)
Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5, 197–227 (1990)
Sebestyen, G.S.: Decision-Making Processes in Pattern Recognition. Macmillan, New York (1962)
Shawe-Taylor, J., Cristianini, N.: Margin distribution and soft margin. In: Smola, A.J., Bartlett, P.,
Scholkopf, B., Schuurmans, D., (eds.), Advances in Large Margin Classifiers, pp. 349–358. MIT
Press, England (2000)
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker,
L., Lai, M., Bolton, A., Chen, Y.: Mastering the game of go without human knowledge. Nature
550, 354–359 (2017)
Smith, F.W.: Pattern classifier design by linear programming. IEEE Trans. Comput. 100, 367–372
(1968)
Solomonoff, R.J.: A formal theory of inductive inference. Part II. Inf. Control. 7, 224–254 (1964)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, USA (2018)
Tuia, D., Volpi, M., Copa, L., Kanevski, M., Munoz-Mari, J.: A survey of active learning algorithms
for supervised remote sensing image classification. IEEE J. Sel. Topics Signal Process. 5, 606–617
(2011)
Vapnik, V.: Pattern recognition using generalized portrait method. Autom. Remote. Control. 24,
774–780 (1963)
Vapnik, V.N.: Estimation of Dependencies Based on Empirical Data. Springer, New York (1982)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Vapnik, V.N., Chervonenkis, A.Y.: On a class of perceptrons. Autom. Remote. 25, 103–109 (1964)
Vapnik, V., Chervonenkis, A.: Theory of pattern recognition: statistical problems of learning (Rus-
sian). Nauka, Moscow (1974)
Vapnik, V., Chervonenkis, A.: Theory of Pattern Recognition (German). Akademie, Berlin (1979)
Wahba, G.: Spline Models for Observational Data. SIAM, PA (1990)
Warwick, K.: A Brief History of Deep Blue. IBM’s Chess Computer, Mental Floss (2017)
Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England
(1989)
Wenzel, F., Galy-Fajou, T., Deutsch, M., Kloft, M.: Bayesian nonlinear support vector machines for
big data. In: European Conference on Machine Learning and Knowledge Discovery in Databases,
pp. 307–322. Springer, Cham (2017)
Widrow, B., Hoff, M.E.: Adaptive Switching Circuits (No. TR-1553-1). Stanford University Cali-
fornia Stanford Electronics Labs (1960)
Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8,
1341–1390 (1996)
Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. Learn. Syst. 16,
645–678 (2005)
Chapter 2
Basics of SVM Method and Least
Squares SVM
Abstract The learning process of support vector machine algorithms leads to solv-
ing a convex quadratic programming problem. Since this optimization problem has
a unique solution and also satisfies the Karush–Kuhn–Tucker conditions, it can be
solved very efficiently. In this chapter, the formulation of optimization problems
which have arisen in the various forms of support vector machine algorithms is
discussed.
As mentioned in the previous chapter, the linear support vector machine method for categorizing separable data was introduced by Vapnik and Chervonenkis (1964). This method finds the best discriminator hyperplane, which separates the samples of two classes in a training dataset. Consider $D = \{(x_i, y_i) \mid i = 1, \dots, N,\ x_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\}$ as a set of training data, where the samples in classes $C_1$ and $C_2$ have +1 and −1 labels, respectively.
In this section, the SVM method is explained for two cases. The first case occurs
when the training samples are linearly separable and the goal is to find a linear
separator by the hard margin SVM method. The second is about the training samples
K. Parand (B)
Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti
University, Tehran, Iran
e-mail: [email protected]
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
F. Baharifard
School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
A. A. Aghaei · M. Jani
Department of Computer and Data Science, Faculty of Mathematical Sciences,
Shahid Beheshti University, Tehran, Iran
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
J. A. Rad et al. (eds.), Learning with Fractional Orthogonal Kernel Classifiers in Support
Vector Machines, Industrial and Applied Mathematics,
https://doi.org/10.1007/978-981-19-6553-1_2
which are not linearly separable (e.g., due to noise), in which case the soft margin SVM method has to be used.
Assume that the input data is linearly separable. Figure 2.1 shows an example of such data and also indicates that the separator hyperplane is not unique. The aim of the SVM method is to find a unique hyperplane that has the maximum distance to the closest points of both classes. The equation of this hyperplane can be considered as follows:
$$\langle w, x \rangle + w_0 = w^T x + w_0 = \sum_{i=1}^{d} w_i x_i + w_0 = 0, \tag{2.1}$$
Fig. 2.2 Hard margin SVM method: a unique hyperplane with the maximum margin
So, the following optimization problem is obtained to find the separator hyperplane, which is depicted in Fig. 2.2 for the two-dimensional space:
$$\begin{aligned} \max_{D, w, w_0}\ & \frac{2D}{\|w\|} \\ \text{s.t.}\ & w^T x_i + w_0 \ge D, \quad \forall x_i \in C_1, \\ & w^T x_i + w_0 \le -D, \quad \forall x_i \in C_2. \end{aligned} \tag{2.3}$$
One can set $w' = \frac{w}{D}$ and $w'_0 = \frac{w_0}{D}$ and combine the above two inequalities into the single inequality $y_i(w'^T x_i + w'_0) \ge 1$ to have
$$\begin{aligned} \max_{w', w'_0}\ & \frac{2}{\|w'\|} \\ \text{s.t.}\ & y_i(w'^T x_i + w'_0) \ge 1, \quad i = 1, \dots, N. \end{aligned} \tag{2.4}$$
Maximizing $\frac{2}{\|w\|}$ is equivalent to minimizing $\frac{1}{2} w^T w$, so the problem can be rewritten as
$$\begin{aligned} \min_{w, w_0}\ & \frac{1}{2} w^T w \\ \text{s.t.}\ & y_i(w^T x_i + w_0) \ge 1, \quad i = 1, \dots, N. \end{aligned} \tag{2.5}$$
A quadratic programming (QP) problem is an optimization problem of the general form
$$\begin{aligned} \min_{x}\ & \frac{1}{2} x^T Q x + c^T x \\ \text{s.t.}\ & Ax \le b, \\ & Ex = d. \end{aligned} \tag{2.6}$$
According to the above definition, the problem Eq. 2.5 can be considered as a convex QP problem ($Q$ is the identity matrix, the vector $c$ equals zero, and the constraints are reformulated into the form $Ax \le b$). If the problem has a feasible solution, it is a global minimum and can be obtained by an efficient method.
On the other hand, instead of solving the primal form of the problem, the dual
form of it can be solved. The dual problem is often easier and helps us to have
better intuition about the optimal hyperplane. More importantly, it enables us to take
advantage of the kernel trick which will be explained later.
A common method for solving constrained optimization problems is the technique of Lagrangian multipliers. In the Lagrangian multipliers method, a new function, namely the Lagrangian function, is formed from the objective function and the constraints, and the goal is to obtain the stationary point of this function. Consider the minimization problem
$$\min_{x} f(x) \quad \text{s.t.}\quad g_i(x) \le 0,\ i = 1, \dots, m, \qquad h_i(x) = 0,\ i = 1, \dots, p. \tag{2.7}$$
Its Lagrangian function is
$$L(x, \alpha, \lambda) = f(x) + \sum_{i=1}^{m} \alpha_i g_i(x) + \sum_{i=1}^{p} \lambda_i h_i(x), \tag{2.8}$$
where the $\alpha_i \ge 0$ and $\lambda_i$ are the Lagrangian multipliers.
The dual form of Eq. 2.10 will be obtained by swapping the order of max and min:
$$d^* = \max_{\alpha_i \ge 0,\, \lambda_i} \min_{x} L(x, \alpha, \lambda). \tag{2.11}$$
Weak duality always holds, and so $d^* \le p^*$. In addition, because the primal problem is convex, strong duality is also established and $d^* = p^*$. So, the primal optimal objective and the dual optimal objective are equal, and instead of solving the primal problem, the dual problem can be solved.
Here, the dual formulation of the problem Eq. 2.5 is discussed. Combining the objective function and the constraints gives us
$$L(w, w_0, \alpha) = \frac{1}{2}\|w\|^2 + \sum_{i=1}^{N} \alpha_i \left(1 - y_i(w^T x_i + w_0)\right), \tag{2.12}$$
and the primal problem can be written as
$$\min_{w, w_0} \max_{\alpha_i \ge 0} \left\{ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{N} \alpha_i \left(1 - y_i(w^T x_i + w_0)\right) \right\}. \tag{2.13}$$
Corresponding to the strong duality, the following dual optimization problem can be examined to find the optimal solution of the above problem:
$$\max_{\alpha_i \ge 0} \min_{w, w_0} \left\{ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{N} \alpha_i \left(1 - y_i(w^T x_i + w_0)\right) \right\}. \tag{2.14}$$
Setting the derivatives of the Lagrangian with respect to $w$ and $w_0$ to zero gives
$$\frac{\partial L}{\partial w} = 0 \ \rightarrow\ w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial w_0} = 0 \ \rightarrow\ \sum_{i=1}^{N} \alpha_i y_i = 0. \tag{2.15}$$
In these equations, $w_0$ has been removed and a global constraint has been set for $\alpha$. By substituting $w$ from the above equation into the Lagrangian function of Eq. 2.12, we have
$$L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j. \tag{2.16}$$
So, the dual optimization problem becomes
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \\ \text{s.t.}\ & \sum_{i=1}^{N} \alpha_i y_i = 0, \\ & \alpha_i \ge 0, \quad i = 1, \dots, N. \end{aligned} \tag{2.17}$$
The equivalent of the problem Eq. 2.17 can be considered as follows, which is a quadratic programming problem due to Eq. 2.6:
$$\begin{aligned} \min_{\alpha}\ & \frac{1}{2} \alpha^T \begin{bmatrix} y_1 y_1 x_1^T x_1 & \dots & y_1 y_N x_1^T x_N \\ \vdots & \ddots & \vdots \\ y_N y_1 x_N^T x_1 & \dots & y_N y_N x_N^T x_N \end{bmatrix} \alpha + (-1)^T \alpha \\ \text{s.t.}\ & -\alpha \le 0, \\ & y^T \alpha = 0. \end{aligned} \tag{2.18}$$
The first condition indicates the feasibility of the solution and states that the constraints must not be violated at the optimal point. The second condition ensures that there is no direction that can both improve the objective function and remain feasible. The third condition is related to complementary slackness, which together with the fourth condition indicates that the Lagrangian multipliers of the inactive inequality constraints are zero, while the multipliers of the active constraints can be positive. In other words, for an active constraint $y_i(w^T x_i + w_0) = 1$, $\alpha_i$ can be greater than zero and the corresponding $x_i$ is defined as a support vector. But for inactive constraints $y_i(w^T x_i + w_0) > 1$, $\alpha_i = 0$ and $x_i$ is not a support vector (see Fig. 2.3).
Fig. 2.3 The support vectors in the hard margin SVM method
If we define the set of support vectors as $SV = \{x_i \mid \alpha_i > 0\}$ and consequently define $S = \{i \mid x_i \in SV\}$, the direction of the hyperplane, which is related to $w$, can be found as follows:
$$w = \sum_{s \in S} \alpha_s y_s x_s. \tag{2.20}$$
Moreover, any sample whose Lagrangian multiplier is greater than zero is on the margin and can be used to compute $w_0$. Using a sample $x_{s_j}$ with $s_j \in S$, for which the equality $y_{s_j}(w^T x_{s_j} + w_0) = 1$ is established, we have
$$w_0 = y_{s_j} - w^T x_{s_j}. \tag{2.21}$$
By assuming that the linear classifier is $y = \operatorname{sign}(w_0 + w^T x)$, new samples can be classified using only the support vectors of the problem. Consider a new sample $x$; the label of this sample, $\hat{y}$, can be computed as follows:
$$\begin{aligned} \hat{y} &= \operatorname{sign}\left(w_0 + w^T x\right) \\ &= \operatorname{sign}\Big( y_{s_j} - \big(\textstyle\sum_{s \in S} \alpha_s y_s x_s\big)^T x_{s_j} + \big(\textstyle\sum_{s \in S} \alpha_s y_s x_s\big)^T x \Big) \\ &= \operatorname{sign}\Big( y_{s_j} - \textstyle\sum_{s \in S} \alpha_s y_s x_s^T x_{s_j} + \textstyle\sum_{s \in S} \alpha_s y_s x_s^T x \Big). \end{aligned} \tag{2.22}$$
Soft margin SVM is a method for obtaining a linear classifier for training samples which are not actually linearly separable. Overlapping classes, or separable classes that include some noisy data, are examples of such problems. In these cases, the hard margin SVM does not work well or may not even find an answer. One solution for these problems is to minimize the number of misclassified points while maximizing the number of samples that are categorized correctly. But this counting solution falls into the category of NP-complete problems Cortes and Vapnik (1995). So, an efficient alternative is to define a continuous problem that is solvable deterministically in polynomial time by some optimization techniques. The extension of the hard margin method, called the soft margin SVM method, was introduced by Cortes and Vapnik (1995) for this purpose.
In the soft margin method, the samples are allowed to violate the constraints, while the total amount of these violations is kept small. In fact, the soft margin method tries to maximize the margin while minimizing the total violation. Therefore, for each sample $x_i$, a slack variable $\xi_i \ge 0$, which indicates the amount of its violation from the correct margin, is defined and the inequality of this sample is relaxed to
$$y_i(w^T x_i + w_0) \ge 1 - \xi_i. \tag{2.23}$$
Moreover, the total violation is $\sum_{i=1}^{N} \xi_i$, which should be minimized. So, the primal optimization problem to obtain the separator hyperplane becomes
$$\begin{aligned} \min_{w, w_0, \{\xi_i\}_{i=1}^{N}}\ & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i \\ \text{s.t.}\ & y_i(w^T x_i + w_0) \ge 1 - \xi_i, \quad i = 1, \dots, N, \\ & \xi_i \ge 0, \end{aligned} \tag{2.24}$$
Fig. 2.4 Soft margin SVM method: types of support vectors and their corresponding ξ values
$$L(w, w_0, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i + \sum_{i=1}^{N} \alpha_i \left(1 - \xi_i - y_i(w^T x_i + w_0)\right) - \sum_{i=1}^{N} \beta_i \xi_i, \tag{2.25}$$
where the $\alpha_i$'s and $\beta_i$'s are the Lagrangian multipliers. The Lagrangian formulation should be minimized with respect to $w$, $w_0$, and the $\xi_i$'s while being maximized with respect to the positive Lagrangian multipliers $\alpha_i$'s and $\beta_i$'s. To find $\min_{w, w_0, \xi} L(w, w_0, \xi, \alpha, \beta)$, we have
$$\begin{cases} \nabla_w L(w, w_0, \xi, \alpha, \beta) = 0 & \rightarrow\ w = \sum_{i=1}^{N} \alpha_i y_i x_i, \\ \dfrac{\partial L(w, w_0, \xi, \alpha, \beta)}{\partial w_0} = 0 & \rightarrow\ \sum_{i=1}^{N} \alpha_i y_i = 0, \\ \dfrac{\partial L(w, w_0, \xi, \alpha, \beta)}{\partial \xi_i} = 0 & \rightarrow\ C - \alpha_i - \beta_i = 0. \end{cases} \tag{2.26}$$
By substituting $w$ from the above equation into the Lagrangian function of Eq. 2.25, the same expression as Eq. 2.16 is obtained. But here two constraints on $\alpha$ are created: one is the same as before, and the other is $0 \le \alpha_i \le C$. It should be noted that $\beta_i$ does not appear in $L(\alpha)$ of Eq. 2.16, and we only need $\beta_i \ge 0$. Therefore, the condition $C - \alpha_i - \beta_i = 0$ can be replaced by the condition $0 \le \alpha_i \le C$. So, the dual form of problem Eq. 2.24 becomes as follows:
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \\ \text{s.t.}\ & \sum_{i=1}^{N} \alpha_i y_i = 0, \\ & 0 \le \alpha_i \le C, \quad i = 1, \dots, N. \end{aligned} \tag{2.27}$$
After solving the above QP problem, $w$ and $w_0$ can be obtained based on Eqs. (2.20) and (2.21), respectively, with $S = \{i \mid x_i \in SV\}$ defined as before. As mentioned before, the set of support vectors ($SV$) is obtained based on the complementary slackness criterion $\alpha_i^* \left(1 - y_i(w^{*T} x_i + w_0^*)\right) = 0$ for sample $x_i$. Another complementary slackness condition of Eq. 2.24 is $\beta_i^* \xi_i = 0$. According to it, the support vectors are divided into two categories: the margin support vectors, which lie exactly on the margin ($0 < \alpha_i < C$ and $\xi_i = 0$), and the non-margin support vectors, for which $\alpha_i = C$ and $\xi_i > 0$. Also, the classification formula for a new sample in the soft margin SVM is the same as in the hard margin SVM, which was discussed in Eq. 2.22.
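In terms of the QP sketch shown after Eq. 2.22, the soft margin dual changes only the inequality block: the box constraint 0 ≤ αi ≤ C replaces αi ≥ 0. A minimal fragment (cvxopt again; the sizes and the value of C below are placeholders, not values from the text):

```python
import numpy as np
from cvxopt import matrix

# In the soft margin dual (Eq. 2.27) the only change from the hard margin QP
# of Eq. 2.18 is the box constraint 0 <= alpha_i <= C: stack an upper bound
# on top of the earlier -alpha <= 0 rows.
N, C = 4, 1.0                                    # placeholder size and penalty
G = matrix(np.vstack([-np.eye(N), np.eye(N)]))   # rows for -alpha <= 0 and alpha <= C
h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
# P, q, A, b stay exactly as in the hard margin sketch; the same
# solvers.qp(P, q, G, h, A, b) call then solves the soft margin dual.
```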
Some problems have nonlinear decision surfaces and therefore linear methods do not
provide a suitable answer for them. The nonlinear SVM classifier was introduced by
Vapnik Vapnik (2000) for these problems. In this method, the input data is mapped
to a new feature space by a nonlinear function and a hyperplane is found in the
transformed feature space. By applying reverse mapping, the hyperplane becomes a
curve or a surface in the input data space (see Fig. 2.5).
Applying a transformation φ : Rd → Rk on the input space, sample x = [x1 , . . .
xd ] can be represented by φ(x) = [φ1 (x), . . . , φk (x)] in the transformed space, where
φi (x) : Rd → R. The primal problem of soft margin SVM in the transformed space
is as follows:
Fig. 2.5 Mapping input data to a high-dimensional feature space to have a linear separator hyper-
plane in the transformed space
$$\begin{aligned} \min_{w, w_0, \{\xi_i\}_{i=1}^{N}}\ & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i \\ \text{s.t.}\ & y_i(w^T \phi(x_i) + w_0) \ge 1 - \xi_i, \quad i = 1, \dots, N, \\ & \xi_i \ge 0, \end{aligned} \tag{2.28}$$
and its dual form is
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) \\ \text{s.t.}\ & \sum_{i=1}^{N} \alpha_i y_i = 0, \\ & 0 \le \alpha_i \le C, \quad i = 1, \dots, N. \end{aligned} \tag{2.29}$$
As can be seen, the expansion of the polynomial kernel $K(x, t) = (1 + x^T t)^2$ for two-dimensional inputs equals the inner product of the following second-order $\phi$'s:
$$\phi(x) = [1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2], \tag{2.32}$$
$$\phi(t) = [1, \sqrt{2}t_1, \sqrt{2}t_2, t_1^2, t_2^2, \sqrt{2}t_1 t_2].$$
So, we can substitute the dot product $\phi(x)^T \phi(t)$ with the kernel function $K(x, t) = (1 + x^T t)^2$ without directly calculating $\phi(x)$ and $\phi(t)$.
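This identity is easy to check numerically. The short sketch below (NumPy; not part of the text) evaluates both sides of K(x, t) = φ(x)^T φ(t) for the second-order map of Eq. 2.32:

```python
import numpy as np

def phi(u):
    # Second-order feature map of Eq. 2.32 for u = [u1, u2]
    u1, u2 = u
    return np.array([1.0, np.sqrt(2) * u1, np.sqrt(2) * u2,
                     u1 ** 2, u2 ** 2, np.sqrt(2) * u1 * u2])

x = np.array([0.5, -1.2])
t = np.array([2.0, 0.3])
lhs = (1.0 + x @ t) ** 2      # kernel evaluated directly
rhs = phi(x) @ phi(t)         # inner product in the explicit feature space
print(np.isclose(lhs, rhs))   # True
```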
The polynomial kernel function can similarly be generalized to the $d$-dimensional feature space $x = [x_1, \dots, x_d]$, where the $\phi$'s are polynomials of order $M$, as follows:
$$\phi(x) = [1, \sqrt{2}x_1, \dots, \sqrt{2}x_d, x_1^2, \dots, x_d^2, \sqrt{2}x_1 x_2, \dots, \sqrt{2}x_1 x_d, \sqrt{2}x_2 x_3, \dots, \sqrt{2}x_{d-1} x_d]^T.$$
In many cases, the inner product in the embedding space can be computed efficiently by defining a kernel function. Some common kernel functions are listed below:
• $K(x, t) = x^T t$,
• $K(x, t) = (x^T t + 1)^M$,
• $K(x, t) = \exp\left(-\frac{\|x - t\|^2}{\gamma}\right)$,
• $K(x, t) = \tanh(a x^T t + b)$.
These functions are known as the linear, polynomial, Gaussian, and sigmoid kernels, respectively Cheng et al. (2017).
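Written out as plain functions (an illustrative sketch; the default parameter values are arbitrary and not prescribed by the text), the four kernels are:

```python
import numpy as np

def linear_kernel(x, t):
    return x @ t

def polynomial_kernel(x, t, M=3):
    return (x @ t + 1.0) ** M

def gaussian_kernel(x, t, gamma=1.0):
    return np.exp(-np.sum((x - t) ** 2) / gamma)

def sigmoid_kernel(x, t, a=1.0, b=0.0):
    return np.tanh(a * (x @ t) + b)

x, t = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, t), polynomial_kernel(x, t),
      gaussian_kernel(x, t), sigmoid_kernel(x, t))
```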
A valid kernel corresponds to an inner product in some feature space. A necessary and sufficient condition to check the validity of a kernel function is the Mercer condition Mercer (1909). This condition states that any symmetric positive definite matrix can be regarded as a kernel matrix. By restricting a kernel function to a set of points $\{x_1, \dots, x_N\}$, the corresponding kernel matrix $K_{N \times N}$ is the matrix whose element in row $i$ and column $j$ is $K(x_i, x_j)$:
$$K = \begin{bmatrix} K(x_1, x_1) & \dots & K(x_1, x_N) \\ \vdots & \ddots & \vdots \\ K(x_N, x_1) & \dots & K(x_N, x_N) \end{bmatrix}. \tag{2.34}$$
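Mercer's condition can be checked numerically on any finite sample: build the kernel matrix of Eq. 2.34 and verify that it is symmetric with non-negative eigenvalues. A small sketch with the Gaussian kernel (NumPy; the random points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))          # 20 arbitrary points in R^3
gamma = 1.0
# Gaussian kernel matrix of Eq. 2.34, built in one vectorized step.
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / gamma)

# Mercer's condition: K must be symmetric and positive semidefinite.
eigvals = np.linalg.eigvalsh(K)
print(np.allclose(K, K.T), eigvals.min() >= -1e-10)   # expected: True True
```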
Therefore, using a valid and suitable kernel function, the optimization problem Eq. 2.29 becomes the following problem:
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\ \text{s.t.}\ & \sum_{i=1}^{N} \alpha_i y_i = 0, \\ & 0 \le \alpha_i \le C, \quad i = 1, \dots, N, \end{aligned} \tag{2.36}$$
which is still a convex QP problem and is solved to find the $\alpha_i$'s. Moreover, for classifying new data, the similarity of the input sample $x$ is compared with all training data corresponding to the support vectors by the following formula:
$$\hat{y} = \operatorname{sign}\Big( w_0 + \sum_{\alpha_i > 0} \alpha_i y_i K(x_i, x) \Big), \tag{2.37}$$
where
$$w_0 = y_{s_j} - \sum_{\alpha_i > 0} \alpha_i y_i K(x_i, x_{s_j}). \tag{2.38}$$
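A direct transcription of Eqs. 2.37–2.38 is sketched below. Note that the multipliers are placeholders rather than the solution of Eq. 2.36, and the kernel choice is arbitrary; the point is only to show how classification uses kernel evaluations against the support vectors:

```python
import numpy as np

def classify(x_new, X_sv, y_sv, alpha_sv, x_margin, y_margin, kernel):
    """Eqs. 2.37-2.38: classify a new sample using only the support vectors."""
    w0 = y_margin - np.sum(alpha_sv * y_sv *
                           np.array([kernel(xi, x_margin) for xi in X_sv]))   # Eq. 2.38
    score = w0 + np.sum(alpha_sv * y_sv *
                        np.array([kernel(xi, x_new) for xi in X_sv]))         # Eq. 2.37
    return np.sign(score)

poly = lambda x, t: (x @ t + 1.0) ** 2          # arbitrary kernel choice
X_sv = np.array([[1.0, 1.0], [-1.0, -1.0]])     # hypothetical support vectors
y_sv = np.array([1.0, -1.0])
alpha_sv = np.array([0.5, 0.5])                 # placeholder multipliers, not fitted
print(classify(np.array([2.0, 2.0]), X_sv, y_sv, alpha_sv, X_sv[0], y_sv[0], poly))
```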
Theorem 2.2 The product of valid kernel functions is also a valid kernel function Zanaty and Afifi (2011).
The first implication of Theorems 2.1 and 2.2 is that if $K$ is a valid Mercer kernel function, then any polynomial of $K$ with positive coefficients is also a valid Mercer kernel function. Consequently, $\exp(K)$ is also a valid Mercer kernel function (consider the Maclaurin expansion of $\exp(\cdot)$ for the proof) Genton (2001), Shawe-Taylor and Cristianini (2004).
The support vector machine model for classification tasks can be modified for regression problems. This can be achieved by applying the $\varepsilon$-insensitive loss function to the model Drucker et al. (1997). This loss function is defined as
$$\operatorname{loss}(y, \hat{y}; \varepsilon) = |y - \hat{y}|_{\varepsilon} = \begin{cases} 0 & |y - \hat{y}| \le \varepsilon, \\ |y - \hat{y}| - \varepsilon & \text{otherwise}, \end{cases}$$
where $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$. The primal form of the support vector regression model is constructed as
$$\begin{aligned} \min_{w, w_0, \xi, \xi^*}\ & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^*\right) \\ \text{s.t.}\ & y_i - w^T \phi(x_i) - w_0 \le \varepsilon + \xi_i, \\ & w^T \phi(x_i) + w_0 - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, N. \end{aligned}$$
Using the Lagrangian multipliers, the dual form of this optimization problem leads to
$$\begin{aligned} \max_{\alpha, \alpha^*}\ & -\frac{1}{2} \sum_{i,j=1}^{N} \left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} \left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{N} y_i \left(\alpha_i - \alpha_i^*\right) \\ \text{s.t.}\ & \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^*\right) = 0, \\ & \alpha_i, \alpha_i^* \in [0, C], \end{aligned}$$
where $|S|$ is the cardinality of the set of support vectors. The unknown function $y(x)$ in the dual form can be computed as
$$y(x) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^*\right) K(x, x_i) + w_0.$$
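As a usage note (an assumption, since the chapter does not tie the formulation to any implementation), support vector regression with the ε-insensitive loss is available in scikit-learn; a minimal sketch on a noisy sine curve:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, (80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)        # noisy sine curve

# epsilon sets the width of the insensitive tube; C weighs the slack penalty.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.support_.size, "support vectors")
print(model.predict(np.array([[0.0], [1.5]])))        # roughly sin(0) and sin(1.5)
```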
The least squares support vector machine (LS-SVM) is a modification of SVM for-
mulation for machine learning tasks which was originally proposed by Suykens and
Vandewalle (1999). The LS-SVM replaces the inequality constraints of SVM’s pri-
mal model with equality ones. Also, the slack variables loss function changes to the
squared error loss function. Taking advantage of these changes, the dual problem
leads to a system of linear equations. Solving this system of linear equations can
be more computationally efficient than a quadratic programming problem in some
cases. Although this reformulation preserves the kernel trick property, the sparseness
of the model is lost. Here, the formulation of LS-SVM for two-class classification
tasks will be described.
Same as support vector machines, the LS-SVM considers the separating hyperplane in a feature space:
$$y(x) = \operatorname{sign}\left(w^T \phi(x) + w_0\right).$$
The corresponding primal optimization problem is
$$\begin{aligned} \min_{w, e, w_0}\ & \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \\ \text{s.t.}\ & y_i\left(w^T \phi(x_i) + w_0\right) = 1 - e_i, \quad i = 1, \dots, N, \end{aligned}$$
where $\gamma$ is the regularization parameter and the $e_i$ are slack variables, which can be positive or negative. The corresponding Lagrangian function is
$$L(w, w_0, e, \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 - \sum_{i=1}^{N} \alpha_i \left( y_i\left(w^T \phi(x_i) + w_0\right) - 1 + e_i \right).$$
Taking the optimality conditions and eliminating $w$ and $e$ leads to the linear system
$$\begin{bmatrix} 0 & y^T \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} w_0 \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix},$$
where
$$Z^T = \left[\phi(x_1)^T y_1; \dots; \phi(x_N)^T y_N\right], \quad y = [y_1; \dots; y_N], \quad 1_v = [1; \dots; 1],$$
$$e = [e_1; \dots; e_N], \quad \alpha = [\alpha_1; \dots; \alpha_N],$$
and
$$\Omega_{i,j} = y_i y_j \phi(x_i)^T \phi(x_j) = y_i y_j K(x_i, x_j), \quad i, j = 1, \dots, N.$$
The LS-SVM for function estimation, known as LS-SVR, is another type of LS-SVM
which deals with regression problems Suykens et al. (2002). The LS-SVR primal form can be related to the ridge regression model, but when the feature map is not explicitly defined, the dual form can be constructed in a kernel-trick sense. The LS-SVM
for function estimation considers the unknown function as
y(x) = w T φ(x) + w0 ,
and solves the primal problem
$$\min_{w,e,w_0}\;\frac{1}{2}w^Tw+\gamma\,\frac{1}{2}\sum_{i=1}^{N}e_i^2$$
$$\text{s.t.}\quad y_i=w^T\phi(x_i)+w_0+e_i,\quad i=1,\dots,N.$$
By applying the Lagrangian function and computing the optimality conditions, the
dual form of the problem takes the form
$$\begin{bmatrix}0 & 1_v^T\\ 1_v & \Omega+I/\gamma\end{bmatrix}\begin{bmatrix}w_0\\ \alpha\end{bmatrix}=\begin{bmatrix}0\\ y\end{bmatrix},$$
where
$$y=[y_1;\dots;y_N],\quad 1_v=[1;\dots;1],\quad \alpha=[\alpha_1;\dots;\alpha_N],\quad \Omega_{i,j}=\phi(x_i)^T\phi(x_j)=K(x_i,x_j).$$
The resulting LS-SVR model is then evaluated as
$$y(x)=\sum_{i=1}^{N}\alpha_i K(x_i,x)+w_0.$$
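To make the above concrete, the following minimal NumPy sketch (our own illustration, not the authors' code) fits a two-class LS-SVM classifier with an RBF kernel by solving the dual linear system and then evaluates the resulting decision function.
Program Code
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # solve [[0, y^T], [y, Omega + I/gamma]] [w0; alpha] = [0; 1_v]
    N = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(N))))
    return sol[0], sol[1:]                      # w0, alpha

def lssvm_predict(X_tr, y_tr, w0, alpha, X_te, sigma=1.0):
    return np.sign(rbf(X_te, X_tr, sigma) @ (alpha * y_tr) + w0)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2)); y = np.sign(X[:, 0] + X[:, 1])
w0, alpha = lssvm_fit(X, y)
print("training accuracy:", (lssvm_predict(X, y, w0, alpha, X) == y).mean())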
References
Cheng, K., Lu, Z., Wei, Y., Shi, Y., Zhou, Y.: Mixed kernel function support vector regression for
global sensitivity analysis. Mech. Syst. Signal Process. 96, 201–214 (2017)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A.J., Vapnik, V.: Support vector regression
machines. Adv. Neural Inf. Process. Syst. 9, 155–161 (1997)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110
(1956)
Genton, M.G.: Classes of kernels for machine learning: a statistics perspective. J. Mach. Learn.
Res. 2, 299–312 (2001)
Karush, W.: Minima of functions of several variables with inequalities as side constraints. M.Sc.
Dissertation. Department of Mathematics, University of Chicago (1939)
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Berkeley Symposium on Mathematical
Statistics and Probability. University of California Press, Berkeley (1951)
Mercer, J.: XVI. Functions of positive and negative type, and their connection with the theory of integral equations. In: Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, vol. 209, pp. 415–446 (1909)
Murty, K.G., Yu, F.T.: Linear Complementarity, Linear and Nonlinear Programming. Helderman,
Berlin (1988)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press,
UK (2004)
Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process.
Lett. 9, 293–300 (1999)
Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Moor, B.D., Vandewalle, J.: Least Squares Support
Vector Machines. World Scientific, Singapore (2002)
Vapnik, V., Chervonenkis, A.: A note on one class of perceptrons. Autom. Remote. Control. 44, 103–109 (1964)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (2000)
Zanaty, E.A., Afifi, A.: Support vector machines (SVMs) with universal kernels. Appl Artif. Intell.
25, 575–589 (2011)
Part II
Special Kernel Classifiers
Chapter 3
Fractional Chebyshev Kernel Functions:
Theory and Application
Abstract Orthogonal functions have many useful properties and can be used for
different purposes in machine learning. One of the main applications of the orthogonal
functions is producing powerful kernel functions for the support vector machine
algorithm. Perhaps the simplest orthogonal functions that can be used for producing kernel functions are the Chebyshev polynomials. In this chapter, after reviewing some
essential properties of Chebyshev polynomials and fractional Chebyshev functions,
various Chebyshev kernel functions are presented, and fractional Chebyshev kernel
functions are introduced. Finally, the performance of the various Chebyshev kernel
functions is illustrated on two sample datasets.
3.1 Introduction
As mentioned in the previous chapters, the kernel function plays a crucial role in the
performance of the SVM algorithms. In the literature of the SVM, various kernel
functions have been developed and applied on several datasets (Hussain et al. 2011;
Ozer et al. 2011; An-na et al. 2010; Padierna et al. 2018; Tian and Wang 2017), but
each kernel function has its own benefits and limitations (Achirul Nanda et al. 2018;
Hussain et al. 2011). The radial basis function (RBF) (Musavi et al. 1992; Scholkopf
et al. 1997) and polynomial kernel functions (Reddy et al. 2014; Yaman and Pelecanos 2013) are perhaps the most popular ones, because they are easy to learn,
have acceptable performance in pattern classification, and are very computationally
efficient. However, there are many other examples where the performance of those
kernels is not satisfactory (Moghaddam and Hamidzadeh 2016). One of the well-
established alternatives to these two kernels is the orthogonal kernel functions which
have many useful properties embedded in their nature. These orthogonal functions
are very useful in various fields of science as well as machine learning (Hajimo-
hammadi et al. 2021; Tian and Wang 2017; Sun et al. 2015). It can be said that the
simplest family of these functions is Chebyshev. This family of orthogonal functions
has been used in different cases such as signal and image processing (Shuman et al.
2018), digital filtering (Pavlović et al. 2013), spectral graph theory (Hadian Rasanan
et al. 2019), astronomy (Capozziello et al. 2018), numerical analysis (Sedaghat et al.
2012; Zhao et al. 2017; Shaban et al. 2013; Kazem et al. 2012; Parand et al. 2019;
Hadian-Rasanan and Rad 2020; Kazem et al. 2017), and machine learning (Mall
and Chakraverty 2020). On the other hand, in the literature of numerical analysis
and scientific computing, the Chebyshev polynomials have been used for solving
various problems in fluid dynamics (Parand et al. 2017), theoretical physics (Parand
and Delkhosh 2017), control (Hassani et al. 2019), and finance (Glau et al. 2019;
Mesgarani et al. 2021). One of the exciting applications of the Chebyshev polyno-
mials has been introduced by Mall and Chakraverty (Mall and Chakraverty 2015),
where they used the Chebyshev polynomials as an activation function of functional
link neural network. The functional link neural network is a kind of single-layer neu-
ral network which utilizes orthogonal polynomials as the activation function. This
framework based on Chebyshev polynomials has been used for solving various types
of equations such as ordinary, partial, or system of differential equations (Mall and
Chakraverty 2017; Chakraverty and Mall 2020; Omidi et al. 2021).
For more information about Chebyshev and his contributions to orthogonal polynomials, visit: http://mathshistory.st-andrews.ac.uk/Biographies/Chebyshev.html.
One of the recent improvements in the field of special functions is the development
of fractional orthogonal polynomials (Kazem et al. 2013). The fractional orthogonal
functions can be obtained by some nonlinear transformation function and have much
better performance in function approximation (Dabiri et al. 2017; Kheyrinataj and
Nazemi 2020; Habibli and Noori Skandari 2019). However, these functions have not previously been used as kernels. Thus, this chapter first presents the essential background on Chebyshev polynomials and their fractional version, then brings the existing Chebyshev kernel functions together, and finally introduces and examines the fractional Chebyshev kernel functions.
This chapter is organized as follows. The basic definitions and properties of
orthogonal Chebyshev polynomials and the fractional Chebyshev functions are pre-
sented in Sect. 3.2. Then, the ordinary Chebyshev kernel function and two previously
proposed kernels based on these polynomials are discussed and the novel fractional
Chebyshev kernel functions are introduced in Sect. 3.3. In Sect. 3.4, the results of
experiments of both the ordinary Chebyshev kernel and the fractional one are cov-
ered and then a comparison between the obtained accuracy results of the mentioned
kernels and RBF and polynomial kernel functions in the SVM algorithm on well-
known datasets is exhibited to specify the validity and efficiency of the fractional
kernel functions. Finally, in Sect. 3.5, the concluding remarks of this chapter are presented.
3.2 Preliminaries
There are four kinds of Chebyshev polynomials, but the focus of this chapter is on
introducing the first kind of Chebyshev polynomials which is denoted by Tn (x). The
interested readers can investigate the other types of Chebyshev polynomials in Boyd
(2001).
The power of these polynomials originally comes from their relation to the trigonometric functions (sine and cosine), which are very useful in describing all kinds of natural phenomena (Mason and Handscomb 2002). Since $T_n(x)$ is a polynomial, it is possible to define $T_n(x)$ using trigonometric relations.
Let us assume $z$ is a complex number on the unit circle $|z| = 1$, where $\theta$ is the argument of $z$ and $\theta \in [0, 2\pi]$; in other words,
$$x = \Re z = \frac{1}{2}\big(z + z^{-1}\big) = \cos\theta \in [-1, 1], \qquad (3.1)$$
where $\Re$ denotes the real part of a complex number.
Since the Chebyshev polynomial of the first kind is denoted by $T_n(x)$, one can define (Mason and Handscomb 2002; Boyd 2001; Shen et al. 2011; Asghari et al. 2022):
$$T_n(x) = \Re z^n = \frac{1}{2}\big(z^n + z^{-n}\big) = \cos n\theta. \qquad (3.2)$$
Therefore, the $n$-th order Chebyshev polynomial can be obtained by
$$T_n(x) = \cos\big(n\cos^{-1}x\big), \qquad x\in[-1,1]. \qquad (3.3)$$
Based on definition (3.2), the Chebyshev polynomials for some n are defined as
follows:
$$\Re z^0 = \tfrac{1}{2}(z^0 + z^0) \;\Longrightarrow\; T_0(x) = 1, \qquad (3.4)$$
$$\Re z^1 = \tfrac{1}{2}(z + z^{-1}) \;\Longrightarrow\; T_1(x) = x, \qquad (3.5)$$
$$\Re z^2 = \tfrac{1}{2}(z^2 + z^{-2}) \;\Longrightarrow\; T_2(x) = 2x^2 - 1, \qquad (3.6)$$
$$\Re z^3 = \tfrac{1}{2}(z^3 + z^{-3}) \;\Longrightarrow\; T_3(x) = 4x^3 - 3x, \qquad (3.7)$$
$$\Re z^4 = \tfrac{1}{2}(z^4 + z^{-4}) \;\Longrightarrow\; T_4(x) = 8x^4 - 8x^2 + 1. \qquad (3.8)$$
Consequently, by considering this definition for Tn+1 (x) as follows:
$$T_{n+1}(x) = \tfrac{1}{2}\big(z^{n+1} + z^{-n-1}\big) = \tfrac{1}{2}\big(z^n + z^{-n}\big)\big(z + z^{-1}\big) - \tfrac{1}{2}\big(z^{n-1} + z^{1-n}\big) = 2\cos(n\theta)\cos(\theta) - \cos\big((n-1)\theta\big), \qquad (3.9)$$
the following recursive relation can be obtained for the Chebyshev polynomials:
T0 (x) = 1,
T1 (x) = x,
Tn (x) = 2x Tn−1 (x) − Tn−2 (x), n ≥ 2. (3.10)
Thus, any order of the Chebyshev polynomials can be generated using this recursive
formula.
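The symbolic generator Tn used in the program codes of this chapter is not shown in this excerpt; a minimal SymPy sketch implementing recurrence (3.10) could look as follows (an assumption on our part about how such a helper is written).
Program Code
import sympy

def Tn(x, n):
    # first-kind Chebyshev polynomials via recurrence (3.10):
    # T0 = 1, T1 = x, Tn = 2*x*T(n-1) - T(n-2)
    if n == 0:
        return sympy.Integer(1)
    if n == 1:
        return x
    return sympy.expand(2 * x * Tn(x, n - 1) - Tn(x, n - 2))

x = sympy.Symbol("x")
print(sympy.expand(Tn(x, 4)))  # expected: 8*x**4 - 8*x**2 + 1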
In addition to the recursive formula, the Chebyshev polynomials can be obtained
directly by the following expansion for n ∈ Z+ (Mason and Handscomb 2002; Boyd
2001; Shen et al. 2011; Asghari et al. 2022):
$$T_n(x) = \sum_{k=0}^{\lfloor n/2\rfloor}(-1)^k\,\frac{n!}{(2k)!\,(n-2k)!}\,\big(1-x^2\big)^k x^{n-2k}. \qquad (3.11)$$
The other way to obtain the Chebyshev polynomials is by solving their Sturm-
Liouville differential equation.
Theorem 3.1 (Mason and Handscomb (2002); Boyd (2001); Shen et al. (2011);
Asghari et al. (2022)) Tn (x) is the solution of the following second-order linear
Sturm-Liouville differential equation:
$$(1-x^2)\frac{d^2y}{dx^2} - x\frac{dy}{dx} + n^2 y = 0, \qquad (3.12)$$
where −1 < x < 1 and n is an integer number.
Proof By considering $\frac{d}{dx}\cos^{-1}(x) = -\frac{1}{\sqrt{1-x^2}}$, we have
$$\frac{d}{dx}T_n(x) = \frac{d}{dx}\cos\big(n\cos^{-1}x\big) = n\,\frac{1}{\sqrt{1-x^2}}\,\sin\big(n\cos^{-1}x\big). \qquad (3.13)$$
So, by replacing the derivatives with their obtained formula in (3.12), the following
is yielded:
Program Code
import sympy
x = sympy.Symbol("x")
sympy.expand(sympy.simplify(Tn(x, 3)))
> 4*x**3 - 3*x
In the above code, the third-order Chebyshev polynomial of the first kind is generated, which is equal to $4x^3 - 3x$.
In order to explore the behavior of the Chebyshev polynomials, there are some
useful properties that are available. The first property of Chebyshev polynomials is
their orthogonality.
Theorem 3.2 (Mason and Handscomb (2002); Boyd (2001); Shen et al. (2011))
{Tn (x)} forms a sequence of orthogonal polynomials which are orthogonal to each
other over the interval [−1, 1], with respect to the following weight function:
$$w(x) = \frac{1}{\sqrt{1-x^2}}, \qquad (3.14)$$
1 https://numpy.org/doc/stable/reference/routines.polynomials.chebyshev.html.
46 A. H. Hadian Rasanan et al.
$$\int_{-1}^{1} T_n(x)T_m(x)\,w(x)\,dx = \frac{\pi c_n}{2}\,\delta_{n,m}, \qquad (3.15)$$
where $c_0 = 2$ and $c_n = 1$ for $n \ge 1$.
In the case of $n = m \neq 0$, we have
$$\int_{-1}^{1} T_n(x)T_m(x)\,\frac{1}{\sqrt{1-x^2}}\,dx = \int_{0}^{\pi}\cos n\theta\,\cos n\theta\,d\theta = \int_{0}^{\pi}\cos^2 n\theta\,d\theta = \int_{0}^{\pi}\frac{1}{2}\big(1+\cos 2n\theta\big)\,d\theta = \frac{1}{2}\Big[\theta + \frac{1}{2n}\sin 2n\theta\Big]_{0}^{\pi} = \frac{\pi}{2}. \qquad (3.18)$$
The roots of $T_n(x)$ are located at
$$x_k = \cos\Big(\frac{(2k-1)\pi}{2n}\Big), \qquad k = 1, \dots, n. \qquad (3.20)$$
Note that $x = 0$ is a root of $T_n(x)$ for all odd orders $n$; the other roots are symmetrically placed on either side of $x = 0$. For $k = 0, 1, \dots, n$, the extrema of the Chebyshev polynomial of the first kind can be found using
$$x = \cos\Big(\frac{\pi k}{n}\Big). \qquad (3.21)$$
Chebyshev polynomials have even and odd symmetry, meaning that even orders contain only even powers of $x$ and odd orders only odd powers of $x$. Therefore, we have
$$T_n(x) = (-1)^n T_n(-x) = \begin{cases} T_n(-x), & n \text{ even},\\ -T_n(-x), & n \text{ odd}.\end{cases} \qquad (3.22)$$
By applying the mapping $x \mapsto 2\big(\frac{x-a}{b-a}\big)^{\alpha} - 1$ to $T_n$, the first few fractional Chebyshev functions are
$$FT_0^{\alpha}(x) = 1,$$
$$FT_1^{\alpha}(x) = 2\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1,$$
$$FT_2^{\alpha}(x) = 8\Big(\frac{x-a}{b-a}\Big)^{2\alpha} - 8\Big(\frac{x-a}{b-a}\Big)^{\alpha} + 1,$$
$$FT_3^{\alpha}(x) = 32\Big(\frac{x-a}{b-a}\Big)^{3\alpha} - 48\Big(\frac{x-a}{b-a}\Big)^{2\alpha} + 18\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1,$$
$$FT_4^{\alpha}(x) = 128\Big(\frac{x-a}{b-a}\Big)^{4\alpha} - 256\Big(\frac{x-a}{b-a}\Big)^{3\alpha} + 160\Big(\frac{x-a}{b-a}\Big)^{2\alpha} - 32\Big(\frac{x-a}{b-a}\Big)^{\alpha} + 1. \qquad (3.26)$$
The readers can use the following Python code to generate any order of Chebyshev
Polynomials of Fractional Order symbolically:
Program Code
import sympy

x = sympy.Symbol("x")
eta = sympy.Symbol(r'\eta')
alpha = sympy.Symbol(r'\alpha')
a = sympy.Symbol("a")
b = sympy.Symbol("b")
x = sympy.sympify(2*((x - a)/(b - a))**alpha - 1)

def FTn(x, n):
    # assumed helper (not shown in this excerpt): the n-th first-kind Chebyshev
    # polynomial evaluated at the already-mapped fractional argument
    return sympy.chebyshevt(n, x)
Program Code
sympy.expand(sympy.simplify(FTn(x, 5)))
> $512\big(\tfrac{x-a}{b-a}\big)^{5\alpha} - 1280\big(\tfrac{x-a}{b-a}\big)^{4\alpha} + 1120\big(\tfrac{x-a}{b-a}\big)^{3\alpha} - 400\big(\tfrac{x-a}{b-a}\big)^{2\alpha} + 50\big(\tfrac{x-a}{b-a}\big)^{\alpha} - 1$
As the Chebyshev polynomials are orthogonal with respect to the weight function $\frac{1}{\sqrt{1-x^2}}$, we can define the orthogonality relation of the fractional Chebyshev functions as
$$\int_{-1}^{1} T_n(x')T_m(x')\,w(x')\,dx' = \int_{a}^{b} FT_n^{\alpha}(x)FT_m^{\alpha}(x)\,w(x)\,dx = \frac{\pi}{2}c_n\,\delta_{mn}, \qquad (3.28)$$
Fig. 3.3 Fractional Chebyshev functions of the first kind up to sixth order where a = 0, b = 5, and
α = 0.5
Fig. 3.4 Fractional Chebyshev functions of the first kind of order 5, for different α values
$$x_k = \left(\frac{1-\cos\big(\frac{(2k-1)\pi}{2n}\big)}{2}\right)^{\!1/\alpha}, \qquad \text{for } k = 1, 2, \dots, n. \qquad (3.29)$$
Figure 3.3 shows the fractional Chebyshev functions of the first kind up to sixth
order where a = 0, b = 5, and α is 0.5.
Also, Fig. 3.4 depicts the fractional Chebyshev functions of the first kind of order
5 for different values of α while η is fixed at 5.
Following the Chebyshev polynomial principles and their fractional type discussed
in the previous sections, this section presents the formulation of Chebyshev kernels.
Therefore, in the following, first, the ordinary Chebyshev kernel function, then some
other versions of this kernel function, and finally the fractional Chebyshev kernels
will be explained.
Many kernels constructed from orthogonal polynomials have been proposed for
SVM and other kernel-based learning algorithms (Ye et al. 2006; Moghaddam and
Hamidzadeh 2016; Tian and Wang 2017). Some characteristics of such kernels have
attracted attention, such as lower data redundancy in the feature space; on some occasions these kernels also need fewer support vectors during the fitting procedure, which leads to less execution time (Jung and Kim 2013). On the other hand, orthogonal kernels have shown superior performance in classification problems compared to traditional kernels such as RBF and polynomial kernels (Moghaddam and Hamidzadeh 2016; Ozer et al. 2011; Padierna et al. 2018).
Now, using the fundamental definition of the orthogonal kernel we will formulate
the Chebyshev kernel. As we know, the unweighted orthogonal polynomial kernel
function for SVM, for scalar inputs, is defined as
$$K(x,z) = \sum_{i=0}^{n} T_i(x)T_i(z), \qquad (3.31)$$
where T (.) denotes the evaluation of the polynomial; x, z are kernel’s input argu-
ments; and n is the highest polynomial order. In most applications of the real world,
input data is a multidimensional vector, hence two approaches have been proposed to
extend one-dimensional polynomial kernel to multidimensional vector input (Ozer
et al. 2011; Vert et al. 2007):
1. Pairwise
According to Vapnik’s theorem (Vapnik 2013):
Let a multidimensional set of functions be defined by the basis functions that are the
tensor products of the coordinate-wise basis functions. Then the kernel that defines the
inner product in the n-dimensional basis is the product of one-dimensional kernels.
$$K(x,z) = \prod_{j=1}^{m} K_j(x_j, z_j). \qquad (3.32)$$
In this approach, the kernel is evaluated on each pair of corresponding elements of the input vectors $x$ and $z$, and the overall kernel output is the product of all these element-wise evaluations. Therefore, for $m$-dimensional input vectors $x = \{x_1, x_2, \dots, x_m\} \in \mathbb{R}^m$ and $z = \{z_1, z_2, \dots, z_m\} \in \mathbb{R}^m$, the unweighted Chebyshev kernel is defined as
$$K_{Cheb}(x,z) = \prod_{j=1}^{m} K_j(x_j, z_j) = \prod_{j=1}^{m}\sum_{i=0}^{n} T_i(x_j)T_i(z_j), \qquad (3.33)$$
where $\{T_i(\cdot)\}$ are the Chebyshev polynomials. Simply by multiplying in the weight function (3.14), with respect to which the orthogonality of the first-kind Chebyshev polynomials is defined (see (3.15)), one can construct the pairwise orthogonal Chebyshev kernel function as
$$K_{Cheb}(x,z) = \frac{\sum_{i=0}^{n} T_i(x)T_i(z)}{\sqrt{1-xz}}, \qquad (3.34)$$
where $x$ and $z$ are scalar-valued inputs; for vector inputs we have
$$K_{Cheb}(x,z) = \prod_{j=1}^{m}\frac{\sum_{i=0}^{n} T_i(x_j)T_i(z_j)}{\sqrt{1-x_j z_j}}, \qquad (3.35)$$
2. Vectorized
In this method, proposed by Ozer et al. (2008), the generalization problem of the pairwise approach is tackled by applying the kernel to the input vectors as a whole rather than element by element, by means of the inner product of the input vectors. Based on the fact that the inner product of two vectors $x$ and $z$ is defined as $\langle x, z\rangle = xz^T$ and considering Eq. 3.31, one can construct the unweighted generalized Chebyshev kernel as Ozer and Chen (2008); Ozer et al. (2011)
$$K(x,z) = \sum_{i=0}^{n} T_i(x)T_i^T(z) = \sum_{i=0}^{n}\big\langle T_i(x), T_i(z)\big\rangle. \qquad (3.36)$$
$$x_{new} = \frac{2(x - Min)}{Max - Min} - 1, \qquad (3.38)$$
where Min and Max are minimum and maximum values of the input data, respectively.
It is clear that if the input data is a dataset, normalization should be done column-wise.
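For illustration, the column-wise normalization (3.38) and a generalized Chebyshev kernel with the $1/\sqrt{m-\langle x,z\rangle}$ weight used later in this chapter can be sketched in NumPy as follows (our own implementation written from those formulas, not code from the book).
Program Code
import numpy as np

def normalize(X):
    # map every column of the dataset into [-1, 1], as required by Chebyshev kernels
    mn, mx = X.min(axis=0), X.max(axis=0)
    return 2 * (X - mn) / (mx - mn) - 1

def chebyshev_values(u, n):
    # T_0(u), ..., T_n(u) evaluated element-wise via recurrence (3.10)
    T = [np.ones_like(u), u]
    for _ in range(2, n + 1):
        T.append(2 * u * T[-1] - T[-2])
    return np.array(T[: n + 1])

def gen_chebyshev_kernel(x, z, n=4):
    # sum_i <T_i(x), T_i(z)> / sqrt(m - <x, z>) for normalized vectors x, z
    Tx, Tz = chebyshev_values(x, n), chebyshev_values(z, n)
    return np.sum(Tx * Tz) / np.sqrt(len(x) - np.dot(x, z))

X = normalize(np.random.default_rng(2).uniform(0, 10, size=(5, 3)))
print(gen_chebyshev_kernel(X[0], X[1]))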
The multiplication of two valid kernels is also a valid kernel. Therefore, one can express any order of the generalized Chebyshev kernel as a product of two kernel functions, $K(x,z) = k_{(1)}(x,z)\,k_{(2)}(x,z)$, where
$$k_{(1)}(x,z) = \sum_{j=0}^{n} T_j(x)T_j^T(z) = T_0(x)T_0^T(z) + T_1(x)T_1^T(z) + \dots + T_n(x)T_n^T(z) \qquad (3.41)$$
and
$$k_{(2)}(x,z) = \frac{1}{\sqrt{m - \langle x, z\rangle}}. \qquad (3.42)$$
Therefore, the kernel k(1) (x, z) is a valid Mercer kernel. To prove that k(2) (x, z) =
√ 1
m− x,z
(in the simplest form where m = 1) is a valid kernel function, we can say
that < x, y > is the linear kernel and the Taylor series of √ 1
1−x.y
is
1 3 5 35
1+ < x, y > + < x, y >2 + < x, y >3 + < x, y >4 + · · · ,
2 8 16 128
n(2i−1)
which for each coefficient i=1
n!2n
0, k(2) (x, z) = √ 1
m− x,z
is also a valid
kernel.
A valid kernel should satisfy the Mercer conditions (Vapnik 2013; Schölkopf et al.
2002), and based on Mercer’s theorem, a kernel made from multiplication or summa-
tion of two valid Mercer kernels is also a valid kernel. This idea motivated Jafarzadeh
et al. (2013) to construct Chebyshev-Wavelet kernel which is in fact the multiplication
of generalized Chebyshev and Wavelet kernels as follows:
$$k_{G\_Cheb}(x,z) = \frac{\sum_{i=0}^{n}\big\langle T_i(x), T_i(z)\big\rangle}{\sqrt{m - \langle x,z\rangle}},$$
$$k_{wavelet}(x,z) = \prod_{j=1}^{m}\cos\Big(1.75\,\frac{x_j - z_j}{a}\Big)\exp\Big(-\frac{\|x - z\|^2}{2a^2}\Big), \qquad (3.44)$$
where $n$ is the Chebyshev kernel parameter (the order), $a$ is the wavelet kernel parameter, and $m$ is the dimension of the input vector.
While the trigonometric relation of the Chebyshev polynomials of the first kind was introduced in (3.3), polynomials of the second kind are described by Mason and Handscomb (2002) as
$$U_n(\cos\theta) = \frac{\sin\big((n+1)\theta\big)}{\sin(\theta)}, \qquad n = 0, 1, 2, \dots, \qquad (3.47)$$
with the generating function
$$\frac{1}{1 - 2xt + t^2} = \sum_{n=0}^{\infty} U_n(x)t^n, \qquad |x| < 1,\; |t| < 1. \qquad (3.48)$$
According to defined weight function Eq. 3.46, Zhao et al. (2013) defined the
orthogonality relation for the Chebyshev polynomials of the second kind as
$$\int_{-1}^{1}\sqrt{1-x^2}\,U_m(x)U_n(x)\,dx = \begin{cases}\dfrac{\pi}{2}, & m = n \neq 0,\\[4pt] 0, & (m \neq n)\ \text{or}\ (m = n = 0).\end{cases} \qquad (3.49)$$
Moreover, they used $P_n(\cdot)$ to denote the unified Chebyshev polynomials (UCP), but in this book the notation $P_n(\cdot)$ is used for the Legendre orthogonal polynomials. Therefore, in order to avoid any ambiguity, we consciously use $UCP_n(\cdot)$ instead. Considering the generating functions of the first and the second kind, Zhao et al. (2013) constructed the generating function of the UCP:
$$\frac{1-xt}{1-axt}\times\frac{1}{1-2xt+t^2} = \sum_{n=0}^{\infty} UCP_n(x)\,t^n, \qquad (3.50)$$
where x ∈ [−1, 1], t ∈ [−1, 1], n = 0, 1, 2, 3 . . ., and U C Pn (x) is the UCP of the
nth order. It is clear that the Chebyshev polynomials of the first and the second kinds
are special cases of UCP where a = 0 and a = 1, respectively. Also, Zhao et al.
(2013) introduced the recurrence relation of the nth order of UCP as
$$UCP_n(x) = (a+2)x\,UCP_{n-1}(x) - (2ax^2+1)\,UCP_{n-2}(x) + ax\,UCP_{n-3}(x). \qquad (3.51)$$
Therefore, using (3.51), some instances of these polynomials are
$$UCP_0(x) = 1,\qquad UCP_1(x) = ax + x,\qquad UCP_2(x) = (a^2 + a + 2)x^2 - 1,$$
$$UCP_3(x) = (a^3 + a^2 + 2a + 4)x^3 - (a + 3)x,\qquad UCP_4(x) = (a^4 + a^3 + 2a^2 + 4a + 8)x^4 - (a^2 + 3a + 8)x^2 + 1.$$
Zhao et al. (2013) showed that the resulting UCP kernel satisfies the Mercer condition:
$$\iint \big(m - \langle x,z\rangle + r\big)^{-\frac{1}{2}}\sum_{j=0}^{n}\big\langle UCP_j(x), UCP_j^T(z)\big\rangle f(x)f(z)\,dx\,dz \ge 0. \qquad (3.52)$$
Hence,
$$k_{UCP}(X,Z) = \frac{\sum_{j=0}^{n} UCP_j(x)UCP_j(z)}{\sqrt{m - \langle x,z\rangle + r}}, \qquad (3.53)$$
where $m$ is the dimension of the input vector and $r$ is a small positive value that prevents the denominator from being zero.
Using the weight function Eq. 3.27, with the same approach performed in Eqs. 3.35
and 3.37, we can introduce the corresponding fractional kernels as follows:
$$k_{FCheb}(x,z) = \prod_{j=1}^{m}\frac{\sum_{i=0}^{n} FT_i^{\alpha}(x_j)FT_i^{\alpha}(z_j)}{\sqrt{1 - \Big(2\big(\frac{x_j z_j - a}{b-a}\big)^{\alpha} - 1\Big)}}, \qquad (3.54)$$
$$k_{Gen\text{-}FCheb}(x,z) = \frac{\sum_{i=0}^{n}\big\langle FT_i^{\alpha}(x), FT_i^{\alpha}(z)\big\rangle}{\sqrt{m - \Big(2\big(\frac{\langle x,z\rangle - a}{b-a}\big)^{\alpha} - 1\Big)}}. \qquad (3.55)$$
Proof Since Mercer's theorem states that, for an SVM kernel to be valid, the corresponding integral condition must be non-negative (see (3.39)), and since the multiplication of two valid kernels is also a kernel (see (3.40)), we write
$$k_{(1)}(x,z) = \sum_{j=0}^{n} FT_j^{\alpha}(x)FT_j^{\alpha}(z),$$
$$k_{(2)}(x,z) = w(x) = \frac{1}{\sqrt{1 - \Big(2\big(\frac{x-a}{b-a}\big)^{\alpha} - 1\Big)}}.$$
$$\iint k_{(1)}(x,z)f(x)f(z)\,dx\,dz = \iint \sum_{j=0}^{n} FT_j^{\alpha}(x)\big(FT_j^{\alpha}\big)^T(z)\,f(x)f(z)\,dx\,dz$$
$$= \sum_{j=0}^{n}\iint FT_j^{\alpha}(x)\big(FT_j^{\alpha}\big)^T(z)\,f(x)f(z)\,dx\,dz$$
$$= \sum_{j=0}^{n}\Big(\int FT_j^{\alpha}(x)f(x)\,dx\Big)\Big(\int \big(FT_j^{\alpha}\big)^T(z)f(z)\,dz\Big) \ge 0. \qquad (3.56)$$
Therefore, the kernel $k_{(1)}(x,z)$ is a valid Mercer kernel. In order to prove that $k_{(2)}(x,z)$ is a valid Mercer kernel too, one can show that $k_{(2)}(x,z)$ is positive semi-definite. According to the definition in Eq. 3.24, both $x$ and $b$ are positive because $0 \le x \le b$, $b \in \mathbb{R}^+$, and $\alpha \in \mathbb{R}^+$; in other words, by the mapping defined in Eq. 3.24, we are sure the output is always positive. Hence, the second kernel in Eq. 3.56, or more precisely the weight function, is positive semi-definite, so $k_{(2)}(x,z) \ge 0$.
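A possible NumPy sketch of the generalized fractional Chebyshev kernel (3.55) is given below; it is our own illustration of the formula, with the order n, the fractional order alpha, and the interval [a, b] left as free parameters.
Program Code
import numpy as np

def frac_cheb_values(u, n):
    # FT_i^alpha values for an already-mapped argument u = 2*((x-a)/(b-a))**alpha - 1
    T = [np.ones_like(u), u]
    for _ in range(2, n + 1):
        T.append(2 * u * T[-1] - T[-2])
    return np.array(T[: n + 1])

def gen_frac_chebyshev_kernel(x, z, n=3, alpha=0.5, a=0.0, b=1.0):
    ux = 2 * ((x - a) / (b - a)) ** alpha - 1
    uz = 2 * ((z - a) / (b - a)) ** alpha - 1
    num = np.sum(frac_cheb_values(ux, n) * frac_cheb_values(uz, n))
    den = np.sqrt(len(x) - (2 * ((np.dot(x, z) - a) / (b - a)) ** alpha - 1))
    return num / den

x = np.array([0.2, 0.5, 0.9]); z = np.array([0.1, 0.4, 0.8])
print(gen_frac_chebyshev_kernel(x, z))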
In this section, we will illustrate the use of fractional Chebyshev kernel in SVM
applied to some real datasets which are widely used by machine learning experts to
examine the accuracy of any given model. Then, we will compare the accuracy of
the fractional Chebyshev kernel with the accuracy of the Chebyshev kernel and two other well-known kernels used in kernel-based learning algorithms, the RBF and polynomial kernels. As we know, applying SVM to a dataset involves a number of pre-processing tasks, such as data cleansing and the generation of training/test splits. We do not dwell on these steps, except for the normalization of the dataset, which, as already mentioned, is mandatory when using Chebyshev polynomials of any kind as the kernel function.
There are several online data stores available for public use; widely used examples are the UCI Machine Learning Repository2 of the University of California, Irvine, and Kaggle.3 In this section, four datasets are selected from UCI, which are well known and widely used by machine learning practitioners.
The polynomial kernel is widely used in kernel-based models and, in general, is
defined as
K (X 1 , X 2 ) = (a + X 1T X 2 )b . (3.57)
2 https://archive.ics.uci.edu/ml/datasets.php.
3 https://www.kaggle.com/datasets.
Also, the RBF kernel, a.k.a. the Gaussian RBF kernel, is another popular kernel for SVM and is defined as
$$K(X_1, X_2) = \exp\Big(-\frac{\|X_1 - X_2\|^2}{2\sigma^2}\Big). \qquad (3.58)$$
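In practice, all of these kernels can be plugged into an off-the-shelf SVM solver. The sketch below (our own example on a synthetic dataset, not an experiment from this chapter) uses scikit-learn's SVC, whose kernel argument accepts the built-in 'rbf' and 'poly' options as well as a callable returning the Gram matrix, here a pairwise Chebyshev kernel in the sense of (3.33).
Program Code
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

def chebyshev_gram(A, B, n=4):
    # unweighted pairwise Chebyshev kernel (3.33) on data scaled to [-1, 1]
    def T_stack(U):
        T = [np.ones_like(U), U]
        for _ in range(2, n + 1):
            T.append(2 * U * T[-1] - T[-2])
        return np.array(T[: n + 1])
    TA, TB = T_stack(A), T_stack(B)                      # shape (n+1, N, d)
    return np.prod(np.einsum('kid,kjd->ijd', TA, TB), axis=2)

X, y = make_moons(200, noise=0.2, random_state=0)
X = 2 * (X - X.min(0)) / (X.max(0) - X.min(0)) - 1       # normalize to [-1, 1]
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for clf in (SVC(kernel='rbf', gamma=1.0),
            SVC(kernel='poly', degree=3),
            SVC(kernel=chebyshev_gram)):
    print(clf.kernel, clf.fit(Xtr, ytr).score(Xte, yte))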
The Spiral dataset is a famous classic dataset; many datasets have a spiraling distribution of data points such as the one considered here, and there are also simulated versions with similar characteristics and even better-controlled distributional properties. The Spiral dataset consists of 1000 data points, equally clustered into 3 numerical labels, with 2 other attributes, "Chem 1" and "Chem2", of float type. It is known that SVM is a binary classification algorithm; therefore, to deal with multi-class datasets the common method is to split such datasets into multiple binary classification datasets. Two examples of such methods are
• One-vs-All (OVA),
• One-vs-One (OVO).
Using OVA, the Spiral classification task with numerical classes ∈ {1, 2, 3} will be divided into 3 binary classification tasks of the following forms:
• Binary Classification task 1: class 1 versus class {2, 3},
• Binary Classification task 2: class 2 versus class {1, 3},
• Binary Classification task 3: class 3 versus class {1, 2}.
Also, using the OVO method, the classification task is divided into any possible
binary classification; then, for the Spiral dataset, we have
• Binary Classification task 1: class 1 versus class 2,
• Binary Classification task 2: class 1 versus class 3,
• Binary Classification task 3: class 2 versus class 3.
However, in our example, the final number of split datasets is the same for both methods; in general, the OVO method generates more datasets to classify. Assume a multi-class dataset with 4 numerical labels {1, 2, 3, 4}; then, using the first method, the binary classification tasks are
• Binary Classification task 1: class 1 versus class {2, 3, 4},
• Binary Classification task 2: class 2 versus class {1, 3, 4},
• Binary Classification task 3: class 3 versus class {1, 2, 4},
• Binary Classification task 4: class 4 versus class {1, 2, 3}.
While using the second method, we have the following binary datasets to classify:
• Binary Classification task 1: class 1 versus class 2,
• Binary Classification task 2: class 1 versus class 3,
• Binary Classification task 3: class 1 versus class 4,
• Binary Classification task 4: class 2 versus class 3,
• Binary Classification task 5: class 2 versus class 4,
• Binary Classification task 6: class 3 versus class 4.
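Both decompositions are available off the shelf; the following brief sketch (our own illustration on a synthetic three-class dataset, not on the Spiral data) uses scikit-learn's meta-estimators for the two strategies.
Program Code
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

strategies = (("OVA", OneVsRestClassifier(SVC(kernel='rbf'))),   # k binary tasks: class k vs rest
              ("OVO", OneVsOneClassifier(SVC(kernel='rbf'))))    # k(k-1)/2 binary tasks: all pairs
for name, clf in strategies:
    print(name, cross_val_score(clf, X, y, cv=5).mean())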
Fig. 3.6 Normal and fractional Chebyshev kernel of order 3 and alpha = 0.3 applied on Spiral
dataset
In order to have a better intuition of how a kernel function takes the dataset to a
higher dimension, see Fig. 3.6, which shows the Chebyshev kernel function of the
first kind (both normal and fractional forms) applied to the Spiral dataset. The figure
on the left depicts the dataset in a higher dimension when the normal Chebyshev
kernel of order 3 is applied, and the one at right demonstrates how the data points are
changed when the fractional Chebyshev kernel of order 3 and alpha 0.3 is applied to
the original dataset. It is clear that the transformation has moved the data points to
the positive side of the axes x, y, and also z.
After the transformation of the dataset and taking the data points to a higher
dimension, it is time to determine the decision boundaries of the classifier.4 The
decision boundary is highly dependent on the kernel function. In the case of orthog-
onal polynomials, the order of these functions has a critical role. For example, the Chebyshev kernel function of order 3 yields a different decision boundary than the same kernel function of order 6. There is no general rule for knowing which order yields the most suitable decision boundary. Hence, trying different decision boundaries gives a useful metric to compare and find the best order, or even the best kernel function itself.
In order to have an intuition of how the decision boundaries are different, take
a look at Fig. 3.7 which depicts corresponding Chebyshev classifiers of different
orders {3, 4, 5, 6} on the original Spiral dataset where the binary classification of
1-versus-{2, 3} is chosen. From these plots, one can see that the decision boundaries become more twisted and twirled as the order increases. For order 6 (the last subplot on the right), the decision boundary is somewhat more complex than for order 3 (the first subplot). The complexity of the decision boundary increases further in the fractional space. Figure 3.8 demonstrates the decision boundary
of the fractional Chebyshev classifier with different orders of 3, 4, 5, and 6 where
4 Please note that “decision boundary” refers to the 2D space and the “decision surface” refers to
the 3D space.
Fig. 3.8 Fractional Chebyshev kernel with orders of 3, 4, 5, and 6 on Spiral dataset (α = 0.3)
Table 3.1 Comparison of RBF, polynomial, Chebyshev, and fractional Chebyshev kernels on the Spiral dataset. The fractional Chebyshev kernel achieves the highest accuracy

Kernel                 Sigma   Power   Order   Alpha(α)   Accuracy
RBF                    0.73    –       –       –          0.97
Polynomial             –       8       –       –          0.9533
Chebyshev              –       –       5       –          0.9667
Fractional Chebyshev   –       –       3       0.3        0.9733
Table 3.2 Comparison of RBF, polynomial, Chebyshev, and fractional Chebyshev kernels on Spiral
dataset. It is clear that RBF kernel outperforms other kernels
Kernel                 Sigma   Power   Order   Alpha(α)   Accuracy
RBF                    0.1     –       –       –          0.9867
Polynomial             –       5       –       –          0.9044
Chebyshev              –       –       8       –          0.9411
Fractional Chebyshev   –       –       8       0.5        0.9467
Table 3.3 Comparison of RBF, polynomial, Chebyshev, and fractional Chebyshev kernels on Spiral
dataset. It is clear that RBF and polynomial kernels are better than others
Kernel                 Sigma   Power   Order   Alpha(α)   Accuracy
RBF                    0.73    –       –       –          0.9856
Polynomial             –       5       –       –          0.9856
Chebyshev              –       –       6       –          0.9622
Fractional Chebyshev   –       –       6       0.6        0.9578
The learning task is a binary classification one. Each problem is given by a logical description
of a class. Robots belong either to this class or not, but instead of providing a complete
class description to the learning problem, only a subset of all 432 possible robots with its
classification is given. The learning task is to generalize over these examples and if the
particular learning technique at hand allows this to derive a simple class description.
Table 3.4 Comparison of RBF, polynomial, Chebyshev, and fractional Chebyshev kernels on the
Monk’s first problem. It is clear that the RBF kernel outperforms other kernels
Kernel                 Sigma   Power   Order   Alpha(α)   Accuracy
RBF                    2.844   –       –       –          0.8819
Polynomial             –       3       –       –          0.8681
Chebyshev              –       –       3       –          0.8472
Fractional Chebyshev   –       –       3       1/16       0.8588
Table 3.5 Comparison of RBF, polynomial, Chebyshev, and fractional Chebyshev kernels on the Monk's second problem. The fractional Chebyshev kernel outperforms the other kernels with the highest accuracy of 0.9653

Kernel                 Sigma    Power   Order   Alpha(α)   Accuracy
RBF                    5.5896   –       –       –          0.875
Polynomial             –        3       –       –          0.8657
Chebyshev              –        –       3       –          0.8426
Fractional Chebyshev   –        –       6       1/15       0.9653
Table 3.6 Comparison of RBF, polynomial, Chebyshev, and fractional Chebyshev kernels on the Monk's third problem. The fractional Chebyshev kernel and the RBF kernel achieved the same accuracy of 0.91

Kernel                 Sigma    Power   Order   Alpha(α)   Accuracy
RBF                    2.1586   –       –       –          0.91
Polynomial             –        3       –       –          0.875
Chebyshev              –        –       6       –          0.8958
Fractional Chebyshev   –        –       5       1/5        0.91
3.5 Conclusion
This chapter started with a brief history and the basics of Chebyshev orthogonal polynomials. Chebyshev polynomials have been used in many applications and, recently, as kernel functions in kernel-based learning algorithms, where they have been shown to surpass traditional kernels in many situations. The construction of Chebyshev kernels was explained and their validity proved. It was demonstrated in this chapter that the fractional form of Chebyshev polynomials extends the applicability of the Chebyshev kernel; thus, by using fractional Chebyshev kernel functions, a wider set of problems can be tackled. The experiments demonstrate that the fractional Chebyshev kernel improves the classification accuracy of SVM when used as the kernel in the kernel trick. The results of these experiments were presented in the last section.
References
Achirul Nanda, M., Boro Seminar, K., Nandika, D., Maddu, A.: A comparison study of kernel
functions in the support vector machine and its application for termite detection. Information 9,
5–29 (2018)
An-na, W., Yue, Z., Yun-tao, H., Yun-lu, L.I.: A novel construction of SVM compound kernel
function. In: 2010 International Conference on Logistics Systems and Intelligent Management
(ICLSIM), vol. 3, pp. 1462–1465 (2010)
Asghari, M., Hadian Rasanan, A.H., Gorgin, S., Rahmati, D., Parand, K.: FPGA-orthopoly: a
hardware implementation of orthogonal polynomials. Eng. Comput. (2022). https://doi.org/10.
1007/s00366-022-01612-x
Boyd, J.P.: Chebyshev and Fourier Spectral Methods. Courier Corporation, Massachusetts (2001)
Capozziello, S., D’Agostino, R., Luongo, O.: Cosmographic analysis with Chebyshev polynomials.
MNRAS 476, 3924–3938 (2018)
Chakraverty, S., Mall, S.: Single layer Chebyshev neural network model with regression-based
weights for solving nonlinear ordinary differential equations. Evol. Intell. 13, 687–694 (2020)
Dabiri, A., Butcher, E.A., Nazari, M.: Coefficient of restitution in fractional viscoelastic compliant
impacts using fractional Chebyshev collocation. J. Sound Vib. 388, 230–244 (2017)
Glau, K., Mahlstedt, M., Pötz, C.: A new approach for American option pricing: the dynamic
Chebyshev method. SIAM J. Sci. Comput. 41, B153–B180 (2019)
Habibli, M., Noori Skandari, M.H.: Fractional Chebyshev pseudospectral method for fractional
optimal control problems. Optim. Control Appl. Methods 40, 558–572 (2019)
Hadian Rasanan, A.H., Rahmati, D., Gorgin, S., Rad, J.A.: MCILS: Monte-Carlo interpolation
least-square algorithm for approximation of edge-reliability polynomial. In: 9th International
Conference on Computer and Knowledge Engineering (ICCKE), pp. 295–299 (2019)
Hadian Rasanan, A.H., Rahmati, D., Gorgin, S., Parand, K.: A single layer fractional orthogonal
neural network for solving various types of Lane-Emden equation. New Astron. 75, 101307
(2020)
Hadian-Rasanan, A.H., Rad, J.A.: Brain activity reconstruction by finding a source parameter in an
inverse problem. In: Chakraverty, S. (ed.) Mathematical Methods in Interdisciplinary Sciences,
pp. 343–368. Wiley, Amsterdam (2020)
Hajimohammadi, Z., Baharifard, F., Ghodsi, A., Parand, K.: Fractional Chebyshev deep neural
network (FCDNN) for solving differential models. Chaos, Solitons Fractals 153, 111530 (2021)
Hassani, H., Machado, J.T., Naraghirad, E.: Generalized shifted Chebyshev polynomials for frac-
tional optimal control problems. Commun. Nonlinear Sci. Numer. Simul. 75, 50–61 (2019)
Hussain, M., Wajid, S.K., Elzaart, A., Berbar, M.: A comparison of SVM kernel functions for
breast cancer detection. In: 2011 Eighth International Conference Computer Graphics, Imaging
and Visualization, pp. 145–150 (2011)
Jafarzadeh, S.Z., Aminian, M., Efati, S.: A set of new kernel function for support vector machines:
an approach based on Chebyshev polynomials. In: ICCKE, pp. 412–416 (2013)
Jung, H.G., Kim, G.: Support vector number reduction: survey and experimental evaluations. IEEE
Trans. Intell. Transp. Syst. 15, 463–476 (2013)
Kazem, S., Shaban, M., Rad, J.A.: Solution of the coupled Burgers equation based on operational
matrices of d-dimensional orthogonal functions. Zeitschrift für Naturforschung A 67, 267–274
(2012)
Kazem, S., Abbasbandy, S., Kumar, S.: Fractional-order Legendre functions for solving fractional-
order differential equations. Appl. Math. Model. 37, 5498–5510 (2013)
Kazem, S., Shaban, M., Rad, J.A.: A new Tau homotopy analysis method for MHD squeezing flow
of second-grade fluid between two parallel disks. Appl. Comput. Math. 16, 114–132 (2017)
Kheyrinataj, F., Nazemi, A.: Fractional Chebyshev functional link neural network-optimization
method for solving delay fractional optimal control problems with Atangana-Baleanu derivative.
Optim. Control Appl. Methods 41, 808–832 (2020)
Mall, S., Chakraverty, S.: Numerical solution of nonlinear singular initial value problems of Emden-
Fowler type using Chebyshev Neural Network method. Neurocomputing 149, 975–982 (2015)
Mall, S., Chakraverty, S.: Single layer Chebyshev neural network model for solving elliptic partial
differential equations. Neural Process. Lett. 45, 825–840 (2017)
Mall, S., Chakraverty, S.: A novel Chebyshev neural network approach for solving singular arbitrary
order Lane-Emden equation arising in astrophysics. NETWORK-COMP NEURAL 31, 142–165
(2020)
Mason, J.C., Handscomb, D.C.: Chebyshev Polynomials. Chapman and Hall/CRC (2002)
Mesgarani, H., Beiranvand, A., Aghdam, Y.E.: The impact of the Chebyshev collocation method
on solutions of the time-fractional Black-Scholes. Math. Sci. 15, 137–143 (2021)
Moghaddam, V.H., Hamidzadeh, J.: New Hermite orthogonal polynomial kernel and combined
kernels in support vector machine classifier. Pattern Recognit. 60, 921–935 (2016)
Musavi, M.T., Ahmed, W., Chan, K.H., Faris, K.B., Hummels, D.M.: On the training of radial basis
function classifiers. Neural Netw. 5, 595–603 (1992)
Omidi, M., Arab, B., Hadian Rasanan, A.H., Rad, J.A., Parand, K.: Learning nonlinear dynamics
with behavior ordinary/partial/system of the differential equations: looking through the lens of
orthogonal neural networks. Eng. Comput. 1–20 (2021)
Ozer, S., Chen, C.H.: Generalized Chebyshev kernels for support vector classification. In: 19th
International Conference on Pattern Recognition, pp. 1–4 (2008)
Ozer, S., Chen, C.H., Cirpan, H.A.: A set of new Chebyshev kernel functions for support vector
machine pattern classification. Pattern Recognit. 44, 1435–1447 (2011)
Padierna, L.C., Carpio, M., Rojas-Domínguez, A., Puga, H., Fraire, H.: A novel formulation of
orthogonal polynomial kernel functions for SVM classifiers: the Gegenbauer family. Pattern
Recognit. 84, 211–225 (2018)
Pan, Z.B., Chen, H., You, X.H.: Support vector machine with orthogonal Legendre kernel. In: 2012
International Conference on Wavelet Analysis and Pattern Recognition, pp. 125–130 (2012)
Parand, K., Delkhosh, M.: Solving Volterra’s population growth model of arbitrary order using the
generalized fractional order of the Chebyshev functions. Ric. Mat. 65, 307–328 (2016)
Parand, K., Delkhosh, M.: Accurate solution of the Thomas-Fermi equation using the fractional
order of rational Chebyshev functions. J. Comput. Appl. Math. 317, 624–642 (2017)
Parand, K., Moayeri, M.M., Latifi, S., Delkhosh, M.: A numerical investigation of the boundary
layer flow of an Eyring-Powell fluid over a stretching sheet via rational Chebyshev functions.
Eur. Phys. J. 132, 1–11 (2017)
Parand, K., Moayeri, M.M., Latifi, S., Rad, J.A.: Numerical study of a multidimensional dynamic
quantum model arising in cognitive psychology especially in decision making. Eur. Phys. J. Plus
134, 109 (2019)
Pavlović, V.D., Dončov, N.S., Ćirić, D.G.: 1D and 2D economical FIR filters generated by Cheby-
shev polynomials of the first kind. Int. J. Electron. 100, 1592–1619 (2013)
Reddy, S.V.G., Reddy, K.T., Kumari, V.V., Varma, K.V.: An SVM based approach to breast cancer
classification using RBF and polynomial kernel functions with varying arguments. IJCSIT 5,
5901–5904 (2014)
Scholkopf, B., Sung, K.K., Burges, C.J., Girosi, F., Niyogi, P., Poggio, T., Vapnik, V.: Comparing
support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Trans.
Signal Process. 45, 2758–2765 (1997)
Schölkopf, B., Smola, A.J., Bach, F.: Learning with Kernels: Support Vector Machines, Regular-
ization, Optimization, and Beyond. MIT press, Cambridge (2002)
Sedaghat, S., Ordokhani, Y., Dehghan, M.: Numerical solution of the delay differential equations
of pantograph type via Chebyshev polynomials. Commun. Nonlinear Sci. Numer. Simul. 17,
4815–4830 (2012)
Shaban, M., Kazem, S., Rad, J.A.: A modification of the homotopy analysis method based on
Chebyshev operational matrices. Math. Comput. Model. 57, 1227–1239 (2013)
Shen, J., Tang, T., Wang, L. L.: Spectral methods: algorithms, analysis and applications, vol. 41.
Springer Science & Business Media, Berlin (2011)
Shuman, D.I., Vandergheynst, P., Kressner, D., Frossard, P.: Distributed signal processing via Cheby-
shev polynomial approximation. IEEE Trans. Signal Inf. Process. Netw. 4, 736–751 (2018)
Sun, L., Toh, K.A., Lin, Z.: A center sliding Bayesian binary classifier adopting orthogonal poly-
nomials. Pattern Recognit. 48, 2013–2028 (2015)
Thrun, S.B., Bala, J.W., Bloedorn, E., Bratko, I., Cestnik, B., Cheng, J., De Jong, K.A., Dzeroski,
S., Fisher, D.H., Fahlman, S.E. Hamann, R.: The monk’s problems: a performance comparison
of different learning algorithms (1991)
Tian, M., Wang, W.: Some sets of orthogonal polynomial kernel functions. Appl. Soft Comput. 61,
742–756 (2017)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (2013)
Vert, J.P., Qiu, J., Noble, W.S.: A new pairwise kernel for biological network inference with support
vector machines. BMC Bioinform. BioMed Cent. 8, 1–10 (2007)
Yaman, S., Pelecanos, J.: Using polynomial kernel support vector machines for speaker verification.
IEEE Signal Process. Lett. 20, 901–904 (2013)
Ye, N., Sun, R., Liu, Y., Cao, L.: Support vector machine with orthogonal Chebyshev kernel. In:
18th International Conference on Pattern Recognition (ICPR’06), vol. 2, pp. 752–755 (2006)
Zhao, J., Yan, G., Feng, B., Mao, W., Bai, J.: An adaptive support vector regression based on a new
sequence of unified orthogonal polynomials. Pattern Recognit. 46, 899–913 (2013)
Zhao, F., Huang, Q., Xie, J., Li, Y., Ma, L., Wang, J.: Chebyshev polynomials approach for numeri-
cally solving system of two-dimensional fractional PDEs and convergence analysis. Appl. Math.
Comput. 313, 321–330 (2017)
Zhou, F., Fang, Z., Xu, J.: Constructing support vector machine kernels from orthogonal polynomials
for face and speaker verification. In: Fourth International Conference on Image and Graphics
(ICIG), pp. 627–632 (2007)
Chapter 4
Fractional Legendre Kernel Functions:
Theory and Application
Abstract The support vector machine algorithm has been able to show great flexibil-
ity in solving many machine learning problems due to the use of different functions
as a kernel. Linear, radial basis functions, and polynomial functions are the most
common functions used in this algorithm. Legendre polynomials are among the
most widely used orthogonal polynomials that have achieved excellent results in the
support vector machine algorithm. In this chapter, some basic features of Legendre
and fractional Legendre functions are introduced and reviewed, and then the kernels
of these functions are introduced and validated. Finally, the performance of these
functions in solving two problems (two sample datasets) is measured.
4.1 Introduction
For more information about A.M. Legendre and his contribution, please see: https://mathshistory.st-andrews.ac.uk/Biographies/Legendre/.
Marianela and Gómez (2014), Benouini et al. (2019). In Table 4.1, some applications
of different kinds of Legendre polynomials are mentioned.
The following is how this chapter is organized. Section 4.2 presents the fundamental definitions and characteristics of orthogonal Legendre polynomials and fractional Legendre functions. In Sect. 4.3, the ordinary Legendre kernel is provided, and the novel fractional Legendre kernel is introduced. In Sect. 4.4, the results of experiments on both the ordinary Legendre kernel and the fractional one are covered, and a comparison of the accuracy results of the mentioned kernels, used as kernels in the SVM algorithm, with the normal polynomial and Gaussian kernels as well as the ordinary and fractional Chebyshev kernels on well-known datasets is given to demonstrate their validity and efficiency. Finally, in Sect. 4.5, the concluding remarks of this chapter are presented.
4.2 Preliminaries
This section presents the definition and basic properties of the Legendre orthogonal polynomials. In addition to the basics of these polynomials, the fractional form of this family is discussed.
$$(1-x^2)\frac{d^2y}{dx^2} - 2x\frac{dy}{dx} + n(n+1)y = 0, \qquad (4.1)$$
where n is a positive integer. This family is also orthogonal over the interval [−1, 1]
with respect to the weight function w(x) = 1 Olver et al. (2010), such that
$$\int_{-1}^{1} P_n(x)P_m(x)\,dx = \frac{2}{2n+1}\,\delta_{nm}, \qquad (4.2)$$
P0 (x) = 1,
P1 (x) = x,
(n + 1)Pn+1 (x) − (2n + 1)x Pn (x) + n Pn−1 (x) = 0, n ≥ 1. (4.6)
$$P_0(x) = 1,\qquad P_1(x) = x,\qquad P_2(x) = (3x^2-1)/2,\qquad P_3(x) = (5x^3-3x)/2,$$
$$P_4(x) = (35x^4-30x^2+3)/8,\qquad P_5(x) = (63x^5-70x^3+15x)/8,\qquad P_6(x) = (231x^6-315x^4+105x^2-5)/16. \qquad (4.7)$$
Also, combining the above relations with their derivatives yields other forms of recursive relations for the Legendre polynomials Lamb Jr (2011):
$$(2n+1)P_n(x) = P'_{n+1}(x) - P'_{n-1}(x),$$
$$(1-x^2)P'_n(x) = n\big[P_{n-1}(x) - xP_n(x)\big]. \qquad (4.9)$$
The reader can use the following Python code to generate any order of Legendre
polynomials symbolically:
Program Code
import sympy
x = sympy.Symbol("x")
sympy.expand(sympy.simplify(Ln(x,3)))
> 5*x**3/2 - 3*x/2
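The helper Ln used in the code above is not defined in this excerpt; a minimal sketch (an assumption on our part) simply wraps SymPy's built-in Legendre polynomials, which is equivalent to generating them from recurrence (4.6).
Program Code
import sympy

def Ln(x, n):
    # n-th Legendre polynomial P_n, evaluated symbolically at x
    return sympy.legendre(n, x)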
Similar to other classical orthogonal polynomials, Legendre polynomials obey a symmetry relation: Legendre polynomials of even order have even symmetry and contain only even powers of $x$, and similarly Legendre polynomials of odd order have odd symmetry and contain only odd powers of $x$:
$$P_n(x) = (-1)^n P_n(-x) = \begin{cases} P_n(-x), & n \text{ is even},\\ -P_n(-x), & n \text{ is odd}.\end{cases} \qquad (4.10)$$
For $P_n(x)$, there exist exactly $n$ zeros in $[-1, 1]$, all real and distinct from each other Hadian Rasanan et al. (2020). If they are regarded as dividing the interval $[-1, 1]$ into $n + 1$ sub-intervals, each sub-interval will contain exactly one zero of $P_{n+1}$ Hadian Rasanan et al. (2020). Also, $P_n(x)$ has $n - 1$ local minima and maxima in $(-1, 1)$ Hadian Rasanan et al. (2020). On the other hand, these polynomials have the following basic properties:
$$P_n(1) = 1, \qquad (4.12)$$
$$P_n(-1) = \begin{cases} 1, & n = 2m,\\ -1, & n = 2m+1,\end{cases} \qquad (4.13)$$
$$P_n(0) = \begin{cases} \dfrac{(-1)^m}{4^m}\dbinom{2m}{m} = \dfrac{(-1)^m (2m)!}{2^{2m}(m!)^2}, & n = 2m,\\[6pt] 0, & n = 2m+1.\end{cases} \qquad (4.14)$$
Above all, to shift the Legendre polynomials from [−1, 1] to [a, b], we should
use the following transformation:
$$x = \frac{2t - a - b}{b - a}, \qquad (4.15)$$
where x ∈ [−1, 1] and t ∈ [a, b]. It is worth mentioning that all the properties of
the Legendre polynomials remain unchanged for shifted Legendre polynomials with
this difference that the position of −1 transfers to a and the position of 1 transfers
to b.
Legendre functions of fractional order $\alpha$ (denoted $FP_n^{\alpha}(x)$) on the finite interval $[a, b]$ are defined, by use of the mapping $x' = 2\big(\frac{x-a}{b-a}\big)^{\alpha} - 1$ $(\alpha > 0)$ with $x' \in [-1, 1]$, as Kazem et al. (2013)
$$FP_n^{\alpha}(x) = P_n(x') = P_n\Big(2\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1\Big). \qquad (4.16)$$
Also, the generating function for the fractional Legendre functions is defined in the same way as the generating function for the Legendre polynomials, with the difference that $x$ is replaced by $2\big(\frac{x-a}{b-a}\big)^{\alpha} - 1$ on the interval $[a, b]$ Shen (1994), Kazem et al. (2013), Hadian Rasanan et al. (2020), Rad et al. (2014), Asghari et al. (2022):
$$\frac{1}{\sqrt{1 - 2t\Big(2\big(\frac{x-a}{b-a}\big)^{\alpha} - 1\Big) + t^2}} = \sum_{n=0}^{\infty} FP_n^{\alpha}(x)\,t^n = \sum_{n=0}^{\infty} P_n\Big(2\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1\Big)t^n. \qquad (4.17)$$
On the other hand, the fractional Legendre functions, with weight function $w(x) = x^{\alpha-1}$, have the orthogonality property like the Legendre polynomials Hadian Rasanan et al. (2020). This property, for example over the interval $[0, 1]$, is defined as follows:
$$\int_{0}^{1} FP_n^{\alpha}(x)FP_m^{\alpha}(x)\,x^{\alpha-1}\,dx = \frac{1}{(2n+1)\alpha}\,\delta_{nm}, \qquad (4.18)$$
$$FP_0^{\alpha}(x) = 1,$$
$$FP_1^{\alpha}(x) = 2\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1,$$
$$FP_{n+1}^{\alpha}(x) = \Big(\frac{2n+1}{n+1}\Big)\Big(2\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1\Big)FP_n^{\alpha}(x) - \Big(\frac{n}{n+1}\Big)FP_{n-1}^{\alpha}(x). \qquad (4.19)$$
With this in mind, the first few fractional Legendre functions in explicit form are
$$FP_0^{\alpha}(x) = 1,$$
$$FP_1^{\alpha}(x) = 2\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1,$$
$$FP_2^{\alpha}(x) = 6\Big(\frac{x-a}{b-a}\Big)^{2\alpha} - 6\Big(\frac{x-a}{b-a}\Big)^{\alpha} + 1,$$
$$FP_3^{\alpha}(x) = 20\Big(\frac{x-a}{b-a}\Big)^{3\alpha} - 30\Big(\frac{x-a}{b-a}\Big)^{2\alpha} + 12\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1,$$
$$FP_4^{\alpha}(x) = 70\Big(\frac{x-a}{b-a}\Big)^{4\alpha} - 140\Big(\frac{x-a}{b-a}\Big)^{3\alpha} + 90\Big(\frac{x-a}{b-a}\Big)^{2\alpha} - 20\Big(\frac{x-a}{b-a}\Big)^{\alpha} + 1,$$
$$FP_5^{\alpha}(x) = 252\Big(\frac{x-a}{b-a}\Big)^{5\alpha} - 630\Big(\frac{x-a}{b-a}\Big)^{4\alpha} + 560\Big(\frac{x-a}{b-a}\Big)^{3\alpha} - 210\Big(\frac{x-a}{b-a}\Big)^{2\alpha} + 30\Big(\frac{x-a}{b-a}\Big)^{\alpha} - 1,$$
$$FP_6^{\alpha}(x) = 924\Big(\frac{x-a}{b-a}\Big)^{6\alpha} - 2772\Big(\frac{x-a}{b-a}\Big)^{5\alpha} + 3150\Big(\frac{x-a}{b-a}\Big)^{4\alpha} - 1680\Big(\frac{x-a}{b-a}\Big)^{3\alpha} + 420\Big(\frac{x-a}{b-a}\Big)^{2\alpha} - 42\Big(\frac{x-a}{b-a}\Big)^{\alpha} + 1. \qquad (4.20)$$
Interested readers can use the following Python code to generate any order of Leg-
endre polynomials of fractional order symbolically
Fig. 4.2 The first six order of fractional Legendre function over the finite interval [0, 5] where
α = 0.5
Program Code
import sympy
x = sympy.Symbol("x")
alpha = sympy.Symbol(r’\alpha’)
a = sympy.Symbol("a")
b = sympy.Symbol("b")
x=sympy.sympify(2*((x-a)/(b-a))**alpha -1)
sympy.simplify(FLn(x,3))
> $20\big(\tfrac{x-a}{b-a}\big)^{3\alpha} - 30\big(\tfrac{x-a}{b-a}\big)^{2\alpha} + 12\big(\tfrac{x-a}{b-a}\big)^{\alpha} - 1$
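As with Ln above, the helper FLn is not shown in this excerpt; a minimal sketch (our assumption) evaluates the Legendre polynomial at the already-mapped fractional argument constructed in the previous block.
Program Code
import sympy

def FLn(x, n):
    # fractional Legendre function: P_n applied to the mapped argument
    # x -> 2*((x - a)/(b - a))**alpha - 1
    return sympy.legendre(n, x)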
Figure 4.2 shows the fractional Legendre functions of the first kind up to the sixth
order where a = 0, b = 5, and α is 0.5.
78 A. Azmoon et al.
Fig. 4.3 Fifth order of fractional Legendre function over the finite interval [0, 5] with different α
Also, Fig. 4.3 depicts the fractional Legendre functions of the first kind of order
5 for different values of α.
In this section, the ordinary Legendre polynomial kernel formula is presented first. Then, some other versions of this kernel function are introduced, and at the end, the fractional Legendre polynomial kernel is presented.
Legendre polynomials, like other orthogonal polynomial families, are orthogonal, except that the Legendre weight function is the constant function $w(x) = 1$. So, according to Eqs. 4.2 and 4.3, for different $m$ and $n$ the Legendre polynomials are orthogonal to each other with weight function 1. This enables us to construct the Legendre kernel without denominators. Therefore, for scalar inputs $x$ and $z$, the Legendre kernel is defined as follows:
$$K(x,z) = \sum_{i=0}^{n} P_i(x)P_i(z) = \big\langle \phi(x), \phi(z)\big\rangle, \qquad (4.21)$$
where <, > is an inner product and the unique parameter n is the highest order of the
Legendre polynomials, and the nonlinear mapping determined by Legendre kernel
is
φ(x) = (P0 (x), P1 (x), ..., Pn (x)) ∈ Rn+1 . (4.22)
Now, with this formula, it can be seen that the components of $\phi(x)$ are orthogonal polynomials, indicating that the Legendre kernel is an orthogonal polynomial kernel Shawe-Taylor and Cristianini (2004).
Theorem 4.1 (Pan et al. (2012)) Legendre kernel is a valid Mercer kernel.
Proof According to the Mercer theorem introduced in Sect. 2.2.1, a valid kernel function needs to be positive semi-definite or, equivalently, should satisfy the necessary and sufficient conditions of Mercer's theorem. As Mercer's theorem states, for an SVM kernel to be valid it should be non-negative; precisely:
$$\iint K(x,z)\,w(x,z)\,f(x)f(z)\,dx\,dz \ge 0, \qquad (4.23)$$
where
$$K(x,z) = \sum_{i=0}^{n} P_i(x)P_i^T(z). \qquad (4.24)$$
For example, the Legendre kernels of orders 1, 2, and 3 for scalar inputs are
$$K(x,z) = 1 + xz,$$
$$K(x,z) = 1 + xz + \frac{9(xz)^2 - 3x^2 - 3z^2 + 1}{4},$$
$$K(x,z) = 1 + xz + \frac{9(xz)^2 - 3x^2 - 3z^2 + 1}{4} + \frac{25(xz)^3 - 15x^3z - 15xz^3 + 9xz}{4}. \qquad (4.26)$$
This kernel can be expanded and specified as follows for vector inputs $x, z \in \mathbb{R}^d$:
$$K(x,z) = \prod_{j=1}^{d} K_j(x_j, z_j) = \prod_{j=1}^{d}\sum_{i=0}^{n} P_i(x_j)P_i(z_j), \qquad (4.27)$$
Since the Legendre polynomials are defined on $[-1, 1]$, the input data should be normalized as
$$x_i^{new} = \frac{2\big(x_i^{old} - Min_i\big)}{Max_i - Min_i} - 1, \qquad (4.28)$$
where $x_i$ is the $i$-th feature of the vector $x$, and $Min_i$ and $Max_i$ are the minimum and maximum values along the $i$-th dimension of all the training and test data, respectively.
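A compact NumPy sketch of the vector-input Legendre kernel (4.27), together with the normalization (4.28), is given below (our own code, written directly from these formulas).
Program Code
import numpy as np

def normalize(X):
    # Eq. (4.28): map every feature of the dataset into [-1, 1]
    mn, mx = X.min(axis=0), X.max(axis=0)
    return 2 * (X - mn) / (mx - mn) - 1

def legendre_values(u, n):
    # P_0(u), ..., P_n(u) element-wise via recurrence (4.6)
    P = [np.ones_like(u), u]
    for k in range(1, n):
        P.append(((2 * k + 1) * u * P[-1] - k * P[-2]) / (k + 1))
    return np.array(P[: n + 1])

def legendre_kernel(x, z, n=4):
    # Eq. (4.27): K(x, z) = prod_j sum_i P_i(x_j) P_i(z_j)
    return np.prod(np.sum(legendre_values(x, n) * legendre_values(z, n), axis=0))

X = normalize(np.random.default_rng(3).uniform(0, 10, size=(4, 3)))
print(legendre_kernel(X[0], X[1]))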
Apart from the ordinary form of the Legendre kernel function, other kernels with
unique properties and special weight functions are defined on Legendre polynomials,
some of which are introduced in this section.
Ozer et al. (2011) applied kernel functions onto vector inputs directly instead of
applying them to each input element. In fact, the generalized Legendre kernel by
generalized Legendre polynomial was proposed as follows Tian and Wang (2017):
$$K_{G\text{-}Legendre}(x,z) = \sum_{i=0}^{n} P_i(x)P_i^T(z), \qquad (4.29)$$
where
$$P_0(x) = 1,\qquad P_1(x) = x,\qquad P_n(x) = \frac{2n-1}{n}\,x^T P_{n-1}(x) - \frac{n-1}{n}\,P_{n-2}(x). \qquad (4.30)$$
Regarding the generalized Chebyshev kernels, it should be noted that they have more complex expressions than the generalized Legendre kernels due to their more complex weight function, so generalized Chebyshev kernels can capture richer nonlinear information.
The exponentially modified orthogonal polynomial kernels are actually the product
of the well-known Gaussian kernel and the corresponding generalized orthogonal
polynomial kernels (without weighting function). Since an exponential function (the
Gaussian kernel) can capture local information along the decision surface better
than the square root function, Ozer et al. (2011) replaced the weighting function
with the Gaussian kernel $\frac{1}{\exp(\gamma\|x-z\|^2)}$, and defined the exponentially modified Legendre kernels as Tian and Wang (2017)
$$K_{exp\text{-}Legendre}(x,z) = \frac{\sum_{i=0}^{n} P_i(x)P_i^T(z)}{\exp\big(\gamma\|x-z\|^2\big)}. \qquad (4.31)$$
It should be noted that the modified generalized orthogonal polynomial kernels can be seen as semi-local kernels. Also, having two parameters, $n$ and $\gamma > 0$, makes these kernels more difficult to optimize than the generalized orthogonal polynomial kernels Tian and Wang (2017).
The Triangular kernel, which is basically an affine function of the Euclidean distance
(d(i, j)) between the points in the original space, is expressed as Fleuret and Sahbi
(2003)
$$K(x,z) = \Big(1 - \frac{\|x-z\|}{\lambda}\Big)_{+}, \qquad (4.32)$$
where the ()+ forces this mapping to be positive and ensures this expression to be
a kernel. Therefore, the triangularly modified Legendre kernels can be written as
follows Tian and Wang (2017), Belanche Muñoz (2013); Fleuret and Sahbi (2003):
$$K_{Tri\text{-}Legendre}(x,z) = \Big(1 - \frac{\|x-z\|}{\lambda}\Big)_{+}\sum_{i=0}^{n} P_i(x)P_i^T(z), \qquad (4.33)$$
where $\lambda = \max\{d(x_i, \bar{x}) \mid \bar{x} = \frac{1}{N}\sum_{i=1}^{N}x_i,\; x_i \in X\}$, $X$ is a finite sample set, and $N$ is the number of samples. Thus, all the data live in a ball of radius $\lambda$. Since the parameter $\lambda$ only depends on the input data, the triangularly modified orthogonal polynomial kernels have a unique parameter chosen from a small set of integers.
Given that the fractional weight function is $fw(x,z) = (xz^T)^{\alpha-1}$, and similar to the Legendre kernel, the corresponding fractional forms are introduced as
$$K_{FLegendre}(X,Z) = \prod_{j=1}^{m}\sum_{i=0}^{n} FP_i^{\alpha}(x_{x_j})FP_i^{\alpha}(x_{z_j})\,fw(x_{x_j}, x_{z_j}), \qquad (4.34)$$
where
$$K(x,z) = \sum_{i=0}^{n} FP_i^{\alpha}(x_x)FP_i^{\alpha}(x_z)^T\,fw(x_{x_j}, x_{z_j}). \qquad (4.36)$$
To verify the Mercer condition, we have
$$\iint K(x,z)f(x)f(z)\,dx\,dz = \iint\sum_{i=0}^{n} FP_i^{\alpha}(x_x)FP_i^{\alpha}(x_z)^T\,(x_x x_z^T)^{\alpha-1} f(x)f(z)\,dx\,dz$$
$$= \sum_{i=0}^{n}\Big(\int FP_i^{\alpha}(x_x)\,(x_x)^{\alpha-1} f(x)\,dx\Big)\Big(\int FP_i^{\alpha}(x_z)^T\,(x_z^T)^{\alpha-1} f(z)\,dz\Big) \ge 0. \qquad (4.37)$$
In this section, the application of the ordinary Legendre kernel and the fractional
Legendre kernel is shown in SVM. Also, the obtained results on two real datasets
are compared with the results of RBF kernels, ordinary polynomial kernel, ordinary
Chebyshev kernel, and fractional Chebyshev kernel.
The Spiral dataset has already been introduced in the previous chapter. In this section, the Legendre kernel and the fractional Legendre kernel are used within SVM to classify the Spiral dataset. As we mentioned before, this multi-class classification dataset can be split into three binary classification datasets. It was also previously explained (Fig. 3.5) that, by transferring the data to the fractional space using the $\alpha$ coefficient ($0.1 \le \alpha \le 0.9$), the data density gradually decreases with decreasing $\alpha$ value, and the central axis of data density is transferred from the point $(0, 0)$ to the point $(1, 1)$. Although it seems that the data points are conglomerated in one corner of the axes on a 2D plot, Fig. 4.4, this is not necessarily what happens in 3D space when the kernel function is applied, Fig. 4.5.
Using the data transferred to the fractional space, the decision boundaries for the above-mentioned binary splits are found with the help of the Legendre and fractional Legendre kernel functions; a sketch of how such a custom kernel can be plugged into an SVM implementation is given below.
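As a rough sketch of how this is done in practice (not the exact pipeline used to produce the tables below), a custom kernel can be passed to scikit-learn's SVC as a callable that returns the Gram matrix; the data are assumed to be scaled to $[0, 1]$ before the fractional map, the random arrays only stand in for one one-vs-all split of the spiral data, and the `generalized_legendre_kernel` sketch from above is reused.

Program Code

import numpy as np
from sklearn.svm import SVC

def fractional_map(X, alpha=0.4):
    # Transfer data in [0, 1] to the fractional space x -> x**alpha.
    return np.power(X, alpha)

def legendre_gram(A, B, n=7):
    # Pairwise kernel matrix, as expected by SVC(kernel=<callable>).
    return np.array([[generalized_legendre_kernel(a, b, n) for b in B] for a in A])

# Illustrative data standing in for one one-vs-all split of the spiral dataset.
rng = np.random.default_rng(1)
X, y = rng.uniform(0, 1, size=(60, 2)), rng.integers(0, 2, size=60)
clf = SVC(kernel=legendre_gram).fit(fractional_map(X), y)
print(clf.score(fractional_map(X), y))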
In order to have an intuition of how the decision boundaries differ, we can look at Fig. 4.6, which depicts the Legendre classifiers of orders 3, 4, 5, and 6 on the original Spiral dataset for the 1-vs-{2, 3} binary classification. From these figures, it can be seen that the decision boundaries become more twisted as the order increases.
The complexity of the decision boundary increases further in the fractional space. Figure 4.7 demonstrates the decision boundary of the fractional Legendre classifier with orders 3, 4, 5, and 6, where $\alpha = 0.4$. Again, the decision boundary becomes more complicated as one moves from order 3 to order 6.¹
Each decision boundary or decision surface in 3D space classifies the data points in a specific way. Since the decision boundary or surface of a kernel function is fixed for each order, it is the degree of correlation between that specific shape and the data points that determines the final accuracy.
The experimental results are summarized in the following tables. In particular, Table 4.2 summarizes the results for class 1-vs-{2, 3}. As can be seen, the fractional Legendre kernel outperforms the other kernels, and the fractional Chebyshev kernel has the second best accuracy. The classification accuracies of the mentioned kernels for class 2-vs-{1, 3} are summarized in Table 4.3, where the RBF kernel has the best accuracy score. Finally, Table 4.4 reports the classification accuracy scores for class 3-vs-{1, 2}, in which the fractional Legendre kernel has the best performance.
¹ Based on many studies, it was concluded that the fractional Legendre kernel of order 3 is not a suitable option for use on the spiral dataset.
As another case example, the three Monks' problems are considered here (see Chap. 3 for more information about the dataset). The Legendre kernel (i.e., Eq. 4.21) and the fractional Legendre kernel (i.e., Eq. 4.34) are applied to the datasets of the three Monks' problems. Table 4.5 reports the output accuracy of each model on the first Monk's problem, where the RBF kernel ($\sigma \approx 2.844$) has the best accuracy at 0.8819 and the Legendre kernel has the worst among them at 0.8333.
Table 4.6 shows the output accuracy of each model on the second Monk's problem, with the fractional Legendre kernel ($\alpha = 0.1$) having the best accuracy of 1 and the Legendre kernel having the worst accuracy of 0.8032.
Finally, Table 4.7 reports the output accuracy of each model on the third Monk's problem, where the fractional Chebyshev and RBF kernels share the best accuracy of 0.91 and the fractional Legendre kernel has the worst among them at 0.8379.
Fig. 4.7 Fractional Legendre kernel functions of orders 3, 4, 5, and 6 on Spiral dataset with α = 0.4
Table 4.2 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, and fractional Legendre accuracy scores on the 1-vs-{2,3} Spiral dataset. It is clear that the fractional Legendre kernel outperforms the other kernels

                      Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                   0.73   –      –      –         –          0.97
Polynomial            –      8      –      –         –          0.9533
Chebyshev             –      –      5      –         –          0.9667
Fractional Chebyshev  –      –      3      0.3       –          0.9733
Legendre              –      –      7      –         –          0.9706
Fractional Legendre   –      –      7      0.4       –          0.9986
Table 4.3 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, and fractional Legendre accuracy scores on the 2-vs-{1,3} Spiral dataset. It is clear that the RBF kernel outperforms the other kernels

                      Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                   0.1    –      –      –         –          0.9867
Polynomial            –      5      –      –         –          0.9044
Chebyshev             –      –      6      –         –          0.9289
Fractional Chebyshev  –      –      6      0.8       –          0.9344
Legendre              –      –      8      –         –          0.9773
Fractional Legendre   –      –      8      0.4       –          0.9853
Table 4.4 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, and fractional Legendre accuracy scores on the 3-vs-{1,2} Spiral dataset. It is clear that the fractional Legendre and polynomial kernels are better than the others

                      Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                   0.73   –      –      –         –          0.98556
Polynomial            –      5      –      –         –          0.98556
Chebyshev             –      –      6      –         –          0.9622
Fractional Chebyshev  –      –      6      0.6       –          0.9578
Legendre              –      –      7      –         –          0.9066
Fractional Legendre   –      –      5      0.4       –          0.9906
Table 4.5 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, and fractional Legendre kernels on Monks' first problem

                      Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                   2.844  –      –      –         –          0.8819
Polynomial            –      3      –      –         –          0.8681
Chebyshev             –      –      3      –         –          0.8472
Fractional Chebyshev  –      –      3      1/16      –          0.8588
Legendre              –      –      4      –         –          0.8333
Fractional Legendre   –      –      4      0.1       –          0.8518
Table 4.6 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, and fractional Legendre kernels on Monks' second problem

                      Sigma   Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                   5.5896  –      –      –         –          0.875
Polynomial            –       3      –      –         –          0.8657
Chebyshev             –       –      3      –         –          0.8426
Fractional Chebyshev  –       –      3      1/16      –          0.9653
Legendre              –       –      3      –         –          0.8032
Fractional Legendre   –       –      3      0.1       –          1
Table 4.7 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, and fractional Legendre kernels on Monks' third problem

                      Sigma   Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                   2.1586  –      –      –         –          0.91
Polynomial            –       3      –      –         –          0.875
Chebyshev             –       –      6      –         –          0.895
Fractional Chebyshev  –       –      5      1/5       –          0.91
Legendre              –       –      4      –         –          0.8472
Fractional Legendre   –       –      3      0.8       –          0.8379
4.5 Conclusion
In this chapter, the fractional-order Legendre polynomial kernel for support vector machines was introduced, building on the ordinary Legendre polynomial kernel. The Legendre polynomial kernel can extract good properties from the data thanks to the orthogonality of the elements in the feature vector, thus reducing data redundancy. Moreover, based on the experimental results on the two datasets of the previous section, SVM with the Legendre and fractional Legendre kernels can separate nonlinear data well.
References
Afifi, A., Zanaty, EA.: Generalized legendre polynomials for support vector machines (SVMS)
classification. Int. J. Netw. Secur. Appl. (IJNSA) 11, 87–104 (2019)
Asghari, M., Hadian Rasanan, A.H., Gorgin, S., Rahmati, D., Parand, K.: FPGA-orthopoly: a
hardware implementation of orthogonal polynomials. Eng. Comput. (2022). https://doi.org/10.
1007/s00366-022-01612-x
Belanche Muñoz, L. A.: Developments in kernel design. In ESANN 2013 Proceedings: European
Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning,
pp. 369–378 (2013)
Benouini, R., Batioua, I., Zenkouar, Kh., Mrabti, F.: New set of generalized Legendre moment
invariants for pattern recognition. Pattern Recognit. Lett. 123, 39–46 (2019)
Bhrawy, A.H., Abdelkawy, M.A., Machado, J.T., Amin, A.Z.M.: Legendre-Gauss-Lobatto collo-
cation method for solving multi-dimensional Fredholm integral equations. Comput. Math. Appl.
4, 1–13 (2016)
Bhrawy, A.H., Doha, E.H., Ezz-Eldien, S.S., Abdelkawy, M.A.: A numerical technique based on
the shifted Legendre polynomials for solving the time-fractional coupled KdV equations. Calcolo
53, 1–17 (2016)
Chang, P., Isah, A.: Legendre Wavelet Operational Matrix of fractional Derivative through wavelet-
polynomial transformation and its Applications in Solving Fractional Order Brusselator system.
J. Phys.: Conf. Ser. 693 (2016)
Chang, R.Y., Wang, M.L.: Model reduction and control system design by shifted Legendre poly-
nomial functions. J. Dyn. Syst. Meas. Control 105, 52–55 (1983)
Chang, R.Y., Wang, M.L.: Optimal control of linear distributed parameter systems by shifted Leg-
endre polynomial functions. J. Dyn. Syst. Meas. Control 105, 222–226 (1983)
Chang, R.Y., Wang, M.L.: Shifted Legendre function approximation of differential equations; appli-
cation to crystallization processes. Comput. Chem. Eng. 8, 117–125 (1984)
Dahmen, S., Morched, B.A., Mohamed Hédi, B.G.: Investigation of the coupled Lamb waves prop-
agation in viscoelastic and anisotropic multilayer composites by Legendre polynomial method.
Compos. Struct. 153, 557–568 (2016)
Dash, R., Dash, P. K.: MDHS-LPNN: a hybrid FOREX predictor model using a Legendre polynomial
neural network with a modified differential harmony search technique. Handbook of Neural
Computation, pp. 459–486. Academic Press (2017)
Dash, R.: Performance analysis of an evolutionary recurrent Legendre Polynomial Neural Network
in application to FOREX prediction. J. King Saud Univ.—Comput. Inf. Sci. 32, 1000–1011 (2020)
Doman, B.G.S.: The Classical Orthogonal Polynomials. World Scientific, Singapore (2015)
Ezz-Eldien, S.S., Doha, E.H., Baleanu, D., Bhrawy, A.H.: A numerical approach based on Legendre
orthonormal polynomials for numerical solutions of fractional optimal control problems. J VIB
Control 23, 16–30 (2017)
Fleuret, F., Sahbi, H.: Scale-invariance of support vector machines based on the triangular kernel. In: 3rd International Workshop on Statistical and Computational Theories of Vision, pp. 1–13 (2003)
Gao, J., Lyu, Y., Zheng, M., Liu, M., Liu, H., Wu, B., He, C.: Application of Legendre orthogonal
polynomial method in calculating reflection and transmission coefficients of multilayer plates.
Wave Motion 84, 32–45 (2019)
Gao, J., Lyu, Y., Zheng, M., Liu, M., Liu, H., Wu, B., He, C.: Application of state vector formalism
and Legendre polynomial hybrid method in the longitudinal guided wave propagation analysis
of composite multi-layered pipes. Wave Motion 100, 102670 (2021)
Hadian Rasanan, A.H., Rahmati, D., Gorgin, S., Parand, K.: A single layer fractional orthogonal
neural network for solving various types of Lane-Emden equation. New Astron. 75, 101307
(2020)
Hadian Rasanan, A.H., Bajalan, N., Parand, K., Rad, J.A.: Simulation of nonlinear fractional dynam-
ics arising in the modeling of cognitive decision making using a new fractional neural network.
Math. Methods Appl. Sci. 43, 1437–1466 (2020)
Haitjema, H.: Surface profile and topography filtering by Legendre polynomials. Surf. Topogr. 9,
15–17 (2021)
Hwang, C., Chen, M.-Y.: Analysis and optimal control of time-varying linear systems via shifted Legendre polynomials. Int. J. Control 41, 1317–1330 (1985)
Kaghashvili, E.K., Zank, G.P., Lu, J.Y., Dröge, W. : Transport of energetic charged particles. Part
2. Small-angle scattering. J. Plasma Phys. 70, 505–532 (2004)
Kazem, S., Shaban, M., Rad, J.A.: Solution of the coupled Burgers equation based on operational
matrices of d-dimensional orthogonal functions. Zeitschrift für Naturforschung A 67, 267–274
(2012)
Kazem, S., Abbasbandy, S., Kumar, S.: Fractional-order Legendre functions for solving fractional-
order differential equations. Appl. Math. Model. 37, 5498–5510 (2013)
Lamb, G.L., Jr.: Introductory Applications of Partial Differential Equations: with Emphasis on Wave
Propagation and Diffusion. Wiley, Amsterdam (2011)
Holdeman, J.T., Jr.: Legendre polynomial expansions of hypergeometric functions with applications. J. Math. Phys. 11, 114–117 (1970)
Mall, S., Chakraverty, S.: Application of Legendre neural network for solving ordinary differential
equations. Appl. Soft Comput. 43, 347–356 (2016)
Marianela, P., Gómez, J.C.: Legendre polynomials based feature extraction for online signature
verification. Consistency analysis of feature combinations. Pattern Recognit. 47, 128–140 (2014)
Moayeri, M.M., Rad, J.A., Parand, K.: Dynamical behavior of reaction-diffusion neural networks
and their synchronization arising in modeling epileptic seizure: A numerical simulation study.
Comput. Math. with Appl. 80, 1887–1927 (2020)
Mohammadi, F., Hosseini, M.M.: A new Legendre wavelet operational matrix of derivative and its
applications in solving the singular ordinary differential equations. J. Franklin Inst. 348, 1787–
1796 (2011)
Parand, K., Delafkar, Z., Rad, J.A., Kazem, S.: Numerical study on wall temperature and surface heat flux natural convection equations arising in porous media by rational Legendre pseudospectral approach. Int. J. Nonlinear Sci. 9, 1–12 (2010)
Olver, F.W.J., Lozier, D.W., Boisvert, R.F., Clark, C.W.: NIST Handbook of Mathematical Functions
Hardback and CD-ROM. Cambridge University Press, Singapore (2010)
Ozer, S., Chen, C.H., Cirpan, H.A.: A set of new Chebyshev kernel functions for support vector machine pattern classification. Pattern Recognit. 44, 1435–1447 (2011)
Pan, Z.B., Chen, H., You, X.H.: Support vector machine with orthogonal Legendre kernel. In:
International Conference on Wavelet Analysis and Pattern Recognition, pp. 125–130. IEEE (2012)
Parand, K., Razzaghi, M.: Rational Legendre approximation for solving some physical problems
on semi-infinite intervals. Phys. Scr. 69, 353 (2004)
Parand, K., Shahini, M., Dehghan, M.: Rational Legendre pseudospectral approach for solving
nonlinear differential equations of Lane-Emden type. J. Comput. Phys. 228, 8830–8840 (2009)
Qian, C.B., Tianshu, L., Jinsong, L. H. Liu, Z.: Synchrophasor estimation algorithm using Legendre
polynomials. IEEE PES General Meeting Conference and Exposition (2014)
Rad, J.A., Kazem, S., Shaban, M., Parand, K., Yildirim, A.H.M.E.T.: Numerical solution of frac-
tional differential equations with a Tau method based on Legendre and Bernstein polynomials.
Math. Methods Appl. Sci. 37, 329–342 (2014)
Saadatmandi, A., Dehghan, M.: A new operational matrix for solving fractional-order differential
equations. Comput. Math. Appl. 59, 1326–1336 (2010)
Sánchez-Ruiz, J., Dehesa, J.S.: Expansions in series of orthogonal hypergeometric polynomials. J.
Comput. Appl. Math. 89, 155–170 (1998)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press,
Cambridge (2004)
Shen, J.: Efficient spectral-Galerkin method I. Direct solvers of second-and fourth-order equations
using Legendre polynomials. SISC 15, 1489–1505 (1994)
Spencer, L.V.: Calculation of peaked angular distributions from Legendre polynomial expansions
and an application to the multiple scattering of charged particles. Phys. Rev. 90, 146–150 (1953)
Tian, M., Wang, W.: Some sets of orthogonal polynomial kernel functions. Appl. Soft Comput. 61,
742–756 (2017)
Voelker, A., Kajić, I., Eliasmith, C.: Legendre memory units: Continuous-time representation in
recurrent neural networks. Adv. Neural Inf. Process. Syst. 32 (2019)
Zeghdane, R.: Numerical approach for solving nonlinear stochastic Itô-Volterra integral equations
using shifted Legendre polynomials. Int. J. Dyn. Syst. Diff. Eqs. 11, 69–88 (2021)
Zheng, M., He, C., Lyu, Y., Wu, B.: Guided waves propagation in anisotropic hollow cylinders by
Legendre polynomial solution based on state-vector formalism. Compos. Struct. 207, 645–657
(2019)
Chapter 5
Fractional Gegenbauer Kernel
Functions: Theory and Application
Abstract Owing to the use of many different functions as kernels, the support vector machine method has demonstrated remarkable versatility in tackling numerous machine learning problems. Gegenbauer polynomials, like the Chebyshev and Legendre polynomials introduced in the previous chapters, are among the most commonly utilized orthogonal polynomials and have produced outstanding results in the support vector machine method. In this chapter, some essential properties of Gegenbauer and fractional Gegenbauer functions are presented and reviewed, followed by the kernels of these functions, which are introduced and validated. Finally, the performance of these functions is evaluated on two example datasets.
5.1 Introduction
dimensions (Avery 2012). Spherical harmonics are special functions defined on the surface of a sphere. They are often used in applied mathematics for solving differential equations (Doha 1998; Elliott 1960). In recent years, the classical orthogonal polynomials, especially the Gegenbauer orthogonal polynomials, have been used to address pattern recognition (Liao et al. 2002; Liao and Chen 2013; Herrera-Acosta et al. 2020), classification (Hjouji et al. 2021; Eassa et al. 2022), and kernel-based learning in many fields such as physics (Ludlow and Everitt 1995), medical image processing (Arfaoui et al. 2020; Stier et al. 2021; Öztürk et al. 2020), electronics (Tamandani and Alijani 2022), and other basic fields (Soufivand et al. 2021; Park 2009; Feng and Varshney 2021). In 2006, Pawlak (2006) showed the benefits of an image reconstruction process based on Gegenbauer polynomials and the corresponding moments. In the years that followed, Hosny (2011) addressed the high computational requirements of Gegenbauer moments and proposed novel methods for image analysis and recognition; fractional-order shifted Gegenbauer moments (Hosny et al. 2020) and Gegenbauer moment invariants (Hosny 2014) appear in these studies. Also, Abd Elaziz et al. (2019) used an artificial bee colony based on orthogonal Gegenbauer moments to classify galaxy images with the help of support vector machines. In 2009, using the orthogonality property of the Gegenbauer polynomials, Langley and Zhao (2009) introduced a new 3D phase unwrapping algorithm for the analysis of magnetic resonance imaging (MRI). Also, in 1995, Ludlow studied the application of Gegenbauer polynomials to the emission of light from spheres (Ludlow and Everitt 1995). Clifford polynomials are particularly well suited as kernel functions for a higher dimensional continuous wavelet transform. In 2004, Brackx et al. (2004) constructed Clifford–Gegenbauer and generalized Clifford–Gegenbauer polynomials as new specific wavelet kernel functions for a higher dimensional continuous wavelet transform. Then, in 2011, Ilić and Pavlović (2011), using the Christoffel–Darboux formula for Gegenbauer orthogonal polynomials, presented a filter function solution that exhibits optimal amplitude as well as optimal group delay characteristics.
Flexibility in using different kernels in the SVM algorithm is one of the reasons why classical orthogonal polynomials have recently been used as kernels. The Gegenbauer polynomial is one of those that have shown acceptable results in this area. In 2018, Padierna et al. (2018) introduced a formulation of orthogonal polynomial kernel functions for SVM classification. In 2020, the same group (Padierna et al. 2020) used this kernel to classify peripheral arterial disease in patients with type 2 diabetes. In addition to using orthogonal polynomials as kernels in SVM to solve classification problems, these polynomials can be used as kernels in support vector regression (SVR) to solve regression problems as well as to help examine time series problems. In 1992, Azari et al. (1992) proposed an ultraspherical (Gegenbauer) kernel estimator for nonparametric regression. In 2019, Feng et al. (2019) suggested using extended support vector regression (X-SVR), a new machine-learning-based metamodel, for the reliability study of dynamic systems using first-passage theory. Furthermore, the power of X-SVR is strengthened by a novel kernel function built from the vectorized Gegenbauer polynomial, specifically for handling complicated engineering problems (Feng et al. 2019). On the other hand, in 2001, Ferrara and Guégan (2001) dealt with the k-factor extension
of the long memory Gegenbauer process; this model was used to investigate the predictive ability of the k-factor Gegenbauer model on real urban transport traffic data from the Paris area. Other applications of orthogonal Gegenbauer polynomials include their effectiveness in building and developing neural networks. In 2018, Zhang et al. (2018) constructed and investigated a two-input Gegenbauer orthogonal neural network (TIGONN) using probability theory, polynomial interpolation, and approximation theory to avoid the inherent problems of the back-propagation (BP) training algorithm. Then, in 2019, a novel type of neural network based on Gegenbauer orthogonal polynomials, termed GNN, was constructed and investigated (He et al. 2019). This model can overcome the computational robustness problems of extreme learning machines (ELM), while still having comparable structural simplicity and approximation capability (He et al. 2019). In addition to the applications mentioned here, Table 5.1 lists some applications of different kinds of these polynomials.
a For more information about L.B. Gegenbauer and his contribution, please see: https://mathshistory.st-andrews.ac.uk/Biographies/Gegenbauer/.
compared with similar results from the previous chapters. Finally, in Sect. 5.5, the concluding remarks summarize the whole chapter.
5.2 Preliminaries
In this section, the basics of Gegenbauer polynomials are covered. These polynomials are defined using the related differential equation; in what follows, the properties of Gegenbauer polynomials are introduced, and their fractional form is defined along with its properties.
Gegenbauer polynomials of degree $n$ and order $\lambda > -\frac{1}{2}$, denoted $G_n^{\lambda}(x)$, are solutions of the following Sturm–Liouville differential equation (Doman 2015; Padierna et al. 2018; Asghari et al. 2022):

$$(1 - x^2)\frac{d^2 y}{dx^2} - (2\lambda + 1)x\frac{dy}{dx} + n(n + 2\lambda)y = 0, \qquad (5.1)$$

where $n$ is a positive integer and $\lambda$ is a real number greater than $-0.5$. Gegenbauer polynomials are orthogonal on the interval $[-1, 1]$ with respect to the weight function (Padierna et al. 2018; Parand et al. 2018; Asghari et al. 2022)

$$w(x) = (1 - x^2)^{\lambda - \frac{1}{2}}. \qquad (5.2)$$
Therefore, the orthogonality relation is defined as (Ludlow and Everitt 1995; Parand et al. 2018; Hadian-Rasanan et al. 2019; Asghari et al. 2022)

$$\int_{-1}^{1} G_n^{\lambda}(x)\, G_m^{\lambda}(x)\, w(x)\, dx = \frac{\pi\, 2^{1-2\lambda}\, \Gamma(n + 2\lambda)}{n!\,(n + \lambda)\,\big(\Gamma(\lambda)\big)^2}\, \delta_{nm}, \qquad (5.3)$$

where $\delta_{nm}$ is the Kronecker delta function (El-Kalaawy et al. 2018). The standard Gegenbauer polynomial $G_n^{(\lambda)}(x)$ can also be defined explicitly as follows (Parand et al. 2018; Hadian-Rasanan et al. 2019):

$$G_n^{\lambda}(x) = \sum_{j=0}^{\lfloor n/2 \rfloor} (-1)^j\, \frac{\Gamma(n + \lambda - j)}{j!\,(n - 2j)!\,\Gamma(\lambda)}\, (2x)^{n - 2j}, \qquad (5.4)$$
and the generating function of the Gegenbauer polynomials is

$$G^{\lambda}(x, z) = \frac{1}{(1 - 2xz + z^2)^{\lambda}}. \qquad (5.5)$$

It can be shown that this holds for $|z| < 1$, $|x| \le 1$, and $\lambda > -\frac{1}{2}$ (Cohl 2013; Doman 2015). Considering that, for fixed $x$, the function is holomorphic in $|z| < 1$, it can be expanded in a Taylor series (Reimer 2013):
$$G^{\lambda}(x, z) = \sum_{n=0}^{\infty} G_n^{\lambda}(x)\, z^n. \qquad (5.6)$$
Gegenbauer polynomials satisfy, among others, the following recurrence and derivative relations:

$$(n + 2)\,G_{n+2}^{\lambda}(x) = 2(\lambda + n + 1)\,x\,G_{n+1}^{\lambda}(x) - (2\lambda + n)\,G_n^{\lambda}(x), \qquad (5.8)$$

$$n\,G_n^{\lambda}(x) = 2\lambda\left\{x\,G_{n-1}^{\lambda+1}(x) - G_{n-2}^{\lambda+1}(x)\right\}, \qquad (5.9)$$

$$(n + 2\lambda)\,G_n^{\lambda}(x) = 2\lambda\left\{G_n^{\lambda+1}(x) - x\,G_{n-1}^{\lambda+1}(x)\right\}, \qquad (5.10)$$

$$n\,G_n^{\lambda}(x) = (n - 1 + 2\lambda)\,x\,G_{n-1}^{\lambda}(x) - 2\lambda(1 - x^2)\,G_{n-2}^{\lambda+1}(x), \qquad (5.11)$$

$$\frac{d}{dx}G_n^{\lambda}(x) = 2\lambda\,G_{n-1}^{\lambda+1}(x). \qquad (5.12)$$
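These relations can be spot-checked numerically with SciPy's built-in Gegenbauer evaluator; the following minimal sketch verifies Eq. 5.8 for one sample choice of $\lambda$ and $n$.

Program Code

import numpy as np
from scipy.special import eval_gegenbauer

lam, n = 0.75, 4
x = np.linspace(-1, 1, 7)
lhs = (n + 2)*eval_gegenbauer(n + 2, lam, x)
rhs = 2*(lam + n + 1)*x*eval_gegenbauer(n + 1, lam, x) - (2*lam + n)*eval_gegenbauer(n, lam, x)
print(np.allclose(lhs, rhs))   # expect True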
Everyone who has tried to generate Gegenbauer polynomials has experienced some difficulty, as the number of terms grows at each higher order; for example, for the orders zero to four:

$$
\begin{aligned}
G_0^{\lambda}(x) &= 1,\\
G_1^{\lambda}(x) &= 2\lambda x,\\
G_2^{\lambda}(x) &= (2\lambda^2 + 2\lambda)x^2 - \lambda,\\
G_3^{\lambda}(x) &= \left(\tfrac{4}{3}\lambda^3 + 4\lambda^2 + \tfrac{8}{3}\lambda\right)x^3 + (-2\lambda^2 - 2\lambda)x,\\
G_4^{\lambda}(x) &= \left(\tfrac{2}{3}\lambda^4 + 4\lambda^3 + \tfrac{22}{3}\lambda^2 + 4\lambda\right)x^4 + (-2\lambda^3 - 6\lambda^2 - 4\lambda)x^2 + \tfrac{\lambda^2 + \lambda}{2},
\end{aligned} \qquad (5.13)
$$
which are depicted in Figs. 5.1 and 5.2; there will be even more terms at higher orders. To overcome this difficulty, Pochhammer polynomials have been used for simplification (Doman 2015). The Pochhammer polynomial, or rising factorial, is defined as

$$x^{(n)} = x(x + 1)(x + 2)\cdots(x + n - 1) = \prod_{k=1}^{n}(x + k - 1). \qquad (5.14)$$
$$
\begin{aligned}
x^{(0)} &= 1,\\
x^{(1)} &= x,\\
x^{(2)} &= x(x + 1) = x^2 + x,\\
x^{(3)} &= x(x + 1)(x + 2) = x^3 + 3x^2 + 2x,\\
x^{(4)} &= x(x + 1)(x + 2)(x + 3) = x^4 + 6x^3 + 11x^2 + 6x,\\
x^{(5)} &= x^5 + 10x^4 + 35x^3 + 50x^2 + 24x,\\
x^{(6)} &= x^6 + 15x^5 + 85x^4 + 225x^3 + 274x^2 + 120x.
\end{aligned} \qquad (5.15)
$$
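These expansions can be reproduced with SymPy's built-in rising factorial; for instance, the fourth-order case of Eq. 5.15:

Program Code

import sympy
x = sympy.Symbol("x")
print(sympy.expand(sympy.expand_func(sympy.RisingFactorial(x, 4))))
# x**4 + 6*x**3 + 11*x**2 + 6*x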
In terms of the Pochhammer symbol $a^{(n)}$ with $a = \lambda$, the Gegenbauer polynomials can be written more compactly as

$$
\begin{aligned}
G_0^{\lambda}(x) &= 1,\\
G_1^{\lambda}(x) &= 2\lambda x,\\
G_2^{\lambda}(x) &= 2a^{(2)} x^2 - \lambda,\\
G_3^{\lambda}(x) &= \frac{4 a^{(3)} x^3}{3} - 2 a^{(2)} x,\\
G_4^{\lambda}(x) &= \frac{2 a^{(4)} x^4}{3} - 2 a^{(3)} x^2 + \frac{a^{(2)}}{2},\\
G_5^{\lambda}(x) &= \frac{4 a^{(5)} x^5}{15} - \frac{4 a^{(4)} x^3}{3} + a^{(3)} x,\\
G_6^{\lambda}(x) &= \frac{4 a^{(6)} x^6}{45} - \frac{2 a^{(5)} x^4}{3} + a^{(4)} x^2 - \frac{a^{(3)}}{6},
\end{aligned} \qquad (5.16)
$$

and these can also be generated symbolically with the following code:
Program Code
import sympy
x = sympy.Symbol("x")
lambd = sympy.Symbol(r'\lambda')
def Gn(x, n):  # Gegenbauer polynomial of order n via the three-term recurrence
    if n == 0:
        return 1
    elif n == 1:
        return 2*lambd*x
    return sympy.Rational(1, n)*(2*x*(n + lambd - 1)*Gn(x, n - 1)
                                 - (n + 2*lambd - 2)*Gn(x, n - 2))
Program Code
sympy.expand(sympy.simplify(Gn(x, 2)))
> $2x^2\lambda^2 + 2x^2\lambda - \lambda$
The zeros of $G_n^{\lambda}(x)$ are denoted by $x_{nk}(\lambda)$, $k = 1, \dots, n$, and are enumerated in decreasing order $1 > x_{n1}(\lambda) > x_{n2}(\lambda) > \cdots > x_{nn}(\lambda) > -1$ (Reimer 2013). The zeros of $G_n^{\lambda}(x)$ and $G_m^{\lambda}(x)$, $m > n$, separate each other, and between any two zeros of $G_n^{\lambda}(x)$ there is at least one zero of $G_m^{\lambda}(x)$ (Olver et al. 2010).
Gegenbauer polynomials follow the same symmetry as the other classical orthogonal polynomials: Gegenbauer polynomials of even order have even symmetry and contain only even powers of $x$, and similarly, Gegenbauer polynomials of odd order have odd symmetry and contain only odd powers of $x$ (Olver et al. 2010):

$$G_n^{\lambda}(-x) = (-1)^n G_n^{\lambda}(x) = \begin{cases} G_n^{\lambda}(x), & n \text{ even},\\ -G_n^{\lambda}(x), & n \text{ odd}. \end{cases} \qquad (5.17)$$

Some useful special values are

$$G_n^{\lambda}(1) = \frac{(2\lambda)_n}{n!}, \qquad (5.18)$$

$$G_{2n}^{\lambda}(0) = \frac{(-1)^n (\lambda)_n}{n!}, \qquad (5.19)$$

$$\frac{d}{dx}G_{2n+1}^{\lambda}(x)\Big|_{x=0} = \frac{2(-1)^n (\lambda)_{n+1}}{n!}. \qquad (5.20)$$
The Gegenbauer polynomial of fractional order over a finite interval $[a, b]$, obtained by the mapping $\hat{x} = 2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1$, where $\alpha > 0$ and $\hat{x} \in [-1, 1]$, is defined as (Parand and Delkhosh 2016)

$$FG_n^{\alpha,\lambda}(x) = G_n^{\lambda}(\hat{x}) = G_n^{\lambda}\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right), \qquad (5.21)$$
with the recurrence

$$FG_n^{\alpha,\lambda}(x) = \frac{1}{n}\left[2\hat{x}\,(n + \lambda - 1)\,FG_{n-1}^{\alpha,\lambda}(x) - (n + 2\lambda - 2)\,FG_{n-2}^{\alpha,\lambda}(x)\right],$$
$$FG_0^{\alpha,\lambda}(x) = 1, \qquad FG_1^{\alpha,\lambda}(x) = 2\lambda\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right). \qquad (5.22)$$

The first few fractional Gegenbauer functions are

$$
\begin{aligned}
FG_0^{\alpha,\lambda}(x) &= 1,\\
FG_1^{\alpha,\lambda}(x) &= 4\lambda\left(\frac{x-a}{b-a}\right)^{\alpha} - 2\lambda,\\
FG_2^{\alpha,\lambda}(x) &= 8\lambda^2\left(\frac{x-a}{b-a}\right)^{2\alpha} - 8\lambda^2\left(\frac{x-a}{b-a}\right)^{\alpha} + 2\lambda^2 + 8\lambda\left(\frac{x-a}{b-a}\right)^{2\alpha} - 8\lambda\left(\frac{x-a}{b-a}\right)^{\alpha} + \lambda.
\end{aligned}
$$
For higher orders there are many terms, which makes them impractical to write out; for convenience, one can use the Python code below to generate any order of the fractional Gegenbauer functions:
Program Code
import sympy
x = sympy.Symbol("x")
a = sympy.Symbol("a")
b = sympy.Symbol("b")
lambd = sympy.Symbol(r'\lambda')
alpha = sympy.Symbol(r'\alpha')
xm = sympy.sympify(2*((x - a)/(b - a))**alpha - 1)  # mapped variable of Eq. 5.21
def FGn(x, n):
    if n == 0:
        return 1
    elif n == 1:
        return 2*lambd*xm
    return sympy.Rational(1, n)*(2*xm*(n + lambd - 1)*FGn(x, n - 1)
                                 - (n + 2*lambd - 2)*FGn(x, n - 2))
Program Code
sympy.expand(sympy.simplify(FGn(x, 2)))
> $8\lambda^2\left(\frac{x-a}{b-a}\right)^{2\alpha} - 8\lambda^2\left(\frac{x-a}{b-a}\right)^{\alpha} + 2\lambda^2 + 8\lambda\left(\frac{x-a}{b-a}\right)^{2\alpha} - 8\lambda\left(\frac{x-a}{b-a}\right)^{\alpha} + \lambda$
Similarly, substituting the mapping into Eq. 5.5 gives the generating function of the fractional Gegenbauer functions:

$$\sum_{n=0}^{\infty} FG_n^{\alpha,\lambda}(x)\, z^n = \frac{1}{\left(1 - 2\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right)z + z^2\right)^{\lambda}}. \qquad (5.23)$$
Similarly, the weight function with respect to which the fractional Gegenbauer functions are orthogonal is simple to define. Considering the weight function of Eq. 5.2 and the transition $\hat{x} = 2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1$, we have

$$w^{\alpha,\lambda}(x) = 2\left(\frac{x-a}{b-a}\right)^{\alpha-1}\left(4\left(\frac{x-a}{b-a}\right)^{\alpha} - 4\left(\frac{x-a}{b-a}\right)^{2\alpha}\right)^{\lambda - \frac{1}{2}}. \qquad (5.24)$$

The fractional Gegenbauer polynomials are orthogonal over a finite interval with respect to the weight function in Eq. 5.24, and therefore one can define the orthogonality relation as (Dehestani et al. 2020)

$$\int_{-1}^{1} G_n^{\lambda}(\hat{x})\, G_m^{\lambda}(\hat{x})\, w(\hat{x})\, d\hat{x} = \int_{0}^{1} FG_n^{\alpha,\lambda}(x)\, FG_m^{\alpha,\lambda}(x)\, w^{\alpha,\lambda}(x)\, dx = \frac{2^{1-4\lambda}\,\pi\,\Gamma(2\lambda + m)}{(\lambda + m)\, m!\,\Gamma^{2}(\lambda)}\,\delta_{mn}. \qquad (5.25)$$
In this section, the ordinary Gegenbauer kernel function is covered, including the annihilation and explosion problems and a proof of the validity of such a kernel. Furthermore, the generalized Gegenbauer kernel function and the fractional form of the Gegenbauer kernel function are covered, and their validity is proved according to the Mercer theorem.
Gegenbauer polynomials have been used as kernel functions because they need fewer support vectors while acquiring high accuracy in comparison with well-known kernel functions such as RBF and many other classical and orthogonal kernels (Padierna et al. 2018). This is because Gegenbauer polynomials, like other orthogonal polynomial kernels, produce kernel matrices with fewer significant eigenvalues, which in turn means fewer support vectors are needed (Padierna et al. 2018).
As we know, a multidimensional SVM kernel function can be defined as follows:

$$K(X, Z) = \prod_{j=1}^{d} K_j(x_j, z_j). \qquad (5.26)$$
It can be seen that two undesired effects can occur: the annihilation effect, when either or both of $x_j$ and $z_j$ in $k(x_j, z_j)$ are close to zero, so that the kernel outputs very small values, and the explosion effect, which refers to a very large kernel output, $\left|\prod_{j=1}^{d} K(x_j, z_j)\right| \to \infty$, leading to numerical difficulties (Padierna et al. 2018). To overcome the annihilation and explosion effects, Padierna et al. (2018) proposed a new formulation for the SVM kernel:

$$k(X, Z) = \langle \phi(X), \phi(Z) \rangle = \prod_{j=1}^{d} \sum_{i=0}^{n} p_i(x_j)\, p_i(z_j)\, w(x_j, z_j)\, u(p_i)^2, \qquad (5.27)$$
Fig. 5.1 Gegenbauer polynomials with λ = 0.25 in the positive range of λ ∈ (−0.5, 0.5]
Fig. 5.2 Gegenbauer polynomials with λ = −0.25 in the negative range of λ ∈ (−0.5, 0.5]
The weight function introduced in Eq. 5.28 is the univariate weight function, meaning that it is the weight function for one variable. To use the Gegenbauer polynomial kernel function, a bivariate weight function is needed, which can be defined as a product of the corresponding univariate weight functions (Dunkl and Yuan 2014):
Fig. 5.4 The weighted Gegenbauer polynomials in the second group, where λ = 2
The $\varepsilon$ term in Eq. 5.30 prevents the weight function from vanishing at the boundary values for the second group of Gegenbauer polynomials (Figs. 5.4 and 5.5) by adding a small value to the output (e.g., $\varepsilon = 0.01$) (Padierna et al. 2018).
Using the relations in Eqs. 5.27, 5.29, and 5.30, one can define the Gegenbauer kernel function as

$$K_{Geg}(X, Z) = \prod_{j=1}^{d} \sum_{i=0}^{n} G_i^{\lambda}(x_j)\, G_i^{\lambda}(z_j)\, w^{\lambda}(x_j, z_j)\, u(G_i^{\lambda})^2. \qquad (5.31)$$
Fig. 5.5 The weighted-and-scaled Gegenbauer polynomials in the second group, where λ = 2
Bearing in mind that $K_{Geg}(x, z) = \prod_{j=1}^{d} K_{Geg}(x_j, z_j)$, denoting the multidimensionality of the Gegenbauer kernel function, the Gegenbauer kernel for scalars $x$ and $z$ is

$$K_{Geg}(x, z) = \sum_{i=0}^{n} G_i^{\lambda}(x)\, G_i^{\lambda}(z)\, w^{\lambda}(x, z)\, u(G_i^{\lambda})^2. \qquad (5.33)$$

To check the Mercer condition for this kernel, consider

$$\int\!\!\int \sum_{i=0}^{n} G_i^{\lambda}(x)\, G_i^{\lambda}(z)\, w^{\lambda}(x, z)\, u(G_i^{\lambda})^2\, g(x)\, g(z)\, dx\, dz. \qquad (5.34)$$
Also, by inserting the weight function from Eq. 5.30 into Eq. 5.34, we have

$$= \int\!\!\int \sum_{i=0}^{n} G_i^{\lambda}(x)\, G_i^{\lambda}(z)\left[(1 - x^2)^{\lambda - \frac{1}{2}} (1 - z^2)^{\lambda - \frac{1}{2}} + \varepsilon\right] u(G_i^{\lambda})^2\, g(x)\, g(z)\, dx\, dz. \qquad (5.35)$$
It should be noted that $u(G_i^{\lambda})$ is always positive and independent of the data, so

$$
\begin{aligned}
&= \sum_{i=0}^{n} u(G_i^{\lambda})^2 \int\!\!\int G_i^{\lambda}(x)\, G_i^{\lambda}(z)\,(1 - x^2)^{\lambda - \frac{1}{2}} (1 - z^2)^{\lambda - \frac{1}{2}} g(x)\, g(z)\, dx\, dz
+ \sum_{i=0}^{n} \varepsilon\, u(G_i^{\lambda})^2 \int\!\!\int G_i^{\lambda}(x)\, G_i^{\lambda}(z)\, g(x)\, g(z)\, dx\, dz \qquad (5.36)\\
&= \sum_{i=0}^{n} u(G_i^{\lambda})^2 \int G_i^{\lambda}(x)(1 - x^2)^{\lambda - \frac{1}{2}} g(x)\, dx \int G_i^{\lambda}(z)(1 - z^2)^{\lambda - \frac{1}{2}} g(z)\, dz
+ \sum_{i=0}^{n} \varepsilon\, u(G_i^{\lambda})^2 \int G_i^{\lambda}(x)\, g(x)\, dx \int G_i^{\lambda}(z)\, g(z)\, dz\\
&= \sum_{i=0}^{n} u(G_i^{\lambda})^2 \left(\int G_i^{\lambda}(x)(1 - x^2)^{\lambda - \frac{1}{2}} g(x)\, dx\right)^2
+ \sum_{i=0}^{n} \varepsilon\, u(G_i^{\lambda})^2 \left(\int G_i^{\lambda}(x)\, g(x)\, dx\right)^2 \ge 0.
\end{aligned}
$$
In this section, the generalized Gegenbauer kernel, recently proposed by Yang et al. (2020), is introduced.
The generalized Gegenbauer kernel (GGK) is constructed by using the partial sum of the inner products of generalized Gegenbauer polynomials. The generalized Gegenbauer polynomials satisfy the recursive relation (Yang et al. 2020)

$$
\begin{aligned}
GG_0^{\lambda}(x) &= 1,\\
GG_1^{\lambda}(x) &= 2\lambda x,\\
GG_n^{\lambda}(x) &= \frac{1}{n}\left[2x(n + \lambda - 1)\, GG_{n-1}^{\lambda}(x) - (n + 2\lambda - 2)\, GG_{n-2}^{\lambda}(x)\right],
\end{aligned} \qquad (5.37)
$$

where $x \in \mathbb{R}^n$ denotes the vector of input variables. The output of the generalized Gegenbauer polynomial $GG_n^{\lambda}(x)$ is a scalar for even orders $n$ and a vector for odd orders $n$.
Then Yang et al. (2020) proposed the generalized Gegenbauer kernel function $K_{GG}(x_i, x_j)$ of order $n$ for two input vectors $x_i$ and $x_j$ as

$$K_{GG}(x_i, x_j) = \frac{\sum_{l=0}^{n} GG_l^{\lambda}(x_i)^{T}\, GG_l^{\lambda}(x_j)}{\exp\left(\sigma \|x_i - x_j\|_2^2\right)}, \qquad (5.38)$$

where each element of $x_i$ and $x_j$ is normalized to the range $[-1, 1]$. In this context, both $\lambda$ and $\sigma$ are considered as the kernel scales, or the so-called decaying parameters, of the proposed kernel function.
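The following is a simplified sketch of this construction, not the exact GGK of Yang et al. (2020): the vector-valued generalized Gegenbauer polynomials are replaced by an element-wise SciPy evaluation, and the inputs are assumed to be already normalized to $[-1, 1]$.

Program Code

import numpy as np
from scipy.special import eval_gegenbauer

def gg_kernel(xi, xj, n=3, lam=0.5, sigma=1.0):
    # Partial sum of Gegenbauer inner products divided by a Gaussian factor,
    # in the spirit of Eq. 5.38.
    num = sum(float(np.dot(eval_gegenbauer(l, lam, xi), eval_gegenbauer(l, lam, xj)))
              for l in range(n + 1))
    return num / np.exp(sigma * np.sum((xi - xj)**2))

xi, xj = np.array([0.2, -0.1]), np.array([0.4, 0.3])
print(gg_kernel(xi, xj))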
Theorem 5.2 (Padierna et al. (2018)) The proposed GGK is a valid Mercer kernel.

Proof The proposed GGK can be alternatively formulated as the product of two kernel functions such that

$$K_1(x_i, x_j) = \exp\left(-\sigma \|x_i - x_j\|_2^2\right), \qquad K_2(x_i, x_j) = \sum_{l=0}^{n} GG_l^{\lambda}(x_i)^{T}\, GG_l^{\lambda}(x_j),$$
$$K_{GG}(x_i, x_j) = K_1(x_i, x_j)\, K_2(x_i, x_j). \qquad (5.39)$$

As already discussed in Sect. 2.2.1, the multiplication of two valid Mercer kernels is also a valid kernel function. Since $K_1(x_i, x_j)$ is the Gaussian kernel ($\sigma > 0$), which satisfies the Mercer theorem, $K_{GG}(x_i, x_j)$ can be proved to be a valid kernel by verifying that $K_2(x_i, x_j)$ satisfies the Mercer theorem. Given an arbitrary square-integrable function $g(x)$ defined as $g: \mathbb{R}^n \to \mathbb{R}$, and assuming each element in $x_i$ and $x_j$ is independent of the others, we can conclude that
$$
\begin{aligned}
\int\!\!\int K_2(x_i, x_j)\, g(x_i)\, g(x_j)\, dx_i\, dx_j
&= \int\!\!\int \sum_{l=0}^{n} GG_l^{\lambda}(x_i)^{T}\, GG_l^{\lambda}(x_j)\, g(x_i)\, g(x_j)\, dx_i\, dx_j\\
&= \sum_{l=0}^{n} \int\!\!\int GG_l^{\lambda}(x_i)^{T}\, GG_l^{\lambda}(x_j)\, g(x_i)\, g(x_j)\, dx_i\, dx_j\\
&= \sum_{l=0}^{n} \int GG_l^{\lambda}(x_i)^{T} g(x_i)\, dx_i \int GG_l^{\lambda}(x_j)\, g(x_j)\, dx_j \ge 0. \qquad (5.40)
\end{aligned}
$$
Thus, K 2 (xi , x j ) is a valid Mercer kernel, and it can be concluded that the proposed
GGK K GG (xi , x j ) is an admissible Mercer kernel function.
Similar to the Gegenbauer kernel introduced in Eq. 5.31 and the weight function introduced in Eq. 5.28, a fractional Gegenbauer kernel can be introduced. First, the bivariate form of the fractional weight function has to be defined; the approach is again similar to the corresponding definition in Eq. 5.30, i.e.,

$$fw^{\alpha,\lambda}(x, z) = \left[\left(1 - \left(\frac{x-a}{b-a}\right)^{2\alpha}\right)\left(1 - \left(\frac{z-a}{b-a}\right)^{2\alpha}\right)\right]^{\lambda - \frac{1}{2}} + \varepsilon. \qquad (5.41)$$

Then, the fractional Gegenbauer kernel is defined as

$$K_{FGeg}(X, Z) = \prod_{j=1}^{d} \sum_{i=0}^{n} G_i^{\lambda}(x_{x_j})\, G_i^{\lambda}(x_{z_j})\, fw^{\alpha,\lambda}(x_{x_j}, x_{z_j})\, u(G_i^{\lambda})^2. \qquad (5.42)$$
Theorem 5.3 The fractional Gegenbauer kernel function introduced in Eq. 5.42 is a valid Mercer kernel.

Proof According to the Mercer theorem, a valid kernel must satisfy the sufficient conditions of Mercer's theorem; that is, any SVM kernel, to be valid, must be non-negative in the following precise sense:

$$\int\!\!\int K(x, z)\, w(x, z)\, f(x)\, f(z)\, dx\, dz \ge 0. \qquad (5.43)$$
By the fact that here the kernel is as in Eq. 5.42, it can be seen with a simple replacement that

$$\int\!\!\int K_{FGeg}(x, z)\, g(x)\, g(z)\, dx\, dz = \int\!\!\int \sum_{i=0}^{n} G_i^{\lambda}(x_{x_j})\, G_i^{\lambda}(x_{z_j})\, fw^{\alpha,\lambda}(x_{x_j}, x_{z_j})\, u(G_i^{\lambda})^2\, g(x)\, g(z)\, dx\, dz. \qquad (5.44)$$

In the last equation, the fractional bivariate weight function (i.e., Eq. 5.41) is considered, so we have

$$\int\!\!\int \sum_{i=0}^{n} G_i^{\lambda}(x_{x_j})\, G_i^{\lambda}(x_{z_j}) \left[\left(\left(1 - \left(\frac{x-a}{b-a}\right)^{2\alpha}\right)\left(1 - \left(\frac{z-a}{b-a}\right)^{2\alpha}\right)\right)^{\lambda - \frac{1}{2}} + \varepsilon\right] u(G_i^{\lambda})^2\, g(x)\, g(z)\, dx\, dz. \qquad (5.45)$$

Note that $u(G_i^{\lambda})$ is always positive and independent of the data; therefore,

$$
\begin{aligned}
&= \sum_{i=0}^{n} u(G_i^{\lambda})^2 \int\!\!\int G_i^{\lambda}(x_{x_j})\, G_i^{\lambda}(x_{z_j}) \left(\left(1 - \left(\frac{x-a}{b-a}\right)^{2\alpha}\right)\left(1 - \left(\frac{z-a}{b-a}\right)^{2\alpha}\right)\right)^{\lambda - \frac{1}{2}} g(x)\, g(z)\, dx\, dz\\
&\quad + \sum_{i=0}^{n} \varepsilon\, u(G_i^{\lambda})^2 \int\!\!\int G_i^{\lambda}(x_{x_j})\, G_i^{\lambda}(x_{z_j})\, g(x)\, g(z)\, dx\, dz. \qquad (5.46)
\end{aligned}
$$

As in Eq. 5.36, each of these terms factorizes into squared integrals and is therefore non-negative, which completes the proof.
In this section, the results of the Gegenbauer and fractional Gegenbauer kernels on some datasets are compared with those of other well-known kernels, such as the RBF and polynomial kernels, as well as the Chebyshev and fractional Chebyshev kernels introduced in the previous chapters. In order to obtain a clean classification, some preprocessing steps may be needed depending on the dataset. These steps are not the focus here, but they are mandatory when using Gegenbauer polynomials as the kernel. For this section, two datasets have been selected that are well known and helpful for machine learning practitioners.
The spiral dataset, already introduced in Chap. 3, is one of the well-known multi-class classification tasks. Using the one-versus-all (OVA) method, this multi-class classification dataset is split into three binary classification datasets, and the SVM with the Gegenbauer kernel is applied to each of them (a minimal label-splitting helper is sketched below). Figure 5.6 depicts the data points of the spiral dataset in the normal and fractional spaces.
Despite the increased data density of the spiral dataset in fractional mode, when more features (three dimensions or more) are used, the Gegenbauer kernel can separate the classes more clearly and simply (see Fig. 5.7).
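A minimal sketch of the one-versus-all label splitting mentioned above (the helper name is illustrative):

Program Code

import numpy as np

def one_vs_all_labels(y, positive_class):
    # Binary labels for one OVA split: +1 for the chosen class, -1 for the rest.
    return np.where(np.asarray(y) == positive_class, 1, -1)

y = np.array([1, 2, 3, 1, 2, 3])
print(one_vs_all_labels(y, positive_class=1))   # [ 1 -1 -1  1 -1 -1]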
Also, Fig. 5.8 shows how the Gegenbauer kernel classifiers of orders 3, 4, 5, and 6 with $\lambda = 0.6$ choose the boundaries. Similarly, Fig. 5.6 depicts the data points of the spiral dataset after transforming into the fractional space of order $\alpha = 0.5$. Thereby, Fig. 5.9 depicts the decision boundaries of the corresponding fractional Gegenbauer kernel, where $\alpha = 0.5$ and $\lambda = 0.6$.
On the other hand, the following tables provide a comparison of the experiments on the spiral dataset. The three possible binary classifications have been examined according to the one-versus-all method. As is clear from Table 5.2, the fractional Legendre kernel shows the best performance for class 1-vs-{2, 3}. For the other binary classifications on this spiral dataset, the RBF kernel gives the best accuracy for class 2-vs-{1, 3} (Table 5.3), while the fractional Legendre kernel is again the best for class 3-vs-{1, 2} (Table 5.4); in both cases the RBF kernel outperforms both kinds of Gegenbauer kernels.
Fig. 5.8 Gegenbauer kernel with orders of 3, 4, 5, and 6 on Spiral dataset with λ = 0.6
Table 5.2 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, fractional Legendre, Gegenbauer, and fractional Gegenbauer kernel functions on the 1-vs-{2,3} Spiral dataset

                       Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                    0.73   –      –      –         –          0.97
Polynomial             –      8      –      –         –          0.9533
Chebyshev              –      –      5      –         –          0.9667
Fractional Chebyshev   –      –      3      0.3       –          0.9733
Legendre               –      –      7      –         –          0.9706
Fractional Legendre    –      –      7      0.4       –          0.9986
Gegenbauer             –      –      6      –         0.3        0.9456
Fractional Gegenbauer  –      –      6      0.3       0.7        0.9533
Fig. 5.9 Fractional Gegenbauer kernel with orders of 3, 4, 5, and 6 on Spiral dataset with α = 0.5
and λ = 0.6
Table 5.3 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, fractional Legendre, Gegenbauer, and fractional Gegenbauer kernels on the 2-vs-{1,3} Spiral dataset

                       Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                    0.1    –      –      –         –          0.9867
Polynomial             –      5      –      –         –          0.9044
Chebyshev              –      –      6      –         –          0.9289
Fractional Chebyshev   –      –      6      0.8       –          0.9344
Legendre               –      –      8      –         –          0.9773
Fractional Legendre    –      –      8      0.4       –          0.9853
Gegenbauer             –      –      5      –         0.3        0.9278
Fractional Gegenbauer  –      –      4      0.6       0.6        0.9356
Table 5.4 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, fractional Legendre, Gegenbauer, and fractional Gegenbauer kernels on the 3-vs-{1,2} Spiral dataset

                       Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                    0.73   –      –      –         –          0.9856
Polynomial             –      5      –      –         –          0.9856
Chebyshev              –      –      6      –         –          0.9622
Fractional Chebyshev   –      –      6      0.6       –          0.9578
Legendre               –      –      7      –         –          0.9066
Fractional Legendre    –      –      5      0.4       –          0.9906
Gegenbauer             –      –      6      –         0.3        0.9611
Fractional Gegenbauer  –      –      6      0.9       0.3        0.9644
Table 5.5 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, fractional Legendre, Gegenbauer, and fractional Gegenbauer kernels on Monk's first problem. The fractional Gegenbauer kernel attains the most desirable accuracy of 1, and the Gegenbauer kernel outperforms the remaining kernels

                       Sigma  Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                    2.844  –      –      –         –          0.8819
Polynomial             –      3      –      –         –          0.8681
Chebyshev              –      –      3      –         –          0.8472
Fractional Chebyshev   –      –      3      1/16      –          0.8588
Legendre               –      –      4      –         –          0.8333
Fractional Legendre    –      –      4      0.1       –          0.8518
Gegenbauer             –      –      3      –         –0.2       0.9931
Fractional Gegenbauer  –      –      3      0.7       0.2        1
Another case in point is the three Monks' problems, which are addressed here (see Chap. 3 for more information about the dataset). The Gegenbauer kernel introduced in Eq. 5.31 was applied to the datasets of the three Monks' problems, and the results are appended to Tables 5.5, 5.6, and 5.7. It can be seen that the fractional Gegenbauer kernel shows strong performance on these datasets, specifically on the first problem, where it reaches 100% accuracy; on the third dataset, both kinds of Gegenbauer kernels have the best accuracy among all kernels under comparison. Tables 5.5, 5.6, and 5.7 illustrate the details of these comparisons.
Table 5.6 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, fractional Legendre, Gegenbauer, and fractional Gegenbauer kernels on Monk's second problem. The fractional Chebyshev kernel has the second best result, following the fractional Legendre kernel

                       Sigma   Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                    5.5896  –      –      –         –          0.875
Polynomial             –       3      –      –         –          0.8657
Chebyshev              –       –      3      –         –          0.8426
Fractional Chebyshev   –       –      3      1/16      –          0.9653
Legendre               –       –      3      –         –          0.8032
Fractional Legendre    –       –      3      0.1       –          1
Gegenbauer             –       –      3      –         0.5        0.7824
Fractional Gegenbauer  –       –      3      0.1       0.5        0.9514
Table 5.7 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Legendre, fractional Legendre, Gegenbauer, and fractional Gegenbauer kernels on Monk's third problem. Note that the Gegenbauer and fractional Gegenbauer kernels have the best results

                       Sigma   Power  Order  Alpha(α)  Lambda(λ)  Accuracy
RBF                    2.1586  –      –      –         –          0.91
Polynomial             –       3      –      –         –          0.875
Chebyshev              –       –      6      –         –          0.895
Fractional Chebyshev   –       –      5      1/5       –          0.91
Legendre               –       –      4      –         –          0.8472
Fractional Legendre    –       –      3      0.8       –          0.8379
Gegenbauer             –       –      4      –         –0.2       0.9259
Fractional Gegenbauer  –       –      3      0.7       –0.2       0.9213
5.5 Conclusion
last section, the successful use of this new kernel in kernel-based learning algorithms such as SVM has been demonstrated through experiments on well-known datasets.
References
Abd Elaziz, M., Hosny, K.M., Selim, I.M.: Galaxies image classification using artificial bee colony
based on orthogonal Gegenbauer moments. Soft. Comput. 23, 9573–9583 (2019)
Arfaoui, S., Ben Mabrouk, A., Cattani, C.: New type of Gegenbauer-Hermite monogenic polyno-
mials and associated Clifford wavelets. J. Math. Imaging Vis. 62, 73–97 (2020)
Asghari, M., Hadian Rasanan, A.H., Gorgin, S., Rahmati, D., Parand, K.: FPGA-orthopoly: a
hardware implementation of orthogonal polynomials. Eng. Comput. (2022). https://doi.org/10.
1007/s00366-022-01612-x
Avery, J.S.: Hyperspherical Harmonics: Applications in Quantum Theory, vol. 5. Springer Science
& Business Media, Berlin (2012)
Azari, A.S., Mack, Y.P., Müller, H.G.: Ultraspherical polynomial, kernel and hybrid estimators for
non parametric regression. Sankhya: Indian J. Stat. 80–96 (1992)
Belmehdi, S.: Generalized Gegenbauer orthogonal polynomials. J. Comput. Appl. Math. 133, 195–
205 (2001)
Brackx, F., De Schepper, N., Sommen, F.: The Clifford-Gegenbauer polynomials and the associated
continuous wavelet transform. Integr. Transform. Spec. Funct. 15, 387–404 (2004)
Cohl, H.S.: On a generalization of the generating function for Gegenbauer polynomials. Integr.
Transform. Spec. Funct. 24, 807–816 (2013)
Dehestani, H., Ordokhani, Y., Razzaghi, M.: Application of fractional Gegenbauer functions in
variable-order fractional delay-type equations with non-singular kernel derivatives. Chaos, Soli-
tons Fractals 140, 110111 (2020)
Doha, E.H.: The ultraspherical coefficients of the moments of a general-order derivative of an
infinitely differentiable function. J. Comput. Appl. Math. 89, 53–72 (1998)
Doman, B.G.S.: The Classical Orthogonal Polynomials. World Scientific, Singapore (2015)
Dunkl, C.F., Yuan, X.: Orthogonal Polynomials of Several Variables. Cambridge University Press,
Cambridge (2014)
Eassa, M., Selim, I.M., Dabour, W., Elkafrawy, P.: Automated detection and classification of galaxies
based on their brightness patterns. Alex. Eng. J. 61, 1145–1158 (2022)
El-Kalaawy, A.A., Doha, E.H., Ezz-Eldien, S.S., Abdelkawy, M.A., Hafez, R.M., Amin, A.Z.M.,
Zaky, M.A.: A computationally efficient method for a class of fractional variational and optimal
control problems using fractional Gegenbauer functions. Rom. Rep. Phys. 70, 90109 (2018)
Elliott, D.: The expansion of functions in ultraspherical polynomials. J. Aust. Math. Soc. 1, 428–438
(1960)
Feng, B.Y., Varshney, A.: SIGNET: efficient neural representation for light fields. In: Proceedings
of the IEEE/CVF (2021)
Feng, J., Liu, L., Wu, D., Li, G., Beer, M., Gao, W.: Dynamic reliability analysis using the extended
support vector regression (X-SVR). Mech. Syst. Signal Process. 126, 368–391 (2019)
Ferrara, L., Guégan, D.: Forecasting with k-factor Gegenbauer processes: theory and applications.
J. Forecast. 20, 581–601 (2001)
Hadian-Rasanan, A.H., Nikarya, M., Bahramnezhad, A., Moayeri, M.M., Parand, K.: A comparison
between pre-Newton and post-Newton approaches for solving a physical singular second-order
boundary problem in the semi-infinite interval. arXiv:1909.04066
He, J., Chen, T., Zhang, Z.: A Gegenbauer neural network with regularized weights direct determi-
nation for classification (2019). arXiv:1910.11552
Herrera-Acosta, A., Rojas-Domínguez, A., Carpio, J.M., Ornelas-Rodríguez, M., Puga, H.:
Gegenbauer-based image descriptors for visual scene recognition. In: Intuitionistic and Type-
2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms: Theory and Applications,
pp. 629–643 (2020)
Hjouji, A., Bouikhalene, B., EL-Mekkaoui, J., Qjidaa, H.: New set of adapted Gegenbauer-
Chebyshev invariant moments for image recognition and classification. J. Supercomput. 77,
5637–5667 (2021)
Hosny, K.M.: Image representation using accurate orthogonal Gegenbauer moments. Pattern Recog-
nit. Lett. 32, 795–804 (2011)
Hosny, K.M.: New set of Gegenbauer moment invariants for pattern recognition applications. Arab.
J. Sci. Eng. 39, 7097–7107 (2014)
Hosny, K.M., Darwish, M.M., Eltoukhy, M.M.: New fractional-order shifted Gegenbauer moments
for image analysis and recognition. J. Adv. Res. 25, 57–66 (2020)
Ilić, A.D., Pavlović, V.D.: New class of filter functions generated most directly by Christoffel-
Darboux formula for Gegenbauer orthogonal polynomials. Int. J. Electron. 98, 61–79 (2011)
Langley, J., Zhao, Q.: A model-based 3D phase unwrapping algorithm using Gegenbauer polyno-
mials. Phys. Med. Biol. 54, 5237–5252 (2009)
Pawlak, M.: Image Analysis by Moments: Reconstruction and Computational Aspects. Oficyna Wydawnicza Politechniki Wrocławskiej (2006)
Liao, S., Chiang, A., Lu, Q., Pawlak, M.: Chinese character recognition via Gegenbauer moments.
In: Object Recognition Supported by User Interaction for Service Robots, vol. 3, pp. 485–488
(2002)
Liao, S., Chen, J.: Object recognition with lower order Gegenbauer moments. Lect. Notes Softw.
Eng. 1, 387 (2013)
Liu, W., Wang, L.L.: Asymptotics of the generalized Gegenbauer functions of fractional degree. J.
Approx. Theory 253, 105378 (2020)
Ludlow, I.K., Everitt, J.: Application of Gegenbauer analysis to light scattering from spheres: theory.
Phys. Rev. E 51, 2516–2526 (1995)
Olver, F.W.J., Lozier, D.W., Boisvert, R.F., Clark, C.W.: NIST Handbook of Mathematical Functions
Hardback and CD-ROM. Cambridge University Press, Cambridge (2010)
Öztürk, Ş, Ahmad, R., Akhtar, N.: Variants of artificial Bee Colony algorithm and its applications
in medical image processing. Appl. Soft Comput. 97, 106799 (2020)
Padierna, L.C., Carpio, M., Rojas-Dominguez, A., Puga, H., Fraire, H.: A novel formulation of
orthogonal polynomial kernel functions for SVM classifiers: the Gegenbauer family. Pattern
Recognit. 84, 211–225 (2018)
Padierna, L.C., Amador-Medina, L.F., Murillo-Ortiz, B.O., Villaseñor-Mora, C.: Classification
method of peripheral arterial disease in patients with type 2 diabetes mellitus by infrared ther-
mography and machine learning. Infrared Phys. Technol. 111, 103531 (2020)
Parand, K., Delkhosh, M.: Solving Volterra’s population growth model of arbitrary order using the
generalized fractional order of the Chebyshev functions. Ricerche mat. 65, 307–328 (2016)
Parand, K., Dehghan, M., Baharifard, F.: Solving a laminar boundary layer equation with the rational
Gegenbauer functions. Appl. Math. Model. 37, 851–863 (2013)
Parand, K., Bahramnezhad, A., Farahani, H.: A numerical method based on rational Gegenbauer
functions for solving boundary layer flow of a Powell-Eyring non-Newtonian fluid. Comput.
Appl. Math. 37, 6053–6075 (2018)
Park, R.W.: Optimal compression and numerical stability for Gegenbauer reconstructions with
applications. Arizona State University (2009)
Reimer, M.: Multivariate Polynomial Approximation. Springer Science & Business Media, Berlin
(2003)
Soufivand, F., Soltanian, F., Mamehrashi, K.: An operational matrix method based on the Gegen-
bauer polynomials for solving a class of fractional optimal control problems. Int. J. Ind. Electron.
4, 475–484 (2021)
Srivastava, H.M., Shah, F.A., Abass, R.: An application of the Gegenbauer wavelet method for
the numerical solution of the fractional Bagley-Torvik equation. Russ. J. Math. Phys. 26, 77–93
(2019)
Stier, A.C., Goth, W., Hurley, A., Feng, X., Zhang, Y., Lopes, F.C., Sebastian, K.R., Fox, M.C.,
Reichenberg, J.S., Markey, M.K., Tunnell, J.W.: Machine learning and the Gegenbauer kernel
improve mapping of sub-diffuse optical properties in the spatial frequency domain. In: Molecular-
Guided Surgery: Molecules, Devices, and Applications VII, vol. 11625, p. 1162509 (2021)
Tamandani, A., Alijani, M.G.: Development of an analytical method for pattern synthesizing of
linear and planar arrays with optimal parameters. Int. J. Electron. Commun. 146, 154135 (2022)
Wu, Q., Zhou, D.X.: SVM soft margin classifiers: linear programming versus quadratic program-
ming. Neural Comput. 17, 1160–1187 (2005)
Yang, W., Zhang, Z., Hong, Y.: State recognition of bolted structures based on quasi-analytic wavelet packet transform and generalized Gegenbauer support vector machine. In: 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pp. 1–6 (2020)
Zhang, Z., He, J., Tang, L. : Two-input gegenbauer orthogonal neural network with growing-and-
pruning weights and structure determination. In: International Conference on Cognitive Systems
and Signal Processing, pp. 288–300 (2018)
Chapter 6
Fractional Jacobi Kernel Functions:
Theory and Application
Amir Hosein Hadian Rasanan, Jamal Amani Rad, Malihe Shaban Tameh,
and Abdon Atangana
6.1 Introduction
a For more information about Carl Gustav Jacob Jacobi and his contribution, please see: https://mathshistory.st-andrews.ac.uk/Biographies/Jacobi/.
6.2 Preliminaries
This section covers the basic definitions and properties of Jacobi orthogonal polyno-
mials. Moreover, the fractional form of Jacobi orthogonal polynomials is introduced
and relevant properties are clarified.
Let’s start with a simple recap on the definition of orthogonal polynomials. In Szeg
(1939), Szego defined the orthogonal polynomials using a function, such as F(x),
to be non-decreasing which includes many points of increase in the interval [a, b].
Suppose the following moments exist for this function:
b
cn = x n d F(x), n = 0, 1, 2, . . . . (6.1)
a
where $\psi, \omega > -1$. However, the formulation of the weight function in Eq. 6.4 suffers from computational difficulties at the two boundary input points when $\psi, \omega < 0$ (Doman 2015). The input data therefore have to be normalized into the interval $[-1, 1]$. In general, the orthogonal polynomials with the weight function $(b - x)^{\psi}(x - a)^{\omega}$ on the interval $[a, b]$, for any orthogonal polynomial denoted by $J_n^{\psi,\omega}(x)$, can be expressed in the following form (Doman 2015; Bhrawy and Zaky 2016):

$$J_n^{(\psi,\omega)}\left(2\left(\frac{x-a}{b-a}\right) - 1\right). \qquad (6.5)$$

A similar difficulty occurs when $\omega < 0$ for the second part $(1 + x)^{\omega}$ at $x = -1$. To tackle this issue, adding a trivial noise to $x$ is proposed, through summing with a slack variable $\varepsilon = 10^{-4}$ (Tian and Wang 2017).
Consequently, one can rewrite the relation (6.4) as in Eq. 6.6. Note that in programming languages like Python, the floating-point precision of $1 - x$ and $1 + x$ should be handled and set consistently with the precision of the slack variable. For example, the following figures compare Eqs. 6.4 and 6.6 for different $\psi$, $\omega$, and $\varepsilon = 0.0001$. Since there are some difficulties with the calculation of the ordinary weight functions on the boundaries, Maple software has been used for plotting. Figure 6.1 demonstrates the weight function of the Jacobi polynomials without a noise term for different $\psi$ and $\omega$, whereas Fig. 6.2 depicts the weight function of the Jacobi polynomials for different $\psi$ and $\omega$ when a noise term ($\varepsilon = 0.0001$) is added. As these plots show, there is no considerable difference in the outputs of the related functions.
Thus, the orthogonality relation is (Bhrawy et al. 2016; Doman 2015; Askey 1975)

$$\int_{-1}^{1} J_m^{\psi,\omega}(x)\, J_n^{\psi,\omega}(x)\, w^{(\psi,\omega)}(x)\, dx = 0, \qquad m \ne n, \qquad (6.7)$$

and the Jacobi polynomials satisfy the Sturm–Liouville differential equation

$$\frac{d}{dx}\left[(1 - x + \varepsilon)^{\psi+1}(1 + x + \varepsilon)^{\omega+1}\frac{d}{dx}J_n^{\psi,\omega}(x)\right] + (1 - x + \varepsilon)^{\psi}(1 + x + \varepsilon)^{\omega}\rho_n\, J_n^{\psi,\omega}(x) = 0, \qquad (6.8)$$

where $\rho_n = n(n + \psi + \omega + 1)$.
They can be generated by the three-term recurrence

$$J_{n+1}^{\psi,\omega}(x) = (A_n x + B_n)\, J_n^{\psi,\omega}(x) - C_n\, J_{n-1}^{\psi,\omega}(x), \qquad n \ge 1, \qquad (6.9)$$

with

$$J_0^{\psi,\omega}(x) = 1, \qquad J_1^{\psi,\omega}(x) = \frac{1}{2}(\psi + \omega + 2)x + \frac{1}{2}(\psi - \omega), \qquad (6.10)$$

where

$$
\begin{aligned}
A_n &= \frac{(2n + \psi + \omega + 1)(2n + \psi + \omega + 2)}{2(n + 1)(n + \psi + \omega + 1)},\\
B_n &= \frac{(\psi^2 - \omega^2)(2n + \psi + \omega + 1)}{2(n + 1)(n + \psi + \omega + 1)(2n + \psi + \omega)},\\
C_n &= \frac{(n + \psi)(n + \omega)(2n + \psi + \omega + 2)}{(n + 1)(n + \psi + \omega + 1)(2n + \psi + \omega)}.
\end{aligned} \qquad (6.11)
$$
The first few Jacobi polynomials are as in Eq. 6.10. Because Jacobi polynomials have lengthy expressions, the higher orders are not written out here. For higher orders of Jacobi polynomials, the following code can be used, which relies on Python's sympy module to work symbolically rather than numerically:
Program Code

import sympy

x = sympy.Symbol("x")
psi = sympy.Symbol(r'\psi')
omega = sympy.Symbol(r'\omega')

def A(n):
    return (2*n + psi + omega + 1)*(2*n + psi + omega + 2) / (2*(n + 1)*(n + psi + omega + 1))

def B(n):
    return (psi**2 - omega**2)*(2*n + psi + omega + 1) / (2*(n + 1)*(n + psi + omega + 1)*(2*n + psi + omega))

def C(n):
    return (n + psi)*(n + omega)*(2*n + psi + omega + 2) / ((n + 1)*(n + psi + omega + 1)*(2*n + psi + omega))

def Jacobi(x, n):
    # Jacobi polynomial J_n^{psi,omega}(x) via the recurrence of Eqs. 6.9-6.11
    if n == 0:
        return 1
    elif n == 1:
        return (psi + omega + 2)*x/2 + (psi - omega)/2
    return (A(n-1)*x + B(n-1))*Jacobi(x, n-1) - C(n-1)*Jacobi(x, n-2)
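For example, assuming the sketch above, the lower-order polynomials can then be expanded symbolically in the same way as in the previous chapters:

Program Code

sympy.expand(sympy.simplify(Jacobi(x, 2)))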
Moreover, these polynomials have some special properties, introduced in Eqs. 6.12–6.14 (Doman 2015; Askey 1975):

$$J_n^{\psi,\omega}(1) = \frac{\Gamma(n + \psi + 1)}{n!\, \Gamma(\psi + 1)}, \qquad (6.13)$$

$$\frac{d^m}{dx^m}\left(J_n^{\psi,\omega}(x)\right) = \frac{\Gamma(m + n + \psi + \omega + 1)}{2^m\, \Gamma(n + \psi + \omega + 1)}\, J_{n-m}^{\psi+m,\omega+m}(x). \qquad (6.14)$$
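As a quick numerical sanity check of Eq. 6.13, the following SciPy-based sketch (not part of the original text) compares both sides for one sample order and parameter choice.

Program Code

import numpy as np
from math import factorial
from scipy.special import eval_jacobi, gamma

n, psi, omega = 5, 0.5, 1.5
lhs = eval_jacobi(n, psi, omega, 1.0)
rhs = gamma(n + psi + 1) / (factorial(n) * gamma(psi + 1))
print(np.isclose(lhs, rhs))   # expect True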
Theorem 6.1 (Hadian et al. (2020)) The Jacobi polynomial $J_n^{\psi,\omega}(x)$ has exactly $n$ real zeros on the interval $(-1, 1)$.

Proof Referring to Hadian et al. (2020), the $n$ zeros of the Jacobi polynomial $J_n^{\psi,\omega}(x)$ can be obtained by computing the eigenvalues of the following tridiagonal matrix:

$$K_n = \begin{bmatrix}
\rho_1 & \gamma_2 & & & \\
\gamma_2 & \rho_2 & \gamma_3 & & \\
& \gamma_3 & \rho_3 & \ddots & \\
& & \ddots & \ddots & \gamma_n \\
& & & \gamma_n & \rho_n
\end{bmatrix},$$

where

$$\rho_{i+1} = \frac{\int_{-1}^{1} x\, J_i^{\psi,\omega}(x)\, J_i^{\psi,\omega}(x)\, w^{\psi,\omega}(x)\, dx}{\int_{-1}^{1} J_i^{\psi,\omega}(x)\, J_i^{\psi,\omega}(x)\, w^{\psi,\omega}(x)\, dx}, \qquad (6.15)$$

$$\gamma_{i+1} = \begin{cases}
0, & i = 0,\\[4pt]
\dfrac{\int_{-1}^{1} J_i^{\psi,\omega}(x)\, J_i^{\psi,\omega}(x)\, w^{\psi,\omega}(x)\, dx}{\int_{-1}^{1} J_{i-1}^{\psi,\omega}(x)\, J_{i-1}^{\psi,\omega}(x)\, w^{\psi,\omega}(x)\, dx}, & i = 1, 2, \dots
\end{cases} \qquad (6.16)$$
Jacobi polynomials also satisfy the second-order differential equation

$$(1 - x^2)\frac{d^2 y}{dx^2} + \big[\omega - \psi - (\psi + \omega + 2)x\big]\frac{dy}{dx} + \lambda y = 0. \qquad (6.17)$$

The solution can be expressed by means of the power series $y = \sum_{n=0}^{\infty} a_n x^n$:

$$(1 - x^2)\sum_{n=0}^{\infty} n(n - 1)a_n x^{n-2} + \big[\omega - \psi - (\psi + \omega + 2)x\big]\sum_{n=0}^{\infty} n a_n x^{n-1} + \lambda \sum_{n=0}^{\infty} a_n x^n = 0. \qquad (6.18)$$
Also, the generating function for the Jacobi polynomials can be defined as follows (Doman 2015; Askey 1975):

$$\frac{2^{\psi+\omega}}{R\,(1 + R - z)^{\psi}\,(1 + R + z)^{\omega}} = \sum_{n=0}^{\infty} J_n^{\psi,\omega}(x)\, z^n, \qquad (6.19)$$

where $R = \sqrt{1 - 2xz + z^2}$ and $|z| < 1$.
This family of orthogonal polynomials follows the same symmetry as the orthogonal families already introduced, i.e.,

J_n^{\psi,\omega}(-x) = (-1)^n J_n^{\omega,\psi}(x) = \begin{cases} J_n^{\omega,\psi}(x), & n \text{ even}, \\ -J_n^{\omega,\psi}(x), & n \text{ odd}. \end{cases} \qquad (6.20)
Figures 6.3 and 6.4 illustrate Jacobi polynomials of orders 0 to 6, with negative ψ and positive ω in Fig. 6.3 versus positive ψ and negative ω in Fig. 6.4. Figures 6.5 and 6.6 depict Jacobi polynomials of order 5, with varying ψ at fixed ω and with varying ω at fixed ψ, respectively.
To obtain the fractional form, the transformation x \mapsto 2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1 is applied, where x \in [a, b] and α > 0. By applying this transformation, the fractional order Jacobi functions, denoted by F J_n^{\psi,\omega,\alpha}(x), are obtained as follows:

F J_n^{\psi,\omega,\alpha}(x) = J_n^{\psi,\omega}\!\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right). \qquad (6.21)
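As a minimal numerical sketch of Eq. 6.21 (our own illustration; the values of a, b, and α are arbitrary examples), a fractional Jacobi function can be evaluated by mapping the input and then evaluating the ordinary Jacobi polynomial with SciPy:

import numpy as np
from scipy.special import eval_jacobi

def frac_jacobi(n, psi, omega, alpha, x, a=0.0, b=1.0):
    # Eq. 6.21: FJ_n(x) = J_n(2*((x - a)/(b - a))**alpha - 1)
    t = 2.0*((x - a)/(b - a))**alpha - 1.0
    return eval_jacobi(n, psi, omega, t)

x = np.linspace(0.0, 1.0, 5)
print(frac_jacobi(3, -0.5, 0.2, 0.3, x))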
The fractional order Jacobi functions are orthogonal over the interval [a, b] with the following weight function Doman (2015), Askey (1975), Hadian et al. (2020):

w^{\psi,\omega,\alpha}(x) = \left(1 - \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\psi}\left(1 + \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\omega}. \qquad (6.22)
Similarly, there exists a Sturm–Liouville differential equation for the fractional order Jacobi functions, which is as follows Doman (2015), Askey (1975), Hadian et al. (2020):

\frac{d}{dx}\left[\left(1 - \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\psi+1}\left(1 + \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\omega+1}\frac{d}{dx} J_n^{\psi,\omega}(x)\right]
+ \left(1 - \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\psi}\left(1 + \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\omega}\rho_n\, J_n^{\psi,\omega}(x) = 0, \qquad (6.23)
where \rho_n = n(n + \psi + \omega + 1). By applying the mapping z = 2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1 to the generating function defined for the Jacobi polynomials, one obtains the generating function of the fractional form:

\frac{2^{\psi+\omega}}{R\left(1 + R - \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right)\right)^{\psi}\left(1 + R + \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right)\right)^{\omega}} = \sum_{n=0}^{\infty} F J_n^{\psi,\omega,\alpha}(x)\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right)^{n}, \qquad (6.24)

where

R = \sqrt{1 - 2x\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right)^{2}},

and 2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1 \in [-1, 1].
The fractional Jacobi functions are also orthogonal. They are orthogonal on the interval [a, b] with respect to a weight function similar to Eq. 6.6 in which the input x is mapped by means of 2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1. Hence, the proper weight function for the orthogonality relation of the fractional Jacobi functions is Bhrawy and Zaky (2016), Hadian et al. (2020), Kazem (2013)

Fw^{\psi,\omega,\alpha}(x) = \left(1 - \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\psi}\left(1 + \left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \varepsilon\right)^{\omega}. \qquad (6.25)
Now one can define the orthogonality relation for the fractional Jacobi functions as

\int_{a}^{b} F J_m^{\psi,\omega,\alpha}(x)\, F J_n^{\psi,\omega,\alpha}(x)\, Fw^{\psi,\omega,\alpha}(x)\, dx = 0, \qquad m \neq n. \qquad (6.26)
The recursive relation for the fractional Jacobi functions can be defined in the same way as before, by substituting the mapped x into the ordinary recurrence Doman (2015), Askey (1975), Bhrawy and Zaky (2016), Hadian et al. (2020):

F J_0^{\psi,\omega,\alpha}(x) = 1,
F J_1^{\psi,\omega,\alpha}(x) = \frac{1}{2}(\psi + \omega + 2)\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + \frac{1}{2}(\psi - \omega),
F J_{n+1}^{\psi,\omega,\alpha}(x) = \left(A_n\left(2\left(\frac{x-a}{b-a}\right)^{\alpha} - 1\right) + B_n\right) F J_n^{\psi,\omega,\alpha}(x) - C_n\, F J_{n-1}^{\psi,\omega,\alpha}(x), \quad n \geq 1, \qquad (6.27)
where

A_n = \frac{(2n + \psi + \omega + 1)(2n + \psi + \omega + 2)}{2(n+1)(n + \psi + \omega + 1)}, \qquad
B_n = \frac{(\omega^2 - \psi^2)(2n + \psi + \omega + 1)}{2(n+1)(n + \psi + \omega + 1)(2n + \psi + \omega)}, \qquad
C_n = \frac{(n + \psi)(n + \omega)(2n + \psi + \omega + 2)}{(n+1)(n + \psi + \omega + 1)(2n + \psi + \omega)}. \qquad (6.28)
Program Code
import sympy

x = sympy.Symbol("x")
a = sympy.Symbol("a")
b = sympy.Symbol("b")
beta = sympy.Symbol(r'\omega')
alpha = sympy.Symbol(r'\psi')
delta = sympy.Symbol(r'\alpha')

# Fractional mapping of Eq. 6.21: replace x by 2*((x - a)/(b - a))**alpha - 1
x = 2*((x - a)/(b - a))**delta - 1

# Recurrence coefficients of Eq. 6.28
def A(n):
    return (2*n+alpha+beta+1)*(2*n+alpha+beta+2)/(2*(n+1)*(n+alpha+beta+1))

def B(n):
    return (beta**2-alpha**2)*(2*n+alpha+beta+1)/(2*(n+1)*(n+alpha+beta+1)*(2*n+alpha+beta))

def C(n):
    return (n+alpha)*(n+beta)*(2*n+alpha+beta+2)/((n+1)*(n+alpha+beta+1)*(2*n+alpha+beta))

# Symbolic fractional Jacobi function of order n via the recurrence of Eq. 6.27
def FJacobi(x, n):
    if n == 0:
        return 1
    elif n == 1:
        return ((alpha+beta+2)*x)/2 + (alpha-beta)/2
    elif n >= 2:
        return ((A(n-1)*x+B(n-1))*FJacobi(x,n-1) - (C(n-1))*FJacobi(x,n-2))
Fig. 6.8 Fractional Jacobi polynomials of order 5, positive ψ and ω and different α
In this section, the Jacobi kernel function of ordinary type is introduced and its validity is proved according to the Mercer condition. Moreover, the Jacobi wavelet kernel, which has recently attracted the attention of researchers, is introduced. The last subsection is devoted to the fractional Jacobi kernel.
As we already know, the unweighted orthogonal polynomial kernel function for SVM can be written as follows:

K(x, z) = \sum_{i=0}^{n} J_i(x)\, J_i(z), \qquad (6.29)

where J_i(\cdot) denotes the evaluation of the polynomial, x and z are the kernel's input arguments, and n is the highest polynomial order. Using this definition, and since the multi-dimensional form of the kernel function is needed, one can introduce the Jacobi kernel function by evaluating the inner product of the input vectors (\langle x, z \rangle = x z^{T}) Nadira et al. (2019), Ozer et al. (2011):

K_{Jacobi}(x, z) = \sum_{i=0}^{n} J_i^{\psi,\omega}(x)\, J_i^{\psi,\omega}(z)^{T}\, w^{\psi,\omega}(x, z). \qquad (6.30)
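As a rough computational sketch of Eqs. 6.30–6.33 (a simplified reading of ours, not the authors' reference implementation: the polynomials are evaluated element-wise and combined by a dot product, and the helper name and parameter values are only examples), the kernel of two normalized input vectors can be computed as follows:

import numpy as np
from scipy.special import eval_jacobi

def jacobi_kernel(x, z, n=3, psi=-0.5, omega=0.2, eps=1e-4):
    # K^(1): sum_i <J_i(x), J_i(z)> with J_i applied element-wise to the normalized inputs
    k1 = sum(np.dot(eval_jacobi(i, psi, omega, x), eval_jacobi(i, psi, omega, z))
             for i in range(n + 1))
    # K^(2): weight (d - <x,z> + eps)^psi * (d + <x,z> + eps)^omega, d = input dimension
    d = len(x)
    s = np.dot(x, z)
    k2 = (d - s + eps)**psi * (d + s + eps)**omega
    return k1 * k2

x = np.array([0.1, -0.3, 0.5])
z = np.array([0.4, 0.2, -0.1])
print(jacobi_kernel(x, z))

A Gram matrix built by applying such a function to all pairs of training samples can then be passed to an SVM solver that accepts precomputed kernels.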
Theorem 6.2 (Nadira et al. (2019)) The Jacobi kernel introduced in Eq. 6.30 is a valid Mercer kernel.
Proof The Mercer theorem states that an SVM kernel should be positive semi-definite; in other words, the kernel should satisfy the following relation:

\iint K(x, z)\, f(x)\, f(z)\, dx\, dz \geq 0. \qquad (6.31)
By using the fact that the product of two valid kernels is also a valid kernel, one can split the Jacobi kernel introduced in Eq. 6.30 into two kernels, one being the inner product part and the other the weight function; therefore,

K^{(1)}(x, z) = \sum_{i=0}^{n} J_i^{\psi,\omega}(x)\, J_i^{\psi,\omega}(z)^{T}, \qquad (6.32)

K^{(2)}(x, z) = w^{\psi,\omega}(x, z) = (d - \langle x, z \rangle + \varepsilon)^{\psi}\,(d + \langle x, z \rangle + \varepsilon)^{\omega}, \qquad (6.33)

where d is the dimension of the input vectors.
Since K^{(1)}(x, z) is a finite sum of products of the form g_i(x)\, g_i(z), substituting it into Eq. 6.31 yields a sum of squared integrals, which is non-negative. Therefore, the kernel K^{(1)}(x, z) is a valid Mercer kernel. To prove K^{(2)}(x, z) \geq 0, it suffices to show that the weight function in Eq. 6.6 is positive semi-definite, since it is the easier case to consider; the general weight function of the kernel, reformulated for two vector inputs, then follows. Due to the normalization of the input data, the weight function of Eq. 6.6 is non-negative, so the weight function w^{\psi,\omega}(x, z) = (d - \langle x, z \rangle)^{\psi}(d + \langle x, z \rangle)^{\omega}, which is its generalized form for two input vectors, is non-negative as well.
The performance of SVM with the kernel trick heavily depends on choosing a kernel that is appropriate for the data, so introducing and examining new kernels for SVM remains an attractive research topic. The orthogonality properties of polynomials such as the Jacobi family have made them an appealing alternative for such use cases. In addition, some previously introduced kernels, such as wavelet kernels, have been used successfully Nadira et al. (2019). Combinations of wavelets and orthogonal kernel functions such as Chebyshev, Hermite, and Legendre have already been proposed and examined in signal processing Garnier et al. (2003), the solution of differential equations Imani et al. (2011), Khader and Adel (2018), optimal control Razzaghi and Yousefi (2002), Elaydi et al. (2012), and calculus of variations problems Bokhari et al. (2018).
Abassa et al. Nadira et al. (2019) recently introduced SVM kernels based on Jacobi wavelets. Only a glance at the proposed kernel is given here; the interested reader may find the proof and details in the original paper Nadira et al. (2019). It should be noted that some notations have been changed to preserve the integrity of the book, but the formulation itself is intact and the same as in the original work.
The Jacobi polynomials J_m^{(\psi,\omega)} are defined by the recurrence

J_m^{(\psi,\omega)}(x) = \frac{(\psi + \omega + 2m - 1)\left[\psi^2 - \omega^2 + x(\psi + \omega + 2m)(\psi + \omega + 2m - 2)\right]}{2m(\psi + \omega + 2m - 2)(\psi + \omega + m)}\, J_{m-1}^{(\psi,\omega)}(x)
- \frac{(\psi + m - 1)(\omega + m - 1)(\psi + \omega + 2m)}{m(\psi + \omega + 2m - 2)(\psi + \omega + m)}\, J_{m-2}^{(\psi,\omega)}(x),

where w(x) = (1 - x)^{\psi}(1 + x)^{\omega}. In addition, \delta_{n,m} is the Kronecker delta, \Gamma is the Euler gamma function, and \langle \cdot, \cdot \rangle_{L^2_w} denotes the inner product of L^2_w([-1, 1]). The family \{J_m^{(\psi,\omega)}\}_{m \in \mathbb{N}} forms an orthogonal basis of L^2_w([-1, 1]).
Since the weight function of the fractional Jacobi functions is given by Eq. 6.25, defining the fractional Jacobi kernel is straightforward. One can construct the fractional Jacobi kernel as follows:

K_{FJacobi}(x, z) = \sum_{i=0}^{n} F J_i^{\psi,\omega,\alpha}(x)\, F J_i^{\psi,\omega,\alpha}(z)^{T}\, Fw^{\psi,\omega,\alpha}(x, z). \qquad (6.35)
Theorem 6.3 The fractional Jacobi kernel introduced in Eq. 6.35 is a valid Mercer kernel.
Proof Similar to the proof of Theorem 6.2, the fractional Jacobi kernel function of Eq. 6.35 can be considered as the product of two kernels:

K^{(1)}(x, z) = \sum_{i=0}^{n} F J_i^{\psi,\omega,\alpha}(x)\, F J_i^{\psi,\omega,\alpha}(z)^{T}, \qquad (6.36)

K^{(2)}(x, z) = Fw^{\psi,\omega,\alpha}(x, z). \qquad (6.37)

Therefore, Eq. 6.36 is a valid Mercer kernel. The validity of K^{(2)}(x, z) can be argued similarly to the weight function of the ordinary Jacobi kernel. It can be deduced that the output of K^{(2)}(x, z) is never less than zero: the inner product of two vectors that are normalized over [−1, 1] lies in the range [−d, d], where d is the dimension of the input vectors x and z, and the effect of a negative inner product is neutralized by the parameter d. Therefore, Eq. 6.37 is also a valid Mercer kernel.
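Analogously to the sketch given for the ordinary Jacobi kernel, a minimal illustration of Eq. 6.35 (again ours, with arbitrary example values of a, b, and α) maps the inputs into the fractional space of Eq. 6.21 and reuses the same structure:

import numpy as np
from scipy.special import eval_jacobi

def frac_jacobi_kernel(x, z, n=3, psi=-0.5, omega=0.2, alpha=0.3,
                       a=-1.0, b=1.0, eps=1e-4):
    # Map both inputs into the fractional space (Eq. 6.21)
    tx = 2.0*((x - a)/(b - a))**alpha - 1.0
    tz = 2.0*((z - a)/(b - a))**alpha - 1.0
    # K^(1): sum of element-wise fractional Jacobi evaluations (Eq. 6.36)
    k1 = sum(np.dot(eval_jacobi(i, psi, omega, tx), eval_jacobi(i, psi, omega, tz))
             for i in range(n + 1))
    # K^(2): weight built from the inner product of the mapped inputs (Eq. 6.37)
    d = len(x)
    s = np.dot(tx, tz)
    k2 = (d - s + eps)**psi * (d + s + eps)**omega
    return k1 * k2

x = np.array([0.1, 0.3, 0.5])
z = np.array([0.4, 0.2, 0.1])
print(frac_jacobi_kernel(x, z))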
In this section, the results of the Jacobi and fractional Jacobi kernels on some well-known datasets are compared with the other kernels introduced in the previous chapters, namely the RBF, polynomial, Chebyshev, fractional Chebyshev, Gegenbauer, fractional Gegenbauer, Legendre, and fractional Legendre kernels. To obtain a clean classification, some preprocessing steps may need to be applied to a dataset; these steps are not in focus here, except for normalization, which is mandatory when Jacobi polynomials are used as a kernel. Several online data stores are publicly available; a widely used one is the UCI Machine Learning Repository1 of the University of California, Irvine, and another is Kaggle.2 For this section, four UCI datasets that are well known to machine learning practitioners are used.
1 https://archive.ics.uci.edu/ml/datasets.php.
2 https://www.kaggle.com/datasets.
Fig. 6.10 Jacobi kernel applied to Spiral dataset, both normal and fractional
seem to be much of an improvement, but one has to wait for the Jacobi classifiers; afterward, a better judgment can be made. Figure 6.11 depicts the classifiers obtained with different settings of the Jacobi kernel on the Spiral dataset. Due to the computational difficulties of the Jacobi kernel, orders 7 and 8 have been ignored. The following figures therefore only present the classifiers of the Jacobi and fractional Jacobi kernels of orders 3 to 6. As is clear in the following plots, with the fixed parameters ψ = −0.5 and ω = 0.2, the corresponding classifier becomes more curved and twisted as the order rises from 3 to 6. Nevertheless, this characteristic does not necessarily mean a better classification opportunity, because the result depends strongly on multiple parameters besides the kernel-specific ones. According to these plots, one can deduce that the Jacobi kernels of orders 3 and 5 with ψ = −0.5 and ω = 0.2 are slightly better classifiers for the binary classification of class 1-v-[2, 3] on the Spiral dataset compared to the same kernel of orders 4 and 6.
Figure 6.12 demonstrates the corresponding plots of the fractional Jacobi kernel of orders 3 to 6, with fixed parameters ψ = −0.5 and ω = 0.2 and fractional order 0.3. Clearly, in the fractional space the classifier becomes more twisted and intricate in comparison to the normal one. The Jacobi kernel is fairly successful in this classification task, in both its normal and fractional forms. The following figure depicts how the fractional Jacobi kernel determines the decision boundary.
Fig. 6.11 Jacobi kernel applied to Spiral dataset, corresponding classifiers of order 3 to 6, at fixed
parameters ψ = −0.5, ω = 0.2
Fig. 6.12 Fractional Jacobi kernel applied to Spiral dataset, corresponding classifiers of order 3 to
6, at fixed parameters ψ = −0.5, ω = 0.2, and fractional order = 0.3
Table 6.1 Class 1-v-[2, 3], comparison of RBF, polynomial, Chebyshev, fractional Chebyshev,
Gegenbauer, fractional Gegenbauer, Jacobi, fractional Jacobi kernels on the Spiral dataset. It is
clear that the RBF, fractional Chebyshev, and fractional Jacobi kernels closely achieve the best results
Sigma Power Order Alpha(α) Lambda(λ) Psi(ψ) Omega(ω) Accuracy
RBF 0.73 – – – – – – 0.97
Polynomial – 8 – – – – – 0.9533
Chebyshev – – 5 – – – – 0.9667
Fractional Chebyshev – – 3 0.3 – – – 0.9733
Legendre – – 6 – – – – 0.9706
Fractional Legendre – – 7 0.8 – – – 0.9986
Gegenbauer – – 6 – 0.3 – – 0.9456
Fractional Gegenbauer – – 6 0.3 0.7 – – 0.9533
Jacobi – – 3 – – −0.8 0 0.96
Fractional Jacobi – – 7 0.7 – −0.5 0.6 0.9711
Table 6.2 Class 2-v-[1, 3], comparison of RBF, polynomial, Chebyshev, fractional Chebyshev,
Gegenbauer, fractional Gegenbauer, Jacobi, and fractional Jacobi kernels on the Spiral dataset. As
is clear, the RBF kernel outperforms the other kernels
Sigma Power Order Alpha(α) Lambda(λ) Psi(ψ) Omega(ω) Accuracy
RBF 0.1 – – – – – – 0.9867
Polynomial – 5 – – – – – 0.9044
Chebyshev – – 6 – – – – 0.9289
Fractional Chebyshev – – 6 0.8 – – – 0.9344
Legendre – – 8 – – – – 0.9344
Fractional Legendre – – 8 0.4 – – – 0.9853
Gegenbauer - - 5 – 0.3 – – 0.9278
Fractional Gegenbauer – – 4 0.6 0.6 – – 0.9356
Jacobi – – 5 – - −0.2 0.4 0.9144
Fractional Jacobi – – 3 0.3 – −0.2 0 0.9222
Table 6.3 Class 3-v-[1, 2], comparison of RBF, polynomial, Chebyshev, fractional Chebyshev,
Gegenbauer, fractional Gegenbauer, Jacobi, and fractional Jacobi kernels on the Spiral dataset. The
RBF and polynomial kernels have the best accuracy and after them is the fractional Jacobi kernel
Sigma Power Order Alpha(α) Lambda(λ) Psi(ψ) Omega(ω) Accuracy
RBF 0.73 – – – – – – 0.9856
Polynomial – 5 – – – – – 0.9856
Chebyshev – – 6 - – – – 0.9622
Fractional Chebyshev – – 6 0.6 – – – 0.9578
Legendre – – 6 – – – – 0.9066
Fractional Legendre – – 6 0.9 – – – 0.9906
Gegenbauer – – 6 – 0.3 – – 0.9611
Fractional Gegenbauer – – 6 0.9 0.3 – – 0.9644
Jacobi – – 5 – – −0.8 0.3 0.96
Fractional Jacobi – – 7 0.9 – −0.8 0 0.9722
The Jacobi kernels introduced in Eqs. 6.30 and 6.35 are also applied to the datasets of the three Monks' problems, and the relevant results are reported in Tables 6.4, 6.5, and 6.6. The fractional Jacobi kernel showed slightly better performance on these datasets in comparison with the other fractional and ordinary kernels on the Monks' problems M1 and M2.
Finally, the support vector machine kernels are summarized in Table 6.7. A brief explanation of each kernel introduced in this book is also presented, highlighting its most notable characteristics and giving a convenient list for comparing the orthogonal kernels introduced in this book at a glance.
Table 6.4 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Gegenbauer, frac-
tional Gegenbauer, Legendre, fractional Legendre, Jacobi, and fractional Jacobi kernels on Monk’s
first problem. The fractional Gegenbauer and fractional Jacobi kernels have the most desirable
accuracy of 1
Sigma Power Order Alpha(α) Lambda(λ) Psi(ψ) Omega(ω) Accuracy
RBF 2.844 – – – – – – 0.8819
Polynomial – 3 – – – – – 0.8681
Chebyshev – – 3 – – – – 0.8472
Fractional Chebyshev – – 3 1/16 – – – 0.8588
Legendre – – 3 – – - – 0.8333
Fractional Legendre – – 3 0.8 – – – 0.8518
Gegenbauer – – 3 – -0.2 – – 0.9931
Fractional Gegenbauer – – 3 0.7 0.2 – – 1
Jacobi – – 4 – – −0.2 −0.5 0.9977
Fractional Jacobi – – 4 0.4 – −0.2 −0.5 1
Table 6.5 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Gegenbauer, frac-
tional Gegenbauer, Legendre, fractional Legendre, Jacobi, and the fractional Jacobi kernels on
the second Monk’s problem. The fractional Legendre and fractional Jacobi Kernels have the best
accuracy of 1
Sigma Power Order Alpha(α) Lambda(λ) Psi(ψ) Omega(ω) Accuracy
RBF 5.5896 – – – – – – 0.875
Polynomial – 3 – – – – – 0.8657
Chebyshev – – 3 – – – – 0.8426
Fractional Chebyshev – – 3 1/16 – – – 0.9653
Legendre – – 3 – – – – 0.8032
Fractional Legendre – – 3 0.8 – – – 1
Gegenbauer – – 3 – 0.5 – – 0.7824
Fractional Gegenbauer – – 3 0.1 0.5 – – 0.9514
Jacobi – – 3 – – −0.5 −0.2 0.956
Fractional Jacobi – – 3 0.1 – −0.2 −0.5 1
Jacobi orthogonal polynomials, the most general family of classical orthogonal polynomials and one that has been used in many applications, have been covered in this chapter. The basics and properties of these polynomials were explained, and the ordinary Jacobi kernel function was constructed. Moreover, the fractional form of the Jacobi polynomials and the corresponding fractional Jacobi kernel function were introduced, which extends the applicability of this kernel by transforming the input data into a fractional space and has been shown to improve classification accuracy. Also, a comprehensive comparison was provided between the introduced Jacobi kernels
Table 6.6 Comparison of RBF, polynomial, Chebyshev, fractional Chebyshev, Gegenbauer, frac-
tional Gegenbauer, Legendre, fractional Legendre, Jacobi, and fractional Jacobi kernels on the third
Monk’s problem. The Gegenbauer and fractional Gegenbauer kernels have the best accuracy score,
then Jacobi, fractional Jacobi, RBF, and fractional Chebyshev Kernels
Sigma Power Order Alpha(α) Lambda(λ) Psi(ψ) Omega(ω) Accuracy
RBF 2.1586 – – – – – – 0.91
Polynomial – 3 – – – – – 0.875
Chebyshev – – 6 – – – – 0.895
Fractional Chebyshev – – 5 1/5 – – – 0.91
Legendre – – 3 – – – – 0.8472
Fractional Legendre – – 3 0.8 – – - 0.8379
Gegenbauer – – 4 – −0.2 – – 0.9259
Fractional Gegenbauer – – 3 0.7 −0.2 – – 0.9213
Jacobi – – 5 – – −0.5 0.0 0.919
Fractional Jacobi – – 4 – – −0.5 0.0 0.9167
and all the other kernels discussed in Chaps. 3, 4, and 5. The experiments showed the efficiency of the Jacobi kernels, which makes them suitable kernel functions for kernel-based learning algorithms such as SVM.
References
Abdallah, N.B., Chouchene, F.: New recurrence relations for Wilson polynomials via a system of
Jacobi type orthogonal functions. J. Math. Anal. Appl. 498, 124978 (2021)
Abdelkawy, M.A., Amin, A.Z., Bhrawy, A.H., Machado, J.A.T., Lopes, A.M.: Jacobi collocation
approximation for solving multi-dimensional Volterra integral equations. Int. J. Nonlinear Sci.
Numer. Simul. 18, 411–425 (2017)
Asghari, M., Hadian Rasanan, A.H., Gorgin, S., Rahmati, D., Parand, K.: FPGA-orthopoly: a
hardware implementation of orthogonal polynomials. Eng. Comput. (2022). https://doi.org/10.
1007/s00366-022-01612-x
Askey, R., Wilson, J.A.: Some basic hypergeometric orthogonal polynomials that generalize Jacobi
polynomials. Am. Math. Soc. 319 (1985)
Askey, R.: Orthogonal Polynomials and Special Functions. Society for Industrial and Applied
Mathematics, Pennsylvania (1975)
Bhrawy, A.H.: A Jacobi spectral collocation method for solving multi-dimensional nonlinear frac-
tional sub-diffusion equations. Numer. Algorith. 73, 91–113 (2016)
Bhrawy, A.H., Alofi, A.S.: Jacobi-Gauss collocation method for solving nonlinear Lane-Emden
type equations. Commun. Nonlinear Sci. Numer. Simul. 17, 62–70 (2012)
Bhrawy, A.H., Zaky, M.A.: Shifted fractional-order Jacobi orthogonal functions: application to a
system of fractional differential equations. Appl. Math. Model. 40, 832–845 (2016)
Bhrawy, A., Zaky, M.: A fractional-order Jacobi Tau method for a class of time-fractional PDEs
with variable coefficients. Math. Methods Appl. Sci. 39, 1765–1779 (2016)
Bhrawy, A.H., Hafez, R.M., Alzaidy, J.F.: A new exponential Jacobi pseudospectral method for
solving high-order ordinary differential equations. Adv. Differ. Equ. 2015, 1–15 (2015)
Bhrawy, A.H., Doha, E.H., Saker, M.A., Baleanu, D.: Modified Jacobi-Bernstein basis transforma-
tion and its application to multi-degree reduction of Bézier curves. J. Comput. Appl. Math. 302,
369–384 (2016)
Bokhari, A., Amir, A., Bahri, S.M.: A numerical approach to solve quadratic calculus of variation
problems. Dyn. Contin. Discr. Impuls. Syst. 25, 427–440 (2018)
Boyd, J.P.: Chebyshev and Fourier Spectral Methods. Courier Corporation, MA (2001)
Doha, E.H., Bhrawy, A.H., Ezz-Eldien, S.S.: Efficient Chebyshev spectral methods for solving
multi-term fractional orders differential equations. Appl. Math. Model. 35, 5662–5672 (2011)
Doha, E.H., Bhrawy, A.H., Ezz-Eldien, S.S.: A new Jacobi operational matrix: an application for
solving fractional differential equations. Appl. Math. Model. 36, 4931–4943 (2012)
Doman, B.G.S.: The Classical Orthogonal Polynomials. World Scientific, Singapore (2015)
Elaydi, H.A., Abu Haya, A.: Solving optimal control problem for linear time invariant systems via
Chebyshev wavelet. Int. J. Electr. Eng. 5 (2012)
Ezz-Eldien, S.S., Doha, E.H.: Fast and precise spectral method for solving pantograph type Volterra
integro-differential equations. Numer. Algorithm. 81, 57–77 (2019)
Garnier, H., Mensler, M.I.C.H.E.L., Richard, A.L.A.I.N.: Continuous-time model identification
from sampled data: implementation issues and performance evaluation. Int. J. Control 76, 1337–
1357 (2003)
Guo, B.Y., Shen, J., Wang, L.L.: Generalized Jacobi polynomials/functions and their applications.
Appl. Numer. Math. 59, 1011–1028 (2009)
Hadian Rasanan, A.H., Bajalan, N., Parand, K., Rad, J.A.: Simulation of nonlinear fractional dynam-
ics arising in the modeling of cognitive decision making using a new fractional neural network.
Math. Methods Appl. Sci. 43, 1437–1466 (2020)
Imani, A., Aminataei, A., Imani, A.: Collocation method via Jacobi polynomials for solving non-
linear ordinary differential equations. Int. J. Math. Math. Sci. 2011, 673085 (2011)
Jafarzadeh, S.Z., Aminian, M., Efati, S.: A set of new kernel function for support vector machines:
an approach based on Chebyshev polynomials. In: ICCKE, pp. 412–416 (2013)
Kazem, S.: An integral operational matrix based on Jacobi polynomials for solving fractional-order
differential equations. Appl. Math. Model. 37, 1126–1136 (2013)
Khader, M.M., Adel, M.: Chebyshev wavelet procedure for solving FLDEs. Acta Appl. Math. 158,
1–10 (2018)
Khodabandehlo, H.R., Shivanian, E., Abbasbandy, S.: Numerical solution of nonlinear delay dif-
ferential equations of fractional variable-order using a novel shifted Jacobi operational matrix.
Eng. Comput. (2021). https://doi.org/10.1007/s00366-021-01422-7
Mastroianni, G., Milovanovic, G.: Interpolation Processes: Basic Theory and Applications. Springer
Science & Business Media, Berlin (2008)
Milovanovic, G.V., Rassias, T.M., Mitrinovic, D.S.: Topics In Polynomials: Extremal Problems.
Inequalities. Zeros. World Scientific, Singapore (1994)
Moayeri, M.M., Hadian Rasanan, A.H., Latifi, S., Parand, K., Rad, J.A.: An efficient space-splitting
method for simulating brain neurons by neuronal synchronization to control epileptic activity.
Eng. Comput. (2020). https://doi.org/10.1007/s00366-020-01086-9
Moayeri, M.M., Rad, J.A., Parand, K.: Desynchronization of stochastically synchronized neural
populations through phase distribution control: a numerical simulation approach. Nonlinear Dyn.
104, 2363–2388 (2021)
Morris, G.R., Abed, K.H.: Mapping a Jacobi iterative solver onto a high-performance heterogeneous
computer. IEEE Trans. Parallel Distrib. Syst. 24, 85–91 (2012)
Nadira, A., Abdessamad, A., Mohamed, B.S.: Regularized Jacobi Wavelets Kernel for support
vector machines. Stat. Optim. Inf. Comput. 7, 669–685 (2019)
Nkengfack, L.C.D., Tchiotsop, D., Atangana, R., Louis-Door, V., Wolf, D.: Classification of EEG
signals for epileptic seizures detection and eye states identification using Jacobi polynomial
transforms-based measures of complexity and least-square support vector machine. Inform. Med.
Unlocked 23, 100536 (2021)
Ozer, S., Chen, C.H., Cirpan, H.A.: A set of new Chebyshev kernel functions for support vector
machine pattern classification. Pattern Recognit. 44, 1435–1447 (2011)
Padierna, L.C., Carpio, M., Rojas-Dominguez, A., Puga, H., Fraire, H.: A novel formulation of
orthogonal polynomial kernel functions for SVM classifiers: the Gegenbauer family. Pattern
Recognit. 84, 211–225 (2018)
Pan, Z.B., Chen, H., You, X. H.: Support vector machine with orthogonal Legendre kernel. In:
International Conference on Wavelet Analysis and Pattern Recognition, pp. 125–130 (2012)
Parand, K., Rad, J.A., Ahmadi, M.: A comparison of numerical and semi-analytical methods for
the case of heat transfer equations arising in porous medium. Eur. Phys. J. Plus 131, 1–15 (2016)
Parand, K., Moayeri, M.M., Latifi, S., Rad, J.A.: Numerical study of a multidimensional dynamic
quantum model arising in cognitive psychology especially in decision making. Eur. Phys. J. Plus
134, 109 (2019)
Ping, Z., Ren, H., Zou, J., Sheng, Y., Bo, W.: Generic orthogonal moments: Jacobi-Fourier moments
for invariant image description. Pattern Recognit. 40, 1245–1254 (2007)
Razzaghi, M., Yousefi, S.: Legendre wavelets method for constrained optimal control problems.
Math. Methods Appl. Sci. 25, 529–539 (2002)
Shojaeizadeh, T., Mahmoudi, M., Darehmiraki, M.: Optimal control problem of advection-
diffusion-reaction equation of kind fractal-fractional applying shifted Jacobi polynomials. Chaos
Solitons Fract. 143, 110568 (2021)
Szegő, G.: Orthogonal Polynomials. American Mathematical Society, Rhode Island (1939)
Tian, M., Wang, W.: Some sets of orthogonal polynomial kernel functions. Appl. Soft Comput. 61,
742–756 (2017)
Upneja, R., Singh, C.: Fast computation of Jacobi-Fourier moments for invariant image recognition.
Pattern Recognit. 48, 1836–1843 (2015)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (2013)
Yang, W., Zhang, Z., Hong, Y.: State recognition of bolted structures based on quasi-analytic
wavelet packet transform and generalized Gegenbauer support vector machine. In: 2020 IEEE
International Instrumentation and Measurement Technology Conference (I2MTC), pp. 1–6 (2020)
Ye, N., Sun, R., Liu, Y., Cao, L.: Support vector machine with orthogonal Chebyshev kernel. In:
18th International Conference on Pattern Recognition (ICPR’06), vol. 2, pp. 752–755 (2006)
Zhao, J., Yan, G., Feng, B., Mao, W., Bai, J.: An adaptive support vector regression based on a new
sequence of unified orthogonal polynomials. Pattern Recognit. 46, 899–913 (2013)
Part III
Applications of Orthogonal Kernels
Chapter 7
Solving Ordinary Differential Equations
by LS-SVM
Abstract In this chapter, we propose a machine learning method for solving a class
of linear and nonlinear ordinary differential equations (ODEs) which is based on
the least squares-support vector machines (LS-SVM) with collocation procedure.
One of the most important and practical models in this category is Lane-Emden
type equations. By using LS-SVM for solving these types of equations, the solution
is expanded based on rational Legendre functions and the LS-SVM formulation is
presented. Based on this, the linear problems are solved in dual form, which leads to a system of linear algebraic equations. Finally, by presenting some numerical
examples, the results of the current method are compared with other methods. The
comparison shows that the proposed method is fast and highly accurate with expo-
nential convergence.
7.1 Introduction
Differential equations are a kind of mathematical equation that can be used for
modeling many physical and engineering problems in real life, such as dynamics
of oscillators, cosmology, study of the solar system, the study of unsteady gases,
fluid dynamics, and many other applications (see, e.g., Liu et al. 2018; Rodrigues
et al. 2018; Anderson et al. 2016; Khoury et al. 2018; Farzaneh-Gord and Rahbari
2016; Lusch et al. 2018; Bristeau et al. 1979; Parand et al. 2011; Parand and Rad
2012; Kazem et al. 2011; Parand et al. 2016; Kazem et al. 2012; Parand et al. 2012,
2017; Abbasbandy et al. 2013). In the meantime, Lane-Emden type equations are
an important class of ordinary differential equations on the semi-infinite domain
which was introduced by Jonathan Homer Lane and by Emden as a model for the
temperature of the sun Lane (1870); Emden (1907). These equations have found many
interesting applications in modeling many physical phenomena such as the theory
of stellar structure, thermal behavior of a spherical cloud of gas, isothermal gas
spheres, and the theory of thermionic currents Chandrasekhar and Chandrasekhar
(1957), Wood (1921). As one of the important applications, it can be said that in
astrophysics, this equation describes the equilibrium density distribution in the self-
gravitating sphere of polytropic isothermal gas.
The general form of the Lane-Emden equation is as follows Hadian Rasanan et al.
(2020), Parand et al. (2010), Omidi et al. (2021):
y''(x) + \frac{k}{x} y'(x) + f(x, y(x)) = h(x), \qquad y(x_0) = A, \quad y'(x_0) = B, \qquad (7.1)
where A, B are constants and f (x, y) and h(x) are given functions of x and y.
Some of these equations do not have an exact solution, and as a result of their sin-
gularity at x = 0, their numerical solutions are a challenge for scientists Parand et al.
(2009, 2010). Many semi-analytical and numerical approaches have been applied to
solve the Lane-Emden equations, which can be presented as follows: In Bender et al.
(1989), a new perturbation technique based on an artificial parameter δ was proposed to solve the Lane-Emden equation. Also, a non-perturbative analytical solution of this
equation was derived in Shawagfeh (1993) by the Adomian decomposition method
(ADM). On the other hand, Mandelzweig et al. (2001) used the quasi-linearization
method for solving the standard Lane-Emden equation, and Liao (2003) produced an
analytical framework based on the Adomian decomposition method for Lane-Emden
type equations. The approach based on semi-analytical methods continued as He
(2003) obtained the analytical solutions to the problem by using Ritz’s variational
method. Also, in Wazwaz (2006), the modified decomposition method for the ana-
lytic behavior of nonlinear differential equations was used, and Yildirim et al. (2007)
obtained approximate solutions of a class of Lane-Emden equations by homotopy
perturbation method (HPM). Following this approach, Ramos (2005) solved Lane-
Emden equations by using a linearization method and a series solution has been
suggested in Ramos (2008). Those solutions were obtained by writing this equation
as a Volterra integral equation and assuming that the nonlinearities are smooth. Then,
Dehghan (2006; 2008) proposed the Adomian decomposition method for differen-
tial equations with an alternate procedure to overcome the difficulty of singularity.
Also, Aslanov (2008) introduced an improved Adomian decomposition method for
equation Mall and Chakraverty (2016). More recently, partial differential equation
Omidi et al. (2021), system, and fractional Hadian Rasanan et al. (2020) versions of
the Lane-Emden equation are solved by orthogonal neural networks.
In this chapter, the LS-SVM algorithm together with the rational Legendre func-
tions as kernel functions is used to solve the Lane-Emden equation. In our proposed
method, the constraints are the collocation form of the residual function. Then the
coefficients of approximate solutions together with the error function are minimized.
To obtain the solution to the minimization problem, the Lagrangian function is used
and the problem is solved in dual form.
The remainder of this chapter is organized as follows. In Sect. 7.2, the LS-SVM
formulation is discussed which is used to solve differential equations. The kernel
trick is also introduced in Sect. 7.3, and their properties together with the operational
matrix of differentiation are presented. In Sect. 7.4, the LS-SVM formulation of the
Lane-Emden equation is given and the problem is solved in dual form. Finally, the
numerical results are presented which show the efficiency of the proposed technique.
We consider the general form of an m-th order initial value problem (IVP), which is
as follows:
\min_{w,\, \varepsilon}\; \frac{1}{2} w^T w + \frac{\gamma}{2} \varepsilon^T \varepsilon, \qquad \text{subject to } y_i = w^T \varphi(x_i) + \varepsilon_i, \quad i = 0, 1, \ldots, N, \qquad (7.3)
In the case that LS-SVM is used to solve a linear ODE, the approximate solution is
obtained by solving the following optimization problem Mehrkanoon et al. (2012),
Pakniyat et al. (2021):
\min_{w,\, \varepsilon}\; \frac{1}{2} w^T w + \frac{\gamma}{2} \varepsilon^T \varepsilon, \qquad \text{subject to } L[\tilde{y}_i] - F(x_i) = \varepsilon_i, \quad i = 0, 1, \ldots, N, \qquad (7.4)
The training points are also chosen to be the roots of the rational Legendre functions in the learning phase. Now we introduce the dual form of the problem:
Theorem 7.1 (Mehrkanoon et al. (2012)) Suppose that ϕ_i are the basis functions and α = [α_0, α_1, \ldots, α_N]^T are the Lagrangian coefficients in the dual form of the minimization problem Eq. 7.4. Then the solution of the minimization problem is

w = M\alpha = \begin{pmatrix}
L(\varphi_0(x_0)) & L(\varphi_0(x_1)) & \ldots & L(\varphi_0(x_N)) \\
L(\varphi_1(x_0)) & \ldots & \ldots & L(\varphi_1(x_N)) \\
\ldots & \ldots & \ldots & \ldots \\
L(\varphi_N(x_0)) & L(\varphi_N(x_1)) & \ldots & L(\varphi_N(x_N))
\end{pmatrix}
\begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}, \qquad
\varepsilon = \frac{\alpha}{\gamma} = \begin{pmatrix} \frac{1}{\gamma}\alpha_0 \\ \frac{1}{\gamma}\alpha_1 \\ \vdots \\ \frac{1}{\gamma}\alpha_N \end{pmatrix},

where

\left(M^T M + \frac{1}{\gamma} I\right)\alpha = F. \qquad (7.5)
Proof Applying the Karush-Kuhn-Tucker optimality conditions to the Lagrangian G of problem Eq. 7.4 gives

\frac{\partial G}{\partial w_k} = w_k - \sum_{i=0}^{N} \alpha_i\, L(\varphi_k(x_i)) = 0, \qquad
\frac{\partial G}{\partial \varepsilon_k} = \gamma \varepsilon_k + \alpha_k = 0, \qquad
\frac{\partial G}{\partial \alpha_k} = \sum_{j=0}^{N} w_j\, L(\varphi_j(x_k)) - F_k - \varepsilon_k = 0. \qquad (7.7)

From the first condition,

w_k = \sum_{i=0}^{N} M_{ki}\, \alpha_i, \qquad M_{ki} = L(\varphi_k(x_i)), \quad k = 0, 1, 2, \ldots, N, \qquad \text{i.e.,}\quad w = M\alpha, \qquad (7.8)

and from the second condition,

\varepsilon_k = \frac{-\alpha_k}{\gamma} \;\Longrightarrow\; \varepsilon = \frac{-1}{\gamma}\alpha. \qquad (7.9)

Substituting these into the third condition yields

\sum_{j=0}^{N} w_j\, L(\varphi_j(x_k)) - F_k - \varepsilon_k = 0 \;\Longrightarrow\; (M^T w)_k - F_k - \varepsilon_k = 0 \;\Longrightarrow\; M^T w - \varepsilon = F. \qquad (7.10)

Therefore,

M^T M \alpha + \frac{1}{\gamma}\alpha = F \;\Longrightarrow\; \left(M^T M + \frac{1}{\gamma} I\right)\alpha = F, \qquad (7.11)

which determines the coefficients α.
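In code, once the matrix M of Theorem 7.1 has been formed, obtaining α amounts to a single linear solve; a minimal sketch of ours (with a randomly generated M and F used purely as placeholders) is:

import numpy as np

def solve_dual(M, F, gamma):
    # Solve (M^T M + (1/gamma) I) alpha = F, as in Eqs. 7.5 and 7.11
    A = M.T @ M + np.eye(M.shape[1]) / gamma
    alpha = np.linalg.solve(A, F)
    w = M @ alpha          # Eq. 7.8
    eps = -alpha / gamma   # Eq. 7.9
    return alpha, w, eps

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
F = rng.standard_normal(6)
alpha, w, eps = solve_dual(M, F, gamma=1e6)
print(alpha)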
The kernel function plays an important role in LS-SVM; therefore, the choice of the kernel function is important. These kernel functions can be constructed by using orthogonal polynomials. The operational matrices of orthogonal polynomials are sparse and the derivatives are obtained exactly, which makes our method fast and leads to well-posed systems. Given that the properties of the Legendre polynomials, the fractional Legendre functions, and the Legendre kernel functions are discussed in Chap. 4, we recommend that readers refer to that chapter for a review; as a result, in this section we focus on the rational Legendre kernels. In particular, Guo et al. (2000) introduced a new set of rational Legendre functions which are mutually orthogonal in L^2(0, +\infty) with the weight function w(x) = \frac{2L}{(L+x)^2}, as follows:

R P_n(x) = P_n\!\left(\frac{x - L}{x + L}\right). \qquad (7.12)

Thus,

R P_0(x) = 1, \qquad R P_1(x) = \frac{x - L}{x + L},

n\, R P_n(x) = (2n - 1)\left(\frac{x - L}{x + L}\right) R P_{n-1}(x) - (n - 1)\, R P_{n-2}(x), \quad n \geq 2, \qquad (7.13)
where {Pn (x)} are Legendre polynomials which were defined in Chap 5. These
functions are used to solve problems on semi-infinite domains, and they can also
produce sparse matrices and have a high convergence rate Guo et al. (2000), Parand
and Razzaghi (2004).
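A minimal NumPy sketch of the recurrence in Eq. 7.13 (our own helper, not the implementation used for the experiments reported later) is:

import numpy as np

def rational_legendre(n, x, L=1.0):
    # RP_n(x) = P_n((x - L)/(x + L)) via the recurrence of Eq. 7.13
    t = (x - L) / (x + L)
    rp_prev, rp = np.ones_like(t), t
    if n == 0:
        return rp_prev
    for k in range(2, n + 1):
        rp_prev, rp = rp, ((2*k - 1)*t*rp - (k - 1)*rp_prev) / k
    return rp

x = np.linspace(0.0, 10.0, 5)
print(rational_legendre(3, x, L=4.59))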
Since the range of the Legendre polynomials is [−1, 1], we have |R P_n(x)| \leq 1. The operational matrix of the derivative is also a lower Hessenberg matrix which can be calculated as Parand and Razzaghi (2004), Parand et al. (2009):

D = \frac{1}{L}(D_1 + D_2), \qquad (7.14)

where D_1 is a tridiagonal matrix,

D_1 = \mathrm{Diag}\!\left(\frac{7i^2 - i - 2}{2(2i+1)},\; -i,\; \frac{i(i+1)}{2(2i+1)}\right), \quad i = 0, \ldots, n-1,

and D_2 = [d_{ij}] is a lower triangular matrix with entries

d_{ij} = \begin{cases} 0, & j \geq i - 1, \\ (-1)^{i+j+1}(2j+1), & j < i - 1. \end{cases}
The rational Legendre kernel function for non-vector data x and z is recommended
as follows:
K(x, z) = \sum_{i=0}^{N} R P_i(x)\, R P_i(z). \qquad (7.15)
This function is a valid SVM kernel if it satisfies the conditions of the Mercer theorem (see, e.g., Suykens et al. (2002)).

Theorem 7.2 To be a valid SVM kernel, for any finite function g(x), the following integral should always be nonnegative for the given kernel function K(x, z):

\iint K(x, z)\, g(x)\, g(z)\, dx\, dz \geq 0. \qquad (7.16)
Proof Consider g(x) to be a function and K(x, z) as defined in Eq. 7.15; then, by using the Mercer condition, one obtains

\iint K(x, z)\, g(x)\, g(z)\, dx\, dz = \iint \sum_{i=0}^{N} R P_i(x)\, R P_i(z)\, g(x)\, g(z)\, dx\, dz
= \sum_{i=0}^{N} \iint R P_i(x)\, R P_i(z)\, g(x)\, g(z)\, dx\, dz
= \sum_{i=0}^{N} \left(\int R P_i(x)\, g(x)\, dx\right)\left(\int R P_i(z)\, g(z)\, dz\right)
= \sum_{i=0}^{N} \left(\int R P_i(x)\, g(x)\, dx\right)^{2} \geq 0. \qquad (7.17)
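A quick numerical sanity check of this property (ours, with arbitrary sample points and parameters) is to build a small Gram matrix from Eq. 7.15 and confirm that it is positive semi-definite:

import numpy as np
from numpy.polynomial import legendre

def rp_matrix(xs, N, L):
    # Row i holds RP_i evaluated at the points xs, i.e. P_i((x - L)/(x + L))
    t = (xs - L) / (xs + L)
    return np.array([legendre.legval(t, [0]*i + [1]) for i in range(N + 1)])

xs = np.linspace(0.5, 10.0, 8)
V = rp_matrix(xs, N=5, L=4.59)
G = V.T @ V                                   # G[i, j] = K(x_i, x_j) of Eq. 7.15
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True: positive semi-definite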
In this section, we implement the proposed method to solve the Lane-Emden type equations. As mentioned before, these equations form an important class of ordinary differential equations on the semi-infinite domain. The general form of the Lane-Emden equation is as follows:

y''(x) + \frac{k}{x} y'(x) + f(x, y(x)) = h(x), \qquad y(x_0) = A, \quad y'(x_0) = B, \qquad (7.18)
where A, B are constants and f(x, y(x)) and h(x) are given functions of x and y. Since the function f(x, y(x)) can be either linear or nonlinear, in this chapter we assume that it can be rewritten as f(x, y(x)) = f(x)\, y(x). In this case, there is no need for any linearization method and the LS-SVM method can be directly applied to Eq. 7.18. For the nonlinear cases, the quasi-linearization method can be applied to Eq. 7.18 first and then the solution is approximated by using the LS-SVM algorithm. Now we can consider the LS-SVM formulation of this equation, which is
\min_{w,\, \varepsilon}\; \frac{1}{2} w^T w + \frac{\gamma}{2} \varepsilon^T \varepsilon, \qquad \text{subject to } \frac{d^2 \tilde{y}_i}{dx^2} + \frac{k}{x_i}\frac{d \tilde{y}_i}{dx} + f(x_i)\, \tilde{y}(x_i) - h(x_i) = \varepsilon_i, \quad k > 0, \qquad (7.19)
where \tilde{y}_i = \tilde{y}(x_i) is the approximate solution. We use the Lagrangian function, and then

G = \frac{1}{2} w^T w + \frac{\gamma}{2} \varepsilon^T \varepsilon - \sum_{i=0}^{N} \alpha_i \left[\frac{d^2 \tilde{y}_i}{dx^2} + \frac{k}{x_i}\frac{d \tilde{y}_i}{dx} + f(x_i)\, \tilde{y}(x_i) - h(x_i) - \varepsilon_i\right]. \qquad (7.20)
So, by expanding the solution in terms of the rational Legendre kernels, one obtains

G = \frac{1}{2}\sum_{i=0}^{N} w_i^2 + \frac{\gamma}{2}\sum_{i=0}^{N} \varepsilon_i^2 - \sum_{i=0}^{N} \alpha_i \left[\sum_{j=0}^{N} w_j\, \varphi_j''(x_i) + \frac{k}{x_i}\sum_{j=0}^{N} w_j\, \varphi_j'(x_i) + f(x_i)\sum_{j=0}^{N} w_j\, \varphi_j(x_i) - h(x_i) - \varepsilon_i\right], \qquad (7.21)
where \{\varphi_j\} are the rational Legendre kernels. Then, by employing the Karush-Kuhn-Tucker (KKT) optimality conditions for 0 \leq l \leq N, we conclude that

\frac{\partial G}{\partial w_l} = w_l - \sum_{i=0}^{N} \alpha_i \left[\varphi_l''(x_i) + \frac{k}{x_i}\varphi_l'(x_i) + f(x_i)\,\varphi_l(x_i)\right] = 0,

\frac{\partial G}{\partial \alpha_l} = \sum_{j=0}^{N} w_j\, \varphi_j''(x_l) + \frac{k}{x_l}\sum_{j=0}^{N} w_j\, \varphi_j'(x_l) + f(x_l)\sum_{j=0}^{N} w_j\, \varphi_j(x_l) - h(x_l) - \varepsilon_l = 0,

\frac{\partial G}{\partial \varepsilon_l} = \gamma \varepsilon_l + \alpha_l = 0. \qquad (7.22)
From the first condition,

w_l = \sum_{i=0}^{N} \alpha_i \left(\varphi_l''(x_i) + \frac{k}{x_i}\varphi_l'(x_i) + f(x_i)\,\varphi_l(x_i)\right) = \sum_{i=0}^{N} \alpha_i\, \psi_l(x_i), \qquad (7.23)

where \psi_l(x_i) = \varphi_l''(x_i) + \frac{k}{x_i}\varphi_l'(x_i) + f(x_i)\,\varphi_l(x_i). Now, we can show the matrix form as follows:

\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_N \end{bmatrix} =
\begin{bmatrix}
\psi_0(x_0) & \psi_0(x_1) & \psi_0(x_2) & \ldots & \psi_0(x_N) \\
\psi_1(x_0) & \psi_1(x_1) & \psi_1(x_2) & \ldots & \psi_1(x_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\psi_N(x_0) & \psi_N(x_1) & \psi_N(x_2) & \ldots & \psi_N(x_N)
\end{bmatrix}
\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix},

and from the second condition,

\varepsilon_l = \frac{-\alpha_l}{\gamma}, \quad \forall l. \qquad (7.24)

Substituting these into the third condition yields

\sum_{j=0}^{N}\left(\sum_{i=0}^{N} \alpha_i\, \psi_j(x_i)\right)\psi_j(x_l) + \frac{\alpha_l}{\gamma} = h(x_l), \quad 0 \leq l \leq N. \qquad (7.25)
By using the kernel trick, the derivatives of the feature map can be written in terms of derivatives of the kernel function. Let us consider the following differential operator:

\nabla^{m,n} = \frac{\partial^{m+n}}{\partial u^m\, \partial v^n},

which will be used in this section. We define \chi^{(m,n)}(u, v) = \nabla^{m,n} K(u, v) and \chi^{(m,n)}_{l,i} = \nabla^{m,n} K(u, v)\big|_{u = x_l,\, v = x_i}, and Eq. 7.25 is written as follows:

h_l = \sum_{i=0}^{N} \alpha_i \left[\chi^{(2,2)}_{l,i} + \frac{k}{x_l}\chi^{(1,2)}_{l,i} + f_l\, \chi^{(0,2)}_{l,i} + \frac{k}{x_i}\chi^{(2,1)}_{l,i} + \frac{k^2}{x_i x_l}\chi^{(1,1)}_{l,i} + \frac{k}{x_i} f_l\, \chi^{(0,1)}_{l,i} + f_i\, \chi^{(2,0)}_{l,i} + \frac{k}{x_l} f_i\, \chi^{(1,0)}_{l,i} + f_i f_l\, \chi^{(0,0)}_{l,i}\right] + \frac{\alpha_l}{\gamma}, \quad 0 \leq l \leq N. \qquad (7.26)
So we calculate \{\alpha_i\}_{0 \leq i \leq N}, and by substituting into Eq. 7.23, \{w_l\}_{0 \leq l \leq N} is computed, and

\tilde{y}(x) = \sum_{i=0}^{N} \alpha_i \left(\chi^{(2,0)}(x_i, x) + \frac{k}{x_i}\chi^{(1,0)}(x_i, x) + f_i\, \chi^{(0,0)}(x_i, x)\right). \qquad (7.27)
Test case 1: Ramos (2005), Yıldırım and Öziş (2009), Chowdhury and Hashim
(2007) Let us consider f (x, y) = −2(2x 2 + 3)y, k = 2, h(x) = 0, A = 1, and B =
0 in Eq. 7.18, then the linear Lane-Emden equation is as follows:
y''(x) + \frac{2}{x} y'(x) - 2(2x^2 + 3)\, y(x) = 0, \quad x \geq 0, \qquad y(0) = 1, \quad y'(0) = 0. \qquad (7.28)
The exact solution is y(x) = e^{x^2}. This type of equation has been solved with linearization, VIM, and HPM methods (see, e.g., Ramos (2005), Yıldırım and Öziş (2009), Chowdhury and Hashim (2007)).
By using the proposed method, the numerical results of this test case in [0, 1]
have been obtained which are depicted in Fig. 7.1 with 30 training points. The results
and related training error function with 180 training points for solving the problem
in [0, 2] are also shown in Fig. 7.2.
The testing errors obtained using 50 equidistant points for solving the problem on the intervals [0, 1] and [0, 2] are displayed in Fig. 7.3a, b, where the maximum testing errors in this example are 2.23 × 10^{-13} and 4.46 × 10^{-14}, respectively. Moreover, the absolute errors for arbitrary testing data have been computed and are presented in Table 7.1 for N = 180 and the optimal value L = 4.59.
The maximum norm of testing errors with different numbers of basis functions
(training points) has also been presented in Table 7.2, which indicates the convergence
of the method for solving this kind of linear Lane-Emden equation.
Test case 2: Ramos (2005), Chowdhury and Hashim (2009), Bataineh et al. (2009),
Zhang et al. (2006) Considering f(x, y) = x y, h(x) = x^5 - x^4 + 44x^2 - 30x, k = 8, A = 0, and B = 0 in Eq. 7.18, we have the linear Lane-Emden equation

y''(x) + \frac{8}{x} y'(x) + x\, y(x) = x^5 - x^4 + 44x^2 - 30x, \quad x \geq 0, \qquad y(0) = 0, \quad y'(0) = 0, \qquad (7.29)
which has the exact solution y(x) = x^4 - x^3 and has been solved with linearization, HPM, HAM, and two-step ADM (TSADM) methods (see, e.g., (Ramos, 2005;
Fig. 7.1 a Numerical results for training points in [0, 1]. b Obtained training errors (Test case 1)
Fig. 7.2 a Numerical results for training points in [0, 2]. b Obtained training errors (Test case 1)
Fig. 7.3 Error function for testing points a in [0, 1] with M = 50 and N = 30 and b in [0, 2] with
M = 50 and N = 180 (Test case 1)
Chowdhury and Hashim, 2009; Bataineh et al., 2009; Zhang et al., 2006)). By apply-
ing Eqs. 7.26–7.27, the approximate solutions are calculated. The numerical solu-
tions together with the training error function by using 30 points in [0, 10] have been
presented in Fig. 7.4.
The proposed algorithm is also tested with some arbitrary points in Table 7.3 for
N = 60 and L = 18. Moreover, in Fig. 7.5, the error function in equidistant testing
data has been plotted. The maximum norm of error by using 50 testing points is
7.77 × 10−12 .
In Table 7.4, the norm of testing error for different values of N has been obtained.
We can see that, with increasing the number of training points, the testing error
decreases, which shows the good performance and convergence of our algorithm to
solve this example.
Test case 3: Ramos (2005), Yıldırım and Öziş (2009), Chowdhury and Hashim (2007), Zhang et al. (2006) In this test case, we consider f(x, y) = y, h(x) = 6 + 12x + x^2 + x^3, k = 2, A = 0, and B = 0 in Eq. 7.18, which gives the linear Lane-Emden equation
Table 7.1 The absolute errors of the present method for testing points in [0, 2] with N = 180 and
L = 4.59 (Test case 1)
Testing data Error Exact value
0.00 0.00000 1.000000000
0.01 2.2730 × 10−17 1.000100005
0.02 3.6336 × 10−18 1.000400080
0.05 1.0961 × 10−16 1.002503127
0.10 1.4026 × 10−16 1.010050167
0.20 1.1115 × 10−15 1.040810774
0.50 7.4356 × 10−16 1.284025416
0.70 2.8378 × 10−15 1.632316219
0.80 1.5331 × 10−15 1.896480879
0.90 8.3772 × 10−15 2.247907986
1.00 1.8601 × 10−14 2.718281828
1.1 4.0474 × 10−15 3.353484653
1.2 2.6672 × 10−14 4.220695816
1.5 3.9665 × 10−14 9.487735836
1.7 9.3981 × 10−15 17.99330960
1.8 4.1961 × 10−14 25.53372174
1.9 3.0212 × 10−14 36.96605281
2.0 3.6044 × 10−16 54.59815003
Table 7.2 Maximum norm of errors for testing data in [0, 2] with M = 50 and different values of
N (Test case 1)
N Error norm N Error norm
12 1.41 × 10−1 80 2.79 × 10−11
20 2.51 × 10−4 100 3.03 × 10−12
30 1.31 × 10−6 120 6.41 × 10−13
40 4.20 × 10−8 140 2.15 × 10−13
50 5.69 × 10−9 150 1.30 × 10−13
60 4.77 × 10−10 180 4.46 × 10−14
y''(x) + \frac{2}{x} y'(x) + y(x) = 6 + 12x + x^2 + x^3, \quad x \geq 0, \qquad y(0) = 0, \quad y'(0) = 0, \qquad (7.30)
Fig. 7.4 a Numerical results for training points in [0, 10]. b Obtained training errors (Test case 2)
Table 7.3 The absolute errors of the present method for testing points with N = 60 and L = 18
(Test case 2)
Testing data Error Exact value
0.00 0 0.0000000000
0.01 4.8362 × 10−16 −0.0000009900
0.10 1.4727 × 10−15 −0.0009000000
0.50 1.7157 × 10−15 −0.0625000000
1.00 7.8840 × 10−15 0.0000000000
2.00 2.7002 × 10−15 8.000000000
3.00 2.4732 × 10−14 54.00000000
4.00 2.2525 × 10−14 192.0000000
5.00 4.1896 × 10−14 500.0000000
6.00 6.3632 × 10−15 1080.000000
7.00 5.4291 × 10−14 2058.000000
8.00 7.0818 × 10−14 3584.00000
9.00 1.0890 × 10−14 5832.00000
10.00 6.5032 × 10−16 9000.000000
which has the exact solution y(x) = x^2 + x^3. This example has also been solved in Ramos (2005), Yıldırım and Öziş (2009), Chowdhury and Hashim (2007), and Zhang et al. (2006) with linearization, VIM, HPM, and TSADM methods, respectively.
The proposed method is used and the numerical solutions of this example are
obtained in 30 training points which can be seen in Fig. 7.6. It should be noted that
the optimal value of L = 14 has been used in this example. The testing error function
is also shown in Fig. 7.7 with 50 equidistant points. In Table 7.5, the numerical results
in arbitrary testing data with N = 60 have been reported which show the efficiency
of the LS-SVM model for solving this kind of problem. The maximum norm of
162 M. Razzaghi et al.
Fig. 7.5 Error function for 50 equidistant testing points with N = 30 and L = 18 (Test case 2)
Table 7.4 Maximum norm of testing errors obtained for M = 50 and L = 18 with different values
of N (Test case 2)
N Error norm
8 3.78 × 10−1
12 2.14 × 10−4
20 1.31 × 10−9
30 7.77 × 10−12
40 1.32 × 10−12
50 2.30 × 10−13
60 7.08 × 10−14
Fig. 7.6 a Numerical results with training points in [0, 10]. b obtained training errors (Test
case 3)
Table 7.5 The absolute errors of the present method in testing points with N = 60 and L = 14
(Test case 3)
Testing data Error Exact value
0.00 2.9657 × 10−28 0.0000000000
0.01 3.6956 × 10−17 0.0001010000
0.10 2.1298 × 10−16 0.0110000000
0.50 6.8302 × 10−17 0.3750000000
1.00 8.2499 × 10−17 2.0000000000
2.00 8.3684 × 10−17 12.000000000
3.00 6.7208 × 10−17 36.000000000
4.00 2.2338 × 10−17 80.000000000
5.00 1.5048 × 10−16 150.00000000
6.00 3.9035 × 10−17 252.00000000
7.00 1.3429 × 10−16 392.00000000
8.00 2.4277 × 10−17 576.00000000
9.00 4.1123 × 10−17 810.00000000
10.00 6.5238 × 10−18 1100.0000000
testing errors with different values of N and M = 50 is recorded in Table 7.6, which confirms the convergence of the LS-SVM model.
Test case 4: Hadian Rasanan et al. (2020), Omidi et al. (2021) (Standard Lane-
Emden equation) Considering f (x, y) = y m , k = 2, h(x) = 0, A = 1, and B = 0 in
Eq. 7.18, then the standard Lane-Emden equation is
y''(x) + \frac{2}{x} y'(x) + y^m(x) = 0, \quad x \geq 0, \qquad y(0) = 1, \quad y'(0) = 0, \qquad (7.31)
Table 7.6 Maximum norm of testing errors for M = 50 and L = 14 with different values of N
(Test case 3)
N Error norm
8 4.88 × 10−2
12 5.85 × 10−5
20 9.15 × 10−11
30 4.42 × 10−14
40 2.61 × 10−15
50 3.89 × 10−16
60 1.82 × 10−16
Fig. 7.8 Numerical results with training points in [0, 10] (a) and obtained training errors (b) for
m = 0 (Test case 4)
The exact solutions for m = 0, 1, and 5 are

y(x) = 1 - \frac{1}{3!}x^2, \qquad y(x) = \frac{\sin(x)}{x}, \qquad \text{and} \qquad y(x) = \left(1 + \frac{x^2}{3}\right)^{-1/2}, \qquad (7.32)
3! x 3
respectively. By applying the LS-SVM formulation to solve the standard Lane-
Emden equation, the approximate solutions are calculated. The numerical solutions
of this example for m = 0 and 30 training points together with the error function
are shown in Fig. 7.8. For testing our algorithm, based on 50 equidistant points, the
obtained error function is presented in Fig. 7.9 with 30 training points. Moreover, the
numerical approximations for arbitrary testing data have been reported in Table 7.7,
which shows the accuracy of our proposed method.
Fig. 7.9 Error function in equidistant testing points with m = 0, N = 30, and L = 30 (Test
case 4)
Table 7.7 The absolute errors of proposed method in testing points with m = 0, N = 60, and
L = 30 (Test case 4)
Testing data Error Exact value
0 0.00000 1.00000000
0.1 1.2870 × 10−22 0.99833333
0.5 4.3059 × 10−22 0.95833333
1.0 7.0415 × 10−22 0.83333333
5.0 1.1458 × 10−20 −3.16666666
6.0 7.4118 × 10−21 −5.00000000
6.8 1.5090 × 10−20 −6.70666666
The numerical results for m = 1 have been presented in Fig. 7.10 with 30 training
points. Its related error function has also been plotted. The testing error function is
displayed in Fig. 7.11, where the maximum norm of error is equal to 1.56 × 10−12 .
The numerical results for arbitrary values of testing data are also shown in Table 7.8
for m = 1 and N = 60.
By using this model and for m = 0, 1, the maximum norm of errors for differ-
ent numbers of training points have been reported in Table 7.9, which shows the
convergence of the method.
Fig. 7.10 Numerical results with training points in [0, 10] (a) and obtained training errors (b) for
m = 1 (Test case 4)
Fig. 7.11 Error function in equidistant testing points with m = 1, N = 30, and L = 30 (Test
case 4)
Table 7.8 The absolute errors of the present method in testing points with m = 1, N = 60, and
L = 30 (Test case 4)
Testing data Error Exact value
0 1.80 × 10−32 1.0000000000
0.1 6.28 × 10−21 0.9983341665
0.5 1.52 × 10−20 0.9588510772
1.0 4.27 × 10−20 0.8414709848
5.0 2.86 × 10−19 −0.1917848549
6.0 3.71 × 10−19 −0.0465692497
6.8 1.52 × 10−19 0.0726637280
Table 7.9 Maximum norm of testing errors for M = 50 and L = 30 with different values of N
(Test case 4)
N Error norm (m = 0) Error norm (m = 1)
8 7.99 × 10−7 9.42 × 10−3
12 4.30 × 10−9 6.10 × 10−4
20 6.78 × 10−18 2.60 × 10−9
30 8.46 × 10−19 1.56 × 10−12
40 8.71 × 10−20 2.91 × 10−15
50 5.81 × 10−20 1.73 × 10−17
60 1.54 × 10−20 3.72 × 10−19
7.6 Conclusion
In this chapter, we developed the least squares-support vector machine model for
solving various forms of Lane-Emden type equations. For solving this problem,
the collocation LS-SVM formulation was applied and the problem was solved in
dual form. The rational Legendre functions were also employed to construct kernel
function, because of their good properties to approximate functions on semi-infinite
domains.
We used the shifted roots of the Legendre polynomials as the training points of our algorithm and equidistant points as the testing points. The numerical results by applying the
training and testing points show the accuracy of our proposed model. Moreover,
the exponential convergence of our proposed method was achieved by choosing the
different numbers of training points and basis functions. In other words, when the
number of training points is increased, the norm of errors decreased exponentially.
References
Abbasbandy, S., Modarrespoor, D., Parand, K., Rad, J.A.: Analytical solution of the transpira-
tion on the boundary layer flow and heat transfer over a vertical slender cylinder. Quaestiones
Mathematicae 36, 353–380 (2013)
Anderson, D., Yunes, N., Barausse, E.: Effect of cosmological evolution on Solar System constraints
and on the scalarization of neutron stars in massless scalar-tensor theories. Phys. Rev. D 94,
104064 (2016)
Aslanov, A.: Determination of convergence intervals of the series solutions of Emden-Fowler equa-
tions using polytropes and isothermal spheres. Phys. Lett. A 372, 3555–3561 (2008)
Aslanov, A.: A generalization of the Lane-Emden equation. Int. J. Comput. Math. 85, 1709–1725
(2008)
Bataineh, A.S., Noorani, M.S.M., Hashim, I.: Homotopy analysis method for singular IVPs of
Emden-Fowler type. Commun. Nonlinear Sci. Numer. Simul. 14, 1121–1131 (2009)
Bender, C.M., Milton, K.A., Pinsky, S.S., Simmons, L.M., Jr.: A new perturbative approach to
nonlinear problems. J. Math. Phys. 30, 1447–1455 (1989)
Bristeau, M.O., Pironneau, O., Glowinski, R., Periaux, J., Perrier, P.: On the numerical solution of
nonlinear problems in fluid dynamics by least squares and finite element methods (I) least square
formulations and conjugate gradient solution of the continuous problems. Comput. Methods
Appl. Mech. Eng. 17, 619–657 (1979)
Chandrasekhar, S., Chandrasekhar, S.: An introduction to the study of stellar structure, vol. 2. North
Chelmsford, Courier Corporation (1957)
Chowdhury, M.S.H., Hashim, I.: Solutions of a class of singular second-order IVPs by Homotopy-
perturbation method. Phys. Lett. A 365, 439–447 (2007)
Chowdhury, M.S.H., Hashim, I.: Solutions of Emden-Fowler equations by Homotopy-perturbation
method. Nonlinear Anal. Real World Appl. 10, 104–115 (2009)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Dehghan, M., Shakeri, F.: The use of the decomposition procedure of Adomian for solving a delay
differential equation arising in electrodynamics. Phys. Scr. 78, 065004 (2008)
Dehghan, M., Shakeri, F.: Approximate solution of a differential equation arising in astrophysics
using the variational iteration method. New Astron. 13, 53–59 (2008)
Dehghan, M., Tatari, M.: The use of Adomian decomposition method for solving problems in
calculus of variations. Math. Probl. Eng. 2006, 1–12 (2006)
Emden, R.: Gaskugeln: Anwendungen der mechanischen Warmetheorie auf kosmologische und
meteorologische Probleme, BG Teubner (1907)
Farzaneh-Gord, M., Rahbari, H.R.: Unsteady natural gas flow within pipeline network, an analytical
approach. J. Nat. Gas Sci. Eng. 28, 397–409 (2016)
Guo, B.Y., Shen, J., Wang, Z.Q.: A rational approximation and its applications to differential equa-
tions on the half line. J. Sci. Comput. 15, 117–147 (2000)
Hadian Rasanan, A.H., Rahmati, D., Gorgin, S., Parand, K.: A single layer fractional orthogonal
neural network for solving various types of Lane-Emden equation. New Astron. 75, 101307
(2020)
He, J.H.: Variational approach to the Lane-Emden equation. Appl. Math. Comput. 143, 539–541
(2003)
Horedt, G.P.: Polytropes: Applications in Astrophysics and Related Fields. Klawer Academic Pub-
lishers, New York (2004)
Hossayni, S.A., Rad, J.A., Parand, K., Abbasbandy, S.: Application of the exact operational matrices
for solving the Emden-Fowler equations, arising in astrophysics. Int. J. Ind. Math. 7, 351–374
(2015)
Kara, A. H., Mahomed, F. M.: Equivalent lagrangians and the solution of some classes of non-linear
equations. Int. J. Non Linear Mechcs. 27, 919–927 (1992)
Kara, A.H., Mahomed, F.M.: A note on the solutions of the Emden-Fowler equation. Int. J. Non
Linear Mechcs. 28, 379–384 (1993)
Kazem, S., Rad, J.A., Parand, K., Abbasbandy, S.: A new method for solving steady flow of a third-
grade fluid in a porous half space based on radial basis functions. Zeitschrift für Naturforschung
A 66, 591–598 (2011)
Kazem, S., Rad, J.A., Parand, K., Shaban, M., Saberi, H.: The numerical study on the unsteady flow
of gas in a semi-infinite porous medium using an RBF collocation method. Int. J. Comput. Math.
89, 2240–2258 (2012)
Khoury, J., Sakstein, J., Solomon, A.R.: Superfluids and the cosmological constant problem. J.
Cosmol. Astropart. Phys. 2018, 024 (2018)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial
differential equations. IEEE Trans. Neural Netw. 9, 987–1000 (1998)
Lane, H.J.: On the theoretical temperature of the sun, under the hypothesis of a gaseous mass
maintaining its volume by its internal heat, and depending on the laws of gases as known to
terrestrial experiment. Am. J. Sci. 2, 57–74 (1870)
Lázaro, M., Santamaría, I., Pérez-Cruz, F., Artés-Rodríguez, A.: Support vector regression for the
simultaneous learning of a multivariate function and its derivatives. Neurocomputing 69, 42–61
(2005)
Liao, S.: A new analytic algorithm of Lane-Emden type equations. Appl. Math. Comput. 142, 1–16
(2003)
Liu, Q.X., Liu, J.K., Chen, Y.M.: A second-order scheme for nonlinear fractional oscillators based
on Newmark-β algorithm. J. Comput. Nonlinear Dyn. 13, 084501 (2018)
Lusch, B., Kutz, J.N., Brunton, S.L.: Deep learning for universal linear embeddings of nonlinear
dynamics. Nat. Commun. 9, 1–10 (2018)
Malek, A., Beidokhti, R.S.: Numerical solution for high order differential equations using a hybrid
neural network-optimization method. Appl. Math. Comput. 183, 260–271 (2006)
Mall, S., Chakraverty, S.: Chebyshev neural network based model for solving Lane-Emden type
equations. Appl. Math. Comput. 247, 100–114 (2014)
Mall, S., Chakraverty, S.: Numerical solution of nonlinear singular initial value problems of Emden-
Fowler type using Chebyshev Neural Network method. Neurocomputing 149, 975–982 (2015)
Mall, S., Chakraverty, S.: Application of Legendre neural network for solving ordinary differential
equations. Appl. Soft Comput. 43, 347–356 (2016)
Mandelzweig, V.B., Tabakin, F.: Quasilinearization approach to nonlinear problems in physics with
application to nonlinear ODEs. Comput. Phys. Commun 141, 268–281 (2001)
Marzban, H.R., Tabrizidooz, H.R., Razzaghi, M.: Hybrid functions for nonlinear initial-value prob-
lems with applications to Lane-Emden type equations. Phys. Lett. A 372, 5883–5886 (2008)
Mehrkanoon, S., Suykens, J.A.: LS-SVM based solution for delay differential equations. J. Phys.:
Conf. Ser. 410, 012041 (2013)
Mehrkanoon, S., Falck, T., Suykens, J.A.: Approximate solutions to ordinary differential equations
using least squares support vector machines. IEEE Trans. Neural Netw. Learn. Syst. 23, 1356–
1367 (2012)
Omidi, M., Arab, B., Hadian Rasanan, A.H., Rad, J.A., Parand, K.: Learning nonlinear dynamics
with behavior ordinary/partial/system of the differential equations: looking through the lens of
orthogonal neural networks. Eng. Comput. 1–20 (2021)
Pakniyat, A., Parand, K., Jani, M.: Least squares support vector regression for differential equations
on unbounded domains. Chaos Solitons Fract. 151, 111232 (2021)
Parand, K., Nikarya, M., Rad, J.A., Baharifard, F.: A new reliable numerical algorithm based on the
first kind of Bessel functions to solve Prandtl-Blasius laminar viscous flow over a semi-infinite
flat plate. Zeitschrift für Naturforschung A 67 665-673 (2012)
Parand, K., Khaleqi, S.: The rational Chebyshev of second kind collocation method for solving a
class of astrophysics problems. Eur. Phys. J. Plus 131, 1–24 (2016)
Parand, K., Pirkhedri, A.: Sinc-collocation method for solving astrophysics equations. New Astron.
15, 533–537 (2010)
Parand, K., Rad, J.A.: Exp-function method for some nonlinear PDE’s and a nonlinear ODE’s. J.
King Saud Univ.-Sci. 24, 1–10 (2012)
Parand, K., Razzaghi, M.: Rational Legendre approximation for solving some physical problems
on semi-infinite intervals. Phys. Scr. 69, 353–357 (2004)
Parand, K., Shahini, M., Dehghan, M.: Rational Legendre pseudospectral approach for solving
nonlinear differential equations of Lane-Emden type. J. Comput. Phys. 228, 8830–8840 (2009)
Parand, K., Dehghan, M., Rezaei, A.R., Ghaderi, S.M.: An approximation algorithm for the solution
of the nonlinear Lane-Emden type equations arising in astrophysics using Hermite functions
collocation method. Comput. Phys. Commun. 181, 1096–1108 (2010)
Parand, K., Abbasbandy, S., Kazem, S., Rad, J.A.: A novel application of radial basis functions for
solving a model of first-order integro-ordinary differential equation. Commun. Nonlinear Sci.
Numer. Simul. 16, 4250–4258 (2011)
Parand, K., Nikarya, M., Rad, J.A.: Solving non-linear Lane-Emden type equations using Bessel
orthogonal functions collocation method. Celest. Mech. Dyn. Astron. 116, 97–107 (2013)
Parand, K., Hossayni, S.A., Rad, J.A.: An operation matrix method based on Bernstein polynomials
for Riccati differential equation and Volterra population model. Appl. Math. Model. 40, 993–1011
(2016)
170 M. Razzaghi et al.
Parand, K., Lotfi, Y., Rad, J.A.: An accurate numerical analysis of the laminar two-dimensional
flow of an incompressible Eyring-Powell fluid over a linear stretching sheet. Eur. Phys. J. Plus
132, 1–21 (2017)
Ramos, J.I.: Linearization techniques for singular initial-value problems of ordinary differential
equations. Appl. Math. Comput. 161, 525–542 (2005)
Ramos, J.I.: Series approach to the Lane-Emden equation and comparison with the homotopy
perturbation method. Chaos Solitons Fract. 38, 400–408 (2008)
Rodrigues, C., Simoes, F.M., da Costa, A.P., Froio, D., Rizzi, E.: Finite element dynamic analysis
of beams on nonlinear elastic foundations under a moving oscillator. Eur. J. Mech. A Solids 68,
9–24 (2018)
Shawagfeh, N.T.: Nonperturbative approximate solution for Lane-Emden equation. J. Math. Phys.
34, 4364–4369 (1993)
Singh, O.P., Pandey, R.K., Singh, V.K.: An analytic algorithm of Lane-Emden type equations arising
in astrophysics using modified Homotopy analysis method. Comput. Phys. Commun. 180, 1116–
1124 (2009)
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares
Support Vector Machines. World Scientific, NJ (2002)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Wazwaz, A.M.: A new algorithm for solving differential equations of Lane-Emden type. Appl.
Math. Comput. 118, 287–310 (2001)
Wazwaz, A.M.: The modified decomposition method for analytic treatment of differential equations.
Appl. Math. Comput. 173, 165–176 (2006)
Wood, D.O.: Monographs on physics. In: The Emission of Electricity from Hot Bodies. Longmans,
Green and Company (1921)
Yıldırım, A., Öziş, T.: Solutions of singular IVPs of Lane-Emden type by Homotopy perturbation
method. Phys. Lett. A 369, 70–76 (2007)
Yıldırım, A., Öziş, T.: Solutions of singular IVPs of Lane-Emden type by the variational iteration
method. Nonlinear Anal. Theory Methods Appl. 70, 2480–2484 (2009)
Yousefi, S.A.: Legendre wavelets method for solving differential equations of Lane-Emden type.
Appl. Math. Comput. 181, 1417–1422 (2006)
Yüzbaşı, Ş, Sezer, M.: An improved Bessel collocation method with a residual error function to
solve a class of Lane-Emden differential equations. Math. Comput. Model. 57, 1298–1311 (2013)
Zhang, B.Q., Wu, Q.B., Luo, X.G.: Experimentation with two-step Adomian decomposition method
to solve evolution models. Appl. Math. Comput. 175, 1495–1502 (2006)
Chapter 8
Solving Partial Differential Equations
by LS-SVM
Abstract In recent years, much attention has been paid to machine learning-based
numerical approaches due to their applications in solving difficult high-dimensional
problems. In this chapter, a numerical method based on support vector machines is
proposed to solve second-order time-dependent partial differential equations. This
method is called the least squares support vector machines (LS-SVM) collocation
approach. In this approach, first, the time dimension is discretized by the Crank–
Nicolson algorithm, then, the optimal representation of the solution is obtained in
the primal setting. Using KKT optimality conditions, the dual formulation is derived,
and at the end, the problem is converted to a linear system of algebraic equations that
can be solved by standard solvers. The Fokker–Planck and generalized Fitzhugh–
Nagumo equations are considered as test cases to demonstrate the proposed effec-
tiveness of the scheme. Moreover, two kinds of orthogonal kernel functions are
introduced for each example, and their performances are compared.
8.1 Introduction
Nowadays, most mathematical models of problems in science have more than one
independent variable, which usually represent time and space (Moayeri
et al. 2020b; Mohammadi and Dehghan 2020, 2019; Hemami et al. 2021). These
models lead to partial differential equations (PDEs). However, such equations usu-
ally do not have exact/analytical solutions due to their complexity. Thus, numerical
methods help us approximate the solutions and simulate the models. Let
us consider the following general form of a second-order PDE:
$$A\frac{\partial^2 u}{\partial x^2} + B\frac{\partial^2 u}{\partial x\,\partial y} + C\frac{\partial^2 u}{\partial y^2} + D\frac{\partial u}{\partial x} + E\frac{\partial u}{\partial y} + Fu + G = 0, \qquad (8.1)$$
FVM (Ghidaglia et al. 2001; Bertolazzi and Manzini 2001), vertex-centered FVM
(Asouti et al. 2011; Zhang and Zou 2013), Petrov–Galerkin FVM (Dubois 2000;
Moosavi and Khelil 2008), etc. (Zhao et al. 1996; Liu et al. 2014; Fallah 2004).
• Spectral method (Shizgal 2015): Unlike the previous approaches, the spectral method
is a high-order global method. Here, the solution of the differential equation is
represented as a finite sum of basis functions, $\sum_{i=0}^{n} a_i \phi_i(x)$.
Usually, the basis functions $\phi_i(x)$ are orthogonal polynomials, because
their orthogonality properties make the calculations easier (Asghari et al.
2022). Now, different strategies are developed to calculate the coefficients in the
sum in order to satisfy the differential equation as well as possible. Generally, in
simple geometries for smooth problems, the spectral methods offer exponential
rates of convergence/spectral accuracy (Spalart et al. 1991; Mai-Duy 2006). This
method is divided into three main categories: collocation (Zayernouri and Kar-
niadakis 2014, 2015), Galerkin (Shen 1994; Chen et al. 2008), Petrov–Galerkin
methods (Gamba and Rjasanow 2018; Zayernouri et al. 2015), etc. (Moayeri et al.
2020b; Kopriva 2009; Delkhosh and Parand 2021; Latifi and Delkhosh 2020).
• Meshfree method (Fasshauer 2007; Liu 2003): The meshfree method is used to
establish a system of algebraic equations for the whole problem domain without
the use of a predefined mesh, or uses easily generable meshes in a much more
flexible or freer manner. It can be said that meshfree methods essentially use a
set of nodes scattered within the problem domain as well as on the boundaries
to represent the problem domain and its boundaries. The field functions are then
approximated locally using these nodes (Liu 2003). Meshfree methods have been
considered by many researchers in the last decade due to their high flexibility and
high accuracy of the numerical solution (Rad and Parand 2017a; Dehghan and
Shokri 2008). However, there are still many challenges and questions about these
methods that need to be answered (Abbasbandy and Shirzadi 2010, 2011), such
as optimal shape parameter selection in methods based on radial basis functions
and enforcing boundary conditions in meshfree methods. Some of the meshfree
methods are radial basis function approach (RBF) (Rad et al. 2012, 2014; Parand
et al. 2017; Kazem and Rad 2012; Kazem et al. 2012b; Rashedi et al. 2014;
Parand and Rad 2013), radial basis function generated finite difference (RBF-
FD) (Mohammadi et al. 2021; Abbaszadeh and Dehghan 2020a), meshfree local
Petrov–Galerkin (MLPG) (Rad and Parand 2017a, b; Abbaszadeh and Dehghan
2020b; Rad et al. 2015a), element free Galerkin (EFG) (Belytschko et al. 1994;
Dehghan and Narimani 2020), meshfree local radial point interpolation method
(MLRPIM) (Liu et al. 2005; Liu and Gu 2001; Rad and Ballestra 2015; Rad et al.
2015b), etc. (Hemami et al. 2020, 2019; Mohammadi and Dehghan 2020, 2019).
• Machine learning-based methods (Cortes and Vapnik 1995; Jordan and Mitchell
2015): With the growth of available scientific data and improving machine learn-
ing approaches, recently, researchers of scientific computing have been trying to
develop machine learning and deep learning algorithms for solving differential
equations (Aarts and Van Der Veer 2001; Cheung and See 2021). Especially, they
have had some successful attempts to solve some difficult problems that common
numerical methods are not well able to solve such as PDEs by noisy observations
$$\frac{\partial u}{\partial t} = A(x,t,u)\,u + B(x,t,u)\,\frac{\partial u}{\partial x} + C(x,t,u)\,\frac{\partial^2 u}{\partial x^2}, \qquad (8.2)$$
where $x \in \Omega \subset \mathbb{R}$ and $t \in [0, T]$.
There are two strategies for solving this PDE by SVM. The first is to apply the LS-SVM
approach to both the time and space dimensions simultaneously, as proposed in
(Mehrkanoon and Suykens 2015). The second is a semi-discrete approach: to solve
Eq. 8.2, the time dimension is discretized first, and LS-SVM is then applied to the
resulting problem in space. Details of the proposed algorithm are described below.
The Crank–Nicolson method is chosen for the time discretization because of its good
convergence properties and unconditional stability. Applying the Crank–Nicolson
finite-difference formula for the first-order time derivative to Eq. 8.2, we have
$$\frac{u^{i+1}(x) - u^{i}(x)}{\Delta t} = \frac{1}{2}\Big[ A(x,t_i,u^{i})\,u^{i+1}(x) + B(x,t_i,u^{i})\,\frac{\partial u^{i+1}}{\partial x} + C(x,t_i,u^{i})\,\frac{\partial^2 u^{i+1}}{\partial x^2} \Big] + \frac{1}{2}\Big[ A(x,t_i,u^{i})\,u^{i}(x) + B(x,t_i,u^{i})\,\frac{\partial u^{i}}{\partial x} + C(x,t_i,u^{i})\,\frac{\partial^2 u^{i}}{\partial x^2} \Big], \qquad (8.3)$$
where $u^{i}(x) = u(x, t_i)$ and $t_i = i\Delta t$. Moreover, $\Delta t = T/m$ is the time-step size.
This equation can be rewritten as
$$u^{i+1}(x) - \frac{\Delta t}{2}\Big[ A(x,t_i,u^{i})\,u^{i+1}(x) + B(x,t_i,u^{i})\,\frac{\partial u^{i+1}}{\partial x} + C(x,t_i,u^{i})\,\frac{\partial^2 u^{i+1}}{\partial x^2} \Big] = u^{i}(x) + \frac{\Delta t}{2}\Big[ A(x,t_i,u^{i})\,u^{i}(x) + B(x,t_i,u^{i})\,\frac{\partial u^{i}}{\partial x} + C(x,t_i,u^{i})\,\frac{\partial^2 u^{i}}{\partial x^2} \Big]. \qquad (8.4)$$
Now, the LS-SVM algorithm is applied to Eq. 8.4 to find the solution of the problem
Eq. 8.2. The known right-hand side of Eq. 8.4 at time step $i$ is denoted by
$$r^{i}(x) = u^{i}(x) + \frac{\Delta t}{2}\Big[ A(x,t_i,u^{i})\,u^{i}(x) + B(x,t_i,u^{i})\,\frac{\partial u^{i}}{\partial x} + C(x,t_i,u^{i})\,\frac{\partial^2 u^{i}}{\partial x^2} \Big].$$
$$\begin{aligned} \min_{w,e}\ & \frac{1}{2}\,w^{T}w + \frac{\gamma}{2}\,e^{T}e \qquad (8.5)\\ \text{s.t.}\ & \tilde{A}_i\big(w^{T}\varphi(x_i) + b\big) + \tilde{B}_i\,w^{T}\varphi'(x_i) + \tilde{C}_i\,w^{T}\varphi''(x_i) = r^{j}(x_i) + e_i, \quad i = 2,\dots,n-1,\\ & w^{T}\varphi(x_0) + b = p_1,\\ & w^{T}\varphi(x_n) + b = p_2, \end{aligned}$$
where $\tilde{A}_i = 1 - \frac{\Delta t}{2}A_i$, $\tilde{B}_i = -\frac{\Delta t}{2}B_i$, and $\tilde{C}_i = -\frac{\Delta t}{2}C_i$, the superscript $j$ denotes the current time step, and the collocation points are $\{x_i\}_{i=0}^{n}$. The Lagrangian of problem 8.5 is
$$\begin{aligned} G = \frac{1}{2}\,w^{T}w + \frac{\gamma}{2}\,e^{T}e &- \sum_{i=2}^{n-1} \alpha_i \Big[ w^{T}\big(\tilde{A}_i\varphi(x_i) + \tilde{B}_i\varphi'(x_i) + \tilde{C}_i\varphi''(x_i)\big) + \tilde{A}_i b - r^{j}(x_i) - e_i \Big] \\ &- \beta_1\big[w^{T}\varphi(x_0) + b - p_1\big] - \beta_2\big[w^{T}\varphi(x_n) + b - p_2\big]. \qquad (8.6) \end{aligned}$$
The Karush–Kuhn–Tucker (KKT) optimality conditions read
$$\begin{aligned} \frac{\partial G}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=2}^{n-1} \alpha_i\big(\tilde{A}_i\varphi(x_i) + \tilde{B}_i\varphi'(x_i) + \tilde{C}_i\varphi''(x_i)\big) + \beta_1\varphi(x_0) + \beta_2\varphi(x_n),\\ \frac{\partial G}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=2}^{n-1} \alpha_i\tilde{A}_i + \beta_1 + \beta_2 = 0,\\ \frac{\partial G}{\partial e_l} = 0 \;&\Rightarrow\; e_l = \frac{\alpha_l}{\gamma},\\ \frac{\partial G}{\partial \alpha_l} = 0 \;&\Rightarrow\; w^{T}\big(\tilde{A}_l\varphi(x_l) + \tilde{B}_l\varphi'(x_l) + \tilde{C}_l\varphi''(x_l)\big) + \tilde{A}_l b - e_l = r^{j}(x_l),\\ \frac{\partial G}{\partial \beta_1} = 0 \;&\Rightarrow\; w^{T}\varphi(x_0) + b = p_1,\\ \frac{\partial G}{\partial \beta_2} = 0 \;&\Rightarrow\; w^{T}\varphi(x_n) + b = p_2. \end{aligned} \qquad (8.7)$$
By substituting the first and third equations into the fourth one, the primal variables
are eliminated. Since only products of the feature map and its derivatives appear, we
need the derivatives of the kernel function $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$.
According to Mercer's theorem, derivatives of the feature map can be written in terms
of derivatives of the kernel function. Therefore, we use the differential operator
$\nabla^{m,n}$ defined in Chaps. 3 and 4, which yields the kernel derivatives
$\chi^{(m,n)}(x_i, x_j) = \frac{\partial^{m+n}}{\partial x_i^{m}\,\partial x_j^{n}} K(x_i, x_j)$. So, at time step $k$, it can be written that
$$\begin{aligned} &\sum_{j=2}^{n-1} \alpha_j \Big[ \tilde{A}_i\big(\chi^{(0,0)}_{j,i}\tilde{A}_j + \chi^{(1,0)}_{j,i}\tilde{B}_j + \chi^{(2,0)}_{j,i}\tilde{C}_j\big) + \tilde{B}_i\big(\chi^{(0,1)}_{j,i}\tilde{A}_j + \chi^{(1,1)}_{j,i}\tilde{B}_j + \chi^{(2,1)}_{j,i}\tilde{C}_j\big) + \tilde{C}_i\big(\chi^{(0,2)}_{j,i}\tilde{A}_j + \chi^{(1,2)}_{j,i}\tilde{B}_j + \chi^{(2,2)}_{j,i}\tilde{C}_j\big) \Big] \\ &\qquad + \beta_1\big(\chi^{(0,0)}_{1,i}\tilde{A}_i + \chi^{(0,1)}_{1,i}\tilde{B}_i + \chi^{(0,2)}_{1,i}\tilde{C}_i\big) + \beta_2\big(\chi^{(0,0)}_{n,i}\tilde{A}_i + \chi^{(0,1)}_{n,i}\tilde{B}_i + \chi^{(0,2)}_{n,i}\tilde{C}_i\big) + \frac{\alpha_i}{\gamma} + \tilde{A}_i b = r^{k}_{i}, \quad i = 2,\dots,n-1,\\ &\sum_{j=2}^{n-1} \alpha_j\big(\chi^{(0,0)}_{j,1}\tilde{A}_j + \chi^{(1,0)}_{j,1}\tilde{B}_j + \chi^{(2,0)}_{j,1}\tilde{C}_j\big) + \chi^{(0,0)}_{1,1}\beta_1 + \chi^{(0,0)}_{n,1}\beta_2 + b = p_1,\\ &\sum_{j=2}^{n-1} \alpha_j\big(\chi^{(0,0)}_{j,n}\tilde{A}_j + \chi^{(1,0)}_{j,n}\tilde{B}_j + \chi^{(2,0)}_{j,n}\tilde{C}_j\big) + \chi^{(0,0)}_{1,n}\beta_1 + \chi^{(0,0)}_{n,n}\beta_2 + b = p_2,\\ &\sum_{j=2}^{n-1} \alpha_j\tilde{A}_j + \beta_1 + \beta_2 = 0. \qquad (8.8) \end{aligned}$$
This linear system can be written in matrix form using the notation
$$\mathbf{A} = [\tilde{A}_{2:n-1}], \quad \mathbf{B} = [\tilde{B}_{2:n-1}], \quad \mathbf{C} = [\tilde{C}_{2:n-1}],$$
$$M_1 = \chi^{(0,0)}_{1,2:n-1}\,\mathrm{Diag}(\mathbf{A}) + \chi^{(1,0)}_{1,2:n-1}\,\mathrm{Diag}(\mathbf{B}) + \chi^{(2,0)}_{1,2:n-1}\,\mathrm{Diag}(\mathbf{C}),$$
$$\mathbf{r}^{k} = [r^{k}_{i}]_{2:n-1}^{T}.$$
The approximate solution at time step $k$ is then recovered in the dual form as
$$\hat{u}^{k}(x) = \sum_{i=2}^{n-1} \alpha_i \Big[ \tilde{A}(x_i)\,\chi^{(0,0)}(x_i, x) + \tilde{B}(x_i)\,\chi^{(1,0)}(x_i, x) + \tilde{C}(x_i)\,\chi^{(2,0)}(x_i, x) \Big] + \beta_1\,\chi^{(0,0)}(x_0, x) + \beta_2\,\chi^{(0,0)}(x_n, x) + b.$$
Going through all of the above steps, the solution of the PDE Eq. 8.2 is approximated at
each time step.
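To make the workflow concrete, the following self-contained sketch assembles and solves the dual system 8.8 for a single Crank–Nicolson step of Example 1 below (A = 0, B = C = 1, exact solution u = x + t). It is only an illustration, not the implementation used to produce the reported tables: a Gaussian kernel with symbolically generated derivatives stands in for the chapter's Chebyshev/Legendre kernels, and the values of sigma, n, dt, and gamma are assumptions chosen for readability.

```python
import numpy as np
import sympy as sp

# chi^{(m,n)}(y, x) = d^{m+n} K(y, x) / (dy^m dx^n), generated symbolically.
# A Gaussian kernel is used purely as a stand-in for the chapter's
# Chebyshev/Legendre orthogonal kernels; sigma is an assumed width.
y_s, x_s = sp.symbols('y x', real=True)
sigma = 0.2
K_expr = sp.exp(-(y_s - x_s) ** 2 / (2 * sigma ** 2))

def make_chi(dm, dn):
    expr = K_expr
    if dm:
        expr = sp.diff(expr, y_s, dm)
    if dn:
        expr = sp.diff(expr, x_s, dn)
    return sp.lambdify((y_s, x_s), expr, 'numpy')

chi = {(dm, dn): make_chi(dm, dn) for dm in range(3) for dn in range(3)}

# Example 1 data: A = 0, B = C = 1, u(x, 0) = x, exact solution u = x + t.
n, dt, gamma = 10, 1e-3, 1e8                  # illustrative parameter choices
x = np.linspace(0.0, 1.0, n + 1)              # collocation points x_0 .. x_n
A, B, C = np.zeros(n + 1), np.ones(n + 1), np.ones(n + 1)
At, Bt, Ct = 1 - 0.5 * dt * A, -0.5 * dt * B, -0.5 * dt * C   # tilde coefficients

u0, u0x, u0xx = x.copy(), np.ones(n + 1), np.zeros(n + 1)     # u^0 and its derivatives
r = u0 + 0.5 * dt * (A * u0 + B * u0x + C * u0xx)             # right-hand side of Eq. 8.4
p1, p2 = dt, 1.0 + dt                         # Dirichlet values of the exact u at t = dt

inner = np.arange(1, n)                       # interior node indices
m = len(inner)                                # unknowns: alpha (m of them), beta1, beta2, b
M, rhs = np.zeros((m + 3, m + 3)), np.zeros(m + 3)

for a, i in enumerate(inner):                 # interior collocation rows of system (8.8)
    for c, j in enumerate(inner):
        kd = lambda dm, dn: chi[(dm, dn)](x[j], x[i])
        M[a, c] = (At[i] * (kd(0, 0) * At[j] + kd(1, 0) * Bt[j] + kd(2, 0) * Ct[j])
                   + Bt[i] * (kd(0, 1) * At[j] + kd(1, 1) * Bt[j] + kd(2, 1) * Ct[j])
                   + Ct[i] * (kd(0, 2) * At[j] + kd(1, 2) * Bt[j] + kd(2, 2) * Ct[j]))
    M[a, a] += 1.0 / gamma                    # the alpha_i / gamma regularization term
    for col, xb in ((m, x[0]), (m + 1, x[-1])):      # beta_1 and beta_2 columns
        M[a, col] = (chi[(0, 0)](xb, x[i]) * At[i] + chi[(0, 1)](xb, x[i]) * Bt[i]
                     + chi[(0, 2)](xb, x[i]) * Ct[i])
    M[a, m + 2] = At[i]                       # bias column
    rhs[a] = r[i]

for row, (xb, pval) in enumerate(((x[0], p1), (x[-1], p2)), start=m):  # boundary rows
    for c, j in enumerate(inner):
        M[row, c] = (chi[(0, 0)](x[j], xb) * At[j] + chi[(1, 0)](x[j], xb) * Bt[j]
                     + chi[(2, 0)](x[j], xb) * Ct[j])
    M[row, m], M[row, m + 1], M[row, m + 2] = chi[(0, 0)](x[0], xb), chi[(0, 0)](x[-1], xb), 1.0
    rhs[row] = pval

M[m + 2, :m] = At[inner]                      # last row: sum_j alpha_j*At_j + beta1 + beta2 = 0
M[m + 2, m] = M[m + 2, m + 1] = 1.0

sol = np.linalg.solve(M, rhs)
alpha, beta1, beta2, bias = sol[:m], sol[m], sol[m + 1], sol[m + 2]

def u_hat(xe):                                # dual-form approximation of u at t = dt
    s = sum(alpha[a] * (At[j] * chi[(0, 0)](x[j], xe) + Bt[j] * chi[(1, 0)](x[j], xe)
                        + Ct[j] * chi[(2, 0)](x[j], xe)) for a, j in enumerate(inner))
    return s + beta1 * chi[(0, 0)](x[0], xe) + beta2 * chi[(0, 0)](x[-1], xe) + bias

xe = np.linspace(0.0, 1.0, 101)
print('max error after one step:', np.abs(np.array([u_hat(v) for v in xe]) - (xe + dt)).max())
```

Marching over all M time steps would simply repeat the solve, rebuilding r^k from the previous approximation, whose spatial derivatives are available through chi^(1,0) and chi^(2,0).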
In the numerical experiments below, the kernel functions are constructed from Chebyshev
or Legendre polynomials $\{L_i(\cdot)\}$, where $n$ is the polynomial order (3 or 6 in
our examples) and $\delta$ is the decaying parameter.
In order to illustrate the accuracy of the proposed method, the $L_2$ and root-mean-
square (RMS) errors are computed as follows:
$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{n}\big(\hat{u}(x_i) - u(x_i)\big)^2}{n}}, \qquad L_2 = \sqrt{\sum_{i=1}^{n}\big|\hat{u}(x_i) - u(x_i)\big|^2}.$$
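A direct transcription of these two error measures, assuming the approximate and exact solutions have already been evaluated on the same set of points:

```python
import numpy as np

def rms_error(u_hat, u_exact):
    """Root-mean-square error over the evaluation points."""
    d = np.asarray(u_hat) - np.asarray(u_exact)
    return np.sqrt(np.mean(d ** 2))

def l2_error(u_hat, u_exact):
    """Discrete L2 error over the evaluation points."""
    d = np.asarray(u_hat) - np.asarray(u_exact)
    return np.sqrt(np.sum(np.abs(d) ** 2))
```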
$$\frac{\partial u}{\partial t} = \Big[ -\frac{\partial}{\partial x}\,\psi_1(x,t,u) + \frac{\partial^2}{\partial x^2}\,\psi_2(x,t,u) \Big] u, \qquad (8.11)$$
in which $\psi_1$ and $\psi_2$ are the drift and diffusion coefficients, respectively. If these
coefficients depend only on x and t, the PDE is called the forward Kolmogorov equation
(Conze et al. 2009; Risken 1989). There is another type of Fokker–Planck equation,
similar to the forward Kolmogorov equation, called the backward Kolmogorov equation
(Risken 1989; Flandoli and Zanco 2016), which has the form (Parand et al. 2018):
$$\frac{\partial u}{\partial t} = \Big[ -\psi_1(x,t)\,\frac{\partial}{\partial x} + \psi_2(x,t)\,\frac{\partial^2}{\partial x^2} \Big] u. \qquad (8.12)$$
Moreover, if ψ1 and ψ2 are dependent on u in addition to time and space, then
we have the nonlinear Fokker–Planck equation which has important applications in
biophysics (Xing et al. 2005), neuroscience (Hemami et al. 2021; Moayeri et al.
2021), engineering (Kazem et al. 2012a), laser physics (Blackmore et al. 1986),
nonlinear hydrodynamics (Zubarev and Morozov 1983), plasma physics (Peeters
and Strintzi 2008), pattern formation (Bengfort et al. 2016), and so on (Barkai 2001;
Tsurui and Ishikawa 1986).
Several fruitful studies have explored the Fokker–Planck equation with classical
numerical methods. For example, Vanaja (1992) presented an iterative algorithm
for solving this model, and Zorzano et al. (1999) employed a finite difference
approach for the two-dimensional Fokker–Planck equation. In 2006,
Dehghan and Tatari (2006) developed He's variational iter-
ation method (VIM) to approximate the solution of this equation. Moreover, Tatari
et al. (2007) investigated the application of the Adomian decomposition method
for solving different types of Fokker–Planck equations. One year later, Lakestani
and Dehghan (2008) obtained the numerical solution of the Fokker–Planck equation
using cubic B-spline scaling functions. In addition, Kazem et al. (2012a) proposed a
meshfree approach to solve linear and nonlinear Fokker–Planck equations. Recently,
a pseudo-spectral method was applied to approximate the solution of the Fokker–
Planck equation with high accuracy (Parand et al. 2018).
By simplifying Eq. 8.11, the Fokker–Planck equation takes the form of Eq. 8.2. It is
worth mentioning that if the equation is linear, then A(x, t, u) = 0. In the following,
we present a number of linear and nonlinear examples, all defined over $\Omega = [0, 1]$.
It should be noted that the average CPU times over all examples of the Fokker–
Planck equation and of the generalized FHN equation are about 2.48 s and 2.5 s,
respectively, depending on the number of spatial nodes and time steps. In addition,
the time complexity of the proposed method is of order $O(Mn^3)$, where M is the
number of time steps and n is the number of spatial nodes.
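As a rough illustration of this O(n^3) behaviour (not a reproduction of the chapter's timings), the dominant cost per time step is one dense solve, which can be timed as follows; absolute numbers depend entirely on the machine and BLAS library.

```python
import time
import numpy as np

# Time the dense linear solve that dominates each of the M time steps.
for n in (200, 400, 800):
    A = np.random.rand(n, n) + n * np.eye(n)   # well-conditioned test matrix
    b = np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, b)
    print(n, f'{time.perf_counter() - t0:.4f} s')
```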
8.3.1.1 Example 1
Consider Eq. 8.11 with ψ1 (x) = −1 and ψ2 (x) = 1, and initial condition f (x) =
x. Using these fixed values, we can write A(x, t, u) = 0, B(x, t, u) = 1, and
C(x, t, u) = 1. The exact solution of this test problem is u(x, t) = x + t (Kazem
et al. 2012a; Lakestani and Dehghan 2008).
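As a quick, optional sanity check (independent of the LS-SVM solver itself), the coefficient identification and the stated exact solution can be verified symbolically; the sketch below assumes only Eq. 8.11 with the constant coefficients above.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
u = sp.Function('u')(x, t)
psi1, psi2 = -1, 1                       # drift and diffusion of this example

# right-hand side of Eq. (8.11): -(psi1*u)_x + (psi2*u)_xx
rhs = -sp.diff(psi1 * u, x) + sp.diff(psi2 * u, x, 2)
print(sp.expand(rhs))                    # shows u_x + u_xx, i.e. A = 0, B = C = 1

u_exact = x + t                          # stated exact solution
residual = sp.diff(u_exact, t) - rhs.subs(u, u_exact).doit()
print(sp.simplify(residual))             # prints 0
```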
In this example, we consider $n = 15$, $\Delta t = 0.001$, and $\gamma = 10^{15}$. Figure 8.1 shows the
obtained approximation of u(x, t) with the third-order Chebyshev kernel as an example.
Also, the $L_2$ and RMS errors of different kernels at various times are reported in
Tables 8.1 and 8.2. Note that in these tables, the number next to the polynomial name
indicates its order. Moreover, the value of the decaying parameter for each kernel is
specified in the tables.
8.3.1.2 Example 2
Consider the backward Kolmogorov equation (8.12) with $\psi_1(x) = -(x + 1)$,
$\psi_2(x, t) = x^2 \exp(t)$, and initial condition $f(x) = x + 1$. This equation has the
exact solution $u(x, t) = (x + 1)\exp(t)$ (Kazem et al. 2012a; Lakestani and Dehghan
2008). In this example, we have $A(x, t, u) = 0$, $B(x, t, u) = x + 1$, and
$C(x, t, u) = x^2 \exp(t)$. The parameters are set as $n = 10$, $\Delta t = 0.0001$, and
$\gamma = 10^{15}$.
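The same kind of symbolic check applies here; the sketch below only assumes Eq. 8.12 with the coefficients of this example.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
psi1, psi2 = -(x + 1), x**2 * sp.exp(t)            # coefficients of this example
u_exact = (x + 1) * sp.exp(t)

# backward Kolmogorov form of Eq. (8.12): u_t = -psi1*u_x + psi2*u_xx
residual = sp.diff(u_exact, t) - (-psi1 * sp.diff(u_exact, x) + psi2 * sp.diff(u_exact, x, 2))
print(sp.simplify(residual))                       # prints 0
```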
Fig. 8.1 Approximated solution of Example 1 by the third-order Chebyshev kernel with $n = 10$, $\Delta t = 10^{-4}$
Table 8.1 Numerical absolute errors (L 2 ) of the method for Example 1 with different kernels
t Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
(δ = 3) (δ = 1.5) (δ = 2) (δ = 1)
0.01 2.4178e-04 0.0010 1.4747e-04 4.0514e-04
0.25 3.8678e-04 0.0018 2.5679e-04 8.1972e-04
0.5 3.8821e-04 0.0018 2.5794e-04 8.2382e-04
0.7 3.8822e-04 0.0018 2.5795e-04 8.2385e-04
1 3.8822e-04 0.0018 2.5795e-04 8.2385e-04
Table 8.2 RMS errors of the method for Example 1 with different kernels
t Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
(δ = 3) (δ = 1.5) (δ = 3) (δ = 1)
0.01 6.2427e-05 2.6776e-04 3.8077e-05 1.0461e-04
0.25 9.9866e-05 0.0013 6.6303e-05 2.1165e-04
0.5 1.0024e-04 0.0013 6.6600e-05 2.1271e-04
0.7 1.0024e-04 0.0013 6.6602e-05 2.1272e-04
1 1.0024e-04 0.0013 6.6602e-05 2.1272e-04
Fig. 8.2 Approximated solution of Example 2 by the third-order Legendre kernel with $n = 15$, $\Delta t = 0.0001$
Table 8.3 Numerical absolute errors (L 2 ) of the method for Example 2 with different kernels
t Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
(δ = 3) (δ = 1.5) (δ = 3) (δ = 1.5)
0.01 3.6904e-04 0.0018 2.0748e-04 7.1382e-04
0.25 6.0740e-04 0.0043 2.7289e-04 0.0020
0.5 8.6224e-04 0.0064 3.6194e-04 0.0031
0.7 0.0011 0.0083 4.5389e-04 0.0040
1 0.0016 0.0116 6.3598e-04 0.0055
Table 8.4 RMS errors of the method for Example 2 with different kernels
t Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
(δ = 3) (δ = 1.5) (δ = 3) (δ = 1)
0.01 9.5285e-05 4.7187e-04 5.3572e-05 1.8431e-04
0.25 1.5683e-04 0.0011 7.0461e-05 5.1859e-04
0.5 2.2263e-04 0.0017 9.3453e-05 7.8761e-04
0.7 2.8540e-04 0.0021 1.1719e-04 0.0010
1 4.0064e-04 0.0030 1.6421e-04 0.0014
The obtained solution by the proposed method (with third-order Legendre kernel)
is demonstrated in Fig. 8.2. Additionally, Tables 8.3 and 8.4 depict the numerical
absolute errors and RMS errors of the presented method with different kernels. It
can be deduced that the Legendre kernel is generally a better option for this example.
8.3.1.3 Example 3
Fig. 8.3 Approximated solution of Example 3 by the sixth-order Legendre kernel with $n = 20$, $\Delta t = 0.0001$
Table 8.5 Numerical absolute errors (L 2 ) of the method for Example 3 with different kernels
t Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
(δ = 4) (δ = 3) (δ = 4) (δ = 4)
0.01 3.0002e-04 0.0010 2.1898e-04 5.0419e-04
0.25 3.0314e-04 0.0011 2.1239e-04 4.5307e-04
0.5 2.1730e-04 7.8259e-04 1.4755e-04 3.3263e-04
0.7 1.7019e-04 6.2286e-04 1.1327e-04 2.6523e-04
1 1.2321e-04 4.5854e-04 8.0082e-05 1.9611e-04
Table 8.6 RMS errors of the method for Example 3 with different kernels
t Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
(δ = 4) (δ = 1.5) (δ = 4) (δ = 4)
0.01 6.7087e-05 2.2583e-04 4.8965e-05 1.1274e-04
0.25 6.7784e-05 2.3724e-04 4.7492e-05 1.0131e-04
0.5 4.8591e-05 1.7499e-04 3.2994e-05 7.4379e-05
0.7 3.8055e-05 1.3927e-04 2.5328e-05 5.9307e-05
1 2.7551e-05 1.0253e-04 1.7907e-05 4.3852e-05
$$\frac{\partial u}{\partial t} = -v(t)\,\frac{\partial u}{\partial x} + \mu(t)\,\frac{\partial^2 u}{\partial x^2} + \eta(t)\,u(1 - u)(\rho - u). \qquad (8.14)$$
Table 8.7 Numerical methods used to solve different types of FHN models
Authors Method Type of FHN Year
Li and Guo (2006) First integral method 1D-FHN 2006
Abbasbandy (2008) Homotopy analysis 1D-FHN 2008
method
Olmos and Shizgal (2009) Pseudospectral method 1D- and 2D-FHN systems 2009
Hariharan and Kannan Haar wavelet method 1D-FHN 2010
(2010)
Van Gorder and Vajravelu Variational formulation Nagumo-Telegraph 2010
(2010)
Dehghan and Taleei Homotopy perturbation 1D-FHN 2010
(2010) method
Bhrawy (2013) Jacobi–Gauss–Lobatto Generalized FHN 2013
collocation
Jiwari et al. (2014) Polynomial quadrature Generalized FHN 2014
method
Moghaderi and Dehghan two-grid finite difference 1D- and 2D-FHN systems 2016
(2016) method
Kumar et al. (2018) q-homotopy analysis 1D fractional FHN 2018
method
Hemami et al. (2019) CS-RBF method 1D- and 2D-FHN systems 2019
Hemami et al. (2020) RBF-FD method 2D-FHN systems 2020
Moayeri et al. (2020a) Generalized Lagrange 1D- and 2D-FHN systems 2020
method
Moayeri et al. (2020b) Legendre spectral element 1D- and 2D-FHN systems 2020
Abdel-Aty et al. (2020) Improved B-spline method 1D fractional FHN 2020
Clearly, when we set v(t) = 0, μ(t) = 1, and η(t) = −1 in this model, we recover
the model in Eq. 8.13.
Different types of FHN equations have been studied numerically by several
researchers, as summarized in Table 8.7.
8.3.2.1 Example 1
Consider the non-classical FHN model Eq. 8.14 with v(t) = 0, μ(t) = 1, η(t) = −1,
ρ = 2, initial condition $u(x, 0) = \frac{1}{2} + \frac{1}{2}\tanh\!\big(\frac{x}{2\sqrt{2}}\big)$, and domain
$(x, t) \in [-10, 10] \times [0, 1]$. In this example, we have $A(x, t, u) = -\rho + (1 + \rho)u - u^2$,
$B(x, t, u) = 0$, and $C(x, t, u) = 1$. The exact solution of this test problem is
$$u(x,t) = \frac{1}{2} + \frac{1}{2}\tanh\!\left(\frac{x - \frac{2\rho - 1}{\sqrt{2}}\,t}{2\sqrt{2}}\right)$$
(Bhrawy 2013; Jiwari et al. 2014; Wazwaz 2007).
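Before running the solver, this traveling-wave solution can be checked directly against Eq. 8.14; the short sketch below evaluates the residual numerically on a sample grid (the grid itself is an illustrative choice, not part of the chapter's method).

```python
import numpy as np
import sympy as sp

x, t = sp.symbols('x t', real=True)
rho, v, mu, eta = 2, 0, 1, -1                       # parameter values of this example
u = sp.Rational(1, 2) + sp.Rational(1, 2) * sp.tanh((x - (2 * rho - 1) / sp.sqrt(2) * t) / (2 * sp.sqrt(2)))

# residual of Eq. (8.14): u_t + v*u_x - mu*u_xx - eta*u*(1-u)*(rho-u)
res = sp.diff(u, t) + v * sp.diff(u, x) - mu * sp.diff(u, x, 2) - eta * u * (1 - u) * (rho - u)
res_f = sp.lambdify((x, t), res, 'numpy')

X, T = np.meshgrid(np.linspace(-10, 10, 41), np.linspace(0, 1, 11))
print('max |residual| on a sample grid:', np.abs(res_f(X, T)).max())   # value at round-off level
```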
Fig. 8.4 Approximated solution of Example 1 by the sixth-order Legendre kernel with $n = 40$, $\Delta t = 2.5 \times 10^{-4}$
Table 8.8 Numerical absolute errors (L 2 ) of the method for Example 1 with different kernels
t δ Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
0.01 30 9.8330e-07 9.6655e-07 9.8167e-07 9.7193e-07
0.25 30 2.4919e-05 2.4803e-05 2.5029e-05 2.4956e-05
0.50 30 5.1011e-05 5.0812e-05 5.1289e-05 5.1132e-05
0.70 30 7.2704e-05 7.2357e-05 7.3108e-05 7.2815e-05
1.00 30 1.0648e-04 1.0576e-04 1.0706e-04 1.0644e-04
0.01 50 4.7375e-06 1.4557e-06 3.9728e-06 1.3407e-06
0.25 50 5.2206e-05 2.6680e-05 4.4305e-05 2.6237e-05
0.50 50 6.9759e-05 5.1818e-05 6.3688e-05 5.1808e-05
0.70 50 8.6685e-05 7.3064e-05 8.2108e-05 7.3286e-05
1.00 50 1.1599e-04 1.0624e-04 1.1299e-04 1.0675e-04
In this example, we set $n = 40$, $\Delta t = 2.5 \times 10^{-4}$, and $\gamma = 10^{15}$. Figure 8.4 shows the
obtained approximation of u(x, t) with the sixth-order Legendre kernel as an example.
Also, the $L_2$ and RMS errors of different kernels at various times are reported in
Tables 8.8 and 8.9.
Table 8.9 RMS errors of the method for Example 1 with different kernels
t δ Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
0.01 30 1.5547e-07 1.5283e-07 1.5522e-07 1.5368e-07
0.25 30 7.8801e-07 7.8433e-07 7.9149e-07 7.8917e-07
0.50 30 1.1406e-06 1.1362e-06 1.1469e-06 1.1433e-06
0.70 30 1.3740e-06 1.3674e-06 1.3816e-06 1.3761e-06
1.00 30 1.6836e-06 1.6723e-06 1.6927e-06 1.6829e-06
0.01 50 7.4906e-07 2.3017e-07 6.2816e-07 2.1198e-07
0.25 50 1.6509e-06 8.4368e-07 1.4010e-06 8.2970e-07
0.50 50 1.5598e-06 1.1587e-06 1.4241e-06 1.1585e-06
0.70 50 1.6382e-06 1.3808e-06 1.5517e-06 1.3850e-06
1.00 50 1.8340e-06 1.6798e-06 1.7865e-06 1.6878e-06
8.3.2.2 Example 2
2
) tanh( 2(1 − ρ) x4 + 1−ρ 4
t) (Kawahara and Tanaka 1983; Jiwari et al. 2014;
Wazwaz and Gorguis 2004).
Fig. 8.5 Approximated solution of Example 2 by the sixth-order Legendre kernel with $n = 40$, $\Delta t = 2.5 \times 10^{-4}$
Table 8.10 Numerical absolute errors (L 2 ) of the method for Example 2 with different kernels
t δ Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
0.01 8e3 2.1195e-06 1.7618e-06 1.7015e-06 1.5138e-06
0.25 8e3 3.5893e-06 3.1130e-06 2.0973e-06 2.7182e-06
0.50 8e3 3.5383e-06 2.9330e-06 2.0790e-06 2.6138e-06
0.70 8e3 3.4609e-06 2.9069e-06 1.8524e-06 3.1523e-06
1.00 8e3 3.3578e-06 2.7996e-06 2.3738e-06 2.4715e-06
0.01 1e4 9.1689e-06 9.2317e-06 9.1726e-06 9.1931e-06
0.25 1e4 1.5310e-05 1.5417e-05 1.5318e-05 1.5352e-05
0.50 1e4 1.5089e-05 1.5195e-05 1.5097e-05 1.5130e-05
0.70 1e4 1.4869e-05 1.4974e-05 1.4876e-05 1.4909e-05
1.00 1e4 1.4472e-05 1.4575e-05 1.4479e-05 1.4512e-05
Table 8.11 RMS errors of the method for Example 2 with different kernels
t δ Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
0.01 8e3 3.3513e-07 2.7856e-07 2.6904e-07 2.3935e-07
0.25 8e3 5.6752e-07 4.9222e-07 3.3162e-07 4.2978e-07
0.50 8e3 5.5945e-07 4.6375e-07 3.2871e-07 4.1328e-07
0.70 8e3 5.4722e-07 4.5962e-07 2.9290e-07 4.9843e-07
1.00 8e3 5.3091e-07 4.4266e-07 3.7532e-07 3.9077e-07
0.01 1e4 1.4497e-06 1.4597e-06 1.4503e-06 1.4536e-06
0.25 1e4 2.4208e-06 2.4376e-06 2.4220e-06 2.4273e-06
0.50 1e4 2.3858e-06 2.4025e-06 2.3870e-06 2.3923e-06
0.70 1e4 2.3510e-06 2.3675e-06 2.3520e-06 2.3574e-06
1.00 1e4 2.2882e-06 2.3045e-06 2.2893e-06 2.2946e-06
8.3.2.3 Example 3
Fig. 8.6 Approximated solution of Example 3 by the sixth-order Legendre kernel with $n = 40$, $\Delta t = 2.5 \times 10^{-4}$
Table 8.12 Numerical absolute errors (L 2 ) of the method for Example 3 with different kernels
t δ Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
0.01 80 4.6645e-05 1.1548e-04 3.2880e-05 2.3228e-04
0.25 80 2.3342e-04 1.7331e-04 1.8781e-04 3.2000e-03
0.50 80 2.4288e-04 1.7697e-04 1.9577e-04 4.7000e-03
0.70 80 2.4653e-04 1.6815e-04 1.9935e-04 5.4000e-03
1.00 80 2.5033e-04 1.4136e-04 2.0333e-04 5.6000e-03
0.01 100 6.2498e-05 1.1966e-04 4.8828e-05 1.9291e-04
0.25 100 3.6141e-04 1.9420e-04 2.9325e-04 2.5000e-03
0.50 100 3.7364e-04 1.9940e-04 3.0328e-04 3.5000e-03
0.70 100 3.7937e-04 1.9260e-04 3.0863e-04 3.9000e-03
1.00 100 3.8635e-04 1.7084e-04 3.1527e-04 4.0000e-03
2013; Jiwari et al. 2014; Triki and Wazwaz 2013). In this example, we set $n = 40$,
$\Delta t = 2.5 \times 10^{-4}$, and $\gamma = 10^{15}$. Figure 8.6 shows the obtained approximation of
u(x, t) with the sixth-order Legendre kernel as an example. Also, the $L_2$ and RMS errors
of different kernels at various times are reported in Tables 8.12 and 8.13.
Table 8.13 RMS errors of the method for Example 3 with different kernels
t δ Chebyshev-3 Chebyshev-6 Legendre-3 Legendre-6
0.01 80 7.3752e-06 1.8259e-05 5.1988e-06 3.6726e-05
0.25 80 3.6907e-05 2.7403e-05 2.9695e-05 5.0071e-04
0.50 80 3.8403e-05 2.7981e-05 3.0954e-05 7.4712e-04
0.70 80 3.8980e-05 2.6586e-05 3.1521e-05 8.4603e-04
1.00 80 3.9580e-05 2.2351e-05 3.2149e-05 8.8481e-04
0.01 100 9.8818e-06 1.8921e-05 7.7204e-06 3.0502e-05
0.25 100 5.7144e-05 3.0706e-05 4.6367e-05 3.9744e-04
0.50 100 5.9077e-05 3.1528e-05 4.7954e-05 5.5947e-04
0.70 100 5.9983e-05 3.0452e-05 4.8798e-05 6.1710e-04
1.00 100 6.1087e-05 2.7012e-05 4.9849e-05 6.2629e-04
8.4 Conclusion
References
Aarts, L.P., Van Der Veer, P.: Neural network method for solving partial differential equations.
Neural Proc. Lett. 14, 261–271 (2001)
Abazari, R., Yildirim, K.: Numerical study of Sivashinsky equation using a splitting scheme based
on Crank-Nicolson method. Math. Method. Appl. Sci. 16, 5509–5521 (2019)
Abbasbandy, S.: Soliton solutions for the Fitzhugh-Nagumo equation with the homotopy analysis
method. Appl. Math. Modell. 32, 2706–2714 (2008)
Abbasbandy, S., Shirzadi, A.: A meshless method for two-dimensional diffusion equation with an
integral condition. Eng. Anal. Bound. Elem. 34, 1031–1037 (2010)
Abbasbandy, S., Shirzadi, A.: MLPG method for two-dimensional diffusion equation with Neu-
mann’s and non-classical boundary conditions. Appl. Numer. Math. 61, 170–180 (2011)
Abbaszadeh, M., Dehghan, M.: Simulation flows with multiple phases and components via the radial
basis functions-finite difference (RBF-FD) procedure: Shan-Chen model. Eng. Anal. Bound.
Elem. 119, 151–161 (2020a)
Abbaszadeh, M., Dehghan, M.: Direct meshless local Petrov-Galerkin method to investigate
anisotropic potential and plane elastostatic equations of anisotropic functionally graded mate-
rials problems. Eng. Anal. Bound. Elem. 118, 188–201 (2020b)
Abdel-Aty, A.H., Khater, M., Baleanu, D., Khalil, E.M., Bouslimi, J., Omri, M.: Abundant distinct
types of solutions for the nervous biological fractional FitzHugh-Nagumo equation via three
different sorts of schemes. Adv. Diff. Eq. 476, 1–17 (2020)
Abdusalam, H.A.: Analytic and approximate solutions for Nagumo telegraph reaction diffusion
equation. Appl. Math. Comput. 157, 515–522 (2004)
Ali, H., Kamrujjaman, M., Islam, M.S.: Numerical computation of FitzHugh-Nagumo equation: a
novel Galerkin finite element approach. Int. J. Math. Res. 9, 20–27 (2020)
Appadu, A.R., Agbavon, K.M.: Comparative study of some numerical methods for FitzHugh-
Nagumo equation. AIP Conference Proceedings, AIP Publishing LLC, vol. 2116 (2019), p.
030036
Aronson, D.G., Weinberger, H.F.: Nonlinear diffusion in population genetics, combustion, and nerve
pulse propagation. Partial differential equations and related topics. Springer, Berlin (1975), pp.
5–49
Aronson, D.G., Weinberger, H.F.: Multidimensional nonlinear diffusion arising in population genet-
ics. Adv. Math. 30, 33–76 (1978)
Asghari, M., Hadian Rasanan, A.H., Gorgin, S., Rahmati, D., Parand, K.: FPGA-orthopoly: a
hardware implementation of orthogonal polynomials. Eng. Comput. (in press) (2022)
Asouti, V.G., Trompoukis, X.S., Kampolis, I.C., Giannakoglou, K.C.: Unsteady CFD computations
using vertex-centered finite volumes for unstructured grids on graphics processing units. Int. J.
Numer. Methods Fluids 67, 232–246 (2011)
Barkai, E.: Fractional Fokker-Planck equation, solution, and application. Phys. Rev. E. 63, 046118
(2001)
Bath, K.J., Wilson, E.: Numerical Methods in Finite Element Analysis. Prentice Hall, New Jersey
(1976)
Belytschko, T., Lu, Y.Y., Gu, L.: Element-free Galerkin methods. Int. J. Numer. Methods Eng. 37,
229–256 (1994)
Bengfort, M., Malchow, H., Hilker, F.M.: The Fokker-Planck law of diffusion and pattern formation
in heterogeneous environments. J. Math. Biol. 73, 683–704 (2016)
Bertolazzi, E., Manzini, G.: A cell-centered second-order accurate finite volume method for
convection-diffusion problems on unstructured meshes. Math. Models Methods Appl. Sci. 14,
1235–1260 (2001)
Bhrawy, A.H.: A Jacobi-Gauss-Lobatto collocation method for solving generalized Fitzhugh-
Nagumo equation with time-dependent coefficients. Appl. Math. Comput. 222, 255–264 (2013)
Bhrawy, A.H., Baleanu, D.: A spectral Legendre-Gauss-Lobatto collocation method for a space-
fractional advection diffusion equations with variable coefficients. Reports Math. Phy. 72, 219–
233 (2013)
Blackmore, R., Weinert, U., Shizgal, B.: Discrete ordinate solution of a Fokker-Planck equation in
laser physics. Transport Theory Stat. Phy. 15, 181–210 (1986)
Bossavit, A., Vérité, J.C.: A mixed FEM-BIEM method to solve 3-D eddy-current problems. IEEE
Trans. Magn. 18, 431–435 (1982)
Braglia, G.L., Caraffini, G.L., Diligenti, M.: A study of the relaxation of electron velocity distribu-
tions in gases. Il Nuovo Cimento B 62, 139–168 (1981)
Brink, A.R., Najera-Flores, D.A., Martinez, C.: The neural network collocation method for solving
partial differential equations. Neural Comput. App. 33, 5591–5608 (2021)
Browne, P., Momoniat, E., Mahomed, F.M.: A generalized Fitzhugh-Nagumo equation. Nonlinear
Anal. Theory Methods Appl. 68, 1006–1015 (2008)
Bruggi, M., Venini, P.: A mixed FEM approach to stress-constrained topology optimization. Int.
J. Numer. Methods Eng. 73, 1693–1714 (2008)
Cai, S., Mao, Z., Wang, Z., Yin, M., Karniadakis, G.E.: Physics-informed neural networks (PINNs)
for fluid mechanics: a review. Acta Mech. Sinica. 1–12 (2022)
Carstensen, C., Köhler, K.: Nonconforming FEM for the obstacle problem. IMA J. Numer. Anal.
37, 64–93 (2017)
Chauviére, C., Lozinski, A.: Simulation of dilute polymer solutions using a Fokker-Planck equation.
Comput. Fluids. 33, 687–696 (2004)
Chavanis, P.H.: Nonlinear mean-field Fokker-Planck equations and their applications in physics,
astrophysics and biology. Comptes. Rendus. Phys. 7, 318–330 (2006)
Chen, Y., Yi, N., Liu, W.: A Legendre-Galerkin spectral method for optimal control problems
governed by elliptic equations. SIAM J. Numer. Anal. 46, 2254–2275 (2008)
Cheung, K.C., See, S.: Recent advance in machine learning for partial differential equation. CCF
Trans. High Perf. Comput. 3, 298–310 (2021)
Chien, C.C., Wu, T.Y.: A particular integral BEM/time-discontinuous FEM methodology for solving
2-D elastodynamic problems. Int. J. Solids Struct. 38, 289–306 (2001)
Conze, A., Lantos, N., Pironneau, O.: The forward Kolmogorov equation for two dimensional
options. Commun. Pure Appl. Anal. 8, 195 (2009)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
D’ariano, G.M., Macchiavello, C., Moroni, S.: On the monte carlo simulation approach to Fokker-
Planck equations in quantum optics. Modern Phys. Lett. B. 8, 239–246 (1994)
De Decker, Y., Nicolis, G.: On the Fokker-Planck approach to the stochastic thermodynamics of
reactive systems. Physica A: Stat. Mech. Appl. 553, 124269 (2020)
Dehghan, M., Narimani, N.: The element-free Galerkin method based on moving least squares
and moving Kriging approximations for solving two-dimensional tumor-induced angiogenesis
model. Eng. Comput. 36, 1517–1537 (2020)
Dehghan, M., Shokri, A.: A numerical method for solution of the two-dimensional sine-Gordon
equation using the radial basis functions. Math. Comput. Simul. 79, 700–715 (2008)
Dehghan, M., Taleei, A.: A compact split-step finite difference method for solving the nonlinear
Schrödinger equations with constant and variable coefficients. Comput. Phys. Commun. 181,
80–90 (2010)
Dehghan, M., Tatari, M.: Numerical solution of two dimensional Fokker-Planck equations. Phys.
Scr. 74, 310–316 (2006)
Dehghan, M., Manafian Heris, J., Saadatmandi, A.: Application of semi-analytic methods for the
Fitzhugh-Nagumo equation, which models the transmission of nerve impulses. Math. Methods
Appl. Sci. 33, 1384–1398 (2010)
Delkhosh, M., Parand, K.: A new computational method based on fractional Lagrange functions to
solve multi-term fractional differential equations. Numer. Algor. 88, 729–766 (2021)
Dubois F.: Finite volumes and mixed Petrov-Galerkin finite elements: the unidimensional problem.
Numer. Methods Partial Diff. Eq. Int. J. 16, 335–360 (2000)
Eymard, R., Gallouët, T., Herbin, R.: Finite volume methods. Handbook of Numerical Analysis,
vol. 7 (2000), pp. 713–1018
Fallah, N.: A cell vertex and cell centred finite volume method for plate bending analysis. Comput.
Methods Appl. Mech. Eng. 193, 3457–3470 (2004)
Fallah, N.A., Bailey, C., Cross, M., Taylor, G.A.: Comparison of finite element and finite volume
methods application in geometrically nonlinear stress analysis. Appl. Math. Model. 24, 439–455
(2000)
Fasshauer, G.E.: Meshfree Approximation Methods with MATLAB. World Scientific, Singapore
(2007)
FitzHugh, R.: Impulses and physiological states in theoretical models of nerve membrane. Biophys.
J. 1, 445–466 (1961)
Flandoli, F., Zanco, G.: An infinite-dimensional approach to path-dependent Kolmogorov equations.
Annals Probab. 44, 2643–2693 (2016)
Frank, T.D., Beek, P.J., Friedrich, R.: Fokker-Planck perspective on stochastic delay systems: Exact
solutions and data analysis of biological systems. Phys. Rev. E 68, 021912
(2003)
Friedrich, R., Jenko, F., Baule, A., Eule, S.: Exact solution of a generalized Kramers-Fokker-Planck
equation retaining retardation effects. Phys. Rev. E. 74, 041103 (2006)
Furioli, G., Pulvirenti, A., Terraneo, E., Toscani, G.: Fokker-Planck equations in the modeling of
socio-economic phenomena. Math. Models Methods Appl. Sci. 27, 115–158 (2017)
Gamba, I.M., Rjasanow, S.: Galerkin-Petrov approach for the Boltzmann equation. J. Comput. Phys.
366, 341–365 (2018)
Ghidaglia, J.M., Kumbaro, A., Le Coq, G.: On the numerical solution to two fluid models via a cell
centered finite volume method. Eur. J. Mech. B Fluids. 20, 841–867 (2001)
Gordon, A., Vugmeister, B.E., Dorfman, S., Rabitz, H.: Impulses and physiological states in theo-
retical models of nerve membrane. Biophys. J. 233, 225–242 (1999)
Grima, R., Thomas, P., Straube, A.V.: How accurate are the nonlinear chemical Fokker-Planck and
chemical Langevin equations. J. Chem. Phys. 135, 084103 (2011)
Gronchi, M., Lugiato, A.: Fokker-Planck equation for optical bistability. Lettere Al Nuovo Cimento
23, 593–8 (1973)
Hadian-Rasanan, A.H., Bajalan, Parand, K., Rad, J.A.: Simulation of nonlinear fractional dynamics
arising in the modeling of cognitive decision making using a new fractional neural network. Math.
Methods Appl. Sci. 43, 1437–1466 (2020)
Hadian-Rasanan, A.H., Rad, J.A., Sewell, D.K.: Are there jumps in evidence accumulation, and
what, if anything, do they reflect psychologically- An analysis of Lévy-Flights models of decision-
making. PsyArXiv (2021). https://doi.org/10.31234/osf.io/vy2mh
Hadian-Rasanan, A.H., Rahmati, D., Girgin, S., Parand, K.: A single layer fractional orthogonal
neural network for solving various types of Lane-Emden equation. New Astron. 75, 101307
(2019)
Hajimohammadi, Z., Shekarpaz, S., Parand, K.: The novel learning solutions to nonlinear differential
models on a semi-infinite domain. Eng. Comput. 1–18 (2022)
Hajimohammadi, Z., Parand, K.: Numerical learning approximation of time-fractional sub diffusion
model on a semi-infinite domain. Chaos Solitons Frac. 142, 110435 (2021)
Hariharan, G., Kannan, K.: Haar wavelet method for solving FitzHugh-Nagumo equation. Int. J.
Math. Comput. Sci. 4, 909–913 (2010)
Hemami, M., Parand, K., Rad, J.A.: Numerical simulation of reaction-diffusion neural dynamics
models and their synchronization/desynchronization: application to epileptic seizures. Comput.
Math. Appl. 78, 3644–3677 (2019)
Hemami, M., Rad, J.A., Parand, K.: The use of space-splitting RBF-FD technique to simulate the
controlled synchronization of neural networks arising from brain activity modeling in epileptic
seizures. J. Comput. Sci. 42, 101090 (2020)
Hemami, M., Rad, J.A., Parand, K.: Phase distribution control of neural oscillator populations using
local radial basis function meshfree technique with application in epileptic seizures: A numerical
simulation approach. Commun. Nonlinear SCI. Numer. Simul. 103, 105961 (2021)
Heydari, M.H., Avazzadeh, Z.: Chebyshev-Gauss-Lobatto collocation method for variable-order
time fractional generalized Hirota-Satsuma coupled KdV system. Eng. Comput. 1–10 (2020)
Hodgkin, A.L., Huxley, A.F.: Currents carried by sodium and potassium ions through the membrane
of the giant axon of Loligo. J. Physiol. 116, 449–72 (1952)
Hughes, T.J.: The finite element method: linear static and dynamic finite element analysis. Courier
Corporation, Chelmsford (2012)
İnan, B.: A finite difference method for solving generalized FitzHugh-Nagumo equation. AIP Con-
ference Proceedings, AIP Publishing LLC, vol. 1926 (2018), p. 020018
Jiménez-Aquino, J.I., Romero-Bastida, M.: Fokker-Planck-Kramers equation for a Brownian gas
in a magnetic field. Phys. Rev. E. 74, 041117 (2006)
Jiwari, R., Gupta, R.K., Kumar, V.: Polynomial differential quadrature method for numerical solu-
tions of the generalized Fitzhugh-Nagumo equation with time-dependent coefficients. Ain Shams
Eng. J. 5, 1343–1350 (2014)
Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science 349,
255–260 (2015)
Kadeethumm, T., O’Malley, D., Fuhg, J.N., Choi, Y., Lee, J., Viswanathan, H.S., Bouklas, N.: A
framework for data-driven solution and parameter estimation of PDEs using conditional genera-
tive adversarial networks. Nat. Comput. Sci. 1, 819–829 (2021)
Kanschat, G.: Multilevel methods for discontinuous Galerkin FEM on locally refined meshes.
Comput. Struct. 82, 2437–2445 (2004)
Karniadakis, G.E., Sherwin, S.J.: Spectral/hp Element Methods for Computational Fluid Dynamics.
Oxford University Press, New York (2005)
Kassab, A., Divo, E., Heidmann, J., Steinthorsson, E., Rodriguez, F.: BEM/FVM conjugate heat
transfer analysis of a three-dimensional film cooled turbine blade. Int. J. Numer. Methods Heat
Fluid Flow. 13, 581–610 (2003)
Kawahara, T., Tanaka, M.: Interactions of traveling fronts: an exact solution of a nonlinear diffusion
equation. Phys. Lett. A 97, 311–314 (1983)
Kazem, S., Rad, J.A.: Radial basis functions method for solving of a non-local boundary value
problem with Neumann’s boundary conditions. Appl. Math. Modell. 36, 2360–2369 (2012)
Kazem, S., Rad, J.A., Parand, K.: Radial basis functions methods for solving Fokker-Planck equa-
tion. Eng. Anal. Bound. Elem. 36, 181–189 (2012a)
Kazem, S., Rad, J.A., Parand, K.: A meshless method on non-Fickian flows with mixing length
growth in porous media based on radial basis functions: a comparative study. Comput. Math.
Appl. 64, 399–412 (2012b)
Kogut, P.I., Kupenko, O.P.: On optimal control problem for an ill-posed strongly nonlinear elliptic
equation with p-Laplace operator and L 1 -type of nonlinearity. Disceret Cont. Dyn-B 24, 1273–
1295 (2019)
Kopriva, D.: Implementing Spectral Methods for Partial Differential Equations. Springer, Berlin
(2009)
Kumar, S.: Numerical computation of time-fractional Fokker-Planck equation arising in solid state
physics and circuit theory. Z. Naturforsch. A 68, 777–784 (2013)
Kumar, D., Singh, J., Baleanu, D.: A new numerical algorithm for fractional Fitzhugh-Nagumo
equation arising in transmission of nerve impulses. Nonlinear Dyn. 91, 307–317 (2018)
Lakestani, M., Dehghan, M.: Numerical solution of Fokker-Planck equation using the cubic B-spline
scaling functions. Numer. Method. Part. D. E. 25, 418–429 (2008)
Latifi, S., Delkhosh, M.: Generalized Lagrange Jacobi-Gauss-Lobatto vs Jacobi-Gauss-Lobatto col-
location approximations for solving (2 + 1)-dimensional Sine-Gordon equations. Math. Methods
Appl. Sci. 43, 2001–2019 (2020)
Lee, Y.Y., Ruan, S.J., Chen, P.C.: Predictable coupling effect model for global placement using
generative adversarial networks with an ordinary differential equation solver. IEEE Trans. Circuits
Syst. II: Express Briefs (2021), pp. 1–5
LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press,
Cambridge (2002)
Li, H., Guo, Y.: New exact solutions to the FitzHugh-Nagumo equation. Appl. Math. Comput. 180,
524–528 (2006)
Liaqat, A., Fukuhara, M., Takeda, T.: Application of neural network collocation method to data
assimilation. Computer Phys. Commun. 141, 350–364 (2001)
Lindqvist, P.: Notes on the Stationary p-Laplace Equation. Springer International Publishing, Berlin
(2019)
Liu, G.R.: Mesh Free Methods: Moving Beyond the Finite Element Method. CRC Press, Florida
(2003)
Liu, G.R., Gu, Y.T.: A local radial point interpolation method (LRPIM) for free vibration analyses
of 2-D solids. J. Sound Vib. 246, 29–46 (2001)
Liu, J., Hao, Y.: Crank-Nicolson method for solving uncertain heat equation. Soft Comput. 26,
937–945 (2022)
Liu, G.R., Zhang, G.Y., Gu, Y., Wang, Y.Y.: A meshfree radial point interpolation method (RPIM)
for three-dimensional solids. Comput. Mech. 36, 421–430 (2005)
Liu, F., Zhuang, P., Turner, I., Burrage, K., Anh, V.: A new fractional finite volume method for
solving the fractional diffusion equation. Appl. Math. Model. 38, 3871–3878 (2014)
Lu, Y., Lu, J., Wang, M.: The Deep Ritz Method: a priori generalization analysis of the deep
Ritz method for solving high dimensional elliptic partial differential equations. Conference on
Learning Theory, PMLR (2021), pp. 3196–3241
Mai-Duy, N.: An effective spectral collocation method for the direct solution of high-order ODEs.
Commun. Numer. Methods Eng. 22, 627–642 (2006)
Meerschaert, M.M., Tadjeran, C.: Finite difference approximations for two-sided space-fractional
partial differential equations. Appl. Numer. Math. 56, 80–90 (2006)
Mehrkanoon, S., Suykens, J.A.K: Approximate solutions to ordinary differential equations using
least squares support vector machines. IEEE Trans. Neural Netw. Learn. Syst. 23, 1356–1362
(2012)
Mehrkanoon, S., Suykens, J.A.K.: Learning solutions to partial differential equations using LS-
SVM. Neurocomputing 159, 105–116 (2015)
Moayeri, M.M., Hadian-Rasanan, A.H., Latifi, S., Parand, K., Rad, J.A.: An efficient space-splitting
method for simulating brain neurons by neuronal synchronization to control epileptic activity.
Eng. Comput. 1–28 (2020a)
Moayeri, M.M., Rad, J.A., Parand, K.: Dynamical behavior of reaction-diffusion neural networks
and their synchronization arising in modeling epileptic seizure: A numerical simulation study.
Comput. Math. Appl. 80, 1887–1927 (2020b)
Moayeri, M.M., Rad, J.A., Parand, K.: Desynchronization of stochastically synchronized neural
populations through phase distribution control: a numerical simulation approach. Nonlinear Dyn.
104, 2363–2388 (2021)
Moghaderi, H., Dehghan, M.: Mixed two-grid finite difference methods for solving one-dimensional
and two-dimensional Fitzhugh-Nagumo equations. Math. Methods Appl. Sci. 40, 1170–1200
(2016)
Mohammadi, V., Dehghan, M.: Simulation of the phase field Cahn-Hilliard and tumor growth
models via a numerical scheme: element-free Galerkin method. Comput. Methods Appl. Mech.
Eng. 345, 919–950 (2019)
Mohammadi, V., Dehghan, M.: A meshless technique based on generalized moving least squares
combined with the second-order semi-implicit backward differential formula for numerically
solving time-dependent phase field models on the spheres. Appl. Numer. Math. 153, 248–275
(2020)
Mohammadi, V., Dehghan, M., De Marchi, S.: Numerical simulation of a prostate tumor growth
model by the RBF-FD scheme and a semi-implicit time discretization. J. Comput. Appl. Math.
388, 113314 (2021)
Moosavi, M.R., Khelil, A.: Accuracy and computational efficiency of the finite volume method
combined with the meshless local Petrov-Galerkin in comparison with the finite element method
in elasto-static problem. ICCES 5, 211–238 (2008)
Olmos, D., Shizgal, B.D.: Pseudospectral method of solution of the Fitzhugh-Nagumo equation.
Math. Comput. Simul. 79, 2258–2278 (2009)
Ottosen, N., Petersson, H., Saabye, N.: Introduction to the Finite Element Method. Prentice Hall,
New Jersey (1992)
Ozer, S., Chen, C.H., Cirpan, H.A.: A set of new Chebyshev kernel functions for support vector
machine pattern classification. Pattern Recogn. 44, 1435–1447 (2011)
Pang, G., Lu, L., Karniadakis, G.E.: fPINNs: fractional physics-informed neural networks. SIAM
J. Sci. Comput. 41, A2603–A2626 (2019)
Parand, K., Rad, J.A.: Kansa method for the solution of a parabolic equation with an unknown
spacewise-dependent coefficient subject to an extra measurement. Comput. Phys. Commun. 184,
582–595 (2013)
Parand, K., Hemami, M., Hashemi-Shahraki, S.: Two meshfree numerical approaches for solving
high-order singular Emden-Fowler type equations. Int. J. Appl. Comput. Math. 3, 521–546 (2017)
Parand, K., Latifi, S., Moayeri, M.M., Delkhosh, M.: Generalized Lagrange Jacobi Gauss-Lobatto
(GLJGL) collocation method for solving linear and nonlinear Fokker-Planck equations. Eng.
Anal. Bound. Elem. 69, 519–531 (2018)
Parand, K., Aghaei, A.A., Jani, M., Ghodsi, A.: Parallel LS-SVM for the numerical simulation of
fractional Volterra’s population model. Alexandria Eng. J. 60, 5637–5647 (2021a)
Parand, K., Aghaei, A.A., Jani, M., Ghodsi, A.: A new approach to the numerical solution of
Fredholm integral equations using least squares-support vector regression. Math Comput. Simul.
180, 114–128 (2021b)
Peeters, A.G., Strintzi, D.: The Fokker-Planck equation, and its application in plasma physics.
Annalen der Physik. 17, 142–157 (2008)
Pozrikidis, C.: Introduction to Finite and Spectral Element Methods Using MATLAB, 2nd edn.
Oxford CRC Press (2014)
Qin, C., Wu, Y., Springenberg, J.T., Brock, A., Donahue, J., Lillicrap, T., Kohli, P.: Training gener-
ative adversarial networks by solving ordinary differential equations. Adv. Neural Inf. Process.
Syst. 33, 5599–5609 (2020)
Rad, J.A., Ballestra, L.V.: Pricing European and American options by radial basis point interpolation.
Appl. Math. Comput. 251, 363–377 (2015)
Rad, J.A., Parand, K.: Numerical pricing of American options under two stochastic factor models
with jumps using a meshless local Petrov-Galerkin method. Appl. Numer. Math. 115, 252–274
(2017a)
Rad, J.A., Parand, K.: Pricing American options under jump-diffusion models using local weak
form meshless techniques. Int. J. Comput. Math. 94, 1694–1718 (2017b)
Rad, J.A., Kazem, S., Parand, K.: A numerical solution of the nonlinear controlled Duffing oscillator
by radial basis functions. Comput. Math. Appl. 64, 2049–2065 (2012)
Rad, J.A., Kazem, S., Parand, K.: Optimal control of a parabolic distributed parameter system via
radial basis functions. Commun. Nonlinear Sci. Numer. Simul. 19, 2559–2567 (2014)
Rad, J.A., Parand, K., Abbasbandy, S.: Pricing European and American options using a very fast
and accurate scheme: the meshless local Petrov-Galerkin method. Proc. Natl. Acad. Sci. India
Sect. A: Phys. Sci. 85, 337–351 (2015a)
Rad, J.A., Parand, K., Abbasbandy, S.: Local weak form meshless techniques based on the radial
point interpolation (RPI) method and local boundary integral equation (LBIE) method to evaluate
European and American options. Commun. Nonlinear Sci. Numer. Simul. 22, 1178–1200 (2015b)
Rad, J.A., Höök, J., Larsson, E., Sydow, L.V.: Forward deterministic pricing of options using
Gaussian radial basis functions. J. Comput. Sci. 24, 209–217 (2018)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning
framework for solving forward and inverse problems involving nonlinear partial differential equa-
tions. J. Comput. Phys. 378, 686–797 (2019)
Rashedi, K., Adibi, H., Rad, J.A., Parand, K.: Application of meshfree methods for solving the
inverse one-dimensional Stefan problem. Eng. Anal. Bound. Elem. 40, 1–21 (2014)
Reguera, D., Rubı, J.M., Pérez-Madrid, A.: Fokker-Planck equations for nucleation processes revis-
ited. Physica A: Stat. Mech. Appl. 259, 10–23 (1998)
Risken, H.: The Fokker-Planck Equation: Method of Solution and Applications. Springer, Berlin
(1989)
Saha, P., Mukhopadhyay, S.: A deep learning-based collocation method for modeling unknown
PDEs from sparse observation (2020). arxiv.org/pdf/2011.14965pdf
Saporito, Y.F., Zhang, Z.: Path-Dependent deep galerkin method: a neural network approach to
solve path-dependent partial differential equations. SIAM J. Financ. Math. 12, 912–40 (2021)
Shakeri, F., Dehghan, M.: A finite volume spectral element method for solving magnetohydrody-
namic (MHD) equations. Appl. Numer. Math. 61, 1–23 (2011)
Shen, J.: Efficient spectral-Galerkin method I. Direct solvers of second-and fourth-order equations
using Legendre polynomials. SIAM J. Sci. Comput. 15, 1489–1505 (1994)
Shivanian, E., Hajimohammadi, Z., Baharifard, F., Parand, K., Kazemi, R.: A novel learning
approach for different profile shapes of convecting-radiating fins based on shifted Gegenbauer
LSSVM. New Math. Natural Comput. 1–27 (2022)
Shizgal, B.: Spectral Methods in Chemistry and Physics. Scientific Computing. Springer, Berlin
(2015)
Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential
equations. J. Comput. phys. 375, 1339–1364 (2018)
Smith, G.D.: Numerical Solutions of Partial Differential Equations Finite Difference Methods, 3rd
edn. Oxford University Press, New York (1985)
Spalart, P.R., Moser, R.D., Rogers, M.M.: Spectral methods for the Navier-Stokes equations with
one infinite and two periodic directions. J. Comput. Phy. 96, 297–324 (1991)
Strikwerda, J.C.: Finite Difference Schemes and Partial Differential Equations. Society for Industrial
and Applied Mathematics, Pennsylvania (2004)
Tanimura, Y.: Stochastic Liouville, Langevin, Fokker-Planck, and master equation approaches to
quantum dissipative systems. J. Phys. Soc. Japan 75, 082001 (2006)
Tatari, M., Dehghan, M., Razzaghi, M.: Application of the Adomian decomposition method for the
Fokker-Planck equation. Phys. Scr. 45, 639–650 (2007)
Trefethen, L.N.: Finite Difference and Spectral Methods for Ordinary and Partial Differential Equa-
tions. Cornell University, New York (1996)
Triki, H., Wazwaz, A.M.: On soliton solutions for the Fitzhugh-Nagumo equation with time-
dependent coefficients. Appl. Math. Model. 37, 3821–8 (2013)
Tsurui, A., Ishikawa, H.: Application of the Fokker-Planck equation to a stochastic fatigue crack
growth model. Struct. Safety. 63, 15–29 (1986)
Uhlenbeck, G.E., Ornstein, L.S.: On the theory of the Brownian motion. Phys. Rev. 36, 823–841
(1930)
Ullersma, P.: An exactly solvable model for Brownian motion: II. Derivation of the Fokker-Planck
equation and the master equation. Physica 32, 56–73 (1966)
Van Gorder, R.A., Vajravelu K.: A variational formulation of the Nagumo reaction-diffusion equa-
tion and the Nagumo telegraph equation. Nonlinear Anal.: Real World Appl. 11, 2957–2962
(2010)
Van Gorder, R.A.: Gaussian waves in the Fitzhugh-Nagumo equation demonstrate one role of the
auxiliary function H (x, t) in the homotopy analysis method. Commun. Nonlinear Sci. Numer.
Simul. 17, 1233–1240 (2012)
Vanaja, V.: Numerical solution of a simple Fokker-Planck equation. Appl. Numer. Math. 9, 533–540
(1992)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Wang, C.H., Feng, Y.Y., Yue, K., Zhang, X.X.: Discontinuous finite element method for combined
radiation-conduction heat transfer in participating media. Int. Commun. Heat Mass. 108, 104287
(2019)
Wazwaz, A.M., Gorguis, A.: An analytic study of Fisher’s equation by using adomian decomposition
method. Appl. Math. Comput. 154, 609–20 (2004)
Wazwaz, A.M.: The tanh-coth method for solitons and kink solutions for nonlinear parabolic equa-
tions. Appl. Math. Comput. 188, 1467–75 (2007)
Wilson, P., Teschemacher, T., Bucher, P., Wüchner, R.: Non-conforming FEM-FEM coupling
approaches and their application to dynamic structural analysis. Eng. Struct. 241, 112342 (2021)
Xing, J., Wang, H., Oster, G.: From continuum Fokker-Planck models to discrete kinetic models.
Biophys. J. 89, 1551–1563 (2005)
Yang, L., Zhang, D., Karniadakis, G.E.: Physics-informed generative adversarial networks for
stochastic differential equations. SIAM J. Sci. Comput. 46, 292–317 (2020)
Yeganeh, S., Mokhtari, R., Hesthaven, J.S.: Space-dependent source determination in a time-
fractional diffusion equation using a local discontinuous Galerkin method. Bit Numer. Math.
57, 685–707 (2017)
Yu, B.: The Deep Ritz Method: a deep learning-based numerical algorithm for solving variational
problems. Commun. Math. Stat. 6, 1–12 (2018)
Zayernouri, M., Karniadakis, G.E.: Fractional spectral collocation method. SIAM J Sci. Comput.
36, A40–A62 (2014)
Zayernouri, M., Karniadakis, G.E.: Fractional spectral collocation methods for linear and nonlinear
variable order FPDEs. J. Comput. Phys. 293, 312–338 (2015)
Zayernouri, M., Ainsworth, M., Karniadakis, G.E.: A unified Petrov-Galerkin spectral method for
fractional PDEs. Comput. Methods Appl. Mech. Eng. 283, 1545–1569 (2015)
Zhang, Z., Zou, Q.: Some recent advances on vertex centered finite volume element methods for
elliptic equations. Sci. China Math. 56, 2507–2522 (2013)
Zhao, D.H., Shen, H.W., Lai, J.S. III, G.T.: Approximate Riemann solvers in FVM for 2D hydraulic
shock wave modeling. J. Hydraulic Eng. 122, 692–702 (1996)
Zhao, Y., Chen, P., Bu, W., Liu, X., Tang, Y.: Two mixed finite element methods for time-fractional
diffusion equations. J. Sci. Comput. 70, 407–428 (2017)
Zienkiewicz, O.C., Taylor, R.L., Zhu, J.Z.: The finite element method: its basis and fundamentals.
Elsevier (2005)
Zorzano, M.P., Mais, H., Vazquez, L.: Numerical solution of two dimensional Fokker-Planck equa-
tions. Appl. Math. Comput. 98, 109–117 (1999)
Zubarev, D.N., Morozov, V.G.: Statistical mechanics of nonlinear hydrodynamic fluctuations. Phys-
ica A: Stat. Mech. Appl. 120, 411–467 (1983)
Chapter 9
Solving Integral Equations by LS-SVR
Kourosh Parand, Alireza Afzal Aghaei, Mostafa Jani, and Reza Sahleh
Abstract Another important type of problem in science and engineering is integral
equations. Thus, developing precise numerical algorithms for approximating the
solution of these problems is one of the main questions of scientific computing. In
this chapter, the least squares support vector algorithm is utilized to develop a
numerical algorithm for solving various types of integral equations. The robustness
and also the convergence of the proposed method are discussed in this chapter by
providing several numerical examples.
9.1 Introduction
Any equation with an unknown function under the integral sign is called an inte-
gral equation. These equations frequently appear in science and engineering, for
instance, different mathematical models such as diffraction problems Eswaran
(1990), scattering in quantum mechanics Barlette et al. (2001), plasticity Kanaun and
Martinez (2012), conformal mapping Reichel (1986), water waves Manam (2011),
and Volterra’s population model Wazwaz (2002) are expressed as integral equations
Assari and Dehghan (2019), Bažant and Jirásek (2002), Bremer (2012), Kulish and
Novozhilov (2003), Lu et al. (2020), Parand and Delkhosh (2017), Volterra (1928).
Recently, the applications of integral equations in machine learning problems have
also been discussed by researchers Keller and Dahm (2019), Chen et al. (2018),
Dahm and Keller (2016). Furthermore, integral equations are closely related to
differential equations, and in some cases, these equations can be converted to each
other. Integro-differential equations are also a type of integral equations, in which not
only the unknown function is placed under the integral operator, but also its deriva-
tives appear in the equation. In some cases, the partial derivative of the unknown
function may appear in the equation. In this case, the equation is called the partial
integro-differential equation. Distributed fractional differential equations are also a
type of fractional differential equations, which are very similar to integral equations.
In these equations, the fractional derivative of the unknown function appears under the
integral in such a way that the order of the fractional derivative is the integration
variable.
Due to the importance and wide applications of integral equations, many
researchers have developed efficient methods for solving these types of equations
Abbasbandy (2006), Assari and Dehghan (2019), Fatahi et al. (2016), Golberg (2013),
Mandal and Chakrabarti (2016), Marzban et al. (2011), Nemati et al. (2013), Wazwaz
(2011). This chapter starts with an explanation of integral equations; in the following,
a new numerically efficient method based on the least squares support vector regression
approach is proposed for the simulation of some integral equations.
Integral equations are divided into different categories based on their properties. How-
ever, there are three main types of integral equations Wazwaz (2011), i.e., Fredholm,
Volterra, and Volterra-Fredholm integral equations. While the Fredholm integral
equations have constant integration bounds, in Volterra integral equations at least
one of the bounds depends on the independent variable. Volterra-Fredholm inte-
gral equations also include both types of equations. These equations themselves are
divided into subcategories such as linear/nonlinear, homogeneous/inhomogeneous,
and first/second kind. In the next section, different classes of these equations are
presented and discussed.
where u(x) is the unknown function and K (x, t) is the kernel of the equation.
Volterra integral equations are another category of integral equations. This kind of
equation appears in many scientific applications, such as population dynamics Jafari
et al. (2021), the spread of epidemics Wang et al. (2006), and semi-conductor devices
Unterreiter (1996). Also, these equations can be obtained from initial value prob-
lems. Different numerical methods are proposed to solve these types of equations,
for example, a collocation method using Sinc and rational Legendre functions pro-
posed to solve Volterra’s population model Parand et al. (2011). In another work,
the least squares support vector regression method was proposed to solve Volterra
integral equations Parand et al. (2020). Also, the Runge-Kutta method was imple-
mented to solve the second kind of linear Volterra integral equations Maleknejad and
Shahrezaee (2004). To see more, the interested reader can see Messina and Vecchio
(2017), Tang et al. (2008), and Maleknejad et al. (2011). In this category, the equation
is defined as follows Wazwaz (2011, 2015), Rahman (2007):
$$u(x) = f(x) + \lambda \int_{g(x)}^{h(x)} K(x,t)\, u(t)\, dt, \qquad (9.1)$$
Remark 9.1 (First and second kind integral equations) It should be noted that if
u(x) appears only under the integral sign, the equation is called the first kind; otherwise, if
the unknown function appears both inside and outside the integral sign, it is named the
second kind. For instance, Eq. 9.1 is a second kind Volterra integral equation, and the
equation below, $f(x) = \lambda \int_{g(x)}^{h(x)} K(x,t)\, u(t)\, dt$, is of the first kind.
Remark 9.2 (Linear and nonlinear integral equations) Suppose we have the integral
equation $\psi(u(x)) = f(x) + \lambda \int_{g(x)}^{h(x)} K(x,t)\, \phi(u(t))\, dt$. If either $\psi(\cdot)$ or $\phi(\cdot)$ is a
nonlinear function, the equation is called nonlinear. In the case of the first kind
Volterra integral equation, we have $f(x) = \lambda \int_{g(x)}^{h(x)} K(x,t)\, \phi(u(t))\, dt$.
range of K(x, t), then this equation has a solution. For instance, if the kernel is in the
form of K(x, t) = sin(x) sin(t), the function f(x) should be a multiple of sin(x)
Golberg (2013). Otherwise, the equation has no solution Wazwaz (2011). Due to
the importance of the function f(x) in integral equations, scientists have categorized
equations based on the existence or non-existence of this function. If there is a
function f(x) in the integral equation, it is called inhomogeneous; otherwise, it
is identified as homogeneous. In other words, the Fredholm integral equation of
the second kind $u(x) = f(x) + \lambda \int_a^b K(x,t)\, u(t)\, dt$ is inhomogeneous, and the equation
$u(x) = \lambda \int_a^b K(x,t)\, u(t)\, dt$ is homogeneous.
Due to the existence of different variables, the study and modeling of physical prob-
lems usually lead to the creation of multi-dimensional cases Mikhlin (2014). Partial
differential equations and multi-dimensional integral equations are the most famous
examples of modeling these problems. The general form of these equations is defined
as follows:
$$\mu u(x) = f(x) + \lambda \int_{S} K(x,t)\, \sigma(u(t))\, dt, \quad x, t \in S \subset \mathbb{R}^n, \qquad (9.2)$$
where x = (x1 , x2 , ..., xn ), t = (t1 , t2 , ..., tn ), and λ is the eigenvalue of the integral
equation. Also, for convenience in defining the first and second kind equations, the
constant μ ∈ R has been added to the left side of the equation. If this constant is
zero, it is called the first kind equation; otherwise, it is the second kind. In the general
case, this type of integral equation usually cannot be solved exactly in closed form,
and powerful computational algorithms are required. However, some works are
available on the numerical simulation of this model: for example, the moving least
squares method of Mirzaei and Dehghan (2010) for one- and two-dimensional Fredholm
integral equations, the Legendre collocation method for Volterra-Fredholm integral
equations Zaky et al. (2021), and the Jacobi collocation method for multi-dimensional
Volterra integral equations Abdelkawy et al. (2017). To see more, the interested
reader can see Bhrawy et al. (2016), Esmaeilbeigi et al. (2017), and Mirzaee and
Alipour (2020).
Fredholm integral equations can be converted to differential equations with boundary
values, and Volterra integral equations can be converted to differential equations with
initial values Parand et al. (2020).
In this section, an efficient method for the numerical simulation of integral equations
is proposed. Using the ideas behind weighted residual methods, the proposed tech-
nique introduces two different algorithms named collocation LS-SVR (CLS-SVR)
and Galerkin LS-SVR (GLS-SVR). For the sake of simplicity, here we denote an
integral equation in the operator form
N (u) = f, (9.3)
in which
$$N(u) = \mu u - \mathcal{K}_1(u) - \mathcal{K}_2(u).$$
Here, $\mu \in \mathbb{R}$ is a constant which specifies the first or second kind integral equation for
$\mu = 0$ and $\mu \neq 0$, respectively. The operators $\mathcal{K}_1$ and $\mathcal{K}_2$ are the Fredholm and Volterra
integral operators, respectively. These operators are defined as
$$\mathcal{K}_1(u) = \lambda_1 \int_{\Omega_1} K_1(x,t)\, u(t)\, dt \qquad \text{and} \qquad \mathcal{K}_2(u) = \lambda_2 \int_{\Omega_2} K_2(x,t)\, u(t)\, dt.$$
The proposed method can solve a wide range of integral equations, so we split the
subject into three sections, based on the structure of the unknown function which
should be approximated.
In order to approximate the solution of Eq. 9.3 using the LS-SVR formulations, some
training data is needed. In contrast to the LS-SVR, in which there is a set of labeled
training data, there are no labels for any arbitrary set of training data in solving
Eq. 9.3. To handle this problem, the approximate solution is expanded as a linear
combination of some basis functions $\varphi_i$, $i = 1, \ldots, d$, with unknown coefficients:
$$u(x) \approx \tilde{u}(x) = w^T \varphi(x) + b = \sum_{i=1}^{d} w_i \varphi_i(x) + b. \qquad (9.4)$$
$$\begin{aligned} \min_{w,e}\ & \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e \\ \text{s.t.}\ & \langle N(\tilde{u}) - f, \psi_k \rangle = e_k, \quad k = 1, \ldots, n, \end{aligned} \qquad (9.5)$$
in which n is the number of training points, $\{\psi_k\}_{k=1}^{n}$ is a set of test functions in the
test space, and $\langle \cdot, \cdot \rangle$ is the inner product of two functions. In order to take advantage
of the kernel trick, the dual form of this optimization problem is constructed. If the
operator N is linear, the optimization problem of Eq. 9.5 is convex, and the dual form
can be derived easily. To do so, we denote the linear operators as L and construct
the Lagrangian function
$$L(w, e, \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{k=1}^{n} \alpha_k \big( \langle L\tilde{u} - f, \psi_k \rangle - e_k \big), \qquad (9.6)$$
in which $\alpha_k \in \mathbb{R}$ are the Lagrange multipliers. The conditions for the optimality of
Eq. 9.6 yield
$$\begin{cases} \dfrac{\partial L}{\partial w_k} = 0 \;\rightarrow\; w_k = \displaystyle\sum_{i=1}^{n} \alpha_i \langle L\varphi_k, \psi_i \rangle, & k = 1, \ldots, d, \\[8pt] \dfrac{\partial L}{\partial e_k} = 0 \;\rightarrow\; \gamma e_k + \alpha_k = 0, & k = 1, \ldots, n, \\[8pt] \dfrac{\partial L}{\partial b} = 0 \;\rightarrow\; \displaystyle\sum_{i=1}^{n} \alpha_i \langle L1, \psi_i \rangle = 0, & \\[8pt] \dfrac{\partial L}{\partial \alpha_k} = 0 \;\rightarrow\; \Big\langle \displaystyle\sum_{i=1}^{d} w_i L\varphi_i + b\,L1 - f, \psi_k \Big\rangle = e_k, & k = 1, \ldots, n. \end{cases} \qquad (9.7)$$
in which
$$\alpha = [\alpha_1, \ldots, \alpha_n]^T, \qquad y = [\langle f, \psi_1 \rangle, \langle f, \psi_2 \rangle, \ldots, \langle f, \psi_n \rangle]^T, \qquad \tilde{L}_i = \langle L1, \psi_i \rangle, \qquad (9.9)$$
and
$$\Omega_{i,j} = \langle L\varphi, \psi_i \rangle^T \langle L\varphi, \psi_j \rangle = \big\langle \langle L L K(x,t), \psi_i \rangle, \psi_j \big\rangle, \quad i, j = 1, 2, \ldots, n,$$
with any valid Mercer kernel K (x, t). The approximate solution in the dual form
takes the form
$$\tilde{u}(x) = \sum_{i=1}^{n} \alpha_i \tilde{K}(x, x_i) + b, \qquad (9.10)$$
where
$$\tilde{K}(x, x_i) = \langle L\varphi, \psi_i \rangle^T \varphi(x) = \langle L K(x, t), \psi_i \rangle.$$
$$u(x, y) \approx \tilde{u}(x, y) = \sum_{i=1}^{d} \sum_{j=1}^{d} w_{i,j}\, \varphi_i(x)\, \varphi_j(y) + b.$$
Note that the upper summation bound d and basis functions ϕ can vary in each
dimension. Fortunately, there is no need to reconstruct the proposed model. In order to
use LS-SVR for solving multi-dimensional equations, we can vectorize the unknown
tensor w, basis functions ϕi , ϕ j , and training points X . For example, in the case of
2D integral equations, we first vectorize the d × d matrix w; the index map
$w_{i,j} \mapsto w_{i \cdot d + j}$
can be used. Also, this indexing should be applied to the basis function and training
data tensor. After using this technique, the proposed dual form Eq. 9.8 can be utilized
for a one-dimensional case. Solving the dual form returns the vector α which can be
converted to a tensor using the inverse of the indexing function.
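As a rough illustration of this indexing step (our own sketch in Python rather than anything from the chapter; the array names and sizes are hypothetical), the map $w_{i,j} \mapsto w_{i \cdot d + j}$ and its inverse correspond to row-major flattening and reshaping:

```python
import numpy as np

d = 4                                   # number of basis functions per dimension (illustrative)
W = np.arange(d * d).reshape(d, d)      # hypothetical d x d coefficient tensor w_{i,j}

# Vectorize: w_{i,j} -> w_{i*d + j} (row-major flattening)
w_vec = W.ravel()

# After solving the dual problem, the returned vector (alpha_vec is a stand-in here)
# is mapped back to a tensor with the inverse of the indexing function.
alpha_vec = w_vec.copy()
Alpha = alpha_vec.reshape(d, d)

assert np.array_equal(Alpha, W)         # the index map and its inverse are consistent
```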
In a system of integral equations, there are k equations and k unknown functions:
$$N_i(u_1, u_2, \ldots, u_k) = f_i, \quad i = 1, 2, \ldots, k. \qquad (9.11)$$
For solving these types of equations, the approximate solution can be formulated
as follows:
$$\tilde{u}_i(x) = w_i^T \varphi(x) + b_i, \quad i = 1, 2, \ldots, k,$$
where $\varphi(x)$ is the feature map vector, and $w_i$ and $b_i$ are the unknown coefficients. In the
next step, the unknown coefficients are set side by side in vector form; thus, we
have $w = [w_1, w_2, \ldots, w_k]$.
Same as the previous formulation for solving high-dimensional equations, the basis
functions can vary for each approximate function, but for simplicity, the same func-
tions are used. Since these functions are shared, they can be seen in a d-dimensional
vector. Using this formulation, the optimization problem for solving Eq. 9.11 can be
constructed as
$$\begin{aligned} \min_{w,e}\ & \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e \\ \text{s.t.}\ & \langle N_i(\tilde{u}_1, \tilde{u}_2, \ldots, \tilde{u}_k) - f_i, \psi_j \rangle = e_{i,j}, \quad j = 1, \ldots, n, \end{aligned} \qquad (9.12)$$
where i = 1, 2, . . . , k. Also, the matrix ei, j should be vectorized the same as unknown
coefficients w. For any linear operator N , denoted by L, the dual form of the opti-
mization problem Eq. 9.12 can be derived. Here, we obtain the dual form for a system
of two equations and two unknown functions. This process can be generalized for
the arbitrary number of equations.
Suppose the following system of equations is given:
$$\begin{cases} L_1(u_1, u_2) = f_1, \\ L_2(u_1, u_2) = f_2. \end{cases}$$
$$\begin{aligned} \min_{w,e}\ & \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e \\ \text{s.t.}\ & \langle L_1(\tilde{u}_1, \tilde{u}_2) - f_1, \psi_j \rangle = e_j, \quad j = 1, \ldots, n, \\ & \langle L_2(\tilde{u}_1, \tilde{u}_2) - f_2, \psi_{j-n} \rangle = e_j, \quad j = n+1, \ldots, 2n, \end{aligned}$$
where
$$w = [w_1, w_2] = [w_{1,1}, w_{1,2}, \ldots, w_{1,d}, w_{2,1}, w_{2,2}, \ldots, w_{2,d}],$$
$$e = [e_1, e_2] = [e_{1,1}, e_{1,2}, \ldots, e_{1,n}, e_{2,1}, e_{2,2}, \ldots, e_{2,n}].$$
$$L(w, e, \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e - \sum_{j=1}^{n} \alpha_j \big( \langle L_1(\tilde{u}_1, \tilde{u}_2) - f_1, \psi_j \rangle - e_j \big) - \sum_{j=1}^{n} \alpha_{n+j} \big( \langle L_2(\tilde{u}_1, \tilde{u}_2) - f_2, \psi_j \rangle - e_{n+j} \big),$$
then the conditions for optimality of the Lagrangian function are given by
$$\frac{\partial L}{\partial w_k} = 0 \;\rightarrow\; w_k = \begin{cases} \displaystyle\sum_{j=1}^{n} \alpha_j \langle L_1(\varphi_k, 0), \psi_j \rangle + \sum_{j=1}^{n} \alpha_{n+j} \langle L_2(\varphi_k, 0), \psi_j \rangle, & k = 1, 2, \ldots, d, \\[10pt] \displaystyle\sum_{j=1}^{n} \alpha_j \langle L_1(0, \varphi_{k-d}), \psi_j \rangle + \sum_{j=1}^{n} \alpha_{n+j} \langle L_2(0, \varphi_{k-d}), \psi_j \rangle, & k = d+1, d+2, \ldots, 2d. \end{cases} \qquad (9.13)$$
$$\frac{\partial L}{\partial e_k} = 0 \;\rightarrow\; \gamma e_k + \alpha_k = 0, \quad k = 1, \ldots, 2n.$$
$$\frac{\partial L}{\partial b} = 0 \;\rightarrow\; \begin{cases} \dfrac{\partial L}{\partial b_1} = 0 \;\rightarrow\; \displaystyle\sum_{i=1}^{n} \alpha_i \langle L_1(1, 0), \psi_i \rangle + \sum_{i=1}^{n} \alpha_{n+i} \langle L_2(1, 0), \psi_i \rangle = 0, \\[10pt] \dfrac{\partial L}{\partial b_2} = 0 \;\rightarrow\; \displaystyle\sum_{i=1}^{n} \alpha_i \langle L_1(0, 1), \psi_i \rangle + \sum_{i=1}^{n} \alpha_{n+i} \langle L_2(0, 1), \psi_i \rangle = 0, \end{cases}$$
$$\frac{\partial L}{\partial \alpha_k} = 0 \;\rightarrow\; \begin{cases} \Big\langle \displaystyle\sum_{j=1}^{d} w_j L_1(\varphi_j, 0) + \sum_{j=1}^{d} w_{d+j} L_1(0, \varphi_j) - f_1, \psi_k \Big\rangle = e_k, & k = 1, 2, \ldots, n, \\[10pt] \Big\langle \displaystyle\sum_{j=1}^{d} w_j L_2(\varphi_j, 0) + \sum_{j=1}^{d} w_{d+j} L_2(0, \varphi_j) - f_2, \psi_{k-n} \Big\rangle = e_k, & k = n+1, n+2, \ldots, 2n. \end{cases} \qquad (9.14)$$
By defining
$$A_{i,j} = \langle L_1(\varphi_i, 0), \psi_j \rangle, \quad B_{i,j} = \langle L_2(\varphi_i, 0), \psi_j \rangle, \quad C_{i,j} = \langle L_1(0, \varphi_i), \psi_j \rangle, \quad D_{i,j} = \langle L_2(0, \varphi_i), \psi_j \rangle,$$
$$E_j = \langle L_1(1, 0), \psi_j \rangle, \quad F_j = \langle L_2(1, 0), \psi_j \rangle, \quad G_j = \langle L_1(0, 1), \psi_j \rangle, \quad H_j = \langle L_2(0, 1), \psi_j \rangle,$$
and
$$Z = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad V = \begin{bmatrix} E & F \\ G & H \end{bmatrix},$$
where
$$\alpha = [\alpha_1, \ldots, \alpha_{2n}]^T, \qquad y = [\langle f_1, \psi_1 \rangle, \ldots, \langle f_1, \psi_n \rangle, \langle f_2, \psi_1 \rangle, \ldots, \langle f_2, \psi_n \rangle]^T,$$
and
$$\Omega = Z^T Z = \begin{bmatrix} A^T & C^T \\ B^T & D^T \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix}. \qquad (9.16)$$
The kernel trick also appears in each block of the matrix $\Omega$. The approximate solution
in the dual form can be computed using
$$\tilde{u}_1(x) = \sum_{i=1}^{n} \alpha_i \tilde{K}_1(x, x_i) + \sum_{i=1}^{n} \alpha_{n+i} \tilde{K}_2(x, x_i) + b_1,$$
$$\tilde{u}_2(x) = \sum_{i=1}^{n} \alpha_i \tilde{K}_3(x, x_i) + \sum_{i=1}^{n} \alpha_{n+i} \tilde{K}_4(x, x_i) + b_2,$$
where
$$\tilde{K}_1(x, x_i) = \langle L_1(\varphi, 0), \psi_i \rangle^T \varphi(x) = \langle L_1(K(x,t), 0), \psi_i \rangle, \qquad \tilde{K}_2(x, x_i) = \langle L_2(\varphi, 0), \psi_i \rangle^T \varphi(x) = \langle L_2(K(x,t), 0), \psi_i \rangle,$$
$$\tilde{K}_3(x, x_i) = \langle L_1(0, \varphi), \psi_i \rangle^T \varphi(x) = \langle L_1(0, K(x,t)), \psi_i \rangle, \qquad \tilde{K}_4(x, x_i) = \langle L_2(0, \varphi), \psi_i \rangle^T \varphi(x) = \langle L_2(0, K(x,t)), \psi_i \rangle.$$
In this section, the collocation form of the LS-SVR model is proposed for solving
integral equations. Similar to the weighted residual methods, by using the Dirac delta
function as the test function in the test space, we can construct the collocation LS-
SVR model, abbreviated as CLS-SVR. In this case, the primal form of optimization
problem Eq. 9.5 can be expressed as
$$\begin{aligned} \min_{w,e}\ & \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e \\ \text{s.t.}\ & N(\tilde{u})(x_k) - f(x_k) = e_k, \quad k = 1, \ldots, n, \end{aligned} \qquad (9.17)$$
in which
$$\alpha = [\alpha_1, \ldots, \alpha_n]^T, \qquad y = [f(x_1), f(x_2), \ldots, f(x_n)]^T, \qquad \tilde{L} = [L1(x_1), L1(x_2), \ldots, L1(x_n)], \qquad (9.19)$$
$$\Omega_{i,j} = L\varphi(x_i)^T L\varphi(x_j) = L L K(x_i, x_j), \quad i, j = 1, 2, \ldots, n.$$
The approximated solution in the dual form takes the following form:
$$\tilde{u}(x) = \sum_{i=1}^{n} \alpha_i \tilde{K}(x, x_i) + b,$$
where
$$\tilde{K}(x, x_i) = L\varphi(x_i)^T \varphi(x) = L K(x, x_i).$$
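To make the collocation construction concrete, the following is only a minimal sketch in Python rather than the chapter's Maple implementation. It assumes a toy second-kind Fredholm equation with a separable kernel, a shifted Legendre feature map, Gauss-Legendre quadrature for the integral operator, and the bias-free reduced system $\Omega\alpha = y$ used later in the chapter (i.e., the $\gamma \to \infty$ limit in which the error term is dropped); all names and parameters are ours.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

# Toy second-kind Fredholm equation on [0, 1]:
#   u(x) - lam * int_0^1 K(x, t) u(t) dt = f(x),
# with K(x, t) = x*t, lam = 1, exact solution u(x) = x  =>  f(x) = x - x/3 = 2x/3.
lam = 1.0
K = lambda x, t: x * t
f = lambda x: 2.0 * x / 3.0

d = 8                                     # number of shifted Legendre basis functions
n = d                                     # number of collocation (training) points
xq, wq = leggauss(30)                     # Gauss-Legendre nodes/weights on [-1, 1]
tq, wtq = (xq + 1) / 2, wq / 2            # mapped to [0, 1] for the integral operator

def phi(x):
    """Shifted Legendre feature map P_0..P_{d-1} evaluated at the points x."""
    return legvander(2 * np.asarray(x) - 1, d - 1)        # shape (len(x), d)

# Collocation points: roots of the shifted Legendre polynomial of degree n.
xc = (leggauss(n)[0] + 1) / 2

# Apply L u = u(x) - lam * int K(x, t) u(t) dt to each basis function at each point.
Phi_c = phi(xc)                                           # phi_j(x_i)
Int_c = lam * (K(xc[:, None], tq[None, :]) * wtq) @ phi(tq)  # int K(x_i, t) phi_j(t) dt
Lphi = Phi_c - Int_c                                      # (L phi_j)(x_i)

# Bias-free reduced dual system: Omega alpha = y with Omega = (Lphi)(Lphi)^T.
Omega = Lphi @ Lphi.T
y = f(xc)
alpha = np.linalg.solve(Omega, y)

# Recover the primal weights w = (Lphi)^T alpha and evaluate the approximation.
w = Lphi.T @ alpha
x_test = np.linspace(0, 1, 11)
u_tilde = phi(x_test) @ w
print(np.max(np.abs(u_tilde - x_test)))                   # very small (the exact solution lies in the span)
```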
The Galerkin approach is a famous method for solving a wide range of problems. In this
approach, the test functions ψ are chosen equal to the basis functions. If the basis
functions are mutually orthogonal, this approach leads to a sparse system of algebraic
equations. Some examples of this feature are provided in the next section. For now,
let us define the model. In the primal space, the model can be constructed as follows:
$$\begin{aligned} \min_{w,e}\ & \frac{1}{2} w^T w + \frac{\gamma}{2} e^T e \\ \text{s.t.}\ & \int \big[ L\tilde{u}(x) - f(x) \big] \varphi_k(x)\, dx = e_k, \quad k = 0, \ldots, d, \end{aligned} \qquad (9.20)$$
in which
$$\alpha = [\alpha_1, \ldots, \alpha_n]^T, \qquad \tilde{L} = \Big[ \int L1(x)\varphi_1(x)\,dx,\ \int L1(x)\varphi_2(x)\,dx,\ \ldots,\ \int L1(x)\varphi_d(x)\,dx \Big],$$
$$y = \Big[ \int f(x)\varphi_1(x)\,dx,\ \int f(x)\varphi_2(x)\,dx,\ \ldots,\ \int f(x)\varphi_d(x)\,dx \Big]^T,$$
$$\Omega_{i,j} = \Big( \int L\varphi(x)\varphi_i(x)\,dx \Big)^T \Big( \int L\varphi(x)\varphi_j(x)\,dx \Big),$$
and the approximate solution in the dual form is
$$\tilde{u}(x) = \sum_{i=1}^{n} \alpha_i \tilde{K}(x, x_i) + b,$$
where
$$\tilde{K}(x, x_i) = \Big( \int L\varphi(s)\varphi_i(s)\,ds \Big)^T \varphi(x) = \int L K(x, s)\,\varphi_i(s)\,ds.$$
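The sparsity claim above can be checked numerically: for the shifted Legendre basis, the pairwise inner products $\int_0^1 \varphi_i(t)\varphi_j(t)\,dt$ form a diagonal matrix up to quadrature error. The short sketch below is our own illustration (not the book's code) and computes these Galerkin inner products with Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

d = 6
xq, wq = leggauss(40)                     # quadrature nodes/weights on [-1, 1]
tq, wtq = (xq + 1) / 2, wq / 2            # shifted to [0, 1]

Phi = legvander(2 * tq - 1, d - 1)        # shifted Legendre P_0..P_{d-1} at the quadrature nodes

# Galerkin inner products <phi_i, phi_j> = int_0^1 phi_i(t) phi_j(t) dt
G = (Phi * wtq[:, None]).T @ Phi

# Orthogonality of the shifted Legendre basis: G is diagonal up to round-off,
# which is the source of the sparsity observed in the GLS-SVR Omega matrix.
off_diag = G - np.diag(np.diag(G))
print(np.max(np.abs(off_diag)))           # ~1e-16: a "fuzzy zero" in the chapter's terminology
```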
In this section, some integral equations are considered as test problems and the
efficiency of the proposed method is shown by approximating the solution of these
test problems. Also, the shifted Legendre polynomials are used as the kernel of LS-
SVR. Since the Legendre polynomial of degree 0 is a constant, the bias term b can
be removed and the approximate solution is defined as $\sum_{i=0}^{d} w_i P_i(x)$. As a result,
the system Eq. 9.8 reduces to $\Omega \alpha = y$, which can be efficiently solved using the
Cholesky decomposition or the conjugate gradient method. The training points for
the following examples are the roots of the shifted Legendre polynomials, and the
test data are equidistant points in the problem domain.
This method has been implemented in Maple 2019 software with 15 digits of
accuracy. The results are obtained on an Intel Core i5 CPU with 8 GB of RAM. In
all of the presented numerical tables, the efficiency of the method is computed using
the mean absolute error function:
Table 9.1 The convergence of the CLS-SVR and GLS-SVR methods for Example 9.1 by different
d values. The number of training points for each approximation is d + 1. The CPU time is also
reported in seconds
d CLS-SVR GLS-SVR
Train Test Time Train Test Time
4 7.34E-05 7.65E-05 0.01 3.26E-05 5.58E-05 0.08
6 1.96E-06 2.32E-06 0.02 8.80E-07 1.68E-06 0.17
8 5.35E-08 7.50E-08 0.03 2.41E-08 4.61E-08 0.21
10 1.47E-09 2.14E-09 0.03 6.65E-10 1.48E-09 0.26
12 5.12E-11 8.92E-11 0.07 1.90E-11 4.26E-11 0.38
$$L(u, \tilde{u}) = \frac{1}{n} \sum_{i=1}^{n} |u_i - \tilde{u}_i|,$$
where $u_i$ and $\tilde{u}_i$ are the exact and the predicted value at $x_i$, respectively.
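A small sketch of this experimental setup (roots of the shifted Legendre polynomial as training points, equidistant test points, and the mean absolute error) is given below; it is an illustration in Python under our own naming, whereas the book's experiments were run in Maple:

```python
import numpy as np
from numpy.polynomial.legendre import legroots

def shifted_legendre_roots(n):
    """Roots of the shifted Legendre polynomial P_n(2x - 1) on [0, 1]."""
    coeffs = np.zeros(n + 1)
    coeffs[-1] = 1.0                      # Legendre series containing only the P_n term
    return (legroots(coeffs) + 1) / 2

def mean_absolute_error(u_exact, u_pred):
    return np.mean(np.abs(u_exact - u_pred))

x_train = shifted_legendre_roots(10)      # training points
x_test = np.linspace(0, 1, 50)            # equidistant test points

# Illustrative usage with a known function and a rough stand-in approximation:
u = np.log1p(x_test)
u_tilde = x_test - x_test**2 / 2          # truncated Taylor series, only for demonstration
print(mean_absolute_error(u, u_tilde))
```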
Example 9.1 Suppose the following Volterra integral equation of the second kind
with the exact solution ln (1 + x) Wazwaz (2011).
$$x - \frac{1}{2}x^2 - \ln(1+x) + x^2 \ln(1+x) = \int_0^x 2t\, u(t)\, dt.$$
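As a quick sanity check of the stated exact solution (our own verification in Python, not part of the proposed method), the right-hand side can be evaluated numerically for $u(t) = \ln(1+t)$ and compared with the left-hand side:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def volterra_rhs(x, n=40):
    """Right-hand side int_0^x 2 t u(t) dt with u(t) = ln(1 + t), by Gauss-Legendre quadrature."""
    s, w = leggauss(n)
    t = 0.5 * x * (s + 1)                    # map [-1, 1] -> [0, x]
    return 0.5 * x * np.dot(w, 2 * t * np.log1p(t))

x = 0.7
lhs = x - x**2 / 2 - np.log1p(x) + x**2 * np.log1p(x)
print(abs(lhs - volterra_rhs(x)))            # close to machine precision
```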
In Table 9.1, the errors of the approximate solutions for the CLS-SVR and GLS-
SVR are given. Figure 9.1 shows the approximate and the residual function of the
approximate solution. Also, the non-zero elements of the matrix Ω of the GLS-SVR
method are drawn. It is observed that most of the matrix elements are approximately
zero.
Example 9.2 Consider the following Fredholm integral equation of the first kind.
As stated before, these equations are ill-posed, and their solution may not be unique.
A solution for this equation is exp(x) Wazwaz (2011):
$$\frac{1}{4} e^x = \int_0^{1/4} e^{x-t}\, u(t)\, dt.$$
In Table 9.2, the obtained results of solving these equations with different values for
γ with d = 6 and n = 7 are reported.
Fig. 9.1 The plots of Example 9.1. a Exact versus LS-SVR. b Absolute of the residual function.
c, d Sparsity of matrix in the GLS-SVR with fuzzy zero 10−3 and 10−4
Table 9.2 The obtained solution norm and training error of the CLS-SVR method for Example 9.2
γ    $\|w\|_2$    $\|e\|_2$
1E-01 0.041653 0.673060
1E+00 0.312717 0.505308
1E+01 0.895428 0.144689
1E+02 1.100491 0.017782
1E+03 1.126284 0.001820
1E+04 1.128930 0.000182
1E+05 1.129195 0.000018
Table 9.3 The convergence of the CLS-SVR and GLS-SVR method for Example 9.3 by different
d values. The number of training points for each approximation is d + 1. The CPU time is reported
in seconds
d CLS-SVR GLS-SVR
Train Test Time Train Test Time
4 1.53E-07 1.18E-04 0.02 4.22E-06 1.18E-04 0.13
6 1.16E-10 2.46E-07 0.04 6.04E-09 2.46E-07 0.18
8 5.90E-14 2.71E-10 0.04 5.13E-12 2.72E-10 0.28
10 4.12E-15 2.44E-13 0.07 2.21E-14 2.40E-13 0.38
12 6.84E-15 8.80E-15 0.12 2.12E-14 1.55E-14 0.60
The exact solution of this equation is exp(x) Malaikah (2020). Table 9.3 shows the
convergence of the proposed method for this equation. Figure 9.2 plots the
exact solution and the residual function of the approximate solution. Also, it is seen
that the matrix Ω of the GLS-SVR method has good sparsity.
The exact solution of this equation is u(x, y) = x cos(y). The numerical results of
the CLS-SVR and GLS-SVR methods are given in Table 9.4. Figure 9.3 shows the
plot of the exact solution and the residual function. Also, the sparsity pattern of the
methods in the two-dimensional case is shown. It can be seen that sparsity dominates in
the higher dimensional problems Parand et al. (2021).
Example 9.5 Consider the following system of integral equations with the exact
solution $u_1(x) = \sin(x)$, $u_2(x) = \cos(x)$ Wazwaz (2011).
Fig. 9.2 The plots of Example 9.3. a Exact versus LS-SVR. b Absolute of the residual function.
c, d Sparsity of the matrix in the GLS-SVR with fuzzy zero 10−3 and 10−4
Table 9.4 The convergence of the CLS-SVR and GLS-SVR methods for Example 9.4 by different
d values. The CPU time is also reported in seconds
d n CLS-SVR GLS-SVR
Train Test Time Train Test Time
1 4 1.93E-03 1.93E-03 0.01 8.54E-04 8.12E-04 0.06
2 9 1.33E-05 1.33E-05 0.05 1.69E-05 5.86E-06 0.16
3 16 5.56E-08 5.56E-08 0.11 3.12E-07 2.34E-08 0.38
4 25 1.73E-10 1.73E-10 0.26 1.59E-08 8.16E-11 0.80
5 36 2.19E-11 2.19E-11 0.73 2.19E-11 2.19E-11 1.57
Fig. 9.3 The plots of Example 9.4. a Exact solution. b Absolute of the residual function. c, d
Sparsity of matrix in the GLS-SVR with fuzzy zero 10−3 and 10−4
Table 9.5 The convergence of mean squared error value for the CLS-SVR and GLS-SVR methods
in Example 9.5 by different d values. The number of training points for each approximation is
d + 1. The CPU time is reported in seconds
d u1 u2 Time
Train Test Train Test
4 9.02E-13 1.20E-06 2.28E-13 2.00E-05 0.34
7 5.43E-28 6.04E-11 2.22E-27 1.67E-12 0.48
10 3.07E-26 3.35E-19 3.29E-27 2.18E-17 0.66
13 8.19E-27 1.36E-24 2.09E-27 3.86E-26 0.91
16 2.38E-27 2.31E-27 1.87E-28 1.85E-28 1.38
Fig. 9.4 The plots of Example 9.5. a Exact versus LS-SVR. b Absolute of the residual functions.
c, d Sparsity of matrix in the GLS-SVR with fuzzy zero 10−3 and 10−4
$$\begin{cases} u_1(x) = \sin x - 2 - 2x - \pi x + \displaystyle\int_0^{\pi} \big((1 + xt)u_1(t) + (1 - xt)u_2(t)\big)\, dt, \\[8pt] u_2(x) = \cos x - 2 - 2x + \pi x + \displaystyle\int_0^{\pi} \big((1 - xt)u_1(t) - (1 + xt)u_2(t)\big)\, dt. \end{cases}$$
The numerical simulation results of this example are given in Table 9.5. Also, Fig. 9.4
plots the exact and approximate solutions of the methods. Since the matrix Ω
in this case is a symmetric block matrix, the resulting matrix of the GLS-SVR has
an interesting structure.
In Fig. 9.4a, the solutions and the approximations are indistinguishable.
Table 9.6 The convergence of the CLS-SVR and GLS-SVR methods for Example 9.6 by different
d values. The number of training points for each approximation is d + 1. The CPU time is also
reported in seconds
d CLS-SVR GLS-SVR
Train Test Time Train Test Time
2 9.23E-05 8.66E-03 0.08 8.55E-04 8.76E-03 0.72
4 4.24E-07 8.45E-05 0.12 4.28E-06 8.5E-05 1.54
6 1.12E-09 3.22E-07 0.15 1.16E-08 3.23E-07 3.84
8 1.99E-12 6.97E-10 0.23 7.82E-11 7.39E-10 12.04
10 3.07E-13 1.22E-12 0.33 1.31E-10 1.71E-10 38.37
Fig. 9.5 The plots of Example 9.5. a Exact versus LS-SVR. b Absolute of the residual function
Example 9.6 Consider the following nonlinear integral equation:
$$u(x) = \frac{1}{36}\big(35\cos(x) - 1\big) + \frac{1}{12}\int_0^{x} \sin(t)\, u^2(t)\, dt + \frac{1}{36}\int_0^{\pi/2} \big(\cos^3(x) - \cos(x)\big)\, u(t)\, dt.$$
The exact solution of this equation is u(x) = cos(x). Since the equation is nonlinear,
the corresponding optimization problem leads to a nonlinear programming problem.
Also, the dual form yields a system of nonlinear algebraic equations. In Table 9.6
and Fig. 9.5, the numerical results and the convergence of the method can be seen.
9.5 Conclusion
In this chapter, a new computational method for solving different types of integral
equations, including multi-dimensional cases and systems of integral equations, is
proposed. In linear equations, learning the solution reduces to solving a positive defi-
nite system of linear equations. This formulation was similar to the LS-SVR method
for solving regression problems. By using the ideas behind spectral methods, we
have presented CLS-SVR and GLS-SVR methods. Although the CLS-SVR method
is more computationally efficient, the resulting matrix in the GLS-SVR method has a
sparsity property. In the last section, some integral equations have been solved using
the CLS-SVR and GLS-SVR methods. The numerical results show that these meth-
ods have high efficiency and exponential convergence rate for integral equations.
References
Abbasbandy, S.: Numerical solutions of the integral equations: homotopy perturbation method and
Adomian’s decomposition method. Appl. Math. Comput. 173, 493–500 (2006)
Abdelkawy, M.A., Amin, A.Z., Bhrawy, A.H., Machado, J.A.T., Lopes, A.M.: Jacobi collocation
approximation for solving multi-dimensional Volterra integral equations. Int. J. Nonlinear Sci.
Numer. Simul. 18, 411–425 (2017)
Amiri, S., Hajipour, M., Baleanu, D.: On accurate solution of the Fredholm integral equations of
the second kind. Appl. Numer. Math. 150, 478–490 (2020)
Amiri, S., Hajipour, M., Baleanu, D.: A spectral collocation method with piecewise trigonometric
basis functions for nonlinear Volterra-Fredholm integral equations. Appl. Math. Comput. 370,
124915 (2020)
Assari, P., Dehghan, M.: A meshless local discrete Galerkin (MLDG) scheme for numerically
solving two-dimensional nonlinear Volterra integral equations. Appl. Math. Comput. 350, 249–
265 (2019)
Assari, P., Dehghan, M.: On the numerical solution of logarithmic boundary integral equations
arising in laplace’s equations based on the meshless local discrete collocation method. Adv.
Appl. Math. Mech. 11, 807–837 (2019)
Assari, P., Asadi-Mehregan, F., Dehghan, M.: On the numerical solution of Fredholm integral
equations utilizing the local radial basis function method. Int. J. Comput. Math. 96, 1416–1443
(2019)
Babolian, E., Shaerlar, A.J.: Two dimensional block pulse functions and application to solve
Volterra-Fredholm integral equations with Galerkin method. Int. J. Contemp. Math. Sci. 6, 763–
770 (2011)
Babolian, E., Masouri, Z., Hatamzadeh-Varmazyar, S.: Numerical solution of nonlinear Volterra-
Fredholm integro-differential equations via direct method using triangular functions. Comput.
Math. Appl. 58, 239–247 (2009)
Bahmanpour, M., Kajani, M.T., Maleki, M.: Solving Fredholm integral equations of the first kind
using Müntz wavelets. Appl. Numer. Math. 143, 159–171 (2019)
Barlette, V.E., Leite, M.M., Adhikari, S.K.: Integral equations of scattering in one dimension. Am.
J. Phys. 69, 1010–1013 (2001)
Bažant, Z.P., Jirásek, M.: Nonlocal integral formulations of plasticity and damage: survey of
progress. J. Eng. Mech. 128, 1119–1149 (2002)
Bhrawy, A.H., Abdelkawy, M.A., Machado, J.T., Amin, A.Z.M.: Legendre-Gauss-Lobatto collo-
cation method for solving multi-dimensional Fredholm integral equations. Comput. Math. Appl.
4, 1–13 (2016)
Bremer, J.: A fast direct solver for the integral equations of scattering theory on planar curves with
corners. J. Comput. Phys. 231, 1879–1899 (2012)
Brunner, H.: On the numerical solution of nonlinear Volterra-Fredholm integral equations by col-
location methods. SIAM J. Numer. Anal. 27, 987–1000 (1990)
Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary differential equations.
Adv. Neural Inf. Process. Syst. 31 (2018)
Dahm, K., Keller, A.: Learning light transport the reinforced way. In: International Conference on
Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, pp. 181–195 (2016)
Dehghan, M.: Solution of a partial integro-differential equation arising from viscoelasticity. Int. J.
Comput. Math. 83, 123–129 (2006)
Dehghan, M., Saadatmandi, A.: Chebyshev finite difference method for Fredholm integro-
differential equation. Int. J. Comput. Math. 85, 123–130 (2008)
Derakhshan, M., Zarebnia, M.: On the numerical treatment and analysis of two-dimensional Fred-
holm integral equations using quasi-interpolant. Comput. Appl. Math. 39, 1–20 (2020)
El-Shahed, M.: Application of He’s homotopy perturbation method to Volterra’s integro-differential
equation. Int. J. Nonlinear Sci. Numer. Simul. 6, 163–168 (2005)
Esmaeilbeigi, M., Mirzaee, F., Moazami, D.: A meshfree method for solving multidimensional
linear Fredholm integral equations on the hypercube domains. Appl. Math. Comput. 298, 236–
246 (2017)
Eswaran, K.: On the solutions of a class of dual integral equations occurring in diffraction problems.
Proc. Math. Phys. Eng. Sci. 429, 399–427 (1990)
Fatahi, H., Saberi-Nadjafi, J., Shivanian, E.: A new spectral meshless radial point interpolation
(SMRPI) method for the two-dimensional Fredholm integral equations on general domains with
error analysis. J. Comput. Appl. 294, 196–209 (2016)
Ghasemi, M., Kajani, M.T., Babolian, E.: Numerical solutions of the nonlinear Volterra-Fredholm
integral equations by using homotopy perturbation method. Appl. Math. Comput. 188, 446–449
(2007)
Golberg, M.A.: Numerical Solution of Integral Equations. Springer, Berlin (2013)
Jafari, H., Ganji, R.M., Nkomo, N.S., Lv, Y.P.: A numerical study of fractional order population
dynamics model. Results Phys. 27, 104456 (2021)
Kanaun, S., Martinez, R.: Numerical solution of the integral equations of elasto-plasticity for a
homogeneous medium with several heterogeneous inclusions. Comput. Mater. Sci. 55, 147–156
(2012)
Keller, A., Dahm, K.: Integral equations and machine learning. Math. Comput. Simul. 161, 2–12
(2019)
Kulish, V.V., Novozhilov, V.B.: Integral equation for the heat transfer with the moving boundary. J.
Thermophys. Heat Trans. 17, 538–540 (2003)
Li, X.Y., Wu, B.Y.: Superconvergent kernel functions approaches for the second kind Fredholm
integral equations. Appl. Numer. Math. 167, 202–210 (2021)
Lu, Y., Yin, Q., Li, H., Sun, H., Yang, Y., Hou, M.: Solving higher order nonlinear ordinary differen-
tial equations with least squares support vector machines. J. Ind. Manag. Optim. 16, 1481–1502
(2020)
Malaikah, H.M.: The adomian decomposition method for solving Volterra-Fredholm integral equa-
tion using maple. Appl. Math. 11, 779–787 (2020)
Maleknejad, K., Hashemizadeh, E., Ezzati, R.: A new approach to the numerical solution of Volterra
integral equations by using Bernstein’s approximation. Commun. Nonlinear Sci. Numer. Simul.
16, 647–655 (2011)
Maleknejad, K., Hadizadeh, M.: A new computational method for Volterra-Fredholm integral equa-
tions. Comput. Math. Appl. 37, 1–8 (1999)
Maleknejad, K., Nosrati Sahlan, M.: The method of moments for solution of second kind Fredholm
integral equations based on B-spline wavelets. Int. J. Comput. Math. 87, 1602–1616 (2010)
Maleknejad, K., Shahrezaee, M.: Using Runge-Kutta method for numerical solution of the system
of Volterra integral equation. Appl. Math. Comput. 149, 399–410 (2004)
Maleknejad, K., Almasieh, H., Roodaki, M.: Triangular functions (TF) method for the solution
of nonlinear Volterra-Fredholm integral equations. Commun. Nonlinear Sci. Numer. Simul. 15,
3293–3298 (2010)
Manam, S.R.: Multiple integral equations arising in the theory of water waves. Appl. Math. Lett.
24, 1369–1373 (2011)
Mandal, B.N., Chakrabarti, A.: Applied Singular Integral Equations. CRC Press, FL (2016)
Marzban, H.R., Tabrizidooz, H.R., Razzaghi, M.: A composite collocation method for the nonlin-
ear mixed Volterra-Fredholm-Hammerstein integral equations. Commun. Nonlinear Sci. Numer.
Simul. 16, 1186–1194 (2011)
Messina, E., Vecchio, A.: Stability and boundedness of numerical approximations to Volterra inte-
gral equations. Appl. Numer. Math. 116, 230–237 (2017)
Mikhlin, S.G.: Multidimensional Singular Integrals and Integral Equations. Elsevier (2014)
Miller, K.S., Ross, B.: An Introduction to the Fractional Calculus and Fractional Differential Equa-
tions. Wiley, New York (1993)
Mirzaee, F., Alipour, S.: An efficient cubic B-spline and bicubic B-spline collocation method for
numerical solutions of multidimensional nonlinear stochastic quadratic integral equations. Math.
Methods Appl. Sci. 43, 384–397 (2020)
Mirzaei, D., Dehghan, M.: A meshless based method for solution of integral equations. Appl. Numer.
Math. 60, 245–262 (2010)
Mohammad, M.: A numerical solution of Fredholm integral equations of the second kind based on
tight framelets generated by the oblique extension principle. Symmetry 11, 854–869 (2019)
Nemati, S., Lima, P.M., Ordokhani, Y.: Numerical solution of a class of two-dimensional nonlinear
Volterra integral equations using Legendre polynomials. J. Comput. Appl. 242, 53–69 (2013)
Oldham, K., Spanier, J.: The Fractional Calculus theory and Applications of Differentiation and
Integration to Arbitrary Order. Elsevier, Amsterdam (1974)
Parand, K., Delkhosh, M.: Solving the nonlinear Schlomilch’s integral equation arising in iono-
spheric problems. Afr. Mat. 28, 459–480 (2017)
Parand, K., Rad, J.A.: An approximation algorithm for the solution of the singularly perturbed
Volterra integro-differential and Volterra integral equations. Int. J. Nonlinear Sci. 12, 430–441
(2011)
Parand, K., Rad, J.A.: Numerical solution of nonlinear Volterra-Fredholm-Hammerstein integral
equations via collocation method based on radial basis functions. Appl. Math. Comput. 218,
5292–5309 (2012)
Parand, K., Abbasbandy, S., Kazem, S., Rad, J.A.: A novel application of radial basis functions for
solving a model of first-order integro-ordinary differential equation. Commun. Nonlinear Sci.
Numer. Simul. 16, 4250–4258 (2011)
Parand, K., Delafkar, Z., Pakniat, N., Pirkhedri, A., Haji, M.K.: Collocation method using sinc and
Rational Legendre functions for solving Volterra’s population model. Commun. Nonlinear Sci.
Numer. Simul. 16, 1811–1819 (2011)
Parand, K., Rad, J.A., Nikarya, M.: A new numerical algorithm based on the first kind of modified
Bessel function to solve population growth in a closed system. Int. J. Comput. Math. 91, 1239–
1254 (2014)
Parand, K., Hossayni, S.A., Rad, J.A.: An operation matrix method based on Bernstein polynomials
for Riccati differential equation and Volterra population model. Appl. Math. Model. 40, 993–1011
(2016)
Parand, K., Yari, H., Taheri, R., Shekarpaz, S.: A comparison of Newton-Raphson method with
Newton-Krylov generalized minimal residual (GMRes) method for solving one and two dimen-
sional nonlinear Fredholm integral equations. Sema. 76, 615–624 (2019)
Parand, K., Aghaei, A.A., Jani, M., Ghodsi, A.: A new approach to the numerical solution of
Fredholm integral equations using least squares-support vector regression. Math. Comput. Simul.
180, 114–128 (2021)
Parand, K., Razzaghi, M., Sahleh, R., Jani, M.: Least squares support vector regression for solving
Volterra integral equations. Eng. Comput. 38(38), 789–796 (2022)
Rad, J.A., Parand, K.: Numerical pricing of American options under two stochastic factor models
with jumps using a meshless local Petrov-Galerkin method. Appl. Numer. Math. 115, 252–274
(2017)
Rad, J.A., Parand, K.: Pricing American options under jump-diffusion models using local weak
form meshless techniques. Int. J. Comput. Math. 94, 1694–1718 (2017)
Rahman, M.: Integral Equations and their Applications. WIT Press (2007)
Reichel, L.: A fast method for solving certain integral equations of the first kind with application
to conformal mapping. J. Comput. Appl. Math. 14, 125–142 (1986)
Tang, T., Xu, X., Cheng, J.: On spectral methods for Volterra integral equations and the convergence
analysis. J. Comput. Math. 26, 825–837 (2008)
Unterreiter, A.: Volterra integral equation models for semiconductor devices. Math. Methods Appl.
Sci. 19, 425–450 (1996)
Volterra, V.: Variations and fluctuations of the number of individuals in animal species living
together. ICES Mar. Sci. Symp. 3, 3–51 (1928)
Wang, G.Q., Cheng, S.S.: Nonnegative periodic solutions for an integral equation modeling infec-
tious disease with latency periods. In Intern. Math. Forum 1, 421–427 (2006)
Wang, S.Q., He, J.H.: Variational iteration method for solving integro-differential equations. Phys.
Lett. A 367, 188–191 (2007)
Wazwaz, A.M.: First Course in Integral Equations. A World Scientific Publishing Company (2015)
Wazwaz, A.M.: A reliable treatment for mixed Volterra-Fredholm integral equations. Appl. Math.
Comput. 127, 405–414 (2002)
Wazwaz, A.M.: Linear and Nonlinear Integral Equations. Springer, Berlin (2011)
Yousefi, S., Razzaghi, M.: Legendre wavelets method for the nonlinear Volterra-Fredholm integral
equations. Math. Comput. Simul. 70, 1–8 (2005)
Zaky, M.A., Ameen, I.G., Elkot, N.A., Doha, E.H.: A unified spectral collocation method for non-
linear systems of multi-dimensional integral equations with convergence analysis. Appl. Numer.
Math. 161, 27–45 (2021)
Chapter 10
Solving Distributed-Order Fractional
Equations by LS-SVR
Abstract Over the past years, several artificial intelligence methods have been
developed for solving various types of differential equations. Because of their ability
to deal with different problems, many paradigms of artificial intelligence, such as
evolutionary algorithms, neural networks, deep learning methods, and support vector
machine algorithms, have been applied to these equations. In this chapter, an
artificial intelligence method is employed to approximate the solution of
distributed-order fractional differential equations; it is based on the combination
of the least squares support vector regression algorithm and a well-known spectral
technique, the collocation approach. Solving distributed-order fractional differential
equations is important because they model significant phenomena in nature, such as
viscoelasticity, diffusion, sub-diffusion, and wave propagation. The modal Legendre
functions, which were introduced in Chap. 4 and have been applied several times in
the previous chapters, are used as the kernel basis of the least squares support vector
regression algorithm in the presented method. The uniqueness of the obtained
solution is proved for the resulting linear systems. Finally, the efficiency and
applicability of the proposed algorithm are demonstrated by applying it to different
linear and nonlinear test problems.
10.1 Introduction
Fractional differential equations are frequently used for modeling real-world prob-
lems in science and engineering, such as fractional time evolution, polymer physics,
rheology, and thermodynamics Hilfer (2000), cognitive science Cao et al. (2021),
Hadian-Rasanan et al. (2021), neuroscience Datsko et al. (2015). Since the analytical
methods can only solve some special simple cases of fractional differential equations,
numerical methods are developed for solving these equations involving linear and
nonlinear cases. On the other hand, since the foundation of fractional calculus, several
definitions have been presented by scientists, such as the Riemann-Liouville Hilfer (2000),
Caputo (1967), Rad et al. (2014), Riesz Podlubny (1998), and Atangana-Baleanu-Caputo
(2017) derivatives. Hence, solving fractional differential equations is still a challenging problem.
In modeling some phenomena, the order of the derivatives depends on time (i.e.,
the order of the derivative operator is a function of t); this is due to the memory property
of variable-order fractional operators Heydari et al. (2020). A concept similar to
variable-order differentiation is distributed-order differentiation, which indicates
a continuous weighted summation of various orders chosen in an interval.
In many problems, differential equations can have different orders. For instance, a
first-order differential equation can be written as a summation of various orders
along with their proper weight functions as follows
$$\sum_{j=0}^{n} \omega_j D_t^{\alpha_j} u, \qquad (10.1)$$
in which the $\alpha_j$ are descending and equally spaced, and the $\omega_j$ can be
determined from the data. It is worth mentioning that, by taking the limit of the
above sum, it converts to the following convergent form:
$$\int_0^1 \omega(\alpha)\, D^{\alpha} u \, d\alpha. \qquad (10.2)$$
solving a DOFDE given by Mashayekhi and Razzaghi (2016), Yuttanan and Razzaghi
(2019):
$$\int_a^b G_1\big(p, D^p u(t)\big)\, dp + G_2\big(t, u(t), D^{\alpha_i} u(t)\big) = F(t), \quad t > 0, \qquad (10.3)$$
Najafi et al. analyzed the stability of three classes of DOFDEs subject to a nonnegative
density function Najafi et al. (2011). Atanacković et al. studied the existence and
uniqueness of mild and classical solutions for a specific general form that arises in
distributed-derivative models of viscoelasticity and identification theory Atanacković
et al. (2007). He and his collaborators also studied some properties of the
distributed-order fractional derivative in solutions of a viscoelastic rod Atanackovic
et al. (2005). Refahi et al. presented DOFDEs to generalize the inertia and
characteristic polynomial concepts with respect to a nonnegative density function
Refahi et al. (2012). Aminikhah et al. employed a combined Laplace transform and
new homotopy perturbation method for solving a particular class of distributed-order
fractional Riccati equations Aminikhah et al. (2018). Atanacković et al. also studied
a Cauchy problem for a time distributed-order multi-dimensional diffusion-wave
equation containing a forcing term Atanackovic et al. (2009).
Katsikadelis proposed a method based on the finite-difference method to solve
both linear and nonlinear DOFDEs numerically Katsikadelis (2014). Diethelm and
Ford proposed a method for DOFDEs of the general form
$\int_0^m A(r, D_*^r u(t))\, dr = f(t)$, where $m \in \mathbb{R}^+$ and $D_*^r$ is the fractional
derivative of Caputo type of order r, and introduced its analysis Diethelm and
Ford (2009). Zaky and Machado first derived
the generalized necessary conditions for optimal control with dynamics described
by DOFDEs and then proposed a practical numerical scheme for solving these equa-
tions Zaky and Machado (2017). Hadian-Rasanan et al. provided an artificial neural
network framework for approximating various types of Lane-Emden equations, such
as fractional-order Lane-Emden equations and systems of Lane-Emden equations Hadian-Rasanan et al. (2020).
Razzaghi and Mashayekhi presented a numerical method to solve the DOFDEs
based on hybrid function approximation Mashayekhi and Razzaghi (2016). Mashoof
and Refahi proposed methods based on the fractional-order integration’s operational
matrix with an initial value point for solving the DOFDEs Mashoof and Sheikhani
(2017). Li et al. presented a high order numerical scheme for solving diverse DOFDEs
by applying the reproducing kernel Li et al. (2017). Gao and Sun derived two implicit
difference schemes for two-dimensional DOFDEs Gao et al. (2016).
In the first chapter, SVM has been fully and comprehensively discussed.
Mehrkanoon et al. introduced a novel method based on LS-SVMs to solve ODEs
Mehrkanoon et al. (2012). Mehrkanoon and Suykens also presented another new
approach for solving delay differential equations Mehrkanoon et al. (2013). Ye et al.
presented an orthogonal Chebyshev kernel for SVMs Ye et al. (2006). Leake et al.
compared the application of the theory of connections by employing LS-SVMs Leake
et al. (2019). Baymani et al. developed a new technique by utilizing -LS-SVMs to
achieve the solution of the ODEs in an analytical form Baymani et al. (2016). Chu
et al. presented an improved method for the numerical solution of LS-SVMs. They
indicated that by using a reduced system of linear equations, the problem could be
solved. They believed that their proposed approach is about twice as effective as the
previous algorithms Chu et al. (2005). Pan et al. proposed an orthogonal Legendre
kernel function for SVMs using the properties of kernel functions and comparing it
to the previous kernels Pan et al. (2012). Ozer et al. introduced a new set of func-
tions with the help of generalized Chebyshev polynomials, and they also increase
the generalization capability of their previous work Ozer et al. (2011).
Lagaris et al. proposed a method based on artificial neural networks that can
solve some categories of ODEs and PDEs and later compared their method to those
obtained using the Galerkin finite element method Lagaris et al. (1998). Meade and
Fernandez illustrated theoretically how a feedforward neural network could be con-
structed to approximate arbitrary linear ODEs Meade et al. (1994). They also indi-
cated the way of directly constructing a feedforward neural network to approximate
the nonlinear ordinary differential equations without training Meade et al. (1994).
Dissanayake and Phan-Thien presented a numerical method for solving PDEs, which
is based on neural-network-based functions Dissanayake and Phan-Thien (1994).
To solve ODEs and elliptic PDEs, Mai-Duy and Tran-Cong proposed mesh-free
procedures that rely on multiquadric radial basis function networks Mai-Duy and
Tran-Cong (2001). Effati and Pakdaman proposed a new algorithm by utilizing feed-
forward neural networks for solving fuzzy differential equations Effati and Pakdaman
(2010). To solve the linear second kind integral equations of Volterra and Fredholm
types, Golbabai and Seifollahi presented a novel method based on radial basis func-
tion networks that applied a neural network as the approximate solution of the integral
equations Golbabai and Seifollahi (2006). Jianyu et al. illustrated a neural network
to solve PDEs using the radial basis functions as the activation function of the hidden
nodes Jianyu et al. (2003).
The rest of the chapter is organized as follows. Some preliminaries are presented
in Sect. 10.2, including the fractional derivatives and the numerical integration. We
present the proposed method for simulating distributed-order fractional dynamics,
and we depict LS-SVR details in Sect. 10.3. Numerical results are given in Sect. 10.4
to show the validity and efficiency of the proposed method. In Sect. 10.5, we draw
concluding remarks.
10.2 Preliminaries
As pointed out earlier, modal Legendre functions are used in the proposed algorithm.
We skip the properties and the construction procedure of the modal Legendre
functions in this chapter, as they have been discussed thoroughly in Chap. 4. However,
the fractional derivative, with a focus on the Caputo definition, and the numerical
integration are described in this section.
By considering f (x) as a function, the Cauchy formula for n-th order can be obtained,
and by generalizing the Cauchy formula for non-integer orders, we can achieve the
Riemann-Liouville definition of fractional integral, and due to this, the well-known
Gamma function is used as the factorial function for non-integer numbers. Therefore,
by denoting the Riemann-Liouville fractional integral of order $\beta$ as
${}^{RL}_{a} I_x^{\beta} f(x)$, it can be defined as follows Hadian et al. (2020):
$${}^{RL}_{a} I_x^{\beta} f(x) = \frac{1}{\Gamma(\beta)} \int_a^x (x - t)^{\beta - 1} f(t)\, dt, \qquad (10.5)$$
in which $\beta$ is a real number indicating the order of integration. Moreover, there
is another definition of the fractional derivative, namely the Caputo definition;
denoting it by ${}^{C}_{a} D_x^{\beta} f(x)$, it can be defined as follows Hadian et al. (2020):
$${}^{C}_{a} D_x^{\beta} f(x) = {}^{RL}_{a} I_x^{(k-\beta)} f^{(k)}(x) = \begin{cases} \dfrac{1}{\Gamma(k-\beta)} \displaystyle\int_a^x \dfrac{f^{(k)}(t)}{(x-t)^{\beta+1-k}}\, dt, & \beta \notin \mathbb{N}, \\[8pt] \dfrac{d^{\beta} f(x)}{dx^{\beta}}, & \beta \in \mathbb{N}, \end{cases} \qquad (10.6)$$
where $k = \lceil \beta \rceil$.
It is worth mentioning that Eqs. (10.7) and (10.8) are the most useful properties of
the Caputo derivative Hadian et al. (2020)
$${}^{C}_{0} D_x^{\alpha} x^{\gamma} = \begin{cases} \dfrac{\Gamma(\gamma+1)}{\Gamma(\gamma-\alpha+1)}\, x^{\gamma-\alpha}, & 0 \le \alpha \le \gamma, \\[4pt] 0, & \alpha > \gamma, \end{cases} \qquad (10.7)$$
and since the Caputo derivative is a linear operator Hadian et al. (2020), we have
$${}^{C}_{a} D_x^{\alpha}\big(\lambda f(x) + \mu g(x)\big) = \lambda\, {}^{C}_{a} D_x^{\alpha} f(x) + \mu\, {}^{C}_{a} D_x^{\alpha} g(x), \qquad \lambda, \mu \in \mathbb{R}. \qquad (10.8)$$
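Equations 10.7 and 10.8 together allow the Caputo derivative of any polynomial to be evaluated term by term. The following short Python sketch (our own illustration; the function names are not from the chapter) implements exactly this:

```python
import math

def caputo_monomial(gamma_exp, alpha, x):
    """Caputo derivative of x**gamma_exp of order alpha at x, via Eq. 10.7."""
    if alpha > gamma_exp:
        return 0.0
    coef = math.gamma(gamma_exp + 1) / math.gamma(gamma_exp - alpha + 1)
    return coef * x ** (gamma_exp - alpha)

def caputo_polynomial(coeffs, alpha, x):
    """Caputo derivative of sum_k coeffs[k] * x**k, using linearity (Eq. 10.8)."""
    return sum(c * caputo_monomial(k, alpha, x) for k, c in enumerate(coeffs))

# D^0.5 of x^2 at x = 1 should equal Gamma(3)/Gamma(2.5) = 2/Gamma(2.5)
print(caputo_monomial(2, 0.5, 1.0), 2 / math.gamma(2.5))
print(caputo_polynomial([0.0, 1.0, 3.0], 0.7, 0.5))   # derivative of x + 3x^2 at x = 0.5
```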
$$\omega_i = \frac{k_{n+1}\, \| f_n \|_{\omega}^2}{k_n\, f_n'(x_i)\, f_{n+1}(x_i)}, \quad 0 \le i \le n. \qquad (10.11)$$
In this section, first the LS-SVR is introduced and some of its characteristics are
discussed; then the LS-SVR-based method for solving DOFDEs is presented. Having
the concepts of LS-SVM in hand, we present our solution based on LS-SVM
regression. To this end, consider the following equation
$$\int_a^b G_1\big(p, D^p u(t)\big)\, dp + G_2\big(t, u(t), D^{\alpha_i} u(t)\big) = F(t), \quad t \in [0, \eta], \qquad (10.12)$$
in which $\omega = [\omega_0, \ldots, \omega_N]^T$ are the weight coefficients and $\phi = [\phi_0, \ldots, \phi_N]^T$ are the
basis functions. To determine the unknown coefficients, we consider the following
optimization problem
$$\min_{\omega, \varepsilon} \ \frac{1}{2} \omega^T \omega + \frac{\gamma}{2} \varepsilon^T \varepsilon, \qquad (10.17)$$
hence we have
$$L u(t_i) - F(t_i) = \varepsilon_i. \qquad (10.19)$$
Considering $L(u) = \sum_{j=0}^{N} \omega_j L(\phi_j)$, we have the Lagrangian
$$L = \frac{1}{2}\omega^T \omega + \frac{\gamma}{2} \varepsilon^T \varepsilon - \sum_{i=0}^{N} \lambda_i \big( L(u(t_i)) - F(t_i) - \varepsilon_i \big), \qquad (10.20)$$
The matrix S is as below
$$S_{ij} = L(\phi_i(t_j)) = \int_a^b g(p)\, D^p \phi_i(t_j)\, dp + A\, \phi_i(t_j), \qquad (10.24)$$
and by considering this equation, $M_{ij}^{(p)} := D^p \phi_i(t_j)$ can be defined. We use Gaussian
numerical integration to calculate $\int_a^b g(p)\, M_{ij}^{(p)}\, dp$.
There exist two ways to determine the numerical integration
$$\int_a^b f(x)\, dx \approx \sum_{i=1}^{N} \omega_i f(x_i). \qquad (10.25)$$
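A minimal sketch of the Gauss-Legendre realization of Eq. 10.25 on a general interval $[a, b]$ is shown below (our own illustration in Python; the chapter's computations were done in Maple):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def gauss_legendre(f, a, b, n):
    """Approximate int_a^b f(x) dx with an n-point Gauss-Legendre rule (Eq. 10.25)."""
    x, w = leggauss(n)                       # nodes and weights on [-1, 1]
    xm = 0.5 * (b - a) * x + 0.5 * (b + a)   # affine map to [a, b]
    return 0.5 * (b - a) * np.dot(w, f(xm))

# Exact for polynomials of degree <= 2n - 1: here int_{0.2}^{1.5} p^3 dp
print(gauss_legendre(lambda p: p**3, 0.2, 1.5, 4), (1.5**4 - 0.2**4) / 4)
```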
$$S^T S \lambda + \frac{1}{2\gamma} \lambda = f, \qquad (10.26)$$
By defining $A := S^T S + \frac{1}{2\gamma} I$, the following equation is achieved
$$A \alpha = f. \qquad (10.27)$$
Remark 10.1 The matrix A is positive definite if and only if $x^T A x > 0$ for all nonzero
vectors $x \in \mathbb{R}^{n+1}$. For every vector x, we can write
$$x^T \Big( S^T S + \frac{1}{2\gamma} I \Big) x = x^T S^T S\, x + \frac{1}{2\gamma} x^T x, \qquad (10.28)$$
Since the matrix of the system Eq. 10.27 is positive definite and sparse, by
solving Eq. 10.27 we can obtain α, and with the help of criterion (I) of Eq. 10.23 we
can calculate ω.
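The final linear-algebra step can be sketched as follows (an illustration with random stand-in data, not the chapter's code): form $A = S^T S + \frac{1}{2\gamma} I$ as in Eq. 10.27 and solve the system with a Cholesky factorization, which is justified by Remark 10.1. The matrix sizes and the value of γ below are arbitrary choices of ours.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
N = 8                                      # number of basis functions minus one
gamma = 1e4

# Stand-in for the operator matrix of Eq. 10.24, S_ij = L(phi_i(t_j)); random data here.
S = rng.standard_normal((N + 1, N + 1))
f = rng.standard_normal(N + 1)

# Eq. 10.27: A = S^T S + (1/(2*gamma)) I is symmetric positive definite (Remark 10.1),
# so a Cholesky factorization can be used.
A = S.T @ S + np.eye(N + 1) / (2.0 * gamma)
alpha = cho_solve(cho_factor(A), f)
print(np.allclose(A @ alpha, f))           # the system is solved consistently
```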
In the nonlinear case, the nonlinear equation can be converted to a sequence
of linear equations using the Quasi-Linearization Method (QLM); the obtained linear
equations are then solved one by one.
In the next section, some numerical examples are provided for indicating the
efficiency and accuracy of the presented method, and the convergence of the proposed
approach is represented.
In this section, various numerical examples are presented to show the accuracy and
efficiency of the mentioned method; there are three linear and three nonlinear examples,
and a comparison of our results with other related works. Moreover, the interval
[0, T] is considered as the intended domain. Additionally, the norm used for
comparing the exact solution with the approximated one, denoted by $\|e\|_2$,
is defined as follows:
$$\|e\|_2 = \left( \int_0^T \big( u(t) - u_{app}(t) \big)^2 \, dt \right)^{\frac{1}{2}}, \qquad (10.29)$$
in which u(t) is the exact solution and $u_{app}(t)$ is the approximated solution. All
numerical experiments are computed in Maple software on a 3.5 GHz Intel Core i5
CPU machine with 8 GB of RAM.
As the first example, consider the following equation Yuttanan and Razzaghi (2019)
$$\int_{0.2}^{1.5} \Gamma(3 - p)\, D^p u(t)\, dp = 2\, \frac{t^{1.8} - t^{0.5}}{\ln(t)}, \qquad (10.30)$$
in which $u(t) = t^2$ is the exact solution. By applying the proposed method, the
corresponding linear system is obtained and solved. Figure 10.1 indicates the errors for
Example 10.4.1 from three different aspects: the number of Gaussian points,
Gamma, and the number of basis functions.
Figure 10.1a shows that increasing the number of Gaussian points drives the error
toward zero; Fig. 10.1b shows that the error decreases exponentially; and Fig. 10.1c
shows that increasing the number of basis functions steadily reduces the error.
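As a sanity check of Example 10.4.1 (our own verification, assuming the weight function in Eq. 10.30 is $\Gamma(3 - p)$ as written above), the distributed-order derivative of the exact solution $u(t) = t^2$ can be evaluated by combining Eq. 10.7 with Gauss-Legendre quadrature and compared with the right-hand side:

```python
import math
import numpy as np
from numpy.polynomial.legendre import leggauss

def caputo_t2(p, t):
    """Caputo derivative D^p t^2 via Eq. 10.7: Gamma(3)/Gamma(3 - p) * t^(2 - p)."""
    return math.gamma(3) / math.gamma(3 - p) * t ** (2 - p)

def lhs(t, n=30):
    """int_{0.2}^{1.5} Gamma(3 - p) D^p u(t) dp for u(t) = t^2, by Gauss-Legendre quadrature."""
    s, w = leggauss(n)
    p = 0.65 * s + 0.85                        # map [-1, 1] -> [0.2, 1.5]
    vals = np.array([math.gamma(3 - pi) * caputo_t2(pi, t) for pi in p])
    return 0.65 * np.dot(w, vals)

t = 0.4
rhs = 2 * (t**1.8 - t**0.5) / math.log(t)
print(abs(lhs(t) - rhs))                       # close to machine precision
```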
in which $u(t) = t^4$ is the exact solution. The following figure demonstrates the error
behavior for different numbers of Gaussian points, values of Gamma, and numbers of
basis functions.
Figure 10.2a shows that increasing the number of Gaussian points makes the error
converge to zero more closely. Figure 10.2b shows that the error decreases almost
exponentially, and Fig. 10.2c indicates that the error decreases as the number of basis
functions increases.
Fig. 10.1 Calculated error for three different aspects for Example 10.4.1: a for different numbers of Gaussian points; b for different Gamma values
This example has also been solved by Xu et al. (2019). Table 10.1 compares the
method presented here with the one Xu proposed. It can be concluded that, for
different numbers of Gaussian points, the proposed method achieves better results.
Fig. 10.2 Calculated error for three different aspects for Example 10.4.2: a for different numbers of Gaussian points; b for different Gamma values
Table 10.1 The table of the Example 10.4.2 in comparison with Xu method Xu et al. (2019)
M Method of Xu et al. (2019) Presented method with N = 4
with N = 4
2 2.4768E-008 6.9035E-010
3 9.0026E-011 2.3935E-014
4 2.3632E-013 8.4114E-018
5 2.7741E-015 3.0926E-023
u(0) = 0. (10.35)
u(t) = t 6 is the exact solution of this example. Now consider Fig. 10.3 which displays
an error in different aspects.
By looking at Fig. 10.3a, it is clear that as the Gaussian points are increasing, the
error is decreasing. Same in Fig. 10.3b, c when Gamma and number of bases are
decreasing, respectively, the error decrease too.
In Table 10.2, the results of the proposed technique and the method presented in
Xu et al. (2019) are compared. A glance at the table makes it clear that the suggested
method performs considerably better than Xu's method.
Fig. 10.3 Calculated error for three different aspects for Example 10.4.3: (a) different numbers of Gaussian points, (b) different Gamma values, (c) different numbers of basis functions
Table 10.2 Comparison of our results for Example 10.4.3 with the Xu method Xu et al. (2019)

M    Method of Xu et al. (2019)    Proposed method    Method of Xu et al. (2019)    Presented method
     with N = 7                    with N = 7         with N = 9                    with N = 9
2    4.3849E-009                   1.1538E-010        1.3008E-008                   1.1550E-010
3    1.5375E-011                   1.8923E-015        3.8919E-011                   1.8932E-015
4    3.7841E-014                   3.0791E-019        8.0040E-014                   2.4229E-019
5    3.6915E-016                   3.1181E-019        1.2812E-015                   2.4578E-019
u(0) = 0. (10.37)
The exact solution is u(t) = t^{3.1}. Now consider Fig. 10.4, which shows that the error
converges to zero.
Figure 10.4a–c confirms that if the number of Gaussian points, the Gamma value, and the
number of basis functions increase, respectively, then the error converges to zero.
Table 10.3 shows the results obtained in Xu et al. (2019) and those obtained by the
presented method.
For the final example, consider the following equation Xu et al. (2019)
\frac{1}{120} \int_{0}^{2} \Gamma(6 - p)\, D^{p} u(t) \, dp = \Gamma(7.1)\, \frac{t^{5} - t^{3}}{\ln(t)}, \qquad (10.38)
u(0) = 0. \qquad (10.39)
Here u(t) = t^5 is the exact solution. Figure 10.5 shows the behavior of the error from
different aspects.
Fig. 10.4 Calculated error for three different aspects for Example 10.4.4: (a) different numbers of Gaussian points, (b) different Gamma values, (c) different numbers of basis functions
Table 10.3 Comparison of our results for Example 10.4.4 with the Xu method Xu et al. (2019)

M    Method of Xu et al. (2019)    Proposed method    Method of Xu et al. (2019)    Presented method
     with N = 6                    with N = 6         with N = 14                   with N = 14
2    4.4119E-005                   1.5437E-005        2.4317E-006                   2.5431E-007
3    4.3042E-005                   1.5435E-005        2.4984E-007                   2.5221E-007
4    4.3034E-005                   1.5435E-005        2.4195E-007                   2.5221E-007
5    4.3034E-005                   1.5435E-005        2.4190E-007                   2.5221E-007
Looking at Fig. 10.5a–c, it can be concluded that the error defined in Eq. (10.29)
converges to zero, and in Fig. 10.5a it converges to zero exponentially, which means the
approximate results converge to the exact solution.
Fig. 10.5 Calculated error for three different aspects for Example 10.4.5: (a) different numbers of Gaussian points, (b) different Gamma values, (c) different numbers of basis functions
Table 10.4 Comparison of our results for Example 10.4.5 with the Xu method Xu et al. (2019)

N    Method of Xu et al. (2019) with M = 4    Proposed method with M = 4
5    4.2915E-010                              4.8113E-002
6    3.3273E-010                              1.6388E-013
7    1.9068E-010                              1.4267E-013
8    9.4633E-011                              1.2609E-013
9    6.7523E-011                              1.1283E-013
Table 10.4 shows the comparison between our proposed method and the method
proposed in Xu et al. (2019). Studying this table, it can be concluded that, except for
N = 5, our method achieves more accurate results.
10.5 Conclusion
In this chapter, the LS-SVR algorithm, built on modal Legendre basis functions, was
utilized for solving distributed-order fractional differential equations. The presented
algorithm has better accuracy in comparison with other methods proposed for DOFDEs.
In addition, the uniqueness of the solution is guaranteed. One of the important parameters
for obtaining high precision in solving this kind of equation is the Gamma parameter, and
its impact on the accuracy of the numerical algorithm is shown in the numerical examples.
Moreover, it has to be considered that the Gamma value cannot be increased too much,
because the resulting quantities may exceed the machine accuracy; this point must be kept
in mind even though all computations are performed in MAPLE, which is a symbolic
application. To indicate the applicability of the proposed algorithm, five examples are
presented, and the accuracy obtained for these examples is compared with other methods.
References
Aminikhah, H., Sheikhani, A.H.R., Rezazadeh, H.: Approximate analytical solutions of distributed
order fractional Riccati differential equation. Ain Shams Eng. J. 9, 581–588 (2018)
Atanackovic, T.M., Pilipovic, S., Zorica, D.: Time distributed-order diffusion-wave equation. II.
Applications of Laplace and Fourier transformations. Proc. R. Soc. A: Math. Phys. Eng. Sci. 465,
1893–1917 (2009)
Atanackovic, T.M., Budincevic, M., Pilipovic, S.: On a fractional distributed-order oscillator. J.
Phys. A: Math. Gen. 38, 6703 (2005)
Atanacković, T.M., Oparnica, L., Pilipović, S.: On a nonlinear distributed order fractional differential
equation. J. Math. Anal. 328, 590–608 (2007)
Atangana, A., Gómez-Aguilar, J.F.: A new derivative with normal distribution kernel: theory, meth-
ods and applications. Phys. A: Stat. Mech. Appl. 476, 1–14 (2017)
Bagley, R.L., Torvik, P.J.: Fractional calculus in the transient analysis of viscoelastically damped
structures. AIAA J. 23, 918–925 (1985)
Baymani, M., Teymoori, O., Razavi, S.G.: Method for solving differential equations. Am. J. Comput.
Sci. Inf. Eng. 3, 1–6 (2016)
Cao, K.C., Zeng, C., Chen, Y., Yue, D.: Fractional decision making model for crowds of pedestrians
in two-alternative choice evacuation. IFAC-PapersOnLine 50, 11764–11769 (2017)
Caputo, M.: Linear models of dissipation whose Q is almost frequency independent-II. Geophys.
J. Int. 13, 529–539 (1967)
Caputo, M.: Diffusion with space memory modelled with distributed order space fractional differ-
ential equations. Ann. Geophys. 46, 223–234 (2003)
Carpinteri, A., Mainardi, F.: Fractals and Fractional Calculus in Continuum Mechanics. Springer
(2014)
Chu, W., Ong, C.J., Keerthi, S.S.: An improved conjugate gradient scheme to the solution of least
squares SVM. IEEE Trans. Neural Netw. 16, 498–501 (2005)
Datsko, B., Gafiychuk, V., Podlubny, I.: Solitary travelling auto-waves in fractional reaction-
diffusion systems. Commun. Nonlinear Sci. Numer. Simul. 23, 378–387 (2015)
Diethelm, K., Ford, N.J.: Numerical analysis for distributed-order differential equations. J. Comput.
Appl. Math. 225, 96–104 (2009)
Ding, W., Patnaik, S., Sidhardh, S., Semperlotti, F.: Applications of distributed-order fractional
operators: a review. Entropy 23, 110 (2021)
Dissanayake, M.W.M.G., Phan-Thien, N.: Neural-network-based approximations for solving partial
differential equations. Commun. Numer. Methods Eng. 10, 195–201 (1994)
Effati, S., Pakdaman, M.: Artificial neural network approach for solving fuzzy differential equations.
Inf. Sci. 180, 1434–1457 (2010)
Gao, G.H., Sun, Z.Z.: Two alternating direction implicit difference schemes for two-dimensional
distributed-order fractional diffusion equations. J. Sci. Comput. 66, 1281–1312 (2016)
Golbabai, A., Seifollahi, S.: Numerical solution of the second kind integral equations using radial
basis function networks. Appl. Math. Comput. 174, 877–883 (2006)
Hadian Rasanan, A.H., Bajalan, N., Parand, K., Rad, J.A.: Simulation of nonlinear fractional dynam-
ics arising in the modeling of cognitive decision making using a new fractional neural network.
Math. Methods Appl. Sci. 43, 1437–1466 (2020)
Hadian-Rasanan, A.H., Rad, J.A., Sewell. D. K.: Are there jumps in evidence accumulation, and
what, if anything, do they reflect psychologically? An analysis of Lévy-Flights models of decision-
making. PsyArXiv (2021). https://doi.org/10.31234/osf.io/vy2mh
Hadian-Rasanan, A.H., Rahmati, D., Gorgin, S., Parand, K.: A single layer fractional orthogonal
neural network for solving various types of Lane-Emden equation. New Astron. 75, 101307
(2020)
Hartley, T.T.: Fractional system identification: an approach using continuous order-distributions.
NASA Glenn Research Center (1999)
Heydari, M.H., Atangana, A., Avazzadeh, Z., Mahmoudi, M.R.: An operational matrix method
for nonlinear variable-order time fractional reaction-diffusion equation involving Mittag-Leffler
kernel. Eur. Phys. J. Plus 135, 1–19 (2020)
Hilfer, R.: Applications of Fractional Calculus in Physics. World Scientific (2000)
Jianyu, L., Siwei, L., Yingjian, Q., Yaping, H.: Numerical solution of elliptic partial differential
equation using radial basis function neural networks. Neural Netw. 16, 729–734 (2003)
Katsikadelis, J.T.: Numerical solution of distributed order fractional differential equations. J. Com-
put. Phys. 259, 11–22 (2014)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial
differential equations. IEEE Trans. Neural Netw. Learn. Syst. 9, 987–1000 (1998)
Leake, C., Johnston, H., Smith, L., Mortari, D.: Analytically embedding differential equation con-
straints into least squares support vector machines using the theory of functional connections.
Mach. Learn. Knowl. Extr. 1, 1058–1083 (2019)
Li, X., Li, H., Wu, B.: A new numerical method for variable order fractional functional differential
equations. Appl. Math. Lett. 68, 80–86 (2017)
Mai-Duy, N., Tran-Cong, T.: Numerical solution of differential equations using multiquadric radial
basis function networks. Neural Netw. 14, 185–199 (2001)
Mashayekhi, S., Razzaghi, M.: Numerical solution of distributed order fractional differential equa-
tions by hybrid functions. J. Comput. Phys. 315, 169–181 (2016)
Mashoof, M., Sheikhani, A.R.: Simulating the solution of the distributed order fractional differential
equations by block-pulse wavelets. UPB Sci. Bull. Ser. A: Appl. Math. Phys 79, 193–206 (2017)
Mastroianni, G., Milovanovic, G.: Interpolation Processes: Basic Theory and Applications. Springer
Science & Business Media, Berlin (2008)
Meade, A.J., Jr., Fernandez, A.A.: The numerical solution of linear ordinary differential equations
by feedforward neural networks. Math. Comput. Model. 19, 1–25 (1994)
Meade, A.J., Jr., Fernandez, A.A.: Solution of nonlinear ordinary differential equations by feedfor-
ward neural networks. Math. Comput. Model. 20, 19–44 (1994)
Mehrkanoon, S., Suykens, J.A.: LS-SVM based solution for delay differential equations. J. Phys.:
Conf. Ser. 410, 012041 (2013)
Mehrkanoon, S., Falck, T., Suykens, J.A.: Approximate solutions to ordinary differential equations
using least squares support vector machines. IEEE Trans. Neural Netw. Learn. Syst. 23, 1356–
1367 (2012)
Najafi, H.S., Sheikhani, A.R., Ansari, A.: Stability analysis of distributed order fractional differential
equations. Abst. Appl. Anal. 2011, 175323 (2011)
Ozer, S., Chen, C.H., Cirpan, H.A.: A set of new Chebyshev kernel functions for support vector
machine pattern classification. Pattern Recognit. 44, 1435–1447 (2011)
Pan, Z.B., Chen, H., You, X.H.: Support vector machine with orthogonal Legendre kernel. In: 2012
International Conference on Wavelet Analysis and Pattern Recognition, pp. 125–130 (2012)
Parodi, M., Gómez, J.C.: Legendre polynomials based feature extraction for online signature veri-
fication. Consistency analysis of feature combinations. Pattern Recognit. 47, 128–140 (2014)
Podlubny, I.: Fractional differential equations: an introduction to fractional derivatives, fractional
differential equations, to methods of their solution and some of their applications. Elsevier (1998)
Rad, J.A., Kazem, S., Shaban, M., Parand, K., Yildirim, A.: Numerical solution of fractional differ-
ential equations with a Tau method based on Legendre and Bernstein polynomials. Math. Methods
Appl. Sci. 37, 329–342 (2014)
Refahi, A., Ansari, A., Najafi, H.S., Merhdoust, F.: Analytic study on linear systems of distributed
order fractional differential equations. Matematiche 67, 3–13 (2012)
Rossikhin, Y.A., Shitikova, M.V.: Applications of fractional calculus to dynamic problems of linear
and nonlinear hereditary mechanics of solids. Appl. Mech. Rev. 50, 15–67 (1997)
Sokolov, I. M., Chechkin, A.V., Klafter, J.: Distributed-order fractional kinetics (2004).
arXiv:0401146
Umarov, S., Gorenflo, R.: Cauchy and nonlocal multi-point problems for distributed order pseudo-
differential equations: part one. J. Anal. Appl. 245, 449–466 (2005)
Xu, Y., Zhang, Y., Zhao, J.: Error analysis of the Legendre-Gauss collocation methods for the
nonlinear distributed-order fractional differential equation. Appl. Numer. Math. 142, 122–138
(2019)
Ye, N., Sun, R., Liu, Y., Cao, L.: Support vector machine with orthogonal Chebyshev kernel. In:
18th International Conference on Pattern Recognition (ICPR’06), vol. 2, pp. 752–755 (2006)
Yuttanan, B., Razzaghi, M.: Legendre wavelets approach for numerical solutions of distributed
order fractional differential equations. Appl. Math. Model. 70, 350–364 (2019)
Zaky, M.A., Machado, J.T.: On the formulation and numerical simulation of distributed-order frac-
tional optimal control problems. Commun. Nonlinear Sci. Numer. Simul. 52, 177–189 (2017)
Part IV
Orthogonal Kernels in Action
Chapter 11
GPU Acceleration of LS-SVM, Based on
Fractional Orthogonal Functions
Nowadays, computers and their benefits have revolutionized human life. Due to their
vast applications, the demand for accuracy and for multi-feature capabilities has made
them more complex. These complexities stem from data accessibility and processing
complexity. Chip vendors continually try to produce memories with lower latency and
broader bandwidth to address data accessibility, while the parallel processing approach
addresses the processing complexity Asghari et al. (2022). Parallel processing is an
approach that divides large and complex tasks into several smaller tasks.
This approach allows the processing parts to be executed simultaneously. GPUs are
special-purpose processors mainly designed for graphical workloads. Using GPUs for
general-purpose computing paved the way for a new era in high-performance computing
Owens et al. (2007), Ahmadzadeh et al. (2018), including accelerators for cryptosystems
Ahmadzadeh et al. (2018), Gavahi et al. (2015), Luo et al. (2015), multimedia compression
standards Xiao et al. (2019), scientific computing Moayeri et al. (2020), Parand et al. (2021),
machine learning and clustering accelerators Rahmani et al. (2016), simulation of molecular
dynamics Allec et al. (2019), and quantum computing simulation Doi et al. (2019). Designers
are gradually improving the GPU architecture to accelerate it, enabling real-time processing
in many applications. Within a GPU, many small computational units handle the simple
parallel tasks that CPUs traditionally had to carry out.
This usage of GPUs instead of CPUs is called General-Purpose Computing on
Graphics Processing Units (GPGPU). One notable milestone for GPGPU was the release
of CUDA (Compute Unified Device Architecture), NVIDIA's parallel computing platform
and programming model. CUDA was introduced in 2007, allowing many researchers and
scientists to deploy their compute-intensive tasks on GPUs. CUDA provides libraries and
APIs that can be used from multiple programming languages, such as C/C++ and Python,
for general-purpose applications such as multimedia compression, cryptosystems, machine
learning, etc. Moreover, GPUs are generally faster than CPUs for such workloads,
performing more instructions in a given time. Therefore, together with CPUs, GPUs
provide heterogeneous and scalable computing, acting as co-processors while reducing
the CPU's workload.
The rest of this chapter is organized as follows. In Sect. 11.2, the NVIDIA GPU
architecture and how it serves as an accelerator are presented; PyCUDA is also
described in Sect. 11.2. In Sect. 11.3, the proposed programming model is analyzed
before acceleration. Section 11.4 defines the hardware and software requirements
for implementing our GPU acceleration platform. The method of accelerating the
Chebyshev kernel is described in detail in Sect. 11.5. In Sect. 11.6, we propose critical
optimizations that should be considered for more speedup. Then, we recommend a
GPU-based Quadratic Problem Solver (QPS) in Sect. 11.7 for further speedup of our
platform. This chapter is concluded in Sect. 11.8.
Fig. 11.1 GPU Architecture (CUDA Cores, Shared Memory, Special Function Units, WARP
Scheduler, and Register Files) Cheng et al. (2014)
Fig. 11.2 GPU kernel and thread hierarchy (blocks of threads and grid of blocks) Cheng et al.
(2014)
running. Instead, it is more efficient to execute only those pieces of programs suitable
to GPU-style parallelism, such as linear algebra code, and leave the other sections
of the program code for the CPU. Therefore, writing suitable programs that utilize
the GPU cores is intricate even for the algorithms well-suited to be programmed in
parallel. There has been a generation of libraries that provide GPU implementations
of standard linear algebra kernels (BLAS), which help separate code into pieces sub-
tasks used in these libraries and achieve higher performance Cheng et al. (2014). This
described the first limitation. The second limitation of GPUs for GPGPU computation
is that GPUs use a separate memory from the host memory. In other words, the GPU
has a special memory and access hierarchy and uses a different address space from
the host memory.
The host (the CPU) and the GPU cannot share data easily. This limitation of
communication between the host and GPU is especially problematic when both the host
and the GPU are working on shared data simultaneously. In this case, the data must be
transferred back and forth, and the host must wait for the GPU and vice versa. The worst
part of this scenario is transferring data between the host and the GPU, as it is slow,
Fig. 11.3 GPU memory hierarchy (Global, Constant, Local, Shared, Registers) Cheng et al. (2014)
especially compared to the speed of host memory or the GPU dedicated memory.
The maximum speed of data transferred is limited to the PCI express bus speed,
as the GPU is connected to the host using the PCI express bus. Consequently, data
transmission often has the highest cost in GPU computation. Several methods have
been tried to avoid data transfers when costs exceed the gain of GPU computation
AlSaber et al. (2013). Therefore, GPU programming has many limitations that must be
considered to achieve higher performance, for example, the limited shared memory
capacity, the ordering of process executions, branch divergence, and other bottlenecks
that need to be resolved with the GPU architecture in mind.
The NVIDIA processor’s different architectures, such as Tesla, Maxwell, Pascal,
Volta, and Turing, are introduced here for better exploration.
• Tesla architecture emerged by introducing the GeForce 8800 product line, which
has unified the vertex and the pixel processor units. This architecture is based on
scalable array processors by facilitating an efficient parallel processor. In 2007,
C2050, another GPU with this architecture, found its way to the market Pienaar
et al. (2011). The performance of the Tesla C2050 reaches 515 GFLOPS in double-
cores specially designed for deep learning to achieve more performance over the
regular CUDA cores. NVIDIA GPUs with this architecture, such as Tesla V100,
are manufactured in TSMC 12 nm FinFET process. Tesla V100 performance is 7.8
TFLOPS in double-precision and 15.7 TFLOPS in single-precision floating-point
computations. This GPU card is designed with CUDA compute capability 7.0; each
SM has 64 CUDA cores, for a total of 5120 cores, and the memory size is 16 GB with
900 GB/s bandwidth, employing HBM2 memory Mei et al. (2016). The maximum power
consumption is 250 W, and the card is designed very efficiently in terms of power
consumption and performance per watt NVIDIA (2017).
• Another NVIDIA GPU architecture, which was introduced in 2018, is the Tur-
ing architecture, and the famous RTX 2080 Ti product is based on it NVIDIA
(2018). The Turing architecture enjoys the capability of real-time ray tracing with
dedicated ray-tracing processors and dedicated artificial intelligence processors
(Tensor Cores). Its 4352 CUDA cores running at a 1.35 GHz frequency deliver
11.7 TFLOPS in single-precision computation, and the card uses a GDDR6/HBM2 memory
controller NVIDIA (2018). Its global memory size is 11 GB, its bandwidth is 616 GB/s,
and the maximum amount of shared memory per thread block is 64 KB. The maximum
power consumption is 250 W; the chip is manufactured in the TSMC 12 nm FinFET process,
and this GPU supports compute capability 7.5 Ahmadzadeh et al. (2018), NVIDIA (2018),
Kalaiselvi et al. (2017), Choquette et al. (2021).
CUDA provides APIs and libraries for the C/C++ programming languages. However,
Python also has its own libraries for accessing the CUDA APIs and GPGPU capabilities.
PyCUDA (2021) provides Pythonic access to the CUDA API for parallel computation
and claims that:
1. PyCUDA has an automatic object clean-up after the object’s lifetime.
2. With some abstractions, it is more convenient than programming with NVIDIA’s
C-based runtime.
3. PyCUDA provides full access to CUDA's driver API.
4. It supports automatic error checking.
5. PyCUDA’s base layer is written in C++; therefore, it is fast.
on SVM with Chebyshev’s first kind kernel function. The proposed work is divided
into two parts: the training (fit) function and the test (project) function.
At the beginning of the fit function, a two-level nested for-loop calculates the K matrix.
The matrix K is two-dimensional, with the train size as the number of elements in each
dimension. Inside the nested loops, a Chebyshev function is evaluated in each iteration.
The recursive form of the Chebyshev calculation is not appropriate here: in recursive
functions, handling the stack and returning the result of each function call is problematic,
and GPU-based implementations of the repeated memory accesses caused by recursion are
not efficient. However, the polynomial T_n has an explicit form; for more details of the
Chebyshev function and its explicit form, refer to Chap. 3. The matrix P is the
element-wise product of the matrix K with the outer product of the target array. The next
complex part of the fit function is the quadratic programming (QP) solver. For the case of
GPU acceleration, CUDA has its own QP solver APIs.
In the project function (test or inference), the complex part is the iterations inside the
nested loop, which in turn contain a call to the Chebyshev function. As mentioned in
Sect. 11.3.1, we avoid recursive functions due to the lack of efficient stack handling
on GPUs, so using them is not helpful. Therefore, in the test function, we call the
Chebyshev function in the same way as in the training function.
To check the CUDA toolkit and its functionality, execute this command
nvidia-smi
The above command produces the output shown in Fig. 11.4, which lists the GPU card
information together with its running processes. In order to test PyCUDA, Program 1,
which is implemented in Python 3, can be executed.
In this program, the PyCUDA library and its compiler are loaded. Then a source
module is created, which contains C-based CUDA kernel code. At line 15, the func-
tion is called with a 4X4 matrix of threads inside a single grid.
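Program 1 is not reproduced here, but a minimal PyCUDA test in the same spirit might look like the following sketch; the kernel name doublify, the 4 × 4 test matrix, and the doubling operation are illustrative assumptions rather than the book's exact listing.

import numpy as np
import pycuda.autoinit              # creates the CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;   // linear index within a 4x4 block
    a[idx] *= 2.0f;
}
""")

a = np.random.randn(4, 4).astype(np.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)

func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1), grid=(1, 1))      # 4 x 4 threads inside a single grid

result = np.empty_like(a)
cuda.memcpy_dtoh(result, a_gpu)
print(np.allclose(result, 2 * a))              # True if the kernel ran correctly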
For accelerating the Chebyshev kernel, its explicit form (refer to Chap. 3) is more
applicable. As mentioned before, handling recursive functions on GPUs is complicated
due to the lack of stack support. Therefore, the T_n statements1 in the
Chebyshev_Tn function should be replaced with the explicit form, as listed in
Program 2.
1 The statement of one iteration in the recursive form of a first kind Chebyshev (refer to Chap. 3).
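To make the contrast concrete, the following sketch compares a recursive evaluation of the first-kind Chebyshev polynomial with a closed-form one; the function names are illustrative, and the closed form cos(n arccos x) is only one of the explicit expressions discussed in Chap. 3 and assumes inputs in [-1, 1].

import numpy as np

def chebyshev_recursive(n, x):
    # T_n(x) via the three-term recurrence; recursion is what GPUs handle poorly
    if n == 0:
        return np.ones_like(x, dtype=float)
    if n == 1:
        return np.asarray(x, dtype=float)
    return 2 * x * chebyshev_recursive(n - 1, x) - chebyshev_recursive(n - 2, x)

def chebyshev_explicit(n, x):
    # T_n(x) via the closed trigonometric form, branch- and stack-free
    return np.cos(n * np.arccos(np.clip(x, -1.0, 1.0)))

x = np.linspace(-1, 1, 5)
print(np.allclose(chebyshev_recursive(3, x), chebyshev_explicit(3, x)))   # True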
In the next step, let us take a look at the statement that calls the Chebyshev function.
The third line in Program 3 means calling Chebyshev(X[i], X[j], n=3, f='r').
This function is called n_samples² times, and since n_samples equals 60, it will be
called 3600 times.
1: for i in range(n_samples):
2: for j in range(n_samples):
3: K[i, j] = self.kernel(X[i], X[j])
As mentioned previously about the GPU architecture, the GPU's global memory is located
on the device. Hence, the x and y variables should be copied from the host side (the main
memory on the CPU side) to the device memory. Therefore, for small and sequential
applications, this process takes longer than CPU execution without the GPU.
The program should transfer a massive block of data in a single access in order to reduce
the number of accesses and minimize the memory access latency. Moreover, when a GPU is
active, all its functional units consume energy; hence, utilizing more threads is more
efficient. The listing in Program 5 breaks the inner for-loop (second line of Program 3),
so that it calls the Chebyshev function only 60 times. According to Program 5,
threadIdx.x is used to break the mentioned for-loop. The matrix data is treated as a
linear array: each row of the matrix y follows the previous row in a single line.
Therefore, the starting index of each row is calculated as threadIdx.x * 4.
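A hedged sketch of the device code that such a SourceModule could contain is shown below; the degree-3 closed form of T_n and the way the four features are combined are illustrative assumptions and not the book's exact kernel, but the threadIdx.x * 4 indexing follows the description above.

import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
__device__ float T3(float v)            // explicit (closed) form of T_3
{
    return 4.0f * v * v * v - 3.0f * v;
}

__global__ void Chebyshev(float *x, float *y, float *p_cheb)
{
    int row  = threadIdx.x;             // one thread per row of the matrix y
    int base = row * 4;                 // starting index of that row (4 features)
    float acc = 1.0f;
    for (int k = 0; k < 4; ++k)         // combine the 4 features of x and y[row]
        acc *= (1.0f + T3(x[k]) * T3(y[base + k]));
    p_cheb[row] = acc;                  // one kernel value per row
}
""")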
Before calling the CUDA kernel function, we have to load all the needed variables and
array elements into the GPU device's global memory. The mem_alloc instruction allocates
the required memory on the device. After that, memcpy_htod copies the data from the host
to the device. As shown in lines 8 and 12, x_gpu and y_gpu are the memories allocated on
the device. The x variable is a single row of the X matrix, which contains 60 rows with
four features each; we send the whole X matrix as the y input. In line 35, the block size
for the threads is defined. In this code, the block contains 60 (n_samples) threads in the
x dimension, and its y dimension is set to 1, which means a single-dimension block is used.
In order to also break the other for-loop (first line of Program 3), a two-dimensional
block with 60 * 60 threads could be used. In this case, together with reducing memory
accesses from the host side for reading the p_cheb outputs or writing the x and y variables
on the device, the amount of transferred data would be reduced, as you would not need to
copy the y variable to y_gpu.
32: """)
33: func = mod.get_function("Chebyshev")
34: func(x_gpu,y_gpu,p_cheb_gpu, block=(n_samples,1,1))
35: p_cheb = np.empty_like(p_cheb)
36: cuda.memcpy_dtoh(p_cheb, p_cheb_gpu)
37: K[i] = p_cheb;
At the end of the code (Program 5), in line 37, the CPU reads the whole data
from the p_cheb_gpu location with the command memcpy_dtoh, which copies the memory
from the device to the host. Therefore, each GPU execution brings back one row of
60 elements for the K matrix.
In this section, we focus on Program 5. Although it breaks 3600 calls of the Chebyshev
function into only 60 calls, it is not efficient yet, because the execution time on the
GPU is still higher than on the CPU. Therefore, for acceleration, it is worth considering
the following hints (a short sketch combining them is given after the list):
1. The object SourceModule (line 15 of Program 5) is not a compiled source code.
Hence, having this statement inside a for-loop is inefficient because of the extra
time for compilation in each iteration. We may move lines 15 to 33 of Program 5
outside the for-loop before line 1.
2. Having memory allocation in each iteration of a for-loop is time-consuming.
Therefore, we should allocate the needed memory first, and then use it in every
iteration.
3. It is more efficient to increase the utilization of the CUDA cores. This is achieved
by employing more threads in our implementation, which increases parallelism and also
reduces memory communication.
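The following sketch combines hints 1 and 2: the SourceModule compilation and the device allocations are hoisted out of the for-loop, and only the per-row copy and the kernel launch remain inside. It reuses the Chebyshev SourceModule sketched earlier (referred to here as mod), and the 60-sample, 4-feature sizes follow the text; it is an illustration of the optimization, not the repository's exact code.

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

n_samples, n_features = 60, 4
X = np.random.rand(n_samples, n_features).astype(np.float32)
K = np.zeros((n_samples, n_samples), dtype=np.float32)

func = mod.get_function("Chebyshev")     # compiled once, outside the loop (hint 1)
x_gpu = cuda.mem_alloc(X[0].nbytes)      # allocated once, reused every iteration (hint 2)
y_gpu = cuda.mem_alloc(X.nbytes)
p_cheb = np.empty(n_samples, dtype=np.float32)
p_cheb_gpu = cuda.mem_alloc(p_cheb.nbytes)
cuda.memcpy_htod(y_gpu, X)               # the whole matrix is copied only once

for i in range(n_samples):
    cuda.memcpy_htod(x_gpu, X[i])        # only the current row changes per iteration
    func(x_gpu, y_gpu, p_cheb_gpu, block=(n_samples, 1, 1))
    cuda.memcpy_dtoh(p_cheb, p_cheb_gpu)
    K[i] = p_cheb                        # one 60-element row of the Gram matrix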
After the optimizations, a 2.2X speedup is obtained on a Tesla T4 GPU over the CPU in a
Colab machine containing two vCPUs of an Intel Xeon processor running at 2.20 GHz with
13 GB of memory. You may download the codes from the link in the footnote.2 The chosen
SVM classification code with the first-kind Chebyshev kernel has two parts, the fit
function and the predict function. Notably, the speedup for the fit function alone is 58X
over the CPU. The nested loops inside the fit function result in more array-based
computations; therefore, the nature of our algorithm directly affects the rate of
acceleration.
2 https://github.com/sampp098/SVM-Kernel-GPU-acceleration-.
The following shows the problem formulation. The standard form of the QP is
used as follows:
\min_x \; \tfrac{1}{2} x^{T} P x + q^{T} x \quad \text{subject to} \quad Gx \le h, \; Ax = b.
However, QPTH defines a quadratic program layer as
\min_z \; \tfrac{1}{2} z^{T} Q z + p^{T} z \quad \text{subject to} \quad Gz \le h, \; Az = b.
The differences are only in the names of the variables. Therefore, we replace the
following statement:
solution = cvxopt.solvers.qp(P, q, G, h, A, b)
to:
solution = QPFunction(verbose=False)(P, q, G, h, A, b)
It is also necessary to change all the CVXOPT matrices to NumPy form, for example
P = np.outer(y, y) * K
This library automatically sends the computation to the GPU device for execution. The
PyTorch package can also be used for matrix multiplication and other matrix operations
on the GPU device.
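As an illustration, a hedged sketch of calling the GPU-capable QP layer from the qpth package (Amos and Kolter 2017) on a small toy problem is given below; the three-variable problem data are invented for the example, and moving the tensors to the GPU with .cuda() is optional and omitted here.

import torch
from qpth.qp import QPFunction

nz = 3
Q = torch.eye(nz, dtype=torch.double)                   # must be positive definite
p = -torch.ones(nz, dtype=torch.double)
G = -torch.eye(nz, dtype=torch.double)                  # -z <= 0, i.e., z >= 0
h = torch.zeros(nz, dtype=torch.double)
A = torch.ones(1, nz, dtype=torch.double)               # sum(z) = 1
b = torch.ones(1, dtype=torch.double)

solution = QPFunction(verbose=False)(Q, p, G, h, A, b)  # same call pattern as above
print(solution)                                         # roughly [1/3, 1/3, 1/3]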
11.8 Conclusion
In this chapter, the internal architecture of some NVIDIA GPUs was explained briefly.
GPUs are many-core processors that have their own memories. They were traditionally used
only for graphical processing, for example, rendering videos or enhancing the graphics in
computer games. Due to the structure of these devices, their usage has recently shifted
significantly toward general-purpose applications, i.e., GPGPU programming. Machine
learning applications, with massive datasets and deep neural networks, are nowadays among
the most common workloads running on GPU devices. Likewise, for the SVM kernel trick on
classification problems, GPU devices can perform better when coupled with CPUs.
Moreover, some methods for GPGPU acceleration of the previous LS-SVM implementations
(especially the first-kind Chebyshev kernel) were proposed. Instead of CUDA directly,
the PyCUDA package is used in our Python implementation, which results in acceptable
performance. An important issue is the gap between the main memory and the device memory
on GPU devices; it should be noted that more accesses to the device memory result in lower
execution performance. It should also be noted that GPGPU shows its superiority when there
is a large dataset; however, the structure of a GPGPU program plays a significant role in
obtaining speedups. In the first part of this chapter, the architecture of GPU devices was
explained, since it needs to be known in order to optimize GPU kernels. There exist many
libraries that use GPUs as their processors in the background, far from the user side; if
the developer does not have enough knowledge of parallel programming, these libraries are
the best choice to employ.
We have used the mentioned GPGPU-based acceleration methods and hints, and the SVM
application with the first-kind Chebyshev function as its kernel was edited accordingly.
The experiments show that a 2.2X speedup (for both the fit and test functions) is gained
over the CPU on Colab's Tesla T4 GPU. The partial optimization, applied only to the fit
function, resulted in a better speedup of about 58X due to the structure of this function.
The main levers for gaining performance are reducing memory accesses together with
increasing CUDA core utilization. In the optimization process, unwanted extra instructions
inside the loops, such as memory allocations and compilation steps, were removed.
References
Ahmadzadeh, A., Hajihassani, O., Gorgin, S.: A high-performance and energy-efficient exhaustive
key search approach via GPU on DES-like cryptosystems. J. Supercomput. 74, 160–182 (2018)
Allec, S.I., Sun, Y., Sun, J., Chang, C.E.A., Wong, B.M.: Heterogeneous CPU+ GPU-enabled
simulations for DFTB molecular dynamics of large chemical and biological systems. J. Chem.
Theory Comput. 15, 2807–2815 (2019)
AlSaber, N., Kulkarni, M.: Semcache: Semantics-aware caching for efficient gpu offloading. In:
Proceedings of the 27th International ACM Conference on International Conference on Super-
computing, pp. 421–432 (2013)
Amos, B., Kolter, J.Z.: OptNet: differentiable optimization as a layer in neural networks. In: Inter-
national Conference on Machine Learning, pp. 136–145. PMLR (2017)
Asghari, M., Hadian Rasanan, A.H., Gorgin, S., Rahmati, D., Parand, K.: FPGA-orthopoly: a
hardware implementation of orthogonal polynomials. Eng. Comput. (2022). https://doi.org/10.
1007/s00366-022-01612-x
Cheng, J., Grossman, M., McKercher, T.: Professional CUDA c Programming. Wiley, Amsterdam
(2014)
Choquette, J., Gandhi, W., Giroux, O., Stam, N., Krashinsky, R.: Nvidia a100 tensor core gpu:
Performance and innovation. IEEE Micro. 41, 29–35 (2021)
Corporation, N.: CUDA Zone (2019). https://developer.nvidia.com/cuda-zone
Dalrymple R.A.: GPU/CPU Programming for Engineers Course, Class 13 (2014)
Doi, J., Takahashi, H., Raymond, R., Imamichi, T., Horii, H.: Quantum computing simulator on a
heterogeneous HPC system. In: Proceedings of the 16th ACM International Conference on Com-
puting Frontiers, pp. 85–93 (2019)
Gavahi, M., Mirzaei, R., Nazarbeygi, A., Ahmadzadeh, A., Gorgin, S.: High performance GPU
implementation of k-NN based on Mahalanobis distance. In: 2015 International Symposium on
Computer Science and Software Engineering (CSSE), pp. 1–6 (2015)
Kalaiselvi, T., Sriramakrishnan, P., Somasundaram, K.: Survey of using GPU CUDA programming
model in medical image analysis. Inform. Med. Unlocked 9, 133–144 (2017)
Luo, C., Fei, Y., Luo, P., Mukherjee, S., Kaeli, D.: Side-channel power analysis of a GPU AES
implementation. In: 2015 33rd IEEE International Conference on Computer Design (ICCD), pp.
281–288 (2015)
Mei, X., Chu, X.: Dissecting GPU memory hierarchy through microbenchmarking. IEEE Trans.
Parallel Distrib. Syst. 28, 72–86 (2016)
Moayeri, M.M., Hadian Rasanan, A.H., Latifi, S., Parand, K., Rad, J.A.: An efficient space-splitting
method for simulating brain neurons by neuronal synchronization to control epileptic activity.
Eng. Comput. 1–28 (2020)
Nvidia, T.: NVIDIA GeForce GTX 750 Ti: Featuring First-Generation Maxwell GPU Technology,
Designed for Extreme Performance per Watt (2014)
Nvidia, T.: NVIDIA Turing GPU Architecture: Graphics Reinvented (2018)
Nvidia, T.: P100. The most advanced data center accelerator ever built. Featuring Pascal GP100,
the world’s fastest GPU (2016)
Nvidia, T.: V100 GPU architecture. The world’s most advanced data center GPU. Version WP-
08608-001_v1 (2017)
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A.E., Purcell, T.J.: A
survey of general-purpose computation on graphics hardware. Comput. Graph. Forum 26, 80–113
(2007)
Parand, K., Aghaei, A.A., Jani, M., Ghodsi, A.: Parallel LS-SVM for the numerical simulation of
fractional Volterra’s population model. Alex. Eng. J. 60, 5637–5647 (2021)
Pienaar, J.A., Raghunathan, A., Chakradhar, S.: MDR: performance model driven runtime for
heterogeneous parallel platforms. In: Proceedings of the International Conference on Supercom-
puting, pp. 225–234 (2011)
PyCUDA 2021, documentation (2021). http://documen.tician.de/pycuda/
Rahmani, S., Ahmadzadeh, A., Hajihassani, O., Mirhosseini, S., Gorgin, S.: An efficient multi-core
and many-core implementation of k-means clustering. In: ACM-IEEE International Conference
on Formal Methods and Models for System Design (MEMOCODE), pp. 128–131 (2016)
Wang, C., Jia, Z., Chen, K.: Tuning performance on Kepler GPUs: an introduction to Kepler assem-
bler and its usage in CNN optimization. In: GPU Technology Conference Presentation (2015)
Welcome To Colaboratory (2021). https://colab.research.google.com
Xiao, B., Wang, H., Wu, J., Kwong, S., Kuo, C.C.J.: A multi-grained parallel solution for HEVC
encoding on heterogeneous platforms. IEEE Trans. Multimedia 21, 2997–3009 (2019)
Chapter 12
Classification Using Orthogonal Kernel
Functions: Tutorial on ORSVM Package
Abstract Classical and fractional orthogonal functions and their properties as kernel
functions for the SVM algorithm are discussed throughout this book. In Chaps. 3, 4, 5,
and 6, the four classical families of orthogonal polynomials (Chebyshev, Legendre,
Gegenbauer, and Jacobi) were considered, their fractional forms were presented as kernel
functions, and their performance was shown. However, implementing these kernels requires
considerable effort. To make it easy for anyone who wants to try and use these kernels, a
Python package is provided here. In this chapter, the ORSVM package is introduced as an
SVM classification package with orthogonal kernel functions.
12.1 Introduction
12.1.1 ORSVM
ORSVM is a free and open-source Python package that provides an SVM classifier with
some novel orthogonal kernel functions. This library provides the complete chain of using
the SVM classifier, from normalization to the calculation of the SVM equation and the
final evaluation. Note, however, that there are some necessary steps before normalization
that should be handled for every dataset, such as duplicate checking, handling of null
values and outliers, or even dimensionality reduction and whatever other enhancements may
apply to a dataset. These steps are outside the scope of the SVM algorithm and,
consequently, of the ORSVM package. In contrast, the normalization step, which is a must
before sending data points into orthogonal kernels, is handled directly in ORSVM by
calling the relevant function. As already discussed, the fractional form of all kernels is
also obtained during the normalization process. The ORSVM package includes multiple
classes and functions, all of which are introduced in this chapter.
The ORSVM package depends heavily on the NumPy and CVXOPT packages. Arrays, matrices,
and linear algebraic functions are used repeatedly from NumPy, and the heart of the SVM
algorithm, namely solving the convex SVM problem and finding the support vectors, is
handled by a convex quadratic solver from the CVXOPT library, which is in turn a free
Python package for convex optimization.1
ORSVM consists of two modules. One module contains the kernel classes, and the other
contains the classes and functions supporting initialization, normalization, the fitting
process, capturing the fitted model, and the classification report.
ORSVM is structured as follows:
• orsvm module
This is the main module consisting of the fitting procedure, prediction, and report.
It includes Model and SVM classes and the transformation (normalization) func-
tion. We opted for the transformation name instead of normalization because
transforming to fractional space is achievable through this function too.
– Model Class
The Model class creates the model, and initialization starts under this class.
Calling the model.model_fit function initiates an object of the SVM class.
After calling the Transformation function, the train set is ready to be passed into
the SVM object, which can then start the fitting process.
1. ModelFit function
Initiates an object from the SVM class, transforms/normalizes the train set,
and calls the fit function of the SVM object. Finally captures the fitted model
and parameters.
2. ModelPredict function
Transforms/normalizes the test set and calls the predict function of the SVM
object with the proper parameters. Finally, it calls accuracy_score,
confusion_matrix, and classification_report of scikit-learn (sklearn.metrics) with
the previously captured result.
– SVM class
Here the SVM equation is formed, and the matrices required by cvxopt are created
under the fit function of the SVM class. It invokes cvxopt and solves the SVM
equation to determine the support vectors and calculate the hyper-plane's equation.
The prediction procedure is implemented under the predict function of the SVM class.
1. fit function creates the proper matrices by directly calling the kernels. Such
matrices are the Gram matrix and the other matrices of the SVM equation required by
cvxopt. As the result, cvxopt returns the Lagrange multipliers; applying some
criteria of user interest, the support vectors are selected from them, and the
SVM.fit function returns the weights and bias of the hyper-plane's equation as well
as the array of support vectors.
1 A suitable guide on this package is available at http://cvxopt.org about installation and how to use
it.
2. predict function maps data points from the test set onto the decision boundary
(the hyper-plane equation) and determines to which class each data point belongs.
– Transformation function
This is the function that normalizes the input dataset or, in the case of frac-
tional form, transforms the input dataset into fractional space, in other words,
normalizes the input in fractional space.
• Kernels module includes one class per kernel, so currently there exist four classes,
one for each orthogonal kernel. The following classes are available in the kernels
module:
– Chebyshev class contains the relevant functions to calculate the orthogonal
polynomials and fractional form of the Chebyshev family of the first kind.
– Legendre class holds the relevant functions to calculate the orthogonal polyno-
mials and fractional form of the Legendre family.
– Gegenbauer class consists of the relevant functions to calculate the orthogonal
polynomials and fractional form of the Gegenbauer family.
– Jacobi class includes the relevant functions to calculate the orthogonal polyno-
mials and fractional form of the Jacobi family.
or you can easily clone the package from our GitHub and use the setup.py file to
install the package;
python setup.py install.
This is the interface for using the ORSVM package. Using ORSVM starts with creating
an object of the Model class. This class includes a ModelFit function, which itself
creates an ORSVM model as an instance of the SVM class. It then normalizes the input data
in the case of the normal form of the kernel, or transforms the input data in the case of
the fractional form. Choosing between the normal and fractional forms is determined
by the value of the argument T: if T = 1, the kernel is in normal form, and in the case
of 0 < T < 1, it is in fractional form. The model_fit function receives the train and
test sets separately; moreover, each of them should be divided into two matrices, x and y.
The matrix x is the whole dataset with the column corresponding to the output (a.k.a.
class or label) omitted, and the matrix y is that specific class or label column.
The division of datasets can be achieved through multiple methods; a widely used one
is StratifiedShuffleSplit from sklearn.model_selection.
Creating an object of the Model class requires some input parameters, because it
needs to create an SVM object with these passed parameters. The following example
code creates an orsvm classifier from the Jacobi kernel function of order 3.
obj = orsvm.Model(kernel="jacobi", order = 3, T = 0.5,
k_param1 = --0.8, k_param2 = 0.2,
sv_determiner = ’a’, form = ’r’, C = 100)
Here T = 0.5 means that the kernel is in the fractional form of order 0.5. "k_param1" is
equivalent to ψ and "k_param2" is equivalent to ω when the "Jacobi" kernel is chosen; if
the Gegenbauer kernel were chosen, only "k_param1" would apply. Because Chebyshev and
Legendre have no hyper-parameter, there is no need to pass any value for "k_param1" and
"k_param2"; even if one passes values for these parameters, they will be ignored. The
parameter sv_determiner is the user's choice of how the support vectors are selected among
the Lagrange multipliers (the data points computed by the convex optimization solver and
chosen as candidates for drawing the SVM's hyper-plane). This is one of the important
parameters that can affect the final classification performance metrics. Three options are
considered for choosing support vectors. Two of them are widely used: first, a number of
type int that represents the number of support vectors to be selected from the Lagrange
multipliers (if this number is greater than the number of Lagrange multipliers, all of
them will be selected as support vectors); second, a number in scientific notation (often
a negative power of 10) used as a minimum threshold. The third method, which is specific
to ORSVM, is the flag "a", which stands for Average. Most of the time, we have no clear
conjecture about the values of the Lagrange multipliers, how small they are, or how many
of them will be available; therefore, we do not know what the threshold should be or how
many support vectors should be chosen. Sometimes this leads to zero support vectors, and
then an error occurs. Our solution to this situation, which always arises for a new
dataset, is the average method: ORSVM computes the average of the Lagrange multipliers and
uses it as the threshold, so that no error occurs. However, this method does not guarantee
the best generalization accuracy; it only gives a factual estimate of the support vectors.
After all, choosing the best number of support vectors is itself an important task in SVM.
The parameter form is only applicable to the Chebyshev kernel, because two implementations
of the Chebyshev kernel are available, an explicit equation and a recursive one; "r"
refers to the recursive form and "e" refers to the explicit one. The parameter "noise" is
only applicable to the Jacobi kernel, and to the Jacobi weight function indeed; its
purpose is to avoid the errors that happen at the boundaries of the weight function, as
has already been explained. Finally, the parameter "C" is the regularization
parameter of the SVM algorithm, which controls to what degree the misclassified data
points are important. Setting "C" to the best value leads to a better generalization of
the classifier. Table 12.1 summarizes the parameters of the Model class.
of the classifier. Table 12.1 summarizes the parameters of the Model class.
Right at this point, ORSVM just has initiated the model and has not done any
computation yet; to fit the model, we have to call the ModelFit function. Clearly,
fitting requires an input dataset. As already discussed, ModelFit receives train and
test datasets which are divided into x (data without label) and y (label). The following
code snippet represents how one can divide the dataset. Function LoadDataSet reads
and loads dataset into a pandas DataFrame, then converts and maps the classes to
binary classification. As the “Clnum” is the label column, we have to select and
convert that column solely into one Numpy array. Then remove the label column
from Pandas data frame to reach the data without the label. It converts data into
numpy array; however, it is not necessary. Calling the ‘LoadDataSet’ function gives
the x and y. In the next line using StratifiedShuffleSplit, we can create an object of
stratified shuffle split with required parameters, and then using the split function of
the created object, we can get the train and test set divided into X and y.
Now that the train and test sets are ready, we can call ModelFit with the proper
parameters; as a result, the function prints status messages and returns the weights and
bias of the SVM's hyper-plane equation, together with an array of support vectors and the
kernel instance. Therefore, we have the fitted model and the corresponding parameters that
we can use for prediction. By calling the ModelPredict function, the final step of
classification with ORSVM is achieved. ModelPredict requires the test set and also the
bias and the kernel instance in order to calculate the accuracy. It should be noted that
the test set is transformed into the proper space inside the ModelPredict function. As the
result, the accuracy score will be returned; moreover, ModelPredict prints the confusion
matrix, the classification report, and the accuracy score.
import pandas
from sklearn.model_selection import StratifiedShuffleSplit

def LoadDataSet():
    # load dataset
    df = pandas.read_csv('/home/data/spiral.csv',
                         names=['Chem1', 'Chem2', 'Clnum'],
                         index_col=False)
    # map the classes to -1/+1 for binary classification
    df.loc[df.Clnum == 0, ['Clnum']] = -1
    # select the label column as a NumPy array
    y_np = df['Clnum'].to_numpy()
    # drop the label column to keep only the data
    df.drop('Clnum', axis=1, inplace=True)
    df_np = df.to_numpy()
    return df_np, y_np

X, y = LoadDataSet()
sss = StratifiedShuffleSplit(n_splits=5,
                             random_state=30,
                             test_size=0.9)
Accuracy_score = obj.ModelPredict(X_test,
y_test,
Bias,
K_Instance)
The SVM class is the heart of this package, where the fitting process happens, the SVM
equation is solved to find the support vectors, and finally the equation of the
hyper-plane is constituted. Before delving into the SVM equation and finding the support
vectors, we have to apply the kernel trick, so we need a "Gram matrix", the matrix of all
possible inner products of x_train under the selected kernel. Therefore, considering
X_train of shape m × n, SVM.fit will create a square matrix of shape m × m:
K = \begin{bmatrix}
k(x_1, x_1) & \cdots & k(x_1, x_m) \\
\vdots & k(x_i, x_j) & \vdots \\
k(x_m, x_1) & \cdots & k(x_m, x_m)
\end{bmatrix}.
This is also known as the kernel matrix, as it represents the kernel trick of SVM. The
kernel k can be any of the Legendre, Gegenbauer, Chebyshev, or Jacobi kernels.
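As a minimal sketch of how such a Gram matrix can be assembled from any kernel function, consider the following; the polynomial kernel used here is only a stand-in for illustration, not one of the package's orthogonal kernels.

import numpy as np

def gram_matrix(X_train, kernel):
    m = X_train.shape[0]
    K = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(X_train[i], X_train[j])   # all pairwise kernel values
    return K

X_train = np.random.rand(5, 3)
K = gram_matrix(X_train, kernel=lambda u, v: (1.0 + u @ v) ** 2)   # stand-in kernel
print(K.shape)   # (5, 5)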
Here, the convex optimization problem, or more precisely the minimization of the dual
form of the SVM equation, is solved using the qp function from the cvxopt library:
cvxopt.solvers.qp(P, q, G, h, A, b)
This function solves a quadratic program:
\text{minimize} \quad \tfrac{1}{2} x^{T} P x + q^{T} x, \qquad (12.1)
\text{subject to} \quad Gx \preceq h,
\qquad\qquad\qquad\; Ax = b.
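A hedged sketch of how the SVM dual can be mapped onto this cvxopt call is given below; the hard-margin form (only the alpha >= 0 constraint, with no upper bound C) is shown for brevity and is not the package's exact implementation.

import numpy as np
import cvxopt

cvxopt.solvers.options['show_progress'] = False

def solve_svm_dual(K, y):
    m = len(y)
    P = cvxopt.matrix(np.outer(y, y) * K)               # P_ij = y_i y_j k(x_i, x_j)
    q = cvxopt.matrix(-np.ones(m))                      # maximize the sum of alphas
    G = cvxopt.matrix(-np.eye(m))                       # -alpha <= 0
    h = cvxopt.matrix(np.zeros(m))
    A = cvxopt.matrix(y.reshape(1, -1).astype(float))   # y^T alpha = 0
    b = cvxopt.matrix(0.0)
    solution = cvxopt.solvers.qp(P, q, G, h, A, b)
    return np.ravel(solution['x'])                      # Lagrange multipliers

# toy usage with an identity Gram matrix and labels in {-1, +1}
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
alphas = solve_svm_dual(np.eye(5), y)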
Another kernel introduced in this package is the Chebyshev kernel, which is implemented
in the vectorial approach that has already been introduced as the Generalized Chebyshev
Kernel. Similar to the Legendre kernel, the Chebyshev kernel is available when initiating
the Model class object by passing "Chebyshev" as the kernel name. Another parameter
applicable to the Chebyshev kernel is form: two different types of Chebyshev kernels are
implemented in the Chebyshev class, one using an explicit equation and the other a
recursive function, and the type to use is selected through the form parameter. The
parameters of the Chebyshev class are as follows:
• order := order of the polynomial
• form := "e" for the explicit form and "r" for the recursive form.
The Legendre kernel class is generally given as an object to the kernel parameter of the
Model class in the initialization step, and its Legendre-kernel function is explicitly
used in constructing the Gram matrix K. The Legendre class benefits from a recursive
implementation of the Legendre-kernel function. To use this kernel, the user only needs to
initiate the Model object with "Legendre" as the kernel parameter of the Model class. The
Legendre class needs the following parameters:
The Gegenbauer kernel is also available in the ORSVM package. For the imple-
mentation of the Gegenbauer kernel function, the fractional kernel function which is
introduced in Chap. 5 is used. Therefore, in addition to the product of input vectors,
the values are obtained from the two other equations. The Gegenbauer class has the
following parameters:
• or der := Order of the Gegenbauer kernel function.
• lambda := The λ parameter of the Gegenbauer kernel function.
The Jacobi kernel is the last orthogonal kernel currently available in the ORSVM
package. Jacobi kernel is available to choose from when the ORSVM package is
imported. Simply, the kernel name should be “Jacobi” during the initiation of an
object from the Model class. The Jacobi class needs the following parameters:
and the mapping equation already introduced for the fractional form of all kernels,
x = 2x^{\alpha} - 1,
so we have
x = 2 \left( \frac{x - x_{\min}}{x_{\max} - x_{\min}} \right)^{\alpha} - 1,
which transforms the input x into the kernel space related to the fractional order of the
function (α). Therefore, the transformation function requires x (the input data) and alpha
(the parameter T of the Model class, representing the transformation step), which is 1 by
default and causes the input data to be normalized just as the min-max feature scaling
function does. The user never needs to call the transformation function directly, but in
case one needs to:
import orsvm
orsvm.transformation(x, T=1)
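A hedged sketch of what this mapping does is given below; it is an illustration of the formula above, not the ORSVM package's actual source code, and it applies the scaling to a one-dimensional array for simplicity.

import numpy as np

def fractional_transform(x, alpha=1.0):
    # min-max scale to [0, 1], raise to the power alpha, then map to [-1, 1]
    x = np.asarray(x, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())
    return 2.0 * scaled ** alpha - 1.0

data = np.array([1.0, 2.0, 5.0, 10.0])
print(fractional_transform(data, alpha=1.0))   # plain normalization to [-1, 1]
print(fractional_transform(data, alpha=0.5))   # fractional form of order 0.5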
Using the ORSVM library is easy and straightforward; simplicity of use has been an
important motivation in developing the ORSVM package. The user only needs to provide the
dataset as matrices and select a kernel, and by setting T, the user can choose between the
normal and fractional forms of the kernel functions. Other required parameters related to
the chosen kernel should be provided in the object initiation step; those parameters have
already been discussed in detail. In this section, we only demonstrate a sample code for
the classification of a dataset using ORSVM.
As an example, the three monks problem dataset is considered, which has been introduced
before in Chap. 3. The three monks' problem comes with separate train and test sets; in
case one needs to perform this separation, please refer to the code snippet in Sect. 12.3.
In order to use ORSVM, we first have to divide the train and test sets into x and y parts.
This can be done by importing the dataset into a pandas data frame first; then we map the
class values in the monks dataset to −1 and 1, which are suitable for the SVM algorithm,
instead of 0 and 1.
import numpy as np
import pandas as pd
import orsvm

# load train-set
df_train = pd.read_csv('/home/datasets/1_monks.train',
                       names=['Class', 'col1', 'col2', 'col3',
                              'col4', 'col5', 'col6'],
                       index_col=False)
df_train.loc[df_train.Class == 0, ['Class']] = -1
y_train = df_train['Class'].to_numpy()
df_train.drop('Class', axis=1, inplace=True)
X_train = df_train.to_numpy()

# load test-set
df_test = pd.read_csv('/home/datasets/1_monks.test',
                      names=['Class', 'col1', 'col2', 'col3',
                             'col4', 'col5', 'col6'],
                      index_col=False)
df_test.loc[df_test.Class == 0, ['Class']] = -1
y_test = df_test['Class'].to_numpy()
df_test.drop('Class', axis=1, inplace=True)
X_test = df_test.to_numpy()
Now that we have the train and test sets ready, we need an instance of the Model class
in order to call the ModelFit function with the proper arguments. For example, here we
choose the Chebyshev kernel with T = 0.5, we let the SVM's regularization parameter keep
its default value ("None"), and the recursive implementation is preferred. In the second
line, by calling the ModelFit function, ORSVM fits the model and returns the fitted
parameters. We can capture these parameters for later use, for example for prediction.
# Create an object from Model class of ORSVM
obj = orsvm.Model(kernel="Chebyshev",order=3,T=0.5,form=’r’)
These are only for logging purposes. Then, in case one needs the prediction, one may call
the ModelPredict function, which requires the test set divided into x and y, as well as
the bias and the kernel instance from the previous step.
accuracy_score = obj.ModelPredict(X_test,
y_test,
Bias,
K_Instance)
ModelPredict returns the accuracy score, and we can capture it. Moreover, this function
prints much more information on the classification; the output is shown in Fig. 12.1.
The log information may help with debugging or give a better understanding of how fitting
is done for a dataset; it is enabled by setting the log parameter of the fit function to
True.
After covering the basics of SVM in Part One, introducing some fractional orthogonal
kernel functions in Part Two, and reviewing some applications of SVM algorithms, the aim
of this chapter was to present a Python package that enables us to apply the introduced
fractional orthogonal kernel functions in real-world situations. The architecture of this
package and a brief tutorial on its usage were presented here. For more information and a
detailed, updated tutorial on this package, you can visit its online page, which is
available at http://orsvm.readthedocs.io.
Appendix: Python Programming Prerequisite
A.1 Introduction
1 https://www.tiobe.com/tiobe-index/.
There are many ways to install Python and work with it. Here, some of the more
appropriate ways are discussed. The basic way is to go to the Python website2 and find
the version of Python suitable for your work and your operating system. After
downloading and installing Python, we also need a code editor for writing our code,
such as Notepad++,3 Atom,4 Sublime,5 or any other code editor. These code editors are
open source and can easily be downloaded and used for free. There are also
full-featured IDEs such as PyCharm, Spyder, PyDev, etc., whose advantages become
apparent when debugging large Python programs.
Another way to install Python is by using Anaconda,6 a Python distribution whose
objective is scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.). By installing Anaconda, all of
the important and popular packages are installed effortlessly. Additionally, Anaconda
installs programs such as Jupyter Notebook,7 a well-known open-source platform for
developing code. One of the advantages of Jupyter Notebook over other tools is that it
can run Python code cell by cell, which can be very helpful. To install Anaconda on
Windows or macOS, you can download the graphical installer, while for Linux a .sh file
should be downloaded and run in the terminal. This completes the installation of Python
and its packages. Now the basics of the language can be explained.
A.2 Python Basics
The basics of Python are very similar to those of other programming languages like C,
C++, C#, and Java, so anyone familiar with these languages can easily learn Python.
2 https://www.Python.org/downloads/.
3 https://notepad-plus-plus.org/.
4 https://atom.io/.
5 https://www.sublimetext.com/.
6 https://www.anaconda.com.
7 https://jupyter.org/.
Basic Syntax
Python code can be executed by writing it directly in the command line. For example,
write the following command:
print("Hello, World!")
Moreover, we can create a Python file, write the code in it, save the file with the .py
extension, and run it from the command line. For instance,
> python test.py
Python uses white-space indentation for delimiting blocks instead of curly brackets
or keywords. The number of spaces is up to the programmer, but it has to be at least
one.
if 2 > 1:
    print("Two is greater than one!")   # indented with four spaces

if 2 > 1:
 print("Two is greater than one!")      # one space is also enough
Comments
Comments start with a “#” and serve as in-code documentation; Python treats the rest of
the line as a comment. Comments can also be placed at the end of a line. In order to
prevent Python from executing a command, we can turn it into a comment.
# This is a comment.
print("Hello, World!")
print("Hello, World!") # This is a comment.
# print("Hello, World!")
Variable Types
Unlike in many other programming languages, a Python variable does not need an explicit
declaration to reserve memory space. The declaration happens automatically when a
variable is created by assigning a value to it. The equal sign (=) is used to assign
values to variables, and the type of a variable can be changed by assigning a new value
to it.
x = 5 # x is integer
y = "Adam" # y is string
y = 1 # y is integer
Variables that are created outside of a function or class are known as global variables
and can be used everywhere, both inside and outside of functions.
x = "World!"
def myfunc():
print("Hello" + x)
myfunc()
If a variable is defined globally and a variable with the same name is then defined
inside a function, the local value is only usable within that function and the global
value remains unchanged.
x = "World!"
def myfunc():
x = "Python."
print("Hello " + x)
myfunc()
The type of a value can also be converted explicitly (casting) by using constructor
functions such as int(), float(), and str().
a = int(2)    # a will be 2
b = int(2.8)  # b will be 2
c = int("4")  # c will be 4
Strings
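A string in Python is an ordered sequence of characters, so single characters and
sub-strings can be accessed by indexing and slicing, and the len() function returns the
length of a string. A minimal sketch (the exact example values are assumed) that
produces the outputs shown below:
a = "Hello, World!"
print(a[0:5])    # slicing: characters from position 0 up to (not including) 5
print(a[3])      # indexing a single character
print(a[3:8])    # a slice across the comma and the space
print(a[-6:-1])  # negative indexes count from the end of the string
print(len(a))    # length of the string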
> Hello
> l
> lo, W
> World
> 13
There are many built-in methods applicable to strings; to name a few:
• strip() removes any white-space from the beginning or the end:
a = "Hello, World!"
print(a.strip())
• The split() method splits the string into substrings wherever it finds instances of
the separator:
a = "Hello, World!"
print(a.split("o"))
To insert characters that are otherwise illegal in a string, or characters with a
special meaning, an escape character (a backslash followed by the character) is used:
• \' single quote;
• \\ backslash;
• \n new line;
• \t tab;
• \r carriage return;
• \b backspace;
• \f form feed;
• \ooo octal value;
• \xhh hex value.
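For example:
print('It\'s a backslash: \\')    # escaped single quote and backslash
print("First line\nSecond line")  # new line
print("Column1\tColumn2")         # tab
print("\x48\x69")                 # hex values for the characters "Hi"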
Lists
Lists are one of the four collection data types in Python. When we want to save several
values in one variable, a list container can be used. In Python, list elements are
written within square brackets, and an item is accessible by referring to its index
number.
mylist = ["first", "second"]
print(mylist)
print(mylist[1])
As with strings, negative indexing and ranges of indexes can be used for lists; also
remember that the first item in Python has index 0. By leaving one side of the colon
empty, the range goes on to the end, or starts from the beginning, of the list.
mylist = ["first", "second", "Third", "fourth", "fifth"]
print(mylist[-1])
print(mylist[1:3])
print(mylist[2:])
> fifth
> [’second’, ’Third’]
> [’Third’, ’fourth’, ’fifth’]
To change an item's value in a list, we can refer to its index number. Iterating through
the list items is possible with a for loop. To determine whether an item is present in a
list, we can use the in keyword, and with the len() function we can find how many items
are in a list.
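A minimal sketch (the exact snippet is assumed) producing the outputs below:
mylist = ["first", "second"]
mylist[0] = "Third"     # change an item by referring to its index
for item in mylist:     # iterate through the list items
    print(item)
if "first" in mylist:
    print("Yes, first is in mylist")
else:
    print("No, first is not in mylist")
print(len(mylist))      # number of items in the list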
> Third
> second
> No, first is not in mylist
> 2
There are various methods for lists; some of them are discussed in this appendix (see
the sketch after this list):
• append() adds an item to the end of the list.
• insert() inserts an item at a given position. The first argument is the index and the
second is the value.
• remove() removes the first item with the specified value from the list.
• pop() removes the item at the selected position in the list and returns it. If no index
is specified, pop() removes and returns the last item in the list.
• del removes the item at a specified index; applied to the list variable itself, it
deletes the variable and frees the name for a new value assignment.
• clear() removes all items from the list.
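As a short illustration of these methods:
mylist = ["first", "second", "third"]
mylist.append("fourth")    # add an item to the end
mylist.insert(1, "new")    # insert "new" at index 1
mylist.remove("new")       # remove the first item with the value "new"
last = mylist.pop()        # remove and return the last item ("fourth")
del mylist[0]              # delete the item at index 0
mylist.clear()             # remove all remaining items
print(mylist)              # prints []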
To make a copy of a list, the syntax list1 = list2 cannot be used: with this command,
any change to one of the two lists is reflected in the other, because the assignment
only copies a reference, so both names refer to the same list. Instead, the copy() or
list() methods can be utilized to make a copy of a list.
mylist1 = ["first", "second"]
mylist2 = mylist1.copy()
mylist2 = list(mylist1)
For joining or concatenating two lists, we can simply use “+” between the two lists or
use the extend() method.
mylist1 = ["first", "second"]
mylist2 = ["third", "fourth"]
mylist3 = mylist1 + mylist2
print(mylist3)
mylist1.extend(mylist2)
print(mylist1)
Dictionary
A dictionary is another useful collection data type. Its main difference from the other
collections is that, unlike sequences, which are indexed by a range of numbers,
dictionaries are indexed by keys. Keys can be of any immutable type, such as strings and
numbers. In addition, one can create an empty dictionary with a pair of braces. A
dictionary with its keys and values is defined as follows:
mydict = {
"Age": 23,
"Gender": "Male",
"Education": "Student"
}
print(mydict)
In order to access the items of a dictionary, we can refer to a key name inside square
brackets or call the get() method. Moreover, the value corresponding to a key can be
changed by assigning a new value to it. The keys() and values() methods can be used to
get all keys and values in the dictionary.
print(mydict["Age"])
print(mydict.get("Age"))
mydict["Age"] = 24
print(mydict["Age"])
print(mydict.keys())
print(mydict.values())
> 23
> 23
> 24
> dict_keys(['Age', 'Gender', 'Education'])
> dict_values([24, 'Male', 'Student'])
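Iterating over a dictionary yields its keys; the values() and items() methods allow
iterating over the values and over key/value pairs. A minimal sketch (the exact snippet
is assumed, using a freshly defined mydict with the original values) that produces the
outputs below:
mydict = {"Age": 23, "Gender": "Male", "Education": "Student"}
for k in mydict:             # iterating over a dictionary yields its keys
    print(k)
for v in mydict.values():    # iterate over the values
    print(v)
for k, v in mydict.items():  # iterate over key/value pairs
    print(k, v)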
> Age
> Gender
> Education
> 23
> Male
> Student
> Age 23
> Gender Male
> Education Student
Checking the existence of a key in a dictionary can be done with the in keyword, the
same as for lists. Moreover, to determine how many items are in a dictionary we can use
the len() function. Adding an item to a dictionary can easily be done by using a new key
and its value.
if "Name" in mydict:
print("No, there is no Name in mydict.")
print(len(mydict))
mydict["weight"] = 72
print(mydict)
If ... Else
Logical conditions from mathematics can be used in “if statements” and loops. An “if
statement” is written using the if keyword, and its scope is defined with white-space
indentation, whereas other programming languages use curly brackets. We can also use an
if statement inside another if statement (a nested if/else) by observing the
indentation. The comparison operators are:
• Equals: a == b.
• Not equals: a != b.
• Less than: a < b.
• Less than or equal to: a <= b.
• Greater than: a > b.
• Greater than or equal to: a >= b.
x = 10
y = 15
if y > x:
    print("y is greater than x")
elif is a Python keyword meaning “if the previous conditions were not true, then try
this condition”. The else keyword catches anything that was not caught by the previous
conditions.
x = 10
y = 9
if x > y:
    print("x is greater than y")
elif x == y:
    print("x and y are equal")
else:
    print("y is greater than x")
There are logical keywords, such as and and or, that can be used to combine conditional
statements.
a = 10
b = 5
c = 20
if a > b and c > a:
    print("Both conditions are True")

if a > b or a > c:
    print("At least one of the conditions is True")
While Loops
The while loop is one of the two types of iteration in Python; it executes a set of
statements as long as a condition is true.
i = 1
while i < 3:
    print(i)
    i += 1
> 1
> 2
Just like with the if statement, we can use an else code block for when the while loop
condition is no longer true.
i = 1
while i < 3:
    print(i)
    i += 1
else:
    print("i is greater than or equal to 3.")
> 1
> 2
> i is greater than or equal to 3.
For Loops
The for statement in Python differs from the one used in other programming languages
such as C and Java. Unlike other languages, in which the for statement always iterates
over an arithmetic progression of numbers or lets the user define both the iteration
step and the halting condition, in Python we can iterate over the items of any sequence
(such as a string).
nums = ["One", "Two", "Three"]
for n in nums:
    print(n)

for x in "One.":
    print(x)
> One
> Two
> Three
> O
> n
> e
> .
There are some statements and functions, such as range(), that are very useful in a for
statement and are illustrated below.
for x in range(3):
    print(x)
> 0
> 1
> 2
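range() also accepts a start value and a step, and the built-in enumerate() function
yields the index together with the item. A minimal sketch (the exact snippets are
assumed) producing the outputs below:
for x in range(2, 4):         # range with a start and a stop value
    print(x)

nums = ["One", "Two", "Three"]
for i, n in enumerate(nums):  # enumerate() yields the index and the item
    print(i, n)

for x in range(2, 10, 3):     # range with a step of 3
    print(x)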
> 2
> 3
> 0 One
> 1 Two
> 2 Three
> 2
> 5
> 8
Just as with the while loop, an else statement can be used after a for loop. In Python,
break and continue are used as in other programming languages, to terminate the loop or
to skip the current iteration, respectively.
for x in range(10):
    print(x)
    if x >= 2:
        break
> 0
> 1
> 2
for x in range(3):
    if x == 1:
        continue
    print(x)
> 0
> 2
Try, Except
Like any programming language, Python has its own error handling statements, named try
and except, to test a block of code for errors and to handle the errors, respectively.
try:
    print(x)
except:
    print("x is not defined!")
A developer should handle the error completely so that the user can see where the error
occurred. For this purpose, we can use a raise Exception statement to raise the error.
try:
    print(x)
except:
    raise Exception("x is not defined!")
Functions
A function is a block of code that may take inputs, performs some specific statements or
computations, and may produce output. The purpose of a function is to reuse a block of
code that performs a single, related activity, which enhances the modularity of the
code. A function is defined with the def keyword and is called by its name.
def func():
    print("Inside the function")
func()
Information is passed into the function by adding arguments after the function name,
inside the parentheses. We can add as many arguments as needed.
def func(arg):
    print("Passed argument is", arg)
func("name")
func("name", "family")   # raises a TypeError: func() takes exactly one argument
An argument defined in a function must be passed to the function when the function is
called. Default parameter values are used to avoid such an error when calling a
function: even when the user does not pass a value, the code works properly with the
default values.
def func(arg="Bob"):
print("My name is", arg)
func("john")
func()
To return a value from a function, we can use the return statement:
def func(x):
    return 5 * x
print(func(3))
> 15
Libraries
A.3 Pandas
Pandas is an open-source Python library that delivers data structures and data analysis
tools for the Python programmer. Pandas is used in various fields, including finance,
statistics, analytics, data science, machine learning, etc. Pandas is easily installed
using conda or pip:
> conda install pandas
or
> pip install pandas
For importing it, we usually use a shorter name as follows:
import pandas as pd
The two major components of pandas are the Series and the DataFrame. A Series is
essentially a column of data, and a DataFrame is a multi-dimensional table; in other
words, a collection of Series. A pandas Series can be created using the following
constructor:
pandas.Series(data, index, dtype, copy)
whose parameters are described in the table further below. When we want to create a
Series from an ndarray, the index should have the same length as the ndarray; by
default, the index is equal to range(n), where n is the array length.
import pandas as pd
import numpy as np
mydata = np.array(['a', 'b', np.nan, 0.2])
s = pd.Series(mydata)
print(s)
> 0 a
1 b
2 NaN
3 0.2
dtype: object
The parameters of the Series and DataFrame constructors are as follows:
• data: takes various forms such as ndarray, series, map, list, dict, constants, etc.
• index: labels of the rows; the default value is np.arange(n).
• columns: labels of the columns (DataFrame only); the default value is np.arange(n),
which holds only if no index is passed.
• dtype: data type of each column.
• copy: whether to copy the data.
A DataFrame can be created using a single list, a list of lists, or a dictionary.
import pandas as pd
import numpy as np

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo'})
print(df)
print(df2)
> A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo
Examples are taken from the sources in footnotes 8 and 9. Here we mention some
attributes and methods that are useful for a better understanding of the data frame:
df2.dtypes # show columns data type
df.head() # show head of data frame
df.tail(1) # show tail of data frame
df.index # show index
df.columns # show columns
df.describe() # shows a quick statistical summary of your data
> A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
8 https://www.tutorialspoint.com/Python_pandas/Python_pandas_dataframe.html.
9 https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html.
> Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
> Age
count 3.000000
mean 11.666667
std 1.527525
min 10.000000
25% 11.000000
50% 12.000000
75% 12.500000
max 13.000000
> 0 1 2
Name Alex Bob Clarke
Age 10 12 13
In pandas, creating a CSV file from a data frame or reading a CSV file into a data frame
is done as follows:10
df.to_csv('data.csv')
pd.read_csv('data.csv')
A.4 Numpy
10 https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#min.
NumPy is the fundamental package for numerical computation in Python; its main object is
the homogeneous multi-dimensional array (ndarray), and it is conventionally imported
under the alias np.
import numpy as np
a = np.arange(15)      # array of the integers 0..14
a = a.reshape(3, 5)    # reshape into 3 rows and 5 columns
print(a)
a.shape
> (3, 5)
a.ndim
> 2
a.dtype.name
> ’int64’
a.size
> 15
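Beyond these attributes, arithmetic operations on arrays are applied element-wise; a
brief illustrative sketch:
b = np.array([1, 2, 3, 4])
c = np.arange(4)          # array([0, 1, 2, 3])
print(b - c)              # element-wise subtraction: [1 1 1 1]
print(b ** 2)             # element-wise power: [ 1  4  9 16]
print(b.sum(), b.mean())  # aggregation methods: 10 2.5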
A.5 Matplotlib
Pyplot
In order to import this library, it is common to use a shorter name, as follows:
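import matplotlib.pyplot as plt   # conventional alias for the pyplot module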
To plot x versus y, the plot() function can easily be used in the following way (the
data values below are illustrative):
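# illustrative sample data: y = x**2 at a few points
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)     # plot x versus y as a line
plt.xlabel('x')
plt.ylabel('y')
plt.show()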
11 https://numpy.org/devdocs/user/quickstart.html.
In order to distinguish plots, there is an optional third argument that indicates the
color and the shape of the plot; the letters and symbols of the format string are the
same as in MATLAB.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
There are some cases in which the format of the data allows accessing particular
variables with strings, for example categorical data. With the Matplotlib library, you
can plot such data directly. The following sample demonstrates how this is carried out.
# example categorical data (illustrative values)
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]

plt.figure(figsize=(9, 3))
plt.subplot(131)
plt.bar(names, values)       # bar chart
plt.subplot(132)
plt.scatter(names, values)   # scatter plot
plt.subplot(133)
plt.plot(names, values)      # line plot
plt.suptitle('Categorical Plotting')
plt.show()