Advanced Mathematical Techniques in Engineering Sciences
Science, Technology, and Management Series
Series Editor
J. Paulo Davim
Advanced Mathematical Techniques in Engineering Sciences
Edited by
Mangey Ram and J. Paulo Davim
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Contents
Preface............................................................................................................................................. vii
Acknowledgments..........................................................................................................................xi
Editors............................................................................................................................................ xiii
Contributors....................................................................................................................................xv
Index.............................................................................................................................................. 327
Preface
Mathematical techniques are the strength of the engineering sciences and form the common
foundation of their novel disciplines. The book Advanced Mathematical Techniques in
Engineering Sciences covers an ample range of mathematical tools and techniques applied
in various fields of the engineering sciences. Through this book, engineers have the
opportunity to gain greater knowledge, which may help them in the applications of
mathematics in engineering sciences.
Chapter 1 presents the rules and methods for applying the Laplace transform. Three
sections of the mathematical investigation of applied questions are distinguished: the rules
for performing operations in the Laplace transform; Laplace transform in research tasks
of the vibrations of a rod; and application of the Laplace transform in engineering technol-
ogy. Specific examples of solving differential equations are presented, applied to prob-
lems of mechanics and the theory of oscillations. The essence of Kondratenko’s method is
described, on the basis of which mathematical modeling of some technological operations
of mechanical engineering was carried out, and features of dynamic phenomena during
the operation of equipment and the interaction of the tool with the workpiece were revealed.
Chapter 2 investigates the history, nature, and importance of the Fourier series. This
chapter describes the periodic function, orthogonal function, Fourier series, Fourier
approximation, Dirichlet’s theorem, Riemann–Lebesgue lemma for Fourier series, dif-
ferentiation of Fourier series, convergence of the Fourier series of the functions, Fourier
transform, Fourier analysis with Fourier transform, and Gibbs phenomenon. Also, some
of the summability methods (Cesàro, Nörlund, Riesz, weighted mean, etc.), absolute sum-
mability methods, strong summability methods, necessary and sufficient conditions for
regularity of matrix of summability methods, uses of summability, norm, modulus of con-
tinuity, Lipschitz condition, various Lipschitz classes in trigonometric Fourier approxima-
tion, and importance of degree of approximation have been explained. The applications of
summability methods in approximation of the signals, the behavior of the Fourier series
of a piecewise smooth function (Gibbs phenomenon), Fourier series of signals of bounded
bandwidth, filtering by Fourier transforms, and applications of summability technique
and Fourier series have been described.
Chapter 3 describes the basics of soft computing and their applications. The key goal
of soft computing is to develop intelligent machines to provide solutions to real-world
problems, which are difficult to model mathematically.
Chapter 4 describes the study on solving transportation problems under a multi-objec-
tive environment. The main focus of this chapter is to introduce a new approach for solv-
ing multi-objective transportation problems in addition to the existing approaches such
as goal programming, fuzzy programming, and revised multi-choice goal programming.
In the proposed approach a procedure to obtain a Pareto-optimal solution of a multi-
objective transportation problem using the Vogel approximation method is incorporated.
The merits and demerits of the approaches goal programming, fuzzy programming, and
revised multi-choice goal programming compared to our new approach to solving a multi-
objective transportation problem are presented.
Chapter 5 provides the study of simultaneous optimization of yield and viscosity of
the pulp cooking process using the dual-response surface methodology. The pulp cook-
ing process is an important step in the manufacturing of rayon grade pulp. The pulp is
the cellulose component of the wood. The cellulose is separated from other components
and impurities of wood by cooking the wood chips in a highly pressurized chamber fol-
lowed by multiple stages of washing and chemical treatments. The study is undertaken
to increase the pulp yield as far as possible without increasing the viscosity beyond the
specified upper limit.
Chapter 6 gives the concept of a time-dependent conflicting bifuzzy set (CBFS), and a
new procedure to construct the membership and nonmembership functions of the fuzzy
reliability function is proposed with the help of time-dependent CBFS. The concept of
triangular CBFS has been developed, and triangular CBFS is used to represent the failure
rate function of the system.
Chapter 7 focuses on the failure time data analysis based on the nonhomogeneous
Poisson process (NHPP) and discusses several statistical estimation methods for a periodic
replacement problem with minimal repair as the simplest application of life data analysis
with NHPP. Not only is the parametric maximum likelihood estimation of the power law
process applied, but also constrained nonparametric maximum likelihood estimation
(CNPMLE) and kernel-based estimation methods are used for estimating the cost-optimal
periodic replacement policy with minimal repair, where single or multiple minimal repair
data are assumed.
Chapter 8 extends the available literature and discusses the important attribute “view-
count” of content dynamically. With the Internet emerging as a rapidly growing new mar-
ket, the netizens are also growing at a fast pace. Making use of this ideology, a modeling
framework whose utility has been highlighted through three models is proposed that
describes the growing Internet market size and repeat viewers. The models have been
validated on a YouTube entertainment video data set.
Chapter 9 analyzes dual-market modeling, an increasingly important concept in marketing.
In order to inculcate heterogeneity and the requirement that some technologies be adopted
differently in different geographical locations, the authors study this behavior through a
mathematical framework that exhibits the dual-market phenomenon.
Chapter 10 presents a uniform methodology for three fundamental problems in data
analysis: identification/detection of atypical elements (outliers), clustering, and classifica-
tion. Such a unification facilitates understanding of the material and adapting it to the
individual needs and preferences of particular users. The investigated material is ready
to use, is practically parameter free, and does not require laborious exploration from the
researcher. This has been illustrated with a number of applications in the fields of engi-
neering, management, medicine and biology, as well as supplemented by a thematic bibli-
ography extending the issues presented.
Chapter 11 analyzes the statistical tolerance limits used today in both production and
research. It is often desirable to have statistical tolerance limits available for the distribu-
tions used to describe time-to-failure data in reliability problems. For example, one might
wish to know if at least a certain proportion of a manufactured product will operate for
at least, say, the warranty period. This question cannot usually be answered exactly, but
it may be possible to determine a lower tolerance limit, based on a preliminary random
sample, such that one can say with a certain confidence that at least a specified proportion
or more of the product will operate longer than the lower tolerance limit. Then reliability
statements can be made based on the lower tolerance limit, or, decisions can be reached by
comparing the tolerance limit to the warranty period.
Chapter 11 also presents a new technique for constructing exact lower and upper tolerance
limits on outcomes (for example, on order statistics) in future samples. The technique
emphasizes pivotal quantities relevant for obtaining tolerance factors and is applicable
whenever the statistical problem is invariant under a group of transformations that acts
transitively on the parameter space. The proposed technique is based on a probability
transformation and pivotal quantity averaging. It is conceptually simple and easy to use.
The discussion is restricted to one-sided tolerance limits.
Chapter 12 deals with the design of torque-based PID controller and tuning its gains
with the help of two algorithms, namely, modified chaotic invasive weed optimization
(MCIWO) and modified chaotic invasive weed optimization-neural network (MCIWO-NN)
algorithms for the biped robot while walking on a staircase. An analytical method has
been developed to generate the gaits and design the torque-based PID controller. The
dynamics of the biped robot utilized in the said controller have been derived after utiliz-
ing the Lagrange–Euler formulation. Alongside, the authors utilized the MCIWO algo-
rithm to optimize the gains of the PID controller. Further, in the MCIWO-NN algorithm,
the MCIWO algorithm is used to evolve the architecture of the NN, which helped in pre-
dicting the gains of the PID controller in an adaptive manner. The developed algorithms
are tested in computer simulations and on a real biped robot.
In Chapter 13, intelligent predictive models for modeling fertility of Murrah bulls
using the various emerging machine learning (ML) algorithms, namely, neural networks
(NNs), support vector regression (SVR), decision trees (DTs), and random forests (RFs), and
a conventional linear model (LM) for regression have been described. These intelligent
ML models would provide decision support to organized dairy farms for selecting good
bulls. Hence, the ML models can be employed as a plausible alternative to linear regres-
sion models to assess more accurately the conception rate in Murrah breeding bulls at the
organized farms.
In Chapter 14, the computational study has been performed on the two-jet vectoring
through the Coanda surface, realizing the future concept of the vertical and short takeoff
and landing (V/STOL) of an aircraft for civil aviation purposes. The set of computations
has been performed from the incompressible flow regime to the onset of the compressible
flow regime.
the flow characteristics using the computational fluid dynamics technique.
Chapter 15 deals with the discussion on the collocation method which is a very well-
known numerical technique. Along with the description of the methodology adopted,
details are given for the used B-spline basis function in the collocation method. The prop-
erties of B-spline basis functions are discussed in this chapter with a description of the
types and degrees of B-spline basis functions.
In Chapter 16, utilizing Rayleigh’s approximation method, an attempt has been
made to study the reflection and refraction patterns in a corrugated interface sand-
wiched between an initially stressed fluid-saturated poroelastic half-space and a highly
anisotropic half-space. The highly anisotropic half-space is considered as triclinic.
Various two-dimensional plots have been drawn to show the effects of some affecting parameters.
Mangey Ram
Dehradun, India
J. Paulo Davim
Aveiro, Portugal
Acknowledgments
The editors acknowledge CRC Press for this opportunity and professional support. Also,
we would like to thank all the chapter authors and reviewers for their availability for this
work.
Editors
Mangey Ram received the PhD in mathematics and minor in computer science from
G. B. Pant University of Agriculture and Technology, Pantnagar, India, in 2008. He has
been a faculty member for around 10 years and has taught several core courses in pure
and applied mathematics at undergraduate, postgraduate, and doctorate levels. He is cur-
rently a professor at Graphic Era (Deemed to be University), Dehradun, India. Before
joining Graphic Era (Deemed to be University), he was a deputy manager (probation-
ary officer) with Syndicate Bank for a short period. He is editor-in-chief of International
Journal of Mathematical, Engineering and Management Sciences; and the guest editor and
member of the editorial boards of many journals. He is a regular reviewer for interna-
tional journals, including those published by the Institute of Electrical and Electronics
Engineers, Elsevier, Springer, Emerald, John Wiley, Taylor & Francis Group, and many
other publishers. He has published 125 research publications (with the Institute of
Electrical and Electronics Engineers, Springer, Emerald, World Scientific, among others)
in national and international journals of repute, and has also presented his work at
national and international conferences. His fields of research are reliability
theory and applied mathematics. Ram is a senior member of the Institute of Electrical
and Electronics Engineers, member of the Operational Research Society of India, the
Society for Reliability Engineering, Quality and Operations Management in India, the
International Association of Engineers in Hong Kong, and the Emerald Literati Network
in the United Kingdom. He has been a member of the organizing committees of a num-
ber of international and national conferences, seminars, and workshops. He has been
conferred with the Young Scientist Award by the Uttarakhand State Council for Science
and Technology, Dehradun, in 2009. He has been awarded the Best Faculty Award in 2011
and recently the Research Excellence Award in 2015 for his significant contributions in
academics and research at Graphic Era (Deemed to be University).
J. Paulo Davim received his PhD in mechanical engineering in 1997, and MSc in mechani-
cal engineering (materials and manufacturing processes) in 1991, the Dipl-Ing engineer’s
degree (5 years) in mechanical engineering in 1986, from the University of Porto (FEUP),
the Aggregate title (Full Habilitation) from the University of Coimbra in 2005, and the
DSc from London Metropolitan University in 2013. He is Eur Ing by FEANI-Brussels and
senior chartered engineer by the Portuguese Institution of Engineers with a MBA and
Specialist title in engineering and industrial management. Currently, he is a professor at
the Department of Mechanical Engineering of the University of Aveiro, Portugal. He has
more than 30 years of teaching and research experience in manufacturing, materials and
mechanical engineering with special emphasis in machining and tribology. He also has
interest in management and industrial engineering and higher education for sustain-
ability and engineering education. He has guided large numbers of postdoc, PhD, and
master’s degree students. He has received several scientific awards. He has worked as
the evaluator of projects for international research agencies as well as examiner of PhD
theses for many universities. He is the editor-in-chief of several international journals,
guest editor of journals, books editor, book series editor, and scientific advisor for many
international journals and conferences. Presently, he is an editorial board member of 25
international journals and acts as a reviewer for more than 80 prestigious Web of Science
journals. In addition, he has published as editor (and co-editor) more than 100 books
and as author (and co-author) more than 10 books, 70 book chapters, and 400 articles in
journals and conferences (more than 200 articles in journals indexed in Web of Science
core collection/h-index 41+/5000+ citations and SCOPUS/h-index 51+/7500+ citations).
Contributors
N. Aggrawal
Department of Computer Science & Information Technology
Jaypee Institute of Information Technology
Noida, Uttar Pradesh, India

R. Aggarwal
Department of Operational Research
University of Delhi
Delhi, India

Dinesh Bisht
Department of Mathematics
Jaypee Institute of Information Technology
Noida, Uttar Pradesh, India

Atish Kumar Chakravarty
Computer Centre & Dairy Economics, Statistics & Management Division
ICAR-National Dairy Research Institute
Karnal, Haryana, India

Anuj Kumar
Department of Mathematics
University of Petroleum & Energy Studies
Dehradun, Uttarakhand, India

Gurupada Maity
Department of Applied Mathematics with Oceanology and Computer Programming
Vidyasagar University
Midnapore, West Bengal, India

Ravinder Malhotra
Computer Centre & Dairy Economics, Statistics & Management Division
ICAR-National Dairy Research Institute
Karnal, Haryana, India

Ravi Kumar Mandava
School of Mechanical Sciences
IIT Bhubaneswar
Bhubaneswar, Odisha, India

Lubov Mironova
Institute of Applied Technology
Russian University of Transport (MIIT)
Moscow, Russia

Sangeeta Pant
Department of Mathematics
University of Petroleum & Energy Studies
Dehradun, Uttarakhand, India

Mangey Ram
Department of Mathematics, Computer Science & Engineering
Graphic Era (Deemed to be University)
Dehradun, Uttarakhand, India

Sankar Kumar Roy
Department of Applied Mathematics with Oceanology and Computer Programming
Vidyasagar University
Midnapore, West Bengal, India

Yasuhiro Saito
Department of Maritime Safety Technology
Japan Coast Guard Academy
Kure, Japan

Adesh Kumar Sharma
Computer Centre & Dairy Economics, Statistics & Management Division
ICAR-National Dairy Research Institute
Karnal, Haryana, India
chapter one
Application of the Laplace transform in problems
Leonid Kondratenko
Moscow Aviation Institute (State National Research University)
Contents
1.1 Designation......................................................................................................................2
1.2 Laplace transform and operations mapping......................................................................2
1.3 Linear substitutions................................................................................................................ 7
1.4 Differentiation and integration.............................................................................................9
1.5 Multiplication and convolution...................................................................................... 11
1.6 The image of a unit function and some other simple functions.................................... 13
1.7 Examples of solving some problems of mechanics......................................................... 18
1.8 Laplace transform in problems of studying oscillation of rods..................................... 23
1.9 Relationship between the velocities of the particles of an elementary volume
of a cylindrical rod with stresses........................................................................................ 24
1.10 An inertial disk rotating at the end of the rod................................................................. 25
1.11 Equations of torsional oscillations of a disk..................................................................... 26
1.12 Equations of longitudinal oscillations of a disk............................................................... 27
1.13 Application of the Laplace transform in engineering technology................................30
1.13.1 Method of studying oscillations of the velocities of motion and stresses in mechanisms containing rod systems................................................... 30
1.13.2 Features of functioning of a drive with a long force line.................................... 31
1.13.3 Investigation of dynamic features of the system in the technologies of deep-hole machining......................................................................................... 32
References........................................................................................................................................ 33
This chapter is written by engineers for engineers. The authors try to convey to the reader
the simplicity and accessibility of the methods in a concise form, illustrated with
calculation schemes. For a more extensive study of the stated problems of mathematical
modeling, literature sources are given at the end of the chapter, from which the reader
can obtain the necessary additional explanations. The list of references includes well-known
scientists in the field of mathematics and mechanics: G. Doetsch, A.I. Lur'e, L.I. Sedov,
V.A. Ivanov, and B.K. Chemodanov. In compiling the theoretical material, we refer to the
authors mentioned. This chapter reflects the experience of lecturing on mathematical
methods of modeling, as well as the authors' personal participation in work in this
technical field.
The material presented can be of interest to students, graduate students, and other
specialists.
1.1 Designation
j — the imaginary unit; e — the base of natural logarithms;
α = σ + jω — a complex number;
Re — the real part, Im — the imaginary part of a complex number;
s — a complex variable; s = x + jy, x = Re s, y = Im s;
L — the transformation (Laplace transform);
F(s) — a function of the complex variable s (the Laplace image);
f(t) — a function of the real variable t (the original);
L[f(t)] — the direct Laplace transform;
L⁻¹[F(s)] — the inverse Laplace transform; and
→ — the sign of correspondence of the transformation:
for the direct transformation, f(t) → F(s); for the inverse transformation, F(s) ← f(t).
In many formulas, fractions are not written in standard notation but with a slash, or with
a factor raised to a negative power (n), and irrational numbers are expressed as numbers
with fractional exponents. For a correct understanding of these symbols, examples are given:

$$a/bc = \frac{a}{bc}, \quad a + b/c = a + \frac{b}{c}, \quad a/(b + c) = \frac{a}{b + c}, \quad a(bc + d)^{-1} = \frac{a}{bc + d}, \quad a^{1/2} = \sqrt{a}, \quad b^{-1/3} = \frac{1}{\sqrt[3]{b}}.$$
1.2 Laplace transform and operations mapping
The Laplace transform is a powerful mathematical method for solving differential, differ-
ence, and integral equations. By means of these equations, one can describe any physical
(technological) process and conduct mathematical modeling of the behavior of the object
and of the reaction of the environment under the influence of force or other factors, inves-
tigate the dynamic properties of the element of construction, and much more.
In many engineering problems, it is important to investigate a function f(t), where real
variable t is time. Such problems in mechanics relate to dynamic problems.
The simplest and most economical solution of such problems is possible with the help
of methods of the theory of operational calculus [1].
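As a preview (a sketch added here, not part of the original text), the operational approach reduces a differential equation to an algebraic one in the image domain. A minimal example in Python, assuming the sympy library, for the initial-value problem y′ + 2y = 0, y(0) = 1:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
Y = sp.symbols('Y')

# Transforming y' + 2y = 0 with y(0) = 1 gives the algebraic equation
#   s*Y(s) - y(0) + 2*Y(s) = 0,
# using the differentiation rule L[y'] = s*Y(s) - y(0).
Ysol = sp.solve(sp.Eq(s*Y - 1 + 2*Y, 0), Y)[0]   # Y(s) = 1/(s + 2)

# Return to the original by the inverse transform
y = sp.inverse_laplace_transform(Ysol, s, t)      # y(t) = exp(-2*t) for t > 0
print(Ysol, y)
```

The algebraic step replaces differentiation, which is the essence of the operational method developed in this chapter.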
An important role in applied mathematical analysis is played by the Laplace integral

$$I = \int_0^{\infty} f(t)\, e^{-st}\, dt. \tag{1.1}$$
If this integral converges, it defines a function of the complex variable s:

$$F(s) = \int_0^{\infty} f(t)\, e^{-st}\, dt. \tag{1.2}$$
$$L[f(t)] = F(s) = \int_0^{\infty} f(t)\, e^{-st}\, dt. \tag{1.3}$$
The record L[f(t)] means L-transformation. The Laplace transform connects the single-
valued function F(s) of the complex variable s (image) with the corresponding function f(t)
of the real variable t (the original). A brief description of the essence of the Laplace trans-
form and the correspondence table of operations can be found in Ref. [2].
As can be seen from (1.2), this transformation consists of multiplying the function f(t)
by the exponential function e^{−st} and integrating the product of these functions with
respect to the argument t in the range from 0 to ∞.
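As an illustration (a sketch added here, assuming Python's sympy library), the defining integral (1.2) can be evaluated symbolically for f(t) = e^{−at}:

```python
import sympy as sp

t, s, a = sp.symbols('t s a', positive=True)
f = sp.exp(-a*t)

# Evaluate the defining integral directly: F(s) = integral of f(t)*e^(-s*t) over [0, oo)
F_direct = sp.integrate(f*sp.exp(-s*t), (t, 0, sp.oo))

# The same image via sympy's built-in Laplace transform
F_builtin = sp.laplace_transform(f, t, s, noconds=True)

print(sp.simplify(F_direct), sp.simplify(F_builtin))  # both equal 1/(a + s)
```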
From the image of F(s), if it exists, one can always find the original f(t). Such a transi-
tion is called the inverse Laplace transform, symbolically denoted by L −1, and corresponds to
$$f(t) = L^{-1}[F(s)] = \frac{1}{2\pi j} \int_{x - j\infty}^{x + j\infty} F(s)\, e^{st}\, ds, \quad t > 0. \tag{1.4}$$
[Figure 1.1: A piecewise-continuous original f(t) with a discontinuity of the first kind at t = t₁; the one-sided limits f(t₁ − 0) and f(t₁ + 0) and the initial value f(0) are marked.]
The right-hand side of Equation (1.4) is called the inverse of the Laplace integral and is a com-
plex integral.
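In practice the inversion integral (1.4) is seldom evaluated directly; one uses correspondence tables or a computer algebra system. A brief sketch (assuming sympy; not from the original text):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

# Recover the original of F(s) = 1/(s**2 + 1); for t > 0 it equals sin(t)
f = sp.inverse_laplace_transform(1/(s**2 + 1), s, t)
print(f)
```

Depending on the sympy version, the result may carry a Heaviside(t) factor, reflecting the convention f(t) = 0 for t < 0.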
If relation (1.4) is rewritten in the form

$$\int_{-\infty}^{+\infty} e^{jyt} F(x + jy)\, dy = 2\pi e^{-xt} f(t) \ \text{ for } t > 0; \qquad \int_{-\infty}^{+\infty} e^{jyt} F(x + jy)\, dy = 0 \ \text{ for } t < 0, \tag{1.5}$$

then formulas (1.2) and (1.5) have a physical meaning. For a constant value of x in the
complex variable s = x + jy, the function F(x + jy) is the spectral density of the damped
time function e^{−xt} f(t), for which the variable y is the circular frequency. Such a change
of the variable s in the complex plane corresponds to the displacement of a point along the
vertical line with abscissa x (Figure 1.2).
From the mathematical point of view, multiplying the function f(t) by e^{−st} makes the
improper integral on the right-hand side of expression (1.2) convergent in the half-plane
Re s > x₀ (Figure 1.3).
The function f(t) can be an original only if the following conditions are satisfied:
[Figure 1.2: A point s = x + jy in the complex plane; varying y moves the point along the vertical line with abscissa x.]
[Figure 1.3: The half-plane of convergence Re s > x₀ in the complex s-plane.]
1. The function f(t) is continuous for all values t ≥ 0. Continuity can be violated only at points of discontinuity of the first kind; the number of these points must be finite in any interval of limited length (Figure 1.1).
2. The function f(t) = 0 for values t < 0.
3. The function f(t) has a limited order of growth, i.e., one can find constants M > 0 and x₀ ≥ 0 such that |f(t)| ≤ M e^{x₀ t} for all t ≥ 0 (x₀ is called the growth index of f(t)).
Theorem 1.1: If the function f(t) is an original, then this function is Laplace trans-
formed and the image of the given function F(s) is defined in the half-plane Re s > x0,
where x0 is the growth index of the function f(t).
Under the condition Re s > x0, the integral (1.2) is an absolutely convergent inte-
gral. The number x0 is called the abscissa of absolute convergence of the integral (1.2).
Theorem 1.2: The image F(s) of the original f(t) in the half-plane, for which Re s > x0,
where x0 is the growth index of the original, is an analytic function.
The continuity of the function F(s) follows from the proof of Theorem 1.1.
This important property makes it possible to use powerful methods of the theory
of functions of a complex variable in calculations, because in the practical application
of the Laplace transform, calculations are performed not on the given functions
themselves but on their images.
The analytic expression of the original through the image is formulated by
Theorem 1.3.
Theorem 1.3: The original f(t) at points of continuity is defined by

$$f(t) = \frac{1}{2\pi j} \int_{x - j\infty}^{x + j\infty} F(s)\, e^{st}\, ds, \tag{1.6}$$
where F(s) is the Laplace representation of the original f(t), and the integral on the
right-hand side of this equation is understood in the sense of the principal value
www.Technicalbookspdf.com
6 Advanced Mathematical Techniques in Engineering Sciences
[Figure 1.4: The path of integration in (1.6), a straight line parallel to the imaginary axis in the half-plane Re s > x₀.]
$$\int_{x - j\infty}^{x + j\infty} F(s)\, e^{st}\, ds = \lim_{y \to \infty} \int_{x - jy}^{x + jy} F(s)\, e^{st}\, ds,$$

and this is taken along a straight line parallel to the imaginary axis and located in the
half-plane Re s > x₀ (Figure 1.4).
Formula (1.6) is called the Laplace inversion formula and establishes a connection
between the image of F(s) and the single-valued corresponding original f(t). The pro-
cess of obtaining the original from a given image is written by the expression (1.4).
The formula (1.6) defines the original only at the points of its continuity. However,
for the piecewise-continuous functions f(t), illustrated in Figure 1.1, the limit of the
right-hand side of (1.6) at the points of discontinuity of the first kind exists and is
defined by
$$\lim_{y \to \infty} \frac{1}{2\pi j} \int_{x - jy}^{x + jy} F(s)\, e^{st}\, ds = \frac{1}{2}\left[ f(t + 0) + f(t - 0) \right].$$
From this follows another important property: the uniqueness of the Laplace transform.
An original always corresponds to a single image, since the values of the original
at points of discontinuity do not change the image. At the same time, the same image
can be associated with a set of originals whose values differ from each other only at
points of discontinuity [4].
Corollary of Theorem 1.3: If the original is a differentiable function everywhere in
the interval 0 < t < ∞, then the original with respect to the given image is uniquely
determined.
It should be noted that not all analytic functions can be images. In particular,
periodic functions of the form e^{αs}, cos s, and sin s are not images, and not all functions
can be originals (for example, 1/t, tan ωt, e^{t²}). The proof of the theorem that gives sufficient
conditions for the function F(s) to be an image can be found in Refs. [4,5].
The Laplace transform is the result of the extension of the Fourier transform to
functions that satisfy the Dirichlet conditions in the interval 0 < t < ∞ but do not sat-
isfy the condition of absolute integrability in this interval. The connection between
the Fourier transform and the Laplace transform is clearly presented in Ref. [3]. The
Fourier and Laplace transforms are widely used in the theory of automatic regulation.
Chapter one: Application of the Laplace transform in problems
The next important property of the Laplace transform is the linearity of the trans-
formation, which is formulated by a theorem that establishes the “original-image”
correspondence.
Theorem 1.4: If the functions f1(t), f2(t), …, fn(t) are originals, and the images of these
functions are, respectively, F1(s), F2(s), …, Fn(s), and if λ1, λ2, …, λn are quantities that do
not depend on t and s, then the following equalities hold:
L[∑_{k=1}^{n} λk fk(t)] = ∑_{k=1}^{n} λk Fk(s); (1.7)

L^{−1}[∑_{k=1}^{n} λk Fk(s)] = ∑_{k=1}^{n} λk fk(t). (1.8)
In the practical application of the Laplace transform, the linearity property allows
calculations to be performed not on the given functions but on their images, using
the table of correspondences between the originals and the images. In this case, one
needs to know not only the images of individual functions, but also the rules for
mapping the operations performed on such functions. Therefore, in what follows we
formulate further properties of the transformation (differentiation, integration, etc.)
in the form of rules, and we invoke these rules later when solving some mathematical
problems.
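The linearity property is easy to check numerically. The sketch below approximates the Laplace integral by the trapezoidal rule and verifies that the image of a linear combination equals the same combination of the images; the originals sin t and e^{−t}, the coefficients 5 and 3, and the point s = 2 are arbitrary choices made for the illustration:

```python
import math

def laplace(f, s, T=40.0, n=100000):
    # Trapezoidal approximation of F(s) = integral of exp(-s*t)*f(t) over [0, T];
    # T is chosen large enough that the neglected tail is negligible for Re s > 0.
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

f1 = math.sin                    # original f1(t) = sin t,  image 1/(s^2 + 1)
f2 = lambda t: math.exp(-t)      # original f2(t) = e^(-t), image 1/(s + 1)

s = 2.0
lhs = laplace(lambda t: 5.0 * f1(t) + 3.0 * f2(t), s)  # image of 5*f1 + 3*f2
rhs = 5.0 * laplace(f1, s) + 3.0 * laplace(f2, s)      # 5*F1(s) + 3*F2(s)
print(abs(lhs - rhs) < 1e-9)                           # linearity (1.7)
print(abs(laplace(f1, s) - 1.0 / (s**2 + 1.0)) < 1e-6) # table entry check
```

Both checks print True; the same helper can be reused to test the other rules below.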
1.3 Linear substitutions
We give the rules for the linear transformation of an argument in the original or in the
image. For clarity, instead of the symbolic designation of the transformation, we introduce
arrows indicating the direct and inverse Laplace transformations.
Rule I. Theorem 1.5: Similarity theorem. Multiplying the argument of the original
(image) by a positive number results in the division of the image (the original)
and of its argument by the same positive number:

f(at) → (1/a)F(s/a); F(as) ← (1/a)f(t/a), a > 0. (1.9)
This operation characterizes the change in the scale of the independent variable.
Rule II. Theorem 1.6: First displacement theorem (the lag theorem). If the function f(t) is
an original and F(s) is its image, then the image of the displaced original f(t − a), where
a is a real number, is determined by the expression

f(t − a) → e^{−as}F(s), where f(t − a) = 0 for t < a.
For t < a the argument t − a is negative, so the function f(t − a) is equal to zero.
The graph of this function is obtained from the graph of the function f(t) by shifting
it to the right by a distance a (Figure 1.5a, b).
The displacement theorem has a wide application in the theory of automatic reg-
ulation, as well as in the study of processes described by piecewise-continuous and
periodic functions.
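A minimal numerical sketch of the lag theorem: the original f(t) = 1 − e^{−t} with image F(s) = 1/(s(s + 1)), and the values a = 1.5, s = 2, are arbitrary choices; the shifted original, set to zero for t < a, must reproduce e^{−as}F(s):

```python
import math

def laplace(f, s, T=40.0, n=100000):
    # trapezoidal approximation of the Laplace integral on [0, T]
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

a, s = 1.5, 2.0
f = lambda t: 1.0 - math.exp(-t)                 # f(0) = 0, F(s) = 1/(s(s+1))
shifted = lambda t: f(t - a) if t >= a else 0.0  # displaced original f(t - a)

F = 1.0 / (s * (s + 1.0))
print(abs(laplace(shifted, s) - math.exp(-a * s) * F) < 1e-6)  # lag theorem
```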
Rule III. Theorem 1.7: Second displacement theorem (the bias theorem). If the function f(t) is the
original and F(s) is the image, then the image of the displaced original f(t + a), where
a is a real number, is determined by the expression

f(t + a) → e^{as}[F(s) − ∫₀^a e^{−st} f(t) dt]; a > 0. (1.12)
The essence of this theorem is that the image F(s) cannot be transformed into the
original f(t + a) by a simple factor, since the right-hand side of Equation (1.12)
contains a finite Laplace integral, which is evaluated over the interval 0 ≤ t < a
of the real variable.
This rule is the converse of Rule II: the graph of the function f(t) is
shifted to the left by a distance a (Figure 1.5a, c).
The bias theorem determines the ratio of the image and the original in the case
when the complex variable is displaced by a [1]:
F(s + a) ← e^{−at} f(t) + a ∫₀^t e^{−aτ} f(τ) dτ. (1.13)
Figure 1.5 (a) The original f(t); (b) the displaced original f(t − a), shifted right by a; (c) the displaced original f(t + a), shifted left by a.
e^{αt} f(t) → F(s − α) (1.14)

or

e^{−αt} f(t) → F(s + α). (1.15)
If we set the initial value f(+0) = 0, then from formula (1.16) we obtain f′(t) → sF(s).
If the derivatives of higher orders f^{(2)}(t), f^{(3)}(t), …, f^{(n)}(t) are originals, then the following
relation holds:

f^{(n)}(t) → sⁿF(s) − ∑_{k=1}^{n} s^{n−k} f^{(k−1)}(+0). (1.18)
The essence of Rule V is as follows. Differentiation, which in the original space
is a transcendent process [3], is replaced in the image space by multiplication of the
image by a power of the argument s with the simultaneous subtraction of a polynomial
whose coefficients are the initial values of the original.
Rule V assumes that the derivative of the highest order f (n) (t) exists at each point t > 0
and has an image. This rule is especially valuable in solving differential equations.
In the operational calculus, instead of the Laplace integral (1.3), we prefer to
consider the function

F(s) = sL[f(t)]; F(s) = s ∫₀^∞ e^{−st} f(t) dt (1.19)

or

F(s)/s = ∫₀^∞ e^{−st} f(t) dt. (1.20)
Taking into account (1.19), we give the most important relations of operational calculus:

f′(t) → s[F(s) − f(0)]; f″(t) → s²[F(s) − f(0) − f′(0)/s];

f^{(n)}(t) → sⁿ[F(s) − f(0) − f′(0)/s − ⋯ − f^{(n−1)}(0)/s^{n−1}]. (1.21)
There are no contradictions between the formula (1.18) for differentiating the original
and the expressions (1.21). According to the rule for calculating the integral,
these expressions differ only in the integration constants: in (1.18) these constants
are real, while in (1.21) they are complex quantities.
The essence of the function (1.19) lies in the fact that in transformations we work
in image space only with analytic functions and their initial values. This important
technique is widely used in mechanics and other technical applications. Then, using
a concrete example, we show the advantages of applying formulas (1.21).
Rule VI. Theorem 1.10: Differentiation theorem for an image.
If the function f(t) is an original and F(s) is its image, then the following equality holds:

L[t f(t)] = −dF(s)/ds. (1.22)
Since the image of F(s) is always an analytic function and possesses all derivatives, on
the basis of (1.22) one can obtain derivatives of any order.
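For instance, for f(t) = e^{−t} the image is F(s) = 1/(s + 1), so (1.22) predicts L[t f(t)] = −dF/ds = 1/(s + 1)². A quick quadrature check (the original and the point s = 2 are arbitrary choices):

```python
import math

def laplace(f, s, T=40.0, n=100000):
    # trapezoidal approximation of the Laplace integral on [0, T]
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

s = 2.0
left = laplace(lambda t: t * math.exp(-t), s)  # L[t f(t)] for f(t) = e^(-t)
right = 1.0 / (s + 1.0)**2                     # -dF/ds for F(s) = 1/(s + 1)
print(abs(left - right) < 1e-6)
```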
Thus, the operation of differentiating the image with respect to s corresponds to the
operation of multiplying the original by the independent variable t taken with the
opposite sign.
Rule VII. Theorem 1.11: If the function f(t) is an original, and F(s) is the image, then
the integral of the original is also an original, and the following equality holds:

L[f^{−1}(t)] = F(s)/s + f^{−1}(+0)/s. (1.24)

Here, f^{−1}(t) = ∫ f(t) dt = ∫₀^t f(τ) dτ + f^{−1}(+0), where f^{−1}(+0) is the integration constant.
Hence, under the condition that f^{−1}(+0) = 0, Rule VII is formulated.
The operation of integrating the original corresponds to the operation of divid-
ing the image of this original by the complex number s:
∫₀^t f(t) dt → (1/s)F(s). (1.25)
Let f^{(−k)}(t) = ∫∫⋯∫ f(t)(dt)^k denote the k-fold integral of f(t); then

L[f^{(−n)}(t)] = F(s)/sⁿ + ∑_{k=1}^{n} f^{(−k)}(+0)/s^{n−k+1}. (1.26)
The rule of integration for an image is rarely used in practice, so we do not give it here.
The expressions for the derivative and integral representations are of primary
importance in operational calculus; therefore, the number s acquires the character of
the operator [1].
Rule IX. Theorem 1.13: Convolution theorem. If the functions f1(t) and f2(t) are originals
and their images are, respectively, F1(s) and F2(s), then the following equality holds:
L[∫₀^t f1(t − τ) f2(τ) dτ] = F1(s)F2(s). (1.28)
Here the integral combination of functions is called the convolution of the originals
f1(t) and f2(t) and is denoted by
f1 ∗ f2 = ∫₀^t f1(t − τ) f2(τ) dτ. (1.29)

F1F2 → ∫₀^t f1(t − τ) f2(τ) dτ, t > 0. (1.30)
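The convolution theorem can be illustrated numerically: for f1(t) = e^{−t} and f2(t) = e^{−2t} (arbitrary choices) the image of their convolution must equal F1(s)F2(s) = 1/((s + 1)(s + 2)). A coarse two-level quadrature sketch:

```python
import math

f1 = lambda t: math.exp(-t)        # F1(s) = 1/(s + 1)
f2 = lambda t: math.exp(-2.0 * t)  # F2(s) = 1/(s + 2)

def convolve(t, m=400):
    # trapezoidal approximation of (f1 * f2)(t) = integral of f1(t-τ) f2(τ), τ in [0, t]
    if t == 0.0:
        return 0.0
    h = t / m
    total = 0.5 * (f1(t) * f2(0.0) + f1(0.0) * f2(t))
    for k in range(1, m):
        tau = k * h
        total += f1(t - tau) * f2(tau)
    return total * h

def laplace(g, s, T=15.0, n=3000):
    # trapezoidal approximation of the Laplace integral of g on [0, T]
    h = T / n
    total = 0.5 * (g(0.0) + math.exp(-s * T) * g(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * g(t)
    return total * h

s = 2.0
product = 1.0 / ((s + 1.0) * (s + 2.0))         # F1(s)F2(s) = 1/12
print(abs(laplace(convolve, s) - product) < 1e-4)
```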
L[f1(t)f2(t)] = (1/(2πj)) ∫_{x−j∞}^{x+j∞} F1(s − w)F2(w) dw. (1.31)

f1(t)f2(t) → (1/(2πj)) ∫_{x−j∞}^{x+j∞} F1(w)F2(s − w) dw, x1 ≤ x < Re s − x2; (1.32)

f1(t)f2(t) → (1/(2πj)) ∫_{x−j∞}^{x+j∞} F1(s − w)F2(w) dw, x2 ≤ x < Re s − x1. (1.33)
The domains of absolute convergence of these integrals are illustrated in Figure 1.6.
Figure 1.6 The domains of absolute convergence of the integrals (1.32) and (1.33) in the s-plane, bounded by the abscissas x1, x, x2.
The reader will find a more detailed exposition of this question in Ref. [3].
The left-hand-side integral (1.34) is called the quadratic quality criterion.
In optimization processes, minimizing this integral is the defining characteristic.
Figure 1.7 (a) The unit function u(t); (b) the shifted unit function u(t − a).
In mechanics, the initial values of a function (i.e., the values of the function at t = 0)
are used in solving problems. We denote the initial function by u0(t) and the shifted
function, respectively, by u0(t − τ). Taking (1.18) into account, we can write for them
u0 (t) → 1; (1.39)
u0 (t − τ ) → e − sτ . (1.40)
The physical meaning of the initial function is that at time t = 0 it takes the value of the
constant C. It follows from (1.39) that any constant C is an image of the same constant. In
the case of the Laplace transform, we always mean that the "constant" (initial function) is
a function of t, which vanishes for t < 0 and is equal to C for t > 0 [1].
The third case. Special functions. The category of special functions includes the Dirac delta
function, also called the first-order impulsive unit function. The delta function is defined by
δ (t) = 0, for t ≠ 0,
(1.41)
δ (t) = ∞, for t = 0.
∫_{−∞}^{+∞} δ(t) dt = 1. (1.42)
Conditions (1.41) and (1.42) are incompatible from the point of view of classical mathemati-
cal analysis, and therefore the delta function does not belong to the “function” in the usual
sense. However, in the class of generalized functions, the delta function occupies an equal
place [2]. The notion of a "delta function" turns out to be significant when extending the
operation of differentiation to discontinuous functions. For example, the sequence of functions

f_δ(t, a) = [u0(t) − u0(t − a)]/a,

characterizing pulses of height 1/a and duration a (Figure 1.8), converges to the
delta function as a → 0. For example, the function
has a physical meaning in mechanics as a force of constant magnitude, acting for a period
of time a. The momentum of this force over its interval of action is equal to one, regardless
of the value of a. Such a function is called a delta function of the first order. This function
is zero for all t except t = 0, where it becomes infinite, so that lim_{a→0} a·u1(t, a) = 1.
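The sifting behavior of the pulse sequence f_δ(t, a) can be observed numerically: its integral against a smooth test function tends to the value of that function at zero as a → 0 (the test function cos t is an arbitrary choice):

```python
import math

def f_delta_smear(a, phi, n=100000):
    # integral of f_δ(t, a)·φ(t) = (1/a) ∫_0^a φ(t) dt, by the midpoint rule,
    # where f_δ(t, a) = [u0(t) - u0(t - a)]/a is 1/a on [0, a) and 0 elsewhere
    h = a / n
    return sum(phi((k + 0.5) * h) for k in range(n)) * h / a

phi = math.cos   # smooth test function, phi(0) = 1

values = [f_delta_smear(a, phi) for a in (0.5, 0.1, 0.01)]
print(values)                        # approaches phi(0) = 1 as a -> 0
print(abs(values[-1] - 1.0) < 1e-3)  # sifting property, delta-like limit
```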
The shifted Dirac delta function δ(t − τ) is defined by

δ(t − τ) = 0 for t ≠ τ,
δ(t − τ) = ∞ for t = τ. (1.44)
Similarly, the displaced unit impulsive force will be denoted as u1 (t − τ), and
u1 (t − τ ) = 0 for t ≠ τ ,
(1.45)
u1 (t − τ ) = ∞ for t = τ .
u1 (t) → s ; u1 (t − τ ) → e − sτ s. (1.46)
The reader will find delta functions of the second order (Figure 1.9) in Refs. [1,2]. We give
only the correspondence between the originals and images
u2 (t) → s 2 ; u2 (t − τ ) → s2 e − sτ . (1.47)
Figure 1.8 A rectangular pulse of height 1/a and duration a.
Figure 1.9 The second-order impulse function: pulses of height 1/a² on the intervals (0, a) and (a, 2a).
The fourth case. The time function is e^{αt}, where α is an arbitrary complex or real number.
We represent such a function in the form
We take the function u(t) as the unit function, which for convenience of calculation we
additionally represent in the form of the multiplier 1(t).
The image of the function (1.48) will be

L[1(t)e^{αt}] = ∫₀^∞ e^{αt} e^{−st} dt = −e^{−(s−α)t}/(s − α) |₀^∞ = 1/(s − α). (1.49)
If α is a complex number, then, depending on the values that the real and imaginary parts
take, function (1.48) characterizes the types of vibrations and motions. Expression (1.48)
takes on an explicit physical meaning. The reader will find a detailed exposition of this
question in Ref. [3].
The graph of the function (1.48), where α is the real negative number (α < 0), is shown
in Figure 1.10.
The fifth case. The function of time is t, and the relation L[t] = 1/s² is valid.
Figure 1.10 The graph of the function 1(t)e^{αt} for α < 0.
We find the image of this function using integration by parts; for Re s > 0 we obtain

L[t] = ∫₀^∞ t e^{−st} dt = −(t e^{−st}/s)|₀^∞ + (1/s)∫₀^∞ e^{−st} dt = −(e^{−st}/s²)|₀^∞ = 1/s². (1.51)
The sixth case. The function of time is tⁿ, where n is a positive integer (1.52).
Using the previous method and repeated integration, we obtain the image of the
function (1.52) in the following form:

L[tⁿ] = n!/s^{n+1}. (1.53)
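Formula (1.53) is easy to confirm by direct quadrature of the Laplace integral; the sketch below checks the case n = 4 at s = 3 (arbitrary choices):

```python
import math

def laplace(f, s, T=40.0, n=100000):
    # trapezoidal approximation of the Laplace integral on [0, T]
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

n_pow, s = 4, 3.0
numeric = laplace(lambda t: t**n_pow, s)
exact = math.factorial(n_pow) / s**(n_pow + 1)   # n!/s^(n+1) = 24/243
print(abs(numeric - exact) < 1e-6)
```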
The seventh case. We represent the image (1/sⁿ)F(s) in the form

(1/sⁿ)F(s) = (1/s)F(s)·(1/s^{n−1}). (1.54)
Using Rule IX (the convolution theorem), we obtain the original corresponding to the
image F(s)/sⁿ:

F(s)/sⁿ → (1/(n − 1)!) ∫₀^t f(τ)(t − τ)^{n−1} dτ. (1.55)
The mechanism of finding the image according to the given original, if this original exists,
reduces to calculating the integral (1.2). For the simplest functions, such an operation does
not present mathematical difficulties.
Therefore, the results of the transformations are given in tables of correspon-
dence between originals and images [1–4]. We give some correspondences in Tables 1.1
and 1.2.
• The Cauchy problem, when all additional conditions are given at one point (as a rule,
the starting point) of the interval
• The boundary value problem, when the additional conditions are given as values of
the function and its derivatives at the boundary of the interval, that is, at the
beginning and at the end of the integration interval
As is known, the solution of such problems is connected with the problem of integrating
partial differential equations under given boundary conditions [6].
We show the advantages of the Laplace transform in solving a differential inhomoge-
neous first-order equation with constant coefficients.
Example: Let some process be described by the following differential equation, which
we call the initial equation:

y′ + c0y = f(t). (1.56)
Equation (1.56) is represented in the original space. In the image space, this equation
corresponds to the depicting equation, which has the form
L[ y ′] + c0 L[ y ] = L[ f (t)]. (1.57)
The symbol L[...] denotes the transformation of the original equation by multiplying
both parts by e^{−st} and integrating from 0 to ∞.
Applying Rule V, we write Equation (1.57) in the images:

sY(s) − y(+0) + c0Y(s) = F(s). (1.58)
Thus, we obtained a linear algebraic equation with the initial value of the function
y(t) corresponding to the value y(+ 0) for the initial point t = 0 (Theorem 1.9).
The solution of this equation is quite simple:

Y(s)(s + c0) = F(s) + y(+0),

Y(s) = F(s)/(s + c0) + y(+0)/(s + c0). (1.59)
For the resulting image Y(s) we find the corresponding original, using the inversion
formula (1.4) and Table 1.1 (item 8):
We note that the first term on the right-hand side of (1.59) is the product of two images,
which, according to Rule IX, corresponds to the convolution of two originals, the first
term in (1.60). Finally, we get
y(t) = ∫₀^t f(τ)e^{−c0(t−τ)} dτ + y(+0)e^{−c0t} = e^{−c0t} ∫₀^t f(τ)e^{c0τ} dτ + y(+0)e^{−c0t}. (1.61)
We note that (1.61) is a solution of the differential equation (1.56) for a given initial value
of the function y(t). This solution can be obtained even easier. We outline the course of
the solution. Let f(t) = 1(t) and y(+0) = 0. Using Tables 1.1 (item 1) and 1.2 (item 6), and
passing from the image to the original, we obtain the solution (1.56) in the following
form:
1(t) → 1/s; Y(s) = 1/[s(s + c0)]; Y(s) ← y(t); y(t) = (1/c0)(1 − e^{−c0t}).
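The result obtained through the images can be cross-checked by integrating (1.56) directly. The sketch below solves y′ = 1(t) − c0·y, y(+0) = 0, with the classical Runge–Kutta method (c0 = 2 is an arbitrary choice) and compares the answer with y(t) = (1 − e^{−c0t})/c0 found above:

```python
import math

c0 = 2.0

def rk4(f, y0, t_end, n=1000):
    # classical fourth-order Runge-Kutta for y' = f(t, y), y(0) = y0
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

t1 = 1.5
numeric = rk4(lambda t, y: 1.0 - c0 * y, 0.0, t1)  # y' + c0*y = 1(t), y(0) = 0
closed = (1.0 - math.exp(-c0 * t1)) / c0           # result of the Laplace method
print(abs(numeric - closed) < 1e-8)
```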
The solution of a homogeneous second-order differential equation with constant coef-
ficients is given on a concrete example.
Second example. Find the solution of the equation

d²x/dt² + 6 dx/dt + 5x = 0

with the initial conditions at t = 0: x(0) = 0, x′(0) = 1.
We write the depicting equation
L [ x′′] + L [6x′] + L [5 x] = 0.
or

s²X(s) − 1 + 6sX(s) + 5X(s) = 0, or X(s)(s² + 6s + 5) = 1.

From which

X(s) = 1/(s² + 6s + 5) = k1/(s + 1) + k2/(s + 5).
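Evaluating the partial-fraction coefficients gives k1 = 1/(s + 5)|_{s=−1} = 1/4 and k2 = 1/(s + 1)|_{s=−5} = −1/4, so the original is x(t) = (e^{−t} − e^{−5t})/4. A quick check that this function satisfies the equation and the initial conditions:

```python
import math

# candidate original recovered from the partial fractions:
# X(s) = (1/4)/(s + 1) - (1/4)/(s + 5)  ->  x(t) = (e^(-t) - e^(-5t))/4
x   = lambda t: (math.exp(-t) - math.exp(-5 * t)) / 4.0
dx  = lambda t: (-math.exp(-t) + 5 * math.exp(-5 * t)) / 4.0
d2x = lambda t: (math.exp(-t) - 25 * math.exp(-5 * t)) / 4.0

print(abs(x(0.0)) < 1e-12, abs(dx(0.0) - 1.0) < 1e-12)  # x(0) = 0, x'(0) = 1
residual = max(abs(d2x(t) + 6 * dx(t) + 5 * x(t)) for t in (0.3, 1.0, 2.5))
print(residual < 1e-12)                                 # x'' + 6x' + 5x = 0
```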
A detailed solution of differential equations of any order may be found in Ref. [3].
We give examples of the solutions of certain problems of mechanics with the aid of
the Laplace transform.
The first task. The motion of a material point of mass m under the action of a force
that depends on time. The differential equation of motion of a point of mass m has the
form
L[mx′′(t)] = L[ f (t)] or ms 2 X (s) − msx(0) − mx′(0) = F( s). (1.63)
Here x0 and x0′ are the initial values of the function x(t) and its first derivative at t = 0.
Then

X(s) = (1/s)x0 + (1/s²)x0′ + F(s)/(ms²). (1.65)
Turning to the original, using Table 1.2 (items 2, 3) and (1.53), we obtain the solution of
Equation (1.62) in the form
x = x0 + x0′t + (1/m) ∫₀^t f(τ)(t − τ) dτ. (1.66)
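Formula (1.66) can be checked for a concrete driving force. For f(t) = sin t (an arbitrary choice), m·x″ = sin t has the closed-form solution x = x0 + (v0 + 1/m)t − sin(t)/m, and the quadrature of (1.66) should reproduce it:

```python
import math

m, x0, v0 = 2.0, 0.5, 1.0
f = math.sin                       # arbitrary driving force f(t) = sin t

def x_of_t(t, n=20000):
    # x(t) = x0 + v0*t + (1/m) ∫_0^t f(τ)(t - τ) dτ  (formula (1.66), trapezoid)
    h = t / n
    total = 0.5 * (f(0.0) * t + f(t) * 0.0)
    for k in range(1, n):
        tau = k * h
        total += f(tau) * (t - tau)
    return x0 + v0 * t + total * h / m

t1 = 2.0
exact = x0 + (v0 + 1.0 / m) * t1 - math.sin(t1) / m  # solution of m*x'' = sin t
print(abs(x_of_t(t1) - exact) < 1e-7)
```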
The same result can be obtained using the formulas (1.20), Table 1.2 (paragraph 1.7).
Assuming the initial conditions in the form

x = x0, x′ = v0 at t = 0, (1.67)
or

X(s) = x0 + sv0 + (1/(ms²))F(s). (1.69)
Analyzing the two depicting Equations (1.64) and (1.68), we draw the following conclu-
sions: the original of the second derivative x′′(t) corresponds to only one image s2 X(s);
the initial values of the functions for (1.64) are given by real quantities; and in Equation
(1.68), they automatically become complex numbers.
Passing from the image (1.69) to the original, we immediately obtain Equation (1.66).
The depicting Equation (1.68) can also be obtained using the impulsive functions (1.39),
(1.46). This can be done if we assume that, for zero initial values of the coordinate and
velocity, a pulse mx0′u1(t) is applied to the point of mass m at the time t = 0. This action
imparts a velocity x0′ to the point, along with two oppositely directed impulsive shocks,
which impart an instantaneous displacement x0 to the point.
Let us show this by the example of motion of a point of unit mass according to the law
of motion
Let us write down the depicting equation, applying the unit impulsive functions:

x = ∫₀^t f(τ)(t − τ) dτ + v0t + x0. (1.73)
The second task. Vibrations of the simplest vibrator. Let the load P = mg be suddenly
suspended at the end of a stressed spring whose own weight we neglect. At the same
time, the load is given an initial displacement of the spring x0 and an initial velocity x0′.
It is necessary to find the change in the elongation of the spring x(t) under the given
force (Figure 1.11).
Solution. Because the spring is elastic, under the action of the force P and the imparted
initial displacement the spring will deform until the whole system reaches equilibrium
and the spring comes to rest. Therefore, the motion of the particles of the spring can
be regarded as longitudinal oscillations arising at some point in time. Moreover, these
oscillations will at first be forced, and then, from the initial moment of equilibrium,
the oscillations take on the character of free oscillations. We use the method of
introducing unit impulsive functions, following the example considered above. We write
the equation of motion of the load
Figure 1.11 A load P = mg suspended from a spring; x0 and x0′ are the initial displacement and velocity, x the elongation.
Then

X(s) = g/(s² + k²) + x0′s/(s² + k²) + x0s²/(s² + k²). (1.76)
Here k = √(c/m) is the frequency of free oscillations.
We transform the first term on the right-hand side of (1.76) as follows: g/(s² + k²) = g·1/(s² + k²).
The second factor of the resulting expression is considered as

1/(s² + k²) = (1/k²)[1 − s²/(s² + k²)].
Applying the analogous method to transform the remaining terms of (1.76) and
passing to the original space (Table 1.2, items 8, 9), we finally obtain the well-known
solution
x(t) = (g/k²)(1 − cos kt) + (x0′/k)sin kt + x0cos kt.
The reader may find an extensive exposition of this material in the literature [1].
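The solution can be verified directly against the equation of motion implied by (1.76), x″ + k²x = g with x(0) = x0, x′(0) = x0′; the numerical data below are arbitrary:

```python
import math

g, k, x0, v0 = 9.81, 3.0, 0.2, 0.5   # arbitrary data: g, frequency k, x0, x0'

x   = lambda t: g / k**2 * (1 - math.cos(k * t)) + v0 / k * math.sin(k * t) + x0 * math.cos(k * t)
dx  = lambda t: g / k * math.sin(k * t) + v0 * math.cos(k * t) - x0 * k * math.sin(k * t)
d2x = lambda t: g * math.cos(k * t) - v0 * k * math.sin(k * t) - x0 * k**2 * math.cos(k * t)

print(abs(x(0.0) - x0) < 1e-12, abs(dx(0.0) - v0) < 1e-12)  # initial conditions
residual = max(abs(d2x(t) + k**2 * x(t) - g) for t in (0.1, 0.7, 2.0))
print(residual < 1e-9)                                      # x'' + k^2 x = g
```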
∂²u/∂t² = c1 ∂²u/∂x², (1.77)
where c1 is the coefficient characterizing the properties of the rod material.
Taking c1 = E/ρ, we finally obtain
∂²u/∂t² = (E/ρ) ∂²u/∂x². (1.78)
Here, E is the modulus of elasticity of the material; ρ is the density of the material.
2. Only twisting movements appear in the rod (the rod is only exposed to the torsion
pulse), assuming that the set of planar cross sections of the rod rotate sequentially at
a distance dx from each other (Figure 1.13)
∂²φ/∂t² = c2 ∂²φ/∂x². (1.79)
Taking c2 = G/ρ, we finally obtain
Figures 1.12 and 1.13 Elementary segments dx of the rod under longitudinal displacement u and torque M, measured along the axis x.
∂²φ/∂t² = (G/ρ) ∂²φ/∂x². (1.80)
Here G is the shear modulus of the material.
In the case of wave processes propagating in an elastic rod, longitudinal and transverse
oscillations with allowance for (1.77) and (1.79) can be described by the formulas [7]
∂²u/∂x² = (1/a1²) ∂²u/∂t², a1 = √[E(1 − μ)/(ρ(1 + μ)(1 − 2μ))]; (1.81)

∂²φ/∂x² = (1/a2²) ∂²φ/∂t², a2 = √(G/ρ). (1.82)
Here μ is Poisson’s ratio; G = E/[2(1 + μ)].
ρ ∂υ/∂t = −∂σ/∂x; (1.83)

(1/E) ∂σ/∂t = −∂υ/∂x; (1.84)

rρ ∂Ω/∂t = −∂τ/∂x; (1.85)

(1/(rG)) ∂τ/∂t = −∂Ω/∂x. (1.86)
Here υ is the travel speed of an elementary volume of the elastic rod along the axis x,
υ = ∂u/∂t; r is the radius of the rod; and Ω is the angular travel speed of the particles
of the elastic rod in the plane of the section, Ω = ∂φ/∂t.
Equations (1.83) and (1.84) describe the relationship between the velocities of longi-
tudinal displacement of plane sections of the elementary volume of an elastic rod and
changes in normal stresses with the gradients of changes of these variables along the
length of the rod.
Equations (1.85) and (1.86) describe the relationship between the shear rate of flat sec-
tions of the elementary volume of an elastic rod and the rate of change of the maximum
tangential stresses with the gradients of the variations of these variables along the length
of the rod.
For a short rod, Equations (1.83–1.86) can be written in ordinary derivatives. A detailed
exposition of this question is given in Ref. [8].
This approach, proposed by L. Kondratenko, allowed the development of a new
method for studying the dynamics of rotating and longitudinally moving elements of a
construction. The method makes it possible to estimate the magnitude of the oscillations
of stresses in the structural elements, as well as the speed of movement of the functional
element in engineering technologies.
Let us explain the essence of Kondratenko’s method.
ϑk dτ/dt = Ω1 − Ω2, (1.87)
where ϑk is the coefficient of torsional elasticity, ϑk = l/(ρGW); Ω1, Ω2 are the angular velocities
of rotation of the rod section and of the cross section of the rod near the disk, respectively;
Figure 1.14 A rod (1) of length l carrying a disk (2) with moment of inertia J and resistance moment Mr; Ω1 and Ω2 are the angular velocities of the driving end and of the disk.
and W is the geometric moment of resistance of the cross section of the rod near the disk,
W = πd³/16.
We integrate (1.87) and obtain the dependence of the tangential stresses on the variation
of the twist angle:

ϑ1kτ = φ1 − φ2, ϑ1k = l/(ρG). (1.88)
The tangential stresses developed in the rod overcome the moment of the resistance forces
Mr = Mr0 + hkΩ2, as well as the rising inertia forces Md = J dΩ2/dt. Here hk is the loss factor
proportional to the angular velocity of rotation of the disk. Taking into account that
τ = M/W, we finally obtain
τ(t)W = Mr0(t) + hkΩ2(t) + J dΩ2/dt. (1.89)
1.11 Equations of torsional oscillations of a disk
For Equations (1.85) and (1.86), we write the partial differential equations, assuming that
the density and the shear modulus of the rod material are equal and constant along the
length:
rρsΩ(s) = −dτ(s)/dx; (1.90)

sτ(s)/(rG) = −dΩ(s)/dx. (1.91)
Differentiating (1.90) with respect to the coordinate x, eliminating the derivative dΩ(s)/dx
with the help of (1.91), and introducing the new variable θk(s) = ±s(G^{−1}ρ)^{1/2}, we obtain a new
second-order differential equation
∂²τ(s)/∂x² − θk²(s)τ(s) = 0. (1.92)
The solution of this equation has the form

τ(s, x) = C1e^{θk(s)x} + C2e^{−θk(s)x}. (1.93)
The constants of integration C1, C2 are determined by the boundary conditions at x = 0:

τ(s, x) = τ1(s, 0); ∂τ(s, x)/∂x = −(θk²(s)G/s)·Ω1(s, 0). (1.94)
The final solution of (1.90) and (1.91) will be

Ω(s, x) = Ω1(s, 0)ch[θk(s)x] − (1/(Gθk(s)))sτ1(s, 0)sh[θk(s)x]; (1.95)
τ(s, x) = τ1(s, 0)ch[θk(s)x] − (1/s)rGθkΩ1(s, 0)sh[θk(s)x]. (1.96)
The coefficient θ k(s) is the symbolical coefficient of wave propagation. The solution obtained
in the images makes it possible to calculate the frequency characteristics of the driven link
depending on the change in the speed of the leading link, taking into account the emerg-
ing reactive force factors of the medium. The reader may find an extensive exposition of
this question in the literature [8].
ρ dυ/dt = −dσ/dx; (1.97)

(1/E) dσ/dt = −dυ/dx. (1.98)
We perform actions similar to Section 1.10. Integrating (1.97) with respect to the coordinate
x and then differentiating with respect to t, we obtain the relation for the longitudinal
vibrations of the rod
ϑ′n0 dσ/dt = υ1 − υ2, (1.99)
where ϑ′n0 is the coefficient characterizing the longitudinal elasticity, ϑ′n0 = l/E; υ1, υ 2 are the
linear velocities of the displacement of the points of the rod and disk sections, respectively;
and E is the modulus of elasticity of the material. Assuming that the modulus of elasticity
is the same and constant, we integrate (1.99), and we obtain the dependence of the normal
stresses on the displacements of the points of the rod:
Figure 1.15 A rod (1) of length l driving a body (2) along a surface inclined at the angle α; υ1, υ2 are the velocities of the rod and body, F, F1, F2 the acting forces, mg the weight.
ϑ′n0σ = (υ1 − υ2)t, (1.100)

or

(l/E)σ = x1 − x2, (l/E)σ = Δu. (1.101)
Expression (1.101) is the well-known Hooke’s law.
The normal stresses developed in the rod overcome the resistance arising on the disk
(driven link), which is the resultant of two force components. The force F2 is the friction
force, which is determined by the expression F2 = Ff = kf sin α, where kf is the coefficient
of friction. The force F1 is the inertial component, F1 = −Fi = m dυ2/dt.
Without taking into account the direction of the speed of motion and taking F = σf,
where f is the cross-sectional area of the body, we write the following relation [8]:
βσf = F0(t) + k1υ1(t) + hυ2 + m dυ2/dt. (1.102)
Here β is the proportionality coefficient, which depends on the coefficient of friction caused
by the contact pressure and on the direction of motion of the driven link, β = 1 − c sgn υ2;
k1, h are the coefficients of friction loss proportional to the speeds of the driving and
driven links. Solving (1.99) and (1.102) jointly by the symbolical method, taking into
account the direction of motion of the driven link, we finally obtain
account the direction of motion of the driven link, we finally obtain
υ1(t)(1 − c sgn υ2 − k1ϑn0p) − ϑn0pF0(t) = υ2(t)(1 − c sgn υ2 + hϑn0p + mϑn0p²). (1.103)
Here ϑn0 is the elasticity of the mechanical system, ϑn0 = l/fE; p is a differential operator,
p ≡ d/dt.
Further transformations will be based on the energy approach of the deformation
of an elastic body and the rheological representation of the transfer of dynamic energy
through a metallic body (the Zener model) [9].
It is known that the realization of the principle of continuity of deformations in an
elastic body corresponds to the minimum value of the potential deformation energy accu-
mulated by the body [10] (i.e., in deformation processes, the stored energy in the body is
spent on performing work to restore the body shape to its original state after the load is
removed).
Taking into account the phenomenological Zener model, we write the differential
equation characterizing the redistribution of stresses and deformations in the body under
the static load of the body in some time [8]
σ + (η/E2) dσ/dt = E1θ + η dθ/dt. (1.104)
Here η is the coefficient of proportionality, which characterizes the viscosity of the body;
E1, E2 are elastic constants of isothermal and isobaric deformation processes; and θ is linear
deformation of the body.
If we take θ′ = 0, then expression (1.104) is transformed into equation
σ + (η/E2) dσ/dt = E1θ0, τ_ε = η/E2, (1.105)
with the solution

θ(t) = σ0/E1 + (θ0 − σ0/E1)exp(−t/τ_σ); τ_σ = η/E1, (1.107)

where τ_σ is the time of retardation (lag).
We transform Equation (1.104), going over to the operator form:

pσ = E2pθ − (1/τ_ε)(σ − E1θ), p ≡ d/dt. (1.108)
We perform one more transformation:

σ(p + 1/τ_ε) = θE2(p + 1/(kτ_ε)); k = E2/E1. (1.109)
Passing under zero initial conditions to Laplace transforms, we rewrite Equation (1.109) in
the images
σ(s)(s + 1/τ_ε) = θ(s)E2(s + 1/(kτ_ε)). (1.110)
Taking into account the stepwise deformation, the Laplace transform of the stress change
function (1.110) is written in the form
σ(s) = θ0E2 (s + 1/(kτ_ε)) / [s(s + 1/τ_ε)]. (1.111)
σ(t) = θ0E2[1/k + (1 − 1/k)exp(−t/τ_ε)] = θ0E1[1 + (E2/E1 − 1)exp(−t/τ_ε)]. (1.112)
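A numerical sanity check of (1.112): for a step deformation θ = θ0 the stress must satisfy the time-domain form of (1.110), dσ/dt + σ/τ_ε = E2θ0/(kτ_ε), and relax from σ(0) = E2θ0 to σ(∞) = E1θ0. The material data below are hypothetical:

```python
import math

E1, E2, eta, theta0 = 2.0e9, 5.0e9, 1.0e9, 1e-3  # hypothetical material data
k = E2 / E1            # ratio of the elastic constants
tau_e = eta / E2       # relaxation time

sigma  = lambda t: theta0 * E2 * (1 / k + (1 - 1 / k) * math.exp(-t / tau_e))  # (1.112)
dsigma = lambda t: -theta0 * E2 * (1 - 1 / k) / tau_e * math.exp(-t / tau_e)

rhs = E2 * theta0 / (k * tau_e)
residual = max(abs(dsigma(t) + sigma(t) / tau_e - rhs)
               for t in (0.0, 0.5 * tau_e, 3 * tau_e))
print(residual < 1e-6 * rhs)                        # the stress ODE is satisfied
print(abs(sigma(0.0) - E2 * theta0) < 1e-3)         # instantaneous modulus E2
print(abs(sigma(50 * tau_e) - E1 * theta0) < 1e-3)  # relaxed modulus E1
```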
Assuming that the relaxation constant is zero within the elastic range, when the stresses
do not exceed the yield strength, without taking into account the direction of motion in
accordance with (1.102) and (1.103), we write the following equations of motion of the mate-
rial object:
σf(1 − c) = F0(t) + hυ2 + m dυ2/dt (1.113)
and
υ1(t)(1 − c) − ϑn0pF0(t) = υ2(t)(1 − c + hϑn0p + mϑn0p²). (1.114)
In this case, the oscillations of the velocity of motion of a material object can be described
by the following equation [8]:
υ2(t) = [υ1(t) − aϑn0pF0(t)] / [1 + ahϑn0p + amϑn0p²], a = 1/(1 − c). (1.115)
For the difference in the velocities of motion for υ1(t) = const, the following relation is valid:
Δυ(t) = −aϑn0pF0(t) / [1 + ahϑn0p + amϑn0p²]. (1.116)
Integrating (1.116) with respect to t and passing to the differential form, we obtain the
equation of displacement of the material point relative to the leading member:
Δu(t) + ahϑn0 dΔu/dt + amϑn0 d²Δu/dt² = −aϑn0 dF0/dt. (1.117)
From the solution of the system of equations (1.113) and (1.114), the stresses in the rod are
determined by the equality
σ(t) = (a/f)[F0(t) + hυ2(t)(1 + Tp)], (1.118)
where T = m/h is the inertial time constant.
A detailed exposition of this question can be found in Ref. [8].
The equations obtained make it possible to apply the motion transfer scheme and, on
its basis, to investigate the longitudinal and torsional oscillations of the moving techno-
logical object fixed to the end of the rod (the input and output links of the system).
of the rod. Such a model can be taken as an imitation model when studying the process
of hole processing. The structural scheme for the transfer of rotational motion is shown in
Figure 1.16 [8].
Notation: ϑk—coefficient of torsional elasticity; p—differentiation operator (p = d/dt); τ—tangential stresses; τε—relaxation constant; kε—ratio of the adiabatic and isothermal elasticity moduli of the rod material; Mr—resultant moment of the resistance forces; l—length of the rod with the disk; hk—coefficient of friction loss proportional to the rotational speed; 1, 2—leading and driven links.
The transfer function of the effect of the oscillations of the torque on the rotational
speed of the disk is the relation
W_{\Omega}(s) = \frac{\Omega_2(s)}{M_r(s)}. \quad (1.119)
Here Ω2 is the angular velocity of rotation of the disk, and Mr is the moment of resistance
of forces.
The transfer function (1.119) is a Laplace transform of the impulse response k(t) [4]. To
determine k(t), it is necessary to find the roots of the characteristic equation.
The proposed mathematical model allows us to investigate the dynamics of the rotating parts of the structural element, and also to obtain equations describing, in the rod, the relationship between the angular acceleration of elementary sections and the gradient of the tangential stress, and between the rate of change of this stress and the angular velocity gradient.
Figure 1.16 The structural scheme for the transfer of rotational motion.
In the presence of long lines (an elastic system with distributed parameters), it is expe-
dient to carry out an investigation of such processes in the complex domain by means of a
one-dimensional Laplace transform.
Based on the proposed scheme of motion transfer (Figure 1.14), the transfer function
can be described by expression
W_{\Omega}(s) = \frac{\Omega_2(s)}{M_r(s)} = \frac{\vartheta_k s(s + \alpha_1)}{\alpha_2 + s(1 + \vartheta_k h_k \alpha_1) + s^2 \vartheta_k (h_k + J\alpha_1) + s^3 \vartheta_k J}. \quad (1.120)
Here α1, α 2 are quantities that take into account the peculiarities of the rotation of the disk
in the interaction medium (contact interaction, etc.).
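As a numerical illustration (not part of the original text, and with purely hypothetical parameter values), the following Python sketch finds the roots of the characteristic equation formed from the denominator of W_Ω(s) in (1.120); the impulse response k(t) is then a combination of the modes exp(r·t) over those roots, and stability requires every root to lie in the left half-plane.

```python
import numpy as np

# Hypothetical parameter values (for illustration only).
vk, hk, J = 0.5, 0.8, 2.0   # torsional elasticity, friction loss, disk inertia
a1, a2 = 1.5, 3.0           # interaction-medium quantities alpha_1, alpha_2

# Characteristic equation from the denominator of W_Omega(s) in (1.120):
# vk*J*s^3 + vk*(hk + J*a1)*s^2 + (1 + vk*hk*a1)*s + a2 = 0
coeffs = [vk * J, vk * (hk + J * a1), 1 + vk * hk * a1, a2]
roots = np.roots(coeffs)

# k(t) is a combination of exp(r*t) over these roots; stability requires
# every root to have a negative real part.
print(roots)
print(all(r.real < 0 for r in roots))
```

For these particular values the Routh–Hurwitz condition holds only with a small margin, so the oscillatory pair of roots sits close to the imaginary axis.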
\tau_c(s) = \psi(s)\left[ M_r + \frac{\Omega_1(s)(h_k + Js)}{\operatorname{ch}[\theta_k(s)l]} \right];

\psi(s) = \frac{1}{W[1 + h_k \vartheta_k(s) + J \vartheta_k(s) s^2]};

\vartheta_k(s) = \frac{l}{G_r W} Z_k(s);

Z_k(s) = \frac{\operatorname{th}[\theta_k(s)l]}{\theta_k(s)l}.
Here Zk(s) is a function characterizing the degree of distribution of the parameters. For Ω1 = 0 and the Laplacian s = jω, the function Zk(jω) becomes real, i.e., Z_k(j\omega) = \frac{\operatorname{tg}\alpha_k}{\alpha_k}; αk is the parameter characterizing the properties of the structure, \alpha_k = l\omega(\rho G^{-1})^{0.5}; ω is the circular frequency of harmonic oscillations; j = (−1)^{1/2}.
These equations make it possible to calculate the frequency characteristics of the drive
and determine the drive response to the harmonic variation in the speed of the driving
link or the moment of resistance acting on the actuator.
Thus, we obtain two frequency characteristics: one of them, WM(jω), illustrates the influence of the oscillation of the moment of resistance on the angular velocity of rotation of the disk Ω2; the other, WMτ(jω), determines the influence of the oscillations of Mr on the magnitude of the tangential stresses τ appearing in the section adjacent to the disk.
The graph of the function Zk(αk) is shown in Figure 1.17, from which it is clear that as the parameter tends to zero, the function Zk tends to unity: αk → 0, Zk → 1.
Figure 1.17 Change in the function Zk as a function of the dimensionless parameter αk.
In cases of variation of αk in the intervals π/2 + kπ < αk < π + kπ, the function Zk takes negative values, Zk < 0.
The graph in Figure 1.17 clearly defines the zones of stable and unstable operation of
the mechanism.
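The sign pattern behind these zones can be checked directly; a minimal Python sketch (illustrative only, not from the original text), using Zk(α) = tg α / α for Ω1 = 0:

```python
import math

def Zk(alpha):
    """Z_k for Omega_1 = 0 and s = j*omega: Z_k(alpha) = tan(alpha)/alpha."""
    return math.tan(alpha) / alpha

print(Zk(1e-3))  # tends to unity as alpha -> 0
print(Zk(1.0))   # positive for 0 < alpha < pi/2
print(Zk(2.0))   # negative for pi/2 < alpha < pi
```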
A detailed exposition of this material and questions of mathematical modeling of the
rolling of the tube can be found in the literature [11–16].
References
1. Lur’e A.I. [Operacionnoe ischislenie i ego prilozheniya k zadacham mekhaniki] Operational Calculus
and Its Applications to Problems of Mechanics. GITTL, Moscow, 1950. (In Russ.).
2. Korn G., Korn T. Mathematical Handbook for Scientists and Engineers Definitions, Theorems and
Formulas for Reference and Review. McGraw-Hill Book Company, New York, San Francisco,
Toronto, London, Sydney, 1968.
3. Doetsch G. Anleitung zum praktischen Gebrauch der Laplace-Transformation. R. Oldenbourg,
München, 1961.
4. Ivanov V.A., Chemodanov B.K., Medvedev V.S. [Matematicheskie osnovy teorii avtomatichesk-
ogo regulirovaniya] Mathematical Foundations of the Theory of Automatic Control. High School,
Moscow, 1971. (In Russ.).
5. Lavrent’ev M.A., Shabat B.G. [Metody teorii funkcij kompleksnogo peremennogo] Methods of the
Theory of Functions of a Complex Variable. Nauka, Moscow, 1965. (In Russ.).
6. Mironova L.I. [Komp’yuternye tekhnologii v reshenii zadach teorii uprugosti] Computer Technologies
in Solving Problems in the Theory of Elasticity. Palmarium Academic Publishing, ISBN-13: 978-3-
659-72395-7; ISBN-10: 3659723959. (In Russ.).
7. Sedov L.I. [Mekhanika sploshnoj sredy] Continuum Mechanics. Nedra, Moscow. T.1, T.2, 1970. (In
Russ.).
8. Kondratenko L. A. [Raschet kolebanij detalyah i uzlah mashin] Calculation of Velocity Variations and
Stresses in Machine Assemblies and Components. Sputnik, Moscow, 2008. (In Russ.).
9. Eirich Frederik R. (Ed.). Rheology. Academic Press, Inc., New York, VI, 1965.
10. Bezuhov N.I. [Osnovy teorii uprugosti, plastichnosti i polzuchesti] Fundamentals of the Theory of
Elasticity, Plasticity and Creep. High School, Moscow, 1968. (In Russ.).
11. Kondratenko L.A., Terekhov V.M., Mironova L.I. [Ob odnom metode issledovaniya krutil’nyh
kolebanij sterzhnya i ego primenenii v tekhnologiyah mashinostroeniya] About one method
of research torsional vibrations of the core and this application in technologies of mechanical
engineering. Engineering & Automation Problems. 2017, vol. 1, pp. 133–137. (In Russ.).
12. Kondratenko L., Terekhov V., Mironova L. The aspects of roll-forming process dynamics.
Vibroengineering PROCEDIA. At the 22nd International Conference on Vibroengineering,
Moscow, 2016, pp. 460–465. (In Russ.).
13. Kondratenko L.A. [Mekhanika rolikovogo val’cevaniya teploobmennyh trub] Mechanics
Roller Rolling Heat Exchange Tubes. Sputnik, Moscow, 2015. (In Russ.).
14. Kondratenko L.A., Terekhov V.M., Mironova L.I. [K voprosu o vliyanii dinamiki rolikovogo
val’cevaniya na kachestvo izgotovleniya teploobmennyh apparatov v atomnyh ehnerget-
icheskih ustanovkah] On the effect of the dynamics of the roller rolling on the quality of man-
ufacture of heat exchangers of nuclear power units. Heavy Engineering Construction. 2016, vol. 3,
pp. 10–14. (In Russ.).
15. Kondratenko L., Mironova L., Terekhov V. Investigation of vibrations during deep hole
machining. 25th International Conference Vibroengineering, Liberec, Czech Republic. JVE
International LTD. Vibroengineering Procedia. 2017, Vol. 11. ISSN 2345-0533, pp. 7–11. Crossref
DOI link: https://doi.org/10.21595/vp.2017.18285.
16. Kondratenko L., Mironova L., Terekhov V. On the question of the relationship between longi-
tudinal and torsional vibrations in the manufacture of holes in the details. 26th Conference in
St. Petersburg, Russia. JVE International LTD. Vibroengineering Procedia. 2017, Vol. 12. ISSN
2345-0533, pp. 6–11. Crossref DOI link: https://doi.org/10.21595/vp.2017.18461.
chapter two
Fourier series and its applications in engineering
Contents
2.1 Introduction
2.2 Periodic functions
2.3 Orthogonality of sine and cosine functions
2.4 Fourier series
2.5 Dirichlet’s theorem
2.6 Riemann–Lebesgue lemma
2.7 Term-wise differentiation
2.8 Convergence of Fourier series
2.9 Small order
2.10 Big “oh” for functions
2.11 Fourier analysis and Fourier transform
2.12 Fourier transform
2.13 Gibbs phenomenon
 2.13.1 Gibbs phenomenon with an example
 2.13.2 Results related to Gibbs phenomenon
2.14 Trigonometric Fourier approximation
2.15 Summability
 2.15.1 Ordinary summability
 2.15.2 Absolute summability
 2.15.3 Strong summability
2.16 Methods for summability
2.17 Regularity condition
2.18 Norm
2.19 Modulus of continuity
2.20 Lipschitz condition
2.21 Various Lipschitz classes
2.22 Degree of approximation
2.23 Fourier series and music
2.24 Applications and significant uses
References
36 Advanced Mathematical Techniques in Engineering Sciences
2.1 Introduction
Mathematics has its roots embedded within various streams of engineering and sciences.
The concepts of the famous Fourier series originated in the field of physics; two physical problems, vibration and heat transfer, are the reasons for its origin. Jean Baptiste Joseph Fourier (1768–1830) was the first physicist, mathematician, and engineer to develop the concepts of Fourier analysis in dealing with the problems of vibrations and heat transfer. He claimed that any continuous or discontinuous function of t can be expressed as a linear combination of cosine and sine functions.
In mathematical analysis, we do not usually get a full decomposition into simpler things; rather, an approximation of a complex system is usually achieved by a more elementary system. When we truncate the Taylor series expansion of a function, we approximate the function by a polynomial.
The form of a Taylor series is as follows (infinite series):

f(t) = \sum_{n=0}^{\infty} a_n t^n,
where a0, a1, a2, … are called the constant coefficients of the infinite series. A Taylor series
does not include terms with negative powers. The quality of the approximation depends
on the number of terms taken under consideration. Of course, for a function to have a
Taylor series, it must (among other things) be infinitely differentiable in some interval, and
this is a very restrictive condition.
The Fourier series, which is a sum of sines and cosines, can be used for the approximation of any periodic function. Sines and cosines serve as much more versatile “prime elements” than powers of t: they can be used to approximate not only non-analytic functions, but they even do a good job on discontinuous functions.
2.2 Periodic functions
A function satisfying the identity l(t) = l(t + T) for all t, where T > 0, is called periodic or T-periodic, as shown in Figure 2.1.
Chapter two: Fourier series and its applications in engineering 37
Here, nT is also a period for any integer n > 0, and T is called the fundamental period. The definite integral of a T-periodic function is the same over any interval of length T. The following example uses this property to integrate the 2-periodic function shown in Figure 2.2.
Example 2.1: Let f be the 2-periodic function with f(x) = −x + 1 on [0, 2), and let I be a positive integer. Evaluate \int_{-I}^{I} f^2(x)\,dx.

Solution: Since the interval [−I, I] covers exactly I full periods of length 2,

\int_{-I}^{I} f^2(x)\,dx = I \int_0^2 f^2(x)\,dx = I \int_0^2 (-x + 1)^2\,dx = I \left[ -\frac{1}{3}(-x + 1)^3 \right]_0^2 = -\frac{I}{3}[-1 - 1] = \frac{2I}{3}.
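A quick numerical check of this property (not part of the original text), assuming the sawtooth f(x) = 1 − x on [0, 2) extended 2-periodically:

```python
import numpy as np

# Numerical check of Example 2.1: f is the 2-periodic extension of
# f(x) = -x + 1 on [0, 2); the integral of f^2 over [-I, I] equals
# I times the integral over one period, i.e., 2*I/3.
def f(x):
    return 1.0 - np.mod(x, 2.0)

I = 3
x = np.linspace(-I, I, 200001)
y = f(x) ** 2
integral = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))  # trapezoidal rule
print(integral)  # ~ 2*I/3 = 2.0
```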
The most important periodic functions are those of the 2π-periodic trigonometric system.

2.3 Orthogonality of sine and cosine functions
Two functions f and g are said to be orthogonal on an interval [a, b] if

\int_a^b f(x)g(x)\,dx = 0.

For the trigonometric system on [−π, π],

\int_{-\pi}^{\pi} \cos mx \cos nx\,dx = 0 \text{ for } m \neq n, \qquad = \pi \text{ for } m = n,

\int_{-\pi}^{\pi} \sin mx \sin nx\,dx = 0 \text{ for } m \neq n, \qquad = \pi \text{ for } m = n,

\int_{-\pi}^{\pi} \cos mx \sin nx\,dx = 0 \text{ for all } m \text{ and } n.
Certain sequences of sin nt and cos nt functions are orthogonal on certain intervals. The resulting expansions

f = \sum_{i=1}^{\infty} c_i \varphi_i

in terms of the sin nt and cos nt become the Fourier series expansions of the function f.
First, we consider the functions φn(t) = cos nt. These are orthogonal on the interval 0 < t < π.
Example 2.2: The functions φ0(t) = 1, φ1(t) = cos t, φ2(t) = cos 2t, φ3(t) = cos 3t, …, φn(t) = cos nt, … are orthogonal on the interval 0 < t < π. Furthermore, |φ0|² = π and |φn|² = π/2 for n = 1, 2, ….

Proof: For m ≠ n,

(\varphi_n(t), \varphi_m(t)) = \int_0^{\pi} \cos(nt)\cos(mt)\,dt = \int_0^{\pi} \left[ \frac{1}{2}\cos(n + m)t + \frac{1}{2}\cos(n - m)t \right] dt = \left[ \frac{1}{2(n + m)}\sin(n + m)t + \frac{1}{2(n - m)}\sin(n - m)t \right]_0^{\pi} = 0,

and

|\varphi_n|^2 = \int_0^{\pi} \cos^2(nt)\,dt = \int_0^{\pi} \frac{1}{2}[1 + \cos 2nt]\,dt = \left[ \frac{t}{2} + \frac{1}{4n}\sin 2nt \right]_0^{\pi} = \frac{\pi}{2}.
Next, we consider the functions ψn(t) = sin nt. These are also orthogonal on the interval 0 < t < π. The resulting expansion is called the Fourier sine series expansion of f.
Example 2.3: The functions ψ1(t) = sin t, ψ2(t) = sin 2t, ψ3(t) = sin 3t, …, ψn(t) = sin nt, … are orthogonal on the interval 0 < t < π. Furthermore, |ψn|² = π/2 for n = 1, 2, ….
Proof: For m ≠ n,

(\psi_n(t), \psi_m(t)) = \int_0^{\pi} \sin(nt)\sin(mt)\,dt = \int_0^{\pi} \left[ \frac{1}{2}\cos(n - m)t - \frac{1}{2}\cos(n + m)t \right] dt = \left[ \frac{1}{2(n - m)}\sin(n - m)t - \frac{1}{2(n + m)}\sin(n + m)t \right]_0^{\pi} = 0,

and

|\psi_n|^2 = \int_0^{\pi} \sin^2(nt)\,dt = \left[ \frac{t}{2} - \frac{1}{4n}\sin 2nt \right]_0^{\pi} = \frac{\pi}{2}.
Finally, we consider the functions φn(t) = cos nt and ψn(t) = sin nt. These are orthogonal
on the interval −π < t < π.
Example 2.4: The functions φ0(t) = 1, φ1(t) = cos t, φ2(t) = cos 2t, …, φn(t) = cos nt, … and ψ1(t) = sin t, ψ2(t) = sin 2t, …, ψn(t) = sin nt, … are orthogonal on the interval −π < t < π. Furthermore, |φ0|² = 2π and |φn|² = |ψn|² = π for n = 1, 2, …. Orthogonality within each family is shown as in Examples 2.2 and 2.3. For (φn(t), ψm(t)), the third identity is used:

(\varphi_n(t), \psi_m(t)) = \int_{-\pi}^{\pi} \left[ \frac{1}{2}\sin(m + n)t - \frac{1}{2}\sin(m - n)t \right] dt = \left[ -\frac{1}{2(n + m)}\cos(n + m)t + \frac{1}{2(m - n)}\cos(m - n)t \right]_{-\pi}^{\pi} = 0.

Then |φ0|² = 2π is an easy verification, and |φn|² = |ψn|² = π is shown in the same way (see Examples 2.2 and 2.3).
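The inner products in Examples 2.2 through 2.4 can also be verified numerically; a small illustrative Python sketch (not part of the original text), using a trapezoidal rule:

```python
import numpy as np

def inner(u, v, a, b, n=100001):
    """Trapezoidal approximation of the inner product (u, v) on [a, b]."""
    t = np.linspace(a, b, n)
    y = u(t) * v(t)
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(t))

# cos(2t) and cos(3t) are orthogonal on (0, pi); |cos(nt)|^2 = pi/2 there.
print(inner(lambda t: np.cos(2 * t), lambda t: np.cos(3 * t), 0, np.pi))       # ~0
print(inner(lambda t: np.cos(2 * t), lambda t: np.cos(2 * t), 0, np.pi))       # ~pi/2
# cos(2t) and sin(3t) are orthogonal on (-pi, pi).
print(inner(lambda t: np.cos(2 * t), lambda t: np.sin(3 * t), -np.pi, np.pi))  # ~0
```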
2.4 Fourier series
A Fourier series is a special representation of a function (signal) of the form

f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos(nx) + b_n \sin(nx) \right).

The coefficient a0 is determined by integrating both sides over the interval [−π, π]:
\int_{-\pi}^{\pi} f(x)\,dx = \int_{-\pi}^{\pi} a_0\,dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left( a_n \cos(nx) + b_n \sin(nx) \right) dx.

Since \int_{-\pi}^{\pi} \cos nx\,dx = \int_{-\pi}^{\pi} \sin nx\,dx = 0 for n = 1, 2, …,

\int_{-\pi}^{\pi} f(x)\,dx = \int_{-\pi}^{\pi} a_0\,dx = 2\pi a_0 \;\Rightarrow\; a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx.
The coefficient an is determined by multiplying both sides with cos mx and integrating the
resulting equation over the interval [−π, π]:
\int_{-\pi}^{\pi} f(x)\cos(mx)\,dx = \int_{-\pi}^{\pi} a_0 \cos(mx)\,dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} a_n \cos(nx)\cos(mx)\,dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} b_n \sin(nx)\cos(mx)\,dx.

Since \int_{-\pi}^{\pi} \cos mx\,dx = 0, \int_{-\pi}^{\pi} \cos mx \sin nx\,dx = 0 for all m and n, and \int_{-\pi}^{\pi} \cos mx \cos nx\,dx = 0 for m ≠ n:

\int_{-\pi}^{\pi} f(x)\cos(mx)\,dx = a_m \int_{-\pi}^{\pi} \cos^2(mx)\,dx = \pi a_m,

so that

a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x)\cos(nx)\,dx.
Similarly, the coefficient bn is determined by multiplying both sides with sin nx and integrating the resulting equation over the interval [−π, π]:

b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(nx)\,dx = \frac{1}{\pi} \int_0^{2\pi} f(x)\sin(nx)\,dx.
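These coefficient formulas are easy to evaluate numerically; the following illustrative Python sketch (not part of the original text) computes a few coefficients of f(x) = x on (−π, π], for which analytically a_n = 0 and b_n = 2(−1)^{n+1}/n.

```python
import numpy as np

# Numerical Fourier coefficients of f(x) = x on (-pi, pi] via the
# trapezoidal rule; analytically a_n = 0 and b_n = 2*(-1)**(n+1)/n.
def trap(y, x):
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

x = np.linspace(-np.pi, np.pi, 200001)
fx = x
coef = {}
for n in (1, 2, 3):
    an = trap(fx * np.cos(n * x), x) / np.pi
    bn = trap(fx * np.sin(n * x), x) / np.pi
    coef[n] = (an, bn)
    print(n, an, bn)  # b_1 ~ 2, b_2 ~ -1, b_3 ~ 2/3
```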
s_n(f;x) = \frac{a_0}{2} + \sum_{k=1}^{n} \left( a_k \cos kx + b_k \sin kx \right), \quad \forall n \ge 1, \qquad s_0(f;x) = \frac{a_0}{2},

denotes the (n + 1)th partial sum, called a trigonometric polynomial of degree (or order) n, of the Fourier series of f. The conjugate Fourier series of the series of f is defined by
\sum_{n=1}^{\infty} (b_n \cos nx - a_n \sin nx) = \sum_{n=0}^{\infty} v_n,

where a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos kx\,dx, k = 0, 1, 2, \ldots, and b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin kx\,dx, k = 1, 2, 3, \ldots, are called the Fourier coefficients of f. The sequence of partial sums of the series \sum_{k=0}^{\infty} u_k(x), given by s_n(f;x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx), is a trigonometric polynomial of order n.
2.5 Dirichlet’s theorem
The Fourier series of a piecewise smooth integrable function f converges at each point x to

\frac{f(x+) + f(x-)}{2}.
Hence, the Fourier series converges to f(x) at points of continuity and to the average of the
limiting values at a jump discontinuity.
2.6 Riemann–Lebesgue lemma
The Fourier coefficients an and bn of any integrable function tend to zero as n tends to infinity—that is,

\lim_{n \to \infty} \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos nx\,dx = 0

and

\lim_{n \to \infty} \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin nx\,dx = 0.
The Riemann–Lebesgue lemma can be used to validate asymptotic approximations of integrals. The method of steepest descent (in rigorous treatments) and the method of stationary phase are based on the Riemann–Lebesgue lemma.
2.7 Term-wise differentiation
Let f be a continuous, piecewise smooth 2π-periodic function on all of R with Fourier series

\frac{a_0}{2} + \sum_{n \in \mathbb{N}} a_n \cos(nt) + \sum_{n \in \mathbb{N}} b_n \sin(nt).
If f´ is piecewise smooth, then the series can be differentiated term by term to yield the
following point-wise convergent series at every point t:
\frac{f'(t+) + f'(t-)}{2} = \sum_{n \in \mathbb{N}} \left( n b_n \cos(nt) - n a_n \sin(nt) \right).
2.8 Convergence of Fourier series
Example 2.5: Let f(x) = x be the 2π-periodic function on [−π, π]. Its second partial sum is

s_2(f;x) = 2\sin x - \sin 2x.

Figure 2.3 The function f(x) = x and its Fourier series partial sum.
2.9 Small order
Function g(n) is of a smaller order than function h(n), or g(n) approaches 0 faster than h(n), if

\lim_{n \to \infty} \frac{g(n)}{h(n)} = 0,

and we write g = o(h).
2.10 Big “oh” for functions
If

\frac{f(t)}{g(t)} \text{ is bounded as } t \to a,

where a could be ±∞, then f = O(g), or f is at most of order g.
2.11 Fourier analysis and Fourier transform
1. Fourier analysis is the study of the Fourier transform, Fourier series, and related concepts.
2. The Fourier transform, Fourier series, and several related concepts are just special cases of constructions from representation theory (writing a conjugacy-invariant function on a group as a linear combination of characters).
3. Fourier analysis is not just a special case of representation theory—not even close.
4. These might at first sound contradictory, but they really are not. Of course there is
some subtlety in “related concepts,” but that is not really the fundamental problem.
Consider the following similar set of true statements:
a. Number theory is the study of the integers (and related concepts).
b. The integers are just a special case of some construction from category theory (the
initial object in some category of rings).
c. The integers are also just a special case of some construction from group theory
(the endomorphism algebra of the free Abelian group on one generator).
d. The integers are also a special case from set theory, model theory, etc.
e. But number theory is not a special case of any of those fields.
Even worse, number theory is not even the only field of mathematics devoted to studying
the integers—much of combinatorics is as well.
The fundamental problem is not that number theorists bring in additional concepts
like number fields, Galois groups, and modular forms. They do, but the issue arises even
when working with purely elementary statements like the four-square theorem. What
does this mean in the case of Fourier analysis? One question you might study in Fourier
analysis is whether the Fourier transform exists from one space of functions (maybe an
Lp space) to another. Now to define the Fourier transform on this space (i.e., to uniquely
characterize it) one just needs to know what it is on some dense set—the smooth com-
pactly supported functions probably work. The Fourier transform on smooth compactly
supported functions is not very hard to set up, and it is a special case of a construction
from representation theory, as well as being a special case of a construction from integral
calculus, and probably many other fields.
In some sense, because of this uniqueness property, everything about the Fourier
transform on R in all its incarnations is determined by just this restriction to smooth com-
pactly supported functions, which is almost a purely algebraic object as one needs very
little analysis to define it. Furthermore, the styles of argument and thought typical in representation theory are not so helpful for reasoning about Lp norms. All of these concepts fall under the umbrella of Fourier analysis. In the Fourier series approximation, periodic functions are represented as a sum of simple sine and cosine waves. The Fourier transform is an extension of the Fourier series; it is used when the period of the given function is lengthened and approaches infinity.
2.12 Fourier transform
The Fourier integral representation of f(x) has the form

f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(t) e^{is(t - x)}\,dt\,ds = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-isx} \left[ \int_{-\infty}^{\infty} f(t) e^{ist}\,dt \right] ds.

If

F(s) = \int_{-\infty}^{\infty} f(t) e^{ist}\,dt,

then

f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(s) e^{-isx}\,ds.
−∞
The function F(s) is called the Fourier transform of the function f(x), and the function f(x) itself is called the inverse Fourier transform of F(s). The Fourier transform obeys linearity, the similarity theorem (change of scale property), and shifting and modulation properties. The Fourier transform is applicable to boundary value problems occurring in the mathematical, physical, and engineering sciences, such as heat conduction, the vibration of strings, and so on. In two-dimensional problems it is sometimes necessary to apply the transform twice, and the required solution is obtained by double inversion.
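The transform pair above can be checked numerically; the following illustrative Python sketch (not part of the original text) uses the convention F(s) = ∫ f(t)e^{ist} dt on the Gaussian f(t) = exp(−t²/2), whose transform is known in closed form.

```python
import numpy as np

# Check F(s) = ∫ f(t) e^{ist} dt for the Gaussian f(t) = exp(-t^2/2),
# whose transform is F(s) = sqrt(2*pi) * exp(-s^2/2).
t = np.linspace(-20.0, 20.0, 400001)
ft = np.exp(-t ** 2 / 2)

def F(s):
    y = ft * np.exp(1j * s * t)
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(t))  # trapezoidal rule

print(F(0.0).real)  # ~ sqrt(2*pi) = 2.5066...
print(F(1.0).real)  # ~ sqrt(2*pi)*exp(-1/2) = 1.5203...
```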
2.13 Gibbs phenomenon
J. Willard Gibbs, an American physicist, studied the peculiar behavior of the Fourier series near a point of discontinuity of the developed function. The following are compared:
i. The function f(t)
ii. The sequence of partial sums sn(t)
iii. The Nörlund mean Np of the partial sums of a Fourier series
iv. The sequence of averages (i.e., the σ_n^1(t)-means or (C, 1) means)
Let the Nörlund mean of the partial sums of a Fourier series be denoted by Np. The behavior of the Nörlund mean is better than that of the sequence of partial sums sn(t). Similarly, the σ_n^1(t)-means or (C, 1) means also behave better than sn(t) for the following function:
f(t) = \begin{cases} -1, & -\pi \le t < 0, \\ \ \ \,1, & \phantom{-}0 \le t \le \pi, \end{cases} \qquad f(t + 2\pi) = f(t) \text{ for all real values of } t.

Its trigonometric Fourier series is

\frac{2}{\pi} \sum_{n=1}^{\infty} \frac{1 - (-1)^n}{n} \sin nt, \quad -\pi \le t \le \pi.
The nth Cesàro sum σ_n^1(t) for the trigonometric Fourier series is given by

\sigma_n^1(t) = \frac{2}{\pi} \sum_{k=1}^{n} \left( 1 - \frac{k}{n} \right) \frac{1 - (-1)^k}{k} \sin(kt),

where δ = 1.
For the trigonometric Fourier series, the Nörlund mean of the partial sums s_n(f; t) (with p_n = n + 1) is given by

t_n^N(f;t) = \frac{2}{(n + 1)(n + 2)} \sum_{k=0}^{n} (n - k + 1)\, s_k(f;t).
In the interval [−π, π], one can observe that sn(t) converges to f(t), but its rate of convergence is very slow. The rate of convergence of σ_n^1(t) and t_n^N(f; t) toward f(t) is higher than that of sn(t). Near the points of discontinuity (−π, 0, and π), as n increases, the peaks of s5 and s10 move closer to the line passing through the points of discontinuity (Gibbs phenomenon), but for n = 5, 10 the peaks of the graphs of σ_n^1(t) and t_n^N(f; t) become flatter. Hence, an overshoot or undershoot of the Fourier series, and of other series of eigenfunctions, at simple points of discontinuity is the Gibbs phenomenon; that is, near the point of discontinuity the rate of convergence of the trigonometric Fourier series is very slow. For the various summable means of the trigonometric Fourier series of the function f(t) (obtained by using various summability methods for approximation), the overshoot or undershoot is smoothed out very effectively. Hence, t_n^N(f; t) and σ_n^1(t) are better approximants than sn(t), and one can observe that the Np method is stronger than the (C, 1) method.
The graph in Figure 2.4 implies that except at the point t0 = 0 (the point of discontinuity of f(t)), the sequence converges to f(t). Gibbs focused on this point and on the behavior of the Fourier partial sums around it.
In the continuous regions (−π < t < 0 and 0 < t < π), the graph looks more and more like that of the original function as the number of Fourier coefficients increases. But the amplitude of the wiggles remains constant near the point of discontinuity (around the origin). Hence, the
Figure 2.4 For n = 5 and n = 10: f(t) (A), s_n(t) (B), σ_n^1(t) (C), and t_n^N(f; t) (D).
partial sums of the trigonometric Fourier series will not smoothly converge to the mean
value at the points of discontinuity.
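This behavior can be reproduced numerically; the following illustrative Python sketch (not part of the original text) compares a partial sum of the square-wave series with its (C, 1) mean near the jump at t = 0.

```python
import numpy as np

# Gibbs phenomenon for the square wave f(t) = -1 on [-pi, 0), +1 on [0, pi]:
# the partial sums overshoot near t = 0 by a fixed amount (~1.179), while
# the (C, 1) (Fejer) means stay below the jump value 1.
t = np.linspace(1e-4, 0.5, 5000)  # points just to the right of the jump

def s_n(n, t):
    """nth partial sum of (2/pi) * sum_k (1-(-1)^k)/k * sin(kt)."""
    k = np.arange(1, n + 1)
    coef = (2 / np.pi) * (1 - (-1.0) ** k) / k
    return np.sin(np.outer(t, k)) @ coef

n = 100
partial = s_n(n, t)
fejer = np.mean([s_n(m, t) for m in range(n + 1)], axis=0)  # (C, 1) mean
print(partial.max())  # ~1.179: the overshoot does not shrink with n
print(fejer.max())    # < 1: the Cesaro mean suppresses the overshoot
```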
2.15 Summability
In 1890, Cesàro dealt with the summation of certain divergent series and defined Cesàro summation (summability methods).
There exist three types of summability:
i. Ordinary summability
ii. Absolute summability
iii. Strong summability
2.15.1 Ordinary summability
Let T = (a_{n,k}) be an infinite matrix. Then

t_n = \sum_{k=0}^{\infty} a_{n,k}\, s_k, \quad n = 0, 1, 2, \ldots,

defines the matrix transform of the sequence \{s_n\}. Here the column vector of the t_n is the product of the matrix T with the column vector of the s_n. The sequence \{s_n\} or the series \sum u_n is said to be matrix summable to s if \lim_{n \to \infty} t_n = s.
2.15.2 Absolute summability
The series is absolutely summable to s if

\lim_{n \to \infty} t_n = s

and the sequence of means \{t_n\} is of bounded variation, i.e.,

\sum_{n=1}^{\infty} |t_n - t_{n-1}| < \infty.
Absolute summability of index q: The infinite series \sum_{n=0}^{\infty} a_n with the sequence of partial sums \{s_n\} is absolutely summable with index q to s if

t_n \to s \text{ as } n \to \infty \quad \text{and} \quad \sum_{k=1}^{\infty} k^{q-1} |t_k - t_{k-1}|^q < \infty.

It is denoted by |A|_q.

2.15.3 Strong summability
The series is strongly summable with index q to s if

\sum_{k=1}^{n} k^{q-1} |t_k - t_{k-1}|^q = O(n) \text{ as } n \to \infty, \quad \text{and} \quad t_n \to s \text{ as } n \to \infty.

It is denoted by [A]_q.
The following inclusion relations hold:

|A|_q \subset [A]_q \subset (A).
2.16 Methods for summability
These are some summability methods:
i. (C, 1) means, when a_{n,k} = \frac{1}{n+1}, \; 0 \le k \le n.
ii. Harmonic means, when a_{n,k} = \frac{1}{(n-k+1)\log n}, \; 0 \le k \le n.
iii. Cesàro (C, δ) means, when a_{n,k} = \frac{E_{n-k}^{\delta-1}}{E_n^{\delta}}, \; 0 \le k \le n.
iv. Nörlund means, when a_{n,k} = \frac{p_{n-k}}{P_n}, \; 0 \le k \le n.
v. Riesz means, when a_{n,k} = \frac{p_k}{P_n}, \; 0 \le k \le n.
vi. General Nörlund (N, p, q) means, when a_{n,k} = \frac{p_{n-k} q_k}{R_n}, where R_n = \sum_{k=0}^{n} p_k q_{n-k}.
vii. Deferred Cesàro means: Agnew defined the deferred Cesàro mean of the sequence x = (x_k) by

(D_{p,q})_n = \frac{1}{q(n) - p(n)} \sum_{k=p(n)+1}^{q(n)} x_k,

where p(n) < q(n) and \lim_{n \to \infty} q(n) = \infty.
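Two of the methods above can be demonstrated on a classical divergent series; an illustrative Python sketch (not part of the original text):

```python
import numpy as np

# The divergent series sum_{n>=0} (-1)^n has partial sums 1, 0, 1, 0, ...;
# the (C, 1) matrix a_{n,k} = 1/(n+1) sums it to 1/2, and the Norlund mean
# with p_n = n + 1 (so a_{n,k} = p_{n-k}/P_n) gives the same limit.
s = np.array([1.0 if k % 2 == 0 else 0.0 for k in range(200)])
n = len(s) - 1

c1 = s.mean()                           # (C, 1) transform t_n

p = np.arange(1.0, n + 2)               # p_0, ..., p_n = 1, 2, ..., n + 1
norlund = float(p[::-1] @ s / p.sum())  # a_{n,k} = p_{n-k} / P_n

print(c1, norlund)  # both ~ 0.5
```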
2.17 Regularity condition
The summability matrix T is regular if

\lim_{n \to \infty} s_n = s \;\Rightarrow\; \lim_{n \to \infty} t_n = s.

Toeplitz and Silverman (1913) obtained the following necessary and sufficient conditions for the regularity of the matrix T:

1. \sum_{k=0}^{n} |a_{n,k}| \le M, where M (a finite constant) is independent of n.
2. \lim_{n \to \infty} a_{n,k} = 0, \ \forall k.
3. \lim_{n \to \infty} \sum_{k=0}^{n} a_{n,k} = 1.
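These three conditions are easy to verify for a concrete matrix; an illustrative Python sketch (not part of the original text) checks them for the (C, 1) matrix:

```python
# Check of the Toeplitz-Silverman regularity conditions for the
# (C, 1) matrix a_{n,k} = 1/(n+1), 0 <= k <= n.
def a(n, k):
    return 1.0 / (n + 1) if 0 <= k <= n else 0.0

row_sums = [sum(abs(a(n, k)) for k in range(n + 1)) for n in range(200)]
print(max(row_sums))   # condition 1: bounded by M = 1
print(a(10 ** 6, 3))   # condition 2: each column entry tends to 0
print(row_sums[-1])    # condition 3: row sums tend to 1
```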
2.18 Norm
A function that assigns a strictly positive length or size to each nonzero vector in a vector space is known as a norm. A function p: V → R is a norm on V if it satisfies the following properties (∀a ∈ R and u, v ∈ V):

1. p(u + v) ≤ p(u) + p(v) (triangle inequality)
2. p(av) = |a| p(v)
3. If p(v) = 0, then v = 0.

In the analysis of the Fourier series, the importance of the Lp norm cannot be ignored, as it is an essential tool. The condition p → ∞ gives the essential upper bound of the Lp
norm, and the Lp behavior represents the Lipschitz behavior as p → ∞. Hence, by replacing the power function with more general classes of functions, the results on Fourier series can be generalized.
3. L2-norm (Euclidean norm): It is the square root of the sum of squared differences,

\|x_1 - x_2\|_2 = \sqrt{\sum_i (x_{1i} - x_{2i})^2},

and has wide applicability in the signal processing field for mean-squared error (MSE) measurement.
4. Lp-norm: It is given by

\|x_1 - x_2\|_p = \left( \sum_i |x_{1i} - x_{2i}|^p \right)^{1/p}, \quad 1 \le p < \infty.
2.19 Modulus of continuity
The modulus of continuity ω(f, δ) of a continuous function f on [a, b] is defined by

\omega(f, \delta) = \sup_{|y - x| \le \delta} \left\{ |f(x) - f(y)|,\; x, y \in [a, b] \right\}.

Let f ∈ Lp[a, b], p ≥ 1; then the function ωp(f, δ), called the integral modulus of continuity, is defined by

\omega_p(f, \delta) = \sup_{0 < t \le \delta} \left( \int_a^b |f(x + t) - f(x)|^p \, dx \right)^{1/p}.
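The first definition can be estimated by brute force on a grid; an illustrative Python sketch (not part of the original text), for f(x) = x² on [0, 1], where analytically ω(f, δ) = 2δ − δ²:

```python
import numpy as np

# Brute-force estimate of omega(f, delta) = sup_{|x-y|<=delta} |f(x)-f(y)|
# for f(x) = x**2 on [0, 1]; analytically omega(f, delta) = 2*delta - delta**2.
def modulus(f, a, b, delta, n=2001):
    x = np.linspace(a, b, n)
    fx = f(x)
    best = 0.0
    for i, xi in enumerate(x):
        mask = (x >= xi) & (x <= xi + delta)  # pairs with y >= x suffice
        best = max(best, float(np.max(np.abs(fx[mask] - fx[i]))))
    return best

print(modulus(lambda x: x ** 2, 0.0, 1.0, 0.1))  # ~ 2*0.1 - 0.01 = 0.19
```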
2.20 Lipschitz condition
Let f(x) be defined on an interval I and suppose we can find two positive constants M and α such that

|f(x_1) - f(x_2)| \le M |x_1 - x_2|^{\alpha}

for all x_1, x_2 \in I. Then f is said to satisfy a Lipschitz condition of order α.
www.Technicalbookspdf.com
Chapter two: Fourier series and its applications in engineering 51
2.21 Various Lipschitz classes
Lip (ξ(t), p): For a positive increasing function ξ(t) and an integer p ≥ 1, a signal f ∈ Lip (ξ(t), p) if

\left( \int_a^b |f(x + t) - f(x)|^p \, dx \right)^{1/p} = O(\xi(t)).

W(Lp, ξ(t)): For a given positive increasing function ξ(t), an integer p ≥ 1, and β ≥ 0, f belongs to the weighted class W(Lp, ξ(t)) if

\left( \int_a^b \left\{ |f(x + t) - f(x)| \sin^{\beta} x \right\}^p \, dx \right)^{1/p} = O(\xi(t)).
Note: If β = 0, W(Lp, ξ(t)) coincides with the class Lip (ξ(t), p); if ξ(t) = t^α, Lip (ξ(t), p) reduces to Lip (α, p); and if p → ∞, then Lip (α, p) reduces to Lip α.

The space of all functions f with \int_E |f(x)|^p \, dx < \infty for 1 ≤ p < ∞ is denoted by Lp or Lp(E). If E = [a, b] is an interval of finite length, then we write Lp[a, b]. The Lp(E)-space (p ≥ 1) is a Banach space under the norms defined by

\|f\|_p = \left( \int_a^b |f(x)|^p \, dx \right)^{1/p} \text{ for } 1 \le p < \infty, \quad \text{and} \quad \|f\|_{\infty} = \sup\left\{ |f(x)| : x \in [a, b] \right\}.
2.22 Degree of approximation
A major portion of the study of the theory of signals (functions) is concerned with the connections between the structural properties of a function and its degree of approximation. The objective is to relate the smoothness of the function (via trigonometric Fourier approximation) to the rate at which the degree of approximation decreases to zero. This chapter discusses the trigonometric approximation of a function (signal) and the concepts needed to find the degree of approximation using Fourier series. Trigonometric approximation is the most classical setting, where the results are the most penetrating and satisfying. One of the basic problems in the theory of Fourier series is to examine the degree of approximation obtained by certain methods. In this sense, one of the important results is the following. Quade [1] solved a problem related to approximation by trigonometric polynomials using Nörlund summability in the Lp norm.
Theorem 2.1 [1]: Let f ∈ Lip(α, p), 0 < α ≤ 1. Then
$$\| f - \sigma_n(f) \|_p = O(n^{-\alpha})$$
for either p > 1 and 0 < α ≤ 1, or p = 1 and 0 < α < 1. And if p = α = 1, then
$$\| f - \sigma_n(f) \|_1 = O(n^{-1} \log(n + 1)).$$
Chandra [2] improved the result of [1] and proved the following:
Theorem 2.2 [2]: Let f ∈ Lip(α, p) and let (pn) be positive such that
$$(n + 1)p_n = O(P_n).$$
If either … or … holds, then
$$\| f - N_n(f) \|_p = O(n^{-\alpha}).$$
These are very important basic results on the degree of approximation and a
motivation for researchers working in this area. Subsequently, several mathematicians
studied the degree of approximation of signals belonging to various classes by using
different summability techniques, for example, Chandra [3,4], Khan [5,6], Mursaleen and
Mohiuddine [7], Mishra and Mishra [8], Chen and Hong [9], Mishra et al. [10–12], Chen
and Jeng [13], Mishra [14,15], and Alexits [16]. Bor [17–21] gave a number of theorems
dealing with summability factors of series and provided many applications.
Recently, Sonker and Munjal [22–29] gave a number of theorems exploring the appli-
cations of summability and absolute summability of Fourier and infinite series.
Many engineering problems can be solved using summability methods. The (C, 1)
and (C, 2) means can be used to increase the rate of convergence and to suppress the Gibbs
phenomenon. Analysis of signals or time functions is of great importance for obtaining
information about a system or process. Psarakis and Moustakides [30]
presented a method for designing finite impulse response (FIR) digital filters.
This process transmits the vibrations of the music to the air and amplifies the vibrations
of the air. The human eardrum senses the air pressure fluctuations, and the human brain
converts them into electrical signals.
For the study of two different musical instruments, (a) the flute and (b) the violin, the graphs
are plotted in Figure 2.5. For the sustained note D (294 vibrations per second), the graphs
of the waveforms show the difference between the flute and the violin: the flute waveform is simpler
than that of the violin.
Fourier series approximation of this music is expressed as
$$P(t) = a_0 + a_1 \cos\frac{\pi t}{L} + b_1 \sin\frac{\pi t}{L} + a_2 \cos\frac{2\pi t}{L} + b_2 \sin\frac{2\pi t}{L} + \cdots$$
The sum of simple pure sounds is used for the expression of the Fourier coefficients, which
have different values corresponding to the different musical instruments.
The nth term,
$$a_n \cos\frac{n\pi t}{L} + b_n \sin\frac{n\pi t}{L},$$
is called the nth harmonic of P. Its amplitude is
$$A_n = \sqrt{a_n^2 + b_n^2},$$
and the energy of the nth harmonic is its square, $A_n^2 = a_n^2 + b_n^2$.
It can be observed again that the flute waveform is quite simple in comparison to the violin
waveform. In the violin, the higher harmonics are very strong, but the energy of the flute's
harmonics decreases very fast.
Hence, the trigonometric Fourier series is very useful in expressing the sounds of musi-
cal instruments: complex musical sounds can be made of a combination of various pure
sounds.
[Figure: harmonic energies $A_n^2$ plotted against n = 0, 2, 4, 6, 8, 10 for (a) the flute and (b) the violin.]
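The coefficients a_n, b_n, and hence the energies A_n^2, can be estimated from samples of a recorded waveform. The sketch below assumes midpoint samples over one period [−L, L] and uses a helper name of ours, `harmonic_energies`; the test signal is a pure 3rd harmonic, so all the energy should land at n = 3:

```python
import math

def harmonic_energies(samples, L, n_max):
    """Estimate a_n, b_n of P(t) on [-L, L] with a midpoint rule and return
    the harmonic energies A_n^2 = a_n^2 + b_n^2 for n = 1..n_max."""
    N = len(samples)
    dt = 2 * L / N
    ts = [-L + (i + 0.5) * dt for i in range(N)]
    energies = []
    for n in range(1, n_max + 1):
        a_n = sum(p * math.cos(n * math.pi * t / L) for t, p in zip(ts, samples)) * dt / L
        b_n = sum(p * math.sin(n * math.pi * t / L) for t, p in zip(ts, samples)) * dt / L
        energies.append(a_n ** 2 + b_n ** 2)
    return energies

# A pure tone at the 3rd harmonic: all the energy sits at n = 3.
L, N = 1.0, 1000
ts = [-L + (i + 0.5) * (2 * L / N) for i in range(N)]
tone = [math.sin(3 * math.pi * t / L) for t in ts]
print([round(e, 3) for e in harmonic_energies(tone, L, 5)])  # [0.0, 0.0, 1.0, 0.0, 0.0]
```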
Summability techniques are employed to minimize the error. With the use of a summa-
bility technique, the output of the signals (found by Fourier approximation) can be
made stable and bounded, and can be used to predict the behavior of the input data, the initial
situation, and the changes in the complete process.
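The stabilizing effect of a summability method on a Fourier approximation can be illustrated with the (C, 1) (Fejér) means of a square wave; since the Fejér kernel is nonnegative, the means never overshoot the jump, while the raw partial sums exhibit the Gibbs overshoot:

```python
import math

def square_partial(x, n):
    """Partial sum s_n of the Fourier series of the square wave
    f(x) = 1 on (0, pi), -1 on (-pi, 0): only odd harmonics appear."""
    return sum(4 / (math.pi * k) * math.sin(k * x) for k in range(1, n + 1, 2))

def fejer_mean(x, n):
    """(C, 1) mean: the average of the partial sums s_0, ..., s_(n-1)."""
    return sum(square_partial(x, m) for m in range(n)) / n

xs = [i * math.pi / 400 for i in range(1, 400)]        # sample points in (0, pi)
overshoot_sn = max(square_partial(x, 49) for x in xs)  # Gibbs overshoot, about 1.18
overshoot_cn = max(fejer_mean(x, 49) for x in xs)      # stays below the jump height 1
print(overshoot_sn > 1.08, overshoot_cn < 1.0)  # True True
```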
References
1. E. S. Quade, “Trigonometric approximation in mean,” Duke Mathematical Journal, vol. 3,
pp. 529–542, 1937.
2. P. Chandra, “Trigonometric approximation of functions in Lp-norm,” Journal of Mathematical
Analysis and Applications, vol. 275, pp. 13–26, 2002.
28. S. Sonker and A. Munjal, “Absolute Nörlund summability |N; pn|k of improper integrals,”
National Conference on Recent Advances in Mechanical Engineering (NCRAME-2017), vol. II, no. 90,
pp. 413–415, ISBN: 978-93-86256-89-8, 2017.
29. S. Sonker, Xh. Z. Krasniqi, and A. Munjal, “A note on absolute Cesàro ϕ − C, 1; δ ; l k summabil-
ity factor,” International Journal of Analysis and Applications, vol. 15, no. 1, pp. 108–113, 2017.
30. E. Z. Psarakis and G. V. Moustakides, “An L2-based method for the design of 1-D zero phase
FIR digital filters,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications,
vol. 44, no. 7, pp. 591–601, 1997.
31. M. I. Gil’, “Estimates for entries of matrix valued functions of infinite matrices,” Mathematical
Physics Analysis and Geometry, vol. 11, no. 2, pp. 175–186, 2008.
chapter three
Soft computing techniques and applications
Contents
3.1 Introduction: Soft computing.............................................................................................. 58
3.2 Fuzzy logic............................................................................................................................. 58
3.2.1 Evolution of fuzzy logic........................................................................................... 59
3.3 Fuzzy sets............................................................................................................................... 59
3.3.1 Equal fuzzy sets........................................................................................................ 59
3.3.2 Membership function............................................................................................... 60
3.3.2.1 Z-Shaped membership function.............................................................. 60
3.3.2.2 Triangular membership function............................................................ 60
3.3.2.3 Trapezoidal membership function.......................................................... 60
3.3.2.4 Gaussian membership function............................................................... 60
3.4 Fuzzy rule base system........................................................................................................ 61
3.5 Fuzzy defuzzification........................................................................................................... 61
3.5.1 Center of area (CoA) method.................................................................................. 61
3.5.2 Max-membership function...................................................................................... 61
3.5.3 Weighted average method....................................................................................... 61
3.5.4 Mean-max method................................................................................................... 62
3.5.5 Center of sums........................................................................................................... 62
3.6 Comparison of crisp to fuzzy............................................................................................. 62
3.7 Examples of uses of fuzzy logic..........................................................................................63
3.8 Artificial neural networks...................................................................................................63
3.8.1 Artificial neurons......................................................................................................64
3.8.2 Firing rule..................................................................................................................64
3.8.3 Different types of neural networks........................................................................64
3.8.3.1 Feedback ANN...........................................................................................65
3.8.3.2 Feed-forward ANN...................................................................................65
3.8.3.3 Classification-prediction ANN................................................................65
3.9 Training of neural networks...............................................................................................65
3.9.1 Supervised training..................................................................................................65
3.9.2 Unsupervised training.............................................................................................65
3.9.3 Reinforced training.................................................................................................. 66
3.2 Fuzzy logic
Life is full of uncertainties; information can be vague or imprecise. To deal with such uncertainties,
probability theory, which is based on classical set theory, used to be the mathematician's tool.
In 1965, Zadeh [1] argued that there are some uncertainties which are out of
the scope of probability theory. For example, a company owner needs an honest person
for his company. The available choices may be extremely honest, very hon-
est, honest some of the time, or dishonest, and these cannot be defined using classical logic,
because in this logic there are only two choices: honest and dishonest. Zadeh named this
new concept, based on membership functions, fuzzy set theory. Classical set theory is
about yes-or-no concepts, whereas fuzzy set theory includes the gray part also. Fuzzy set the-
ory deals with approximate reasoning in linguistic terms. Fuzzy logic is the logic that deals mathematically
with the imprecise information usually employed by humans; it is a multivalued
logic that extends the Boolean logic usually employed in computer science. Fuzzy logic is based
on the concept of a logic with many degrees that allows intermediate values to be
defined between conventional opposite evaluations like true or false, high or low, hot or
cold, etc. The fuzzy concept, introduced by Zadeh in 1965, models uncertainty so as to generate
decisions by human reasoning [1–3]. Fuzzy logic is a variety of multivalued logic which is
derived from fuzzy set theory. This logic deals with reasoning that is approximate rather
than exact; a fuzzy logic system works with vague concepts as well. In fuzzy logic,
Chapter three: Soft computing techniques and applications 59
“I think that… ”
“chances are… ”
“it is unlikely that… ”
and so forth.
The fuzzy expression contains a fuzzy proposition with its truth value in the interval [0,1].
It represents a mapping from [0,1] to [0,1].
3.3 Fuzzy sets
A fuzzy set is represented by pairs of two components: the first is the member m and the
second is its membership grade µF(m), which maps any element m of the universe of discourse
M to the membership space [0,1], as given below:
$$\mu_F : M \to [0, 1].$$
3.3.2 Membership function
A function that describes the membership grades of elements in a fuzzy set is said to be
a membership function. A membership function can be discrete or continuous. It needs a
uniform membership function representation for efficiency. Some well-known member-
ship functions are discussed in the following sections.
3.3.2.1 Z-shaped membership function
$$Z(x; p, q) = \begin{cases} 1, & \text{if } x \le p \\ 1 - 2\left(\dfrac{x - p}{q - p}\right)^2, & \text{if } p < x \le (p + q)/2 \\ 2\left(\dfrac{x - q}{q - p}\right)^2, & \text{if } (p + q)/2 < x \le q \\ 0, & \text{otherwise} \end{cases} \quad (3.2)$$

3.3.2.2 Triangular membership function
$$T(x; p, q, r) = \begin{cases} \dfrac{x - p}{q - p}, & \text{if } p < x \le q \\ \dfrac{r - x}{r - q}, & \text{if } q < x \le r \\ 0, & \text{otherwise} \end{cases} \quad (3.3)$$

3.3.2.3 Trapezoidal membership function
$$T(x; p, q, r, s) = \begin{cases} \dfrac{x - p}{q - p}, & \text{if } p < x \le q \\ 1, & \text{if } q < x \le r \\ \dfrac{s - x}{s - r}, & \text{if } r < x \le s \\ 0, & \text{otherwise} \end{cases} \quad (3.4)$$

3.3.2.4 Gaussian membership function
$$G(x; \sigma, m) = e^{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2}. \quad (3.5)$$
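The four membership functions can be sketched directly in Python (parameter names follow equations (3.2)–(3.5); the descending half of the Z-shape is taken in its standard form with (x − q)):

```python
import math

def z_shaped(x, p, q):
    """Z-shaped membership (3.2): 1 up to p, smooth descent to 0 at q."""
    if x <= p:
        return 1.0
    if x <= (p + q) / 2:
        return 1.0 - 2.0 * ((x - p) / (q - p)) ** 2
    if x <= q:
        return 2.0 * ((x - q) / (q - p)) ** 2
    return 0.0

def triangular(x, p, q, r):
    """Triangular membership (3.3), peaking at q."""
    if p < x <= q:
        return (x - p) / (q - p)
    if q < x <= r:
        return (r - x) / (r - q)
    return 0.0

def trapezoidal(x, p, q, r, s):
    """Trapezoidal membership (3.4): flat top between q and r."""
    if p < x <= q:
        return (x - p) / (q - p)
    if q < x <= r:
        return 1.0
    if r < x <= s:
        return (s - x) / (s - r)
    return 0.0

def gaussian(x, sigma, m):
    """Gaussian membership (3.5), centered at m with width sigma."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2)

print(triangular(0.5, 0.0, 1.0, 2.0))        # 0.5
print(trapezoidal(1.5, 0.0, 1.0, 2.0, 3.0))  # 1.0
print(z_shaped(0.5, 0.0, 1.0))               # 0.5
print(gaussian(0.0, 1.0, 0.0))               # 1.0
```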
3.5 Fuzzy defuzzification
It is quite difficult to make a decision on the basis of a fuzzy output; in that case the fuzzy
output is converted into a crisp value. This process of converting fuzzy output into crisp
output is known as defuzzification [10]. Different methods are available in the literature;
some widely used methods are discussed in the following sections.
3.5.1 Center of area (CoA) method
$$m^* = \frac{\int \mu(m)\, m \, dm}{\int \mu(m)\, dm} \ \text{(continuous membership values)}, \qquad m^* = \frac{\sum \mu(m)\, m}{\sum \mu(m)} \ \text{(discrete membership values)}. \quad (3.6)$$
3.5.2 Max-membership function
This method is also called the height method and is applicable to peaked output functions.
The defuzzified value m* is taken at a point of maximum membership,
$$\mu(m^*) \ge \mu(m) \quad \text{for all } m. \quad (3.7)$$

3.5.3 Weighted average method
This method is applicable to symmetric output membership functions. Its expression is given by
$$m^* = \frac{\sum \mu(m)\, m}{\sum \mu(m)}, \quad (3.8)$$
where m is the centroid of each symmetric membership function. This method is compu-
tationally efficient but less popular.
3.5.4 Mean-max method
This method is similar to the max-membership method; the only difference is that there
can be more than one location of maximum membership. The expression is given by
$$m^* = \frac{m_1 + m_2}{2}, \quad (3.9)$$
where m1 and m2 are the endpoints of the interval of maximum membership.
3.5.5 Center of sums
This method is based on the algebraic sum of the fuzzy subsets and is very fast in
terms of calculation. The defuzzified value is given by
$$m^* = \frac{\displaystyle\sum_{i=1}^{N} m_i \sum_{k=1}^{n} \mu(m_i)}{\displaystyle\sum_{i=1}^{N} \sum_{k=1}^{n} \mu(m_i)}. \quad (3.10)$$
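Two of the defuzzification rules above, the discrete center-of-area rule (3.6) and the mean-max rule (3.9), can be sketched as follows (the function names are ours):

```python
def centroid_defuzzify(values, memberships):
    """Discrete center-of-area rule (3.6): m* = sum(mu*m) / sum(mu)."""
    return (sum(mu * m for m, mu in zip(values, memberships))
            / sum(memberships))

def mean_max_defuzzify(values, memberships):
    """Mean-max rule (3.9): midpoint of the locations of maximum membership."""
    top = max(memberships)
    maximizers = [m for m, mu in zip(values, memberships) if mu == top]
    return (min(maximizers) + max(maximizers)) / 2

vals = [0.0, 1.0, 2.0, 3.0, 4.0]
mus = [0.1, 0.4, 0.9, 0.9, 0.2]
print(round(centroid_defuzzify(vals, mus), 3))  # 2.28
print(mean_max_defuzzify(vals, mus))            # 2.5
```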
3.7 Examples of uses of fuzzy logic
• Foam detection
• Imbalance compensation
• Water level adjustment
• Washing machine
• Food cookers
• Taking blood pressure
• Determination of “socioeconomic class”
• Cars
[Figure: supervised learning scheme — the neural network, including the connections between neurons (called weights), maps the input to an output; the output is compared with the target, and the weights are adjusted.]
3.8.1 Artificial neurons
McCulloch and Pitts gave the concept of the simplest neuron, also known as a threshold
logic unit (TLU). In this model the inputs and outputs are taken as binary values [11]. An input
gets activated with the help of other neurons; the synaptic weights are then summed and
compared with the threshold value. If this value is more than the threshold, the particu-
lar neuron gets fired (i.e., it is activated). If it is less than the threshold, the neuron will not
be activated.
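A minimal sketch of such a TLU (here a tie with the threshold is counted as firing, one common convention):

```python
def tlu(inputs, weights, threshold):
    """McCulloch-Pitts threshold logic unit: the weighted sum of binary
    inputs is compared with the threshold (a tie counts as firing here)."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the TLU realizes a two-input AND gate.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tlu((a, b), (1, 1), threshold=2))
```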
3.8.2 Firing rule
One of the main concepts of ANN is the firing rule. A firing rule decides whether the neu-
ron will get activated or not with any input pattern. Hamming distance technique is one
of the basic and simple firing techniques. This technique is widely used due to its simple
calculations. There are two types of training patterns for a node: the first, which are responsible
for the firing of a neuron, are called the 1-taught set of inputs; the second, which oppose
firing, are called the 0-taught set of inputs.
Let K = (k1, k2, …, kn) and L = (l1, l2, …, ln); then the Hamming distance is given by
$$H = \sum_i |k_i - l_i|.$$
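A sketch of the Hamming-distance firing rule under the stated convention (fire when the input is strictly closer to the 1-taught set; a tie is left undecided here — the taught sets below are hypothetical):

```python
def hamming(k, l):
    """H = sum_i |k_i - l_i| for two binary patterns of equal length."""
    return sum(abs(a - b) for a, b in zip(k, l))

def fires(pattern, taught_1, taught_0):
    """Hamming-distance firing rule: fire when the nearest 1-taught pattern
    is strictly closer than the nearest 0-taught one (None on a tie)."""
    d1 = min(hamming(pattern, t) for t in taught_1)
    d0 = min(hamming(pattern, t) for t in taught_0)
    if d1 == d0:
        return None
    return 1 if d1 < d0 else 0

taught_1 = [(1, 1, 1), (1, 0, 1)]  # inputs that should fire the neuron
taught_0 = [(0, 0, 0), (0, 0, 1)]  # inputs that should keep it silent
print(fires((1, 1, 0), taught_1, taught_0))  # 1: closer to the 1-taught set
print(fires((0, 0, 1), taught_1, taught_0))  # 0: it is itself a 0-taught input
```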
3.8.3 Different types of neural networks
3.8.3.1 Feedback ANN
In a feedback network, information can also flow backward through the network. Such
networks are applicable to error correction within the system.
3.8.3.2 Feed-forward ANN
It is a neural network containing
i. An input layer
ii. An output layer
iii. One or more hidden layers of neurons
3.8.3.3 Classification-prediction ANN
Classification-prediction ANN identifies particular patterns and assigns them to spe-
cific groups.
3.9 Training of neural networks
3.9.1 Supervised training
Both inputs and outputs are given in supervised training. The inputs are processed through
the network and the outcome is compared with the desired output; the difference between the
actual and desired outputs is then reduced by adjusting the weights. This process is repeated
until the error is optimized. The data used in the adjustment of weights are
called training data. This learning is similar to classroom learning, where a teacher is always
there to correct the mistakes of students; thus it is sometimes referred to as supervised learn-
ing. If the input data lack some precise knowledge, then training may not be possible for
the particular network. For good learning an appropriately sized data set is required;
otherwise the network may not converge. Under standard conditions we divide the data set
into three parts: one for training, one for testing, and the last part for validation. For a
successful network it is necessary to examine all the basic parts again and again (e.g., the num-
ber of hidden layers, connection weights, etc.). The adjustment by feedback is most often done
by the popular backpropagation technique. Finally, when the network is trained appropriately,
the final weights can be frozen.
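The training loop described above can be sketched with a single perceptron learning the OR function; the delta-rule update shown is one simple instance of adjusting the weights by the error, not the backpropagation algorithm itself:

```python
def train_perceptron(samples, lr=0.1, epochs=50):
    """Supervised-training sketch: present each input, compare the network
    output with the desired target, and adjust the weights by the error."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            err = target - out          # difference of actual vs. desired output
            w[0] += lr * err * x1       # weight adjustment (delta rule)
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Training data: the logical OR function with its desired outputs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
print([predict(x1, x2) for (x1, x2), _ in data])  # [0, 1, 1, 1]
```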
3.9.2 Unsupervised training
In this type of learning only the inputs are given to the network, not the outputs. The
network selects features by grouping the input data; hence, this learn-
ing algorithm is also known as adaptation. In real life there are several situations where exact
training data are not available (e.g., when an army faces new
weapons). Teuvo Kohonen designed a self-organizing neural network that can learn
without a desired output [21]. As this learning algorithm resembles a class of students
where the students learn by themselves, without the help of a teacher, it is sometimes
referred to as learning without a teacher.
3.9.3 Reinforced training
This type of learning is supervised learning with a condition: the teacher works as a guide
who tells whether the output is correct or not, but does not give the actual output
to the network. This learning algorithm is not popular among researchers.
3.11 Genetic algorithms
Optimization is a process of finding the best among the available solutions. Optimization
is needed everywhere, even in our daily routine, when we decide our to-do list or pri-
oritize our tasks for the day [30]. There are many traditional methods available in the
literature to solve optimization problems. These traditional methods have the follow-
ing drawbacks:
The above drawbacks of traditional methods motivate the search for a robust and effi-
cient method. Biological systems are flexible, robust, efficient, and self-guided. Genetic
algorithms are inspired by Darwin's theory of evolution: "survival of the fittest." The
method was developed by John Holland (1975) [30–32].
[Figure: flowchart of the genetic algorithm — Start, Reproduction, Mutation, Gen = Gen + 1, …, End.]
Step 2—Fitness evaluation: To calculate the fitness value of each solution, decoding
is done to get the real value of the solution. This value of the variable is then substituted into
the given objective function to compare the fitness of solutions.
Step 3—Reproduction: Good strings (“fittest”) in a population are selected and
assigned a large number of copies to form a mating pool.
Step 4—Crossover: In this step parents exchange properties.
Step 5—Mutation: The concept of biological mutation is also preserved here. A sud-
den change in the population is made to take the solution out of local optima.
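Steps 2–5 can be sketched as a minimal GA for the illustrative OneMax problem (maximize the number of 1-bits); the binary encoding, tournament reproduction, and parameter values below are our illustrative choices, not the chapter's:

```python
import random

def genetic_algorithm(fitness, n_bits=10, pop_size=20, generations=40,
                      p_cross=0.9, p_mut=0.02, seed=1):
    """Minimal GA following the steps above: binary strings, tournament
    reproduction, one-point crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Reproduction: the fitter of two random strings joins the mating pool.
        pool = [max(rng.choice(pop), rng.choice(pop), key=fitness)
                for _ in range(pop_size)]
        nxt = []
        for i in range(0, pop_size, 2):
            p1, p2 = pool[i][:], pool[i + 1][:]
            if rng.random() < p_cross:          # Crossover: parents swap tails
                cut = rng.randrange(1, n_bits)
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            for child in (p1, p2):              # Mutation: rare bit flips
                for j in range(n_bits):
                    if rng.random() < p_mut:
                        child[j] = 1 - child[j]
                nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)   # keep the best string found
    return best

# OneMax: fitness is the number of 1-bits; the optimum is the all-ones string.
best = genetic_algorithm(fitness=sum)
print(sum(best))  # close to (typically equal to) n_bits = 10
```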
References
1. L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.
2. L. A. Zadeh, “Outline of a new approach to the analysis of complex systems and decision pro-
cesses,” IEEE Trans. Syst. Man Cybern., vol. 3, no. 1, pp. 28–44, 1973.
3. M. Gr. Voskoglou, “Measuring the uncertainty of human reasoning,” Am. J. Appl. Math. Stat.,
vol. 2, no. 1, pp. 1–6, 2013.
4. C. Lejewski, “Jan Lukasiewicz,” Encycl. Philos., vol. 5, pp. 104–107, 1967.
5. D. C. S. Bisht, M. Raju, and M. Joshi, “Simulation of water table elevation fluctuation using
fuzzy-logic and ANFIS,” Comput. Model. New Tech., vol. 13, no. 2, pp. 16–23, 2009.
6. A. Gupta, and N. Singhal, “Advice generation using fuzzy logic in OMR Pheonix technique,”
Int. J. Comput. Appl., vol. 52, no. 16, pp. 6–10, 2012.
7. E. Egrioglu, U. Yolcu, C. H. Aladag, and C. Kocak, “An ARMA type fuzzy time series forecast-
ing method based on particle swarm optimization,” Math. Probl. Eng., vol. 2013, pp. 1–12, 2013.
8. A. Reigber, My life with Kostas. Unpublished report, Neverending Story Press, 1999.
9. J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice
Hall PTR, Upper Saddle River, NJ, 2001.
10. R. R. Yager, and D. P. Filev, Essentials of Fuzzy Modeling and Control. John Wiley & Sons,
New York, 388 pp, 1994.
11. W. S. McCulloch, and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,”
Bull. Math. Biophys., vol. 5, no. 4, pp. 115–133, 1943.
12. D. O. Hebb, “Organization of behavior. New York: Wiley, 1949, pp. 335,” J. Clin. Psychol., vol. 6,
no. 3, pp. 307–307, 1950.
13. M. Dougherty, “A review of neural networks applied to transport,” Transp. Res. Part C Emerg.
Technol., vol. 3, no. 4, pp. 247–260, 1995.
14. A. L. Glass, and K. J. Holyoak, “Alternative conceptions of semantic theory,” Cognition, vol. 3,
no. 4, pp. 313–339, 1974.
15. J. A. Anderson, “A simple neural network generating an interactive memory,” Math. Biosci., vol.
14, no. 3–4, pp. 197–220, 1972.
16. L. Glass, and R. E. Young, “Structure and dynamics of neural network oscillators,” Brain Res.,
vol. 179, no. 2, pp. 207–218, 1979.
17. K. Fukushima, “Cognitron: A self-organizing multilayered neural network,” Biol. Cybern., vol.
20, no. 3–4, pp. 121–136, 1975.
18. I. Jung, L. Koo, and G.-N. Wang, “Two states mapping based neural network model for decreas-
ing of prediction residual error,” Int. J. Ind. Manuf. Eng., vol. 1, no. 7, pp. 322–328, 2007.
19. K. L. Priddy, and P. E. Keller, Artificial Neural Networks: An Introduction, vol. 68. SPIE Press,
Bellingham, WA, 2005.
20. S. Sapna, A. Tamilarasi, and M. P. Kumar, “Backpropagation learning algorithm based on
Levenberg Marquardt Algorithm,” Comp. Sci. Inf. Technol. CS IT, vol. 2, pp. 393–398, 2012.
21. T. Kohonen, ed., Self-Organizing Maps. Springer-Verlag, New York, Secaucus, NJ, 1997.
22. B. Dixon, “Applicability of neuro-fuzzy techniques in predicting ground-water vulnerability:
A GIS-based sensitivity analysis,” J. Hydrol., vol. 309, no. 1, pp. 17–38, 2005.
23. U. Nauck, and R. Kruse, “Design and implementation of a neuro-fuzzy data analysis tool in
Java,” Manual Technical University of Braunschweig Germany, 1999.
24. E. Khan, “Neural fuzzy based intelligent systems and applications,” in L. C. Jain and N. M.
Martin, eds., Fusion of Neural Networks, Fuzzy Sets and Genetic Algorithms: Industrial Applications. CRC
Press, Washington, DC, 1999.
25. J.-S. Jang, “ANFIS: Adaptive-network-based fuzzy inference system,” IEEE Trans. Syst. Man
Cybern., vol. 23, no. 3, pp. 665–685, 1993.
26. J. M. Keller, R. Krishnapuram, and F.-H. Rhee, “Evidence aggregation networks for fuzzy logic
inference,” IEEE Trans. Neural Netw., vol. 3, no. 5, pp. 761–769, 1992.
27. I. N. Aghdam, B. Pradhan, and M. Panahi, “Landslide susceptibility assessment using a novel
hybrid model of statistical bivariate methods (FR and WOE) and adaptive neuro-fuzzy infer-
ence system (ANFIS) at southern Zagros Mountains in Iran,” Environ. Earth Sci., vol. 76, no. 6,
p. 237, 2017.
28. A. M. Ahmed, and S. M. A. Shah, “Application of adaptive neuro-fuzzy inference system
(ANFIS) to estimate the biochemical oxygen demand (BOD) of Surma River,” J. King Saud
Univ.-Eng. Sci., vol. 29, no. 3, pp. 237–243, 2017.
29. A. Karkevandi-Talkhooncheh, S. Hajirezaie, A. Hemmati-Sarapardeh, M. M. Husein, K. Karan,
and M. Sharifi, “Application of adaptive neuro fuzzy interface system optimized with evolu-
tionary algorithms for modeling CO2–crude oil minimum miscibility pressure,” Fuel, vol. 205,
pp. 34–45, 2017.
30. A. Boultif, A. Kabouche, and S. Ladjel, “Application of genetic algorithms (GA) and thresh-
old acceptance (TA) to a ternary liquid–liquid equilibrium system,” Int. Rev. Model. Simul.
IREMOS, vol. 9, no. 1, pp. 29–36, 2016.
31. I. Cruz-Vega, C. A. R. García, P. G. Gil, J. M. R. Cortés, and J. de J. R. Magdaleno, “Genetic algo-
rithms based on a granular surrogate model and fuzzy aptitude functions,” in Evolutionary
Computation (CEC), 2016 IEEE Congress on, 2016, pp. 2122–2128.
32. R. L. Haupt, and S. E. Haupt, Practical Genetic Algorithms. John Wiley & Sons, New York, 2004.
33. D. C. S. Bisht, P. K. Srivastava, and M. Ram, “Role of fuzzy logic in flexible manufacturing
system,” Diagnostic Techniques in Industrial Engineering. Springer, Cham, pp. 233–243, 2018.
34. N. Mathur, P. K. Srivastava, and A. Paul, “Algorithms for solving fuzzy transportation
problem,” Int. J. Math. Oper. Res., vol. 12, no. 2, pp. 190–219, 2018.
chapter four
New approach for solving multi-objective transportation problem
Contents
4.1 Introduction........................................................................................................................... 71
4.2 Preliminaries......................................................................................................................... 73
4.2.1 Concepts of solution................................................................................................. 74
4.3 Mathematical model.............................................................................................................77
4.4 Solution procedure............................................................................................................... 78
4.4.1 Fuzzy programming................................................................................................ 78
4.4.2 Goal programming................................................................................................... 79
4.4.3 Revised multi-choice programming......................................................................80
4.4.4 Vogel approximation method.................................................................................80
4.4.5 Merits and demerits.................................................................................................. 81
4.5 Numerical example..............................................................................................................83
4.5.1 Fuzzy programming................................................................................................84
4.5.2 Goal programming................................................................................................... 85
4.5.3 Revised multi-choice goal programming............................................................. 86
4.5.4 Vogel approximation method................................................................................. 86
4.6 Comparison........................................................................................................................... 87
4.7 Conclusion and future study.............................................................................................. 88
Acknowledgment........................................................................................................................... 88
References........................................................................................................................................ 89
4.1 Introduction
Operations research (OR) is a discipline that encompasses a wide range of methods for solv-
ing real-life decision-making problems. Mathematical methods are applied in pursuit
of improved decision-making and efficiency in the areas of mathematical optimization,
econometric methods, simulation, neural networks, decision analysis, and the analytic
hierarchy process. The study of OR arose during World War II. During this time, OR was
considered a scientific way of providing the respective departments with a quantitative
basis for making decisions corresponding to the operations of the entire system. The
term "optimization" is the root of the study of OR. Optimization is used in different
areas of study, like mathematical optimization, engineering optimization, economics and
business, information technology, etc.
In an optimization problem (OP), we basically treat the objective function, either
maximization or minimization, with or without some prescribed set of constraints.
[Figure: a transportation network with origins O1, O2, O3 and destinations D1, D2, D3, D4; each route from Oi to Dj carries a unit transportation cost Cij.]
Chapter four: New approach for solving multi-objective transportation problem 73
Several approaches are available for solving a MOTP, such as fuzzy programming,
goal programming, revised multi-choice goal programming, etc. Zimmermann [22] intro-
duced the concept of solving a MOO problem using fuzzy programming. Basically, in a
fuzzy programming approach, a MOTP is converted to a single-objective optimization
problem and then its solution is treated as a compromise solution.
Goal programming (GP), an analytical tool, was introduced to address decision-
making problems involving objective functions that are conflicting and noncommensurable
with each other, with targets assigned as goals to the objective functions. The DM
is interested in maximizing the aspiration level of the corresponding goals. The concept of
GP was introduced by Charnes and Cooper [14] and further developed by researchers such
as Hannan [23], Ignizio [24], Tamiz et al. [25], Romero [26], Liao [27], Tabrizi et al. [28], and
many others. However, ambiguity in resources and incomplete information make it
almost impossible for the DM to set specific aspiration levels (goals) and select a better decision.
To tackle this situation, Chang [29] presented the multi-choice goal programming
(MCGP) approach to solve the multi-objective decision-making (MODM) problem. Later,
Chang [30] proposed RMCGP, the revised form of MCGP, for
solving the MODM.
In this chapter, we introduce a new approach for solving the MOTP. Especially, we
intend to solve the MOTP using the Vogel approximation method (VAM). The usefulness
of the algorithm is tested through a numerical example.
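For reference, a minimal sketch of VAM for a single-objective balanced transportation problem (the cost matrix below is a classic textbook instance, not this chapter's numerical example; tie-breaking choices are ours):

```python
def vogel(costs, supply, demand):
    """Vogel approximation sketch for a balanced transportation problem:
    repeatedly allocate in the row or column with the largest penalty
    (the difference between its two smallest remaining costs)."""
    supply, demand = supply[:], demand[:]
    m, n = len(supply), len(demand)
    alloc = [[0] * n for _ in range(m)]
    rows, cols = set(range(m)), set(range(n))

    def penalty(line):
        line = sorted(line)
        return line[1] - line[0] if len(line) > 1 else line[0]

    while rows and cols:
        cand = ([(penalty([costs[i][j] for j in cols]), 'r', i) for i in rows]
                + [(penalty([costs[i][j] for i in rows]), 'c', j) for j in cols])
        _, kind, k = max(cand)
        if kind == 'r':                          # cheapest cell in that row
            i, j = k, min(cols, key=lambda j: costs[k][j])
        else:                                    # cheapest cell in that column
            i, j = min(rows, key=lambda i: costs[i][k]), k
        q = min(supply[i], demand[j])            # allocate as much as possible
        alloc[i][j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:
            rows.discard(i)
        if demand[j] == 0:
            cols.discard(j)
    return alloc

# A classic single-objective instance.
costs = [[19, 30, 50, 10], [70, 30, 40, 60], [40, 8, 70, 20]]
supply, demand = [7, 9, 18], [5, 8, 7, 14]
alloc = vogel(costs, supply, demand)
total = sum(costs[i][j] * alloc[i][j] for i in range(3) for j in range(4))
print(total)  # 779 with this tie-breaking
```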
The rest of this chapter is organized in the following way: Section 4.2 describes the
preliminaries of the proposed chapter. Section 4.3 contains the mathematical model
of TP and MOTP. The solution procedure is presented in Section 4.4 which contains
five subsections. Fuzzy programming, goal programming, and revised multi-choice
goal programming are briefly presented in Sections 4.4.1–4.4.3, respectively. An algo-
rithm for solving the proposed MOTP by VAM is introduced in Section 4.4.4. Section
4.4.5 contains the merits and demerits of the proposed approaches for solving MOTP.
A numerical example is taken into consideration to justify our study; and comparison
among the obtained solutions from the approaches is carried out in Sections 4.5 and
4.6, respectively. The chapter ends with the conclusion and an outlook of the study in
Section 4.7.
4.2 Preliminaries
In an optimization problem, there are mainly two perspectives, namely, formulation of the
model and then finding its solution. Here, we present some useful definitions in connec-
tion of the study.
Definition 4.1: Optimization is a mathematical discipline which is concerned with
finding maximum or minimum of objective functions with or without constraints. In
the study of optimization, basically we need to optimize a real function f ( x1 , x2 , … , xn ),
of n variables x1 , x2 , … , xn with or without constraints.
In an OP, for modeling a physical system, if there is only one objective function and
the task is to obtain the optimal solution, then the problem is referred to as a single-objective
optimization problem. The general form of a single-objective optimization problem
can be depicted as follows:
74 Advanced Mathematical Techniques in Engineering Sciences
g(x1, x2, …, xn) ≤ 0, (4.3)
l(x1, x2, …, xn) = 0, (4.4)
(x1, x2, …, xn) ∈ F ⊂ ℝⁿ. (4.5)

In a multi-objective optimization (MOO) problem, several objective functions are optimized simultaneously:

minimize or maximize f = (f1, f2, …, fk) (4.6)
subject to the constraints (4.2)–(4.5),

where f1, f2, …, fk are the objective functions containing the decision variables x1, x2, …, xn.
Definition 4.4: In a MOO problem, if all the objective functions f1 , f2 , … , f k and the
constraints (4.2)–(4.5) are linear functions in terms of decision variables x1 , x2 , … , xn,
then the MOO problem is called a linear MOO problem. Furthermore, if at least one
of the constraints or one of the objective functions becomes a nonlinear type, then the
MOO problem is called a nonlinear MOO problem.
4.2.1 Concepts of solution
In a single-objective optimization problem, the “best” solution is defined in terms of
“optimal solution” for which the value of the objective function is optimized satisfying the
set of all feasible restrictions. In a MOO problem, the “best” solution is usually referred to
as the “Pareto optimal solution.” Here are some useful definitions related to the solution
of a MOO problem.
Definition 4.5 (Feasible Solution, FS): A solution set X = {x : x ∈ ℝⁿ} is said to be a feasible solution to a MOO problem if it satisfies all the constraints. A set S consisting of all FSs is called a feasible solution set, which lies in the space of action, where S = {x : x ∈ ℝⁿ satisfying the constraints (4.2)–(4.5)}.
Definition 4.6 (Optimal Solution): An optimal solution of a minimization MOO problem is a FS which gives the minimum value of each objective function simultaneously, i.e., x* ∈ S is an optimal solution if fk(x*) ≤ fk(x), k = 1, 2, …, K, for all x ∈ S.
Chapter four: New approach for solving multi-objective transportation problem 75
In the space of the objective functions, the optimal solution is located within the boundary of the feasible space. Here, the optimal solution is also known as the inferior solution. Generally, there is no optimal solution to a MOO problem, because the objective functions are conflicting in nature. In a MOO problem with conflicting objective functions, the optimum solution corresponding to each objective function can be obtained individually, but an optimum solution of the MOO problem that attains all these individual optima simultaneously does not exist in general.
Optimal Compromise Solution: Compromise programming seeks the compro-
mise solution among several objective functions of a MOO problem. The idea is based
on the minimization of the distance between the ideal and the desired solutions.
Definition 4.7: The optimal compromise solution of a MOO is a solution x ∈ X which
is preferred by the DM to all other solutions, taking into consideration all objectives
contained in the several functions of the MOO problem.
It is generally accepted that an optimal compromise solution has to be an efficient
solution according to the definition of an efficient solution. For a real-life practical
problem, the complete solution (set of all efficient solutions) is not always necessary.
We need only a procedure that finds an optimal compromise solution.
Definition 4.8 (Pareto-optimal solution [Efficient solution]): A feasible solution x̄ of a MOO problem is said to be a nondominated (noninferior) solution if there does not exist any other feasible solution x which dominates x̄. Therefore, for a nondominated solution, an improvement in the value of any one objective function is not possible without worsening the value of at least one other objective function. Mathematically, for a minimization problem, a solution x̄ ∈ X is nondominated if there does not exist any x ∈ X such that fk(x) ≤ fk(x̄), k = 1, 2, …, K, and fk(x) ≠ fk(x̄) for at least one k.
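The dominance test in Definition 4.8 is mechanical enough to state in code. The sketch below (Python; the function names are ours, and the minimization convention of the definition is assumed) checks whether one objective vector dominates another and filters a finite list of candidates down to its nondominated set:

```python
def dominates(fa, fb):
    """True if objective vector fa dominates fb under minimization:
    fa is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))

def nondominated(points):
    """Keep only the nondominated (Pareto-optimal) objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For instance, among the vectors (1, 5), (2, 3), (3, 3), and (4, 1), only (3, 3) is dominated (by (2, 3)); the other three form the nondominated set.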
Fuzzy Programming (FP): In real-life uncertain situations, fuzzy set theory is an important and effective tool for analyzing the MODM problem. Although fuzzy set theory is rigorously used in the field of operations research as a tool for solving MOO problems, here we intend to use fuzzy sets to accommodate real-life situations in a MOO problem through fuzzy parameters.
Definition 4.9 (Fuzzy set) [31]: A fuzzy set Ã is a pair (A, μÃ), where A is a crisp set that belongs to the universal set X and μÃ : X → [0, 1] is a function, called a membership function.
Fuzzy membership function: Membership values are used to determine the degree of membership of the elements of the fuzzy set. The evaluation of a membership value is of critical importance in the application of fuzzy set theory in the fields of engineering and science. A linear membership function is defined by two flexible points, such as upper and lower aspiration levels, or the two bounds of a tolerance interval.
Definition 4.10 (Membership function of a triangular fuzzy number): The membership function of a triangular fuzzy number Ã = (a, b, c) is depicted as follows (Figure 4.2):

μÃ(x) = (x − a)/(b − a), if a ≤ x ≤ b;
μÃ(x) = (c − x)/(c − b), if b ≤ x ≤ c;
μÃ(x) = 0, elsewhere.
[Figure 4.2: The triangular membership function μÃ rises linearly from 0 at x = a to 1 at x = b and falls linearly back to 0 at x = c.]
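Definition 4.10 translates directly into a small routine. The following sketch (our own illustration; it assumes a < b < c, so both denominators are nonzero) evaluates the membership value of a point x in the triangular fuzzy number Ã = (a, b, c):

```python
def tri_membership(x, a, b, c):
    """Membership value of x in the triangular fuzzy number (a, b, c),
    assuming a < b < c."""
    if a <= x <= b:
        return (x - a) / (b - a)  # rising edge, reaches 1 at x == b
    if b < x <= c:
        return (c - x) / (c - b)  # falling edge
    return 0.0                    # outside the support [a, c]
```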
Model GP:

minimize ∑_{i=1}^{K} w_i |f_i(x) − g_i|
subject to x ∈ F,

where F is the feasible set, and the w_i are the weights attached to the deviations in the achievement function; f_i(x) is the objective function of the i-th goal, and g_i is the aspiration level of the i-th goal. |f_i(x) − g_i| represents the deviation of the i-th goal.
A modification of GP, denoted as weighted goal programming (WGP), can be displayed in the following form:

Model WGP:

minimize ∑_{i=1}^{K} w_i (d_i^+ + d_i^-)
subject to f_i(x) − d_i^+ + d_i^- = g_i (i = 1, 2, …, K),
x ∈ F,

where d_i^+ and d_i^- are the over- and under-achievements of the i-th goal, respectively.
However, the conflicts of resources and the incompleteness of available information
make it almost impossible for DMs to set the specific aspiration levels and choose a
better decision. To overcome this situation, a revised multi-choice goal programming
(RMCGP) approach was presented by Chang [30].
Definition 4.12: The mathematical model of RMCGP for solving a MOO problem can be defined as follows:

Model RMCGP:

minimize ∑_{i=1}^{K} [w_i (d_i^+ + d_i^-) + α_i (e_i^+ + e_i^-)]
subject to f_i(x) − d_i^+ + d_i^- = y_i (i = 1, 2, …, K),
y_i − e_i^+ + e_i^- = g_{i,max} or g_{i,min} (i = 1, 2, …, K),
x ∈ F.

Here y_i is the continuous variable associated with the i-th goal, restricted between the upper (g_{i,max}) and lower (g_{i,min}) bounds; e_i^+ and e_i^- are the positive and negative deviations attached to the i-th goal of |y_i − g_{i,max}|; α_i is the weight attached to the sum of the deviations of |y_i − g_{i,max}|; and the other variables are defined as in WGP.
4.3 Mathematical model
The mathematical model of a transportation problem is as follows:
Model 1
minimize Z = ∑_{i=1}^{m} ∑_{j=1}^{n} C_ij x_ij, (4.7)
subject to ∑_{j=1}^{n} x_ij ≤ a_i (i = 1, 2, …, m), (4.8)
∑_{i=1}^{m} x_ij ≥ b_j (j = 1, 2, …, n), (4.9)

where x_ij is the decision variable that represents the amount of goods delivered from the i-th origin to the j-th destination, C_ij is the transportation cost per unit commodity, and a_i and b_j are the supply and demand at the i-th origin and j-th destination, respectively; ∑_{i=1}^{m} a_i ≥ ∑_{j=1}^{n} b_j is the feasibility condition.
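Model 1 can be checked numerically for any candidate shipment plan. The helper functions below are a sketch of equations (4.7)–(4.9) (the function names are ours): they evaluate the total cost Z and test the supply, demand, non-negativity, and feasibility conditions.

```python
def transport_cost(C, x):
    """Total cost Z of plan x under unit costs C, equation (4.7)."""
    return sum(C[i][j] * x[i][j]
               for i in range(len(C)) for j in range(len(C[0])))

def is_feasible(x, a, b):
    """Check constraints (4.8) and (4.9), non-negativity, and the
    feasibility condition (total supply covers total demand)."""
    supply_ok = all(sum(row) <= ai for row, ai in zip(x, a))
    demand_ok = all(sum(x[i][j] for i in range(len(x))) >= bj
                    for j, bj in enumerate(b))
    sign_ok = all(v >= 0 for row in x for v in row)
    return supply_ok and demand_ok and sign_ok and sum(a) >= sum(b)
```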
A single-objective transportation problem does not serve to formulate all real-life transportation problems. For example, if it is required to minimize the transportation cost, maximize the profit, and minimize the distance in a single TP, then the problem moves to multi-objective ground. A transportation problem with multiple objective functions is considered a MOTP. Here, we deal with objective functions involved in the TP that are conflicting and noncommensurable with each other. A mathematical model of the MOTP is as follows:

minimize/maximize Z^t = ∑_{i=1}^{m} ∑_{j=1}^{n} C_ij^t x_ij (t = 1, 2, …, K)

Here C_ij^t (t = 1, 2, …, K) is the cost per unit of goods for the t-th objective function when goods are transported from the i-th origin to the j-th destination.
4.4 Solution procedure

In this section, we introduce three well-known techniques for solving the MOTP: fuzzy programming, goal programming, and revised multi-choice goal programming. Thereafter, we propose a new algorithm using VAM to solve the MOTP.
4.4.1 Fuzzy programming
To solve the MOTP, we use the fuzzy programming approach, which reduces a multi-objective problem to a single-objective problem; the single-objective problem is then solved to find the compromise solution of the MOTP. The steps for converting a multi-objective problem to a single-objective problem are as follows:
Step 1: First, we solve each of the objective functions separately under the demand and supply restrictions and obtain the optimal solutions of the K linear objective functions. Let X_1^*, X_2^*, …, X_K^* be the ideal solutions of the K objectives Z^t (t = 1, 2, …, K).
Step 2: Each objective function is evaluated at the ideal solutions obtained in Step 1, and we formulate a pay-off matrix of order K × K as follows:

Table: Pay-off matrix
Z^1(X_1^*)  Z^2(X_1^*)  …  Z^K(X_1^*)
Z^1(X_2^*)  Z^2(X_2^*)  …  Z^K(X_2^*)
Z^1(X_3^*)  Z^2(X_3^*)  …  Z^K(X_3^*)
⋮
Z^1(X_K^*)  Z^2(X_K^*)  …  Z^K(X_K^*)
Step 3: When the t-th objective function is a minimizing type, obtain the lower bound
Lt (best solution) and upper bound Ut (worst solution) corresponding to the t-th objec-
tive function. Then formulate the membership function using Zimmermann’s [22]
approach corresponding to each objective function Zt (t = 1, 2,…, K) as follows:
μ(Z^t(X)) = 0, if Z^t(X) ≥ U_t;
μ(Z^t(X)) = (U_t − Z^t(X)) / (U_t − L_t), if L_t ≤ Z^t(X) ≤ U_t;
μ(Z^t(X)) = 1, if Z^t(X) ≤ L_t.
Again, if the t-th objective function is the maximizing type, obtain the lower bound
Lt (worst solution) and upper bound Ut (best solution) corresponding to the t-th objec-
tive function. Then the membership function for the objective function Zt (t = 1, 2,…, K)
is formulated as follows:
μ(Z^t(X)) = 0, if Z^t(X) ≤ L_t;
μ(Z^t(X)) = (Z^t(X) − L_t) / (U_t − L_t), if L_t ≤ Z^t(X) ≤ U_t;
μ(Z^t(X)) = 1, if Z^t(X) ≥ U_t.
Step 4: Introduce an auxiliary variable λ and formulate an equivalent fuzzy linear programming problem in the following form:

maximize λ
subject to λ ≤ μ(Z^t(X)) (t = 1, 2, …, K),
x ∈ F.

Here, μ(Z^t(X)) is the membership function of the t-th objective function (t = 1, 2, …, K), as given in Step 3.
Step 5: Solve the crisp model obtained in Step 4, and derive the optimal compromise
solution.
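The membership functions of Step 3 are simple piecewise-linear maps, and λ in Step 4 is just the smallest membership value at a given solution. The helpers below are our own sketch of those formulas (L and U are the bounds read off the pay-off matrix):

```python
def mu_min(z, L, U):
    """Membership of a minimizing objective value z: 1 at the best
    value L, 0 at the worst value U, linear in between (Step 3)."""
    if z >= U:
        return 0.0
    if z <= L:
        return 1.0
    return (U - z) / (U - L)

def mu_max(z, L, U):
    """Membership of a maximizing objective value z: 0 at the worst
    value L, 1 at the best value U."""
    if z <= L:
        return 0.0
    if z >= U:
        return 1.0
    return (z - L) / (U - L)
```

At any candidate solution, λ is the minimum of these membership values over all objectives; Step 4 maximizes that minimum over the feasible set.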
4.4.2 Goal programming

Goal programming is a useful approach for solving the MOTP. Here we discuss the goal programming approach for solving the MOTP (see Model 1A).

In this procedure, the DM needs to decide goals corresponding to each of the objective functions. Consider g_{t,max} and g_{t,min} as the maximum and minimum aspiration values of the t-th objective function in the MOTP, and d_t^+ and d_t^- as the positive and negative deviations corresponding to the t-th objective function. Then the mathematical model of GP is formulated as follows:
Model 1A

minimize ∑_{t=1}^{K} w_t (d_t^+ + d_t^-) (4.11)
Model 1B

minimize ∑_{t=1}^{K} [w_t (d_t^+ + d_t^-) + α_t (e_t^+ + e_t^-)] (4.15)
where y_t is a continuous variable that lies between the upper (g_{t,max}) and lower (g_{t,min}) bounds and is denoted as the aspiration level of the t-th objective function. Again, e_t^- and e_t^+ are the negative and positive deviations attached to the t-th goal of |y_t − g_{t,max}|, and α_t is the weight attached to the sum of the deviations of |y_t − g_{t,max}|.
• Step 3: Thereafter, consider the normalized weights w_t for the objective functions Z^t for all t.
• Step 4: Formulate the objective function, i.e., maximize Z = ∑_{i=1}^{m} ∑_{j=1}^{n} (∑_{t∈T′} w_t r_ij^t + ∑_{t∈T″} w_t s_ij^t) x_ij, under the constraints (4.8)–(4.10) described in the MOTP. Here T′ and T″ are the sets of objective functions of maximization and minimization types, respectively.
• Step 5: The model formulated in Step 4 is an LPP; solve the LPP by the simplex algorithm.
• Step 6: Obtain the optimum solution of the LPP from Step 5 and note the allocations made in the cells. Let X_ij be the optimum solution. Then the compromise solution of the t-th objective function is Z^t = ∑_{i=1}^{m} ∑_{j=1}^{n} C_ij^t X_ij for all t.
• Step 7: Stop.
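The reduction above ultimately hands a single cost table to the Vogel approximation method. The following is a sketch of one common form of VAM for a balanced problem (total supply equal to total demand); the function name and tie-breaking conventions are ours, and textbooks differ on ties, so hand computations may allocate slightly differently:

```python
def vogel(cost, supply, demand):
    """Vogel approximation method for a balanced transportation table.
    Returns an allocation matrix (a basic feasible solution)."""
    supply, demand = list(supply), list(demand)
    m, n = len(cost), len(cost[0])
    alloc = [[0] * n for _ in range(m)]
    rows, cols = set(range(m)), set(range(n))

    def penalty(values):
        # Difference between the two smallest costs in the line;
        # for a single remaining cell, the cost itself is used.
        s = sorted(values)
        return s[1] - s[0] if len(s) > 1 else s[0]

    while rows and cols:
        row_pen = {i: penalty([cost[i][j] for j in cols]) for i in rows}
        col_pen = {j: penalty([cost[i][j] for i in rows]) for j in cols}
        i_star = max(row_pen, key=row_pen.get)
        j_star = max(col_pen, key=col_pen.get)
        if row_pen[i_star] >= col_pen[j_star]:
            i = i_star
            j = min(cols, key=lambda c: cost[i][c])   # cheapest cell in row
        else:
            j = j_star
            i = min(rows, key=lambda r: cost[r][j])   # cheapest cell in column
        q = min(supply[i], demand[j])                 # ship as much as possible
        alloc[i][j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:
            rows.discard(i)
        if demand[j] == 0:
            cols.discard(j)
    return alloc
```

For an unbalanced table, such as the Section 4.5 example where total supply exceeds total demand, the usual device is a dummy destination with zero costs that absorbs the surplus.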
The presented algorithm is more effective for solving the MOTP based on the following
reasons:
Theorem 4.1: The solution of a MOTP by RMCGP produces a better result than GP.

Proof: The mathematical model of GP for solving a MOTP is depicted as follows:

GP

minimize ∑_{i=1}^{K} w_i |Z_i(x) − g_i|
subject to x ∈ F. (4.21)

WGP

minimize ∑_{i=1}^{K} w_i (d_i^+ + d_i^-)
subject to Z_i(x) − d_i^+ + d_i^- = g_i,
x ∈ F.
From the above discussion, we observe that the mathematical model GP is a function of the weights and goals along with the decision variables, whereas the objective function in the mathematical model WGP contains the goal deviations d_i^+ and d_i^- as variables. Both models GP and WGP produce the same result, but the WGP model is easier to tackle than GP, as its objective function includes fewer variables.

If the goal g_i for the i-th objective function is not a single real value but is taken as an interval [g_{i,min}, g_{i,max}], then to obtain a better solution, g_i should attain the maximum value of the range for a maximization-type objective function and the minimum value of the specified range for a minimization-type objective function.
Thereafter, we introduce a new variable y_i in the model WGP via Z_i(x) − d_i^+ + d_i^- = y_i, together with two deviation variables e_i^+ and e_i^- (analogous to d_i^+ and d_i^-) and the constraint y_i − e_i^+ + e_i^- = g_{i,min} or g_{i,max}. The objective function is then converted into the form: minimize ∑_{i=1}^{K} [w_i (d_i^+ + d_i^-) + α_i (e_i^+ + e_i^-)], where the α_i are weights corresponding to the goal deviations. Using this objective function, we construct the model RMCGP. The RMCGP minimizes the deviations (d_i^+ + d_i^-) and (e_i^+ + e_i^-), whereas the GP minimizes only the deviation of the value of the objective function, i.e., ∑_{i=1}^{K} w_i (d_i^+ + d_i^-). Thus, in minimizing the objective function of the RMCGP model, the second part of the objective function, ∑_{i=1}^{K} α_i (e_i^+ + e_i^-), is also minimized. This
implies that the value of y_i tends to g_{i,max} for a maximizing-type objective function, and y_i tends to g_{i,min} for a minimizing-type objective function. In WGP or GP, only the goal deviations are minimized, without considering the type of objective function. The additional variables e_i^+ and e_i^- tackle this situation by minimizing the deviations according to the type of the objective function. Hence, we establish that RMCGP produces a better result compared to the WGP or GP models.

Furthermore, if we set the goal deviations e_i^+ and e_i^- to 0 in the mathematical model of RMCGP, it reduces to the form of WGP; and model WGP is a modification of model GP. Therefore, it is clear that the solution of RMCGP is better than the solutions of WGP and GP. These arguments complete the proof of the theorem.
The efficiency of the proposed algorithm is demonstrated in the following sections, which establish its better utility in comparison to FP, GP, and RMCGP.
4.5 Numerical example
A rice merchant has three warehouses at three locations O1, O2, and O3. He delivers rice to three different markets D1, D2, and D3. The warehouses hold different capacities of rice at different prices in different locations. The merchant notes that the maximum supplying capacities of rice at the origins O1, O2, and O3 are 1100, 1250, and 1150 kg, respectively. Furthermore, the minimum demands of rice at the destinations D1, D2, and D3 are 1150, 1100, and 1225 kg, respectively.
The merchant wishes to deliver rice to the destinations keeping in mind that he has to minimize the transportation cost but maximize the profit, with the consideration that the transportation costs are paid by the customers. Basically, every customer wishes to minimize the transportation cost at the time of purchasing rice, whereas the merchant wishes to maximize his profit. A conflicting situation occurs, and the problem becomes a MOTP with two objective functions. We present the data regarding the transportation parameters in Tables 4.1 and 4.2.
Considering the aforementioned data, the following MOTP is formulated:
Model P

minimize Z1 = 3.5x11 + 4.1x12 + 4.5x13 + 5.5x21 + 4.5x22 + 5.0x23 + 4.5x31 + 4.2x32 + 4.0x33
maximize Z2 = 2.5x11 + 1.5x12 + 2.0x13 + 2.1x21 + 1.8x22 + 1.5x23 + 1.5x31 + 2.2x32 + 1.9x33
subject to the constraints (4.24)–(4.30).
4.5.1 Fuzzy programming
The ideal solutions obtained by solving the objective functions Z1 and Z2 separately subject to the constraints (4.24)–(4.30) are [X_1^*] = [1100, 0, 0, 50, 1100, 75, 0, 0, 1150] and [X_2^*] = [0, 0, 1100, 1175, 0, 75, 0, 1100, 50]. Based on the ideal solutions, we formulate the pay-off matrix, which is shown in Table 4.3.
Using Table 4.3, we formulate the following membership functions corresponding to each objective function of the proposed problem:

μ(Z1(X)) = 0, if Z1(X) ≥ 16,607.5;
μ(Z1(X)) = (16,607.5 − Z1(X)) / (16,607.5 − 14,050), if 14,050 ≤ Z1(X) ≤ 16,607.5;
μ(Z1(X)) = 1, if Z1(X) ≤ 14,050;

μ(Z2(X)) = 0, if Z2(X) ≤ 7132.5;
μ(Z2(X)) = (Z2(X) − 7132.5) / (7295 − 7132.5), if 7132.5 ≤ Z2(X) ≤ 7295;
μ(Z2(X)) = 1, if Z2(X) ≥ 7295.
Using the procedure described in Section 4.4.1, we finally design the following model:

Model P1

maximize λ
subject to 1957.5λ ≤ 16,607.5 − (3.5x11 + 4.1x12 + 4.5x13 + 5.5x21 + 4.5x22 + 5.0x23 + 4.5x31 + 4.2x32 + 4.0x33),
162.5λ ≤ (2.5x11 + 1.5x12 + 2.0x13 + 2.1x21 + 1.8x22 + 1.5x23 + 1.5x31 + 2.2x32 + 1.9x33) − 7132.5,
and the constraints (4.24)–(4.30).
Model P1 is an LPP; using the LINGO10 software, we obtain the following compromise optimal solution: the minimum value of the objective function is Z1(X*) = $15,324.02 and the maximum value of the objective function is Z2(X*) = $7239.05, with aspiration level λ = 0.66.
4.5.2 Goal programming
According to the market situation, the DM has some knowledge of the approximate profit connected with the optimum transportation cost. In that situation, the DM wishes to solve the MOTP in such a way that the transportation cost belongs to the interval [15,000, 16,000] (the lesser value is preferred by the DM) and the profit belongs to [7100, 7500] (the greater value is preferred by the DM). So, it is required to schedule the amount of rice to be transported satisfying the predetermined goals assumed by the DM.

To achieve the goals in Model P, we formulate the goal programming model in the following way: assume the maximum deviations of goal 1 and goal 2 are 1000 and 400, respectively, and consider the weights w1 = 1/1000 and w2 = 1/400. Then Model P reduces to Model P2 as follows:
Model P2

minimize (1/1000)(d_1^+ + d_1^-) + (1/400)(d_2^+ + d_2^-)
subject to 3.5x11 + 4.1x12 + 4.5x13 + 5.5x21 + 4.5x22 + 5.0x23 + 4.5x31 + 4.2x32 …
2.5x11 + 1.5x12 + 2.0x13 + 2.1x21 + 1.8x22 + 1.5x23 + 1.5x31 + 2.2x32 …
Model P3

minimize (1/1000)(d_1^+ + d_1^-) + (1/400)(d_2^+ + d_2^-) + (1/1000)(e_1^+ + e_1^-) + (1/400)(e_2^+ + e_2^-)
subject to 3.5x11 + 4.1x12 + 4.5x13 + 5.5x21 + 4.5x22 + 5.0x23 + 4.5x31 + 4.2x32 …
2.5x11 + 1.5x12 + 2.0x13 + 2.1x21 + 1.8x22 + 1.5x23 + 1.5x31 + 2.2x32 + 1.9x33 − d_2^+ + d_2^- = y_2,
7100 ≤ y_2 ≤ 7500,
Table 4.4 Normalized transportation cost per unit kilogram rice (in $)
D1 D2 D3
O1 0.636 0.745 0.816
O2 1 0.818 0.909
O3 0.818 0.764 0.727
The normalized values are obtained by dividing the cost components of Tables 4.1 and 4.2 by the maximum cost component of each table; these maxima are 5.5 and 2.5, respectively.
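The normalization itself is a one-liner: divide every entry of a table by its largest entry. A sketch (rounding to three decimals as in Table 4.4; small last-digit differences from the printed table are possible):

```python
def normalize(table):
    """Divide every entry by the largest entry of the table."""
    biggest = max(max(row) for row in table)
    return [[round(v / biggest, 3) for v in row] for row in table]
```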
According to the proposed algorithm, if we take equal weights for the objective functions, then we find the transportation cost shown in Table 4.6 in normalized form, corresponding to the equivalent single-objective function for solving the proposed problem.
From Table 4.6, we see that some transportation parameters take negative values; we add the magnitude of the most negative value to each of the cost parameters in Table 4.6 and solve the resulting problem by VAM to get the optimal allocation in the transportation cells. The following compromise solution is then obtained: [X*] = [1100, 0, 0, 50, 1100, 75, 0, 0, 1150]. Finally, the values of the objective functions are Z1(X*) = $14,050 and Z2(X*) = $7132.5. This optimal compromise solution is the best solution preferred by the buyer.
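The reported values are easy to verify from the allocation and the unit costs and profits of Model P (the lists below flatten the 3 × 3 tables row by row):

```python
# Compromise allocation for equal weights, flattened as
# [x11, x12, x13, x21, x22, x23, x31, x32, x33]:
X = [1100, 0, 0, 50, 1100, 75, 0, 0, 1150]
cost   = [3.5, 4.1, 4.5, 5.5, 4.5, 5.0, 4.5, 4.2, 4.0]  # unit costs (Z1)
profit = [2.5, 1.5, 2.0, 2.1, 1.8, 1.5, 1.5, 2.2, 1.9]  # unit profits (Z2)

Z1 = sum(c * x for c, x in zip(cost, X))    # total transportation cost
Z2 = sum(p * x for p, x in zip(profit, X))  # total profit
```

This reproduces Z1 = 14,050 and Z2 = 7132.5 (up to floating-point rounding).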
In a similar way, choosing different weights for the objective functions, we prepare Table 4.7, which contains the corresponding optimal compromise solutions.
From Table 4.7, the DM can choose any one of the optimal compromise solutions; we present the best solutions preferred by both merchant and buyer. According to his choice, the DM may take the solution Z1(X*) = $14,300 and Z2(X*) = $7192.5, corresponding to the weights 0.1 and 0.9, respectively, which is better for both merchant and buyer, with neither objective dominated by the other.
4.6 Comparison

According to the solutions of the formulated Model P obtained by FP, GP, RMCGP, and VAM, it is clear that the algorithm using VAM produces a better solution than FP, GP, and RMCGP. We have also seen that no auxiliary variable is needed in VAM, whereas one is necessary in FP, GP, and RMCGP. In this regard, we may say that the proposed algorithm is more effective, with less computational burden, for solving the MOTP. The mathematical model of GP is a special structure of RMCGP, because setting α_i = 0 for all i in RMCGP produces GP. GP tries to optimize the goal values but does not treat the goals properly for maximization or minimization problems, whereas RMCGP treats these goals as the DM's choices. Also, one of the most important drawbacks of GP and RMCGP is how to select the goals: if the goals are not selected in a proper way, the solution may be infeasible. If, for example, the DM selects the cost goal [12,000, 13,000] (the lesser value is preferred by the DM) and the profit goal [7400, 7600] (the greater value is preferred by the DM), then we cannot find any optimal compromise solution from either GP or RMCGP.
Again, in FP, we solved the objective functions separately to form a pay-off matrix, and finally a single-objective function was derived and solved to find the optimal compromise solution. During the process we solved three objective functions altogether to obtain an optimum solution, and we used two additional constraints and one auxiliary variable to solve the MOTP. So, this approach is laborious for solving the MOTP. In addition, it is seen from our example that the solution of the MOTP by FP does not depend on the solution expected by the DM, if there is any such expectation for the optimum values of the objective functions. That is why, in most real-life decision-making problems, FP is less suitable for producing a good optimal compromise solution.
Finally, in the proposed approach through VAM, the obtained optimal compromise solution is better than the solutions of FP, GP, and RMCGP. Furthermore, there is no need to use any auxiliary variable or additional constraints to solve the MOTP by the proposed algorithm. The set of all normalized weights w_i produces a set of optimal compromise solutions of the MOTP, and the preferred optimal compromise solution is the one picked by the DM. Here, we derive a better solution compared to the solutions obtained by FP, GP, and RMCGP.
Acknowledgment

The author Gurupada Maity acknowledges the University Grants Commission of India for providing the financial grant to carry out this research work under the JRF(UGC) scheme: sanction letter number [F.17-130/1998(SA-I)] dated 26/06/2014.
References
1. F.L. Hitchcock, The distribution of a product from several sources to numerous localities,
Journal of Mathematics and Physics 20 (1941) 224–230.
2. T.C. Koopmans, Optimum utilization of the transportation system, Econometrica 17 (1949)
136–146.
3. A. Ebrahimnejad, An improved approach for solving fuzzy transportation problem with
triangular fuzzy numbers, Journal of Intelligent and Fuzzy Systems 29(2) (2015) 963–974.
4. A. Kaur and A. Kumar, A new method for solving fuzzy transportation problems using
ranking function, Applied Mathematical Modelling 35(12) (2011) 5652–5661.
5. D.R. Mahapatra, S.K. Roy and M.P. Biswal, Multi-choice stochastic transportation problem
involving extreme value distribution, Applied Mathematical Modelling 37(4) (2013) 2230–2240.
6. G. Maity and S.K. Roy, Solving multi-objective transportation problem with interval goal
using utility function approach, International Journal of Operational Research 27(4) (2016)
513–529.
7. G. Maity and S.K. Roy, Solving multi-choice multi-objective transportation problem:
A utility function approach, Journal of Uncertainty Analysis and Applications (2014)
DOI: 10.1186/2195-5468-2-11.
8. G. Maity, S.K. Roy and J.L. Verdegay, Multi-objective transportation problem with cost
reliability under uncertain environment, International Journal of Computational Intelligence
Systems 9(5) (2016) 839–849.
9. S. Midya and S.K. Roy, Single-sink, fixed-charge, multi-objective, multi-index stochastic trans-
portation problem, American Journal of Mathematics and Management Sciences 33 (2014) 300–314.
10. S. Midya and S.K. Roy, Analysis of interval programming in different environments and
its application to fixed charge transportation problem, Discrete Mathematics, Algorithms and
Applications 9(3) (2017) 1750040, 17 pages.
11. S.K. Roy, Multi-choice stochastic transportation problem involving Weibull distribution,
International Journal of Operational Research 21(1) (2014) 38–58.
12. S.K. Roy and G. Maity, Minimizing cost and time through single objective function in
multi-choice interval valued transportation problem, Journal of Intelligent and Fuzzy Systems
32(3) (2017) 1697–1709.
13. S.K. Roy, G. Maity and G.W. Weber, Multi-objective two-stage grey transportation problem using
utility function with goals, Central European Journal of Operations Research 25(2) (2017) 417–439.
14. A. Charnes and W.W. Cooper, Management Models and Industrial Applications of Linear Programming, 1, Wiley: New York (1961).
15. G. Maity and S.K. Roy, Solving a multi-objective transportation problem with nonlinear
cost and multi-choice demand, International Journal of Management Science and Engineering
Management 11(1) (2016) 62–70.
16. F.A.E.W. Waiel, A multi-objective transportation problem under fuzziness, Fuzzy Sets and
Systems 117(1) (2001) 27–33.
17. S.K. Roy, G. Maity, G.W. Weber and S.Z. Alparslan Gök, Conic scalarization approach to solve
multi-choice multi-objective transportation problem with interval goal, Annals of Operations
Research 253(1) (2017) 599–620.
18. A. Kumar, S. Pant, M. Ram and S.B. Sing, On solving complex reliability optimization prob-
lem using multi-objective Particle Swarm optimization, Mathematics Applied to Engineering,
Academic Press, (2017) 115–131.
19. A. Kumar, S. Pant and M. Ram, System reliability optimization using grey wolf optimizer algo-
rithm, Quality and Reliability Engineering International, John Wiley & Sons 33 (2017) 1327–1335.
20. S. Pant, A. Kumar, S.B. Sing and M. Ram, A modified Particle Swarm optimization algorithm
for nonlinear optimization, Nonlinear Studies 24(1) (2017) 127–138.
21. S. Pant, A. Kumar and M. Ram, Reliability optimization: A particle swarm approach, In: Ram
M., and Davim J. (eds), Advances in Reliability and System Engineering, Management and Industrial
Engineering, Springer, Cham,
22. H.J. Zimmermann, Fuzzy programming and linear programming with several objective
functions, Fuzzy Sets and Systems 1 (1978) 45–55.
23. E.L. Hannan, An assessment of some of the criticisms of goal programming, Computers &
Operations Research 12 (1985) 525–541.
24. J.P. Ignizio, Goal Programming and Extensions, Lexington Books: Lexington, MA (1976).
25. M. Tamiz, D.F. Jones and C. Romero, Goal programming for decision making: An overview of
the current state-of-the-art, European Journal of Operational Research 111 (1998) 569–581.
26. C. Romero, A general structure of achievement function for a goal programming model,
European Journal of Operational Research 153 (2004) 675–686.
27. C.N. Liao, Formulating the multi-segment goal programming, Computers and Industrial
Engineering 56 (2009) 138–141.
28. B.B. Tabrizi, K. Shahanaghi and M.S. Jabalameli, Fuzzy multi-choice goal programming,
Applied Mathematical Modelling 36 (2012) 1415–1420.
29. C.T. Chang, Multi-choice goal programming, Omega 35 (2007) 389–396.
30. C.T. Chang, Revised multi-choice goal programming, Applied Mathematical Modelling 32
(2008) 2587–2595.
31. L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.
chapter five
Contents
5.1 Introduction........................................................................................................................... 91
5.2 Simultaneous optimization of multiple characteristics.................................................. 96
5.2.1 Derringer’s desirability function method............................................................. 96
5.2.2 Taguchi’s loss function approach........................................................................... 97
5.2.3 Fuzzy logic approach............................................................................................... 98
5.2.4 Dual-response surface methodology..................................................................... 99
5.3 Data collection and modeling........................................................................................... 100
5.4 Optimization....................................................................................................................... 104
5.5 Validation............................................................................................................................. 106
5.6 Conclusion........................................................................................................................... 108
References...................................................................................................................................... 109
5.1 Introduction
Many modern processes are reasonably complex and have multiple output characteristics.
For example, a heat treatment process like induction hardening needs to be executed to
meet the requirements on surface hardness and case depth. Even an agile software devel-
opment process needs to be executed to simultaneously meet the goals set on performance
characteristics like sprint productivity, sprint velocity, defect density, etc. (John et al., 2017).
In such a scenario, the process engineers need to execute the processes in such a way to
meet the customer requirements on multiple characteristics. In other words, the engineers
need to identify an optimum setting of process control factors, which would simultane-
ously optimize multiple output characteristics. This can be done using the application
of simultaneous optimization of multiple characteristics methodology. A lot of research
has been carried out in the past on simultaneous optimization of output characteristics,
and many approaches have been proposed. The important among them are Derringer’s
desirability function approach, Taguchi’s loss function approach, dual-response surface
methodology and fuzzy logic–based approach. In this chapter, the authors present a case
study on simultaneous optimization of multiple output characteristics of the pulp cooking
process. The methodology used for simultaneous optimization is dual-response surface
methodology.
This study is carried out at an organization manufacturing rayon grade pulp. Rayon grade pulp is the raw material for manufacturing viscose staple fiber, which is used for making clothes. The pulp cooking process is an important step in
the manufacturing of rayon grade pulp. The pulp is the cellulose component of the wood.
The cellulose is separated from other components and impurities of wood by cooking the
wood chips in a highly pressurized chamber followed by multiple stages of washing and
chemical treatments.
The company produces approximately 210 tons of pulp daily and sells at a price of
Indian Rupees 28,000 per ton of pulp. Even a small increase in pulp yield can have huge
economic benefits for the organization. The yield of the pulp cooking process is defined as
y = (wp/wc) × 100 (5.1)
where y is the pulp yield, wp is the weight of pulp produced, and wc is the weight of the
wood chips loaded. Unfortunately, the pulp yield cannot be increased indefinitely as it
will adversely affect the pulp viscosity. The pulp with viscosity beyond 52 centipoises (cp)
is graded as low quality. One centipoise is equal to one millipascal-second. To quantify
the current status of the pulp cooking process, the data on yield and viscosity of the past
twenty batches are collected. The collected data are given in Table 5.1.
The descriptive summary of the pulp yield is given in Figure 5.1.
Figure 5.1 shows that the average pulp yield per batch is only 34.027, so there is a lot of scope for improvement. Figure 5.1 also reveals that the yield is normally distributed, as the p-value of the Anderson–Darling normality test is greater than 0.05 (Mathews, 2005).
[Figure 5.1: Descriptive summary of pulp yield — mean 34.027, StDev 0.141, variance 0.020, skewness −0.543, kurtosis 0.103, N = 20; minimum 33.700, 1st quartile 33.925, median 34.015, 3rd quartile 34.157, maximum 34.240; 95% confidence interval for mean 33.962–34.093, for median 34.000–34.138, for StDev 0.107–0.206.]
Similarly, the descriptive summary of the viscosity is given in Figure 5.2.
Figure 5.2 shows that the average viscosity is 50.89 with a standard deviation of 0.973.
The upper specification limit (USL) on pulp viscosity is 52 cp. Hence, it is very likely that
the pulp cooking process is not capable of meeting the customer requirement of producing
pulp with viscosity within 52 cp. Figure 5.2 also shows that the viscosity is normally dis-
tributed as Anderson–Darling normality test p-value > 0.05. Hence, the viscosity data are
subjected to capability analysis. The process capability analysis result is given in Figure 5.3.
Figure 5.3 shows that the Ppk is only 0.38, which is less than 1.0, indicating that the pulp cooking process is not capable of meeting the customer requirements on viscosity. Hence, there is a need to make the pulp cooking process capable of meeting the customer requirement on viscosity while improving the yield of the process as far as possible.
The performance of the pulp cooking process can be unsatisfactory due to the pres-
ence of assignable causes. To check whether the pulp cooking process is in statistical con-
trol, control charts are constructed for the pulp yield and viscosity. Since both yield and
viscosity are normally distributed, the individual x chart (Montgomery, 2002) is used. The
individual x chart of yield is given in Figure 5.4 and that of viscosity is given in Figure 5.5.
Figures 5.4 and 5.5 show that none of the points plotted is beyond the upper or
lower control limits (UCL or LCL). Moreover, none of the out-of-control run rules
(Leavenworth and Grant, 2000) is violated in both the cases. This shows that the pulp
cooking process is under control and free from the influence of assignable causes. In
[Figure 5.3: Process capability of viscosity — performance: PPM > USL observed 150000.00, expected overall 126956.10, expected within 160180.99.]
[Figure 5.4: I chart of yield — UCL = 34.461, center line X̄ = 34.027, LCL = 33.594, observations 1–20.]
[Figure 5.5: I chart of viscosity — UCL = 54.241, center line X̄ = 50.89, LCL = 47.539, observations 1–20.]
other words, it is not just a process control problem but needs process optimization.
Hence, it is decided to carry out the design of experiments to improve and optimize
the pulp cooking process. Moreover, the pulp cooking process needs to be optimized to meet the requirements on two output characteristics, namely, pulp yield and viscosity. So a widely popular methodology for simultaneous optimization of multiple output characteristics, the dual-response surface methodology, is adopted.
d = ((y − USL)/(T − USL))^β, if T ≤ y < USL (5.3)
where y is the characteristic under study, T is the target, LSL is the lower specification
limit, and USL is the upper specification limit of y.
The desirability function for STB characteristics is defined as
d = ((y − USL)/(ymin − USL))^α, if ymin < y < USL (5.5)
d = 0, if y ≥ USL (5.6)
d = 1, if y ≤ y min (5.7)
where y is the characteristic under study, USL is the upper specification limit of y and ymin
is the practically achievable most desirable minimum value or target of y.
Similarly, the desirability function for LTB characteristics is defined as
d = ((y − LSL)/(ymax − LSL))^α, if LSL < y < ymax (5.8)
Chapter five: An application of dual-response surface optimization methodology 97
d = 0, if y ≤ LSL (5.9)
d = 1, if y ≥ y max (5.10)
where y is the characteristic under study, LSL is the lower specification limit of y and ymax
is the practically achievable most desirable maximum value or target of y. The weights α
and β in the desirability function need to be chosen based on the desirability of quality
characteristic y with respect to its target and specification limits.
Equations (5.2)–(5.10) show that the desirability value d will be 1 when the character-
istic y is on the target. The desirability value decreases as y moves away from the target.
The desirability value will be 0 when the characteristic under study y is on or beyond the
specification limits. For simultaneous optimization of multiple characteristics, the desirability function value di, i = 1, 2, …, k is computed for each characteristic yi, and the overall desirability D is computed as the geometric mean of the individual desirability values, as shown in Equation (5.11):

D = (d1 × d2 × ⋯ × dk)^(1/k) (5.11)
Finally, the values of the factors that would simultaneously optimize multiple character-
istics are found out by maximizing the overall desirability value D. Some of the impor-
tant applications of the desirability function approach for simultaneous optimization
of response variables are in CNC turning of AISI P-20 tool steel (Aggarwal et al., 2008),
analytical methods development (Candioti et al., 2014), carbonitriding of bushes (John,
2013), etc.
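As a quick illustration of Equations (5.5)–(5.11), the desirability computations can be sketched in a few lines of Python; the specification limits, targets, and sample readings below are illustrative only, not values from the case study.

```python
import numpy as np

def desirability_stb(y, usl, y_min, alpha=1.0):
    """Smaller-the-better desirability, Equations (5.5)-(5.7)."""
    if y >= usl:
        return 0.0
    if y <= y_min:
        return 1.0
    return ((y - usl) / (y_min - usl)) ** alpha

def desirability_ltb(y, lsl, y_max, alpha=1.0):
    """Larger-the-better desirability, Equations (5.8)-(5.10)."""
    if y <= lsl:
        return 0.0
    if y >= y_max:
        return 1.0
    return ((y - lsl) / (y_max - lsl)) ** alpha

def overall_desirability(ds):
    """Geometric mean of the individual desirabilities, Equation (5.11)."""
    ds = np.asarray(ds, dtype=float)
    return float(ds.prod() ** (1.0 / len(ds)))

# Illustrative readings: yield treated as LTB, viscosity as STB
d_yield = desirability_ltb(36.0, lsl=34.0, y_max=37.0)   # 2/3
d_visc = desirability_stb(50.5, usl=52.0, y_min=49.0)    # 1/2
D = overall_desirability([d_yield, d_visc])
```

Because D is a geometric mean, a single zero desirability drives the overall desirability to zero — exactly the intended behavior, since a setting that violates any one specification is unacceptable.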
5.2.2 Taguchi's loss function approach
In Taguchi's approach, the expected quality loss of a response is computed as

l(y) = (k/n) Σᵢ₌₁ⁿ (yᵢ − T)², for NTB (5.12)

l(y) = (k/n) Σᵢ₌₁ⁿ yᵢ², for STB (5.13)

l(y) = (k/n) Σᵢ₌₁ⁿ (1/yᵢ²), for LTB (5.14)
where T is the target and k is a proportionality constant known as quality loss coeffi-
cient. For the STB-type response variable, the target is taken as zero and for the LTB-type
response variable, the target is generally taken as infinity. From Equations (5.12) to (5.14),
it is clear that the expected loss l(y) will be zero when the responses are on target and the
loss increases as the response variables move away from the respective targets. For simul-
taneous optimization of multiple responses, k is often chosen in such a way that expected
98 Advanced Mathematical Techniques in Engineering Sciences
quality loss l(y) will be equal to 1 when the response variables are either on upper or lower
specification limits. For example, for NTB response variables, k is chosen as
k = (2/(USL − LSL))² (5.15)
where USL is the upper specification limit and LSL is the lower specification limit. To
use Taguchi’s loss function approach for simultaneous optimization, the expected loss is
computed for each response variable yj, j = 1, 2, …, k using Equations (5.12)–(5.14), and the
overall expected loss L(y) is computed as the average of the individual expected losses as
shown in Equation (5.16):
L(y) = (1/k) Σⱼ₌₁ᵏ l(yⱼ) (5.16)
Finally, the values of the factors that would simultaneously optimize multiple responses
are found out by minimizing the overall expected loss L(y). Many applications of Taguchi’s
loss function approach for simultaneous optimization of multiple responses are available
in the literature (Antony, 2001; John, 2012; Lin and Lin, 2002; Nian et al., 1999).
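The loss computations of Equations (5.12)–(5.16) can be sketched as follows; the specification limits and sample data are illustrative only.

```python
import numpy as np

def loss_ntb(y, target, k):
    """Expected loss for an NTB response, Equation (5.12)."""
    y = np.asarray(y, dtype=float)
    return float(k * np.mean((y - target) ** 2))

def loss_stb(y, k):
    """Expected loss for an STB response (target zero), Equation (5.13)."""
    y = np.asarray(y, dtype=float)
    return float(k * np.mean(y ** 2))

def loss_ltb(y, k):
    """Expected loss for an LTB response (target infinity), Equation (5.14)."""
    y = np.asarray(y, dtype=float)
    return float(k * np.mean(1.0 / y ** 2))

def k_ntb(usl, lsl):
    """Loss coefficient making the loss 1 at a specification limit, Equation (5.15)."""
    return (2.0 / (usl - lsl)) ** 2

def overall_loss(losses):
    """Average of the individual expected losses, Equation (5.16)."""
    return float(np.mean(losses))

k = k_ntb(usl=52.0, lsl=48.0)                       # 0.25
l_demo = loss_ntb([50.0, 51.0, 49.0], target=50.0, k=k)
l_limit = loss_ntb([52.0], target=50.0, k=k)        # loss is exactly 1 at the USL
```

The choice of k in Equation (5.15) scales the loss so that a response sitting exactly on a specification limit incurs a loss of 1, which makes losses from different responses comparable before averaging.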
5.2.3 Fuzzy logic approach
In the fuzzy logic approach, each characteristic is converted into a membership value through the function

m(z) = (e^d − e^(dz))/(e^d − 1), if d ≠ 0 (5.17)

m(z) = 1 − z, if d = 0 (5.18)
where d is called the exponential constant, and z measures the deviation of the character-
istic y from its target value. The definitions of z for the three types of characteristics, namely NTB, STB, and LTB, are given in Equations (5.19)–(5.21):
z = (y − T)/(yU − T) or (T − y)/(T − yL), for NTB (5.19)

z = (y − ymin)/(yU − ymin), for STB (5.20)

z = (ymax − y)/(ymax − yL), for LTB (5.21)
where y is the characteristic under study, T is the specified target on y, yU and yL are the upper and lower limits of y, ymin is the best possible minimum value y can achieve in case
of the STB and ymax is the best possible maximum value y can achieve in case of the LTB.
The membership function m(z) is assigned a value of 0 whenever y is beyond the upper
or lower limit. When y = T in Equation (5.19) or y = ymin in Equation (5.20) or y = ymax in
Equation (5.21), z will be 0 and the membership function m(z) achieves the maximum value
of 1. In other words, m(z) achieves the best value of 1 when y is on target or at best possible
value. Moreover, the rate of decrease of m(z) will be high when d < 0, the rate of decrease
will be low when d > 0 and the rate of decrease will be constant when d = 0. The user can
choose the value of d based on the desired rate of decrease of m(z).
For simultaneous optimization of multiple characteristics using fuzzy logic, the mem-
bership function m(z) is computed for all the characteristics, and the optimum values of the
factors are identified by maximizing the minimum of m(z) values (Lin et al., 2000).
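A minimal sketch of the membership computation in Equations (5.17)–(5.20); the exponential constant and the limits below are illustrative.

```python
import math

def membership(z, d):
    """Fuzzy membership m(z), Equations (5.17)-(5.18); z is the scaled deviation."""
    if z < 0.0 or z > 1.0:
        return 0.0              # y beyond its limits
    if d == 0:
        return 1.0 - z          # constant rate of decrease
    return (math.exp(d) - math.exp(d * z)) / (math.exp(d) - 1.0)

def z_stb(y, y_min, y_upper):
    """Scaled deviation for an STB characteristic, Equation (5.20)."""
    return (y - y_min) / (y_upper - y_min)

m_on_target = membership(z_stb(49.0, 49.0, 52.0), d=2.0)   # y at its best value -> 1
m_at_limit = membership(z_stb(52.0, 49.0, 52.0), d=2.0)    # y at the upper limit -> 0
m_linear = membership(0.5, d=0)                            # linear decrease -> 0.5
```

As the text notes, d < 0 gives a fast initial drop in m(z), d > 0 a slow one, and d = 0 the linear case, so d acts as a tuning knob for how severely deviations from target are penalized.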
5.2.4 Dual-response surface methodology
In the dual-response surface methodology, separate second-order models are fitted for the mean and the standard deviation of the response:

ŷµ = a0 + Σᵢ₌₁ᵏ aᵢxᵢ + Σᵢ₌₁ᵏ aᵢᵢxᵢ² + Σᵢ<ⱼ aᵢⱼxᵢxⱼ (5.22)

ŷσ = b0 + Σᵢ₌₁ᵏ bᵢxᵢ + Σᵢ₌₁ᵏ bᵢᵢxᵢ² + Σᵢ<ⱼ bᵢⱼxᵢxⱼ (5.23)
where ŷµ is the estimated mean, ŷσ is the estimated standard deviation of the response
variable y and xi, i = 1, 2, …, k are the explanatory variables. Then the optimum values of
the explanatory variables that would simultaneously optimize the estimated mean and
variance of the response variable (Vining and Myers, 1990) are obtained by formulating
and solving the optimization problem of
Minimize ŷσ (5.24)

Subject to ŷµ = T (5.25)
The aforementioned optimization problem can be solved using Microsoft Excel Solver (Del
Castillo and Montgomery, 1993). The methodology can also be used for simultaneous optimization of multiple responses. For example, an STB-type response variable and an NTB-type response variable can be simultaneously optimized by solving
Minimize ŷ1 (5.26)

Subject to ŷ2 = T (5.27)
where ŷ1 is the estimated value of the STB-type response variable, ŷ2 is the estimated value of the NTB-type response variable, and T is the specified target value of ŷ2.
There are many other approaches also available for simultaneous optimization of mul-
tiple response variables, namely, grey relational analysis (Chiang and Chang, 2006), prin-
cipal component analysis (Tong et al., 2005), artificial neural networks (Noorossana et al.,
2009), genetic algorithm (Ortiz et al., 2004), etc. In this case study, the authors have used
the dual-response surface methodology for simultaneous optimization of pulp yield and
viscosity of the pulp cooking process.
Table 5.4 also shows that the pure quadratic term is not significant at the 5% level as the
corresponding p-value = 0.43478 > 0.05 (Montgomery, 2013). The ANOVA table is again
constructed by dropping the insignificant quadratic term. The modified ANOVA table of
pulp yield is given in Table 5.5.
Table 5.5 shows that the factor sulfidity (x1) and the interaction between sulfidity and black liquor (x1x2) are significant (p-value < 0.05) at the 5% level. Hence, a model
is developed for yield (y1) using sulfidity (x1) and interaction between sulfidity and black
liquor (x1x2) as explanatory variables (Draper and Smith, 2003). The coefficient table for the
pulp yield model is given in Table 5.6.
From Table 5.6, the model for pulp yield (y1) is identified as
Table 5.7 shows that the R² and adjusted R² are very high and the standard error is reasonably close to zero. Hence, it is concluded that the model is accurate. The residual plots of the pulp yield model are given in Figure 5.6.
Figure 5.6 shows that the residuals are more or less normally distributed, and there is
no trend or pattern in the plot of residuals versus fitted values or observation order. So it is
concluded that the model is adequate (Montgomery et al., 2003).
Similarly, the response variable viscosity is also subjected to ANOVA. The ANOVA
table for viscosity is given in Table 5.8.
Table 5.8 shows that the regression is significant at the 5% significance level (p-value = 0.03104 < 0.05). The p-value for interaction is 0.89732 > 0.05, indicating that the interaction is not significant. But the p-value of the pure quadratic term is 0.0927 < 0.1, indicating that the quadratic term is significant at the 10% level. So to develop a full polynomial model for
viscosity, more experiments need to be carried out at factor axial points, which would
make the study costlier. Hence, the possibility of developing a linear model for viscosity
by transforming the response is explored. A linear model is found to be the best fit model
for the logarithm of viscosity ( y 2′ ). The experimental layout with the logarithm of viscosity
( y 2′ ) is given in Table 5.9.
[Figure 5.6: Residual plots of the pulp yield model — normal probability plot, residuals versus fitted values, histogram of residuals, and residuals versus observation order.]
The logarithm of viscosity is subjected to ANOVA. The ANOVA table for the trans-
formed viscosity is given in Table 5.10.
Table 5.10 shows that neither the pure quadratic term nor the interaction is significant
(p-value > 0.05). Hence, the ANOVA table is modified by dropping the insignificant terms.
The modified ANOVA table is given in Table 5.11.
Table 5.11 shows that only the factors sulfidity (x1) and cooking time (x3) have a significant effect on the response variable. So the model is developed for the logarithm of viscosity (y2′) using sulfidity (x1) and cooking time (x3) as explanatory variables. The coefficient table of the logarithm of viscosity (y2′) model is given in Table 5.12.
From Table 5.12, the model for the logarithm of pulp viscosity (y2′) is identified as
5.4 Optimization
The optimum setting of the factors that would increase the pulp yield as much as pos-
sible without increasing the viscosity beyond 52 is identified by formulating the problem
as a constrained optimization problem (Hillier and Lieberman, 2008; Taha, 2014). Since the model is developed for the logarithm of viscosity, the problem is formulated to maximize the pulp yield (y1) subject to the constraint that the logarithm of pulp viscosity (y2′) does not exceed an upper limit. The upper limit k′ is computed as

k′ = k − 1.96s (5.30)
[Residual plots of the logarithm of viscosity model — normal probability plot, residuals versus fitted values, histogram of residuals, and residuals versus observation order.]
where k is the logarithm of 52, the upper specification limit of viscosity, and s is the standard error of the viscosity model. The upper limit k′ is taken 1.96 standard deviations below the logarithm of 52 cp. This is to ensure that even individual values of viscosity are very unlikely to fall outside the specification limit. Substituting the values of s and the logarithm of 52 in Equation (5.30), k′ becomes
−1 ≤ x1 ≤ 1 (5.34)
−1 ≤ x2 ≤ 1 (5.35)
−1 ≤ x3 ≤ 1 (5.36)
The optimization problem given in Equations (5.31)–(5.36) is solved using Microsoft Excel
Solver utility (Fylstra et al., 1999). The solution obtained is given in Table 5.14. The optimum
values of explanatory variables x1, x2, and x3 along with corresponding values of percent-
age sulfidity, percentage black liquor and cooking time are given in Table 5.14.
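For readers without Excel Solver, the structure of the constrained problem in Equations (5.31)–(5.36) can be sketched with a plain grid search over the coded factor cube; the model coefficients and standard error below are placeholders, since the actual fitted models come from Tables 5.6 and 5.12:

```python
import math

# Placeholder models -- the real coefficients come from the fitted models
def yield_model(x1, x2, x3):
    """Pulp yield y1 in terms of the coded factors (illustrative coefficients)."""
    return 36.3 + 0.25 * x1 + 0.10 * x1 * x2

def log_visc_model(x1, x2, x3):
    """Logarithm of viscosity y2' (illustrative coefficients)."""
    return 3.95 + 0.02 * x1 - 0.015 * x3

s = 0.01                                # assumed standard error of the viscosity model
k_prime = math.log(52.0) - 1.96 * s     # upper limit on y2', Equation (5.30)

# Maximize yield over the coded cube [-1, 1]^3 subject to y2' <= k'
grid = [i / 20.0 for i in range(-20, 21)]
best = None
for x1 in grid:
    for x2 in grid:
        for x3 in grid:
            if log_visc_model(x1, x2, x3) <= k_prime:   # viscosity constraint
                y1 = yield_model(x1, x2, x3)
                if best is None or y1 > best[0]:
                    best = (y1, (x1, x2, x3))
```

With these placeholder coefficients, raising sulfidity raises both yield and viscosity, so the viscosity constraint is binding at the optimum — the same trade-off the case study describes.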
Table 5.14 shows that executing the pulp cooking process with a cooking medium having sulfidity 16.28%, black liquor 10%, and a cooking time of 65 minutes would give a yield of 36.38 and a viscosity of 50.21 cp. This is well within the customer-specified upper
limit of 52 cp on viscosity. The validation of the findings of the study is given in the next
section.
5.5 Validation
The results of the study are validated by cooking 14 batches of pulp at the optimum com-
bination of factors, namely, sulfidity at 16.28%, black liquor 10%, and cooking time 65 min-
utes. The pulp yield and viscosity are measured for each batch and are given in Table 5.15.
The individual x control chart comparing the pulp yield performance before and after
the study is given in Figure 5.8 and that of pulp viscosity is given in Figure 5.9.
Figure 5.8 shows that executing the pulp cooking process with the optimum combi-
nation of factors suggested by the study would significantly improve the pulp yield. The
validation data show that, on average, the yield increased from 34% to 36.43%. Figure 5.9
shows that the optimum combination of factors reduced the mean as well as the variation in pulp viscosity.
[Figure 5.8: I chart of yield before and after the study — after-study mean X̄ = 36.431.]
[Figure 5.9: I chart of viscosity before and after the study — UCL = 51.657, X̄ = 50.141, LCL = 48.625.]
When the pulp cooking process is operating under statistical control, it is
very unlikely that the viscosity will be more than the upper specification limit of 52 cp, in
fact, the viscosity will be less than 51.65 cp. The process capability analysis results of pulp
viscosity with the validation data are given in Figure 5.10.
Figure 5.10 shows that running the cooking process with the optimum combination
of factors would improve process capability with respect to viscosity to 1.25. Hence, the
pulp manufacturing company has decided to use the optimum combination of factors sug-
gested by the study for all future batches of pulp cooking process.
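The capability index quoted from Figure 5.10 is the one-sided Ppk against the upper specification limit; a minimal sketch of the standard computation (the readings below are illustrative, not the actual validation batches):

```python
import numpy as np

def ppk_upper(data, usl):
    """One-sided Ppk against an upper specification limit: (USL - mean) / (3 s)."""
    mu = np.mean(data)
    s = np.std(data, ddof=1)   # overall (long-term) sample standard deviation
    return float((usl - mu) / (3.0 * s))

# Illustrative viscosity readings
ppk = ppk_upper([50.0, 50.3, 49.9, 50.2, 50.1], usl=52.0)
```

A Ppk above 1.0 indicates that the process mean sits more than three overall standard deviations inside the specification limit, which is why the improvement from 0.38 to 1.25 in the case study is substantial.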
5.6 Conclusion
In this chapter, the authors presented a case study on optimizing the pulp cooking pro-
cess. The cooking process is an important step in the rayon grade pulp manufacturing
process. Rayon grade pulp is the raw material for manufacturing viscose staple fiber, which in turn is used for cloth making. The challenge in improving the pulp yield of the cooking process is that doing so tends to degrade the pulp viscosity. The pulp yield needs to be improved without increasing viscosity beyond 52 cp. This is achieved by the application of dual-response surface methodology and design of experiments.
Through discussions with technical professionals, three factors, namely, percentage
of sulfidity in the cooking medium, percentage of black liquor in the cooking medium,
and cooking time are selected for the study. The pulp yield and viscosity are taken as the response variables.
[Figure 5.10: Process capability of viscosity with validation data — PPM > USL: observed 0.00, expected overall 84.82, expected within 116.88.]
Since the engineers suspected interaction between the factors, a
full factorial experiment is designed. To explore whether the relationship between the
factors and response variables is nonlinear or not, four center points are also added to
the design. The experiments are carried out as per the design and data on pulp yield and
viscosity are collected. Based on the data, models are fitted for pulp yield and logarithm
of pulp viscosity. Then the optimum combination of the factors that would maximize the
pulp yield subject to the constraint on viscosity is obtained by formulating the problem as a constrained optimization problem and solving it using the Microsoft Excel Solver utility.
Validation of the results showed that the yield of the pulp cooking process has sig-
nificantly improved from 34% to 36.43%. Moreover, the study also reduced pulp viscosity
as well as variation in pulp viscosity. The process capability index Ppk of pulp viscosity
improved to 1.25. Hence, the company decided to execute the pulp cooking process with
the optimum combination of factors suggested by the study for the upcoming batches.
Many of the modern-day processes have multiple output characteristics. The process
manager or engineer needs to find an optimum setting of process control factors that
would result in simultaneously meeting the requirements on all the output characteris-
tics. In this chapter, the authors have demonstrated the dual-response surface methodol-
ogy for the simultaneous optimization of multiple output characteristics. Even though the case study deals with optimizing the pulp cooking process to simultaneously meet the requirements on the output characteristics, namely, pulp yield and viscosity, the methodology can be used for optimizing any process. Moreover, more than
two output characteristics can also be simultaneously optimized using response surface
methodology.
References
Aggarwal, A., Singh, H., Kumar, P., and Singh, M. (2008). Optimization of multiple quality charac-
teristics for CNC turning under cryogenic cutting environment using desirability function.
Journal of Materials Processing Technology, 205(1), 42–50.
Antony, J. (2001). Simultaneous optimisation of multiple quality characteristics in manufacturing
processes using Taguchi’s quality loss function. International Journal of Advanced Manufacturing
Technology, 17(2), 134–138.
Box, G. E. P., and Draper, N. R. (2007). Response Surfaces, Mixtures and Ridge Analysis. 2nd edition,
New Jersey, NJ: John Wiley and Sons.
Candioti, L. V., De Zan, M. M., Cámara, M. S., and Goicoechea, H. C. (2014). Experimental design
and multiple response optimization. Using the desirability function in analytical methods
development. Talanta, 124, 123–138.
Chiang, K. T., and Chang, F. P. (2006). Optimization of the WEDM process of particle-reinforced
material with multiple performance characteristics using grey relational analysis. Journal of
Materials Processing Technology, 180(1), 96–101.
Del Castillo, E., and Montgomery, D. C. (1993). A nonlinear programming solution to the dual
response problem. Journal of Quality Technology, 25(3), 199–204.
Derringer, G. (1994). A balancing act: Optimizing product’s properties. Quality Progress, 27(6): 51–58.
Ding, R., Lin, D. K., and Wei, D. (2004). Dual-response surface optimization: A weighted MSE
approach. Quality Engineering, 16(3), 377–385.
Draper, N. R., and Smith, H. (2003). Applied Regression Analysis. 3rd edition, Singapore: John Wiley
and Sons (Asia) Pte Ltd.
Fylstra, D., Lasdon, L., Watson, J., and Waren, A. (1999). Design and use of the Microsoft Excel Solver.
Interfaces, 28(5): 29–55.
Harrington, E. (1965). The desirability function. Industrial Quality Control, 21(10): 494–498.
Hillier, F. S., and Lieberman, G. J. (2008). Operations Research – Concepts and Cases. 8th edition.
New Delhi: Tata McGraw-Hill Publishing Company Ltd.
Time-dependent conflicting
bifuzzy set and its applications
in reliability evaluation
Shshank Chaube
University of Petroleum and Energy Studies
S.B. Singh
G.B. Pant University of Agriculture and Technology
Sangeeta Pant
University of Petroleum and Energy Studies
Anuj Kumar
University of Petroleum and Energy Studies
Contents
6.1 Introduction......................................................................................................................... 112
6.2 Basic concept of time-dependent CBFS and some definitions..................................... 112
6.2.1 Time-dependent CBFS........................................................................................... 112
6.2.2 Normal CBFS........................................................................................................... 113
6.2.3 Convex CBFS........................................................................................................... 113
6.2.4 Conflicting bifuzzy number.................................................................................. 113
6.2.5 (α, β)-Cut of a time-dependent CBFS.................................................................... 113
6.2.6 Triangular time-dependent CBFS........................................................................ 113
6.3 Problem formulation.......................................................................................................... 113
6.4 Reliability evaluation with time-dependent CBFN....................................................... 115
6.5 Reliability evaluation of series and parallel system having
components following time-dependent conflicting bifuzzy failure rate................... 118
6.5.1 Series system........................................................................................................... 118
6.5.2 Parallel system......................................................................................................... 120
6.5.3 Parallel-series system............................................................................................. 121
6.5.4 Series-parallel system............................................................................................. 123
6.6 Examples.............................................................................................................................. 125
6.6.1 Series system........................................................................................................... 125
6.6.2 Parallel system......................................................................................................... 125
6.6.3 Parallel-series system............................................................................................. 126
6.6.4 Series-parallel system............................................................................................. 126
6.7 Conclusion........................................................................................................................... 127
References...................................................................................................................................... 127
6.1 Introduction
The conventional reliability of a system is defined as the probability that the system will
perform a predefined operation under some specified condition for a fixed time period.
Traditionally, system reliability evaluation relies on the probabilistic approach. But this approach is not always valid, since in reality the available system data often do not represent the realistic situation correctly due to the uncertainties present in them. Therefore, in many cases, reliability assessment of the system becomes a very difficult task. Hence, to evaluate the reliability of a system when the available information is uncertain, the fuzzy approach is applied. Zadeh (1965) laid the foundation for this approach with his work on fuzzy set theory, under the assumption that the nonmembership
approach by his works on fuzzy set theory with the assumption that the nonmembership
degree is equal to one minus the membership degree. Here we can consider member-
ship degree and nonmembership degree as positive and negative aspects of a situation. It
implies if the membership is correct, then the nonmembership is wrong, which is a con-
trary relation. Over this theory Atanassov (1986) introduced the concept of intuitionistic
fuzzy sets. He proposed the condition 0 ≤ µ A ( x) + ν A ( x) ≤ 1, where µ A ( x) and ν A ( x) rep-
resent the degree of membership and the degree of nonmembership, respectively. Many
researchers (Burillo & Bustinces, 1996; Li, Shan & Cheng, 2005; Supriya, Ranjit & Akhil,
2005; Gianpiero & David, 2006) have done work on intuitionistic fuzzy sets. Other theories
like L-fuzzy sets (Goguen, 1967), Ying-Yang bipolar fuzzy logic (Zhang & Zhang, 2004), soft
sets (Basu, Deb & Pattanaik, 1992), vague sets (Gau & Buehrer, 1993), and interval-valued
intuitionistic fuzzy sets (Atanassov, 1999) also were introduced to handle the uncertainty.
Then, Zamali, Lazim, and Osman (2008) introduced the concept of a conflicting bifuzzy
set (CBFS), and proposed that the sum of membership degree and nonmembership degree
can be more than one.
Several authors (Singer, 1990; Cai, Wen & Zhang, 1991; Chen, 1994, 1996; Roy et al.,
2017) proposed and developed fuzzy reliability theory. Extending these works, in this chapter conflicting bifuzzy sets are applied to fuzzy reliability theory.
In this chapter, some basic concepts of triangular CBFS are defined, and a proce-
dure using triangular CBFS is introduced to estimate the fuzzy reliability of the sys-
tem. Here, membership and nonmembership functions of fuzzy reliability of systems
are constructed by considering the failure rate of each component as a time-dependent
triangular CBFN.
A(t) = (m(t) − l(t), m(t), m(t) + n(t); m(t) − l′(t), m(t), m(t) + n′(t))
where m(t) ∈ R is the center, l(t) > 0 and n(t) > 0 are the left and right spreads of the
membership function of A(t), and, l ′(t) > 0 and n ′(t) > 0 are the left and right spreads of the
nonmembership function of A(t), at time t.
F(t) = {⟨x, µF(t)(x), νF(t)(x)⟩ : x ∈ X, t ∈ T}
It is obvious that both Fα(t) and Fβ(t) are crisp sets. Assume that F(t) is a CBFN; then, by the fuzzy-convexity property of the membership function of a CBFN, we have
where f1α(t) is an increasing and f2α(t) a decreasing function of α, while f1β(t) is a decreasing and f2β(t) an increasing function of β; α, β ∈ [0, 1].
Define a bounded, differentiable function ψ from X to Y as
ψ : X → Y such that y = ψ ( x) ∀x ∈ X
If a1α(t) and a2α(t) are invertible, then the left shape function gR(t)(y) and the right shape function hR(t)(y) are obtained as

gR(t)(y) = [a1α]⁻¹ = min u, y1 ≤ y ≤ y2 (6.6)

hR(t)(y) = [a2α]⁻¹ = max u, y2 ≤ y ≤ y3 (6.7)
From Equations (6.6) and (6.7), the membership function can be constructed as

µR(t)(y) = gR(t)(y) for y1 ≤ y ≤ y2; hR(t)(y) for y2 ≤ y ≤ y3; 0 otherwise

and the nonmembership function as

νR(t)(y) = gR(t)(y) for y1′ ≤ y ≤ y2; hR(t)(y) for y2 ≤ y ≤ y3′; 0 otherwise
R(t) = exp(−∫₀ᵗ f(k) dk), t > 0 (6.8)
F(t) = ( m(t) − l(t), m(t), m(t) + n(t); m(t) − l ′(t), m(t), m(t) + n ′(t))
\[
a_{1\alpha}(t) = \min \exp\left( -\int_0^t x(k)\, dk \right) \;\; \text{s.t. } m(t) - l(t) + \alpha l(t) \le x(t) \le m(t) + n(t) - \alpha n(t) \tag{6.9}
\]

\[
a_{2\alpha}(t) = \max \exp\left( -\int_0^t x(k)\, dk \right) \;\; \text{s.t. } m(t) - l(t) + \alpha l(t) \le x(t) \le m(t) + n(t) - \alpha n(t) \tag{6.10}
\]

\[
a_{1\beta}(t) = \min \exp\left( -\int_0^t x(k)\, dk \right) \;\; \text{s.t. } m(t) - \beta l'(t) \le x(t) \le m(t) + \beta n'(t) \tag{6.11}
\]

\[
a_{2\beta}(t) = \max \exp\left( -\int_0^t x(k)\, dk \right) \;\; \text{s.t. } m(t) - \beta l'(t) \le x(t) \le m(t) + \beta n'(t) \tag{6.12}
\]
Here, R(t) attains its extremes at the bounds. Therefore, we have
\[
a_{1\alpha}(t) = \exp\left( -\int_0^t \{ m(k) + n(k) - \alpha n(k) \}\, dk \right), \quad t > 0 \tag{6.13}
\]

\[
a_{2\alpha}(t) = \exp\left( -\int_0^t \{ m(k) - l(k) + \alpha l(k) \}\, dk \right), \quad t > 0 \tag{6.14}
\]

\[
a_{1\beta}(t) = \exp\left( -\int_0^t \{ m(k) + \beta n'(k) \}\, dk \right), \quad t > 0 \tag{6.15}
\]

\[
a_{2\beta}(t) = \exp\left( -\int_0^t \{ m(k) - \beta l'(k) \}\, dk \right), \quad t > 0 \tag{6.16}
\]
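As an illustration of how the cut bounds in Equations (6.13)–(6.16) can be evaluated for general time-dependent spreads, the following Python sketch integrates the bounding rates numerically. The function names and the trapezoidal grid are illustrative choices, not from the chapter:

```python
import math

def cut_bounds(m, l, n, l_dash, n_dash, t, alpha, beta, steps=2000):
    """Evaluate Equations (6.13)-(6.16) by trapezoidal integration on [0, t].

    m, l, n, l_dash, n_dash are callables giving the centre and the four
    spread functions of the triangular CBFN failure rate at time k.
    Returns (a1_alpha, a2_alpha, a1_beta, a2_beta).
    """
    def integral(rate):
        h = t / steps
        total = 0.5 * (rate(0.0) + rate(t))
        for s in range(1, steps):
            total += rate(s * h)
        return total * h

    a1a = math.exp(-integral(lambda k: m(k) + n(k) - alpha * n(k)))   # (6.13)
    a2a = math.exp(-integral(lambda k: m(k) - l(k) + alpha * l(k)))   # (6.14)
    a1b = math.exp(-integral(lambda k: m(k) + beta * n_dash(k)))      # (6.15)
    a2b = math.exp(-integral(lambda k: m(k) - beta * l_dash(k)))      # (6.16)
    return a1a, a2a, a1b, a2b
```

For constant rates the integrals reduce to \(m\,t\), etc., which gives a quick sanity check: at α = 1 both α-cut bounds collapse to \(\exp(-mt)\).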
By taking the inverses of Equations (6.13)–(6.16), µR(t ) and ν R(t ) can be obtained as
\[
\mu_{R(t)}(y) =
\begin{cases}
\dfrac{\ln(y) + \int_0^t (m(k) + n(k))\, dk}{\int_0^t n(k)\, dk}, & \exp\left(-\int_0^t \{m(k)+n(k)\}\, dk\right) \le y \le \exp\left(-\int_0^t m(k)\, dk\right) \\[2ex]
-\dfrac{\ln(y) + \int_0^t (m(k) - l(k))\, dk}{\int_0^t l(k)\, dk}, & \exp\left(-\int_0^t m(k)\, dk\right) \le y \le \exp\left(-\int_0^t \{m(k)-l(k)\}\, dk\right)
\end{cases} \tag{6.17}
\]
Chapter six: Time-dependent conflicting bifuzzy set and its applications 117
\[
\nu_{R(t)}(y) =
\begin{cases}
-\dfrac{\ln(y) + \int_0^t m(k)\, dk}{\int_0^t n'(k)\, dk}, & \exp\left(-\int_0^t \{m(k)+n'(k)\}\, dk\right) \le y \le \exp\left(-\int_0^t m(k)\, dk\right) \\[2ex]
\dfrac{\ln(y) + \int_0^t m(k)\, dk}{\int_0^t l'(k)\, dk}, & \exp\left(-\int_0^t m(k)\, dk\right) \le y \le \exp\left(-\int_0^t \{m(k)-l'(k)\}\, dk\right)
\end{cases} \tag{6.18}
\]
It is very clear that R(t) is a CBFN. Now we can consider the following two models:

Model 1. When the failure rate function is fixed, i.e., F(t) = F, then l(t) = l, m(t) = m, n(t) = n, l′(t) = l′, and n′(t) = n′. Now we have

\[
F_\alpha(t) = [\, m - l + \alpha l,\; m + n - \alpha n \,], \quad \forall \alpha \in [0, 1]
\]

and

\[
F_\beta(t) = [\, m - \beta l',\; m + \beta n' \,], \quad \forall \beta \in [0, 1]
\]
Since R(0) = 1 and R(∞) = 0, from Equations (6.17) and (6.18) we obtain
\[
\mu_{R(t)}(y) =
\begin{cases}
\dfrac{\ln(y) + (m+n)t}{nt}, & \exp[-(m+n)t] \le y \le \exp[-mt], \quad 0 < t < \infty \\[1.5ex]
-\dfrac{\ln(y) + (m-l)t}{lt}, & \exp[-mt] \le y \le \exp[-(m-l)t], \quad 0 < t < \infty
\end{cases} \tag{6.19}
\]
\[
\nu_{R(t)}(y) =
\begin{cases}
-\dfrac{\ln(y) + mt}{n't}, & \exp[-(m+n')t] \le y \le \exp[-mt], \quad 0 < t < \infty \\[1.5ex]
\dfrac{\ln(y) + mt}{l't}, & \exp[-mt] \le y \le \exp[-(m-l')t], \quad 0 < t < \infty
\end{cases} \tag{6.20}
\]
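For Model 1 the two grades can be evaluated directly from Equations (6.19) and (6.20). A minimal Python sketch (the function names are hypothetical; the arguments follow the notation above):

```python
import math

def membership(y, m, l, n, t):
    """Membership grade of system reliability y at time t, Equation (6.19)."""
    if math.exp(-(m + n) * t) <= y <= math.exp(-m * t):
        return (math.log(y) + (m + n) * t) / (n * t)
    if math.exp(-m * t) <= y <= math.exp(-(m - l) * t):
        return -(math.log(y) + (m - l) * t) / (l * t)
    return 0.0

def nonmembership(y, m, l_dash, n_dash, t):
    """Nonmembership grade of system reliability y at time t, Equation (6.20)."""
    if math.exp(-(m + n_dash) * t) <= y <= math.exp(-m * t):
        return -(math.log(y) + m * t) / (n_dash * t)
    if math.exp(-m * t) <= y <= math.exp(-(m - l_dash) * t):
        return (math.log(y) + m * t) / (l_dash * t)
    return 0.0
```

At the modal value \(y = \exp(-mt)\) the membership grade is 1 and the nonmembership grade is 0, as the triangular shape requires.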
Model 2. When the failure rate function is not constant, F(t) depends on m(t), l(t), n(t), l′(t), and n′(t). Let us assume that l(t) = l, n(t) = n, l′(t) = l′, and n′(t) = n′ are constants and \(m(t) = p\,e^{qt}\), where p is a positive constant. Since R(0) = 1 and R(∞) = 0, from Equations (6.17) and (6.18) we get
\[
\mu_{R(t)}(y) =
\begin{cases}
\dfrac{\ln(y) + \frac{p}{q}\left(e^{qt} - 1\right) + nt}{nt}, & \exp\left[-\frac{p}{q}\left(e^{qt}-1\right) - nt\right] \le y \le \exp\left[-\frac{p}{q}\left(e^{qt}-1\right)\right] \\[2ex]
-\dfrac{\ln(y) + \frac{p}{q}\left(e^{qt} - 1\right) - lt}{lt}, & \exp\left[-\frac{p}{q}\left(e^{qt}-1\right)\right] \le y \le \exp\left[-\frac{p}{q}\left(e^{qt}-1\right) + lt\right]
\end{cases}
\]
for 0 < t < ∞. (6.21)
\[
\nu_{R(t)}(y) =
\begin{cases}
-\dfrac{\ln(y) + \frac{p}{q}\left(e^{qt} - 1\right)}{n't}, & \exp\left[-\frac{p}{q}\left(e^{qt}-1\right) - n't\right] \le y \le \exp\left[-\frac{p}{q}\left(e^{qt}-1\right)\right] \\[2ex]
\dfrac{\ln(y) + \frac{p}{q}\left(e^{qt} - 1\right)}{l't}, & \exp\left[-\frac{p}{q}\left(e^{qt}-1\right)\right] \le y \le \exp\left[-\frac{p}{q}\left(e^{qt}-1\right) + l't\right]
\end{cases}
\]
for 0 < t < ∞. (6.22)
6.5 Reliability evaluation of series and parallel systems having components following a time-dependent conflicting bifuzzy failure rate

6.5.1 Series system
Consider a series system having j components. Let the failure rate of the ith component, \(\gamma_i(t)\), be represented as

\[
\gamma_i(t) = \big( m_i(t) - l_i(t),\; m_i(t),\; m_i(t) + n_i(t);\; m_i(t) - l_i'(t),\; m_i(t),\; m_i(t) + n_i'(t) \big)
\]

Then the reliability of the ith component is

\[
R_i(t) = \exp\left( -\int_0^t \gamma_i(k)\, dk \right); \quad i = 1, 2, \ldots, j, \; t > 0
\]
The system reliability is then

\[
R_S(t) = \exp\left( -\int_0^t \sum_{i=1}^{j} \gamma_i(k)\, dk \right) = \exp\left( -\int_0^t \gamma_S(k)\, dk \right) \tag{6.24}
\]

The α-cut of the system failure rate is

\[
\gamma_S(t, \alpha) = \left[ \sum_{i=1}^{j} \big( m_i(t) - l_i(t) + \alpha l_i(t) \big),\; \sum_{i=1}^{j} \big( m_i(t) + n_i(t) - \alpha n_i(t) \big) \right], \quad \forall \alpha \in [0, 1] \tag{6.25}
\]
Similarly,

\[
\gamma_S(t, \beta) = \left[ \sum_{i=1}^{j} \big( m_i(t) - \beta l_i'(t) \big),\; \sum_{i=1}^{j} \big( m_i(t) + \beta n_i'(t) \big) \right], \quad \forall \beta \in [0, 1] \tag{6.26}
\]
Now since \(R_S(t)\) is also a CBFN, from Equations (6.11)–(6.14) we can obtain the α-cut and β-cut of \(R_S(t)\), respectively, for the membership function and the nonmembership function as

\[
R_S(t, \alpha) = \left[ \exp\left( -\int_0^t \sum_{i=1}^{j} \big( m_i(k) + n_i(k) - \alpha n_i(k) \big)\, dk \right),\; \exp\left( -\int_0^t \sum_{i=1}^{j} \big( m_i(k) - l_i(k) + \alpha l_i(k) \big)\, dk \right) \right] \tag{6.27}
\]

\[
R_S(t, \beta) = \left[ \exp\left( -\int_0^t \sum_{i=1}^{j} \big( m_i(k) + \beta n_i'(k) \big)\, dk \right),\; \exp\left( -\int_0^t \sum_{i=1}^{j} \big( m_i(k) - \beta l_i'(k) \big)\, dk \right) \right] \tag{6.28}
\]
From Model 1, considering the failure rate as constant, the α-cut of \(R_S(t)\) for the membership function and the β-cut of \(R_S(t)\) for the nonmembership function are obtained as

\[
R_S(t, \alpha) = \left[ \exp\left( -t \sum_{i=1}^{j} (m_i + n_i - \alpha n_i) \right),\; \exp\left( -t \sum_{i=1}^{j} (m_i - l_i + \alpha l_i) \right) \right] \tag{6.29}
\]

\[
R_S(t, \beta) = \left[ \exp\left( -t \sum_{i=1}^{j} (m_i + \beta n_i') \right),\; \exp\left( -t \sum_{i=1}^{j} (m_i - \beta l_i') \right) \right] \tag{6.30}
\]
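Equations (6.29) and (6.30) are straightforward to compute for a list of components. A sketch, assuming each component is given by its constant centre and spreads (names are illustrative):

```python
import math

def series_alpha_cut(components, t, alpha):
    """alpha-cut [lower, upper] of series-system reliability, Equation (6.29).

    `components` is a list of (m, l, n) triples for the membership part of
    each component's constant conflicting bifuzzy failure rate.
    """
    lo = math.exp(-t * sum(m + n - alpha * n for m, l, n in components))
    hi = math.exp(-t * sum(m - l + alpha * l for m, l, n in components))
    return lo, hi

def series_beta_cut(components, t, beta):
    """beta-cut [lower, upper] of series-system reliability, Equation (6.30).

    `components` is a list of (m, l_dash, n_dash) triples for the
    nonmembership part of each component's failure rate.
    """
    lo = math.exp(-t * sum(m + beta * nd for m, ld, nd in components))
    hi = math.exp(-t * sum(m - beta * ld for m, ld, nd in components))
    return lo, hi
```

At α = 1 (and β = 0) both bounds collapse to the crisp series reliability \(\exp(-t \sum_i m_i)\).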
From Model 2, when the failure rate is not fixed, the α-cut and β-cut of \(R_S(t)\) for the membership function and the nonmembership function are calculated as

\[
R_S(t, \alpha) = \left[ \exp\left( -\int_0^t \sum_{i=1}^{j} \big( p_i e^{q_i k} + n_i - \alpha n_i \big)\, dk \right),\; \exp\left( -\int_0^t \sum_{i=1}^{j} \big( p_i e^{q_i k} - l_i + \alpha l_i \big)\, dk \right) \right] \tag{6.31}
\]

\[
R_S(t, \beta) = \left[ \exp\left( -\int_0^t \sum_{i=1}^{j} \big( p_i e^{q_i k} + \beta n_i' \big)\, dk \right),\; \exp\left( -\int_0^t \sum_{i=1}^{j} \big( p_i e^{q_i k} - \beta l_i' \big)\, dk \right) \right] \tag{6.32}
\]
6.5.2 Parallel system

For a parallel system of j components, the reliability \(R_P(t)\) at time t is

\[
R_P(t) = 1 - \prod_{i=1}^{j} \big( 1 - R_i(t) \big) = 1 - \prod_{i=1}^{j} \left[ 1 - \exp\left( -\int_0^t \gamma_i(k)\, dk \right) \right] \tag{6.33}
\]
Now since \(R_P(t)\) is also a CBFN, from Equations (6.11)–(6.14) the α-cut and β-cut of \(R_P(t)\) for the membership function and the nonmembership function, respectively, are obtained as

\[
R_P(t, \alpha) = \left[ 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_i(k) + n_i(k) - \alpha n_i(k))\, dk \right) \right),\; 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_i(k) - l_i(k) + \alpha l_i(k))\, dk \right) \right) \right] \tag{6.34}
\]

\[
R_P(t, \beta) = \left[ 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_i(k) + \beta n_i'(k))\, dk \right) \right),\; 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_i(k) - \beta l_i'(k))\, dk \right) \right) \right] \tag{6.35}
\]
From Model 1, if the failure rate is constant, then we have

\[
R_P(t, \alpha) = \left[ 1 - \prod_{i=1}^{j} \big( 1 - \exp\{-t(m_i + n_i - \alpha n_i)\} \big),\; 1 - \prod_{i=1}^{j} \big( 1 - \exp\{-t(m_i - l_i + \alpha l_i)\} \big) \right] \tag{6.36}
\]

\[
R_P(t, \beta) = \left[ 1 - \prod_{i=1}^{j} \big( 1 - \exp\{-t(m_i + \beta n_i')\} \big),\; 1 - \prod_{i=1}^{j} \big( 1 - \exp\{-t(m_i - \beta l_i')\} \big) \right] \tag{6.37}
\]
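The parallel-system cuts in Equations (6.36) and (6.37) can be sketched in the same way (again with illustrative names):

```python
import math

def parallel_alpha_cut(components, t, alpha):
    """alpha-cut [lower, upper] of parallel-system reliability, Equation (6.36)."""
    prod_lo = 1.0
    prod_hi = 1.0
    for m, l, n in components:
        prod_lo *= 1.0 - math.exp(-t * (m + n - alpha * n))  # pessimistic rates
        prod_hi *= 1.0 - math.exp(-t * (m - l + alpha * l))  # optimistic rates
    return 1.0 - prod_lo, 1.0 - prod_hi

def parallel_beta_cut(components, t, beta):
    """beta-cut [lower, upper] of parallel-system reliability, Equation (6.37)."""
    prod_lo = 1.0
    prod_hi = 1.0
    for m, ld, nd in components:
        prod_lo *= 1.0 - math.exp(-t * (m + beta * nd))
        prod_hi *= 1.0 - math.exp(-t * (m - beta * ld))
    return 1.0 - prod_lo, 1.0 - prod_hi
```

The larger failure rate \(m + n - \alpha n\) gives the smaller component reliability, hence the lower bound of the interval.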
Again, from Model 2, if the failure rate is not constant, then we have

\[
R_P(t, \alpha) = \left[ 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_i e^{q_i k} + n_i - \alpha n_i \big)\, dk \right) \right),\; 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_i e^{q_i k} - l_i + \alpha l_i \big)\, dk \right) \right) \right] \tag{6.38}
\]
\[
R_P(t, \beta) = \left[ 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_i e^{q_i k} + \beta n_i' \big)\, dk \right) \right),\; 1 - \prod_{i=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_i e^{q_i k} - \beta l_i' \big)\, dk \right) \right) \right] \tag{6.39}
\]
6.5.3 Parallel-series system

Consider a parallel-series system, and let the failure rate of the sth component of the rth branch be represented as

\[
\gamma_{rs}(t) = \big( m_{rs}(t) - l_{rs}(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}(t);\; m_{rs}(t) - l_{rs}'(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}'(t) \big)
\]

From Equation (6.27), the reliability of the sth component of the rth branch is given by

\[
R_{rs}(t) = \exp\left( -\int_0^t \gamma_{rs}(k)\, dk \right)
\]
It is well known that the reliability \(R_{PS}(t)\) of a parallel-series system at time t is

\[
R_{PS}(t) = 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} R_{rs}(t) \right) = 1 - \prod_{s=1}^{j} \left[ 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t \gamma_{rs}(k)\, dk \right) \right] \tag{6.40}
\]
Now since \(R_{PS}(t)\) is also a CBFN, from Equations (6.31)–(6.34), the α-cut and β-cut of \(R_{PS}(t)\), respectively, for the membership and the nonmembership functions, are obtained as

\[
R_{PS}(t, \alpha) = \left[ 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t (m_{rs}(k) + n_{rs}(k) - \alpha n_{rs}(k))\, dk \right) \right),\; 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t (m_{rs}(k) - l_{rs}(k) + \alpha l_{rs}(k))\, dk \right) \right) \right] \tag{6.41}
\]

\[
R_{PS}(t, \beta) = \left[ 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t (m_{rs}(k) + \beta n_{rs}'(k))\, dk \right) \right),\; 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t (m_{rs}(k) - \beta l_{rs}'(k))\, dk \right) \right) \right] \tag{6.42}
\]
From Model 1, if the failure rate is constant, then we have

\[
R_{PS}(t, \alpha) = \left[ 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\{-t(m_{rs} + n_{rs} - \alpha n_{rs})\} \right),\; 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\{-t(m_{rs} - l_{rs} + \alpha l_{rs})\} \right) \right] \tag{6.43}
\]

\[
R_{PS}(t, \beta) = \left[ 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\{-t(m_{rs} + \beta n_{rs}')\} \right),\; 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\{-t(m_{rs} - \beta l_{rs}')\} \right) \right] \tag{6.44}
\]
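Equation (6.43) nests the two products; a sketch for the α-cut, with each branch represented as a list of (m, l, n) triples (the data structure and names are illustrative):

```python
import math

def parallel_series_alpha_cut(branches, t, alpha):
    """alpha-cut [lower, upper] of parallel-series reliability, Equation (6.43).

    `branches` is a list of branches; each branch is a list of (m, l, n)
    triples, one per series component in that branch.
    """
    outer_lo = 1.0
    outer_hi = 1.0
    for branch in branches:
        chain_lo = 1.0   # inner product of component reliabilities (series)
        chain_hi = 1.0
        for m, l, n in branch:
            chain_lo *= math.exp(-t * (m + n - alpha * n))
            chain_hi *= math.exp(-t * (m - l + alpha * l))
        outer_lo *= 1.0 - chain_lo
        outer_hi *= 1.0 - chain_hi
    return 1.0 - outer_lo, 1.0 - outer_hi
```

With a single one-component branch this degenerates to the component reliability itself, which is a convenient check.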
Again, if the failure rate function \(\gamma_{rs}(t)\) of the sth component of the rth branch is not constant and is represented as

\[
\gamma_{rs}(t) = \big( m_{rs}(t) - l_{rs}(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}(t);\; m_{rs}(t) - l_{rs}'(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}'(t) \big)
\]

where \(l_{rs}(t) = l_{rs}\), \(n_{rs}(t) = n_{rs}\), \(l_{rs}'(t) = l_{rs}'\), and \(n_{rs}'(t) = n_{rs}'\) are constants and \(m_{rs}(t) = p_{rs} e^{q_{rs} t}\), where \(p_{rs}\) is a positive constant, then we have
\[
R_{PS}(t, \alpha) = \left[ 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} + n_{rs} - \alpha n_{rs} \big)\, dk \right) \right),\; 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} - l_{rs} + \alpha l_{rs} \big)\, dk \right) \right) \right] \tag{6.45}
\]
\[
R_{PS}(t, \beta) = \left[ 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} + \beta n_{rs}' \big)\, dk \right) \right),\; 1 - \prod_{s=1}^{j} \left( 1 - \prod_{r=1}^{i} \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} - \beta l_{rs}' \big)\, dk \right) \right) \right] \tag{6.46}
\]
6.5.4 Series-parallel system

Let the failure rate of the rth component of the sth subsystem be represented as

\[
\gamma_{rs}(t) = \big( m_{rs}(t) - l_{rs}(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}(t);\; m_{rs}(t) - l_{rs}'(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}'(t) \big)
\]

From Equation (6.10), the reliability of the rth component of the sth subsystem is

\[
R_{rs}(t) = \exp\left( -\int_0^t \gamma_{rs}(k)\, dk \right)
\]
\[
R_{SP}(t) = \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \big( 1 - R_{rs}(t) \big) \right) = \prod_{s=1}^{i} \left[ 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t \gamma_{rs}(k)\, dk \right) \right) \right] \tag{6.47}
\]
Now since \(R_{SP}(t)\) is also a CBFN, from Equations (6.11)–(6.14), the α-cut and β-cut of \(R_{SP}(t)\) for the membership function and the nonmembership function, respectively, are obtained as

\[
R_{SP}(t, \alpha) = \left[ \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_{rs}(k) + n_{rs}(k) - \alpha n_{rs}(k))\, dk \right) \right) \right),\; \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_{rs}(k) - l_{rs}(k) + \alpha l_{rs}(k))\, dk \right) \right) \right) \right] \tag{6.48}
\]

\[
R_{SP}(t, \beta) = \left[ \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_{rs}(k) + \beta n_{rs}'(k))\, dk \right) \right) \right),\; \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t (m_{rs}(k) - \beta l_{rs}'(k))\, dk \right) \right) \right) \right] \tag{6.49}
\]
From Model 1, if the failure rate is constant, then we have

\[
R_{SP}(t, \alpha) = \left[ \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \big( 1 - \exp(-t(m_{rs} + n_{rs} - \alpha n_{rs})) \big) \right),\; \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \big( 1 - \exp(-t(m_{rs} - l_{rs} + \alpha l_{rs})) \big) \right) \right] \tag{6.50}
\]

\[
R_{SP}(t, \beta) = \left[ \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \big( 1 - \exp(-t(m_{rs} + \beta n_{rs}')) \big) \right),\; \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \big( 1 - \exp(-t(m_{rs} - \beta l_{rs}')) \big) \right) \right] \tag{6.51}
\]
Again, if the failure rate function \(\gamma_{rs}(t)\) is not constant and is represented as

\[
\gamma_{rs}(t) = \big( m_{rs}(t) - l_{rs}(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}(t);\; m_{rs}(t) - l_{rs}'(t),\; m_{rs}(t),\; m_{rs}(t) + n_{rs}'(t) \big)
\]

where \(l_{rs}(t) = l_{rs}\), \(n_{rs}(t) = n_{rs}\), \(l_{rs}'(t) = l_{rs}'\), and \(n_{rs}'(t) = n_{rs}'\) are constants and \(m_{rs}(t) = p_{rs} e^{q_{rs} t}\), where \(p_{rs}\) is a positive constant, then we have
\[
R_{SP}(t, \alpha) = \left[ \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} + n_{rs} - \alpha n_{rs} \big)\, dk \right) \right) \right),\; \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} - l_{rs} + \alpha l_{rs} \big)\, dk \right) \right) \right) \right] \tag{6.52}
\]

\[
R_{SP}(t, \beta) = \left[ \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} + \beta n_{rs}' \big)\, dk \right) \right) \right),\; \prod_{s=1}^{i} \left( 1 - \prod_{r=1}^{j} \left( 1 - \exp\left( -\int_0^t \big( p_{rs} e^{q_{rs} k} - \beta l_{rs}' \big)\, dk \right) \right) \right) \right] \tag{6.53}
\]
6.6 Examples

In this section, some numerical examples are discussed to illustrate the new approach.

First, the system reliability of a power plant is calculated using Equations (6.29) and (6.30). The reliabilities are obtained as triangular CBFNs for a conflicting bifuzzy failure rate of the turbines at different values of time t; for instance, the system reliability at time t = 150 is obtained in this form.

Next, the reliability of a system is evaluated using Equations (6.38) and (6.39); the system reliability is a triangular CBFN for the conflicting bifuzzy failure rate of the components at different values of time t.

The reliability of a parallel-series system is then evaluated using Equations (6.43) and (6.44); for example, the system reliability at time t = 10 is obtained in this form.

Finally, the reliability of a series-parallel system is evaluated using Equations (6.50) and (6.51); the system reliability at t = 10 and at t = 20 is obtained in the same form.
6.7 Conclusion

In this study, a procedure is introduced to construct the membership and nonmembership functions of the fuzzy reliability function by considering the failure rates as time-dependent CBFNs. With the introduced approach, the reliability of different systems (series, parallel, parallel-series, and series-parallel) is evaluated in the form of a triangular CBFN. In all of these systems, the failure rate of each component is taken as a time-dependent triangular CBFS. Since the fuzzy set and the intuitionistic fuzzy set are special cases of the CBFS, the proposed method applies equally well to these types of sets. Hence, we can conclude that this approach can easily be applied to assess the reliability of systems whenever there is some uncertainty in the information available about the systems.
References
Atanassov K. Intuitionistic fuzzy sets. Fuzzy Sets and Systems 1986; 20(1):87–96.
Atanassov K. Intuitionistic Fuzzy Sets. New York: Physica-Verlag; 1999.
Basu K, Deb R, Pattanaik PK. Soft sets: An ordinal formulation of vagueness with some application
to the theory of choice. Fuzzy Sets and Systems 1992; 45:45–58.
Burillo P, Bustince H. Construction theorems for intuitionistic fuzzy sets. Fuzzy Sets and Systems
1996; 84:271–281.
Cai KY, Wen CY, Zhang ML. Fuzzy variables as a basis for a theory of fuzzy reliability in the
possibility context. Fuzzy Sets and Systems 1991; 42(2):145–172.
Chen SM. Fuzzy system reliability analysis using fuzzy number arithmetic operations. Fuzzy Sets
and Systems 1994; 64(1):31–38.
Chen SM. New method for fuzzy system reliability analysis. Cybernetics and Systems: An International
Journal 1996; 27:385–401.
Gau WL, Buehrer DJ. Vague sets. IEEE Transactions on Systems, Man, and Cybernetics 1993; 23:610–614.
Gianpiero C, David C. Basic intuitionistic principle in fuzzy set theories and its extension
(A terminological debate on Atanassov IFS). Fuzzy Sets and Systems 2006; 157:3198–3219.
Goguen J. L-fuzzy sets. Journal of Mathematical Analysis and Applications 1967; 18:145–174.
Li DF, Shan F, Cheng CT. On properties of four IFS operators. Fuzzy Sets and Systems 2005; 154:151–155.
Roy SK, Maity G, Weber GW, Gok SZA. Conic scalarization approach to solve multi-choice
multi-objective transportation problem with interval goal. Annals of Operations Research 2017;
253(1):599–620.
Singer D. A fuzzy set approach to fault tree and reliability analysis. Fuzzy Sets and Systems 1990;
34(2):145–155.
Supriya KD, Ranjit B, Akhil RR. Some operations on intuitionistic fuzzy sets. Fuzzy Sets and Systems
2005; 156:492–495.
Zadeh LA. Fuzzy sets. Information and Control 1965; 8(3):338–353.
Zamali T, Lazim MA, Osman MTA. An introduction to conflicting bifuzzy set theory. International
Journal of Mathematics and Statistics 2008; 3(A08):86–95.
Zhang WR, Zhang L. Yin-Yang bipolar logic and bipolar fuzzy logic. Information Sciences 2004;
165:265–287.
chapter seven

Recent progress on failure time data analysis of repairable system

Tadashi Dohi
Hiroshima University
7.1 I ntroduction......................................................................................................................... 129
7.2 Model description............................................................................................................... 131
7.3 Parametric estimation method......................................................................................... 132
7.3.1 Single failure-occurrence time data case............................................................. 133
7.3.2 Multiple failure-occurrence time data case........................................................ 134
7.4 Nonparametric estimation methods................................................................................ 135
7.4.1 Constrained nonparametric ML estimator......................................................... 135
7.4.1.1 Single failure-occurrence time data case.............................................. 135
7.4.1.2 Multiple failure-occurrence time data case.......................................... 137
7.4.2 Kernel-based approach.......................................................................................... 138
7.4.2.1 Single failure-occurrence time data case.............................................. 138
7.4.2.2 Multiple failure-occurrence time data case.......................................... 140
7.5 Numerical examples........................................................................................................... 142
7.5.1 Simulation experiments with single minimal repair data............................... 142
7.5.2 Real example with multiple minimal repair data sets...................................... 144
7.6 Conclusions.......................................................................................................................... 146
References...................................................................................................................................... 147
7.1 Introduction
In recent years, industrial systems have become larger in scale and more complex, and they play a significant role in improving the quality of our daily life. To utilize the ability of such systems, it is important to understand the behavior of their failure phenomena. Operators of repairable systems need to assess the system reliability and/or availability accurately over long periods. In addition, estimating the cost of maintaining operation of repairable systems is regarded as an important issue for practitioners. To describe the stochastic behavior of the cumulative number of failures occurring as the operating time progresses, we can apply stochastic point processes as a powerful mathematical tool. In fact, there are many research results for life data analysis of industrial systems, including production machines, which are based on stochastic modeling of time-to-failure phenomena. By modeling a conditional intensity function that
represents the rate of occurrence of failures at an arbitrary time, various types of failure phenomena can be described by stochastic point processes. These stochastic point processes can be characterized by the failure time (lifetime) distribution and/or the kind of repair operation (Ascher and Feingold 1984; Nakagawa 2005).
For non-repairable systems, the failed system or component is usually replaced by a new one, or repaired appropriately, when a failure occurs. In the situation where the replacement time can be assumed negligible with respect to the failure time scale, the time evolution can be described by a renewal process (RP) with independent and identically distributed (i.i.d.) inter-failure times (renewal distribution) (Cox 1972). We may also describe failure occurrence phenomena in repairable systems by modeling them with other representative stochastic point processes. It is well known that the nonhomogeneous Poisson process (NHPP) is the simplest but most useful tool for modeling such phenomena. In repairable systems, we perform a repair action after a failure occurs in order to return the failed component to the normal condition. Such an activity may, in some cases, restore only the damaged part of the failed component back to a working condition that is only as good as it was just before the failure. This repair action is called minimal repair. A minimal repair process with negligible repair times can be represented by an NHPP. Therefore, the analytical treatment of the minimal repair process is rather easier than that of the RP.
In this chapter we mainly focus on the failure time data analysis based on the NHPP
and discuss several statistical estimation methods for a periodic replacement problem
with minimal repair as the simplest application of life data analysis with NHPP. NHPP is
characterized by the intensity function, which represents the rate at which events occur, or
the corresponding mean value function, which is defined by the integral of the intensity
function and means the expected cumulative number of events by an arbitrary time. Two types of statistical inference approaches for the NHPP are considered, depending on whether the information on the intensity function (or, equivalently, the mean value function) is known in advance. If the form of the intensity function is known in advance, a parametric model with that parametric intensity function is usually applied; if it is unknown, nonparametric models may be applied to avoid mis-specification of the failure occurrence phenomena. In this chapter, we consider the well-known parametric model called the power law process as an example, and summarize two nonparametric methods, the constrained nonparametric maximum likelihood estimation (CNPMLE) and the kernel-based estimation, for the NHPP.
As an application example of statistical inference of failure processes, a periodic
replacement problem with minimal repair is one of the most fundamental, but most
important maintenance solutions (Barlow and Proschan 1996). The original periodic
replacement model with minimal repair has been extended from various points of view
by several authors (Boland 1982; Colosimo et al. 2010; Nakagawa 1986; Park et al. 2000; Sheu
1990, 1991; Valdez-Flores and Feldman 1989) after the seminal contribution by Barlow and
Hunter (1960). Recently, Okamura et al. (2014) developed a dynamic programming algo-
rithm to compute effectively the optimal periodic replacement time in Nakagawa (1986).
We apply not only the parametric maximum likelihood estimation for the power
law process but also CNPMLE and kernel-based estimation methods for estimating the
cost-optimal periodic replacement problem with minimal repair, where single or multiple
minimal repair data are assumed. The former means that a single time series of failure
(minimal repair) time is observed; the latter implies that the multiple time series data are
observed from multiple production machines, where the multiple data involve the single
data case as a special case. In the numerical example, we conduct a simulation experiment
of single minimal repair process and a real data analysis with the multiple field data sets
of minimal repair of diesel engine in Nelson and Doganaksoy (1989).
7.2 Model description
Suppose that more than one failure may occur in each system component of a repairable
system. Usually, two kinds of repair actions are performed to return the failed component
state to the normal condition after each failure. One is called the minimal repair (Barlow
and Proschan 1996). This repair activity restores only the damaged part of the failure com-
ponent back to a working condition that is only as good as it was just before the failure.
Another is called the periodic replacement. This is a preventive maintenance action which is
planned in advance, where the used component is replaced by a new one at a prespecified
time. For describing the failure occurrence phenomena under such repair activities, it is
well known that an NHPP is useful. That is, NHPP can be used to model the stochastic
behavior of a cumulative number of failures under the minimal repair.
More specifically, suppose that the failure time T follows an absolutely continuous probability distribution function \(\Pr(T \le t) = F(t)\) and a probability density function \(dF(t)/dt = f(t)\). Define \(\bar{F}(t) = 1 - F(t)\). If the minimal repair is made at the first failure, then the probability that the system does not fail beyond time t is given by \(\bar{F}(t) + \int_0^t \big(\bar{F}(t)/\bar{F}(x)\big)\, dF(x)\). Continuing similar manipulations up to the n-th failure yields an NHPP, \(\{N(t), t \ge 0\}\), with the mean value function (Baxter 1982):

\[
E[N(t)] = \Lambda(t) = -\log \bar{F}(t) \tag{7.1}
\]
where N(t) represents the cumulative number of minimal repairs by time t. It is well known
that the NHPP { N ( t ) , t ≥ 0} possesses the following properties:
• N(0) = 0
• {N(t), t ≥ 0} has independent increments
• Pr{N(t + Δt) − N(t) ≥ 2} = o(Δt)
• Pr{N(t + Δt) − N(t) = 1} = λ(t) Δt + o(Δt)
where o(Δt) denotes a term of higher order than Δt, and the function λ(t) is called the intensity function of the NHPP. The mean value function in Equation (7.1) is also defined as the integral of the intensity function:

\[
\Lambda(t) = \int_0^t \lambda(x)\, dx \tag{7.2}
\]

and gives the expected cumulative number of failures occurring by time t. Then, the probability mass function (p.m.f.) of the NHPP is given by

\[
\Pr\{ N(t) = n \} = \frac{\{\Lambda(t)\}^n}{n!} \exp\big(-\Lambda(t)\big), \quad n = 0, 1, 2, \ldots \tag{7.3}
\]
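Because an NHPP is fully specified by its mean value function, its failure times can be simulated by mapping a unit-rate Poisson arrival sequence through \(\Lambda^{-1}\). The following sketch assumes the power law mean value function \(\Lambda(t) = (t/\eta)^\beta\) used later in this chapter; the function name is illustrative:

```python
import random

def simulate_nhpp_power_law(eta, beta, t_end, rng):
    """Generate one NHPP failure-time path on (0, t_end] by inversion.

    If S_k = E_1 + ... + E_k are unit-rate Poisson arrivals, then
    Lambda^{-1}(S_k) are NHPP event times.  Here Lambda(t) = (t/eta)**beta,
    so Lambda^{-1}(u) = eta * u**(1/beta).
    """
    times = []
    s = 0.0
    while True:
        s += rng.expovariate(1.0)          # next unit-rate arrival
        t = eta * s ** (1.0 / beta)        # map through Lambda^{-1}
        if t > t_end:
            return times
        times.append(t)
```

Setting η = β = 1 recovers a homogeneous Poisson process with unit rate, so the expected number of events by time t is simply t.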
If the component fails before time τ (> 0), then the failed component is restored to a working condition that is only as good as it was just before the failure in the periodic replacement problem with minimal repair. Here, the minimal repair is made so that the
failure rate remains undisturbed by repair after each failure. Also, after the operational
time reaches τ, we replace the used component by a new one preventively. Therefore,
it is easier to plan the preventive replacement periodically than the age replacement, since the past replacement history need not be recorded in the periodic replacement.
However, an additional cost is necessary for the periodic preventive replacement since
the used component is replaced by a new one at time τ, where the repaired component
before time τ is also used in operation. Define the time length from the beginning of the
operation to the periodic replacement as one cycle. Then, the expected total cost for one
cycle can be represented by cm Λ (τ ) + c p , where cm (> 0) and c p (> 0) represent the fixed cost
of each minimal repair and a periodic replacement, respectively. Dividing the expected
cost for one cycle by time τ leads to the long-run average cost per unit time in the peri-
odic replacement with minimal repair:
\[
C(\tau) = \frac{c_m \Lambda(\tau) + c_p}{\tau} = \frac{c_m \int_0^\tau \lambda(t)\, dt + c_p}{\tau} \tag{7.4}
\]
where τ (> 0) is a decision variable in our problem and denotes the periodic replacement time. Then, the purpose is to derive τ* which minimizes Equation (7.4). Differentiating Equation (7.4) with respect to τ and setting it to zero gives the first-order condition of optimality:

\[
\tau \lambda(\tau) - \int_0^\tau \lambda(t)\, dt = \gamma \tag{7.5}
\]
where γ is called the cost ratio and is defined by \(\gamma = c_p/c_m\). To solve the nonlinear Equation (7.5) with respect to τ, the unknown intensity function has to be estimated in either a parametric or a nonparametric way. This is usually done based on the information on the intensity function available from the past minimal repair record or history. Under a strictly increasing intensity function, i.e., \(d\lambda(t)/dt > 0\), if

\[
\lim_{\tau \to \infty} \left[ \tau \lambda(\tau) - \int_0^\tau \lambda(t)\, dt \right] > \gamma
\]

holds, then a unique and finite optimal periodic replacement time τ* (0 < τ* < ∞) minimizing Equation (7.4) always exists.
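Equation (7.5) can be solved numerically for any strictly increasing intensity. A bisection sketch (names are illustrative; the left side of (7.5) is nondecreasing in τ for an increasing λ, so bisection on a bracketing interval is valid):

```python
def optimal_replacement_time(lam, gamma, tau_hi, steps=4000):
    """Solve Equation (7.5), tau*lam(tau) - int_0^tau lam(t)dt = gamma,
    by bisection, assuming lam is strictly increasing on (0, tau_hi].
    """
    def g(tau):
        # trapezoidal approximation of the integral of lam on [0, tau]
        h = tau / steps
        integral = 0.5 * (lam(0.0) + lam(tau))
        for i in range(1, steps):
            integral += lam(i * h)
        integral *= h
        return tau * lam(tau) - integral - gamma

    lo, hi = 1e-9, tau_hi
    if g(hi) < 0.0:
        raise ValueError("no finite optimum below tau_hi for this cost ratio")
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the power law intensity \(\lambda(t) = (\beta/\eta)(t/\eta)^{\beta-1}\), Equation (7.5) reduces to \((\beta - 1)(\tau/\eta)^\beta = \gamma\), i.e., \(\tau^* = \eta (\gamma/(\beta - 1))^{1/\beta}\), which gives a closed-form check of the solver.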
It is also known that the power law model represents two different situations in which the repairable system monotonically improves or monotonically deteriorates over time t. The intensity function λ(t) is decreasing in time under the model parameter β < 1, which means that the system is improving, whereas λ(t) is increasing in time under β > 1, which implies that the system is deteriorating. Furthermore, λ(t) is constant when β = 1, which corresponds to the special case where the underlying failure process reduces to a homogeneous Poisson process.
Once the model is selected, the next step is to estimate the model parameters (η, β) from failure-occurrence time data. Suppose that n failure-occurrence times, which are random variables, are given by \(0 < T_1 \le T_2 \le \cdots \le T_n \le T\), where T is the right-censoring time of these observations. That is, it is assumed that n failures occur by time t, which is the realization of T, and the realizations of \(T_i\) (i = 1, 2, …, n), say \(t_i\), are observed, where \(t_n \le t\).
Then the ML estimates \((\hat\eta, \hat\beta)\) are defined as the parameters that maximize the following likelihood function:

\[
LF(\eta, \beta \mid t_i) = \left[ \prod_{i=1}^{n} \lambda(t_i; \eta, \beta) \right] \exp\big( -\Lambda(t; \eta, \beta) \big) \tag{7.7}
\]

or, equivalently, its logarithm:

\[
LLF(\eta, \beta \mid t_i) = \sum_{i=1}^{n} \log \lambda(t_i; \eta, \beta) - \Lambda(t; \eta, \beta) \tag{7.8}
\]
From the first-order conditions of optimality in Equation (7.8) for each parameter, the ML estimates \((\hat\eta, \hat\beta)\) can be derived by solving the following simultaneous equations:

\[
\hat\eta = \frac{t}{n^{1/\hat\beta}} \tag{7.9}
\]

\[
\hat\beta = \frac{n}{\displaystyle \sum_{i=1}^{n} \log\!\big( t/t_i \big)} \tag{7.10}
\]
Following the above procedure, we obtain the ML-based plug-in point estimates, \(\hat\tau^*\) and \(C(\hat\tau^*)\), of the optimal periodic replacement time τ* and its associated minimum long-run average cost per unit time by substituting the resulting intensity function with the estimates \((\hat\eta, \hat\beta)\) into Equation (7.5). From the properties of the intensity function of the power law process, if the parameter β is greater than 1, then a unique and finite optimal periodic replacement time is guaranteed to exist.
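Equations (7.9) and (7.10) and the plug-in rule can be sketched as follows (function names are illustrative; the closed-form root of Equation (7.5) for the power law is used for the replacement time):

```python
import math

def power_law_mle(failure_times, t_censor):
    """ML estimates (eta_hat, beta_hat) from Equations (7.9) and (7.10)."""
    n = len(failure_times)
    beta_hat = n / sum(math.log(t_censor / ti) for ti in failure_times)
    eta_hat = t_censor / n ** (1.0 / beta_hat)
    return eta_hat, beta_hat

def plug_in_tau(eta_hat, beta_hat, gamma):
    """Plug-in optimal periodic replacement time from Equation (7.5).

    For the power law, tau*lam(tau) - Lambda(tau) = (beta-1)*(tau/eta)**beta,
    so Equation (7.5) has the closed-form root below when beta_hat > 1.
    """
    if beta_hat <= 1.0:
        raise ValueError("no finite optimum: beta_hat must exceed 1")
    return eta_hat * (gamma / (beta_hat - 1.0)) ** (1.0 / beta_hat)
```

Note that the estimates satisfy \(\Lambda(t; \hat\eta, \hat\beta) = (t/\hat\eta)^{\hat\beta} = n\): the fitted mean value function reproduces the observed number of failures at the censoring time.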
7.3.2 Multiple failure-occurrence time data case

Suppose that l sequences of failure-occurrence times are observed, where the jth sequence consists of \(n_j\) failure times \(t_{ji}\) (i = 1, 2, …, \(n_j\)) with right-censoring time \(\hat t_j\). Then the likelihood function is

\[
LF(\eta, \beta \mid t_{ji}) = \left[ \prod_{j=1}^{l} \prod_{i=1}^{n_j} \lambda(t_{ji}) \right] \exp\left( -\sum_{j=1}^{l} \int_0^{\hat t_j} \lambda(x)\, dx \right) f(\hat{t}) \tag{7.11}
\]

where \(f(\hat{t})\) is the joint density function of \(\hat T_1, \hat T_2, \ldots, \hat T_l\). Since this density function is independent of the model parameters, the ML estimates \((\hat\eta, \hat\beta)\) can be derived by maximizing the following log-likelihood function, similar to the single failure-occurrence time data case:

\[
LLF(\eta, \beta \mid t_{ji}) = \sum_{j=1}^{l} \left[ \sum_{i=1}^{n_j} \log \lambda(t_{ji}; \eta, \beta) - \Lambda(\hat t_j; \eta, \beta) \right] \tag{7.12}
\]
The first-order conditions of optimality then yield

\[
\hat\eta = \left( \frac{1}{m} \sum_{j=1}^{l} \hat t_j^{\,\hat\beta} \right)^{1/\hat\beta}, \quad m = \sum_{j=1}^{l} n_j \tag{7.13}
\]

\[
0 = \sum_{j=1}^{l} \left[ \frac{n_j}{\hat\beta} + \sum_{i=1}^{n_j} \log t_{ji} - \left( \frac{\hat t_j}{\hat\eta} \right)^{\hat\beta} \log \hat t_j \right] \tag{7.14}
\]
Unfortunately, in this case, β̂ cannot be derived in closed form. For the case of multiple minimal repair data sets, by assuming the form of the intensity function λ(t) or the corresponding mean value function Λ(t), the model parameters included in these functions can be estimated, and once the estimates of the intensity function are obtained, the optimal periodic replacement time follows from Equation (7.5). Therefore, in a similar way as with single minimal repair data, it is also easy to handle the multiple minimal repair data case.
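Equations (7.13) and (7.14) can be solved by eliminating η via (7.13) and bisecting the resulting score in β. A sketch (names are illustrative; it assumes the score crosses zero exactly once on the bracket, which holds in particular in the single-sequence case, where the solution reduces to Equations (7.9) and (7.10)):

```python
import math

def multi_sample_mle(samples, beta_lo=1e-3, beta_hi=50.0, iters=200):
    """Solve Equations (7.13)-(7.14) for (eta_hat, beta_hat) by bisection.

    `samples` is a list of (failure_times, t_censor) pairs, one per
    observed minimal repair sequence.
    """
    m = sum(len(ts) for ts, _ in samples)
    sum_logs = sum(math.log(ti) for ts, _ in samples for ti in ts)

    def score(beta):
        # Equation (7.14) with eta eliminated via Equation (7.13)
        denom = sum(tc ** beta for _, tc in samples)
        weighted = sum(tc ** beta * math.log(tc) for _, tc in samples)
        return m / beta + sum_logs - m * weighted / denom

    lo, hi = beta_lo, beta_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)
    eta_hat = (sum(tc ** beta_hat for _, tc in samples) / m) ** (1.0 / beta_hat)
    return eta_hat, beta_hat
```

With a single sequence, the score simplifies to \(n/\beta - \sum_i \log(t/t_i)\), whose root is exactly the single-sample estimate (7.10).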
Though a number of variations have been studied in the literature for the preventive
maintenance problems including the age replacement problem and the periodic replace-
ment problem, it is assumed in many cases that the failure time distribution or equivalently
minimal repair process is completely known. Therefore, once the failure time distribution
is specified, the problem is to estimate the model parameters from the underlying failure
time data, and the point estimate of the optimal preventive maintenance time is derived as
a plug-in estimate with the estimated model parameters. In other words, when the failure
time distribution is unknown in advance, the analytical models in the literature cannot
provide the optimal solution.
where \(t_0 = 0\). By plotting the n failure points and connecting them by line segments, we can obtain the resulting estimate of the mean value function in Equation (7.15). This is called the naïve estimator; it has the property that the mean square error between the cumulative number of failures and the mean value function of the NHPP is always zero. The corresponding naïve estimate of the intensity function can be defined as the slope of the mean value function in each failure time interval:

\[
\hat\lambda(t) = \frac{1}{t_i - t_{i-1}}, \quad t_{i-1} < t \le t_i; \; t_0 = 0 \tag{7.16}
\]
Note that the naïve estimator in Equation (7.16) does not generalize well: although it can fit the past observations (training data) very well, its prediction for unknown (future) data is rather poor. In addition, the naïve estimator tends to fluctuate everywhere with large noise and does not provide stable estimation results.
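The naïve estimator of Equation (7.16) is a one-liner per interval; a sketch with an illustrative name:

```python
def naive_intensity(failure_times):
    """Naive piecewise-constant intensity estimate, Equation (7.16).

    Returns a function t -> lambda_hat(t) equal to 1/(t_i - t_{i-1}) on
    each inter-failure interval (t_{i-1}, t_i], with t_0 = 0.
    """
    ts = [0.0] + sorted(failure_times)

    def lam_hat(t):
        for left, right in zip(ts, ts[1:]):
            if left < t <= right:
                return 1.0 / (right - left)
        return 0.0   # outside the observed range

    return lam_hat
```

The resulting step function jumps wildly whenever two failures happen close together, which is exactly the instability described above.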
Contrary to this, Boswell (1966) introduced the idea of isotonic estimation and gave a CNPMLE. Suppose that the intensity function λ(t) is nondecreasing with respect to time t; i.e., the mean value function is nondecreasing and convex in time. Boswell (1966) proved that the intensity function that maximizes the likelihood function of the NHPP under the nondecreasing property is given by a step function with breakpoints at the realizations \(t_i\). Therefore, we can define the likelihood function as a function of the unknown intensities at the respective time points:

\[
LF(\lambda(t_i), i = 1, 2, \ldots, n) = \exp\left( -\int_0^{\hat t} \lambda(t)\, dt \right) \prod_{i=1}^{n} \lambda(t_i) \tag{7.17}
\]
Then the nonparametric ML estimation method with Equation (7.17) is formulated as a variational problem with respect to λ(·) (Equation (7.18)), whose solution is

\[
\hat\lambda(t) =
\begin{cases}
0, & 0 \le t < t_1, \\[1ex]
\displaystyle \max_{1 \le p \le i} \; \min_{i \le q \le n} \; \frac{q - p}{t_q - t_p}, & t_i \le t < t_{i+1}; \; i = 1, 2, \ldots, n-1
\end{cases} \tag{7.19}
\]
It can easily be checked that Equation (7.19) leads to an upper bound of Equation (7.17) for an arbitrary nondecreasing intensity function, by substituting Equation (7.19) into Equation (7.17). Although the resulting estimator in Equation (7.19) is still discontinuous, it is somewhat smoother than Equation (7.16). The computational cost of the CNPMLE is quite low compared with other representative nonparametric estimation methods. We introduce the following simple algorithm for the CNPMLE of an NHPP:
Chapter seven: Recent progress on failure time data analysis of repairable system 137
• Set h = 1 and i_1 = 1.
• Repeat until i_{h+1} = n: set i_{h+1} to be the index i that minimizes the slope between (t_{i_h}, i_h − 1) and (t_i, i − 1) (i = i_h + 1, …, n), and increment h.
• The CNPMLE is then given by \hat{\lambda}(t) = (i_{j+1} − i_j)/(t_{i_{j+1}} − t_{i_j}) whenever t_{i_j} ≤ t < t_{i_{j+1}}.
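The max–min formula in Equation (7.19) can also be evaluated directly, which is often easier to verify than the incremental algorithm (a minimal sketch with made-up failure times; indices follow the 1-based notation of the text):

```python
# Sketch of the nondecreasing CNPMLE of Equation (7.19): value of the
# estimator on the interval [t_i, t_{i+1}), for 1-indexed failure times.
def cnpmle_nondecreasing(times, i):
    n = len(times)
    t = [0.0] + list(times)            # t[0] = t_0 = 0 for convenience
    best = 0.0
    for p in range(1, i + 1):          # 1 <= p <= i
        inner = min((q - p) / (t[q] - t[p])
                    for q in range(i, n + 1) if q > p)
        best = max(best, inner)
    return best

times = [1.0, 2.0, 2.5, 2.8]           # hypothetical failure times
print([round(cnpmle_nondecreasing(times, i), 3) for i in range(1, 4)])
# -> [1.0, 2.0, 3.333]
```

The returned values are nondecreasing in i, as the isotonic constraint requires.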
Since we assume at the moment that the intensity function is nondecreasing with respect
to time, the resulting CNPMLE is regarded as a specific estimator that represents an
increasing intensity trend. Therefore, if the data have such an increasing trend, then the
nondecreasing CNPMLE is expected to be useful. However, when the assumption on an
increasing intensity trend is violated, it may be less effective.
In contrast to the above discussion, we can also consider a nonincreasing intensity func-
tion λ(t). In this case, the mean value function is nondecreasing but concave in time. By
solving the variational problem in Equation (7.18) under the condition that λ(t) is non-
increasing, the corresponding CNPMLE can be derived as the following min–max solution:
\hat{\lambda}(t) = \begin{cases} \min_{1 \le p \le i} \max_{i \le q \le n} \dfrac{q - p}{t_q - t_p}, & t_{i-1} \le t \le t_i;\ i = 1, 2, \ldots, n, \\ 0, & t \ge t_n \end{cases} \qquad (7.20)
where t0 = 0. By assuming the monotone trend (nondecreasing or nonincreasing) of inten-
sity function in advance, CNPMLE can also represent the degradation or improvement of
systems appropriately.
Since the left side of the first-order condition of optimality in Equation (7.5) with the
CNPMLE is a step function with breakpoints at the realizations of the failure-occurrence
times, the resulting optimal periodic replacement time is always equal to one of the realized
failure-occurrence times. This may be a drawback in applying the CNPMLE to the periodic
replacement problem with minimal repair. If there is no failure-occurrence time datum
close to the true optimal periodic replacement time, then the resulting estimate of the
optimal periodic replacement time can never be close to the true one.
For multiple failure-occurrence time data sets, the nondecreasing CNPMLE in Equation (7.19) generalizes to

\hat{\lambda}(t) = \begin{cases} 0, & 0 \le t < t_1, \\ \max_{1 \le p \le i} \min_{i \le q \le f} \dfrac{\sum_{i=p}^{q} m_i}{\sum_{i=p}^{q} \Delta_i}, & t_i \le t \le t_{i+1};\ i = 1, 2, \ldots, s - 1 \end{cases} \qquad (7.21)
where

\Delta_i = \sum_{j=1}^{l} \left[ \min(T_j, t_{i+1}) - \min(T_j, t_i) \right] \qquad (7.22)

denotes the total observation time of all l processes in the interval (t_i, t_{i+1}], and m_i
denotes the number of failures that occurred at time t_i. Similarly, the CNPMLE under the
nonincreasing condition with multiple failure-occurrence time data sets is derived as
\hat{\lambda}(t) = \begin{cases} \min_{1 \le p \le i} \max_{i \le q \le f} \dfrac{\sum_{i=p}^{q} m_i}{\sum_{i=p}^{q} \Delta_i}, & t_{i-1} \le t \le t_i;\ i = 1, 2, \ldots, s, \\ 0, & t \ge t_s \end{cases} \qquad (7.23)
7.4.2 Kernel-based approach
We introduce an alternative nonparametric estimation technique, which is called the
kernel-based method, for the periodic replacement problem with minimal repair. The
kernel-based approach is well known to be quite useful since the convergence of non-
parametric estimators can be improved (Rinsaka and Dohi 2005). Recently, Gilardoni
and Colosimo (2011) applied a kernel-based estimation method to obtain the opti-
mal periodic replacement time with multiple minimal repair data sets. Similar to the
CNPMLE, they applied the so-called TTT method and transformed the multiple-sample
case into a single-sample one. In this section, we use the well-known Gaussian kernel function, and
apply two bandwidth estimation methods with integrated least squares error (Diggle
and Marron 1989) and log likelihood function (Guan 2007). Diggle and Marron (1989)
proved the equivalence of smoothing parameter selection between the probability den-
sity function with i.i.d. samples and the intensity function estimation with the minimal
repair data.
The kernel-based estimate of the intensity function is then given by

\hat{\lambda}(t) = \frac{1}{h} \sum_{i=1}^{n} K\!\left( \frac{t - t_i}{h} \right) \qquad (7.24)
where K(·) denotes a kernel function and h is a positive constant, called smoothing parameter or
bandwidth. Roughly speaking, the kernel-based method approximates the intensity function
with a superposition of kernel functions with location parameter at each failure-occurrence
time. Since the accuracy of the estimate of λ(t) is more sensitive to the choice of h than to the
choice of kernel function, we deal only with the well-known Gaussian kernel function:
K(t) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{t^2}{2} \right) \qquad (7.25)
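Equations (7.24) and (7.25) combine into a few lines of code (a minimal sketch; the failure times and bandwidth below are arbitrary illustrative values):

```python
import math

# Sketch of the Gaussian-kernel intensity estimate of Equations
# (7.24)-(7.25): one kernel centred at each failure time, scaled by h.
def gaussian_kernel(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def kernel_intensity(times, h, t):
    return sum(gaussian_kernel((t - ti) / h) for ti in times) / h

times = [1.0, 1.5, 3.0]   # hypothetical failure times
print(kernel_intensity(times, 0.5, 1.2))
```

Unlike the naïve estimator, this estimate is smooth in t; the smaller h is, the closer it gets to a spike train at the failure times.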
In the LSCV method, the bandwidth h is chosen to minimize the integrated squared error between the estimated and true intensity functions:

ISE(h) = \int_0^{\hat{t}} \left\{ \hat{\lambda}(t) - \lambda(t) \right\}^2 dt = \int_0^{\hat{t}} \hat{\lambda}(t)^2\,dt - 2 \int_0^{\hat{t}} \hat{\lambda}(t)\lambda(t)\,dt + \int_0^{\hat{t}} \lambda(t)^2\,dt \qquad (7.26)
where λ(t) is the “true” but unknown intensity function. After omitting the last term,
which is independent of h, and approximating the second term in Equation (7.26), it can
be checked that the optimal bandwidth minimizing ISE (h) is equal to h minimizing the
following function:
CV(h) = \int_0^{\hat{t}} \hat{\lambda}(t)^2\,dt - 2 \sum_{r=1}^{n} \hat{\lambda}_{h,r}(t_r) \qquad (7.27)
where
\hat{\lambda}_{h,r}(t) = \frac{1}{h} \sum_{i=1, i \ne r}^{n} K\!\left( \frac{t - t_i}{h} \right) \qquad (7.28)
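The cross-validation score of Equations (7.27) and (7.28) can be sketched as follows; the integral term is approximated on a midpoint grid, which is an assumption of this sketch rather than part of the method described in the text, and the failure times and candidate bandwidths are made up:

```python
import math

# Sketch of the LSCV score of Equations (7.27)-(7.28).
def _phi(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def _lam(times, h, t, skip=None):
    # Kernel intensity estimate; drop the skip-th point for leave-one-out.
    return sum(_phi((t - ti) / h)
               for r, ti in enumerate(times) if r != skip) / h

def lscv_score(times, h, t_hat, grid=400):
    dt = t_hat / grid
    integral = sum(_lam(times, h, (k + 0.5) * dt) ** 2 for k in range(grid)) * dt
    loo = sum(_lam(times, h, tr, skip=r) for r, tr in enumerate(times))
    return integral - 2.0 * loo

times = [1.0, 1.5, 3.0]   # hypothetical failure times
best_h = min([0.2, 0.4, 0.8], key=lambda h: lscv_score(times, h, 5.0))
print(best_h)
```

The optimal bandwidth is then the minimizer of `lscv_score` over a set of candidate values of h.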
In the LLCV method with the same n training data sets, we consider obtaining the optimal
bandwidth h by maximizing the log-likelihood function with an unknown intensity func-
tion. The log-likelihood function based on the cross-validation approach is given by
\ln L(h) = \sum_{k=1}^{n} \sum_{r=1}^{n} \ln \hat{\lambda}_{h,r}(t_k) - \sum_{r=1}^{n} \hat{\Lambda}_{h,r}(\hat{t}) \qquad (7.29)
where
\hat{\Lambda}_{h,r}(t) = \int_0^{t} \hat{\lambda}_{h,r}(t)\,dt \qquad (7.30)
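A corresponding sketch of the LLCV score of Equations (7.29) and (7.30); for the Gaussian kernel, the cumulative intensity Λ̂ has a closed form through the normal CDF (via `math.erf`), which this sketch exploits (failure times, censoring time, and candidate bandwidths are made-up values):

```python
import math

# Sketch of the LLCV score of Equations (7.29)-(7.30).
def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def llcv_score(times, h, t_hat):
    n = len(times)

    def lam(t, skip):  # leave-one-out Gaussian-kernel intensity estimate
        return sum(math.exp(-0.5 * ((t - ti) / h) ** 2)
                   for r, ti in enumerate(times)
                   if r != skip) / (h * math.sqrt(2.0 * math.pi))

    def Lam(skip):     # its integral over [0, t_hat], in closed form
        return sum(_norm_cdf((t_hat - ti) / h) - _norm_cdf(-ti / h)
                   for r, ti in enumerate(times) if r != skip)

    log_term = sum(math.log(lam(tk, skip=r)) for tk in times for r in range(n))
    return log_term - sum(Lam(skip=r) for r in range(n))

best_h = max([0.2, 0.4, 0.8], key=lambda h: llcv_score([1.0, 1.5, 3.0], h, 5.0))
print(best_h)
```

Here the optimal bandwidth is the maximizer of the score, in contrast to LSCV, which is minimized.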
For multiple failure-occurrence time data sets, the intensity function of the NHPP for component j (j = 1, 2, …, l) is estimated from only the jth data set:

\hat{\lambda}_j^1(t) = \frac{1}{h} \sum_{i=1}^{n_j} K\!\left( \frac{t_{ji} - t}{h} \right) \qquad (7.31)

where K(·) is defined in Equation (7.25). Then the CV(h) in Equation (7.27) with the LSCV
method can be rewritten, by means of the leave-one-out cross-validation technique, as

CV(h) = \sum_{j=1}^{l} \left\{ \int_0^{\hat{t}_j} \hat{\lambda}_j^1(t)^2\,dt - 2 \sum_{r=1}^{n_j} \hat{\lambda}_{j,h,r}^1(t_{jr}) \right\} \qquad (7.32)
where
\hat{\lambda}_{j,h,r}^1(t) = \frac{1}{h} \sum_{i=1, i \ne r}^{n_j} K\!\left( \frac{t_{ji} - t}{h} \right) \qquad (7.33)
Similarly, the log-likelihood function in Equation (7.29) with the LLCV method can be rewritten as

\ln L(h) = \sum_{j=1}^{l} \left\{ \sum_{k=1}^{n_j} \sum_{r=1}^{n_j} \ln \hat{\lambda}_{j,h,r}^1(t_{jk}) - \sum_{r=1}^{n_j} \hat{\Lambda}_{j,h,r}^1(\hat{t}_j) \right\} \qquad (7.34)
where
\hat{\Lambda}_{j,h,r}^1(t) = \int_0^{t} \hat{\lambda}_{j,h,r}^1(t)\,dt \qquad (7.35)
and λ̂^1_{j,h,r}(t) is already defined in Equation (7.33). Unfortunately, if only one failure occurred
for the jth component, then Equation (7.33) or Equation (7.35) based on leave-one-out cross-
validation cannot work. Therefore, we remove such data sets from the analysis. By minimiz-
ing Equation (7.32) or maximizing Equation (7.34), we estimate the optimal bandwidth h.
In this scheme, the intensity function of the NHPP for component j is based only on the jth set
of failure-occurrence time data, since the intensity function is defined as in Equation (7.31).
Therefore, l-different intensity functions can be derived according to the behavior of each
set of failure-occurrence time data. It may be useful to consider the arithmetic mean of all
intensity functions by taking the average of the failure-occurrence phenomena. That is,
\hat{\lambda}(t) = \frac{1}{m} \sum_{j=1}^{m} \hat{\lambda}_j^1(t) \qquad (7.36)
These bandwidth estimation methods are labeled as LLCV1 and LSCV1, respectively, in
this chapter.
The second approach is based on the idea of superposition of NHPPs (Arkin and
Leemis 1998). Reorder all the failure-occurrence time data as a single-ordered sample
(0 =) t_0 ≤ t_1 ≤ ⋯ ≤ t_{n*} < t*, where n* is the total number of failures including ties, i.e.,
n* = Σ_{j=1}^{l} n_j, and t* is the realization of the maximum of the random censoring
times, T* = max_{1≤j≤l} T_j. For the single-ordered sample, the intensity function of the
superposed NHPP is defined similarly to the single failure-occurrence time data case:
\hat{\lambda}^2(t) = \frac{1}{h} \sum_{i=1}^{n^*} K\!\left( \frac{t - t_i}{h} \right) \qquad (7.37)
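Forming the single-ordered superposed sample used in Equation (7.37) is a simple pooling step (a minimal sketch; the two data sets below are made up):

```python
# Sketch of forming the single-ordered superposed sample of
# Equation (7.37): pool all l realizations and sort, keeping ties.
def superpose(datasets):
    pooled = sorted(t for ds in datasets for t in ds)
    return pooled, len(pooled)  # (t_1 <= ... <= t_{n*}, n*)

sample, n_star = superpose([[1.0, 2.5], [0.7, 2.5, 3.1]])
print(sample, n_star)  # -> [0.7, 1.0, 2.5, 2.5, 3.1] 5
```

The pooled sample is then treated exactly like a single realization in the kernel estimator.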
where K(·) is defined in Equation (7.25). Then the CV (h) in Equation (7.27) with the LSCV
method can be rewritten with a superposed intensity function:
CV(h) = \int_0^{\hat{t}^*} \left( \hat{\lambda}^2(t) \right)^2 dt - 2 \sum_{r=1}^{n^*} \hat{\lambda}_{h,r}^2(t_r) \qquad (7.38)
where
\hat{\lambda}_{h,r}^2(t) = \frac{1}{h} \sum_{i=1, i \ne r}^{n^*} K\!\left( \frac{t - t_i}{h} \right) \qquad (7.39)
Furthermore, the ln L ( h ) in Equation (7.29) can be rewritten with the superposed intensity
function:
\ln L(h) = \sum_{k=1}^{n^*} \sum_{r=1}^{n^*} \ln \hat{\lambda}_{h,r}^2(t_k) - \sum_{r=1}^{n^*} \hat{\Lambda}_{h,r}^2(\hat{t}^*) \qquad (7.40)
where
\hat{\Lambda}_{h,r}^2(t) = \int_0^{t} \hat{\lambda}_{h,r}^2(t)\,dt \qquad (7.41)
Averaging the superposed estimate over the l processes gives

\hat{\lambda}(t) = \frac{1}{l} \frac{1}{h} \sum_{i=1}^{n^*} K\!\left( \frac{t - t_i}{h} \right) \qquad (7.42)
These bandwidth estimation methods are labeled LLCV2 and LSCV2, respectively. In practice,
the estimation results of the four methods, LLCV1, LLCV2, LSCV1, and LSCV2, differ slightly
from each other.
7.5 Numerical examples
7.5.1 Simulation experiments with single minimal repair data
We conduct the Monte Carlo simulation to investigate properties of parametric and non-
parametric methods. Here, we focus on the single minimal repair data. The (true but
unknown) minimal repair process is assumed to follow a power law NHPP model having
the model parameters (β, η) = (3.2, 0.23) of the intensity function λ(t) = (β/η)(t/η)^(β−1). By
applying the thinning algorithm of NHPP (Lewis and Shedler 1979), the original failure
(minimal repair) time data are generated as pseudo random numbers. The “real” opti-
mal periodic replacement time and its minimum long-run average cost per unit time are
calculated numerically as τ* = 0.44 and C(τ*) = 98.8, under the fixed cost ratio γ = 30.
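The thinning algorithm mentioned above can be sketched as follows for the power law intensity (a minimal sketch; the seed is arbitrary, and taking the majorizing constant at t_max assumes an increasing intensity, i.e., β > 1):

```python
import math
import random

# Sketch of Lewis-Shedler thinning for a power law NHPP with
# lambda(t) = (beta/eta) * (t/eta)**(beta - 1).
def simulate_power_law_nhpp(beta, eta, t_max, seed=1):
    rng = random.Random(seed)
    lam = lambda t: (beta / eta) * (t / eta) ** (beta - 1.0)
    lam_max = lam(t_max)                    # majorizing rate on [0, t_max]
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam_max)       # candidate from HPP(lam_max)
        if t > t_max:
            return events
        if rng.random() <= lam(t) / lam_max:  # accept w.p. lam(t)/lam_max
            events.append(t)

events = simulate_power_law_nhpp(3.2, 0.23, 1.0)
print(len(events))
```

Candidates are drawn from a homogeneous process at the majorizing rate and accepted with probability λ(t)/λ_max, so the accepted points follow the target NHPP.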
The optimal periodic replacement time and its minimum long-run average cost per
unit time are estimated with both parametric and nonparametric methods for n = 160
minimal repair time data. Here, two parametric models are assumed: the power law
(PL) NHPP model with λ(t) = (β/η)(t/η)^(β−1), and the Cox–Lewis (CL) NHPP model with
λ(t) = exp(α + βt). Without any prior knowledge of the underlying system, it is very difficult
to select the correct model (power law model) exactly. To investigate the effect of
mis-specification of the underlying failure process model, we assume the CL model. Also,
we apply the CNPMLE and the kernel-based approaches with LSCV and LLCV. In this
example, we normalize the minimal repair data in order to reduce the computation cost.
Therefore, all data t_i (i = 1, 2, …, 160) are divided by the maximum value t_160.
For four cases with n = 40, 80, 120, and 160, the estimation results of the optimal peri-
odic replacement time and the minimum long-run average cost per unit time are presented
Table 7.2 Estimation results of the minimum long-run average cost per unit time
n PL CL CNPMLE LSCV LLCV
40 99.7 98.8 98.1 102.6 98.6
80 100.5 102.0 98.1 103.0 98.4
120 100.4 104.7 98.1 101.5 98.3
160 100.5 107.9 98.1 102.5 98.2
in Tables 7.1 and 7.2. From Table 7.1, it can be seen that the three nonparametric estimation
methods give results very close to the real optimal periodic replacement time τ* = 0.44. But
the estimation results based on the mis-specification (CL model) are the worst among all cases.
In this way, the influence of mis-specification of the parametric model is significant when
exact information on the underlying failure process is not available from past expe-
rience. Focusing on Table 7.2, the kernel-based approach with LLCV can provide the best
estimation results of the minimum long-run average cost per unit time. Also, CNPMLE
shows similar estimation results, because the original minimal repair data include
a data point, 0.43, close to the optimal periodic replacement time τ* = 0.44. Of course, this is a
quite rare case. If there is no failure data near the optimal solution, CNPMLE cannot work
in the small sample problem. Furthermore, it is observed that the two parametric models and
the kernel-based approach with LSCV tend to give rather pessimistic estimates. This fact
indicates that even if the real model (power law model) is assumed, the estimation
results may be biased. This may be because the minimum long-run average cost per unit
time is very sensitive to the optimal periodic replacement time τ*. For the CL model, the
difference from the real optimal solution tends to get larger as the number of minimal
repair data increases.
Apart from the optimization, we estimate the long-run average cost per unit time at
different periodic replacement times in order to examine the estimation accuracy of our
methods. In addition to the naïve estimator, we compare the estimates of the long-run
average cost per unit time for our six methods at four arbitrary time points, t = 0.25,
0.50, 0.75, and 1.00, where the number of minimal repairs is given by n = 180. In Table 7.3,
“TRUE” represents the value calculated by the power law model with the true
parameters (β, η) = (3.2, 0.23). It can be said that a model that shows good results near
the theoretically optimal solution also provides good accuracy at other periodic replace-
ment time points. In this table, we also calculate the mean squared error between the estima-
tion results and the “true” values at 20 time points from t = 0 to t = 1.00 in steps of 0.05. In this case, the
power law model provides the best accuracy among all models. But the four
nonparametric models, especially the kernel-based approach with LLCV and the naïve
estimator, show similarly good accuracy. It is also evident that mis-specification
of the parametric model leads to the worst results.
Table 7.3 Estimation accuracy of each method for arbitrary periodic replacement time
t TRUE PL CL CNPMLE LSCV LLCV Naïve
0.25 128.8 130.0 148.5 120.7 126.0 128.0 128.1
0.5 100.5 102.5 109.8 101.5 106.2 104.0 104.3
0.75 138.8 138.8 132.4 134.8 140.0 140.0 140.7
1 216.1 210.0 210.0 209.0 200.9 209.5 210.0
MSE 0 3.53 140.8 17.8 19.4 13.4 14.1
The total number of failures is 48 in the entire data set. For this data set, we apply
one parametric model (power law model) and two nonparametric models (CNPMLE and
kernel-based approach). First, we estimate the intensity function and mean value func-
tion with each model in Figures 7.1 and 7.2. From Figure 7.1, it is seen that the intensity
function with CNPMLE is represented by a nondecreasing step function. In other models,
only the intensity function of the power law model increases as time goes by. LLCV1 and
LSCV1 show the unimodal intensity functions. LSCV2 gives the multimodal shape. LLCV2
also indicates a similar trend to the naïve estimator and fluctuates everywhere. Looking
at Figure 7.2, we can see that both the results of the power law model and CNPMLE have
convex shapes. Although the mean value function of the power law model is a smoothed
curve, CNPMLE constitutes the mean value function by several line segments. The mean
value functions with LLCV2 and LSCV2 are not smooth compared with other models.
Especially for LLCV2, this property is remarkable.
[Figure 7.2: Estimated mean value functions (cumulative number of failures versus t, t = 100–600) for CNPMLE, LLCV2, the power law model, LSCV2, LLCV1, and LSCV1.]

[Figure 7.1: Estimated intensity functions versus t (t = 100–600) for the same models.]
Table 7.6 Estimation results on the minimum long-run average cost per unit time
γ PL CNPMLE LLCV1 LLCV2 LSCV1 LSCV2
0.04 19.1 6.6 20.0 7.0 17.6 20.6
0.05 20.4 8.2 20.9 8.8 18.1 22.2
0.06 21.5 9.8 21.7 10.5 18.3 23.5
0.07 22.5 11.5 22.4 12.2 18.5 24.8
0.08 23.3 13.1 23.0 13.9 18.6 25.9
Suppose that the cost ratio γ varies from 0.04 to 0.08 by 0.01. Tables 7.4 and 7.5 sum-
marize the estimation results of the optimal periodic replacement time and the minimum
long-run average cost per unit time.
Focusing on Table 7.5, we can see that the optimal periodic replacement times with
CNPMLE and LLCV2 remain almost constant even as the cost ratio γ increases. But the
results with the other estimators increase as γ increases. In this example, LLCV2 gives
the smallest optimal periodic replacement time and LSCV1 gives
the largest optimal periodic replacement time. The optimization results with the LSCV
method are often influenced by the differences between two approaches (LSCV1, LSCV2),
compared with the LLCV method. It is also observed in Table 7.6 that the minimum long-run
average costs per unit time with the parametric power law model and with LLCV1 take
values close to each other. Furthermore, CNPMLE and LLCV2 tend to show relatively
optimistic results among all models. Conversely, LSCV2 gives the most pessimistic estima-
tion results.
7.6 Conclusions
In this chapter we focused on the failure time data analysis based on the NHPP and dis-
cussed several statistical estimation methods for a periodic replacement problem with min-
imal repair as the simplest application of life data analysis with NHPP. We have applied
not only the parametric ML estimation for the power law process but also CNPMLE and
kernel-based estimation methods for estimating the cost-optimal periodic replacement
problem with minimal repair, where single or multiple minimal repair data are assumed.
Furthermore, we conducted a simulation experiment with a single minimal repair process
and a real data analysis with multiple field data sets of minimal repairs of a diesel engine
in the numerical examples. Throughout the numerical examples, we investigated properties
of parametric and nonparametric methods and showed how to use the parametric meth-
ods and several nonparametric approaches for the periodic replacement problem with
minimal repair.
References
Arkin, B. L. and Leemis, L. M. (1998). Nonparametric estimation of the cumulative intensity function
for a nonhomogeneous Poisson process from overlapping realizations. Management Science 46,
pp. 989–998.
Arunkumar, S. (1972). Nonparametric age replacement policy. Indian Journal of Statistics Series A 34,
pp. 251–256.
Ascher, H. and Feingold, H. (1984). Repairable Systems Reliability: Modeling, Inference, Misconceptions
and Their Causes. New York, Marcel Dekker.
Barlow, R. E. and Hunter, L. C. (1960). Optimum preventive maintenance policies. Operations Research
8, pp. 90–100.
Barlow, R. E. and Proschan, F. (1996). Mathematical Theory of Reliability. Philadelphia, SIAM.
Baxter, L. A. (1982). Reliability applications of the relevation transform. Naval Research Logistics
Quarterly 29, pp. 323–330.
Bergman, B. (1979). On age replacement and the total time on test concept. Scandinavian Journal of
Statistics 6, pp. 161–168.
Boland, P. J. (1982). Periodic replacement when minimal repair costs vary with time. Naval Research
Logistics Quarterly 29, pp. 541–546.
Boswell, M. T. (1966). Estimating and testing trend in a stochastic process of Poisson type. Annals of
Mathematical Statistics 37, pp. 1564–1573.
Colosimo, E. A., Gilardoni, G. L., Santos, W. B. and Motta, S. B. (2010). Optimal maintenance time
for repairable systems under two types of failures. Communications in Statistics—Theory and
Methods 39, pp. 1289–1298.
Cox, D. R. (1972). The statistical analysis of dependencies in point processes. Stochastic Point
Processes: Statistical Analysis, Theory and Applications (Lewis, P. A. W. ed.), pp. 55–66. New
York, Wiley.
Cox, D. R. and Lewis, P. A. W. (1966). The Statistical Analysis of Series of Events. New York, Wiley.
Crow, L. H. (1974). Reliability analysis for complex repairable systems. Reliability and Biometry: Statistical
Analysis of Lifelength (Proschan, F. and Serfling, R. J. eds.), pp. 379–410. Philadelphia, SIAM.
Diggle, P. and Marron, J. S. (1989). Equivalence of smoothing parameter selectors in density and
intensity estimation. Journal of the American Statistical Association 83, pp. 793–800.
Dohi, T., Kaio, N. and Osaki, S. (2007). Optimal (T, S)-policies in a discrete-time opportunity-based age
replacement: An empirical study. International Journal of Industrial Engineering 14, pp. 340–347.
Gilardoni, G. L. and Colosimo, E. A. (2011). On the superposition of overlapping Poisson processes
and nonparametric estimation of their intensity function. Journal of Statistical Planning and
Inference 141, pp. 3075–3083.
Gilardoni, G. L., De Oliveira, M. D. and Colosimo, E. A. (2013). Nonparametric estimation and
bootstrap confidence intervals for the optimal maintenance time of a repairable system.
Computational Statistics & Data Analysis 63, pp. 113–124.
Guan, Y. (2007). A composite likelihood cross-validation approach in selecting bandwidth for the
estimation of the pair correlation function. Scandinavian Journal of Statistics 34, pp. 336–346.
Ingram, C. R. and Scheaffer, R. L. (1976). On consistent estimation of age replacement intervals.
Technometrics 18, pp. 213–219.
Lewis, P. A. W. and Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thin-
ning. Naval Research Logistics Quarterly 26, pp. 403–413.
Nakagawa, T. (1986). Periodic and sequential preventive maintenance policies. Journal of Applied
Probability 23, pp. 536–542.
Nakagawa, T. (2005). Maintenance Theory of Reliability. Berlin, Springer.
Nelson, W. and Doganaksoy, N. (1989). A computer program for an estimate and confidence limits
for the mean cumulative function for cost or number of repairs of repairable products. TIS
report 89CRD239. New York, General Electric Company Research and Development.
Okamura, H., Dohi, T. and Osaki, S. (2014). A dynamic programming approach for sequential pre-
ventive maintenance policies with two failure modes. Reliability Modeling with Applications:
Essays in Honor of Professor Toshio Nakagawa on His 70th Birthday (Nakamura, S., Qian, C. H. and
Chen, M. eds.), pp. 3–16. Singapore, World Scientific.
Park, D. H., Jung, G. M. and Yum, J. K. (2000). Cost minimization for periodic maintenance policy
of a system subject to slow degradation. Reliability Engineering & System Safety 68, pp. 105–112.
Rinsaka, K. and Dohi, T. (2005). Estimating age replacement policies from small sample data. Recent
Advances in Stochastic Operations Research (Dohi, T., Osaki, S. and Sawaki, K. eds.), pp. 145–158.
Singapore, World Scientific.
Sheu, S. H. (1990). Periodic replacement when minimal repair costs depend on the age and the num-
ber of minimal repairs for a multi-unit system. Microelectronics Reliability 30, pp. 713–718.
Sheu, S. H. (1991). A generalized block replacement policy with minimal repair and general random
repair costs for a multi-unit system. Journal of the Operational Research Society 42, pp. 331–341.
Valdez-Flores, C. and Feldman, R. M. (1989). A survey of preventive maintenance models for sto-
chastically deteriorating single-unit systems. Naval Research Logistics Quarterly 36, pp. 419–446.
Zielinski, J. M., Wolfson, D. B., Nilakantan, L. and Confavreux, C. (1993). Isotonic estimation of the
intensity of a nonhomogeneous Poisson process: The multiple realization setup. Canadian
Journal of Statistics 21, pp. 257–268.
chapter eight
Contents
8.1 Introduction......................................................................................................................... 149
8.2 YouTube view count: A twofold perspective.................................................................. 150
8.3 Literature review................................................................................................................. 152
8.4 Model development............................................................................................................ 153
8.4.1 Model I: Linear growth.......................................................................................... 155
8.4.2 Model II: Exponential growth............................................................................... 155
8.4.3 Model III: Repeat viewing..................................................................................... 156
8.5 Data analysis and model validation................................................................................. 156
8.6 Conclusion........................................................................................................................... 163
References...................................................................................................................................... 164
8.1 Introduction
Online social networks such as Facebook, Twitter, YouTube, Google+, etc., are part of our
daily life now. Social networks have emerged as platforms since they are able to bring people
of varied backgrounds and common interests to the same place. Billions of people inter-
act with each other on such platforms. Social media is also changing the way in which people create,
share and consume information (Khan and Vong 2014). Information is not just shared in
the form of text but also via images, gifs, videos, memes, etc. Social network sites are also
instrumental in creating awareness on some topics, which directly or indirectly unites
people and forces lawmakers and firms to make better decisions keeping the interests of
the user in mind.
Social networking websites are emerging as an important factor in expanding a
product's online market. There are several examples where special privileges are provided
to e-commerce customers over traditional shop customers, such as launching certain
products exclusively on e-commerce sites, namely, Flipkart, Amazon, Snapdeal, and so on.
For the first few weeks such products are sold exclusively through e-commerce. Every shop-
ping cart allows users to share the experience on social media like Facebook, YouTube,
Google+, etc., which works like word-of-mouth publicity in traditional media/markets.
Among various video sharing sites, YouTube has emerged as a leader in video sharing
platforms. It started as a website to share short entertainment videos and has grown into a
massive platform for people to connect freely. One can find videos of different genres like
music, sports, comedy, recreational activities, religious and spiritual content, educational,
etc. Since YouTube is acting more like a marketing forum, it is helping individuals as well
as large production houses to increase their audience. Nowadays every news channel,
production house, celebrity, and organization has their YouTube channels where they post
their recent activities in the form of videos to stay connected to their fans, group mem-
bers or general public. Independent content creators have built grassroots followings num-
bering in the thousands at very little cost or effort. YouTube’s revenue-sharing “Partner
Program” made it possible for people to earn a substantial living as a video producer
alone – with each of its top 500 partners earning more than $100,000 annually and its
10 highest-earning channels grossing from $2.5 to $12 million (Berg 2015).
Since YouTube provides a glimpse of the product’s success (in term of statistics), manu-
facturing companies and production houses prefer to launch previews of their products or
trailers on YouTube first. The video’s popularity would provide good business to not only
YouTube but also the product manufacturer. YouTube’s features, such as the “like,” “dislike,”
“comments,” “share,” “subscribe,” etc., go a long way in helping the customers/audiences to
express themselves. A video's numbers of likes, dislikes, comments in favor, and comments
against clearly show a product's performance and the viewers' satisfaction. It pro-
vides a clear picture as to whether the product is going to be a hit in the market or not. In the
comments section one can also find the basis on which the customers compare two or more
products. It clearly works in favor of the firm as firms get to know what to produce and what
characteristics of the product lead a customer to buy that product. Manufacturers can easily
estimate their potential buyers from available statistics and also know the “X-factor” of their
product by deeply analyzing the comment or the review videos that are again posted by
some YouTube user (uploader) on it. Therefore, it is quite clear that YouTube is not just adver-
tising the product, but it is also giving a glimpse of its future. Features provided by YouTube
help users as well as uploaders to interact better with each other.
According to an Internet survey conducted by Alexa in 2005, YouTube is the fastest
growing website and was ranked second in traffic generation among all the websites sur-
veyed (Cheng et al. 2008). Each minute 400 hours of content are uploaded and approxi-
mately 1 billion hours of content are viewed daily on YouTube (YouTube Press). Further,
Alexa categorized YouTube’s speed as “SLOW” as its average load time is 3.6 seconds and
it is slower than 69% of their surveyed websites (Cheng et al. 2008). The results are almost
similar even today. In October 2017, YouTube’s average load time was 2.38 seconds, and it
is slower than 68% of other websites.
The rest of the chapter is distributed as follows: Section 8.2 describes the twofold
perspective of YouTube view count followed by a literature review in Section 8.3. In
Section 8.4, a methodical approach to frame the view-count based models in a dynamic
environment has been proposed. Section 8.5 contains the model validation carried out on
YouTube video data sets. The conclusion and references are presented last.
8.2 YouTube view count: A twofold perspective
On the one hand, it increases the monetary gains, while on the other it further slows down the platform
due to the additional traffic generated. Therefore, understanding and predicting the view
count is a twofold perspective: one that popular content generates more traffic, and hence
understanding popularity has a direct impact on caching and replication strategy that the
provider should adopt; and the other perspective that popularity has a direct economic
impact (Richier et al. 2014).
A number of researchers have tried to understand and model the popularity and
virality pattern of YouTube using various tools and techniques (Bauckhage et al. 2015;
Richier et al. 2014; Vaish et al. 2012; Yu et al. 2015; Zhou et al. 2010). For predicting a video’s
popularity or the high view count, one must know how YouTube works and when a view
count increases. The flowchart presented in Figure 8.1 explains legitimate view-count
accountability.
It is important to predict a video’s high demand and popularity and make better deci-
sions of which videos should be cached on the limited data space of the proxy servers.
A view is a video playback that was requested by an actual user. A fake view includes
misleading views, misleading titles and thumbnails that attract views. When a video has a
large number of views that last for mere seconds after clicking, the views are not counted
as legitimate. So if a video is viewed in its entirety by someone who clicked on it, it is
counted as one view. But not all views are fully played. Google Ad Sense works only with
videos that are over 30 seconds in length so that the click-through rates get registered. In
fact, some videos are lucky enough to have just 10 seconds of play being considered as
a view. Thereby, it can be understood that the amount of video played should be above
a threshold percentage of the length of the video. The type and genre of the video also
affect video length. YouTube considers views from the same IP address in a time interval
of 6–8 hours. So one person viewing the same video repeatedly would only generate three
to five views a day, even though he/she has viewed it over 300 times. A viewer being
redirected to YouTube upon clicking an embedded video counts as one view. If there is
an embedded video with auto play, it is not counted as a view. In December 2012, 2 bil-
lion views were removed from the view counts of Universal and Sony music videos on
YouTube, provoking a claim by The Daily Dot that the views had been deleted because of
infringement of the site’s terms of service, which banned the use of automated processes
[Figure 8.1: Flowchart of legitimate view-count accounting: a video is uploaded; when a user requests it, it is served from the proxy-server cache if available, otherwise it is requested from the main server; the view count is updated on the main server.]
to inflate view counts. In another incident, on August 5, 2015, YouTube removed the feature that caused a video's view count to freeze at "301" (later "301+") until the actual count was verified to counteract view-count fraud. There may be many more restrictions and rules that go into categorizing a request as a view that have not been disclosed as of now.
A number of attributes have been examined and carried forward for analysis in terms of the number of views. The next section highlights relevant work in the related area.
8.3 Literature review
As far as prediction of view count is concerned, there are several factors that can affect the number of views gathered by a video. There are several research proposals concerning the total view count (Bauckhage et al. 2015; Richier et al. 2014; Vaish et al. 2012; Yu et al. 2015) or predicting the growth rate of the view count (Bauckhage et al. 2015; Richier et al. 2014; Yu et al. 2015). There is also literature showing how recommendation on YouTube helps increase the view count of a video (Yu et al. 2015). In some research,
different caching techniques are given to reduce the caching time. Cheng et al. (2008)
provided a glimpse of YouTube statistics. They found the active life span of the video in
terms of caching, i.e., up to what time it is required to cache the video on the proxy server.
They also showed how videos can be distributed on the basis of video category, the age of
the video, video length, video file size, and video bit rate on YouTube. According to their results, 97.9% of video lengths are within 600 seconds, and 99.1% are within 700 seconds, in their entire data set. Since 22.9% of the videos fall in the music category and 17.8% fall under the entertainment category, their segregation can be understood. If the
rate of growth of view count is very high, then the content is considered viral, i.e., a large
number of views in a very short period of time. Richier et al. (2014) fitted six bio-inspired
models in their research to find the view count dynamics of YouTube videos for a fixed
and growing population. Khan and Vong (2014) captured the effect of external influence
on the virality of content. For this they used webometrics and network diagram and
found a correlation between the various attributes of the video to the cause of virality.
They also state that the external network (other social network sites except for YouTube)
also have a good contribution in making a content viral. Zhou et al. (2010) highlighted the
importance of a recommendation system on view count of a video. They found that the
recommendation system is the cause of 30%-40% of the views of a video. Some researchers, rather, contradict the traditional definition of virality and relate it to broadcasting. Goel
et al. (2015) made an attempt to provide an insight to YouTube and found that there is a
clear difference between broadcasting and structural virality. The traditional approach
of virality largely depends on the total view count of the video, i.e., a video having a
greater view count is more viral than a video having a comparatively less view count.
Since it is very difficult to differentiate between broadcasting and virality, in this chap-
ter we have also taken a traditional definition of virality and considered the number of
views to be increasing rapidly due to virality, not because of broadcasting. As both yield a
large number of views in a short duration, there is a need to understand that content can
be said to be viral only if it is spread by the initial viewers, and Goel et al. (2015) found
that structural virality is typically low, and remains independent of size, suggesting that
popularity is largely driven by the size of the largest broadcast. Several researchers have
also studied the life cycle of videos and found various phases that occur in the lifetime of
a YouTube video. Yu et al. (2015) found phases as a description of the burst popularity of a
video and found the multiple peaks of popularity in its life cycle. They also directly relate
the phases to content type and evolution of popularity on the basis of the power law. Of
late, Aggrawal et al. (2017) have proposed a novel approach of studying the life cycle of a
YouTube video and categorization of viewers. Bauckhage et al. (2015) used a bio-inspired
model in their research to determine “how viral were viral videos.” They utilized con-
volution theory and took a joint effect of exponential infection and recovery rate in the
Markov process to find the probability density function for the epidemic model (virality
of videos). Ding et al. (2011) studied the uploaders of content, demonstrating the positive reinforcement between online social behavior and uploading behavior. They also examined whether YouTube users are truly broadcasting themselves by characterizing and classifying videos as user generated and user copied. Their results claim that most of the content on YouTube is not user generated, and
63% of the most popular uploaders are just uploading the user-copied content. The UCC
(user-copied content) uploaders upload more videos than the UGC (user-generated con-
tent). Further, Vaish et al. (2012) used different factors like share count, number of views, number of likes, and number of dislikes to calculate the virality index of a video, and provided a conventional and hybrid asset valuation technique to demonstrate how virality can fit in to provide accurate results.
In this chapter, we have tried to capture the growth pattern of view count for cer-
tain videos on YouTube. The initial stage of the life cycle of a video starts when it is posted on YouTube (i.e., people become aware of it and then diffuse the information in the Internet market). As the video becomes more popular, it attracts a larger number of views,
likes, dislikes, comments, shares, etc. This is considered as the growing phase of the video.
After attaining maximum popularity or becoming viral or being viewed by most of its
target viewers, the video is said to have matured, and its active life span is considered to
be almost over. From this time onward, the video’s growth will be very slow and steady as
compared to the earlier phase (Cheng et al. 2008). A video is never deleted from YouTube,
so there is no fixed life span, but the video's life is said to be over when the growth in the number of views is negligible over time. YouTube's revenue depends on advertisements, so
it is very important to know the right time to introduce an advertisement on a video. The
high rate of advertisement during the growth stage of the video is likely to have the maxi-
mum impact and yield high profits for the advertiser and YouTube. Therefore, our study of
the view count growth shall prove to be a helpful contribution in this area.
8.4 Model development
In the proposed modeling framework, we have captured the view-count dynamics of
YouTube viewers and have extended this framework for the dynamic Internet model. The
popularity of YouTube can be judged by the number of likes, dislikes, comments, shares,
views, etc. All the aforesaid attributes can be represented as a counting process. Out of these
we have considered view counts as a counting process in the present research. As we know from the literature (Kapur et al. 1999), a counting process {N(t), t > 0} is said to be a nonhomogeneous Poisson process with intensity function λ(t) if it satisfies the following conditions:
i. N(0) = 0
ii. {N(t), t > 0} has independent increments
iii. P{N(t + h) − N(t) ≥ 2} = o(h)
iv. P{N(t + h) − N(t) = 1} = λ(t)h + o(h)
where o(h) denotes a quantity that tends to zero faster than h as h → 0.
Let ν(t) represent the expected number of views by time t, i.e., ν(t) = ∫₀ᵗ λ(x) dx, t > 0; then it can be shown that

Pr[N(t) = k] = ((ν(t))^k e^(−ν(t)))/k!,  k = 0, 1, 2, …  (8.1)
In other words, N(t) has a Poisson distribution with mean value function ν(t). Consider
a case when the time scale of the content diffusion is very large as compared to the size
of the potential population. Hence, we model the case where contents gain popularity
through advertisement and other marketing tools: examples are when advertisement is
done for a large pool of users of a social network and netizens access the content at random
thereafter.
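The counting-process formulation can be sketched numerically. The intensity chosen below, λ(x) = ab·e^(−bx), is purely illustrative (it yields the saturating mean value function ν(t) = a(1 − e^(−bt)) of Equation (8.5)), and the parameter values are hypothetical:

```python
import math

def mean_views(lam, t, steps=10_000):
    """Numerically integrate v(t) = integral of lam over [0, t] (trapezoidal rule)."""
    h = t / steps
    ys = [lam(i * h) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def prob_k_views(lam, t, k):
    """Equation (8.1): Pr[N(t) = k] for an NHPP with intensity lam."""
    v = mean_views(lam, t)
    return v**k * math.exp(-v) / math.factorial(k)

# Hypothetical parameters: saturation level a = 1000 views, rate b = 0.3/day.
a, b = 1000.0, 0.3
lam = lambda x: a * b * math.exp(-b * x)   # gives v(t) = a(1 - e^{-bt})

v10 = mean_views(lam, 10.0)   # close to 1000(1 - e^{-3}), i.e. about 950.2
print(round(v10, 1))
```

The quadrature is deliberately simple; any standard integrator would do, since only the mean value function ν(t) is needed to evaluate the Poisson probabilities.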
Hence, we assume that the expected number of views in (t, t + Δt) is essentially propor-
tional to the expected number of views left from total expectation at time t, i.e.,
dv(t)/dt = b(a − v(t))    (8.3)
Solving Equation (8.3) under the initial condition ν(0) = ν₀, we have

ν(t) = a − (a − ν₀)e^(−bt)    (8.4)

In Equation (8.4), ν(0) is nothing but the number of view counts when the process started; it is assumed that there is already some number of views when the actual recording starts. This model is similar to that proposed by Richier et al. (2014). If in Equation (8.4) we instead assume ν(0) = 0, i.e., there are no views when the system starts, we get the expression given by Equation (8.5):

ν(t) = a(1 − e^(−bt))    (8.5)
where a is the expected number of total views that can be observed in a video.
The model explains how the rate of view count is directly linked with the leftover
views of a video. This model can act as a very strong forecasting tool to predict the level a
video can reach.
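As a sketch of how Equation (8.5) can serve as a forecasting tool, the snippet below fits ν(t) = a(1 − e^(−bt)) to a synthetic view-count series by a coarse grid search over (a, b). The data and parameter values are hypothetical; a real application would use the chapter's data sets and a proper nonlinear least-squares routine:

```python
import math

def model(t, a, b):
    # Equation (8.5): expected cumulative views at time t.
    return a * (1.0 - math.exp(-b * t))

def fit(days, views):
    """Coarse grid search minimizing the sum of squared errors."""
    best = (float("inf"), None, None)
    for a in range(int(max(views)), int(2 * max(views)), 25):
        for bi in range(1, 100):
            b = bi / 100.0
            sse = sum((v - model(t, a, b)) ** 2 for t, v in zip(days, views))
            if sse < best[0]:
                best = (sse, a, b)
    return best[1], best[2]

# Simulated observations from a = 5000, b = 0.25 (a hypothetical video).
days = list(range(1, 31))
views = [model(t, 5000, 0.25) for t in days]

a_hat, b_hat = fit(days, views)
print(a_hat, b_hat)   # recovers values near a = 5000, b = 0.25
```

Once (a, b) are estimated from the early part of a series, a itself is the forecast of the total views the video can reach, which is exactly the use suggested above.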
It should further be noted that in today's dynamic market one cannot really speak of a fixed market size. Especially in the Internet market, the market varies significantly for various reasons; for example, with an increase in the popularity of a video, the number of viewers increases and so does the number of views. To incorporate this dynamic behavior, Equation (8.3) can be redesigned as
dv(t)/dt = b(a(t) − v(t))    (8.6)
where a(t) represents dynamic Internet market size. In the present chapter we have explic-
itly taken three forms of varying market size: linear growth, exponential growth, and
growth because of repeat viewership. The various forms that a(t) can take are systemati-
cally discussed in the following sections.
Under linear market growth (Model I), a(t) = a(1 + αt), and

dv(t)/dt = b(a(1 + αt) − v(t))    (8.8)
Under exponential market growth (Model II), a(t) = ae^(αt), and

dv(t)/dt = b(ae^(αt) − v(t))    (8.10)
Under growth due to repeat viewership (Model III),

dv(t)/dt = b(a + αv(t) − v(t))    (8.12)
Using the initial condition that there are no views at the start, the functional forms of all the aforesaid models can be represented as in Table 8.1.
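Table 8.1 itself is not reproduced in this extract. Solving Equations (8.8), (8.10), and (8.12) under ν(0) = 0 by standard linear-ODE methods gives the closed forms sketched below; as a sanity check, each is compared against a direct Euler integration of its differential equation (parameter values are illustrative):

```python
import math

a, b, alpha = 2000.0, 0.2, 0.05   # illustrative parameters

# Closed-form solutions of the three ODEs under v(0) = 0 (derived here,
# not quoted from Table 8.1; verified numerically below).
def model1(t):  # dv/dt = b(a(1 + alpha*t) - v)   -- Equation (8.8)
    return a * (1 + alpha * t) - (a * alpha / b) * (1 - math.exp(-b * t)) \
           - a * math.exp(-b * t)

def model2(t):  # dv/dt = b(a*exp(alpha*t) - v)   -- Equation (8.10)
    return (a * b / (b + alpha)) * (math.exp(alpha * t) - math.exp(-b * t))

def model3(t):  # dv/dt = b(a + alpha*v - v)      -- Equation (8.12)
    return (a / (1 - alpha)) * (1 - math.exp(-b * (1 - alpha) * t))

def euler(rhs, t_end, dt=1e-4):
    """Explicit Euler integration of dv/dt = rhs(t, v), v(0) = 0."""
    v, t = 0.0, 0.0
    while t < t_end - 1e-12:
        v += dt * rhs(t, v)
        t += dt
    return v

checks = [
    (model1, lambda t, v: b * (a * (1 + alpha * t) - v)),
    (model2, lambda t, v: b * (a * math.exp(alpha * t) - v)),
    (model3, lambda t, v: b * (a + alpha * v - v)),
]
for closed, rhs in checks:
    assert abs(closed(10.0) - euler(rhs, 10.0)) < 1.0
print("closed forms match numerical integration")
```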
Table 8.2 The estimated parameters of the data sets on all three models
Parameter    DS I    DS II    DS III    DS IV    DS V    DS VI    DS VII
Parameter estimation for different dataset using Model I
a 5,499.146 4,038.262 1,402.6 17,684.89 18,936.66 2,032.839 4,609.456
b 0.279 0.549 0.284 0.189 0.36 0.216 0.144
α 0.013 0.003 0.007 0.004 0.02 0.01 0.004
Parameter estimation for different dataset using Model II
a 5,894.747 4,058.969 1,454.733 17,852.3 18,978.54 2,151.82 4,660.231
b 0.241 0.539 0.255 0.186 0.358 0.193 0.142
α 0.008 0.002 0.005 0.003 0.002 0.007 0.003
Parameter estimation for different dataset using Model III
a 8,493.813 2,782.836 1,182.168 20,123.59 1,845.306 2,251.78 4,777.419
b 0.08 0.602 0.187 0.135 0.322 0.109 0.115
α 0.062 0.376 0.362 0.014 0.909 0.259 0.12
Table 8.3 Values of comparison parameters on the proposed models for Data Set I
Comparison parameters    Model I    Model II    Model III
Bias 4.229 1.447 −88.783
Variance 250.337 249.531 749.213
RMPSE 250.372 249.535 754.455
M.S.E 62,686.291 62,267.613 569,202.180
R2 0.983 0.983 0.845
S.S.E 4,513,412.925 4,483,268.110 40,982,556.960
Table 8.4 Values of comparison parameters on the proposed models for Data Set II
Comparison parameters    Model I    Model II    Model III
Bias −3.286 −3.397 −9.024
Variance 103.677 106.259 205.414
RMPSE 103.729 106.313 205.612
M.S.E 10,759.629 11,302.558 42,276.422
R2 0.944 0.941 0.780
S.S.E 774,693.266 813,784.143 3,043,902.371
Table 8.5 Values of comparison parameters on the proposed models for Data Set III
Comparison parameters    Model I    Model II    Model III
Bias −5.525 −6.231 −17.574
Variance 76.304 79.675 130.630
RMPSE 76.504 79.919 131.806
M.S.E 5,852.879 6,386.973 17,372.916
R2 0.938 0.933 0.817
S.S.E 421,407.265 459,862.083 1,250,849.931
Table 8.6 Values of comparison parameters on the proposed models for Data Set IV
Comparison parameters    Model I    Model II    Model III
Bias −33.935 −35.183 −82.531
Variance 363.805 371.675 763.370
RMPSE 365.384 373.336 767.818
M.S.E 133,505.451 139,379.865 589,544.964
R2 0.990 0.989 0.955
S.S.E 9,211,876.145 9,617,210.648 40,678,602.510
Table 8.7 Values of comparison parameters on the proposed models for Data Set V
Comparison parameters    Model I    Model II    Model III
Bias 20.157 19.929 21.208
Variance 133.237 132.567 145.133
RMPSE 134.753 134.056 146.674
M.S.E 18,158.358 17,971.066 21,513.276
R2 0.957 0.957 0.949
S.S.E 1,307,401.808 1,293,916.786 1,548,955.894
Table 8.8 Values of comparison parameters on the proposed models for Data Set VI
Comparison parameters    Model I    Model II    Model III
Bias −4.873 −5.109 −20.301
Variance 347.482 350.105 642.360
RMPSE 347.516 350.142 642.680
M.S.E 120,767.653 122,599.623 413,038.037
R2 0.980 0.980 0.932
S.S.E 8,695,271.028 8,827,172.856 29,738,738.660
Table 8.9 Values of comparison parameters on the proposed models for Data Set VII
Comparison parameters    Model I    Model II    Model III
Bias −2.323 −3.116 −26.352
Variance 56.387 66.752 191.522
RMPSE 56.435 66.825 193.326
M.S.E 3,184.932 4,465.605 37,374.959
R2 0.992 0.989 0.904
S.S.E 226,130.190 317,057.955 2,653,622.061
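The comparison parameters reported in Tables 8.3 through 8.9 can be computed from an actual and a predicted view-count series as sketched below. The exact formulas are not restated in this extract, so standard definitions are assumed; in particular, RMPSE is computed as the square root of bias² plus variance², which is consistent with the tabulated values.

```python
import math

def comparison_parameters(actual, predicted):
    """Comparison criteria as in Tables 8.3-8.9. Definitions are assumed,
    not quoted from the chapter: 'variance' is taken as the standard
    deviation of the prediction errors, so that RMPSE^2 = bias^2 + variance^2."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    bias = sum(errors) / n
    variance = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    rmpse = math.sqrt(bias ** 2 + variance ** 2)
    sse = sum(e ** 2 for e in errors)
    mse = sse / n
    mean_a = sum(actual) / n
    r2 = 1.0 - sse / sum((a - mean_a) ** 2 for a in actual)
    return {"bias": bias, "variance": variance, "rmpse": rmpse,
            "mse": mse, "r2": r2, "sse": sse}

# Toy series (hypothetical, in thousands of views).
actual = [10.0, 25.0, 42.0, 55.0, 70.0]
predicted = [12.0, 24.0, 40.0, 57.0, 69.0]
stats = comparison_parameters(actual, predicted)
print({k: round(v, 3) for k, v in stats.items()})
```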
The seven data sets have been graphically analyzed as shown in Figures 8.2–8.8. The
plots show the actual data and the estimated values from the three models (Model I, Model
II, and Model III).
In each figure, the view count (in thousands) is plotted against time (in days). All three models give comparably good results; moreover, looking at the graphs, it is very difficult to determine which model is performing the best. To resolve this, we used the weighted criteria approach given by Anand et al. (2014). The weighted criteria approach is a ranking tool that helps us determine the best fit among the various models on the basis of the comparison parameters for each data set.
The algorithm of the approach is as follows:
• In the criteria value matrix, each element aij shows the value of the jth criterion for the ith model.
• For each criterion, compute the attribute values, i.e., the maximum and the minimum value of that criterion.
• The criteria ratings are determined as follows:
Case 1. When a smaller value of the criterion represents a better fit to the actual data, i.e., the best value, then
Wij = 1 − Xij    (8.15)
• The weighted criteria value matrix is computed as the product of the weight of each criterion with the criteria value; the permanent value of each model is then

Zi = Σ(j=1 to m) Aij / Σ(j=1 to m) Wij    (8.17)
The models are ranked on the basis of the expression obtained in Equation (8.17) (i.e., on the permanent value). A smaller permanent value of a model represents a better rank than a bigger permanent value. So all permanent values are compared, and a rank is assigned to each model. The analysis of DS-I is shown in Tables 8.10–8.13, and the algorithm can be carried out for the rest of the data sets.
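A minimal sketch of the weighted criteria ranking described above, under stated assumptions: Xij is taken as the min-max normalized criterion value and Aij = Wij·Xij, since Equations (8.13), (8.14), and (8.16) are not reproduced in this extract.

```python
def rank_models(values, smaller_is_better):
    """values[i][j] is the j-th criterion value of the i-th model.
    Returns 1-based ranks (1 = best). Sketch: X_ij is the min-max
    normalized criterion value (an assumption), the rating is
    W_ij = 1 - X_ij when smaller is better (Equation (8.15)), and the
    permanent value is Z_i = sum(A_ij) / sum(W_ij) with A_ij = W_ij * X_ij."""
    m, n = len(values), len(values[0])
    cols = list(zip(*values))
    Z = []
    for i in range(m):
        num = den = 0.0
        for j in range(n):
            lo, hi = min(cols[j]), max(cols[j])
            x = 0.0 if hi == lo else (values[i][j] - lo) / (hi - lo)
            w = 1.0 - x if smaller_is_better[j] else x
            num += w * x
            den += w
        # A model that is worst on every criterion gets zero total weight;
        # assign it the worst possible permanent value instead.
        Z.append(num / den if den > 0.0 else 1.0)
    order = sorted(range(m), key=lambda i: Z[i])  # smaller Z ranks better
    ranks = [0] * m
    for pos, i in enumerate(order):
        ranks[i] = pos + 1
    return ranks

# Toy criteria matrix: rows = Models I-III, columns = (RMPSE, M.S.E),
# both "smaller is better" (values loosely echo the scale of Table 8.3).
print(rank_models([[250.4, 62686.3], [249.5, 62267.6], [754.5, 569202.2]],
                  [True, True]))   # → [2, 1, 3]
```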
In Data Set I, Model II is performing best. We found that the best performing model
differs from data set to data set. The ranking of each model for different data sets is shown
in Table 8.14.
Table 8.14 shows the models and their ranks corresponding to various data sets under
consideration. We found that Model I (linear increment) is performing best in five data sets
(DS II, DS III, DS V, DS VI, and DS VII), and Model II (exponential increment) is performing
best on two data sets (DS I and DS IV). The relevance of studying these scenarios lies in the
fact that the first two models, Model I and Model II, can be understood in terms of virality
of the video and with what rate the videos are becoming viral. The third model (i.e., Model
III) helps to model a real-life scenario where a video might be getting some repeat views, thereby increasing the view count by the permissible amount. Practically, all of these scenarios can exist in the market, and the analysis is clearly able to represent that.
8.6 Conclusion
The assumption of a fixed population is often a reasonable approximation when the popularity of content rises quickly and then dies out within a short span of time. In this chapter, we considered the case in which the growth of the Internet market and the view-count dynamics of content are intricately linked. To capture such a dependence, different growth scenarios have been considered. Further, the models have been ranked based on a weighted criteria technique to determine which type of video undergoes which type of growth.
In the future, it would be interesting to examine how the concept works in an environment of irregular fluctuations in the diffusion rate, and to study its connectivity based on attributes other than view count.
References
Anand, Adarsh, Parmod Kumar Kapur, Mohini Agarwal, and Deepti Aggrawal. “Generalized
innovation diffusion modeling & weighted criteria based ranking.” In Reliability, Infocom
Technologies and Optimization (ICRITO) (Trends and Future Directions), 2014 3rd International
Conference on, pp. 1–6. IEEE, 2014.
Aggrawal, Niyati, Anuja Arora, and Adarsh Anand. “Modelling and characterizing viewers of
YouTube videos.” International Journal of System Assurance Engineering and Management, 2018.
doi:10.1007/s13198-018-0700-6.
Bauckhage, Christian, Fabian Hadiji, and Kristian Kersting. “How viral are viral videos?” In
ICWSM, pp. 22–30. 2015.
Berg, Madeline. "The World's Top-Earning YouTube Stars 2015." Forbes, November 2015.
Cheng, Xu, Cameron Dale, and Jiangchuan Liu. “Statistics and social network of YouTube videos.”
In Quality of Service, 2008. IWQoS 2008. 16th International Workshop on, pp. 229–238. IEEE, 2008.
Ding, Yuan, Yuan Du, Yingkai Hu, Zhengye Liu, Luqin Wang, Keith Ross, and Anindya Ghose.
“Broadcast yourself: Understanding YouTube uploaders.” In Proceedings of the 2011 ACM
SIGCOMM Internet Measurement Conference, pp. 361–370. ACM, 2011.
Khan, Gohar Feroz, and Sokha Vong. “Virality over YouTube: An empirical analysis.” Internet
Research 24, no. 5 (2014): 629–647.
Goel, Sharad, Ashton Anderson, Jake Hofman, and Duncan J. Watts. “The structural virality of
online diffusion.” Management Science 62, no. 1 (2015): 180–196.
Kapur, Parmod Kumar., R. B. Garg, and Santosh Kumar. Contributions to Hardware and Software
Reliability. Vol. 3. World Scientific, Singapore, 1999.
Richier, Cédric, Eitan Altman, Rachid Elazouzi, Tania Altman, Georges Linares, and Yonathan
Portilla. “Modelling view-count dynamics in YouTube.” arXiv preprint arXiv:1404.2570 (2014).
Vaish, Abhishek, Rajiv Krishna, Akshay Saxena, Mahalingam Dharmaprakash, and Utkarsh Goel.
“Quantifying virality of information in online social networks.” International Journal of Virtual
Communities and Social Networking 4, no. 1 (2012): 32–45.
YouTube Press [https://www.youtube.com/yt/about/press/]
Yu, Honglin, Lexing Xie, and Scott Sanner. “The lifecycle of a YouTube video: Phases, content and
popularity.” In ICWSM, pp. 533–542. 2015.
Zhou, Renjie, Samamon Khemmarat, and Lixin Gao. “The impact of YouTube recommendation
system on video views.” In Proceedings of the 10th ACM SIGCOMM Conference on Internet
Measurement, pp. 404–410. ACM, 2010.
Chapter nine
Market segmentation-based modeling
Contents
9.1 Introduction......................................................................................................................... 165
9.2 Mathematical modeling..................................................................................................... 167
9.2.1 Early market adoption model............................................................................... 169
9.2.2 Main market adoption model............................................................................... 169
9.2.3 Total adoption modeling........................................................................................ 171
9.3 Parameter estimation......................................................................................................... 171
9.4 Discussion and summary.................................................................................................. 175
References...................................................................................................................................... 176
9.1 Introduction
The diffusion of innovation is an essential topic of research in the field of marketing man-
agement. Since the 1960s, plenty of innovation diffusion models have been introduced
to study the diffusion process of a product. A plethora of diffusion models based on the
highly pertinent work of Bass (1969) are available in literature. Bass (1969) has contrib-
uted greatly to the understanding of a variety of diffusion models. The simple structure
of the Bass model has led to its higher number of applications over the last few decades.
The Bass model perceives that an innovation spreads throughout the market by two main
channels: mass media (external influence) and word of mouth (internal influence). His
model assumes the nature of consumers to be homogenous with respect to their response
behavior (Agliari et al. 2009). The inexorable expansion of the market forces the research-
ers to explore alternative diffusion models with high explanatory power. The variation in
customers’ buying behavior requires a renewed focus toward the segmented market struc-
ture, which directly affects the expected profit of the firm (Wedel and Kamakura 2012).
In today’s era of competition to build long-lasting relations and gain trust with consum-
ers, it becomes mandatory for management to take into account different characteristics
and adoption behaviors of customers in various segments of markets. Hence, it becomes
vital for marketers to understand the concept of multisegmented marketing (Singh et al.
2015). However, the launch of a new product would raise awareness about its usage and
may trigger demand among their potential customers (Aggrawal et al. 2014). Furthermore,
the availability of the product/service centers of the technological products would also
impact the number of adopters of the product. Due to the different adoption behavior, the
studies suggest the presence of a dual market: an “early” market corresponding to the high
needs and less price sensitivity and a “main” market corresponding to the relatively less
needs and high price sensitivity.
Main market adopters are different from early market adopters. Recent literature sug-
gests that, at least regarding high-tech products, main market adopters are not opinion
generators; moreover, they do not influence the potential customers of the product (Moore
1991, 1995). In addition, industry studies ascertain different motives for adoption of inno-
vative products among early and main market consumers. Early adopters are mainly
technophiles attracted to a product for its competitive edge over similar products in the
segment; main market consumers are primarily more interested in the product’s enduring
functions (Goldenberg et al. 2002).
Existing dual-market models tend to overlook price sensitivity when it comes to con-
sidering adoption behaviors of early and main market adopters. Early market adopters are
higher risk takers as they endorse a product in spite of unpredictability and possible mis-
givings/imperfections at the initial stage of introduction of the product (Kim et al. 2014).
In comparison, main market adopters are more calculative and are rationalists who weigh
out the benefits offered by the product in the given price bracket before they make the final
purchasing decision (Rogers 1995). This reasoning accounts for the late entrance of main market adopters into the market. However, the existing diffusion models presume their entry
at the earliest stages of the market. Hence, there is essentially some time of consideration
after a main market adopter comes to know about the innovation and before he adopts it.
Another limitation of the above-mentioned models is that they take into account a
single market for a single product. However, potential adopters are not strictly observant
of various factors of the adoption system and may respond differently over time (Anand
et al. 2016b). In addition, as the demographic distribution over population and potential
adopters might be spread across vast regions, this may introduce some time lag. Hence, the
introduction of product to a new customer through a mass-mediated process or personal-
ized interaction is bound to take time. Thus, inclusion of the feature of time lag between
early and main market for diffusion of product is necessary for a comprehensive under-
standing of the diffusion model.
The understanding of the dual-market structure in the initial stages of product life
cycle has invoked marketing practitioners to introduce the concept of time lag between
early and main markets of the product. High-tech executives have increasingly come to
use terms like “early market/main market” or “visionaries/pragmatists” to comprehend
the diffusion process. According to Moore (1991), this difference necessitates change in
marketing strategy including product launch. The main market varies from the early mar-
ket with respect to its magnitude, population distribution, nature, customer expectations,
price sensitivity, and major benefits derived from the product (Gatingon and Robertson
1985). For effective product management, marketing agents should be sensitive to the time
difference of the new and main markets, to the extent of being able to predict the time
at which the mainstream consumers take over early adopters. Accordingly, marketing
strategies can be orchestrated to suit the initiation of early and main market buyers. The
interim period is also significant as it is closely related to other early product life cycle
modifications.
Product life cycle (PLC) is considered to be the trajectory of sales of a product from its
genesis to its final stages (Chandrasekaran and Tellis 2007). Considering it from a macro
perspective, other researchers describe it as the fluctuations in the market during the prod-
uct’s lifetime (Helfat and Peteraf 2003). Hence, PLC can help to determine product-related
strategy decisions for the company (Wong and Ellis 2007). Hofer (1975) studied and reem-
phasized the importance of PLC on business planning. Forrester was a pioneer in study-
ing PLC and its applicability as a tool for management analysis and managerial modeling.
He assumes the industry and products to be homogenous in terms of their characteristics
and customer viewpoint to analyze the PLC stages. Hence, it is quintessential in mapping
the development of innovation and its market opportunities. Rogers divides the diffusion curve, based on potential adopters, into five market segments: innovators, early adopters,
early majority, late majority, and laggards. Subsequently, Moore worked on Rogers’ normal
diffusion curve and adopters’ categories to describe expansion of the new products in the
market. Moore detects a break in the process as the later consumer or mainstream market
does not necessarily depend on the earlier adopters for product information. However,
it can be perceived that Moore’s purported “break” is not as sharp as he would make us
believe. After the initial life span of the innovation, there may be a slump in the market,
yet the other market comes up simultaneously before the previous market has died down.
Similarly, there may be entry of the other market during the decline phase of the foregoing
market. Therefore, assuming a time lag might be misleading, as at some point in time two or more markets exist side by side. The introduction of a multimodal product life cycle curve for the simultaneous multimarket phenomenon is imperative, as it is more realistic. But
in this study, we consider the existence of two simultaneous markets to study the bimodal
structure as a particular case of multimodal curves. The improved curve is going to have
new long-bearing repercussions on marketing strategies for both the dying early market
and the mainstream market.
Although a new trend in marketing literature differentiates between early and main
markets for new products requiring separate treatment by marketers (Mahajan and Muller
1998), the existence of discontinuity in the diffusion process has not been sufficiently
explored. This discontinuity may be due to insufficient transmission of product informa-
tion between early market adopters and mainstream consumers (Moore 1991). Any signifi-
cant difference in the adoption rates of the two markets will inevitably affect the overall
sales. It may result in a temporary decline in the sales of the product at the intermedi-
ate stage (Goldenberg et al. 2002). Inclination or reluctance of consumers of the two seg-
ments of the market may be markedly different (Rogers 1995). This differentiation calls for
bimodal curves for corresponding dual markets. This chapter proposes a new dual-market
innovation diffusion model framework that considers the division of consumers as early
adopters and mainstream consumers. The main adopters are assumed to enter the market
after a certain period of time. Considering different influences of the product over poten-
tial buyers, we study the different adoption behaviors using distribution functions for
main market adopters. We use the new dual-market model to study the pattern of product
life cycles of innovations.
The remainder of this chapter is as follows: in Section 9.2, we present the details and
mathematical framework of our model, followed by empirical analysis and validation of
our proposal in Section 9.3. At last, we discuss the implications of our findings and con-
clude this chapter in Section 9.4.
9.2 Mathematical modeling
The proposed methodology is based on the following set of assumptions:
• The diffusion process is subject to adoption due to the remaining number of adopters
in the market.
• The adoption processes of the early and main markets are disconnected.
• Both markets have their own potential buyers based on their buying behavior.
• One market's adoption is not influenced by the other, i.e., there is no cross-market influence.
• Market size (potential adopters) is fixed during the diffusion process.
• There is a time lag between the two markets.
In this section, we present the dual-market model guided by the above-mentioned assumptions. As available in the literature, according to Bass (1969) the adoption process occurs because of two adopter groups: innovators (external influentials) and imitators (internal influentials). The mathematical representation given by Bass is
n(t) = \frac{dN(t)}{dt} = \left[ p + \frac{q}{M} N(t) \right] \left[ M - N(t) \right]  (9.1)

where p and q are the coefficients of external and internal influence, respectively. The cumulative number of adopters at time t, N(t), is obtained over the remaining adopters of the potential market size M.
Building on the Bass model, Kapur et al. (2004) proposed an alternative formulation by replacing p + \frac{q}{M} N(t) with b(t) in Equation (9.1) to avoid the distinction between innovators and imitators, as an innovator for one product may be an imitator for another. Equation (9.1) can thus be rewritten as
\frac{dN(t)}{dt} = b(t)\left[ M - N(t) \right]  (9.2)
where b(t) defines the rate of adoption of an innovation at time t.
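As a quick illustration, the Bass dynamics (9.1) admit a well-known closed-form solution, sketched below in Python; the parameter values are illustrative, not taken from the chapter.

```python
import numpy as np

def bass_cumulative(t, M, p, q):
    """Closed-form solution of the Bass equation (9.1):
    N(t) = M * (1 - exp(-(p + q) t)) / (1 + (q / p) * exp(-(p + q) t))."""
    e = np.exp(-(p + q) * t)
    return M * (1.0 - e) / (1.0 + (q / p) * e)

# Illustrative (hypothetical) values: M = 100, p = 0.03, q = 0.38
t = np.linspace(0.0, 15.0, 16)
N = bass_cumulative(t, M=100.0, p=0.03, q=0.38)
# N starts at 0, increases monotonically, and saturates toward M
```

One can verify numerically that the closed form satisfies Equation (9.1) by comparing a finite-difference derivative of N(t) with the right-hand side.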
The uniqueness of the adoption behavior of both early and main markets is worthy
of elaboration. Hence, the dual-market innovation diffusion model assumes that adopters
are highly affected by the information that transfers from their own peer group rather
than the same information disseminated throughout the entire population (Goldenberg
et al. 2002). Here we use the index i for the notation of the early market and the index m for the main market segment.
The adoption process in the early market and in the main market segment progresses
as follows:
For early market:
\frac{dI(t)}{dt} = b_i(t)\left[ N_i - I(t) \right]  (9.3)
For main market:
\frac{dM(t)}{dt} = b_m(t)\left[ N_m - M(t) \right]  (9.4)
Here, bi(t) denotes the hazard rate function that an early market consumer will adopt the
product as a result of external and internal forces of marketing, and bm(t) is the rate at
which the main market consumer will adopt the product as a result of external and inter-
nal forces of marketing. Ni describes the market potential of the early market, and Nm
defines the market potential of the main market. I(t) stands for the cumulative number of adopters at time t for the early market population, and M(t) is the cumulative number of adopters of the main market population at time t.
The early market adoption process is similar to the differential equation as defined in
Equation (9.3). But for main market adoption, we employ a new parameter τ for delayed
entry of the main market adopters. It is widely accepted in the literature that the early and
main market adopters differ with respect to their adoption behavior and also have differ-
ent levels of price sensitivity. Hence, the entry of main market adopters after a certain time
τ can be represented in the following way:
\frac{dM(t - \tau)}{dt} = b_m(t - \tau)\left[ N_m - M(t - \tau) \right]  (9.5)
If τ equals 0, Equation (9.5) is equivalent to Equation (9.4). Let t ′ = t − τ , and Equation (9.5)
can be rewritten as
\frac{dM(t')}{dt'} = b_m(t')\left[ N_m - M(t') \right]  (9.6)
Assuming a logistic rate of adoption b_i(t) = \frac{b_i}{1 + \beta_i e^{-b_i t}} for the early market, the solution of Equation (9.3) is

I(t) = N_i \frac{1 - e^{-b_i t}}{1 + \beta_i e^{-b_i t}}.  (9.7)
One should note that when there is only one market, i.e., M(t) = 0, our model converges to
a special case of logistic growth model obtained by Kapur et al. (2004).
Main market adopters then enter the market and the sales curve increases dramatically (Golder
and Tellis 1998). We assume that the main market is fully developed at the time of its intro-
duction. Based on this assumption we can say that adoption here can take more or less
time vis-à-vis the early market depending on the product's availability in the market and its utility. To address the heterogeneity of the main market, we consider different types of S-shaped distribution functions. We also consider the scenario in which the main market adopters enter the market at the fastest pace, due to a major change in the marketing policy of the firm; for that, we use the exponential growth function for main market adopters. The different adoption distribution functions of the main market yield the following expressions:
Case 1: Here we consider b_m(t') as the exponential (constant-rate) function:

b_m(t') = b_m

Using this in Equation (9.6), the total number of main market adopters at time t' is found as

M(t') = N_m \left( 1 - e^{-b_m t'} \right).  (9.8)
Case 2: Using the two-stage Erlang function as the rate of adoption, i.e.,

b_m(t') = \frac{b_m^2 t'}{1 + b_m t'}

the solution of Equation (9.6) corresponding to the above-defined b_m(t') is

M(t') = N_m \left[ 1 - (1 + b_m t') e^{-b_m t'} \right].  (9.9)
Case 3: Considering the logistic rate of adoption for main market adopters, i.e.,

b_m(t') = \frac{b_m}{1 + \beta_m e^{-b_m t'}}

substituting it in Equation (9.6), the corresponding total number of main market adopters is given as

M(t') = N_m \frac{1 - e^{-b_m t'}}{1 + \beta_m e^{-b_m t'}}.  (9.10)
Case 4: Assuming the rate to be the two-stage Erlang logistic function in Equation (9.6), i.e.,

b_m(t') = \frac{b_m \left[ b_m t' + \beta_m \left( 1 - e^{-b_m t'} \right) \right]}{\left( 1 + \beta_m + b_m t' \right)\left( 1 + \beta_m e^{-b_m t'} \right)}

the corresponding solution is

M(t') = N_m \frac{1 - (1 + b_m t') e^{-b_m t'}}{1 + \beta_m e^{-b_m t'}}.  (9.11)
Chapter nine: Market segmentation-based modeling 171
We now define a function L(t) as the cumulative number of adopters of the main market at time t, starting from the initial time point 0, as follows:

L(t) = \begin{cases} M(t - \tau) & \text{for } t \geq \tau, \\ 0 & \text{for } t < \tau. \end{cases}  (9.12)
The different values of function M(t) have been taken from Equations (9.8) to (9.11).
Here it is noted that the market potentials of the early market, N_i, and the main market, N_m, are obtained from the market potential of the total market, M. Assume θ defines the proportion of the early market in the population of the total market, so that N_i = θM and N_m = (1 − θ)M. Then, by substituting these expressions for N_i and N_m and putting in the values of I(t) and L(t), we summarize all the dual-market innovation diffusion models for total sales N(t) in Table 9.1, corresponding to the various early and main market adoption functions.
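To make the composition concrete, the following sketch evaluates total sales N(t) = I(t) + L(t) for one of the combinations summarized in Table 9.1: logistic early-market growth (9.7) with delayed exponential main-market growth (9.8), i.e., Case 1. All parameter values below are hypothetical.

```python
import math

def dual_market_sales(t, M, theta, b_i, beta_i, b_m, tau):
    """Total cumulative sales N(t) = I(t) + L(t), where N_i = theta * M and
    N_m = (1 - theta) * M; the early market follows the logistic form (9.7),
    and the main market the exponential form (9.8) delayed by tau via (9.12)."""
    N_i, N_m = theta * M, (1.0 - theta) * M
    e = math.exp(-b_i * t)
    early = N_i * (1.0 - e) / (1.0 + beta_i * e)                          # I(t), Equation (9.7)
    main = N_m * (1.0 - math.exp(-b_m * (t - tau))) if t >= tau else 0.0  # L(t), (9.8)+(9.12)
    return early + main

# hypothetical parameters: M = 100, theta = 0.3, tau = 5
sales = [dual_market_sales(t, 100, 0.3, 0.8, 4.0, 0.5, 5.0) for t in range(16)]
```

Before t = τ only the early-market component (of size θM) contributes; after τ the main market raises the curve again, producing the bimodal shape discussed below.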
9.3 Parameter estimation
Parameter estimation for all the dual-market models defined above is carried out in this section using the nonlinear least squares method. For practical affirmation, we show the fit of the proposed models on real-life data sets in terms of cumulative distribution functions. The empirical analysis of the proposed models has been done on real-life Data Set I (DS I), which refers to cable TV, and Data Set II (DS II), which refers to clothes dryer sales data taken from Van den Bulte and Lilien (1997). In this study,
statistical software package SPSS (Statistical Package for Social Sciences) nonlinear regres-
sion models have been used to estimate the parameters and their standard errors for the
above-defined four models. SPSS is an interactive and user-friendly software package for applying more sophisticated models to the data. Also, the statistical software R has been used to
draw the box plots of relative errors for all defined models. For the above-mentioned mod-
els in Table 9.1, the estimates of parameters are summarized in Table 9.2. It is assumed that
the delay entry time parameter τ is a fixed number for all the models. In the case of DS I,
the value of the parameter τ is taken as 5, and it is fixed as 8 for DS II.
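The estimation itself was done in SPSS; an analogous nonlinear least squares fit can be sketched in Python with SciPy's `curve_fit`. The model below is the DMIDM-I combination (logistic early market, delayed exponential main market); all names and starting values are illustrative, and the data here are synthetic rather than the chapter's data sets.

```python
import numpy as np
from scipy.optimize import curve_fit

TAU = 5.0  # delayed-entry parameter, fixed in advance as in the chapter (tau = 5 for DS I)

def dmidm1(t, M, theta, b_i, beta_i, b_m):
    """Cumulative sales: logistic early market + exponential main market delayed by TAU."""
    e = np.exp(-b_i * t)
    early = theta * M * (1 - e) / (1 + beta_i * e)
    lag = np.clip(t - TAU, 0.0, None)
    main = np.where(t >= TAU, (1 - theta) * M * (1 - np.exp(-b_m * lag)), 0.0)
    return early + main

t = np.arange(1.0, 16.0)
y = dmidm1(t, 80.0, 0.4, 0.6, 3.0, 0.5)        # synthetic "observed" sales
p0 = (60.0, 0.5, 0.5, 2.0, 0.4)                # illustrative starting values
bounds = ([1.0, 0.01, 0.01, 0.01, 0.01], [500.0, 0.99, 5.0, 50.0, 5.0])
popt, _ = curve_fit(dmidm1, t, y, p0=p0, bounds=bounds)
```

Fixing τ in advance and estimating the remaining parameters by least squares mirrors the procedure described above.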
The value of weight parameter θ is given in the eighth column of Table 9.2 and signifies
that for DS I, the main market is more significant than the early market. But in the case of
DS II, it implies that the early market dominates the main market. Hence, it is not justifi-
able to declare the importance of one segment of the market over the other without using
a mathematical model. The performance analysis of the proposed models is measured by
using the most common goodness-of-fit criteria as MSE (mean square error), R 2 (coefficient
of determination), bias, and variation. The values of these comparison criteria are shown
in Table 9.3, confirming the robustness of the approach.
For practical purposes, it is mandatory to find the better-fitted model to the given data
sets. Hence, we have shown the cumulative sales data and predicted sales of the defined
data sets in Figures 9.1 and 9.2. It can be observed that all the models are indistinguishable
and equally fit to the actual sales data set as all graphs of predicted sales are overlapping
90
80
70
60
Actual sales
50
Sales
DMIDM-I
40
DMIDM-II
30
DMIDM-III
20
DMIDM-IV
10
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Time
30
25
20
Actual sales
Sales
15 DMIDM-I
DMIDM-II
10
DMIDM-III
5 DMIDM-IV
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Time
DS I DS II
0.20
0.30
Relative errors
Relative errors
0.20
0.10
0.10
0.00
0.00
Figure 9.3 Box plot of the relative errors for DS I and DS II.
174 Advanced Mathematical Techniques in Engineering Sciences
to each other. For each product, the relative error has been carried out to examine the pre-
dictive performance of all dual models. To put it another way, we draw the box plots as
shown in Figure 9.3, which depicts the range of relative errors of all estimated dual models
for both data sets. This figure shows that the proposed dual-market innovation diffusion
model with logistic rate growth of early market composite with two-stage Erlang growth
function in the main market gives the best result in the case of DS I, whereas the same
model gives the worst result for DS II.
In the innovation diffusion literature, most product life cycles follow a bell-shaped structure. But in our study, we have shown that the concept of the dual market brings the
multimodal structure of the diffusion curve of innovation. Figures 9.4 and 9.5 plot the
noncumulative sales of the proposed models. It can be seen that the bimodal structure of
innovation is well captured in these figures. As the early segment market is introduced,
the sales of the product initially increase and reach a peak and afterward decrease with
time, until the main market is introduced in the market. This shape helps us to explain
why it is essential for a firm to choose the introduction of the main market after a certain period of time.

Figure 9.4 Noncumulative sales of the proposed models for DS I.

Figure 9.5 Noncumulative sales of the proposed models for DS II.

This sales growth curve also drives the firm to allocate the promotional
efforts and sales strategies in a timely manner, because market growth can also be spurred by the introduction of the main market for the product, which helps the firm increase its product revenue.
References
Aggrawal, D., Singh, O., Anand, A., & Agarwal, M. (2014). Optimal introduction timing policy for a
successive generational product. International Journal of Technology Diffusion (IJTD), 5(1), 1–16.
Agliari, E., Burioni, R., Cassi, D., & Maria Neri, F. (2009). Word-of-mouth and dynamical inhomoge-
neous markets: An efficiency measure and optimal sampling policies for the pre-launch stage.
IMA Journal of Management Mathematics, 21(1), 67–83.
Anand, A., Aggarwal, R., Singh, O., & Aggrawal, D. (2016a). Understanding diffusion process in the con-
text of product dis-adoption. St. Petersburg State Polytechnical University Journal Economics, 9(2), 7–18.
Anand, A., Singh, O., Aggarwal, R., & Aggrawal, D. (2016b). Diffusion modeling based on customer’s
review and product satisfaction. International Journal of Technology Diffusion (IJTD), 7(1), 20–31.
Bass, F. M. (1969). A new product growth for model consumer durables. Management Science, 15(5),
215–227.
Chandrasekaran, D., & Tellis, G. J. (2007). A critical review of marketing research on diffusion of new
products. In N. K. Malhotra (Ed.), Review of Marketing Research (Vol. 3, pp. 39–80). Bingley: Emerald
Group Publishing Limited.
Gatignon, H., & Robertson, T. S. (1985). A propositional inventory for new diffusion research. Journal
of Consumer Research, 11(4), 849–867.
Goldenberg, J., Libai, B., & Muller, E. (2002). Riding the saddle: How cross-market communications
can create a major slump in sales. Journal of Marketing, 66(2), 1–16.
Golder, P. N., & Tellis, G. J. (1998). Growing, growing, gone: Modeling the sales slowdown of really
new consumer durables. University of Southern California Working Paper.
Helfat, C. E., & Peteraf, M. A. (2003). The dynamic resource‐based view: Capability lifecycles.
Strategic Management Journal, 24(10), 997–1010.
Hofer, C. W. (1975). Toward a contingency theory of business strategy. Academy of Management
Journal, 18(4), 784–810.
Kapur, P. K., Bardhan, A. & Jha, P. C. (2004), An alternative formulation of innovation diffusion model.
In V. K. Kapoor (Ed.), Mathematics and Information Theory, (pp. 17–23). New Delhi: Anamaya
Publication.
Kim, T., Hong, J. S., & Lee, H. (2014). Predicting when the mass market starts to develop: The dual
market model with delayed entry. IMA Journal of Management Mathematics, 27(3), 381–396.
Mahajan, V., & Muller, E. (1998). When is it worthwhile targeting the majority instead of the innova-
tors in a new product launch? Journal of Marketing Research, 35, 488–495.
Moore, G. A. (1991), Crossing the Chasm. New York: Harper Business.
Moore, G. A. (1995), Inside the Tornado. New York: Harper Business.
Rogers, E. M. (1995), The Diffusion of Innovations, 4th ed. New York: The Free Press.
Singh, O., Kapur, P. K., & Sachdeva, N. (2015), Technology management in segmented markets.
Quality, Reliability, Infocom Technology and Industrial Technology Management (pp. 78–89).
New Delhi: I K International Publishing House.
Van den Bulte, C., & Lilien, G. L. (1997). Bias and systematic change in the parameter estimates of
macro-level diffusion models. Marketing Science, 16(4), 338–353.
Wedel, M., & Kamakura, W. A. (2012). Market Segmentation: Conceptual and Methodological Foundations
(Vol. 8). New York: Springer Science & Business Media.
Wong, H. K., & Ellis, P. D. (2007). Is market orientation affected by the product life cycle? Journal of
World Business, 42(2), 145–156.
chapter ten
Kernel estimators for data analysis
Contents
10.1 Introduction......................................................................................................................... 177
10.2 Methodology of kernel estimators................................................................................... 178
10.2.1 Modification of smoothing parameter................................................................. 181
10.2.2 Support boundary.................................................................................................. 182
10.3 Identification of atypical elements................................................................................... 183
10.3.1 Basic version of the procedure.............................................................................. 183
10.3.2 Extended pattern of population............................................................................ 185
10.3.3 Equal-sized patterns of atypical and typical elements..................................... 186
10.3.4 Comments for Section 10.3.................................................................................... 187
10.4 Clustering............................................................................................................................. 187
10.4.1 Procedure................................................................................................................. 188
10.4.2 Influence of the parameters values on obtained results................................... 190
10.4.3 Comments for Section 10.4.................................................................................... 191
10.5 Classification........................................................................................................................ 192
10.5.1 Bayes classification................................................................................................. 192
10.5.2 Correction of values of smoothing parameter and modification intensity.... 193
10.5.3 Reduction to pattern sizes..................................................................................... 194
10.5.4 Structure for nonstationary patterns (concept drift)......................................... 195
10.5.5 Comments for Section 10.5.................................................................................... 198
10.6 Example practical application and final comments....................................................... 198
Acknowledgments....................................................................................................................... 200
References...................................................................................................................................... 201
10.1 Introduction
Perversely, one can state that contemporary data analysis has developed too vigorously, which has had the negative consequence of, in particular, an absence of due care and attention regarding formalism, mathematical justification, and, ultimately, a compact subject methodology. Before
the computer revolution of the second half of the 20th century, data analysis was conducted
based on already well-established, effective mathematical apparatuses of statistics. The main
trouble was then the inadequacy of the data, expressed primarily in small sample sizes, and
– in consequence – statistical procedures were directed toward maximal effectiveness in the
sense of gaining as much information as possible from them. The situation was diametrically
reversed in the 1980s, with the spread of not only efficient numerical calculation systems, but
also methods of automatic measurement. In a relatively short time, a total reversal in the con-
ditions occurred: the data became too numerous and carried too complex information for
processing by classic statistical methodology. Moreover, the absence of the ability to super-
vise such excessive and complicated data sets through a statistician’s intuition led to the
danger of the appearance of evidently erroneous data, resulting, for example, in faults occur-
ring during the measurements of particular elements. Such drastically reversed conditions
of data analysis tasks caused an enormous need for totally new procedures, and the speed
of progress brought about a situation in which they were based on separated, often specific,
concepts without mathematical justification or attempts at the unification of methodology.
Currently, it seems, the time for their modification, proving, and generalization has arrived.
The subject of this chapter is the presentation of a coherent concept of establishing the
methodology of kernel estimators for the three main tasks of data analysis: identification/
detection of atypical elements (outliers), clustering, and classification. The application of a uniform apparatus to all three basic problems facilitates comprehension of the material and, in consequence, the creation of individualized modifications, as well as, in the later phase, the design of a personal computer application. The use of nonparametric kernel estimators frees the results from assumptions on the data distribution – this concerns not only the shape of their grouping, but also the possibility of their partition into separate, incoherent parts. The methodology investigated in this chapter is practically parameter free, i.e., the user is not required to calculate parameter values, although it is possible to optionally modify them in order to achieve specific desired properties.
This chapter is constructed as follows. After this introduction, Section 10.2 presents an outline of the methodology of kernel estimators. This will be applied in Section 10.3 to the task of identifying atypical elements, in Section 10.4 to clustering, and in Section 10.5 to classification. In the framework of the final summary in Section 10.6, an example application of the investigated material in the creation of a mobile phone operator's marketing support strategy is presented.
10.2 Methodology of kernel estimators
Consider an n-dimensional random variable X with distribution density f, together with its m-element random sample

x_1, x_2, \ldots, x_m \in \mathbb{R}^n.  (10.1)

The kernel estimator \hat{f} \colon \mathbb{R}^n \to [0, \infty) of the density f is defined in its basic form as

\hat{f}(x) = \frac{1}{m h^n} \sum_{i=1}^{m} K\!\left( \frac{x - x_i}{h} \right),  (10.2)
where the measurable function K \colon \mathbb{R}^n \to [0, \infty), symmetrical with respect to zero and having a weak global maximum at this point, fulfills the condition \int_{\mathbb{R}^n} K(x) \, dx = 1 and is
called a kernel, whereas the positive coefficient h is referred to as a smoothing parameter.
For details, see the classic monographs (Kulczycki 2005; Silverman 1986; Wand and Jones
1995). Notably, a kernel estimator enables the identification of density for practically any
distribution, especially with no assumptions regarding its membership of a fixed class;
unusual, complex, or multimodal distributions are treated here as a typical unimodal
case. The form of the kernel K and the value of the smoothing parameter h are commonly
provided based on the mean integrated square error criterion.
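For orientation, formula (10.2) in the one-dimensional case can be sketched directly; the kernels below correspond to (10.3) and (10.4), while the sample and the value of h are illustrative.

```python
import numpy as np

def kde(x, sample, h, kernel):
    """One-dimensional kernel density estimator, Equation (10.2) with n = 1:
    f_hat(x) = (1 / (m h)) * sum_i K((x - x_i) / h)."""
    m = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (m * h)

def gauss(u):                       # normal kernel (10.3)
    return np.exp(-u * u / 2.0) / np.sqrt(2.0 * np.pi)

def uniform(u):                     # uniform kernel (10.4)
    return 0.5 if -1.0 <= u <= 1.0 else 0.0

sample = [0.0, 0.4, 1.1, 2.3]       # illustrative sample
f_hat = [kde(x, sample, 0.5, gauss) for x in np.linspace(-4.0, 6.0, 2001)]
```

Whatever kernel is chosen, the resulting estimate is nonnegative and integrates to one, which can be checked numerically on a fine grid.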
Chapter ten: Kernel estimators for data analysis 179
Thus, the selection of the kernel form is practically meaningless from a statistical point
of view and, in consequence, the user should above all take into account properties of the
desired estimator or/and computational aspects, useful for the application problem being
worked out; for details, see the literature (Kulczycki 2005, Section 3.1.3; Wand and Jones
1995, Sections 2.7 and 4.5).
For the one-dimensional case (i.e., when n = 1), the normal (Gauss) kernel

K(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{x^2}{2} \right)  (10.3)
is generally held as basic. For special purposes, other types can be proposed; here, the uniform kernel

K(x) = \begin{cases} \frac{1}{2} & \text{for } x \in [-1, 1] \\ 0 & \text{for } x \notin [-1, 1] \end{cases}  (10.4)
will be used henceforth – it has bounded support and assumes a finite number of values,
which will be taken advantage of later in this chapter.
In the multidimensional case (i.e., when n > 1), a so-called product kernel will be
applied hereinafter.* The main idea here is the division of particular variables with the
multidimensional kernel then becoming a product of n one-dimensional kernels for
specific coordinates. Thus, the kernel estimator (10.2) is then given as
\hat{f}(x) = \frac{1}{m \prod_{j=1}^{n} h_j} \sum_{i=1}^{m} K_1\!\left( \frac{x_1 - x_{i,1}}{h_1} \right) K_2\!\left( \frac{x_2 - x_{i,2}}{h_2} \right) \cdots K_n\!\left( \frac{x_n - x_{i,n}}{h_n} \right),  (10.5)

where

x = [x_1, x_2, \ldots, x_n]^T \quad \text{and} \quad x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]^T \quad \text{for } i = 1, 2, \ldots, m.  (10.6)
The above kernels fulfill the additional requirements of the particular procedures used
henceforth.
The value of the smoothing parameter is highly significant for the estimation quality,
and many advantageous algorithms for calculating it on the basis of a random sample have
been proposed.
First, consider the one-dimensional case. In specific conditions, e.g., during initial research, or for a numerous random sample (10.1) with a relatively regular distribution, the
* For description of another – radius – type, see the monographs by Kulczycki (2005, Section 3.1.3) and Wand
and Jones (1995, Section 4.5), where it is called spherically symmetric. This notion will not be used in this text.
approximate method (Kulczycki 2005, Section 3.1.5; Wand and Jones 1995, Section 3.2.1) is
sufficient, according to which
h = \left[ \frac{8 \sqrt{\pi}\, W(K)}{3\, U(K)^2\, m} \right]^{1/5} \hat{\sigma},  (10.7)

where W(K) = \int_{-\infty}^{\infty} K(x)^2 \, dx and U(K) = \int_{-\infty}^{\infty} x^2 K(x) \, dx, while \hat{\sigma} denotes the estimator of a
standard deviation:
\hat{\sigma} = \sqrt{ \frac{1}{m-1} \sum_{i=1}^{m} \left( x_i - \hat{E} \right)^2 } \quad \text{with} \quad \hat{E} = \frac{1}{m} \sum_{i=1}^{m} x_i.  (10.8)
The functional values occurring in formula (10.7) are, respectively, for the normal kernel (10.3)

W(K) = \frac{1}{2\sqrt{\pi}}, \quad U(K) = 1  (10.9)

and for the uniform kernel (10.4)

W(K) = \frac{1}{2}, \quad U(K) = \frac{1}{3}.  (10.10)
For specific cases, the more sophisticated yet effective plug-in method (Kulczycki 2005,
Section 3.1.5; Wand and Jones 1995, Section 3.6.1) can be recommended. Its concept consists
of the calculation of the smoothing parameter using the approximate method described
above, and after r steps improving the result, one obtains a value close to optimal. On the
basis of simulation research carried out for the needs of the material worked out in this
chapter, r = 2 can be proposed. In this case, the plug-in method consists of the application
of the following steps:
d_8 = \frac{105}{32 \sqrt{\pi}\, \hat{\sigma}^9},  (10.11)

where \hat{\sigma} is given by formula (10.8), and subsequently,

g_{II} = \left[ \frac{-2 K^{(6)}(0)}{m\, U(K)\, d_8} \right]^{1/9}  (10.12)

g_{I} = \left[ \frac{-2 K^{(4)}(0)}{m\, U(K)\, d_6(g_{II})} \right]^{1/7};  (10.13)

finally

h = \left[ \frac{W(K)}{m\, U(K)^2\, d_4(g_{I})} \right]^{1/5},  (10.14)

while

d_p(g) = \frac{1}{m^2 g^{p+1}} \sum_{i=1}^{m} \sum_{j=1}^{m} K^{(p)}\!\left( \frac{x_i - x_j}{g} \right) \quad \text{for } p = 4, 6.  (10.15)
The kernel K, applied in estimator (10.2), is used only in the last step (10.14). In the other steps, represented by formulas (10.12), (10.13), and (10.15), a different kernel may be used.
Generally, a normal kernel (10.3) is assumed; the quantities occurring in formulas (10.12),
(10.13), and (10.15) are then given by dependence (10.9) and also
K^{(6)}(x) = \frac{1}{\sqrt{2\pi}} \left( x^6 - 15x^4 + 45x^2 - 15 \right) \exp\!\left( -\frac{x^2}{2} \right), \quad K^{(6)}(0) = -\frac{15}{\sqrt{2\pi}}  (10.16)

K^{(4)}(x) = \frac{1}{\sqrt{2\pi}} \left( x^4 - 6x^2 + 3 \right) \exp\!\left( -\frac{x^2}{2} \right), \quad K^{(4)}(0) = \frac{3}{\sqrt{2\pi}}.  (10.17)
For the multidimensional case, thanks to using a product kernel, the methods presented
can be simply applied n times, sequentially for each coordinate.
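The whole r = 2 plug-in chain (10.11)–(10.15), with the normal kernel and its derivatives (10.16)–(10.17), can be sketched as follows for the one-dimensional case; note that the double sum in (10.15) makes each step O(m²).

```python
import math

def plugin_h(sample):
    """Plug-in smoothing parameter for n = 1, steps (10.11)-(10.15),
    using the normal kernel (10.3) throughout."""
    m = len(sample)
    E = sum(sample) / m
    sigma = math.sqrt(sum((x - E) ** 2 for x in sample) / (m - 1))         # (10.8)
    W, U = 1.0 / (2.0 * math.sqrt(math.pi)), 1.0                           # (10.9)
    c = 1.0 / math.sqrt(2.0 * math.pi)
    K6 = lambda x: c * (x**6 - 15*x**4 + 45*x**2 - 15) * math.exp(-x*x/2)  # (10.16)
    K4 = lambda x: c * (x**4 - 6*x**2 + 3) * math.exp(-x*x/2)              # (10.17)

    def d(p, g, Kp):                                                       # (10.15)
        total = sum(Kp((xi - xj) / g) for xi in sample for xj in sample)
        return total / (m**2 * g**(p + 1))

    d8 = 105.0 / (32.0 * math.sqrt(math.pi) * sigma**9)                    # (10.11)
    g2 = (-2.0 * K6(0.0) / (m * U * d8)) ** (1.0 / 9.0)                    # (10.12), g_II
    g1 = (-2.0 * K4(0.0) / (m * U * d(6, g2, K6))) ** (1.0 / 7.0)          # (10.13), g_I
    return (W / (m * U**2 * d(4, g1, K4))) ** (1.0 / 5.0)                  # (10.14)
```

For a regular, roughly normal sample the result is close to the approximate value given by (10.7).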
Finally, it is worth noting that too small a value of the smoothing parameter h implies the appearance of an excessive number of local extremes of the estimator \hat{f}, whereas too large a value causes its overflattening – this property will be actively used in later considerations.
In practical applications of kernel estimators, one can also use specific concepts,
generally improving the estimator properties, and others optionally fitting the model to a
considered reality.* In the first group, a so-called modification of the smoothing parameter –
presented in Section 10.2.1 (Kulczycki 2005, Section 3.1.6; Silverman 1986, Section 5.3.1) –
will be used henceforth, while in Section 10.2.2, the support boundary (Kulczycki 2005,
Section 3.1.7; Silverman 1986, Section 2.10), belonging to the second group, is presented.
where c ∈[0, ∞), f̂* means the kernel estimator without modification, and finally defining
the kernel estimator with modification of the smoothing parameter as
\hat{f}(x) = \frac{1}{m h^n} \sum_{i=1}^{m} \frac{1}{s_i^n} K\!\left( \frac{x - x_i}{h s_i} \right).  (10.19)
For the product kernel (10.5), it takes the form

\hat{f}(x) = \frac{1}{m \prod_{j=1}^{n} h_j} \sum_{i=1}^{m} \frac{1}{s_i^n} K_1\!\left( \frac{x_1 - x_{i,1}}{s_i h_1} \right) K_2\!\left( \frac{x_2 - x_{i,2}}{s_i h_2} \right) \cdots K_n\!\left( \frac{x_n - x_{i,n}}{s_i h_n} \right).  (10.20)
* According to the experience of the author and his research team, it is worth maintaining sensible self-restraint
in the application of specific ideas available in the subject literature, often ineffective in practice, and increasing
the complexity of the procedures.
As a consequence of the above concept, in the areas in which the elements of random
sample (10.1) are rare, the kernel estimators are additionally flattened, and in the regions of
their concentration – additionally peaked. The parameter c determines the intensity of the
modification procedure – when its value is larger/smaller, it becomes more/less distinct.
Using the criterion of the integrated mean square error, one can propose
c = 0.5. (10.21)
For details see the monographs by Kulczycki (2005, Section 3.1.6) and Silverman (1986,
Section 5.3.1).
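Under the assumption (based on the cited Silverman reference) that the modifying parameters s_i are normalized by the geometric mean of the unmodified estimates at the sample points, the procedure can be sketched as:

```python
import math

def adaptive_kde(sample, h, c=0.5):
    """Kernel estimator with smoothing-parameter modification for n = 1,
    Equation (10.19), with c = 0.5 as in (10.21); the normal kernel is used.
    The s_i are the unmodified estimates at the sample points divided by
    their geometric mean, raised to the power -c (assumed normalization)."""
    m = len(sample)
    K = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    f_star = lambda x: sum(K((x - xi) / h) for xi in sample) / (m * h)
    vals = [f_star(xi) for xi in sample]
    geo = math.exp(sum(math.log(v) for v in vals) / m)   # geometric mean of estimates
    s = [(v / geo) ** (-c) for v in vals]                # modifying parameters s_i
    return lambda x: sum(K((x - xi) / (h * si)) / si
                         for xi, si in zip(sample, s)) / (m * h)
```

Kernels placed in sparse regions receive s_i > 1 (flattening), those in dense regions s_i < 1 (peaking), while the estimate still integrates to one.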
10.2.2 Support boundary
For practical applications, specific coordinates of the random variable can describe diverse
quantities. A number of these, in particular those representing distance or time, must belong to properly bounded subsets, e.g., nonnegative numbers, for their correct interpretation. To avoid the misinterpretations and calculational errors resulting from this, a beneficial procedure for bounding a kernel estimator's support can be applied.
First, consider the one-dimensional case (when n = 1) and the left boundary – i.e., the
case where the condition fˆ ( x) = 0 for x < x*, with x* ∈ R (mostly x* = 0), is desired. A fragment
of the ith kernel which lies outside the interval [x_*, \infty) is symmetrically "reflected" with respect to the boundary x_* and becomes part of the kernel "hooked" at the reflection of the element x_i, that is, at the point 2x_* - x_i. So, after defining the function K_{x_*} \colon \mathbb{R} \to [0, \infty) by

K_{x_*}(x) = \begin{cases} K(x) & \text{when } x \geq x_* \\ 0 & \text{when } x < x_* \end{cases},  (10.22)
the basic form of kernel estimator (10.2) is the following:

\hat{f}(x) = \frac{1}{m h} \sum_{i=1}^{m} \left[ K_{x_*}\!\left( \frac{x - x_i}{h} \right) + K_{x_*}\!\left( \frac{x + x_i - 2 x_*}{h} \right) \right]  (10.23)
and analogously, the formula with the modification of the smoothing parameter (10.19) becomes

\hat{f}(x) = \frac{1}{m h} \sum_{i=1}^{m} \frac{1}{s_i} \left[ K_{x_*}\!\left( \frac{x - x_i}{h s_i} \right) + K_{x_*}\!\left( \frac{x + x_i - 2 x_*}{h s_i} \right) \right].  (10.24)
Cut fragments of kernels lying outside the assumed support are embodied into the support directly near its boundary; as the change so introduced is small, this is acceptable in practice.
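A sketch of the left-bounded estimator (10.23) with the normal kernel follows; the reflection guarantees that the estimate still integrates to one over [x_*, \infty).

```python
import math

def kde_left_bounded(x, sample, h, x_star=0.0):
    """Kernel estimator with left support boundary x_star, Equation (10.23):
    the estimate is zero below x_star, and above it each kernel is paired
    with its reflection about x_star."""
    if x < x_star:
        return 0.0
    m = len(sample)
    K = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return sum(K((x - xi) / h) + K((x + xi - 2.0 * x_star) / h)
               for xi in sample) / (m * h)
```

Each reflected pair contributes exactly a unit of mass to [x_*, \infty), since the part of the original kernel lost below x_* is returned by its mirror image.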
The consideration for the right boundary of the support can be followed analogously.
In the multidimensional case, the concept presented may naturally be applied subse-
quently for every coordinate of the considered random variable. These cases, however,
will not be used further in this text. For more details, see the books by Kulczycki (2005)
and Silverman (1986, Section 2.10).
A broader description regarding various aspects of kernel estimators is found in the
classic monographs (Kulczycki 2005; Silverman 1986; Wand and Jones 1995). In the next
sections of the chapter, this methodology will be uniformly applied to three fundamental
procedures of data analysis: identification of atypical elements, clustering, and classification.
10.3 Identification of atypical elements
10.3.1 Basic version of the procedure
Consider the random sample

x_1, x_2, \ldots, x_m \in \mathbb{R}^n.  (10.25)

For each of its elements, calculate the value of the kernel estimator computed on the remaining elements, obtaining the set

\hat{f}_{-1}(x_1), \hat{f}_{-2}(x_2), \ldots, \hat{f}_{-m}(x_m),  (10.26)

where \hat{f}_{-i} means the kernel estimator \hat{f} calculated excluding the ith element, for i = 1, 2, \ldots, m. It is worth noting that, regardless of the dimension of the random variable
X, the values of set (10.26) are real (one-dimensional). Particular values fˆ− i ( xi ) character-
ize the probability of the occurrence of the element xi, and therefore, the lower the value
fˆ− i ( xi ), the more the element xi can be interpreted as “less typical,” or rather happening
more rarely.
Define now the number
r ∈(0,1) (10.27)
establishing sensitivity of the procedure for identifying atypical elements. This number
will determine the assumed proportion of atypical elements in relation to the total popu-
lation and, therefore, the ratio of the number of atypical elements to the sum of atypical
and typical. One can naturally assume r < 0.5, as otherwise the atypical elements would become typical and vice versa. In practice
is the most often used, with particular attention paid to the second option.
Let us treat set (10.26) as realizations of a real (one-dimensional) random variable
and calculate the estimator for the quantile of the order r. The positional estimator of
the second order (Parrish 1990, Kulczycki 1998) will be applied as follows, given by the
formula

\hat{q}_r = z_i + (m r + 0.5 - i)(z_{i+1} - z_i),  (10.29)

where i = [mr + 0.5], and [y] denotes the integral part of the number y \in \mathbb{R}, while z_i is the ith value in size of set (10.26) after being sorted; thus,

z_1 \leq z_2 \leq \cdots \leq z_m.  (10.30)
If, for a given tested element

x \in \mathbb{R}^n,  (10.31)

the condition

\hat{f}(x) \leq \hat{q}_r  (10.32)
is fulfilled, then this element should be considered atypical; in the opposite case, when

\hat{f}(x) > \hat{q}_r,  (10.33)
it is typical. Note that for the correctly estimated quantities f̂ and qˆ r, the above guarantees
obtaining the proportion of the number of atypical elements to total population at the
assumed level r.
The above procedure for identifying atypical elements, combined with the properties
of kernel estimators, allows in the multidimensional case for inferences based not only
on values for specific coordinates of a tested element, but above all, on the relationships
between them.
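The whole identification procedure can be sketched as follows; scipy's gaussian_kde with its default bandwidth stands in for the chapter's estimator f̂, a simple empirical quantile replaces the positional estimator, and the plain values f̂(xi) are used instead of the leave-one-out values f̂−i(xi):

```python
import numpy as np
from scipy.stats import gaussian_kde

def atypical_mask(sample, r=0.05):
    """Mark as atypical the elements whose kernel density estimate falls at
    or below the quantile of order r of the density values, cf. (10.32)."""
    kde = gaussian_kde(sample)       # stands in for the chapter's f-hat
    f_hat = kde(sample)              # density value at every sample element
    q_r = np.quantile(f_hat, r)      # empirical quantile of order r
    return f_hat <= q_r              # True marks an atypical element
```

For example, appending a distant outlier to a normal sample flags it, together with roughly a proportion r of the remaining elements.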
Chapter ten: Kernel estimators for data analysis 185
In the multidimensional case, the interval [a, b] generalizes to the n-dimensional cuboid [a1, b1] × [a2, b2] × ⋯ × [an, bn], where aj < bj for j = 1, 2, …, n.
First, the one-dimensional case is considered. Let us generate two pseudorandom numbers u and v, uniformly distributed on the intervals [a, b] and [0, c], respectively. Next, one should check whether

v ≤ f(u). (10.35)
If the above condition is fulfilled, then the value u ought to be assumed as the desired real-
ization of a random variable with distribution described by the density f, that is
x = u. (10.36)
In the opposite case, the numbers u and v need to be removed and the above procedure
repeated until the desired number of pseudorandom numbers x with density f is obtained.
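Condition (10.35) leads directly to the classic acceptance–rejection loop; a minimal one-dimensional sketch:

```python
import random

def neumann_sample(f, a, b, c, size):
    """Von Neumann (acceptance-rejection) generation of pseudorandom numbers
    with density f, supported on [a, b] and bounded above by c."""
    out = []
    while len(out) < size:
        u = random.uniform(a, b)     # candidate realization
        v = random.uniform(0.0, c)   # uniform height over [0, c]
        if v <= f(u):                # condition (10.35): accept u as x
            out.append(u)
        # otherwise u and v are discarded and the draw is repeated
    return out
```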
In the presented procedure, the density f is established by the methodology of kernel
estimators, described in Section 10.2. Denote its estimator as f̂ . The uniform kernel will
be employed, allowing easy calculation of the support boundaries a and b, as well as the
parameter c appearing in condition (10.34). Namely,
a = min_{i=1,2,…,m} xi − h (10.37)

b = max_{i=1,2,…,m} xi + h (10.38)

and

c = max_{i=1,2,…,m} { f̂(xi − h), f̂(xi + h) }. (10.39)
The last formula results from the fact that the maximum for a kernel estimator with the
uniform kernel must occur on the edge of one of the kernels. It is also worth noting that
calculations of parameters (10.37)−(10.39) do not require much effort. This is thanks to the
appropriate choice of kernel form, taking advantage of the kernel estimators' robustness in form.
In the multidimensional case, Neumann’s elimination algorithm is similar to the
previously discussed one-dimensional version. The edges of the n-dimensional cuboid
c = max_{i=1,2,…,m} f̂([xi,1 ± h, xi,2 ± h, …, xi,n ± h]T), taken over all combinations of the signs ±. (10.40)

The number of these combinations is finite and equal to 2^n. Using the formula presented, the n particular coordinates of the pseudorandom vector u and the subsequent number v are generated, after which condition (10.35) is checked.
The results of empirical research show that, for the properly extended set (10.25), the procedure investigated here for identifying atypical elements allows the assumed proportion of such elements in the whole population to be obtained with accuracy sufficient from an applicational point of view.
Similarly, the set of observations for which the opposite inequality (10.33) is true may be considered as a pattern of typical elements. Sizes of the above patterns equal mat and mt, respectively; of course mat + mt = m. We also have

mat / (mat + mt) ≅ r. (10.43)
In this way, unsupervised in its nature, the problem of identifying atypical elements
has been reduced to a supervised classification task, although with strongly unbalanced
patterns – taking into account relation (10.43) with condition (10.28), set (10.41) is in practice
around 10–100 times smaller than pattern (10.42). Classification is relatively conveniently
conditioned and can use many different well-developed methods. However, most pro-
cedures work much better if patterns are of similar or even equal sizes (Kaufman and
Rousseeuw 1990). Using once again the algorithm presented in Section 10.3.3, the size of set (10.41) can be increased to mt, so that mat = mt, thus equalizing the patterns of atypical (10.41) and typical (10.42) elements.
10.4 Clustering
Clustering has become the second basic problem within data analysis – compared to other procedures, it is more loosely defined and at a less advanced stage of research (Everitt et al. 2011; Xu and Wunsch 2009). It lies between classical data analysis, where the research objective has already been specified, and exploratory data analysis, in which the aim of future investigations is unknown a priori and its detection is an integral component of the research. In the first case, clustering may be applied for the purposes of classification, albeit without fixed patterns, whereas the second treats it as a division of the explored data into a few groups, each comprising elements that are similar to each other but differ significantly between particular groups.
Consider a set of elements from an investigated population. The most intuitive and
natural concept is the assumption that specific clusters are related to modes (local maxima)
of distribution density; thus, the “valleys” become the borders of the resulting clusters
(Fukunaga and Hostetler 1975). The algorithm described in this section is presented in its entirety, so that it can be applied without requiring users to conduct laborious research. Its attributes can be summarized as follows:
4. The possibility of changing the intensity of the smoothing parameter modification, with the aim of simultaneously increasing the cluster quantity in dense regions and reducing or even eliminating them from sparse areas of data, or vice versa.
5. The suitable relationship between the parameters mentioned in points 3 and 4 enables
reducing and even eliminating clusters in sparse regions, virtually without affecting
the cluster quantity in dense areas of data.
The characteristics from point 4, and consequently point 5, are particularly worthwhile
to highlight as being practically absent in other clustering methods. In applications, one
should underline the consequences of points 1 and 2, as well as possibly point 3.
10.4.1 Procedure
Consider – as in the previous section – a data set
x1, x2, …, xm ∈ Rn, (10.44)

and define iteratively

xi^0 = xi for i = 1, 2, …, m, (10.45)

xi^(k+1) = xi^k + b [∇f̂(xi^k) / f̂(xi^k)] for k = 0, 1, …, k* − 1, (10.46)

where b > 0 and k* ∈ N\{0}. Based on an optimizing criterion, one can suggest

b = (1/(n + 2)) min_{j=1,2,…,n} hj^2, (10.47)
while hj denotes the smoothing parameter value of the jth coordinate. To the above
task, the estimator with smoothing parameter modification with standard intensity
(10.21) can be applied; the product kernel (10.5) is used in the multidimensional case. As
a (one-dimensional) kernel, the normal kernel (10.3) is proposed because of its analytical convenience, differentiability over the entire domain, and the fact that its values are positive, which protects against division by zero in formula (10.46). In this case, the
quotient on the right side of Equation (10.46) takes the convenient form
∇f̂(xi^k) / f̂(xi^k) = − [ (xi,1^k − x̃i,1^k)/(s1^2 h1^2), (xi,2^k − x̃i,2^k)/(s2^2 h2^2), …, (xi,n^k − x̃i,n^k)/(sn^2 hn^2) ]T, (10.48)

where x̃i,j^k denotes the kernel-weighted mean of the jth coordinates of elements (10.44), and

xi^k = [xi,1^k, xi,2^k, …, xi,n^k]T. (10.49)
Next, it is assumed that algorithm (10.45) and (10.46) needs to be completed, in the event
that the following inequality is fulfilled after the subsequent kth step
| Dk − Dk−1 | ≤ a D0, (10.50)

where

D0 = Σ_{i=1…m−1} Σ_{j=i+1…m} d(xi, xj), Dk−1 = Σ_{i=1…m−1} Σ_{j=i+1…m} d(xi^(k−1), xj^(k−1)), Dk = Σ_{i=1…m−1} Σ_{j=i+1…m} d(xi^k, xj^k), (10.51)
where d denotes the Euclidean metric in Rn. Thus, D0 as well as Dk−1 and Dk mean the sums of the distances between specific elements (10.44) at the start of the algorithm and following the performance of the (k − 1)th and kth steps, respectively. Initially, one can suggest a = 0.001. A possible reduction of this value has practically no influence on the results; however, increasing it requires validation of the potential consequences. Finally, when condition (10.50) is fulfilled after the kth step, then
k* = k. (10.52)
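A one-dimensional sketch of this iteration together with the stopping rule; the gradient-ascent update xi ← xi + b f̂′(xi)/f̂(xi) is assumed here, with scipy's Gaussian KDE standing in for f̂ and a numerical derivative in place of the closed-form quotient (10.48):

```python
import numpy as np
from scipy.stats import gaussian_kde

def gradient_cluster_shift(x, b=None, a=0.001, max_steps=200):
    """Shift one-dimensional data points uphill along the density gradient,
    x_i <- x_i + b * f'(x_i)/f(x_i), stopping when the total inter-point
    distance changes by no more than a * D_0, cf. condition (10.50)."""
    x = np.asarray(x, dtype=float)
    kde = gaussian_kde(x)                      # density built once, from the data
    h = x.std(ddof=1) * kde.factor             # effective smoothing parameter
    if b is None:
        b = h ** 2 / 3.0                       # 1/(n+2) * h^2 with n = 1

    def pairwise_sum(y):                       # sum of all mutual distances
        return np.abs(y[:, None] - y[None, :]).sum() / 2.0

    def grad_over_f(y, eps=1e-4):              # numerical f'(y)/f(y)
        return (kde(y + eps) - kde(y - eps)) / (2.0 * eps) / kde(y)

    d0 = pairwise_sum(x)
    d_prev = d0
    for _ in range(max_steps):
        x = x + b * grad_over_f(x)             # assumed gradient-ascent update
        d_now = pairwise_sum(x)
        if abs(d_now - d_prev) <= a * d0:      # stopping rule
            break
        d_prev = d_now
    return x
```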
Next, the set of mutual distances between the elements in their final positions is considered:

{ d(xi^k*, xj^k*) } for i = 1, 2, …, m − 1; j = i + 1, i + 2, …, m. (10.54)
Its size is
md = m(m − 1) / 2. (10.55)
Considering set (10.54) as a one-dimensional random sample, one should calculate the auxiliary kernel estimator f̂d of the mutual distances (10.54). The normal kernel (10.3) is suggested once more; furthermore, the smoothing parameter modification procedure with the standard value of parameter (10.21) is applied, together with the left-sided bounding of the support to the interval [0, ∞); see formula (10.24) for x* = 0.
Finding – with appropriate accuracy – the “first” (in the sense of the lowest argument
value) local minimum of the function fˆd in the interval (0, D), where
D = max_{i=1,2,…,m−1; j=i+1,i+2,…,m} d(xi, xj), (10.56)
is the next task. For this objective, one can consider set (10.54) to be a random sample,
estimate its standard deviation applying formula (10.8), and subsequently take the values
x from the set
where int(100 ⋅ D) means the integral part of the number 100 ⋅ D, until the condition
is fulfilled. The first (the smallest) value* will be treated as the smallest distance between
cluster centers located in close proximity to each other and referred to as xd hereinafter.
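The determination of xd can be sketched as follows; scipy's gaussian_kde with its default bandwidth and without support bounding stands in for f̂d, and a simple grid scan replaces the stepping scheme described above:

```python
import numpy as np
from itertools import combinations
from scipy.stats import gaussian_kde

def first_distance_minimum(points, grid_size=400):
    """Estimate the density of the mutual distances (10.54) and return the
    smallest argument at which it attains a local minimum (the value x_d)."""
    pts = np.asarray(points, dtype=float)
    dist = np.array([np.linalg.norm(p - q) for p, q in combinations(pts, 2)])
    kde = gaussian_kde(dist)                   # stands in for f-hat_d
    grid = np.linspace(0.0, dist.max(), grid_size)
    vals = kde(grid)
    for j in range(1, grid_size - 1):
        if vals[j] < vals[j - 1] and vals[j] <= vals[j + 1]:
            return grid[j]                     # "first" local minimum
    return None                                # no minimum: a single cluster
```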
The final step is the creation of the clusters. This is achieved through:
1. Taking an element of set (10.44) and first producing a one-element cluster including it.
2. Finding an element of set (10.44) which differs from those in the cluster and lies nearer than xd to it; if such an element exists, it is added to the cluster; if not, go to point 4.
3. Finding an element of set (10.44), different from the elements in the cluster, lying closer than xd to at least one of them; if such an element exists, it is added to the cluster and point 3 is repeated.
4. Adding the attained cluster to a “cluster list” and removing its elements from set (10.44); if the set reduced in this way remains nonempty, go to point 1; otherwise, finish the algorithm.
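The four points above can be sketched as follows:

```python
import numpy as np

def form_clusters(points, x_d):
    """Points 1-4: start a cluster from any free element, then keep
    absorbing free elements lying closer than x_d to some cluster member."""
    pts = np.asarray(points, dtype=float)
    free = set(range(len(pts)))
    clusters = []
    while free:                                    # point 1
        cluster = [free.pop()]
        grew = True
        while grew:                                # points 2 and 3
            grew = False
            for j in list(free):
                if any(np.linalg.norm(pts[j] - pts[i]) < x_d for i in cluster):
                    cluster.append(j)
                    free.discard(j)
                    grew = True
        clusters.append(cluster)                   # point 4
    return clusters
```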
The “cluster list” obtained in this way contains all the clusters defined during the above procedure. This completes the basic form of the clustering procedure – its potential modifications, with their influence on the results, are described in the following section.
* In the event that such a value is nonexistent, the presence of one cluster should be recognized and the
procedure completed. The same applies to the irrational but formally possible situation m = 1, when set (10.54) is empty.
As discussed in Section 10.2, too great a value of the smoothing parameter h results in over-smoothing of the kernel estimator, while too small a value causes the appearance of too many local extremes. Therefore, increasing this parameter value – with respect to that calculated by the criterion of the mean integrated square error – results in fewer clusters; conversely, decreasing it yields more clusters. In both cases, one can emphasize that although the smoothing parameter influences the number of clusters, this number still depends solely on the data's internal structure. On the basis of the performed research, a change in the smoothing parameter value in the range −25% to +50% can be recommended. Results lying beyond this range need individual verification.
The intensity of the smoothing parameter modification – as described in Section
10.2.1 – is defined by the parameter c; its standard value is provided by formula (10.21). Its
increase sharpens the kernel estimator in the dense regions of set (10.44) and also smooths
it in the sparse areas; as a consequence, if this parameter value rises, then the number of
clusters in dense areas increases and simultaneously decreases in sparse regions. These
effects are reversed in the event of this parameter value diminishing. On the basis of the
performed research, the parameter c value can be proposed to be between 0 (indicating a
lack of modification) and 1.5. The validity of the obtained results needs individual verifica-
tion in the case of exceeding a value of 1.5. In particular one can suggest c = 1 as standard.
However, growth of the cluster number in dense data regions and at the same time
lowering or even eliminating clusters in sparse areas (as they frequently contain atypi-
cal elements appearing as a result of various errors) is frequently desired in practice.
Combining the aforementioned considerations, it is appropriate to propose increasing both the standard intensity of the smoothing parameter modification (10.21) and, simultaneously, the smoothing parameter h value calculated on the basis of the optimization criterion, to the value h* given as
h* = (3/2)^(c − 0.5) h. (10.59)
The combined effect of both of these factors implies a twofold smoothing of the estimator
f̂ in the areas in which set (10.44) is sparse. At the same time, the above factors virtually cancel each other out in dense regions; hence, they have almost zero influence on the discovery of clusters in such areas. On the basis of the conducted research, a change in the
parameter c value in the range of 0.5–1.0 can be executed; however, increases that exceed
1.0 require individual validation. In particular, the value c = 0.75 can be recommended in
such a case.
More details with visual aids are presented in the article by Kulczycki and
Charytanowicz (2010). Applications were synthetically presented in the paper by Kulczycki
et al. (2012), and also in more detail in the particular publications by Charytanowicz et al.
(2016), Kulczycki and Daniel (2009), and Łukasik et al. (2008).
10.5 Classification
Classification constitutes the third of the basic tasks of data analysis (Duda et al. 2001). In
the previously considered problems of atypical element identification (Section 10.3) and
clustering (Section 10.4), the subject of processing was only a data set – (10.25) or (10.44), respectively – with no additional information or supervision. These problems are therefore typical unsupervised tasks. Meanwhile, the classification issue
involves a tested element being assigned to previously defined groups (classes) repre-
sented by patterns. This constitutes additional significant information, and thus the clas-
sification becomes a supervised task.
Such beneficial conditions of a classification task cause the available methodology to
be rich and very varied. The concept presented in the following sections will be based on
the Bayes approach. The classifiers obtained in this way work quite well in complex real-world situations and are eagerly used by practitioners, chiefly because of their robustness, low requirements concerning patterns, and their illustrativeness, which supports individual modifications. In particular, the method proposed here offers the opportunity to attribute preferences to classes containing elements which – according to possibly asymmetrical task requirements – must especially not be incorrectly attributed to others. The parameters
of kernel estimators can be made more precise with the aim of successively improving
classification quality. Moreover, the application of the sensitivity method borrowed from
artificial neural networks allows the elimination of those pattern elements that have insig-
nificant or even a negative effect on the correctness of results. These last two procedures
will in turn be the basis for the creation of an effective adaptational structure, adjusting a
classifier to nonstationary data (so-called concept drift).
10.5.1 Bayes classification
Assume J sets containing elements from the space Rn:

x1,1, x1,2, …, x1,m1 (10.60)
x2,1, x2,2, …, x2,m2 (10.61)
  ⋮
xJ,1, xJ,2, …, xJ,mJ, (10.62)

representing the assumed classes. The sizes m1, m2, …, mJ need to be more or less proportional
to the “contribution” of specific classes within the investigated population. The aim of
classification is to map the tested element
x ∈ R n (10.63)
to one of the groups represented by patterns (10.60)−(10.62). Denote as fˆ1 , fˆ2 , , fˆJ kernel
estimators successively calculated on the basis of sets (10.60)−(10.62) treated as r andom sam-
ples (10.1) each time – the methodology used for this purpose is presented in Section 10.2.
According to the classic Bayes concept (Duda et al. 2001), the classified element (10.63) needs then to be attributed to the class j ∈ {1, 2, …, J} for which the value

mj f̂j(x) (10.64)

is the largest. By introducing the positive coefficients z1, z2, …, zJ, the above can be generalized to attributing the element to the class for which

zj mj f̂j(x) (10.65)

is the largest. The quality of such a classifier can be expressed by a functional counting incorrectly classified pattern elements; in its definition (10.66), # means the number of elements within a set. The classic leave-one-out method can be applied for the calculation of this functional's value for any fixed argument. Because this value is an integer, the modified Hooke–Jeeves algorithm (Kelley 1999) was used to find a minimum. Alternative conceptions are described in the survey paper (Venter 2010).
As a result of performed research, the assumption can be made that for every coor-
dinate, the grid should usually have nodes at the points 0.25, 0.5, …, 1.75. The functional
(10.66) values are calculated for these nodes; the attained results are then sorted and the
five best become starting conditions for the Hooke–Jeeves procedure, in which the initial
step value is proposed as 0.2. Following completion of each of the above five executions,
the values of functional (10.66) for the obtained end points are calculated, and that which
has the smallest value is the sought-after vector of the parameters b0 , b1 , b2 , , bn.
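The classification rule itself can be sketched as follows; scipy's gaussian_kde with its default bandwidth stands in for the estimators f̂j, and the parameter corrections described in this section are omitted:

```python
import numpy as np
from scipy.stats import gaussian_kde

def bayes_classify(x, patterns, z=None):
    """Attribute the tested element x to the class j maximizing
    z_j * m_j * f-hat_j(x), with one kernel estimator built per class."""
    if z is None:
        z = [1.0] * len(patterns)            # no class preferences
    scores = []
    for z_j, pattern in zip(z, patterns):
        pattern = np.asarray(pattern, dtype=float)
        kde = gaussian_kde(pattern.T)        # rows of pattern are elements
        scores.append(z_j * len(pattern) * kde(np.atleast_2d(x).T)[0])
    return int(np.argmax(scores))            # index of the winning class
```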
It is worthy of note that in this procedure it is not necessary to correct the classification parameters; however, doing so would enhance the classification quality and, moreover, would allow applying the easy and convenient formula (10.7) to calculate smoothing parameter values.
Consider nonnegative coefficients w1, w2, …, wm, normalized such that

Σ_{i=1…m} wi = m, (10.67)

subsequently assigned to the specific elements of random sample (10.1). Then the initial definition of kernel estimator (10.2) becomes

f̂(x) = (1/(m h^n)) Σ_{i=1…m} wi K((x − xi)/h). (10.68)
Formulas (10.2), (10.5), (10.19), (10.20), and (10.23), (10.24) can be changed analogously. The
coefficients wi describe the weight (significance) of the ith pattern element with respect to
classification quality. It should be noted that if wi ≡ 1, definition (10.68) is then reduced to
basic form (10.2).
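A one-dimensional sketch of the weighted estimator with the normal kernel (10.3):

```python
import numpy as np

def weighted_kde(x, sample, w, h):
    """Weighted kernel estimator (10.68) with the normal kernel, for n = 1:
    f-hat(x) = (1/(m*h)) * sum_i w_i * K((x - x_i)/h)."""
    sample = np.asarray(sample, dtype=float)
    w = np.asarray(w, dtype=float)           # weights fulfilling sum(w) == m
    u = (x - sample) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # normal kernel
    return (w * kernel).sum() / (len(sample) * h)
```

With wi ≡ 1 this reduces to the basic estimator (10.2).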
The procedure for reducing pattern sets (10.60)−(10.62) consists of two stages. The first
consists of the weights wi calculation; the second is the removal of such random sample
elements which have the lowest respective weights. To realize the former of these two
stages, separate neural networks can be built for each class. For simplicity of the forthcoming notation, let the index j = 1, 2, …, J, which characterizes specific classes, be fixed. The constructed network has three layers and is of feedforward type: m inputs, which are related to subsequent elements of the pattern; a hidden layer of a size equal to the integral part of the number √m; and one output neuron. This network learns using a data set consisting of values of specific kernels for consecutive pattern elements, while the output is the kernel estimator value for the considered pattern element. Network learning is
put is the kernel estimator value for the considered pattern element. Network learning is
achieved through backward propagation of errors, with a momentum factor. At the com-
pletion of the procedure, the network is subjected to an analysis of sensitivity on learning
data; for details see the book by Zurada (1992). The essence of this method constitutes the
establishment – after network learning – of the influence of the subsequent inputs ui on the
output y; this is represented by the real coefficients
Si = ∂y(u1, u2, …, um) / ∂ui for i = 1, 2, …, m. (10.69)
Denote by Si^(p) the sensitivity coefficients obtained in consecutive iterations of the previous stage (with p = 1, 2, …, P), characterizing successive learning data. Aggregating them results in the coefficient Si defined as

Si = [ (1/P) Σ_{p=1…P} (Si^(p))^2 ]^(1/2) for i = 1, 2, …, m, (10.70)
which will be used to calculate the coefficients wi. Thus, first let

w̃i = 1 − Si / Σ_{j=1…m} Sj for i = 1, 2, …, m, (10.71)

and then normalize these values as

wi = m w̃i / Σ_{j=1…m} w̃j for i = 1, 2, …, m, (10.72)

in order to guarantee condition (10.67). The form of definition (10.71) is due to the network created here being the most sensitive to redundant and atypical elements, which suggests – as a consequence of the form of kernel estimator (10.68) – a requirement to assign to them suitably smaller values w̃i, and consequently wi; these coefficients characterize the significance of specific pattern elements with respect to the classification quality.
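A sketch of the weight calculation; the root-mean-square aggregation over iterations and the final rescaling so that the weights sum to m are assumptions of this sketch:

```python
import numpy as np

def pattern_weights(S):
    """Turn per-iteration sensitivities S[p][i] into element weights w_i:
    RMS aggregation over iterations, the 1 - S_i/sum(S_j) transform, and a
    final rescaling so that sum(w_i) == m, cf. condition (10.67)."""
    S = np.asarray(S, dtype=float)                 # shape (P, m)
    S_i = np.sqrt((S ** 2).mean(axis=0))           # assumed RMS aggregation
    w_tilde = 1.0 - S_i / S_i.sum()                # cf. (10.71)
    return len(w_tilde) * w_tilde / w_tilde.sum()  # assumed normalization
```

The most sensitive (redundant or atypical) element thus receives the smallest weight.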
The natural requirement that elements for which wi < 1 should be removed from the pattern has been confirmed by the performed research [observe that the mean value of the coefficients wi equals 1, due to the normalization introduced by formula (10.72)]. Increasing this threshold resulted in a significant drop in classification accuracy because of the loss of nonredundant, valuable information carried in the pattern. Conversely, decreasing the threshold led to a substantial fall in the reduction of pattern size; its effect on classification accuracy was barely noticeable in the proximity of the value 1, although a significant reduction of the threshold results in a significant increase in the number of errors.
[Figure 10.1 Flowchart of the classification procedure. The initial patterns enter block A (addition of new pattern elements and calculation of the parameter h); block B: calculation of the correction coefficients b0, b1, …, bn; block C: calculation of the weights w1, w2, …, wm; block D: sorting of the weights; block E: reduction of patterns; block F: recalculation of the weights; block G: reduction of patterns (elements with wi ≥ 1 proceed to block H, those with wi < 1 return to block A); block U: calculation of the derivatives w1′, w2′, …, wm′; block V: sorting of the derivatives; block W: return of not more than qm1*, qm2*, …, qmJ* elements with positive wi′ to block A; block Z: removal of the remaining elements; block H: Bayes classification.]
these values may worsen the classification quality, whereas an increase results in an excessive calculation time.
The elements of initial patterns (10.60)–(10.62) are provided as introductory data. Based
on these – according to the procedures presented in Section 10.2 – the value of the param-
eter h is calculated (for the parameter c it is given by formula (10.21)). Figure 10.1 shows this
action in block A. Next, corrections in the parameters h and c values are made by taking
the coefficients b0, b1, …, bn, as described in Section 10.5.2 (block B in Figure 10.1).
The subsequent procedure, shown by block C, is the calculation of the parameters
wi values mapped to particular elements of patterns, separately for each class, as in
Section 10.5.3. Following this, within each class, the values of the parameter wi are sorted
(block D), and then – in block E – the appropriate m1*, m2*, …, mJ* elements with the largest values wi are designated for the classification phase itself. The remaining undergo further
treatment, denoted in block U, which is presented in the following sections, after Bayes
classification has been dealt with.
The reduced patterns separately go through a procedure newly calculating the values
of parameters wi, presented in Section 10.5.3 and depicted in block F. In turn, as block G
in Figure 10.1 denotes, these pattern elements for which wi ≥ 1 are submitted to further
stages of the classification procedure, while those with wi < 1 are sent to block A for further
processing in the next steps of the algorithm after adding new elements of patterns. The
final, and also the principal part of the procedure worked out here is Bayes classification,
presented at the beginning of this section and marked by block H. Obviously, many tested elements (10.63) can be subjected to classification separately. After the procedure has been
finished, elements of patterns which have undergone classification are sent to the begin-
ning of the algorithm to block A, for further processing in the next steps, following the
addition of new elements of patterns.
Now – as mentioned two paragraphs earlier, in the last sentence – it remains to con-
sider those pattern elements whose values wi were not counted among the m1*, m2*, …, mJ*
largest for particular patterns. Thus, within block U, the derivative wi′ is calculated* for
each of them. If the element is “too new” and does not possess the k − 1 previous values wi,
then the gaps are filled with zeros (because the values wi generally oscillate around unity,
such behavior significantly increases the derivative value, and in consequence, ensures
against premature elimination of this element). Next, for each separate class, the elements
wi′ are sorted (block V). As marked in block W, the respective

qm1*, qm2*, …, qmJ* (10.73)

elements of each pattern with the largest derivative values, on the additional requirement that the value is positive, go back to block A for further calculations carried out after the
that the value is positive, go back to block A for further calculations carried out after the
addition of new elements. If the number of elements with positive derivative is less than
qm1*, qm2*, …, qmJ*, then the number of elements returning may be smaller (including even
zero). The remaining elements are permanently eliminated from the procedure, as shown
in block Z. In the above notation, q is a positive constant influencing the proportion of
* As the task considered here does not require the differences between subsequent values t1 , t2 , ..., tk to be equal,
it is therefore advantageous to apply interpolation methods. In the procedure worked out here, favorable
results were achieved using a classic method based on Newton's interpolation polynomial. Detailed formu-
las, as well as a treatment of other related concepts are found in the survey paper (Venter 2010). A backward
derivative, after taking into consideration the last three values, can be assumed as standard, i.e., a useful com-
promise between stability of results and possibility to react to changes (the derivative has then two degrees of
freedom).
patterns’ elements with little, but successively increasing meaning. As a standard value
q = 0.2 is proposed, or more generally q ∈[0.1, 0.25] depending on the size/speed of changes.
An increase in this parameter value allows more effective conforming to pattern changes,
although this potentially increases the calculation time, while lowering it may signifi-
cantly worsen adaptation. In the general case, this parameter can be different for particu-
lar patterns – then formula (10.73) takes the form q1m1*, q2m2*, …, qJmJ*, where q1, q2, …, qJ are positive.
The above procedure is repeated following the addition of new elements (block A
in Figure 10.1). Besides these elements – as has been mentioned earlier – for particular
patterns the m1*, m2*, …, mJ* elements of the greatest values wi are taken, respectively, as well as up to qm1*, qm2*, …, qmJ* (or in the generalized case q1m1*, q2m2*, …, qJmJ*) elements of the greatest derivative wi′, whose significance is thus successively increasing, most often due to the nonstationarity of patterns.
xi = [xi,1, xi,2, xi,3, …, xi,n]T for i = 1, 2, …, m, (10.74)
where xi,1 denotes the average monthly income per SIM card of the ith client, xi,2 is its length
of subscription, xi,3 is the number of active SIM cards, and possibly others xi , 4 , xi ,5 , ,xi , n in
accordance with the current market situation.
Firstly, atypical elements within set (10.74) were removed, according to the procedure presented in Section 10.3 (with r = 0.1). The regularity of the data structure was thus enhanced; it is worth noting that this was achieved through the removal of only those elements that had negligible importance for the further results of the investigated procedure.
Secondly, the data set was submitted to clustering by the procedure described in
Section 10.4. The consequence was the partitioning of the data set which consisted of
particular clients, into separate groups each composed of similar members. The results,
achieved for ordinary values of the modification intensity c and smoothing parameters
h, showed too great a number of small-sized clusters lying in low density areas of data –
mostly containing irrelevant, unusual clients – and an excessively large main cluster
containing more than half of all elements. Taking into account the properties of the used
algorithm, this value was raised to c = 1. As a consequence, the desired effects – the
significant lowering of the number of “peripheral” clusters as well as the splitting of
the main cluster – were thus attained. The number of clusters was then satisfactory and
changes to the smoothing parameters h value became redundant. At this point, the data
set comprising 1639 elements was partitioned into 26 clusters with sizes 488, 413, 247,
128, 54, 41, 34, 34, 33, 28, 26, 21, 20, 14, 13, 12, 10, two containing four elements, three containing three elements, two containing two elements, and two containing one element. Note that four groups can be clearly distinguished – the first includes two large clusters of 488 and 413 elements, the following contains two medium clusters with 247 and 128 elements, the next nine are small with 20–54 elements, and finally there are 13 clusters each with fewer than 20 elements. It was appropriate to eliminate these last clusters, although those including key or prestige clients (14, 13, 12, and 10 elements) were excluded from removal. Finally, 17 clusters remained for further analysis.
Then, in the case of each of the clusters found in this manner, an optimal scheme –
with regard to anticipated operator profit – was defined for the treatment of subscribers
belonging to this group. Elements of preference theory (Fodor and Rubens 1994) and fuzzy
logic were applied due to the usually imprecise nature of expert evaluation of such prob-
lems; however, details of this operation lie beyond the remit of this chapter – details can be
found in the publication by Kulczycki and Daniel (2009).
It is worthy of note that none of the above calculations needs to be performed during client negotiations; instead, they should merely be updated periodically (every 1–6 months in practice).
The client with whom negotiations are conducted can be characterized – in accordance with formula (10.74) – by an n-dimensional vector whose specific coordinates represent the respective features of this client. Such data can be obtained from the operator's database archive, if the client has previously been a subscriber, or alternatively from historic invoices issued by a rival network, should the operator be attempting to poach a client.
Attributing the client to an appropriate subscriber group during negotiations – according to
clusters defined earlier – was performed by applying Bayes classification presented in Section
10.5. Because the marketing strategies regarding specific clusters have been previously estab-
lished, the above action completes the procedure of supporting the marketing strategy with
regard to business clients, which was the objective of the project presented above.
The comments summarizing the above application example are symptomatic and can
usefully be treated as recapitulation of all the material presented in this chapter.
Thus, the use of the methodologically uniform apparatus of kernel estimators in the above concept of a marketing support strategy for a mobile phone operator made the analysis, and the creation of a useful computer application, significantly easier. In turn, its nonparametric character freed the concept from the difficult-to-foresee – often nonstandard – distributions of data appearing in contemporary complex tasks. In particular, there are no restrictions on the shape of data groupings, or even on the number of separate parts into which they are divided. The values of every parameter (excluding the easy-to-interpret proportion of atypical elements r) are set on the basis of optimization criteria, after which they can be appropriately matched to individual preferences. In this text all the necessary formulas are given, apart from the standard procedures used in Sections 10.5.2–10.5.4.
Currently, the fundamental challenge for kernel estimators is large sets of high-dimensional data. Thanks to the averaging properties of this type of estimator, quite satisfactory results can be obtained for the first of these aspects even by natural sampling of data set elements. For a fixed sample size m it is worth using classic random sampling (Vitter 1985), and in the case of streaming data, the algorithm presented in the paper by Aggarwal (2006). For multidimensionality one can apply classic reduction using the statistical method PCA (Jolliffe 2001) or a refined approach based on computational intelligence (Kulczycki and Łukasik 2014). More sophisticated methods, also allowing for the presence of categorical features, are currently the subject of intensive research by the author and his team.
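The classic random sampling mentioned above, for a fixed sample size m, can be realized in a single pass over the data with reservoir sampling. A minimal sketch of Vitter's Algorithm R (the function name and the test stream are illustrative):

```python
import random

def reservoir_sample(stream, m, rng=random.Random(0)):
    """Keep a uniform random sample of size m from an iterable of unknown length."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < m:
            reservoir.append(item)       # fill the reservoir with the first m items
        else:
            # item t (0-based) replaces a random reservoir slot with probability m/(t+1)
            j = rng.randrange(t + 1)
            if j < m:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(10_000), 100)
print(len(sample))  # 100
```

Each element of the stream ends up in the sample with the same probability, which is what makes subsequent kernel estimation on the reduced sample statistically sound.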
Acknowledgments
The work was supported in parts by the Systems Research Institute of the Polish Academy
of Sciences in Warsaw, and the Faculty of Physics and Applied Computer Science of the
AGH University of Science and Technology in Cracow, Poland.
I thank my close associates – former Ph.D.-students – Małgorzata Charytanowicz, D.Sc.,
Ph.D., Karina Daniel, Ph.D., Piotr A. Kowalski, Ph.D., Damian Kruszewski, Ph.D., Szymon
Łukasik, Ph.D., coauthors of the common publications (Kulczycki and Charytanowicz
2010; Kulczycki et al. 2012; Kulczycki and Daniel 2009; Kulczycki and Kowalski 2011, 2015a, 2015b; Kulczycki and Kruszewski 2017a, 2017b; Kulczycki and Łukasik 2014). With their
consent, this text also contains results of our joint research.
References
Aggarwal C.C., Outlier Analysis. Springer, New York, 2013.
Aggarwal C.C., On biased reservoir sampling in the presence of stream evolution. In Proceedings
of the 32nd International Conference on Very Large Data Bases, Seoul, 12–15 September 2006, U.
Dayal, K.-Y. Whang, D.B. Lomet, G. Alonso, G.M. Lohman, M.L. Kersten, S.K. Cha, Y.-K. Kim
(eds.), VLDB Endowment, 2006.
Canaan C., Garai M.S., Daya M., Popular sorting algorithms. World Applied Programming, vol. 1,
pp. 62–71, 2011.
Charytanowicz M., Niewczas J., Kulczycki P., Kowalski P.A., Lukasik S., Discrimination of wheat
grain varieties using X-ray images. In: Information Technologies in Medicine, Pietka E., Badura P.,
Kawa J., Wieclawek W. (eds.), Springer, Berlin, 2016, pp. 39–50.
Duda R.O., Hart P.E., Storck D.G., Pattern Classification. Wiley, New York, 2001.
Everitt B.S., Landau S., Leese M., Stahl D., Cluster Analysis. Wiley, New York, 2011.
Fodor J., Roubens M., Fuzzy Preference Modelling and Multicriteria Decision Support. Kluwer, Dordrecht,
1994.
Fukunaga K., Hostetler L.D., The estimation of the gradient of a density function, with applications
in pattern recognition. IEEE Transactions on Information Theory, vol. 21, pp. 32–40, 1975.
Gentle J.E., Random Number Generation and Monte Carlo Methods. Springer, New York, 2003.
Jolliffe I.T., Principal Component Analysis. Springer, New York, 2001.
Kaufman L., Rousseeuw P.J., Finding Groups in Data: An Introduction to Cluster Analysis. Wiley,
New York, 1990.
Kelley C.T., Iterative Methods for Optimization. SIAM, Philadelphia, 1999.
Kincaid, D., Cheney, W., Numerical Analysis. Brooks/Cole, Pacific Grove, 2002.
Kulczycki P., Wykrywanie uszkodzeń w systemach zautomatyzowanych metodami statystycznymi. Alfa,
Warsaw, 1998.
Kulczycki P., Estymatory jądrowe w analizie systemowej. WNT, Warsaw, 2005.
Kulczycki P., Charytanowicz M., A complete gradient clustering algorithm formed with kernel
estimators. International Journal of Applied Mathematics and Computer Science, vol. 20, pp. 123–
134, 2010.
Kulczycki P., Charytanowicz M., Kowalski P.A., Łukasik S., The complete gradient clustering
algorithm: Properties in practical applications. Journal of Applied Statistics, vol. 39, pp. 1211–1224,
2012.
Kulczycki P., Daniel K., Metoda wspomagania strategii marketingowej operatora telefonii komórkowej. Przegląd Statystyczny, vol. 56, no. 2, pp. 116–134, 2009; Errata: vol. 56, no. 3–4, p. 3, 2009.
Kulczycki P., Kowalski P.A., Bayes classification of imprecise information of interval type. Control
and Cybernetics, vol. 40, pp. 101–123, 2011.
Kulczycki P., Kowalski P.A., Bayes classification for nonstationary patterns. International Journal of
Computational Methods, vol. 12, ID 1550008 (19 pages), 2015a.
Kulczycki P., Kowalski P.A., Classification of interval information with data drift. In: Modeling and
Using Context, Christiansen H., Stojanovic I., Papadopoulos G.A. (eds.), Springer, Berlin, 2015b,
pp. 495–500.
Kulczycki P., Kruszewski D., Detection of atypical elements with fuzzy and intuitionistic fuzzy
evaluations. In: Trends in Advanced Intelligent Control, Optimization and Automation, Mitkowski
W., Kacprzyk J., Oprzedkiewicz K., Skruch P. (eds.), Springer, Cham, 2017a, pp. 774–786.
Kulczycki P., Kruszewski D., Identification of atypical elements by transforming task to supervised
form with fuzzy and intuitionistic fuzzy evaluations. Applied Soft Computing, vol. 60, no. 11,
pp. 623–633, 2017b.
Kulczycki P., Łukasik S., An algorithm for reducing dimension and size of sample for data explo-
ration procedures. International Journal of Applied Mathematics and Computer Science, vol. 24,
pp. 133–149, 2014.
Łukasik S., Kowalski P.A., Charytanowicz M., Kulczycki P., Fuzzy models synthesis with
kernel-density-based clustering algorithm. In: Fifth International Conference on Fuzzy Systems
and Knowledge Discovery, J. Ma, Y. Yin, J. Yu, S. Zhou (eds.), IEEE Computer Society, Los
Alamitos, vol. 3, pp. 449–453, 2008.
Parrish R., Comparison of quantile estimators in normal sampling. Biometrics, vol. 46, pp. 247–257,
1990.
Silverman, B.W., Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986.
Venter G., Review of optimization techniques. In: Encyclopedia of Aerospace Engineering, Blockley R.,
Shyy W. (eds.), Wiley, New York, 2010, pp. 5229–5238.
Vitter J.S., Random sampling with reservoir. ACM Transactions on Mathematical Software, vol. 11,
pp. 37–57, 1985.
Wand M., Jones M., Kernel Smoothing. Chapman and Hall, London, 1995.
Xu R., Wunsch D., Clustering. Wiley, New York, 2009.
Zurada J., Introduction to Artificial Neural Systems. West Publishing, St. Paul, 1992.
chapter eleven

A new technique for constructing exact tolerance limits

K.N. Nechval
Transport and Telecommunication Institute

G. Berzins
University of Latvia
Contents
11.1 Introduction
11.2 Two-parameter Weibull distribution
11.3 Lower statistical γ-content tolerance limit with expected (1 − α)-confidence
11.4 Upper statistical γ-content tolerance limit with expected (1 − α)-confidence
11.5 Lower statistical (1 − α)-expectation tolerance limit
11.6 Upper statistical (1 − α)-expectation tolerance limit
11.7 Numerical example 1
11.8 Numerical example 2
11.9 Conclusion
References
11.1 Introduction
Statistical tolerance (prediction) limits are another tool for making statistical inference
on an unknown population. As opposed to a confidence limit that provides information
concerning an unknown population parameter, a tolerance limit provides information on
the entire population. In this chapter, two types of statistical tolerance limits are defined:
(1) γ-content tolerance limit with expected (1 − α)-confidence, and (2) (1 − α)-expectation
tolerance limit.
To be specific, let γ denote a proportion between 0 and 1. Then a one-sided γ-content tolerance limit with expected (1 − α)-confidence is determined so as to capture a proportion γ or more of the population, with a given expected confidence level 1 − α. For example, an
upper γ-content tolerance limit with expected (1 − α)-confidence for a univariate popula-
tion is such that with the given expected confidence level 1 − α, a specified proportion γ
or more of the population will fall below the limit. A lower γ-content tolerance limit with
expected (1 − α)-confidence satisfies similar conditions.
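The notion of content can be illustrated numerically in the simplest possible setting, with fully known parameters (unlike the estimation problem treated in this chapter): for a known distribution, the upper γ-content limit is just the γ-quantile, and simulation confirms that the proportion of the population below it is γ. The Weibull parameter values below are illustrative:

```python
import math, random

rng = random.Random(1)
gamma = 0.9
beta, delta = 2.0, 1.5            # illustrative known Weibull scale and shape

# Upper gamma-content limit for a fully known Weibull: its gamma-quantile.
upper = beta * (-math.log(1.0 - gamma)) ** (1.0 / delta)

# Check the content by simulation: the fraction of the population below the limit.
n_sim = 200_000
below = sum(beta * rng.expovariate(1.0) ** (1.0 / delta) < upper
            for _ in range(n_sim))
print(below / n_sim)  # close to 0.9
```

When the parameters must be estimated from a sample, the limit itself becomes random, which is exactly why the additional expected confidence level 1 − α enters the definitions above.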
Formally, $L_k(S)$ is a lower statistical γ-content tolerance limit with expected (1 − α)-confidence on future outcomes of the kth-order statistic $Y_k$ if

$$E_\theta\left\{\Pr\left(\int_{L_k(S)}^{\infty} g_\theta(y_k)\,dy_k \ge \gamma\right)\right\} = E_\theta\left\{\Pr\left(\overline{G}_\theta(L_k(S)) \ge \gamma\right)\right\} = 1-\alpha, \qquad (11.1)$$

and $U_k(S)$ is an upper γ-content tolerance limit with expected (1 − α)-confidence on future outcomes of the kth-order statistic $Y_k$ if

$$E_\theta\left\{\Pr\left(\int_0^{U_k(S)} g_\theta(y_k)\,dy_k \ge \gamma\right)\right\} = E_\theta\left\{\Pr\left(G_\theta(U_k(S)) \ge \gamma\right)\right\} = 1-\alpha, \qquad (11.2)$$

where

$$g_\theta(y_k) = \frac{1}{B(k,\,m-k+1)}\,[F_\theta(y_k)]^{k-1}\,[1-F_\theta(y_k)]^{m-k}\,f_\theta(y_k) \qquad (11.3)$$

is the probability density function of the kth-order statistic $Y_k$ in a set of m future observations,

$$B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} \qquad (11.4)$$

is the beta function, and $\Gamma(a) = \int_0^\infty t^{a-1}e^{-t}\,dt$ is the gamma function,
$$G_\theta(y_k) = \Pr(Y_k \le y_k) = \sum_{i=k}^{m}\binom{m}{i}[F_\theta(y_k)]^i[1-F_\theta(y_k)]^{m-i} = \sum_{i=k}^{m}\sum_{j=0}^{m-i}\binom{m}{i}\binom{m-i}{j}(-1)^j[F_\theta(y_k)]^{i+j} = \int_0^{F_\theta(y_k)}\varphi(t\,|\,k,\,m-k+1)\,dt \qquad (11.5)$$

is the cumulative distribution function of $Y_k$, where

$$\varphi(t\,|\,a,b) = \frac{1}{B(a,b)}\,t^{a-1}(1-t)^{b-1}, \quad t\in(0,1), \qquad (11.6)$$

is a probability density function of the beta-distribution with the shape parameters a and b,

$$\overline{G}_\theta(y_k) = 1 - G_\theta(y_k) = \Pr(Y_k > y_k) = \sum_{i=0}^{k-1}\binom{m}{i}[F_\theta(y_k)]^i[1-F_\theta(y_k)]^{m-i} = \sum_{i=0}^{k-1}\sum_{j=0}^{i}\binom{m}{i}\binom{i}{j}(-1)^j\left[1-F_\theta(y_k)\right]^{m-i+j} = \int_{F_\theta(y_k)}^{1}\varphi(t\,|\,k,\,m-k+1)\,dt, \qquad (11.7)$$

and

$$\frac{dG_\theta(y_k)}{dy_k} = g_\theta(y_k). \qquad (11.8)$$
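The equality in (11.5) between the binomial tail sum and the incomplete beta integral can be checked numerically; a small sketch (the values of p = F_θ(y_k), k, and m are illustrative, and the integral is evaluated by the trapezoidal rule):

```python
import math

def order_stat_cdf_binomial(p, k, m):
    """Pr(Y_k <= y) with F_theta(y) = p, via the binomial tail sum in (11.5)."""
    return sum(math.comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k, m + 1))

def order_stat_cdf_beta(p, k, m, steps=200_000):
    """The same probability via the incomplete beta integral in (11.5)."""
    a, b = k, m - k + 1
    B = math.factorial(a - 1) * math.factorial(b - 1) / math.factorial(a + b - 1)
    f = lambda t: t ** (a - 1) * (1 - t) ** (b - 1)
    h = p / steps
    s = 0.5 * (f(0.0) + f(p)) + sum(f(i * h) for i in range(1, steps))
    return s * h / B

p, k, m = 0.3, 2, 10
print(order_stat_cdf_binomial(p, k, m), order_stat_cdf_beta(p, k, m))  # both about 0.8507
```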
Indeed,

$$\begin{aligned}
\frac{dG_\theta(y_k)}{dy_k} &= \frac{d}{dy_k}\sum_{i=k}^{m}\binom{m}{i}[F_\theta(y_k)]^i[1-F_\theta(y_k)]^{m-i} \\
&= \sum_{i=k}^{m}\binom{m}{i}\left(i\,[F_\theta(y_k)]^{i-1}[1-F_\theta(y_k)]^{m-i}F_\theta'(y_k) - (m-i)[F_\theta(y_k)]^{i}[1-F_\theta(y_k)]^{m-i-1}F_\theta'(y_k)\right) \\
&= \sum_{i=k}^{m}\frac{m!}{(i-1)!\,(m-i)!}[F_\theta(y_k)]^{i-1}[1-F_\theta(y_k)]^{m-i}f_\theta(y_k) - \sum_{j=k+1}^{m}\frac{m!}{(j-1)!\,(m-j)!}[F_\theta(y_k)]^{j-1}[1-F_\theta(y_k)]^{m-j}f_\theta(y_k) \\
&= \frac{m!}{(k-1)!\,(m-k)!}[F_\theta(y_k)]^{k-1}[1-F_\theta(y_k)]^{m-k}f_\theta(y_k) = g_\theta(y_k), \qquad (11.9)
\end{aligned}$$

where $j = i+1$; equivalently,

$$\frac{dG_\theta(y_k)}{dy_k} = \frac{d}{dy_k}\int_0^{F_\theta(y_k)}\varphi(t\,|\,k,\,m-k+1)\,dt = \frac{1}{B(k,\,m-k+1)}[F_\theta(y_k)]^{k-1}[1-F_\theta(y_k)]^{m-k}F_\theta'(y_k) = g_\theta(y_k). \qquad (11.10)$$
The problem considered in this chapter is to find Lk (S) (lower statistical γ-content toler-
ance limit Lk with expected [1 − α]-confidence on future outcomes of the kth-order statis-
tic Yk) satisfying (11.1) and Uk(S) (upper statistical γ-content tolerance limit with expected
[1 − α]-confidence on future outcomes of the kth-order statistic Yk) satisfying (11.2) on the
basis of the experimental random sample X1, …, Xn when some or all numerical values of
components of the parametric vector θ are unspecified.
Thus, the logical purpose for a tolerance limit must be the prediction of future out-
comes for some production process. The coverage value γ is the percentage of the future
process outcomes to be captured by the prediction, and the confidence level (1 − α) is the
proportion of the time we hope to capture that percentage γ.
The common distributions used in life testing problems are the normal, exponential,
Weibull, and gamma distributions (Mendenhall 1958). Tolerance limits for the normal dis-
tribution have been considered in (Guttman 1957; Wald and Wolfowitz 1946; Wallis 1951),
and others.
Tolerance (prediction) limits enjoy a fairly rich history in the literature and have a
very important role in engineering and manufacturing applications. Patel (1986) provides
a review (which was fairly comprehensive at the time of publication) of tolerance intervals
(limits) for many distributions as well as a discussion of their relation with confidence
intervals (limits) for percentiles. Dunsmore (1978) and Guenther, Patil, and Uppuluri (1976)
discuss two-parameter exponential tolerance intervals (limits) and the estimation proce-
dure in greater detail. Engelhardt and Bain (1978) discuss how to modify the formulas
when dealing with type II censored data. Guenther (1972) and Hahn and Meeker (1991)
discuss how one-sided tolerance limits can be used to obtain approximate two-sided tol-
erance intervals by applying Bonferroni’s inequality. In Nechval et al. (2011, 2016a–c), the
exact statistical tolerance and prediction limits are discussed under parametric uncer-
tainty of underlying models.
In contrast to other statistical limits commonly used for statistical inference, tolerance limits (especially those for order statistics) are used relatively rarely. One reason is that the theoretical concept and the computational complexity of tolerance limits are significantly greater than those of the standard confidence and prediction limits. Thus, it becomes necessary to use innovative approaches that allow one to construct tolerance limits on future order statistics for many populations.
In this chapter, new approaches to constructing lower and upper statistical γ-content
tolerance limits with expected (1 − α)-confidence as well as (1 − α)-expectation tolerance
limits on order statistics in future samples are proposed. For illustration, a two-parameter
Weibull distribution is considered.
11.2 Two-parameter Weibull distribution

The two-parameter Weibull distribution with the probability density function

$$f_\theta(x) = \frac{\delta}{\beta}\left(\frac{x}{\beta}\right)^{\delta-1}\exp\left[-\left(\frac{x}{\beta}\right)^{\delta}\right], \quad x>0,\ \beta>0,\ \delta>0, \qquad (11.11)$$

and the cumulative distribution function

$$F_\theta(x) = 1-\exp\left[-\left(\frac{x}{\beta}\right)^{\delta}\right], \quad x>0,\ \beta>0,\ \delta>0, \qquad (11.12)$$

indexed by the scale and shape parameters β and δ, is used as the underlying distribution of a random variable X in a sample of the lifetime data, where θ = (β, δ).
The Weibull distribution is widely used in reliability and survival analysis due to its
flexible shape and ability to model a wide range of failure rates. It can be derived theo-
retically as a form of extreme value distribution, governing the time to occurrence of the
“weakest link” of many competing failure processes. Its special case with shape parameter
δ = 2 is the Rayleigh distribution, which is commonly used for modeling the magnitude of
radial error when x and y coordinate errors are independent normal variables with zero
mean and the same standard deviation while the case δ = 1 corresponds to the widely
used exponential distribution. For illustration, probability density functions of the two-
parameter Weibull distribution for selected values of β and δ are shown in Figure 11.1.
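The special cases mentioned above can be confirmed directly from the cdf (11.12); a small numeric sketch (the values of x and β are illustrative):

```python
import math

def weibull_cdf(x, beta, delta):
    """Two-parameter Weibull cdf, equation (11.12)."""
    return 1.0 - math.exp(-((x / beta) ** delta))

x, beta = 1.3, 2.0

# delta = 1: reduces to the exponential distribution with mean beta
print(weibull_cdf(x, beta, 1.0), 1.0 - math.exp(-x / beta))

# delta = 2: reduces to the Rayleigh distribution with scale sigma = beta / sqrt(2)
sigma = beta / math.sqrt(2.0)
print(weibull_cdf(x, beta, 2.0), 1.0 - math.exp(-x**2 / (2.0 * sigma**2)))
```

Each printed pair of values agrees exactly.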
Figure 11.1 The Weibull probability density functions for (β, δ) = (0.5, 2), (1.0, 2), (1.5, 3), and (3.0, 4).

Let X follow a Weibull distribution with scale parameter β and shape parameter δ. We consider both parameters β, δ to be unknown. Let (X1, …, Xn) be a random sample from
the two-parameter Weibull distribution (11.11), and let $\hat\beta$, $\hat\delta$ be the maximum likelihood estimates of β, δ, respectively, computed on the basis of (X1, …, Xn):

$$\hat\beta = \left(\sum_{i=1}^{n} x_i^{\hat\delta}\Big/ n\right)^{1/\hat\delta} \qquad (11.13)$$

and

$$\hat\delta = \left(\frac{\sum_{i=1}^{n} x_i^{\hat\delta}\ln x_i}{\sum_{i=1}^{n} x_i^{\hat\delta}} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i\right)^{-1}. \qquad (11.14)$$
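Equations (11.13) and (11.14) determine $\hat\delta$ only implicitly. One simple way to solve them numerically (a sketch, not a prescription of the chapter; the bisection bounds are illustrative) is bisection on the fixed-point form of (11.14), followed by substitution into (11.13). The data below are the logic-circuit lifetimes of numerical example 2:

```python
import math

def weibull_mle(xs, lo=0.05, hi=100.0, iters=200):
    """Solve (11.14) for delta-hat by bisection, then compute beta-hat from (11.13)."""
    mean_log = sum(math.log(x) for x in xs) / len(xs)

    def h(d):
        # h(d) = [sum x^d ln x / sum x^d - mean(ln x)]^{-1} - d; decreasing in d
        s = sum(x ** d for x in xs)
        sl = sum(x ** d * math.log(x) for x in xs)
        return 1.0 / (sl / s - mean_log) - d

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    beta = (sum(x ** delta for x in xs) / len(xs)) ** (1.0 / delta)
    return beta, delta

beta_hat, delta_hat = weibull_mle([830, 1020, 1175, 1424, 1603])
print(round(delta_hat, 3), round(beta_hat, 1))  # roughly 4.977 and 1321.3
```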
It is readily verified that any n − 2 of the Zi's, say Z1, …, Zn−2, form a set of n − 2 functionally independent ancillary statistics. The appropriate conditional approach is to consider the distributions of V1, V2, V3 conditional on the observed value of Z(n) = (Z1, …, Zn). (For purposes of symmetry of notation, we include all of Z1, …, Zn in the expressions stated here; it can be shown that Zn−1 and Zn can be determined as functions of Z1, …, Zn−2 only.)
Theorem 1 (Joint pdf of the pivotal quantities V1, V2 from the two-parameter Weibull distribution). Let (X1, ..., Xn) be a random sample of n observations from the two-parameter Weibull distribution (11.11). Then the joint pdf of the pivotal quantities

$$V_1 = \left(\frac{\hat\beta}{\beta}\right)^{\delta}, \qquad V_2 = \frac{\delta}{\hat\delta}, \qquad (11.17)$$

conditional on fixed

$$z^{(n)} = (z_1, \ldots, z_n), \qquad (11.18)$$

where

$$Z_i = \left(\frac{X_i}{\hat\beta}\right)^{\hat\delta}, \quad i = 1, \ldots, n, \qquad (11.19)$$

are ancillary statistics, any n − 2 of which form a functionally independent set, and $\hat\beta$ and $\hat\delta$ are the maximum likelihood estimates for β and δ, respectively, based on a random sample of n observations (X1, ..., Xn) from the two-parameter Weibull distribution (11.11), is given by

$$f_n(v_1, v_2\,|\,z^{(n)}) = \frac{1}{\Gamma(n)}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{n} v_1^{n-1}\exp\left(-v_1\sum_{i=1}^{n} z_i^{v_2}\right)\cdot\frac{1}{\vartheta(z^{(n)})}\, v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} = f_n(v_1\,|\,z^{(n)}, v_2)\, f_n(v_2\,|\,z^{(n)}), \qquad (11.20)$$

where

$$f_n(v_1\,|\,z^{(n)}, v_2) = \frac{1}{\Gamma(n)}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{n} v_1^{n-1}\exp\left(-v_1\sum_{i=1}^{n} z_i^{v_2}\right), \quad v_1\in(0,\infty), \qquad (11.21)$$

$$f_n(v_2\,|\,z^{(n)}) = \frac{1}{\vartheta(z^{(n)})}\, v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n}, \quad v_2\in(0,\infty), \qquad (11.22)$$

and

$$\vartheta(z^{(n)}) = \int_0^{\infty} v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dv_2. \qquad (11.23)$$

Proof. The joint density of (X1, ..., Xn) is

$$f_\theta(x_1, \ldots, x_n) = \prod_{i=1}^{n}\frac{\delta}{\beta}\left(\frac{x_i}{\beta}\right)^{\delta-1}\exp\left[-\left(\frac{x_i}{\beta}\right)^{\delta}\right]. \qquad (11.24)$$
Using the invariant embedding technique (Nechval and Vasermanis 2004; Nechval
et al. 2008, 2010), we transform (11.24) to
$$f_\theta(x_1,\ldots,x_n)\,d\beta\,d\delta = \prod_{i=1}^{n}\left[\frac{\delta}{x_i}\left(\frac{x_i}{\beta}\right)^{\delta}\right]\exp\left[-\sum_{i=1}^{n}\left(\frac{x_i}{\beta}\right)^{\delta}\right] d\beta\,d\delta,$$

and, carrying out the change of variables from (β, δ) to (v₁, v₂) defined by (11.17) – so that $(x_i/\beta)^{\delta} = v_1 z_i^{v_2}$ – we obtain after some algebra

$$f_\theta(x_1,\ldots,x_n)\,d\beta\,d\delta = -\hat\beta\hat\delta\,n\prod_{i=1}^{n}\frac{1}{x_i}\;v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\;v_1^{n-1}\exp\left(-v_1\sum_{i=1}^{n} z_i^{v_2}\right)dv_1\,dv_2$$

$$= -\hat\beta\hat\delta\,n\prod_{i=1}^{n}\frac{1}{x_i}\,\Gamma(n)\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n}\cdot\frac{1}{\Gamma(n)}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{n} v_1^{n-1}\exp\left(-v_1\sum_{i=1}^{n} z_i^{v_2}\right)dv_1\,dv_2. \qquad (11.25)$$
Normalizing (11.25) – that is, dividing it by its integral over $(v_1, v_2)\in(0,\infty)^2$, whereby the constant factor $-\hat\beta\hat\delta\,n\,\Gamma(n)\prod_{i=1}^{n}x_i^{-1}$ cancels – gives

$$\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n}\cdot\dfrac{1}{\Gamma(n)}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{n} v_1^{n-1}\exp\left(-v_1\sum_{i=1}^{n} z_i^{v_2}\right)}{\displaystyle\int_0^{\infty} v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dv_2} = f_n(v_1, v_2\,|\,z^{(n)}), \qquad (11.26)$$

which is (11.20). This completes the proof.
It can be noted that the quantity

$$W = V_1\sum_{i=1}^{n} Z_i^{V_2} = \left(\frac{\hat\beta}{\beta}\right)^{\delta}\sum_{i=1}^{n} Z_i^{V_2} = \left[\left(\frac{\hat\beta}{\beta}\right)^{\hat\delta}\right]^{V_2}\sum_{i=1}^{n} Z_i^{V_2} = V_3^{V_2}\sum_{i=1}^{n} Z_i^{V_2} \qquad (11.27)$$

is distributed as

$$W \sim g_n(w) = \frac{1}{\Gamma(n)}\,w^{n-1}\exp(-w), \quad w\in(0,\infty), \qquad (11.28)$$

i.e., as the special case a = n, b = 1 of the gamma density

$$f(w\,|\,a,b) = \frac{1}{\Gamma(a)\,b}\left(\frac{w}{b}\right)^{a-1}\exp(-w/b), \quad w\in(0,\infty). \qquad (11.29)$$
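Substituting (11.17) and (11.19) into (11.27) gives $W = \sum_{i=1}^{n}(X_i/\beta)^{\delta}$, a sum of n independent unit exponential variables, so the gamma law (11.28) can also be checked by simulation (the parameter values below are illustrative):

```python
import math, random

rng = random.Random(7)
n, beta, delta = 5, 2.0, 1.5          # illustrative sample size and true parameters
reps = 100_000

vals = []
for _ in range(reps):
    xs = [beta * rng.expovariate(1.0) ** (1.0 / delta) for _ in range(n)]
    vals.append(sum((x / beta) ** delta for x in xs))   # W from (11.27)

mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
print(round(mean, 2), round(var, 2))  # both close to n = 5, as for a Gamma(n, 1) law
```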
11.3 Lower statistical γ-content tolerance limit with expected (1 − α)-confidence

Theorem 2. A lower statistical γ-content tolerance limit with expected (1 − α)-confidence on future outcomes of the kth-order statistic $Y_k$, satisfying (11.1), is given by

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.30)$$

where

$$\eta_{L_k} = \arg\left\{\int_0^{\infty}\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{L_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha\right\} \qquad (11.31)$$

is a tolerance factor; the maximum likelihood estimates $\hat\beta$ and $\hat\delta$ of the parameters β and δ are determined from (11.13) and (11.14), respectively; the ancillary statistics Zi, i = 1, …, n, are given by (11.19); and $q_{1-\gamma}$ is a quantile of the beta-distribution satisfying

$$\int_0^{q_{1-\gamma}}\varphi(t\,|\,k,\,m-k+1)\,dt = \int_0^{q_{1-\gamma}}\frac{1}{B(k,\,m-k+1)}\,t^{k-1}(1-t)^{m-k}\,dt = 1-\gamma. \qquad (11.32)$$
Proof. Taking into account (11.1), (11.7), (11.27), and (11.28), the following probability transformation can be carried out:

$$\begin{aligned}
\Pr\left(\int_{L_k}^{\infty} g_\theta(y_k)\,dy_k \ge \gamma\right) &= \Pr\left(\overline{G}_\theta(L_k) \ge \gamma\right) = \Pr\left(1 - \int_0^{F_\theta(L_k)}\varphi(t\,|\,k,\,m-k+1)\,dt \ge \gamma\right) \\
&= \Pr\left(\int_0^{F_\theta(L_k)}\varphi(t\,|\,k,\,m-k+1)\,dt \le 1-\gamma\right) = \Pr\left(F_\theta(L_k) \le q_{1-\gamma}\right) \\
&= \Pr\left(1-\exp\left[-(L_k/\beta)^{\delta}\right] \le q_{1-\gamma}\right) = \Pr\left(\exp\left[-(L_k/\beta)^{\delta}\right] \ge 1-q_{1-\gamma}\right) \\
&= \Pr\left((L_k/\beta)^{\delta} \le \ln(1-q_{1-\gamma})^{-1}\right) = \Pr\left(V_1\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2} \le \ln(1-q_{1-\gamma})^{-1}\right) \\
&= \Pr\left(V_1\sum_{i=1}^{n} Z_i^{V_2} \le \frac{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} Z_i^{V_2}}{\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}}\right) = \Pr\left(W \le \frac{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} Z_i^{V_2}}{\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}}\right), \qquad (11.33)
\end{aligned}$$

where $q_{1-\gamma}$ is the (1 − γ)-quantile of the beta-distribution with the shape parameters a = k and b = m − k + 1. Using pivotal quantity averaging, it follows from (11.1) and (11.33) that
$$\begin{aligned}
&E\left\{\Pr\left(W \le \frac{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} Z_i^{V_2}}{\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}}\right)\right\} = E\left\{\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} Z_i^{V_2}\big/\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}} g_n(w)\,dw\right\} \\
&\quad = \int_0^{\infty}\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\left[(L_k/\hat\beta)^{\hat\delta}\right]^{v_2}} g_n(w)\,f_n(v_2\,|\,z^{(n)})\,dw\,dv_2 \\
&\quad = \int_0^{\infty}\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\left[(L_k/\hat\beta)^{\hat\delta}\right]^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha. \qquad (11.34)
\end{aligned}$$

Hence,

$$L_k = \arg\left\{\int_0^{\infty}\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\left[(L_k/\hat\beta)^{\hat\delta}\right]^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha\right\}. \qquad (11.35)$$
Assuming that

$$(L_k/\hat\beta)^{\hat\delta} = \eta_{L_k}, \qquad (11.36)$$

we obtain from (11.35) that

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.37)$$

where $\eta_{L_k}$ satisfies

$$\int_0^{\infty}\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{L_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha. \qquad (11.38)$$

This ends the proof.

Corollary 2.1. If k = 1, then $L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta$, where the tolerance factor $\eta_{L_k}$ is determined from (11.38), in which $q_{1-\gamma}$ is a quantile of the beta-distribution (with k = 1) satisfying (11.32).
Corollary 2.2. If k = m = 1, then

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.39)$$

where

$$\eta_{L_k} = \arg\left\{\int_0^{\infty}\int_0^{\ln\gamma^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{L_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha\right\}. \qquad (11.40)$$
11.4 Upper statistical γ-content tolerance limit with expected (1 − α)-confidence

Theorem 3. An upper statistical γ-content tolerance limit with expected (1 − α)-confidence on future outcomes of the kth-order statistic $Y_k$, satisfying (11.2), is given by

$$U_k = \eta_{U_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.41)$$

where

$$\eta_{U_k} = \arg\left\{\int_0^{\infty}\int_0^{\ln(1-q_{\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{U_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = \alpha\right\} \qquad (11.42)$$

is a tolerance factor; the maximum likelihood estimates $\hat\beta$ and $\hat\delta$ of the parameters β and δ are determined from (11.13) and (11.14), respectively; the ancillary statistics Zi, i = 1, …, n, are given by (11.19); and $q_\gamma$ is a quantile of the beta-distribution satisfying

$$\int_0^{q_\gamma}\varphi(t\,|\,k,\,m-k+1)\,dt = \int_0^{q_\gamma}\frac{1}{B(k,\,m-k+1)}\,t^{k-1}(1-t)^{m-k}\,dt = \gamma. \qquad (11.43)$$
Proof. Taking into account (11.2), (11.5), (11.27), and (11.28), the following probability transformation can be carried out:

$$\begin{aligned}
\Pr\left(\int_0^{U_k} g_\theta(y_k)\,dy_k \ge \gamma\right) &= \Pr\left(G_\theta(U_k) \ge \gamma\right) = \Pr\left(\int_0^{F_\theta(U_k)}\varphi(t\,|\,k,\,m-k+1)\,dt \ge \gamma\right) = \Pr\left(F_\theta(U_k) \ge q_\gamma\right) \\
&= \Pr\left(1-\exp\left[-(U_k/\beta)^{\delta}\right] \ge q_\gamma\right) = \Pr\left(\exp\left[-(U_k/\beta)^{\delta}\right] \le 1-q_\gamma\right) \\
&= \Pr\left((U_k/\beta)^{\delta} \ge \ln(1-q_\gamma)^{-1}\right) = \Pr\left(V_1\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2} \ge \ln(1-q_\gamma)^{-1}\right) \\
&= \Pr\left(W \ge \frac{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} Z_i^{V_2}}{\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2}}\right), \qquad (11.44)
\end{aligned}$$

where $q_\gamma$ is the γ-quantile of the beta-distribution (11.6) with the shape parameters a = k and b = m − k + 1.
Using pivotal quantity averaging, it follows from (11.2) and (11.44) that
$$\begin{aligned}
&E\left\{\Pr\left(W \ge \frac{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} Z_i^{V_2}}{\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2}}\right)\right\} = E\left\{1 - \int_0^{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} Z_i^{V_2}\big/\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2}} g_n(w)\,dw\right\} \\
&\quad = 1 - \int_0^{\infty}\int_0^{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\left[(U_k/\hat\beta)^{\hat\delta}\right]^{v_2}} g_n(w)\,f_n(v_2\,|\,z^{(n)})\,dw\,dv_2 \\
&\quad = 1 - \int_0^{\infty}\int_0^{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\left[(U_k/\hat\beta)^{\hat\delta}\right]^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha. \qquad (11.45)
\end{aligned}$$

Hence,

$$U_k = \arg\left\{\int_0^{\infty}\int_0^{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\left[(U_k/\hat\beta)^{\hat\delta}\right]^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = \alpha\right\}. \qquad (11.46)$$
Assuming that

$$(U_k/\hat\beta)^{\hat\delta} = \eta_{U_k}, \qquad (11.47)$$

we obtain from (11.46) that

$$U_k = \eta_{U_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.48)$$

where $\eta_{U_k}$ satisfies

$$\int_0^{\infty}\int_0^{\ln(1-q_\gamma)^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{U_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = \alpha, \qquad (11.49)$$

and $q_\gamma$ is a quantile of the beta-distribution satisfying

$$\int_0^{q_\gamma}\varphi(t\,|\,k,\,m-k+1)\,dt = \int_0^{q_\gamma}\frac{1}{B(k,\,m-k+1)}\,t^{k-1}(1-t)^{m-k}\,dt = \gamma. \qquad (11.50)$$

This ends the proof.
Corollary 3.1. If k = m = 1, then

$$U_k = \eta_{U_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.51)$$

where

$$\eta_{U_k} = \arg\left\{\int_0^{\infty}\int_0^{\ln(1-\gamma)^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{U_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = \alpha\right\}. \qquad (11.52)$$
Remark 1. It will be noted that an upper statistical γ-content tolerance limit with expected (1 − α)-confidence may be obtained from a lower statistical γ-content tolerance limit with expected (1 − α)-confidence by replacing 1 − α by α and 1 − γ by γ.
11.5 Lower statistical (1 − α)-expectation tolerance limit

Theorem 4. A lower statistical (1 − α)-expectation tolerance limit $L_k \equiv L_k(S)$ on future outcomes of the kth-order statistic $Y_k$, satisfying

$$E_\theta\left\{\Pr\left(Y_k > L_k(S)\right)\right\} = E_\theta\left\{\int_{L_k(S)}^{\infty} g_\theta(y_k)\,dy_k\right\} = 1-\alpha, \qquad (11.53)$$

is given by

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.54)$$

where the tolerance factor $\eta_{L_k}$ satisfies

$$\sum_{l=0}^{k-1}\binom{m}{l}\sum_{j=0}^{l}\binom{l}{j}(-1)^j\,\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{L_k}^{v_2}(m-l+j)\right]^{n}}\,dv_2 = 1-\alpha. \qquad (11.55)$$
Proof. Taking into account (11.53), (11.7), (11.27), and (11.28), the following probability transformation can be carried out:

$$\begin{aligned}
\Pr(Y_k > L_k) &= \int_{L_k}^{\infty} g_\theta(y_k)\,dy_k = \overline{G}_\theta(L_k) = \sum_{l=0}^{k-1}\binom{m}{l}[F_\theta(L_k)]^l[1-F_\theta(L_k)]^{m-l} \\
&= \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\left[1-F_\theta(L_k)\right]^{m-l+j} = \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-(m-l+j)\left(\frac{L_k}{\beta}\right)^{\delta}\right] \\
&= \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-V_1\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}(m-l+j)\right] \\
&= \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-W\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}(m-l+j)\left(\sum_{i=1}^{n} Z_i^{V_2}\right)^{-1}\right]. \qquad (11.56)
\end{aligned}$$
Using pivotal quantity averaging, it follows from (11.22), (11.27), (11.28), (11.53), and (11.56) that

$$\begin{aligned}
E_\theta\left\{\Pr(Y_k > L_k(S))\right\} &= E\left\{\sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-W\left[(L_k/\hat\beta)^{\hat\delta}\right]^{V_2}(m-l+j)\left(\sum_{i=1}^{n} Z_i^{V_2}\right)^{-1}\right]\right\} \\
&= \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\int_0^{\infty}\int_0^{\infty}\exp\left[-w\left[(L_k/\hat\beta)^{\hat\delta}\right]^{v_2}(m-l+j)\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-1}\right] g_n(w)\,f_n(v_2\,|\,z^{(n)})\,dw\,dv_2 \\
&= \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\,\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \left[(L_k/\hat\beta)^{\hat\delta}\right]^{v_2}(m-l+j)\right]^{n}}\,dv_2, \qquad (11.57)
\end{aligned}$$
where

$$L_k = \arg\left\{\sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\,\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \left[(L_k/\hat\beta)^{\hat\delta}\right]^{v_2}(m-l+j)\right]^{n}}\,dv_2 = 1-\alpha\right\}. \qquad (11.58)$$

Assuming that

$$(L_k/\hat\beta)^{\hat\delta} = \eta_{L_k}, \qquad (11.59)$$

we obtain

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.60)$$

where the tolerance factor $\eta_{L_k}$ satisfies (11.55). This ends the proof. In particular, if k = 1, then $\eta_{L_k}$ satisfies

$$\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{L_k}^{v_2}\,m\right]^{n}}\,dv_2 = 1-\alpha. \qquad (11.61)$$
If k = m = 1, then $\eta_{L_k}$ satisfies

$$\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{L_k}^{v_2}\right]^{n}}\,dv_2 = 1-\alpha. \qquad (11.63)$$
11.6 Upper statistical (1 − α)-expectation tolerance limit

Theorem 5. An upper statistical (1 − α)-expectation tolerance limit $U_k \equiv U_k(S)$ on future outcomes of the kth-order statistic $Y_k$, satisfying

$$E_\theta\left\{\Pr\left(Y_k \le U_k(S)\right)\right\} = E_\theta\left\{\int_0^{U_k(S)} g_\theta(y_k)\,dy_k\right\} = 1-\alpha, \qquad (11.64)$$

is given by

$$U_k = \eta_{U_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.65)$$

where the tolerance factor $\eta_{U_k}$ satisfies

$$\sum_{l=0}^{k-1}\binom{m}{l}\sum_{j=0}^{l}\binom{l}{j}(-1)^j\,\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{U_k}^{v_2}(m-l+j)\right]^{n}}\,dv_2 = \alpha. \qquad (11.66)$$
Proof. Taking into account (11.64), (11.7), (11.27), and (11.28), the following probability transformation can be carried out:

$$\begin{aligned}
\Pr(Y_k \le U_k) &= 1 - \int_{U_k}^{\infty} g_\theta(y_k)\,dy_k = 1 - \overline{G}_\theta(U_k) = 1 - \sum_{l=0}^{k-1}\binom{m}{l}[F_\theta(U_k)]^l[1-F_\theta(U_k)]^{m-l} \\
&= 1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\left[1-F_\theta(U_k)\right]^{m-l+j} = 1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-(m-l+j)\left(\frac{U_k}{\beta}\right)^{\delta}\right] \\
&= 1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-V_1\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2}(m-l+j)\right] \\
&= 1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-W\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2}(m-l+j)\left(\sum_{i=1}^{n} Z_i^{V_2}\right)^{-1}\right]. \qquad (11.67)
\end{aligned}$$
Using pivotal quantity averaging, it follows from (11.22), (11.27), (11.28), (11.64), and
(11.67) that
$$\begin{aligned}
E_\theta\left\{\Pr(Y_k \le U_k(S))\right\} &= E\left\{1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\exp\left[-W\left[(U_k/\hat\beta)^{\hat\delta}\right]^{V_2}(m-l+j)\left(\sum_{i=1}^{n} Z_i^{V_2}\right)^{-1}\right]\right\} \\
&= 1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\int_0^{\infty}\int_0^{\infty}\exp\left[-w\left[(U_k/\hat\beta)^{\hat\delta}\right]^{v_2}(m-l+j)\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-1}\right] g_n(w)\,f_n(v_2\,|\,z^{(n)})\,dw\,dv_2 \\
&= 1 - \sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\,\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \left[(U_k/\hat\beta)^{\hat\delta}\right]^{v_2}(m-l+j)\right]^{n}}\,dv_2, \qquad (11.68)
\end{aligned}$$

where

$$U_k = \arg\left\{\sum_{l=0}^{k-1}\sum_{j=0}^{l}\binom{m}{l}\binom{l}{j}(-1)^j\,\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \left[(U_k/\hat\beta)^{\hat\delta}\right]^{v_2}(m-l+j)\right]^{n}}\,dv_2 = \alpha\right\}. \qquad (11.69)$$
Assuming that

$$(U_k/\hat\beta)^{\hat\delta} = \eta_{U_k}, \qquad (11.70)$$

we obtain

$$U_k = \eta_{U_k}^{1/\hat\delta}\,\hat\beta, \qquad (11.71)$$

where the tolerance factor $\eta_{U_k}$ satisfies (11.66). This ends the proof. In particular, if k = 1, then $\eta_{U_k}$ satisfies

$$\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{U_k}^{v_2}\,m\right]^{n}}\,dv_2 = \alpha, \qquad (11.72)$$

and if k = m = 1, then $\eta_{U_k}$ satisfies

$$\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{U_k}^{v_2}\right]^{n}}\,dv_2 = \alpha. \qquad (11.74)$$
11.7 Numerical example 1
Consider the data in an example discussed by Mann and Saunders (1969). They regard
the data coming from the Weibull distribution as the results of full-scale fatigue tests on
a particular type of component. The data are for a complete sample of size n = 3, with
observations X1 = 45.952, X2 = 54.143, and X3 = 65.440, results being expressed here in num-
ber of thousands of cycles. On the basis of these data, it is wished to obtain the lower
(1 − α)-expectation tolerance limit for the minimum (Y1) of independent lifetimes in a
group of m = 500 components which are to be put into service.
The maximum likelihood estimates of the unknown parameters δ and β, computed on the basis of (X1, X2, X3), are $\hat\delta$ = 7.726 and $\hat\beta$ = 58.706, respectively. Taking 1 − α = 0.8 and k = 1, with n = 3 and m = 500, we have from (11.60) that the statistical lower (1 − α)-expectation tolerance limit, $L_k \equiv L_k(S)$, for the minimum (Y1) of independent lifetimes in a group of m = 500 components which are to be put into service, is given by

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta = 5.527411, \qquad (11.75)$$

where

$$\eta_{L_k} = \arg\left\{\frac{1}{\vartheta(z^{(n)})}\int_0^{\infty}\frac{v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}}{\left[\sum_{i=1}^{n} z_i^{v_2} + \eta_{L_k}^{v_2}\,m\right]^{n}}\,dv_2 = 1-\alpha\right\} = 1.18\times 10^{-8}. \qquad (11.76)$$
Lawless (1973) obtained for this example (via conditional approach in terms of a Gumbel
distribution) the lower 80% prediction limit of 5.623, which is slightly larger than (11.75).
The resulting lower 80% prediction limit of Mee and Kushary (1994) for this example
(obtained via simulation) was 5.225, which is slightly smaller than (11.75). The Mann and
Saunders (1969) result for this example was only 0.766.
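The back-transformation in (11.75) from the tolerance factor to the limit can be verified directly; a one-line sketch using the values reported above ($\eta_{L_k}$ = 1.18 × 10⁻⁸, $\hat\delta$ = 7.726, $\hat\beta$ = 58.706):

```python
eta = 1.18e-8
delta_hat, beta_hat = 7.726, 58.706

L_k = eta ** (1.0 / delta_hat) * beta_hat   # equation (11.60) with the tolerance factor
print(round(L_k, 3))  # about 5.527, matching (11.75)
```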
Taking γ = 0.8, 1 − α = 0.8, and k = 1, with n = 3 and m = 500, we have from (11.37) that a lower statistical γ-content tolerance limit, $L_k \equiv L_k(S)$, with expected (1 − α)-confidence for the minimum (Y1) of independent lifetimes in a group of m = 500 components which are to be put into service, is

$$L_k = \eta_{L_k}^{1/\hat\delta}\,\hat\beta = 4.082282, \qquad (11.77)$$

where

$$\eta_{L_k} = \arg\left\{\int_0^{\infty}\int_0^{\ln(1-q_{1-\gamma})^{-1}\sum_{i=1}^{n} z_i^{v_2}\big/\eta_{L_k}^{v_2}}\frac{1}{\Gamma(n)}w^{n-1}e^{-w}\,\frac{1}{\vartheta(z^{(n)})}\,v_2^{n-2}\prod_{i=1}^{n} z_i^{v_2}\left(\sum_{i=1}^{n} z_i^{v_2}\right)^{-n} dw\,dv_2 = 1-\alpha\right\}. \qquad (11.78)$$
11.8 Numerical example 2
To investigate the performance of a logic circuit for a small electronic calculator, a circuit
manufacturer puts n = 5 of the circuits on life test without replacement under specified
environmental conditions, and the failures are observed after X1 = 830, X2 = 1020, X3 = 1175,
X4 = 1424, and X5 = 1603 hours. A buyer tells the circuit manufacturer that he wants to
place three orders (l = 3) for the same type of logic circuits to be shipped to three different
destinations. The buyer wants to select a random sample of q = 5 logic circuits from each
shipment to be tested. An order is accepted only if all of five logic circuits in each selected
sample meet the warranty lifetime. What warranty lifetime should the manufacturer offer
so that all of five logic circuits in each selected sample meet the warranty with probability
of 0.95?
In order to find this warranty lifetime, the manufacturer wishes to use a random sam-
ple of size n = 5 given above and to calculate the lower statistical simultaneous tolerance
limit Lk=1(S) (warranty lifetime) which is expected to capture a certain proportion, say,
γ = 0.975 or more of the population of selected items (m = lq = 15), with a given confidence
level 1 − α = 0.95. This lower statistical simultaneous tolerance limit is such that one can say with a certain confidence 1 − α that at least 100γ% of the logic circuits in each sample selected by the buyer for testing will operate longer than L1(S).
Goodness-of-fit testing. Let us assume that (X1, …, Xn) is a random sample from the two-parameter Weibull distribution (11.11), and let $\hat\beta$, $\hat\delta$ be the maximum likelihood estimates of β, δ, respectively, computed on the basis of (X1, …, Xn):

$$\hat\delta = \left(\frac{\sum_{i=1}^{n} x_i^{\hat\delta}\ln x_i}{\sum_{i=1}^{n} x_i^{\hat\delta}} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i\right)^{-1} = 4.977351 \qquad (11.79)$$

and

$$\hat\beta = \left(\sum_{i=1}^{n} x_i^{\hat\delta}\Big/ n\right)^{1/\hat\delta} = 1321.323. \qquad (11.80)$$
We assess the statistical significance of departures from the Weibull model (11.11) by per-
forming the Anderson–Darling goodness-of-fit test. Among the many goodness-of-fit tests
available (e.g., Kolmogorov–Smirnov), the Anderson–Darling test is more sensitive to
deviations in the tails of a distribution than the older Kolmogorov–Smirnov test. The
Anderson–Darling test statistic is determined by (e.g., D'Agostino and Stephens 1986)
$$A^2 = -\sum_{i=1}^{n}\frac{(2i-1)\left[\ln F_\theta(x_i) + \ln\left(1 - F_\theta(x_{n+1-i})\right)\right]}{n} - n, \quad (11.81)$$

where n = 5 is the number of observations. The statistic from (11.81) needs to be modified for
small sample sizes. For the Weibull distribution, the modification of $A^2$ is
$$A_{mod}^2 = A^2\left(1 + \frac{0.2}{\sqrt{n}}\right). \quad (11.83)$$
The $A_{mod}^2$ value must then be compared with a critical value $A_\alpha^2$, which depends on the sig-
nificance level α and the distribution type. For the Weibull distribution, the computed
$A_{mod}^2$ value has to be less than the critical value $A_\alpha^2$ for the goodness of fit to be
accepted. For this example, α = 0.05 and $A_{\alpha=0.05}^2 = 0.757$:
$$A^2 = -\sum_{i=1}^{5}\frac{(2i-1)\left[\ln F_\theta(x_i) + \ln\left(1 - F_\theta(x_{6-i})\right)\right]}{5} - 5 = 0.202335, \quad (11.84)$$
$$A_{mod}^2 = A^2\left(1 + \frac{0.2}{\sqrt{5}}\right) = 0.220432 < A_{\alpha=0.05}^2 = 0.757. \quad (11.85)$$
Since the test statistic is less than the critical value, we do not reject the null hypothesis at
the significance level α = 0.05. Thus, there is no evidence to rule out the Weibull lifetime
model (11.11).
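Under the assumption that the fitted CDF $F_\theta$ is the Weibull CDF with the MLEs quoted above, the test computation in (11.81) through (11.85) can be reproduced as follows:

```python
import math

# Reproducing (11.81)-(11.85) with the MLEs quoted in the text.
delta_hat, beta_hat = 4.977351, 1321.323
x = sorted([830, 1020, 1175, 1424, 1603])
n = len(x)

def F(t):  # fitted Weibull CDF, Equation (11.11)
    return 1.0 - math.exp(-((t / beta_hat) ** delta_hat))

A2 = -sum((2 * i - 1) * (math.log(F(x[i - 1])) + math.log(1.0 - F(x[n - i])))
          for i in range(1, n + 1)) / n - n
A2_mod = A2 * (1.0 + 0.2 / math.sqrt(n))
A2_crit = 0.757                      # critical value for alpha = 0.05
print(round(A2, 4), round(A2_mod, 4), A2_mod < A2_crit)
```

The printed statistics agree with (11.84) and (11.85) to the precision allowed by the rounded MLEs.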
Now the lower one-sided simultaneous γ-content tolerance limit at the confidence level
1 − α, L1 ≡ L1 (S) (on the order statistic Y1 from a set of m = 15 future ordered observations
Y1 ≤ … ≤ Ym) is given by (11.37)
$$L_1 = \hat\beta\,\eta_{L_1}^{1/\hat\delta} = 328.7676 \cong 329, \quad (11.86)$$

where the tolerance factor

$$\eta_{L_1} = 0.0009842 \quad (11.87)$$

is obtained by solving (11.37) numerically for the confidence level 1 − α = 0.95, with γ = 0.975, n = 5, and q = 5.
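Given the tolerance factor, the warranty lifetime in (11.86) is a one-line computation:

```python
# The warranty lifetime of Equation (11.86), given the tolerance factor
# eta from (11.87) and the MLEs computed earlier.
delta_hat, beta_hat = 4.977351, 1321.323
eta_L1 = 0.0009842
L1 = beta_hat * eta_L1 ** (1.0 / delta_hat)
print(round(L1, 4))  # approx. 328.77, i.e., a warranty lifetime of about 329 hours
```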
11.9 Conclusion
The new technique (based on probability transformation and pivotal quantity averag-
ing) given and illustrated in this chapter is offered as a conceptually simple, efficient,
and useful method for constructing exact statistical tolerance limits on future outcomes
under parametric uncertainty of underlying models. It is based also on the idea of invari-
ant embedding of a sufficient statistic in the underlying model in order to construct piv-
otal quantities and to eliminate the unknown parameters from the problem via pivotal
quantity averaging. Using the proposed technique, the exact statistical tolerance limits on
future order statistics (under parametric uncertainty of underlying models) associated
with sampling from corresponding distributions can be found easily and quickly, making
tables, simulation, and special computer programs unnecessary.
We consider the one-sided statistical tolerance limits defined as follows: (1) a one-sided
statistical tolerance limit that covers at least 100γ% of the measurements with expected
100(1 − α)% confidence, and (2) a one-sided statistical tolerance limit determined so that the
expected proportion of the measurements covered by this limit is (1 − α). Such tolerance
limits are required, for example, when planning life tests: engineers may need to predict
the number of failures that will occur by the end of the test or to predict the amount of
time that it will take for a specified number of units to fail. The methodology described
in this chapter is illustrated for the two-parameter Weibull distribution. Applications to
other log-location-scale distributions could follow directly. Finally, we give two numerical
examples.
It should be noted that the results obtained in this chapter (Sections 11.3–11.8) via the
proposed technique are new.
References
D’Agostino, R.B. and M.A. Stephens, Goodness-of-Fit Techniques. New York: Marcel Dekker, 1986.
Dunsmore, J.R., “Some approximations for tolerance factors for the two parameter exponential dis-
tribution,” Technometrics, vol. 20, pp. 317–318, 1978.
Engelhardt, M. and L.J. Bain, “Tolerance limits and confidence limits on reliability for the two-
parameter exponential distribution,” Technometrics, vol. 20, pp. 37–39, 1978.
Guenther, W.C., “Tolerance intervals for univariate distributions,” Naval Research Logistics Quarterly,
vol. 19, pp. 309–333, 1972.
Guenther, W.C., S.A. Patil and V.R.R. Uppuluri, “One-sided β-content tolerance factors for the two
parameter exponential distribution,” Technometrics, vol. 18, pp. 333–340, 1976.
Guttman, I., “On the power of optimum tolerance regions when sampling from normal distribu-
tions,” Annals of Mathematical Statistics, vol. XXVIII, pp. 773–778, 1957.
Hahn, G.J. and W.Q. Meeker, Statistical Intervals: A Guide for Practitioners. New York: John Wiley &
Sons, 1991.
Lawless, J.F., “On estimation of the safe life when the underlying life distribution is Weibull,”
Technometrics, vol. 15, 857–865, 1973.
Mann, N.R. and S.C. Saunders, “On evaluation of warranty assurance when life has a Weibull dis-
tribution,” Biometrika, vol. 56, pp. 615–625, 1969.
Mee, R.W. and D. Kushary, “Prediction limits for the Weibull distribution utilizing simulation,”
Computational Statistics & Data Analysis, vol. 17, 327–336, 1994.
Mendenhall, V. “A bibliography on life testing and related topics,” Biometrika, vol. XLV, pp. 521–543,
1958.
Nechval, N.A. and E.K. Vasermanis, Improved Decisions in Statistics. Riga: Izglitibas soli, 2004.
Nechval, N.A., G. Berzins, M. Purgailis and K.N. Nechval, “Improved estimation of state of stochas-
tic systems via invariant embedding technique,” WSEAS Transactions on Mathematics, vol. 7,
pp. 141–159, 2008.
Nechval, N.A., M. Purgailis, G. Berzins, K. Cikste, J. Krasts and K.N. Nechval, “Invariant embed-
ding technique and its applications for improvement or optimization of statistical decisions,”
in Al-Begain, K., Fiems, D., Knottenbelt, W. (Eds.), Analytical and Stochastic Modeling Techniques
and Applications, (LNCS) (vol. 6148, pp. 306–320). Berlin: Springer-Verlag, 2010.
Nechval, N.A., K.N. Nechval and M. Purgailis, “Statistical inferences for future outcomes with appli-
cations to maintenance and reliability,” in Lecture Notes in Engineering and Computer Science:
Proceedings of the World Congress on Engineering, WCE 2011, 6–8 July, 2011 (pp. 865–871). London,
UK, 2011.
Nechval, N.A. and K.N. Nechval, "Tolerance limits on order statistics in future samples coming
from the two-parameter exponential distribution," American Journal of Theoretical and Applied
Statistics, vol. 5, pp. 1–6, 2016a.
Nechval, N.A., K.N. Nechval, S.P. Prisyazhnyuk and V.F. Strelchonok, “Tolerance limits on order sta-
tistics in future samples coming from the Pareto distribution,” Automatic Control and Computer
Sciences, vol. 50, pp. 423–431, 2016b.
Nechval, N.A., K.N. Nechval and V.F. Strelchonok, “A new approach to constructing tolerance limits
on order statistics in future samples coming from a normal distribution,” Advances in Image and
Video Processing (AIVP), vol. 4, pp. 47–61, 2016c.
Patel, J.K., “Tolerance limits: A review,” Communications in Statistics: Theory and Methodology, vol. 15,
pp. 2719–2762, 1986.
Wald, A. and J. Wolfowitz, “Tolerance limits for a normal distribution,” Annals of Mathematical
Statistics, vol. XVII, pp. 208–215, 1946.
Wallis, W.A., “Tolerance intervals for linear regression,” in Neyman, J. (Ed.), Second Berkeley Symposium on
Mathematical Statistics and Probability (pp. 43–51) Berkeley: University of California Press, 1951.
chapter twelve
Design of neural network–based PID controller
Contents
12.1 Introduction......................................................................................................................... 227
12.2 Kinematics and dynamics of the biped robot................................................................. 229
12.2.1 Dynamic balance margin while ascending the staircase................................. 230
12.2.2 Dynamic balance margin while descending the staircase............................... 231
12.2.3 Design of torque-based PID controllers for the biped robot............................ 232
12.3 MCIWO-based PID controller...........................................................................................234
12.4 MCIWO-NN–based PID controller.................................................................................. 236
12.5 Results and discussion....................................................................................................... 238
12.5.1 Ascending the staircase......................................................................................... 238
12.5.2 Descending the staircase....................................................................................... 241
12.6 Conclusions.......................................................................................................................... 245
References...................................................................................................................................... 245
12.1 Introduction
Compared to industrial manipulators, legged robots have much more interaction
with the ground, and it is a tough job to control such a robot in an effective manner. Further,
the mechanism, structure, and balancing of a two-legged robot are more complex than
those of other legged robots. Over the past few decades, researchers have been working
on the stability and control aspects of the biped robot on various terrains. Generating
a stable gait for the biped robot while walking on various terrains is a difficult task
and has been taken up by many researchers. Presently, researchers are utilizing zero moment
point (ZMP) [1] based control algorithms to control the gait of the two-legged robot. Other
researchers have tried to optimize the parameters of the ZMP-based controller by
utilizing nontraditional algorithms [2,3]. It is important to note that conventional PID
controllers are widely deployed in both industrial and nonindustrial applications
due to their simple design, ease of use, and cost effectiveness. Given the demand for PID
controllers in various applications, the real-time tuning/adaptation of the controller
gains (i.e., Kp, Kd, and Ki) in an online manner is a challenging task. Moreover, tuning
methods such as the Ziegler–Nichols [4] and Cohen [5] methods have already been shown
to be unsuitable for highly nonlinear, uncertain, and coupled robotic applications.
• In this research work, the authors introduce a torque-based PID controller to control
each joint of the biped robot in a systematic manner and to reduce the error between
two consecutive intervals of the various joints.
• Optimal tuning of the PID controller is performed using the MCIWO algorithm, instead
of the time-consuming manual tuning process. A NN tool has also been developed
to tune the gains of the PID controller.
• Further, the authors introduce cosine and chaotic variables into the standard
invasive weed optimization algorithm (i.e., modified chaotic invasive weed
optimization, MCIWO) to evolve the structure of the NN automatically and to
generate the gains in an adaptive manner. To the best of the authors' knowledge,
no researchers have used the MCIWO algorithm to evolve the structure of a
neural network in control applications.
where H1 = l4 cos(θ4) + l3 cos(θ3), L1 = l4 sin(θ4) + l3 sin(θ3), and ψ = θ4 − θ3 = arccos((H1² + L1² −
l4² − l3²)/(2l4l3)). The angle θ3 can be calculated by using the equation θ3 = θ4 − ψ.
where H2 = l9 cos(θ9) + l10 cos(θ10), L2 = l9 sin(θ9) + l10 sin(θ10), and ψ = θ10 − θ9. The angle θ9 can be
calculated by using θ9 = θ10 − ψ.
Further, the following mathematical expressions are used to calculate the joint angles
in the frontal plane:
$$\theta_2 = \theta_8 = \tan^{-1}\left(\frac{f_w}{H_1}\right) \quad (12.3)$$

$$\theta_5 = \theta_{11} = \tan^{-1}\left(\frac{0.5 f_w}{H_2}\right) \quad (12.4)$$
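A numerical sketch of these joint-angle computations is given below; all link lengths, the hip–ankle offsets H1, L1, H2, the foot-width parameter fw, and the known angle θ4 are hypothetical values chosen only to illustrate Equations (12.1) through (12.4).

```python
import math

# A numerical sketch of Equations (12.1)-(12.4).  Every constant below is a
# hypothetical illustration value, not a dimension of the chapter's robot.
l3, l4 = 0.093, 0.093        # shank and thigh link lengths (m), assumed
H1, L1 = 0.170, 0.040        # vertical/horizontal offsets (m), assumed
H2 = 0.170                   # offset used in Equation (12.4) (m), assumed
fw = 0.030                   # foot-width parameter (m), assumed

# psi = theta4 - theta3 from the law of cosines (Equations 12.1/12.2)
cos_psi = (H1**2 + L1**2 - l4**2 - l3**2) / (2.0 * l4 * l3)
psi = math.acos(max(-1.0, min(1.0, cos_psi)))   # clamp against rounding

theta4 = math.radians(60.0)  # assumed known joint angle
theta3 = theta4 - psi        # sagittal-plane angle

# Frontal-plane joint angles, Equations (12.3) and (12.4)
theta2 = theta8 = math.atan(fw / H1)
theta5 = theta11 = math.atan(0.5 * fw / H2)
print(math.degrees(psi), math.degrees(theta3), math.degrees(theta2))
```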
Figure 12.2 Biped robot walking on ascending the staircase. (a) Sagittal view and (b) frontal view.
Figure 12.3 ZMP and DBM in both sagittal and frontal planes.
$$x_{ZMP} = \frac{\displaystyle\sum_{i=1}^{n}\left(I_i\dot\omega_i - m_i\ddot x_i z_i + m_i x_i\left(g - \ddot z_i\right)\right)}{\displaystyle\sum_{i=1}^{n} m_i\left(\ddot z_i - g\right)} \quad (12.5)$$

$$y_{ZMP} = \frac{\displaystyle\sum_{i=1}^{n}\left(I_i\dot\omega_i - m_i\ddot y_i z_i + m_i y_i\left(g - \ddot z_i\right)\right)}{\displaystyle\sum_{i=1}^{n} m_i\left(\ddot z_i - g\right)} \quad (12.6)$$
where ω̇i, Ii, and mi represent the angular acceleration (rad/s²), mass moment of inertia
(kg m²), and mass (kg) of link i, g is the acceleration due to gravity (m/s²), z̈i, ẍi, and ÿi
denote the accelerations (m/s²) of the link in the z-, x-, and y-directions, respectively, and (xi, yi, zi)
indicates the coordinates of the ith lumped mass.
After determining the position of the ZMP, the dynamic balance margin (DBM) of the
biped robot in the X- and Y-directions (Figure 12.3) is calculated by using Equations (12.7)
and (12.8), respectively. It is important to note that for generating a dynamically bal-
anced gait, the ZMP should lie inside the foot support polygon. If the ZMP moves outside
the polygon, the links of the biped robot need to be moved in such a way that they push
the ZMP back inside the foot support polygon. Further, the dynamic balance margin is
defined as the distance between the point where the ZMP is acting and the end of the
foot support polygon:
$$x_{DBM} = \frac{f_s}{2} - x_{ZMP} \quad (12.7)$$

$$y_{DBM} = \frac{f_w}{2} - y_{ZMP} \quad (12.8)$$
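The ZMP and DBM computations above can be sketched as follows; all masses, inertias, positions, accelerations, and foot dimensions are hypothetical illustration values, and the signs follow Equations (12.5) through (12.8) exactly as printed.

```python
# A numerical sketch of the ZMP (12.5)-(12.6) and DBM (12.7)-(12.8)
# computations for a small set of lumped masses.  All numbers are
# hypothetical illustration values.
g = 9.81  # acceleration due to gravity (m/s^2)

# (I_i, domega_i, m_i, x_i, y_i, z_i, ddx_i, ddy_i, ddz_i) per lumped mass
links = [
    (0.002, 0.5, 0.3, 0.05, 0.02, 0.10, 0.2, 0.1, 0.05),
    (0.004, 0.3, 0.5, 0.08, 0.01, 0.25, 0.1, 0.0, 0.02),
    (0.003, 0.1, 0.4, 0.03, 0.03, 0.40, 0.3, 0.2, 0.01),
]

den = sum(m * (ddz - g) for (_, _, m, _, _, _, _, _, ddz) in links)
x_zmp = sum(I * dw - m * ddx * z + m * x * (g - ddz)
            for (I, dw, m, x, _, z, ddx, _, ddz) in links) / den
y_zmp = sum(I * dw - m * ddy * z + m * y * (g - ddz)
            for (I, dw, m, _, y, z, _, ddy, ddz) in links) / den

fs, fw = 0.16, 0.08            # foot support length and width (m), assumed
x_dbm = fs / 2.0 - x_zmp       # Equation (12.7)
y_dbm = fw / 2.0 - y_zmp       # Equation (12.8)
print(x_zmp, y_zmp, x_dbm, y_dbm)
```

A positive DBM in both directions indicates that the ZMP lies inside the foot support polygon, i.e., a dynamically balanced posture under this sign convention.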
Figure 12.4 Biped robot walking on descending the staircase. (a) Sagittal view and (b) frontal view.
similar procedure as that of the ascending case. However, there is a small difference in the
descending case, with the acceleration due to gravity "g" acting in the direction opposite
to that of the movement of the robot.
$$\tau_{i,the} = \sum_{j=1}^{n} M_{ij}(q)\,\ddot q_j + \sum_{j=1}^{n}\sum_{k=1}^{n} C_{ijk}\,\dot q_j \dot q_k + G_i, \quad i,j,k = 1,2,\ldots,n \quad (12.9)$$
where τi,the, qj, q̇j, and q̈j represent the theoretical torque required, the displacement of the
joint, the velocity of the joint, and the acceleration of the joint, respectively. Further, the
expanded forms of the inertia (Mij), centrifugal/Coriolis (hijk), and gravity (Gi) terms are as
follows:
$$M_{ij} = \sum_{p=\max(i,j)}^{n} \mathrm{Tr}\left(d_{pj}\, I_p\, d_{pi}^{T}\right), \quad i,j = 1,2,\ldots,n \quad (12.10)$$
$$h_{ijk} = \sum_{p=\max(i,j,k)}^{n} \mathrm{Tr}\left(\frac{\partial d_{pk}}{\partial q_p}\, I_p\, d_{pi}^{T}\right), \quad i,j,k = 1,2,\ldots,n \quad (12.11)$$
$$G_i = -\sum_{p=i}^{n} m_p\, g\, d_{pi}\, \bar r_p, \quad i = 1,2,\ldots,n \quad (12.12)$$
where r̄p, Ip, and g denote the mass center (m), the mass moment of inertia tensor (kg m²) of
the pth link, and the acceleration due to gravity (m/s²), respectively. It is important to note that
the acceleration of the joint plays a significant role in controlling each link of the biped
robot. By rearranging Equation (12.9), the expression for the acceleration of link i is
given as follows:
$$\ddot q_j = \left[\sum_{j=1}^{n} M_{ij}(q)\right]^{-1}\left(-\sum_{j=1}^{n}\sum_{k=1}^{n} C_{ijk}\,\dot q_j \dot q_k - G_i\right) + \left[\sum_{j=1}^{n} M_{ij}(q)\right]^{-1}\tau_{i,the}, \quad i,j,k = 1,2,\ldots,n \quad (12.13)$$
Now, considering the term

$$\left[\sum_{j=1}^{n} M_{ij}(q)\right]^{-1}\tau_{i,the} = \hat\tau, \quad (12.14)$$
$$\ddot q_j = \left[\sum_{j=1}^{n} M_{ij}(q)\right]^{-1}\left(-\sum_{j=1}^{n}\sum_{k=1}^{n} C_{ijk}\,\dot q_j \dot q_k - G_i\right) + \hat\tau, \quad i,j,k = 1,2,\ldots,n \quad (12.15)$$
In real-time applications, the theoretical torque and acceleration of each link are
not sufficient to estimate the actual torque and acceleration. For this reason, the actual
torque required at the different joints of the biped robot is calculated by using
the following PID expression:

$$\tau_{act} = K_p e + K_d \dot e + K_i \int e\, dt, \quad (12.16)$$

where τact represents the actual torques required at the different joints of the biped robot; the
terms Kp, Kd, and Ki denote the proportional, derivative, and integral gains of the PID control-
ler, respectively; and e indicates the value of the error (i.e., the difference between the desired and
actual values, e(θi) = θif − θis) related to each joint. After including the terms e and ė, the above
equation can be written as follows:

$$\tau_{act} = K_{pi}\left(\theta_{if} - \theta_{is}\right) - K_{di}\,\dot\theta_{is} + K_{ii}\int e\left(\theta_{is}\right) dt, \quad (12.17)$$
where θif and θis represent the final and initial angular positions at different joints of the
biped robot, respectively. Therefore, the final control equation that represents the accelera-
tion of the link is
$$\ddot q_j = \left[\sum_{j=1}^{n} M_{ij}(q)\right]^{-1}\left(-\sum_{j=1}^{n}\sum_{k=1}^{n} C_{ijk}\,\dot q_j \dot q_k - G_i\right) + K_{pi}\left(\theta_{if} - \theta_{is}\right) - K_{di}\,\dot\theta_{is} + K_{ii}\int e\left(\theta_{is}\right) dt \quad (12.18)$$
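The torque-based PID loop described above can be illustrated with a single-joint simulation; the simplified plant constants and the PID gains below are hypothetical illustration values, not the chapter's optimized gains.

```python
# A single-joint sketch of the torque-based PID control loop.  The plant
# model M*ddq + C*dq + G = tau is a deliberate simplification, and all
# constants are hypothetical.
M, C, G = 0.02, 0.01, 0.05        # inertia, damping, gravity terms (assumed)
Kp, Kd, Ki = 80.0, 4.0, 10.0      # PID gains (assumed)

theta_f = 0.5                      # desired joint angle (rad)
theta, dtheta, integ = 0.0, 0.0, 0.0
dt = 0.001
for _ in range(3000):              # 3 s of simulated time, Euler integration
    e = theta_f - theta            # error e = theta_if - theta_is
    integ += e * dt
    tau = Kp * e - Kd * dtheta + Ki * integ   # PID torque on the joint
    ddtheta = (tau - C * dtheta - G) / M      # resulting joint acceleration
    dtheta += ddtheta * dt
    theta += dtheta * dt
print(round(theta, 3))  # settles near the 0.5 rad target
```

The integral term removes the steady-state offset that the constant gravity load would otherwise leave behind.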
1. Initialize a population: The population of initial solutions is dispersed randomly over the
N-dimensional search space. It is important to note that each position of a weed
signifies one possible solution to the problem.
2. Reproduction: After growing, the individual weeds are allowed to reproduce new seeds depending
on their own fitness and on the lowest and highest fitness values in the colony. The number
of seeds produced by a weed increases linearly with its fitness: the weed with the best
fitness produces the most seeds, and the one with the worst fitness produces the fewest.
A strength of this algorithm is that all the weeds in the solution space, worst and best
alike, contribute to the reproduction process, so even the weed giving the worst fitness
value shares some useful information in the evolution process. The number of seeds (S)
produced by each weed is given by the following equation:
by the following equation:
f − fmin
S = Floor Smin + × Smax (12.19)
fmax − fmin
where fmin and fmax denote the minimum and maximum fitness value in the colony,
respectively, and Smin and Smax represent minimum and maximum number of seeds
produced by each plant, respectively.
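Equation (12.19) can be exercised directly; note that, as printed, it assigns more seeds to larger f, so the snippet below assumes a setting in which larger fitness is better.

```python
import math

# Equation (12.19) evaluated directly.  As printed, larger f earns more
# seeds, so this snippet assumes larger fitness is better.
def seeds(f, f_min, f_max, s_min, s_max):
    return math.floor(s_min + (f - f_min) / (f_max - f_min) * s_max)

# With the chapter's ascending-case settings S_min = 0, S_max = 5:
print([seeds(f, 0.0, 10.0, 0, 5) for f in (0.0, 4.0, 9.9, 10.0)])  # [0, 2, 4, 5]
```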
3. Spatial dispersal: The randomly generated seeds are distributed around the parent weed with
zero mean and a certain variance. Moreover, the standard deviation (σ) of the
random function is reduced nonlinearly from a previously specified initial
value (σinitial) to a final value (σfinal) in each generation. The equation governing this
process is as follows:
$$\sigma_{Gen} = \frac{\left(Gen_{max} - Gen\right)^{n}}{\left(Gen_{max}\right)^{n}}\left(\sigma_{initial} - \sigma_{final}\right) + \sigma_{final} \quad (12.20)$$
Figure 12.5 Flow chart showing the step-by-step procedure of the MCIWO algorithm.
where σinitial and σfinal indicate the initial and final standard deviations, respectively,
and Genmax and n represent the maximum number of generations and the modulation
index, respectively.
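The decay schedule of Equation (12.20) is easy to tabulate. The sketch below uses the ascending-case MCIWO settings reported later in the chapter (σfinal = 0.00001, n = 2, Genmax = 30); reading "σinitial = 3%" as 0.03 is an assumption of this illustration.

```python
# The standard-deviation schedule of Equation (12.20).  Interpreting the
# chapter's "3%" initial value as 0.03 is an assumption of this sketch.
sigma_initial, sigma_final = 0.03, 0.00001
n_mod, gen_max = 2, 30

def sigma(gen):
    return ((gen_max - gen) ** n_mod / gen_max ** n_mod) \
        * (sigma_initial - sigma_final) + sigma_final

print(sigma(0), sigma(15), sigma(30))  # decays from 0.03 down to 0.00001
```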
In order to improve the performance of the algorithm in the present research, the
authors introduced two new terms, namely, chaotic [27] and cosine [28,29] variables, in the
spatial dispersal step. The first, the chaotic random variable, is used to distribute
the seeds equally. This helps in enhancing the search space and in minimizing the
chances of the solution being trapped in a local optimum. The chaotic random number
considered in the present study is obtained from the Chebyshev map:
$$X_{k+1} = \cos\left(k \cos^{-1}\left(X_k\right)\right) \quad (12.21)$$
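A generator for the chaotic sequence of Equation (12.21), iterated exactly as written (the multiplier k is the iteration index):

```python
import math

# Chaotic sequence from the Chebyshev map of Equation (12.21).
def chebyshev_sequence(x0, length):
    xs = [x0]
    for k in range(1, length):
        xs.append(math.cos(k * math.acos(xs[-1])))
    return xs

seq = chebyshev_sequence(0.7, 10)
print([round(v, 4) for v in seq])  # every term stays inside [-1, 1]
```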
Further, the cosine variable assists in enhancing and exploring the search space in a
better manner, and it utilizes the unused resources in the search space. After intro-
ducing the cosine variable, Equation (12.20) is modified accordingly (Equation 12.22).
4. Competitive exclusion: After several iterations, the number of weeds in the colony reaches its
maximum (Pmax) by fast reproduction. At this stage, each weed is allowed to produce new
seeds. The newly produced seeds are then allowed to spread over the search space by
using a chaotic random number. After spreading over the search area, the seeds
occupy their positions and are assigned a rank along with the parent weeds.
Once the maximum allowable population is reached, the weeds with lower fitness are
eliminated, and the weeds with better fitness join the population in the next genera-
tion. This process continues until the maximum number of iterations is reached.
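Steps 1 through 4 can be condensed into a compact loop. The sketch below is a plain IWO iteration on a toy objective: the chaotic and cosine refinements of the MCIWO variant are omitted, and every parameter value is illustrative rather than the chapter's tuned setting.

```python
import math, random

# A compact sketch of the weed-colony loop in steps 1-4, minimizing a toy
# objective.  All parameter values are illustrative.
random.seed(1)

def objective(w):                          # toy fitness (to be minimized)
    return sum(v * v for v in w)

dim, pop_max = 2, 15                       # search dimension, max colony size
s_min, s_max = 0, 5                        # seed counts, cf. Equation (12.19)
sigma_initial, sigma_final, n_mod, gen_max = 1.0, 0.001, 2, 60
colony = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(5)]

for gen in range(gen_max):
    # nonlinearly decaying dispersal, cf. Equation (12.20)
    sigma = ((gen_max - gen) ** n_mod / gen_max ** n_mod) \
        * (sigma_initial - sigma_final) + sigma_final
    fits = [objective(w) for w in colony]
    f_best, f_worst = min(fits), max(fits)
    offspring = []
    for w, f in zip(colony, fits):
        # lower (better) fitness -> more seeds, linear ramp as in (12.19)
        ratio = 0.0 if f_worst == f_best else (f_worst - f) / (f_worst - f_best)
        for _ in range(s_min + math.floor(ratio * s_max)):
            offspring.append([v + random.gauss(0.0, sigma) for v in w])
    # competitive exclusion: keep only the pop_max fittest weeds
    colony = sorted(colony + offspring, key=objective)[:pop_max]

best = min(colony, key=objective)
print(objective(best))  # typically a small value near zero
```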
In the present chapter, the gains (i.e., Kp, Kd, and Ki) of the various PID controllers that
are used to control the individual joints of the biped robot are considered as the weeds of
the MCIWO algorithm. In this study, the PID controllers are designed to control only 12
of the 18 joints of the biped robot; the remaining six DOF belong to the hands and are
not considered here. Further, each PID controller requires three gains for its opera-
tion. Therefore, in total, 36 gain values are required to control all 12 joints of the biped
robot. One such population of the MCIWO algorithm looks like the following:
$$\underset{K_{p1}}{866.54},\ \underset{K_{d1}}{400.25},\ \underset{K_{i1}}{958.32},\ \ldots,\ \underset{K_{p12}}{758.35},\ \underset{K_{d12}}{550.96},\ \underset{K_{i12}}{688.78}$$
It is important to note that a fitness value needs to be assigned for each population
of the MCIWO algorithm. Here, the average angular error between the start and end
points of the interval for various joints is considered as the fitness of each population.
The fitness function ( f ) of the MCIWO PID controller is as follows:
$$f = \frac{1}{b}\sum_{j=1}^{b}\left[\sum_{k=1}^{p}\left(\alpha_{ijkf} - \alpha_{ijks}\right)^{2}\right]^{1/2} \quad (12.23)$$
where b denotes the number of intervals considered in one step, and p indicates the
number of joints for which the controllers are designed.
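A direct transcription of this fitness evaluation, with hypothetical error data:

```python
import math

# Fitness evaluation in the spirit of Equation (12.23): the average over b
# intervals of the RMS-style angular error across the p controlled joints.
# The angle values below are hypothetical illustration data.
def fitness(alpha_f, alpha_s):
    b = len(alpha_f)
    total = 0.0
    for row_f, row_s in zip(alpha_f, alpha_s):
        total += math.sqrt(sum((af - as_) ** 2 for af, as_ in zip(row_f, row_s)))
    return total / b

alpha_f = [[0.52, 0.31], [0.48, 0.29]]   # reached angles: 2 intervals x 2 joints
alpha_s = [[0.50, 0.30], [0.50, 0.30]]   # desired angles
print(round(fitness(alpha_f, alpha_s), 6))  # 0.022361
```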
Figure 12.6 shows the structure of the NN used to predict the gains of the PID controller.
The architecture of the NN consists of 12 input neurons that indicate
the angular error of each joint at different instances of time. The output layer consists of 36
neurons that represent the proportional, integral, and derivative gains of the PID control-
lers that are used to control each joint of the biped robot. The number of neurons in the
hidden layer has been decided with the help of a systematic study. It is important to note
that the performance of a NN depends on the connecting weights between the neurons
of input-hidden and hidden-output layers. During the parametric study, the connecting
weights, the bias value of the network and the coefficients of transfer function values of
individual layers are optimized with the help of the MCIWO algorithm. A batch mode
of training has been implemented to train the NN. Once the structure of the NN is opti-
mized, it can be used to predict the gains of the PID controller in a more adaptive manner.
The operating principle of a MCIWO-NN is shown in Figure 12.7.
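A forward pass through such a 12–14–36 network can be sketched as follows; the random weights, biases, and transfer-function coefficients stand in for the MCIWO-evolved values, and the mapping of the bounded outputs onto actual gain ranges is omitted.

```python
import math, random

# Forward pass of the 12-14-36 network described above: log-sigmoid hidden
# layer, tan-sigmoid output layer.  All numeric values are placeholders for
# the MCIWO-evolved weights, biases, and transfer-function coefficients.
random.seed(0)
n_in, n_hid, n_out = 12, 14, 36
W = [[random.uniform(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hid)]
V = [[random.uniform(0.0, 1.0) for _ in range(n_hid)] for _ in range(n_out)]
b1 = b2 = 0.00005                 # bias values (assumed)
c1, c2 = 2.0, 5.0                 # transfer-function coefficients (assumed)

def logsig(a):
    return 1.0 / (1.0 + math.exp(-c1 * a))

def tansig(a):
    return math.tanh(c2 * a)

def predict_gains(errors):
    """errors: the 12 joint-angle errors; returns 36 bounded outputs that
    would be mapped onto the Kp, Kd, Ki gains of the 12 PID controllers."""
    hidden = [logsig(sum(w * e for w, e in zip(row, errors)) + b1) for row in W]
    return [tansig(sum(v * h for v, h in zip(row, hidden)) + b2) for row in V]

gains = predict_gains([0.01] * 12)
print(len(gains))  # 36
```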
Figure 12.7 Flow chart showing the structure of the MCIWO-NN algorithm.

Let us consider the changes in the angular positions of the joints, namely Δθ1, Δθ2, Δθ3,
Δθ4, Δθ5, Δθ6, Δθ7, Δθ8, Δθ9, Δθ10, Δθ11, and Δθ12, of the biped robot at equal intervals of time as
the input to the NN and the gains of the PID controller, such as, Kp, Kd, and Ki of each joint
of the biped robot, are taken as outputs of the NN. Here the main objective of using the
MCIWO is to optimize the structure of the NN—that is, connecting weights, bias values
and coefficients of transfer function of the NN. One such population of MCIWO in this
case looks like the following:
$$\underset{w_{11}}{0.25},\ \underset{w_{12}}{0.4},\ \underset{w_{13}}{0.72},\ \ldots,\ \underset{v_{14,36}}{0.34},\ \underset{b_1}{0.00009},\ \underset{b_2}{0.00005},\ \underset{c_{11}}{2},\ \underset{c_{12}}{5},\ \underset{c_{13}}{7}$$
Once the gains of the PID controller are obtained from the NN, they are used to control the
joint motors. The root mean square (RMS) error of the angular displacement of all the joints
between the end of each interval (αijkf) and the start of each interval (αijks) is considered as
the fitness (f) of each population member of MCIWO-NN and is given as follows:
$$f = \frac{1}{d}\sum_{i=1}^{d}\frac{1}{b}\sum_{j=1}^{b}\left[\sum_{k=1}^{p}\left(\alpha_{ijkf} - \alpha_{ijks}\right)^{2}\right]^{1/2} \quad (12.24)$$
where d represents the number of training scenarios and other terms have their usual
meaning.
After conducting the study, the optimal values of the parameters
of the MCIWO algorithm are found to be as follows: final value of standard deviation (σfinal) =
0.00001; initial standard deviation (σ initial) = 3%; maximum number of seeds (Smax) = 5; mini-
mum number of seeds (Smin) = 0; initial population size (npopi) = 10; final population size
(npopf) = 25; nonlinear modulation index (n) = 2; and generations (Gen) = 30. Further, a
systematic study has also been conducted in the MCIWO-NN approach to obtain the suit-
able transfer functions for hidden and output layers, and to find the optimal number of
neurons in the hidden layer of the network. From this study, it has been observed that
the log-sigmoid and tan-sigmoid transfer functions are seen to exhibit better performance
with the hidden and output layers, respectively. Further, the number of neurons in the
hidden layer that produce better fitness is found to be equal to 14. The numbers of neu-
rons in the input, hidden, and output layers are kept equal to 12, 14, and 36, respectively.
Therefore, the numbers of connecting weights used in the network are seen to be equal to
672 (12 × 14 + 14 × 36). Finally, the number of variables that represent the NN architecture
is found to be 675, which includes the connecting weights, one bias value of the network, and
two coefficients of the transfer functions.
For solving the above problem, the weights of the network are varied in the range
0.0–1.0, the bias values in the range 0.0–0.0001, and the coefficients of the transfer
functions in the range 1–10. In addition to the above study, here
also a parametric study is conducted to determine the optimal parameters of the MCIWO
algorithm that evolve the NN architecture. The optimal MCIWO parameters obtained
through the study are as follows: final value of standard deviation (σfinal) = 0.00001; initial
standard deviation (σ initial) = 4%; modulation index (n) = 4; final population (npopf) = 10;
initial population (npopi) = 5; maximum number of seeds (Smax) = 4; minimum number
of seeds (Smin) = 0; and maximum number of generations (Gen) = 50. Once the optimal
parameters of MCIWO and MCIWO-NN algorithms are identified, a comparative study
has been conducted in computer simulations on the said algorithms in terms of variation
of error, estimation of the torque required at each joint, variation of ZMP and DBM in
X- and Y-directions of the biped robot.
The variations of error at different joints of the biped robot using MCIWO and
MCIWO-NN algorithms are shown in Figure 12.8. It can be observed that the magnitude
of error reaches zero at the end of every interval. Further, the enlarged views show that the
MCIWO-NN (i.e., adaptive PID controller) converges faster than the MCIWO (i.e., optimal
controller) controller. It might be due to the reason that the trained NN could have pre-
dicted the values of gains of the controller with respect to the magnitude of error in the
joint angles, whereas in the MCIWO approach the gain values are constant and fixed, and
are not varying whenever there is a change in the magnitude of input signal to a particular
joint. This adaptiveness of the MCIWO-NN PID controller helped in converging the error
faster than the MCIWO PID controller.
Figure 12.9 shows the torque required at different joints of the biped robot while
ascending the staircase. It can be observed that the adaptive (MCIWO-NN) PID controller
requires less torque compared to optimized (MCIWO) PID controller. The reason for the
better performance is the same as the one discussed earlier. Moreover, Figure 12.10 shows
the variation of ZMP in X- and Y-directions of the biped robot while ascending the stair-
case. It can be observed that the ZMP of the MCIWO-NN–based PID controller is close to
the center of the foot when compared to the MCIWO-based PID controller. It means that
the adaptive PID controller has produced more dynamically balanced gaits than the opti-
mal PID controller.
Figure 12.8 Variation of error at different joints of the biped robot. (a) Joint 2, (b) Joint 3, (c) Joint 4,
(d) Joint 5, (e) Joint 8, (f) Joint 9, (g) Joint 10 and (h) Joint 11.
Figure 12.9 Torque required at different joints of the biped robot while walking on ascending the
staircase. (a) Swing leg and (b) stand leg.
Figure 12.10 Variation of ZMP in X- and Y-directions while walking on ascending the staircase.
Figure 12.11 Torque required at different joints of the biped robot while walking on descending the
staircase. (a) Swing leg and (b) stand leg.
Figure 12.12 Variation of ZMP in X- and Y-directions while walking on descending the staircase.
Figure 12.13 DBM on ascending and descending the staircase. (a) X-DBM and (b) Y-DBM.
and descending case are fed to the real robot. The schematic diagrams showing the real-
time walking of the biped robot while ascending and descending the staircase are shown
in Figures 12.14 and 12.15, respectively. From this it can be observed that the biped robot
has successfully negotiated the staircase with the help of the gait obtained by the adap-
tive PID controller.
[Figures 12.14 and 12.15: frame-by-frame snapshots of the walking cycle at 0, 5, 10, 15, 20, 25, 30, and 35 ms.]
12.6 Conclusions
In this research work, an attempt is made to develop a torque-based PID controller for the
biped robot while ascending and descending the staircase. A metaheuristic optimization
algorithm, namely the MCIWO algorithm, has been used to find the optimal gains of the PID
controller. Further, the same MCIWO algorithm has also been used to optimize the struc-
ture of the NN. It has been observed that in both the ascending and descending cases, the
MCIWO-NN–based PID controller is found to perform better than the MCIWO-based PID
controller. This may be due to the nature of the NN, which is able to produce adaptive gains
for the PID controller whenever the magnitude of the error at each joint changes.
Further, the developed controllers are successfully tested in computer simulations.
Finally, the gait obtained by the adaptive (i.e., MCIWO-NN controller) PID controller is
tested on a real biped robot.
Contents
13.1 Introduction......................................................................................................................... 247
13.2 Materials and methods...................................................................................................... 249
13.2.1 Data........................................................................................................................... 249
13.2.2 Machine learning algorithms............................................................................... 249
13.2.2.1 Neural network models.......................................................................... 250
13.2.2.2 NN model building with R programming tools................................. 252
13.2.2.3 Support vector regression models......................................................... 252
13.2.2.4 Decision tree models............................................................................... 253
13.2.2.5 Decision tree model building with R programming tools............... 254
13.2.2.6 Random forest models............................................................................ 255
13.2.2.7 Linear regression models........................................................................ 256
13.2.2.8 Model evaluation error metrics.............................................................. 257
13.3 Results and discussion....................................................................................................... 257
13.3.1 Neural network models........................................................................................ 257
13.3.2 Support vector regression models........................................................................ 257
13.3.3 Decision tree regression model............................................................................. 258
13.3.4 Random forest regression model.......................................................................... 259
13.3.5 Linear model for regression.................................................................................. 260
13.3.6 Machine learning models vis-à-vis linear regression model........................... 260
13.4 Conclusion........................................................................................................................... 261
References...................................................................................................................................... 261
13.1 Introduction
India is known throughout the world for its superior buffalo germplasm and accounts
for more than 57% of the world buffalo population. The buffalo is considered the
major dairy animal and the backbone of the Indian dairy industry. With a buffalo
population of 108.7 million (Anon., 2014), the country ranks first in the world. India also ranks first in milk
production, achieving an annual output of 155.5 million tonnes with per capita availability
of 337 g/day in 2015–2016 (Anon., 2017). Buffaloes contribute 53% (82.41 million tonnes) to
the total milk production in India.
The average productivity (5.76 kg/day/animal) of buffaloes is more than that of
indigenous cattle (3.41 kg/day/animal) in the country (Anon., 2017). Besides this, buffaloes
247
248 Advanced Mathematical Techniques in Engineering Sciences
contribute significantly toward meat production, draft power, and dung for manure and
fuel. Thus, the buffalo species is the most important and indispensable component of the
livestock sector in the country.
Murrah is one of the best milch breeds of buffalo. The population of Murrah buffaloes
is 48.25 million in India, out of which 11.68 million is pure and 36.56 million buffaloes
are graded. Murrah buffaloes contribute 44.39% to the total buffalo population in India.
Murrah buffaloes are known for high milk production and a higher fat percentage, which
is almost twice that of cow milk. Haryana state (especially Bhiwani, Jhajjar, Jind,
and Rohtak districts) is the home tract of Murrah buffaloes, but the graded Murrah
buffaloes are found throughout the country owing to their higher milk production potential
coupled with adaptation to wide ecological conditions and feed conversion efficiency. The
Murrah buffalo is basically the center of attraction for dairying among the various buffalo
breeds available in India (Mir et al., 2015). Hence, the Murrah breed of buffalo has been
appropriately named as the “black gold” of dairy animals in India. Also, several coun-
tries including Bangladesh, Brazil, Bulgaria, Egypt, etc., have used Murrah as an improver
breed for upgrading their native buffaloes.
To date, in many countries, the main approach to improving dairy animals has been to
increase milk production. Although the results have been satisfactory, it has been observed
over the years that increasing the milk yield of dairy animals deteriorates their reproductive
performance, as milk production traits are negatively associated with the fertility of the
animals (Berry et al., 2011). Under these constraints, an assessment is required to compare
predicted fertility in relation to milk production.
The fertility of the breeding bull in the herd may be assessed as the total number of
pregnancies relative to the total number of inseminations.
Accordingly, various studies have been carried out to predict fertility in dairy ani-
mals using classical regression analysis techniques (De Haas et al., 2007; Patil et al., 2014;
Cook and Green, 2016; Utt, 2016; Eriksson et al., 2017). Conventional regression techniques
are based on the assumption of a specific parametric function (such as linear, quadratic,
etc.) to fit the data, which can be rather rigid for modeling arbitrary relationships (Piles
et al., 2013). Alternatively, nonparametric methods like emerging machine learning (ML)
algorithms could be applied for the intelligent analysis of such traits (González-Recio
et al., 2014) as they do not require prior knowledge of any parametric function. Rather,
they can capture intricate relationships between dependent and independent variables as
well as complex dependencies among explanatory variables. Also, they are quite flexible
and can learn arbitrarily complex patterns when sufficient data are presented. ML
algorithms can realize how to perform important tasks by generalizing from examples,
i.e., automatic predictions from instances of desired behavior or past observations. Thus,
ML is the study of intelligent computer algorithms that improve automatically through
experience (Ramón et al., 2012; Shalev-Shwartz and Ben-David, 2014). These learning
methods have found several applications in performance modeling and evaluation in animal
sciences (Caraviello et al., 2006; Shahinfar et al., 2012; González-Recio et al., 2014; Murphy
et al., 2014; Shahinfar et al., 2014; Hempstalk et al., 2015; Fenlon et al., 2016; Borchers et al.,
2017). However, the majority of the studies in this area of research have been conducted
outside India. Very few studies have recently been carried out in India including pioneer-
ing work by the authors at this institute (Sharma et al., 2006; Sharma et al., 2007, 2013;
Panchal et al., 2016, 2017). Nevertheless, prediction of a bull's fertility using ML techniques
has not been attempted, especially in Murrah buffaloes. Hence, in this chapter, the authors
have investigated various emerging ML algorithms to predict the fertility in Murrah bulls
being maintained at the ICAR-National Dairy Research Institute (NDRI), Karnal, India.
Chapter thirteen: Modeling fertility in Murrah bulls with intelligent algorithms 249
Table 13.1 Summary statistics of the Murrah breeding bulls’ fertility data set
Variable Mean SD SE Minimum Maximum Range
Birth weight 35.8723 5.3674 0.7829 25 50 25
Weight (3-m) 68.3996 17.9533 2.6188 43 116 73
Weight (6-m) 111.2570 19.4178 2.8324 76 170 94
Weight (9-m) 159.0518 36.7451 5.3598 110 270 160
Weight (12-m) 272.8443 52.2760 7.6252 172 363 191
Weight (24-m) 382.3406 51.6531 7.5344 264 493 229
Age at first calving 3.2955 0.7420 0.1082 2.1 4.9 2.8
Post-thaw motility 49.0425 5.3809 0.7849 40 60 20
Conception rate 62.4381 16.9481 2.4721 30.43 94.74 64.31
SD: standard deviation; SE: standard error.
[Figure: schematic of a single-hidden-layer feed-forward neural network with inputs x_0, …, x_n, hidden-layer weights w_ij, and output weights v_0, …, v_m.]
The net input to the ith hidden layer neuron is

net_i = Σ_{j=1}^{n} w_{ij} x_j + θ_i^{(1)} = w_i ⋅ x + θ_i^{(1)}    (13.1)
where θ_i^{(1)} is the bias of the ith hidden layer neuron. The output from the ith hidden layer
neuron is given by
net = Σ_{i=1}^{m} v_i h_i + θ^{(2)} = v ⋅ h + θ^{(2)}    (13.3)
where vi represents the synaptic (or connection) strength between the ith hidden layer
neuron and the output neuron, while θ(2) is the bias of the output neuron.
Introducing a bias neuron x0 with input value as +1, Equation (13.1) can be rewritten as
net_i = Σ_{j=0}^{n} w_{ij} x_j = W_i ⋅ x    (13.4)
where W_{i0} ≡ θ_i^{(1)} and W_i is the weight vector w_i (associated with the ith hidden
neuron) augmented by the 0th column corresponding to the bias. Similarly, introducing an
auxiliary hidden neuron (i = 0) such that h_0 = +1 allows us to redefine Equation (13.3) as
net = Σ_{i=0}^{m} v_i h_i = V ⋅ h    (13.5)
where v_0 ≡ θ^{(2)}.
F = (1/P) Σ_{p=1}^{P} (net_o^{(p)} − t^{(p)})^2    (13.7)
This function is minimized using any standard optimization method, such as the Broyden–
Fletcher–Goldfarb–Shanno (BFGS) technique.
The NN discovers knowledge from complicated or imprecise data, which is employed
to find patterns and reveal trends that are too complex to be observed either by human
beings or classical statistical techniques. A substantially trained NN acts as an expert
system to analyze data in the specific domain of information for which it was trained.
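The forward pass of Equations (13.4) and (13.5), together with the error function of Equation (13.7), can be sketched in a few lines. The chapter builds its models with R's nnet and neuralnet packages; the standalone Python sketch below, with a logistic transfer function for the hidden layer (matching the "log sigmoid" of Table 13.2) and illustrative weights, is not the chapter's implementation:

```python
import math

def forward(W, V, x):
    """One forward pass of a single-hidden-layer network.

    W: m rows of (n+1) hidden-layer weights, column 0 holding the biases,
    as in Equation (13.4). V: m+1 output weights, V[0] being the bias,
    as in Equation (13.5). A logistic transfer function is assumed.
    """
    xa = [1.0] + list(x)                 # bias neuron x0 = +1
    h = [1.0]                            # auxiliary hidden neuron h0 = +1
    for wi in W:
        net_i = sum(wij * xj for wij, xj in zip(wi, xa))
        h.append(1.0 / (1.0 + math.exp(-net_i)))   # logistic activation
    return sum(vi * hi for vi, hi in zip(V, h))    # net = V . h

def error_F(W, V, patterns):
    """Error function F of Equation (13.7) over P patterns (x, t)."""
    P = len(patterns)
    return sum((forward(W, V, x) - t) ** 2 for x, t in patterns) / P
```

With all hidden weights zero and output weights (0, 2), the hidden activation is 0.5 and the network output is 1.0 regardless of the input; a training routine would adjust W and V (e.g., by BFGS) to minimize error_F.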
as these are less than ε; however, any deviation beyond this point is not accepted.
First, let us consider the case of linear functions f, which take the form

f(x) = ⟨w, x⟩ + b    (13.8)

where ⟨⋅, ⋅⟩ represents the dot product in ℵ. In the context of Equation (13.8), flatness
means determining the smallest value for w, which can be achieved by minimizing the
squared Euclidean norm ‖w‖^2. Formally, this is expressed as the convex optimization
problem:
minimize (1/2)‖w‖^2

subject to t^{(p)} − ⟨w, x^{(p)}⟩ − b ≤ ε
           ⟨w, x^{(p)}⟩ + b − t^{(p)} ≤ ε    (13.9)
It is implicitly assumed in Equation (13.9) that a function f actually exists that
approximates all pairs {x^{(p)}, t^{(p)}} with ε precision, i.e., that the convex optimization problem
is feasible. However, at times this may not be the case, or we may wish to tolerate some errors.
Slack variables ξ_p and ξ_p^* are introduced to handle otherwise infeasible constraints of the
optimization problem under consideration. Thus, the optimization problem (Equation 13.9) is
reformulated as

minimize (1/2)‖w‖^2 + c Σ_{p=1}^{P} (ξ_p + ξ_p^*)

subject to t^{(p)} − ⟨w, x^{(p)}⟩ − b ≤ ε + ξ_p
           ⟨w, x^{(p)}⟩ + b − t^{(p)} ≤ ε + ξ_p^*    (13.10)
           ξ_p, ξ_p^* ≥ 0
The constant c > 0 determines the trade-off between the flatness of f and the extent to which
deviations beyond ε are tolerated. The formulation in Equation (13.10) corresponds to the
so-called ε-insensitive loss function ξ_ε, defined as

ξ_ε := 0 if |ξ| ≤ ε, and ξ_ε := |ξ| − ε otherwise    (13.11)
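The ε-insensitive loss of Equation (13.11) is straightforward to state in code: deviations inside the ε tube cost nothing, and beyond it the cost grows linearly. The chapter fits its SVR models with R's e1071 package; the following Python function is only an illustrative sketch of the loss itself:

```python
def eps_insensitive(residual, eps):
    """epsilon-insensitive loss of Equation (13.11): zero inside the
    epsilon tube, linear in the excess deviation outside it."""
    return max(0.0, abs(residual) - eps)
```

For example, with ε = 0.1 a residual of 0.05 incurs no loss, while a residual of 0.3 incurs a loss of 0.2; the sign of the residual does not matter.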
f(x) = Σ_{m=1}^{M} c_m I(x ∈ F_m)    (13.12)
Let the criterion be minimization of the SSE, Σ_p (t^{(p)} − f(x^{(p)}))^2. Then the best ĉ_m is just the
average of t^{(p)} in region F_m:

ĉ_m = ave[t^{(p)} | x^{(p)} ∈ F_m]    (13.13)
where ave [⋅] denotes the average. Now, determining the best binary partition in terms of
minimum SSE, is generally computationally infeasible. Therefore, a top-down greedy-
search algorithm is applied. Starting with all the data points, consider a splitting variable
j and split point s, and define the pair of half-planes:
F_1(j, s) = {X | X_j ≤ s} and F_2(j, s) = {X | X_j > s}    (13.14)
Now, it is to seek the splitting variable j and split point s that solve:
min_{j, s} [ min_{c_1} Σ_{x^{(p)} ∈ F_1(j, s)} (t^{(p)} − c_1)^2 + min_{c_2} Σ_{x^{(p)} ∈ F_2(j, s)} (t^{(p)} − c_2)^2 ]    (13.15)
For any choice of j and s, the inner minimization is solved by
ĉ_1 = ave[t^{(p)} | x^{(p)} ∈ F_1(j, s)] and ĉ_2 = ave[t^{(p)} | x^{(p)} ∈ F_2(j, s)]    (13.16)
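The split search of Equations (13.14) through (13.16) can be sketched for a single splitting variable: every observed value is tried as a split point s, the inner minimization is solved by the region averages, and the point with the smallest total SSE wins. This illustrative Python sketch is not the rpart implementation actually used in the chapter:

```python
def best_split(xs, ts):
    """Exhaustive search over split points s (Equation 13.15) for one
    splitting variable; the inner minimisation uses the region averages
    of Equation (13.16). Returns the best (s, sse) pair."""
    best = None
    for s in sorted(set(xs))[:-1]:           # candidate split points
        left = [t for x, t in zip(xs, ts) if x <= s]    # region F1
        right = [t for x, t in zip(xs, ts) if x > s]    # region F2
        c1 = sum(left) / len(left)           # c-hat_1
        c2 = sum(right) / len(right)         # c-hat_2
        sse = (sum((t - c1) ** 2 for t in left)
               + sum((t - c2) ** 2 for t in right))
        if best is None or sse < best[1]:
            best = (s, sse)
    return best
```

On a toy sample with targets 0, 0, 10, 10 at x = 1, 2, 3, 4, the search picks s = 2, which separates the two target levels perfectly with zero SSE.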
For each splitting variable, the split point s can be determined quite promptly. Thus,
determination of the best pair (j, s) is feasible by browsing through all the inputs. Having
found the best split, the data are partitioned into the two resulting regions, and repeat
this splitting process on each of the two regions. Further, this process is employed on all
the resulting regions. Now, the question is how large the tree should be grown. Naturally,
a big tree could over-fit the data, whereas a short tree might not discover the underlying
important structure, i.e., tree size is a tuning parameter leading to the model’s complex-
ity. Thus, the optimum tree size should be adaptively resolved from the data. One smart
strategy would be to split tree nodes only if the decrease in SSE due to the split exceeds
some threshold. Nevertheless, this tactic is too short-sighted, as a seemingly poor split
might lead to a very good split further down. The ideal scheme would be to grow a large tree T0,
stopping the splitting process only when a certain minimum node size is attained. Then this
big tree is pruned using the cost-complexity pruning technique delineated below. Define a
subtree T ⊂ T0 to be any tree, which is attained as a result of pruning T0, i.e., collapsing any
number of its internal (nonterminal) nodes. Let the terminal nodes be indexed by m, with
node m representing region Fm. Consider |T| to be the number of terminal nodes in T. Let
the following quantities be stated in the present context:
N_m = #{x^{(p)} ∈ F_m}

ĉ_m = (1/N_m) Σ_{x^{(p)} ∈ F_m} t^{(p)}    (13.17)

Q_m(T) = (1/N_m) Σ_{x^{(p)} ∈ F_m} (t^{(p)} − ĉ_m)^2
C_α(T) = Σ_{m=1}^{|T|} N_m Q_m(T) + α|T|    (13.18)
This leads to a finite sequence of subtrees, and it can be shown that this sequence must
contain T_α. Further pedagogical details can be found in Breiman et al. (1984). The parameter α̂ is
computed by cross-validation: its value is chosen to minimize the cross-validated SSE.
The final tree is T_α̂.
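The quantities of Equations (13.17) and (13.18) can be computed directly from the targets that fall in each terminal node. A minimal illustrative Python sketch (the chapter's pruning is done inside rpart, not by hand):

```python
def cost_complexity(regions, alpha):
    """C_alpha(T) of Equation (13.18). `regions` is a list with one list of
    targets t per terminal node m; N_m, c-hat_m and Q_m(T) follow
    Equation (13.17), and |T| is the number of terminal nodes."""
    total = 0.0
    for targets in regions:
        n_m = len(targets)                                  # N_m
        c_m = sum(targets) / n_m                            # c-hat_m
        q_m = sum((t - c_m) ** 2 for t in targets) / n_m    # Q_m(T)
        total += n_m * q_m
    return total + alpha * len(regions)                     # + alpha * |T|
```

For two terminal nodes holding targets {1, 3} and {5}, the within-node SSE contributes 2.0, so C_α equals 2.0 at α = 0 and 3.0 at α = 0.5; larger α penalizes larger trees.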
1. For p = 1 to P:
   a. Obtain a bootstrap sample Z* of size N from the training data.
   b. Construct a random-forest tree T_p on the bootstrapped data, by recursively
      repeating the following steps for each terminal node of T_p until the minimum
      node size n_min is attained:
      i. Randomly choose m variables from the set of l variables.
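Steps (a) and (b)(i) above can be sketched as follows. This illustrative Python fragment (the function name is hypothetical) covers only the sampling steps visible in this excerpt; the remaining split-and-grow steps are handled by the randomForest package in the chapter:

```python
import random

def bootstrap_and_features(data, l, m, seed=0):
    """Draw a bootstrap sample Z* of size N with replacement (step a),
    then pick m of the l variables at random for the next split
    (step b.i). A fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    N = len(data)
    z_star = [data[rng.randrange(N)] for _ in range(N)]  # bootstrap sample
    feats = rng.sample(range(l), m)                      # m of l variables
    return z_star, feats
```

Repeating this P times, one per tree, and averaging the trees' predictions gives the random-forest estimate; decorrelating the trees via the random feature subset is the key design idea.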
t = b_0 + b_1 X^{(1)} + b_2 X^{(2)} + ⋯ + b_n X^{(n)}    (13.19)

This equation possesses the property that the prediction for t is a straight-line function of the
X variables. The slopes of their individual straight-line relationships with t are the
constants b_i, i = 1, 2, …, n, called the coefficients of the variables; each b_i indicates the change in the
predicted value of t per unit change in X^{(i)}, other things being equal. The additional constant b_0,
called the intercept, is the prediction that the model would produce if all the X values were zero
(if meaningful). The coefficients and intercept are estimated by the least squares method, i.e.,
setting them equal to the unique values that minimize the SSE within the sample of data to
which the model is fitted. Also, the model’s prediction errors are generally assumed to be
independently and identically normally distributed.
The linear model function, glm() supported by R programming language has been
employed for the MLR analysis in this chapter. The detailed pedagogic description can be
found in Kabacoff (2015).
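The least squares principle is easiest to see in the single-predictor case, where the estimates have a closed form. The chapter itself fits the full multiple regression with R's glm(); the Python sketch below is only a minimal one-predictor illustration:

```python
def least_squares(xs, ts):
    """Closed-form least-squares estimates of intercept b0 and slope b1
    for one predictor, minimising the in-sample SSE."""
    n = len(xs)
    mx = sum(xs) / n
    mt = sum(ts) / n
    b1 = (sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
          / sum((x - mx) ** 2 for x in xs))   # covariance / variance
    b0 = mt - b1 * mx                         # line passes through means
    return b0, b1
```

On the exact line t = 1 + 2x, the estimates recover b0 = 1 and b1 = 2; with noisy data they give the SSE-minimizing line instead.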
MAE = (1/n) Σ_{i=1}^{n} |Actual_i − Predicted_i|    (13.20)
RMSE = √[ (1/n) Σ_{i=1}^{n} ((Actual_i − Predicted_i)/Actual_i)^2 ]    (13.21)
AIC = n ln(RSS/n) + 2k,  where  RSS = Σ_{i=1}^{n} (Actual_i − Predicted_i)^2    (13.22)
where n is the number of data points (observations in the test set), k is the number of
estimated parameters (including the variance), and RSS is the residual sum of squares of
the fitted model.
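Equations (13.20) through (13.22) translate directly into code. Note that the RMSE used in this chapter is the relative form, with each residual scaled by the actual value, which is why the reported RMSE values are small fractions. An illustrative Python sketch:

```python
import math

def mae(actual, predicted):
    """Mean absolute error, Equation (13.20)."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def rmse(actual, predicted):
    """Relative root mean square error, Equation (13.21): each residual
    is scaled by the corresponding actual value."""
    n = len(actual)
    return math.sqrt(sum(((a - p) / a) ** 2
                         for a, p in zip(actual, predicted)) / n)

def aic(actual, predicted, k):
    """Akaike information criterion, Equation (13.22), with k estimated
    parameters and RSS the residual sum of squares."""
    n = len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return n * math.log(rss / n) + 2 * k
```

Lower values of all three metrics indicate a better model, which is how the comparison in Table 13.6 is read.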
[Figure: trained neural network mapping the input nodes bwt, wt3m, wt6m, wt9m, wt12m, wt24m, aam, and aptm to the output node cr (conception rate); fitted connection weights omitted.]
Table 13.2 Neural network model's optimum configuration and predictive performance

Type of neural network/R package | Learning algorithm | Number of neurons/transfer function in hidden layer(s) | Epochs/steps | MAE | RMSE | AIC
Feed-forward (nnet) | BFGS | 6 (Log sigmoid) | 100 | 13.12 | 0.15 | 177.47
Feed-forward (neuralnet) | Rprop | 2, 2 (Tangent sigmoid) | 6746 | 14.53 | 0.30 | 54.95
tune function in the same package. The tuning of the SVR, i.e., hyperparameter optimization or
model selection, was based on the grid search method. Many models were trained for
different combinations of cost and ε, and the optimal one was selected (Table 13.3). The
tune method was employed to train models with ε = 0, 0.1, 0.2, …, 1 and cost = 2^2, 2^3, 2^4, …, 2^9
(Figure 13.3).
In Figure 13.3, the darker the region is, the better the model is (i.e., RMSE is closer to
zero in darker regions).
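A grid search of this kind simply loops over every (ε, cost) pair and keeps the combination with the lowest cross-validated RMSE. The chapter performs this with the tune function of R's e1071 package; the Python skeleton below is illustrative only, and `toy_cv_rmse` is a hypothetical stand-in for the real 10-fold cross-validated SVR fit:

```python
def grid_search(eps_values, cost_values, cv_rmse):
    """Evaluate every (epsilon, cost) combination with the supplied
    cv_rmse(eps, cost) scorer and keep the pair with the lowest score,
    mirroring the chapter's tuning step."""
    best_pair, best_score = None, float("inf")
    for eps in eps_values:
        for cost in cost_values:
            score = cv_rmse(eps, cost)
            if score < best_score:
                best_pair, best_score = (eps, cost), score
    return best_pair, best_score

# Hypothetical stand-in for the real cross-validated SVR scorer:
toy_cv_rmse = lambda eps, cost: (eps - 1.0) ** 2 + (cost - 4) ** 2 / 100.0
eps_grid = [round(0.1 * i, 1) for i in range(11)]   # 0, 0.1, ..., 1
cost_grid = [2 ** p for p in range(2, 10)]          # 2^2 ... 2^9
```

With the toy scorer, whose minimum sits at ε = 1 and cost = 4 by construction, the search recovers exactly that pair; with a real scorer it would recover the ε = 1, cost = 4 optimum reported in Table 13.3 only if the cross-validation agrees.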
Table 13.3 Support vector regression machine's optimum configuration and predictive performance

SVR type/R package | Epsilon | Cost c | Kernel | Data partitioning scheme | MAE | RMSE | AIC
Eps-regression (grid search with tuning) | 1 (range 0–1) | 4 (range 2^2–2^9) | Gaussian RBF | 90:10 (10-fold cross validation) | 8.05 | 0.06 | 52.86

RBF: radial basis function.
[Figure 13.3: heat map titled "Performance of SVM" showing cross-validated RMSE over the grid of ε and cost values.]
Table 13.4 Decision tree regression model's optimum configuration and predictive performance

Approach/R package | Maxsurrogate | Usesurrogate | Data partitioning scheme | MAE | RMSE | AIC
Top-down greedy search (rpart package) | 0 | 0 | 90:10 | 13.26 | 0.22 | 155.23
Table 13.5 Random forest model's optimum configuration and predictive performance

R package | Number of trees (ntree) | Number of variables per level (mtry) | Data partitioning scheme | MAE | RMSE | AIC
Random Forest | 500 | 2 | 90:10 | 9.58 | 0.03 | 127.94
Table 13.6 Comparison of predictive accuracies of machine learning models vis-à-vis linear
model to predict fertility of Murrah breeding bulls

Accuracy metric | NN | SVR | DT | RF | LM
MAE | 13.12 | 8.05 | 13.26 | 9.58 | 6.44
RMSE | 0.15 | 0.06 | 0.22 | 0.03 | 0.18
AIC | 177.47 | 52.86 | 155.23 | 127.94 | 356.37
The experimental results (Table 13.6) that emerged from this study revealed that the
ML models, i.e., RF, SVR, and NN models, outperformed the LM, whereas the DT model
did not perform well due to its well-known inherent problem of over-fitting. Thus, the ML
approach (especially the RF paradigm) is capable of efficiently predicting the fertility of
Murrah bulls and was generally found better than the conventional linear model. Hence,
ML algorithms can be employed as a plausible alternative to linear regression models in
predicting the fertility of Murrah breeding bulls.
13.4 Conclusion
Various supervised ML algorithms, viz., NN, SVR machine, DT, and RF have been inves-
tigated empirically in this chapter, for modeling breeding bulls’ fertility in Murrah buf-
faloes. The performance of these intelligent models has been compared with that of the
classical linear model for regression, also developed in this study. The results of this study
revealed that the ML approach, generally, outperformed the classical linear models for
regression. Hence, the ML models developed in this study are better suited to precisely assess
the conception rate in Murrah breeding bulls at organized dairy farm(s) like ICAR-NDRI,
Karnal (India). These intelligent models will provide decision support to organized dairy
farms for selecting good buffalo bulls.
References
Anastasiadis, A.D., Magoulas, G.D. and Vrahatis, M.N. (2005). New globally convergent training
scheme based on the resilient propagation algorithm. Neurocomputing, 64, 253–270.
Anonymous (2014). 19th Livestock Census-2012 All India Report. Department of Animal Husbandry,
Dairying and Fisheries, Ministry of Agriculture, Govt. of India, New Delhi. www.dahd.nic.in/
sites/default/files/Livestock5.pdf.
Anonymous (2017). Annual Report 2016–17. Department of Animal Husbandry, Dairying & Fisheries
Ministry of Agriculture & Farmers Welfare Government of India.
Berglund, P. and Heeringa, S. (2014). Multiple Imputation of Missing Data Using SAS. SAS Institute Inc.,
Cary, NC.
Berry, D.P., Evans, R.D. and Mc Parland, S. (2011). Evaluation of bull fertility in dairy and beef cattle
using cow field data. Theriogenology, 75, 172–181.
Borchers, M.R., Chang, Y.M., Proudfoot, K.L., Wadsworth, B.A., Stone, A.E. and Bewley, J.M. (2017).
Machine-learning-based calving prediction from activity, lying, and ruminating behaviors in
dairy cattle. Journal of Dairy Science, 100, 5664–5674.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees.
Wadsworth, New York.
Caraviello, D.Z., Weigel, K.A., Craven, M., Gianola, D., Cook, N.B., Nordlund, K.V., Fricke, P.M. and
Wiltbank, M.C. (2006). Analysis of reproductive performance of lactating cows on large dairy
farms using machine learning algorithms. Journal of Dairy Science, 89, 4703–4722.
Cook, J.G. and Green, M.J. (2016). Use of early lactation milk recording data to predict the calving to
conception interval in dairy herds. Journal of Dairy Science, 99, 4699–4706.
Daumé III, H. (2012). A course in machine learning. http://ciml.info.
De Haas, Y., Janss, L.L.G. and Kadarmideen, H.N. (2007). Genetic correlations between body condi-
tion scores and fertility in dairy cattle using bivariate random regression models. Journal of
Animal Breeding and Genetics, 124, 277–285.
Du, K.-L. and Swamy, M.N.S. (2014). Fundamentals of machine learning. Chapter 2. In: Neural
Networks and Statistical Learning. Springer, London. doi:10.1007/978-1-4471-5571-3_2.
Eriksson, S., Johansson, K., Axelsson, H.H. and Fikse, W.F. (2017). Genetic trends for fertility, udder
health and protein yield in Swedish red cattle estimated with different models. Journal of
Animal Breeding and Genetics, 134, 308–321.
Fenlon, C., O’Grady, L., Dunnion, J., Shalloo, L., Butler, S. and Doherty, M. (2016). A comparison
of machine learning techniques for predicting insemination outcome in Irish dairy cows. In:
Proceedings of the 24th Irish Conference on Artificial Intelligence and Cognitive Science, September
20–21, Dublin, Ireland, pp. 57–67. http://aics2016.ucd.ie/papers/full/AICS_2016_paper_30.pdf.
González-Recio, O., Rosa, G.J.M. and Gianola, D. (2014). Machine learning methods and predictive
ability metrics for genome-wide prediction of complex traits. Livestock Science, 166, 217–231.
Gunn, S.R. (1998). Support vector machines for classification and regression. In: ISIS Technical
Report, Image Speech & Intelligent Systems Group, University of Southampton, UK.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). Elements of Statistical Learning: Data Mining, Inference
and Prediction. Second Edition. Springer, New York.
Haykin, S. (2005). Neural Networks: A Comprehensive Foundation. Second Edition. Pearson Education
(Singapore) Pte. Ltd., Delhi.
Hecht-Nielsen, R. (1990). Neurocomputing. Addison Wesley Longman Publishing Co., Inc. Boston, MA.
Hempstalk, K., McParland, S. and Berry, D.P. (2015). Machine learning algorithms for the prediction
of conception success to a given insemination in lactating dairy cows. Journal of Dairy Science,
98, 5262–5273.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks,
4, 251–257.
Intrator, O. and Intrator, N. (1993). Using neural nets for interpretation of nonlinear models.
In: Proceedings of the Statistical Computing Section, American Statistical Society, San Francisco,
pp. 244–249.
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with
Applications in R. Springer, New York.
Kabacoff, R.I. (2015). R in Action: Data Analysis and Graphics with R. Manning Publications Co.,
New York.
Kominakis, A.P., Abas, Z., Maltaris, I. and Rogdakis, E. (2002). A preliminary study of the applica-
tion of artificial neural networks to prediction of milk yield in dairy sheep. Computers and
Electronics in Agriculture, 35, 35–48.
Kowalczyk, A. (2014). Support Vector Regression with R. www.svm-tutorial.com/2014/10/
support-vector-regression-r/.
Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling. Springer, New York.
doi:10.1007/978-1-4614-6849-3.
Li, F. (2008). Function approximation by neural networks. In: Sun, F., Zhang, J., Tan, Y., Cao, J. and
Yu, W. (Eds.) Advances in Neural Networks. Lecture Notes in Computer Science, 5263, 384–390.
Springer, Berlin, Germany.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News, 2/3, 18–22.
Mir, M.A., Chakravarty, A.K., Gupta, A.K., Naha, B.C., Jamuna, V., Patil, C.S. and Singh, A.P. (2015).
Optimizing age of bull at first use in relation to fertility of Murrah breeding bulls. Veterinary
World, 8, 518–522.
Murphy, M.D., O’Mahony, M.J., Shalloo, L., French, P. and Upton, J. (2014). Comparison of modeling
techniques for milk-production forecasting. Journal of Dairy Science, 97, 3352–3363.
Panchal, I., Sawhney, I.K., Sharma, A.K. and Dang, A.K. (2016). Classification of healthy and mas-
titis Murrah buffaloes by application of neural network models using yield and milk quality
parameters. Computers and Electronics in Agriculture, 127, 242–248.
Panchal, I., Sawhney, I.K., Sharma, A.K., Garg, M.K. and Dang, A.K. (2017). Mastitis detection in
Murrah buffaloes with intelligent models based upon electro-chemical and quality param-
eters of milk. Indian Journal of Animal Research, 51, 922–926.
Patil, C.S., Chakravarty, A.K., Singh, A., Kumar, V., Jamuna, V. and Vohra, V. (2014). Development
of a predictive model for daughter pregnancy rate and standardization of voluntary waiting
period in Murrah buffalo. Tropical Animal Health and Production, 46, 279–284.
Piles, M., Díez, J., delCoz, J.J., Montañés, E., Quevedo, J.R., Ramon, J., Rafel, O., López-Béjar, M. and
Tusell, L. (2013). Predicting fertility from seminal traits: Performance of several parametric and
non-parametric procedures. Livestock Science, 155, 137–147.
Ramón, M., Martínez-Pastor, F., García-Álvarez, O., Maroto-Morales, A., Josefa-Soler, A., Jiménez-
Rabadán, P., Fernández-Santos, M.R., Bernabéu, R. and Garde, J.J. (2012). Taking advantage of
the use of supervised learning methods for characterization of sperm population structure
related with freezability in the Iberian red deer. Theriogenology, 77, 1661–1672.
Riedmiller, M. and Braun, H. (1993). A direct adaptive method for faster backpropagation learning:
The RPROP algorithm. In: Proceedings of the IEEE International Conference on Neural Networks,
San Francisco, pp. 586–591.
Shahinfar, S., Mehrabani-Yeganeh, H., Lucas, C., Kalhor, A., Kazemian, M. and Weigel, K.A. (2012).
Prediction of breeding values for dairy cattle using artificial neural networks and neuro-fuzzy
systems. Computational and Mathematical Methods in Medicine. doi:10.1155/2012/127130.
Shahinfar, S., Page, D., Guenther, J., Cabrera, V., Fricke, P. and Weigel, K. (2014). Prediction of insemi-
nation outcomes in Holstein dairy cattle using alternative machine learning algorithms.
Journal of Dairy Science, 97, 731–742.
Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning: From Theory to
Algorithms. Cambridge University Press, New York.
Smola, A.J. and Scholkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing,
14, 199–222.
Smola, A. and Vishwanathan, S.V.N. (2008). Introduction to Machine Learning. Cambridge University
Press, Cambridge.
Sharma, A. K., Jain, D.K., Chakravarty, A.K., Malhotra, R. and Ruhil, A.P. (2013). Predicting economic
traits in Murrah buffaloes with connectionist models. Journal of Indian Society of Agricultural
Statistics, 67, 1–11.
Sharma, A.K., Sharma, R.K. and Kasana, H.S. (2006). Empirical comparisons of feed-forward con-
nectionist and conventional regression models for prediction of first lactation 305-day milk
yield in Karan Fries dairy cows. Neural Computing and Applications, 15, 359–365.
Sharma, A.K., Sharma, R.K. and Kasana, H.S. (2007). Prediction of first lactation 305-day milk yield
in Karan-Fries dairy cattle using ANN modelling. Applied Soft Computing, 7, 1112–1120.
Utt, M.D. (2016). Prediction of bull fertility. Animal Reproduction Science, 169, 37–44.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S. Springer, New York.
Witten, I.H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Second
Edition. Morgan Kaufmann Publishers, San Francisco, CA.
chapter fourteen

Computational study of the Coanda flow for V/STOL

Maharshi Subhash
Graphic Era University

Michele Trancossi
Sheffield Hallam University
Contents
14.1 Introduction
14.2 Governing equations
     14.2.1 Spalart–Allmaras model
     14.2.2 k–ε model
     14.2.3 SST k–ω model
     14.2.4 k–ε–ζ–f model
14.3 Grid independence test and solution methodology
14.4 Results and discussion
14.5 Conclusions
Acknowledgments
Nomenclature
References
14.1 Introduction
Vertical and short takeoff and landing (V/STOL) capability in the civil aviation sector is not only a dream but also a necessity for tomorrow, given the rapid growth of aviation for humanitarian purposes. Several methods [1–5] can be used to implement V/STOL for air vehicles; among them, the most suitable is based on thrust vectoring. The project ACHEON (Aerial Coanda High Efficiency Orienting-jet Nozzle) encompasses a thrust-vectoring propulsive nozzle called HOMER (High-speed Orienting Momentum with Enhanced Reversibility), based on a patent developed at the University of Modena and Reggio-Emilia, Italy [6]. The idea behind the project is to use a Coanda surface for thrust vectoring in order to achieve V/STOL. Several past works have described the use of the Coanda surface for flow control on aircraft wings and in other flow control devices [7–12]. The concept can also be used efficiently in other industrial applications such as plasma spray guns and direct injection in combustion chambers to improve combustion efficiency [13–15].
The attachment of a jet to an adjacent curved surface was recognized about two centuries ago by Young [16] and patented roughly a century later by Henri Coanda, a Romanian engineer; the phenomenon is therefore known as the "Coanda" effect. The conditions for the stability of flow over a curved surface were described by Rayleigh [17] in terms of streamline curvature, although this flow feature attracted little research attention until 1961. The mechanism by which flow over a convex surface produces the Coanda effect is described in the literature by Newman [18], who found that the flow adheres to the curved surface owing to the momentum balance between the centrifugal force and the pressure force [19]. Because of the interaction of the ambient fluid with the boundary layer of the flow over the curved surface, the static pressure increases gradually; the position on the curved surface where the pressure gradient becomes zero marks the verge of separation, and beyond this position the pressure gradient becomes positive and causes reverse flow. A detailed literature review, including engineering applications and especially lift generation on curved surfaces, has been carried out by Trancossi [20].
Some past works concentrated on the mechanism of the flow over the curved surface. Wille and Fernholtz [21] performed experiments on flow over a convex surface and found that surface curvature has a significant influence on the jet deflection. They also studied the boundary layer phenomena and the entrainment of ambient fluid, which causes the jet to adhere to the curved surface. At the same time, however, the jet spreading rate increases rapidly and eventually causes separation of the boundary layer. Therefore, to investigate the flow phenomena one needs to look inside the boundary layer, which has been attempted in previous work [22].
Experimental investigation of the flow over a circular cylinder by Fekete shows that the velocity profile on the curved surface is similar to that of a plane wall jet. He also investigated the surface pressure, the position of separation, and the wall shear stress, showing that the wall shear stress is negligible as long as the ratio b/R is not too small, and noting that experiments with b/R < 0.0075 may be prone to skin-friction effects. He found that θsep decreases with increased surface roughness; at large Reynolds numbers, however, the influence of surface roughness is negligible within the tested roughness range [23].
Neuendorf and Wygnansky [24] investigated experimentally the flow over a curved surface; they found that the entrainment of ambient fluid causes the jet to adhere to the curved surface but, on the other hand, also causes separation because of the increased jet spreading rate, so the boundary layer approximation fails. This indicates that the velocity-gradient and pressure-gradient conditions for separation need to be re-examined in order to reveal the physics of the flow over a curved surface. Without knowing this flow behavior, the design of such a nozzle would depend on trial-and-error methods, which consume more time and cost; this part of the work has already been investigated [22]. In the present work, the main emphasis is on the flow and geometric parameters. Another work on the identification of geometric parameters, by Patankar and Sridhar [25], delineated the behavior of the Coanda flow as a function of the aspect ratio (the ratio of jet orifice length to jet orifice width), but the choice of aspect ratio depends on the geometry of the flow. Because of the intricacies of the flow geometry, there are no unanimously accepted parameters that influence the flow. Here, an attempt has been made to define such parameters for this case and to encourage other researchers to validate them. For the nozzle flow over the Coanda surface, the aspect ratio can be defined as the ratio of the exit nozzle diameter (b) to the radius of curvature (R) of the Coanda surface; in the present chapter this ratio has been defined for design purposes.
Further parameters are discussed in Section 14.4. The present work focuses on the identification of the geometric and flow parameters for the design of such flows: to use this flow phenomenon in industrial applications, the flow and geometric parameters must be identified for a better optimized design.
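As a quick illustration of this geometric parameter, the b/R values of the two nozzle geometries studied later in this chapter can be computed directly; a minimal Python sketch, using the dimensions quoted in Section 14.4:

```python
# b/R: exit throat diameter over radius of curvature of the Coanda surface.
# This ratio is the main geometric parameter controlling jet adhesion.
def aspect_ratio(b_mm: float, R_mm: float) -> float:
    """Return the b/R ratio for a Coanda nozzle (units cancel)."""
    return b_mm / R_mm

config_1 = aspect_ratio(40.163, 101.566)  # first geometry (Figure 14.1)
config_2 = aspect_ratio(46.0, 179.543)    # second geometry (Figure 14.2)
print(round(config_1, 3), round(config_2, 3))  # 0.395 0.256
```

The smaller ratio of the second geometry is the one the chapter later associates with a larger attachment angle.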
14.2 Governing equations
The mass and momentum conservation equations are given in Reynolds-averaged form:

$$\frac{\partial U_i}{\partial x_i} = 0 \tag{14.1}$$

$$\frac{\partial U_i}{\partial t} + U_j \frac{\partial U_i}{\partial x_j} = -\frac{1}{\rho}\frac{\partial P}{\partial x_i} + \nu \frac{\partial^2 U_i}{\partial x_j \partial x_j} - \frac{\partial \overline{u_i' u_j'}}{\partial x_j} \tag{14.2}$$
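The Reynolds-averaged form above rests on decomposing the instantaneous velocity into a mean and a fluctuation, u = U + u′. A minimal numerical illustration with a synthetic signal (not CFD data) shows that the fluctuation has zero mean and that averaging the squared fluctuation yields a Reynolds-stress component:

```python
import numpy as np

rng = np.random.default_rng(0)
U_mean = 10.0                                # prescribed mean velocity (m/s)
u = U_mean + rng.normal(0.0, 2.0, 100_000)   # synthetic instantaneous signal

U = u.mean()                                 # Reynolds average U
u_prime = u - U                              # fluctuation u'
stress = (u_prime * u_prime).mean()          # one Reynolds-stress component

# The mean of the fluctuation vanishes by construction; the averaged
# square recovers the variance of the imposed fluctuations (sigma^2 = 4).
print(U, u_prime.mean(), stress)
```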
14.2.1 Spalart–Allmaras model
The Spalart–Allmaras transport equation for the working variable $\hat{\nu}$ is:

$$\frac{\partial \hat{\nu}}{\partial t} + u_j \frac{\partial \hat{\nu}}{\partial x_j} = c_{b1}\left(1 - f_{t2}\right)\hat{S}\hat{\nu} - \left[c_{w1} f_w - \frac{c_{b1}}{\kappa^2} f_{t2}\right]\left(\frac{\hat{\nu}}{d}\right)^2 + \frac{1}{\sigma}\left[\frac{\partial}{\partial x_j}\left((\nu + \hat{\nu})\frac{\partial \hat{\nu}}{\partial x_j}\right) + c_{b2}\frac{\partial \hat{\nu}}{\partial x_i}\frac{\partial \hat{\nu}}{\partial x_i}\right] \tag{14.3}$$

The turbulent eddy viscosity is

$$\mu_t = \rho \hat{\nu} f_{\nu 1} \tag{14.4}$$

where

$$f_{\nu 1} = \frac{\chi^3}{\chi^3 + c_{\nu 1}^3} \tag{14.5}$$

$$\chi = \frac{\hat{\nu}}{\nu} \tag{14.6}$$

and ρ is the density, ν = μ/ρ is the molecular kinematic viscosity, and μ is the molecular dynamic viscosity. Additional definitions are given by the following equations:

$$\hat{S} = \Omega + \frac{\hat{\nu}}{\kappa^2 d^2} f_{\nu 2} \tag{14.7}$$

where $\Omega = \sqrt{2 W_{ij} W_{ij}}$ is the magnitude of the vorticity, d is the distance from the field point to the nearest wall, and

$$f_{\nu 2} = 1 - \frac{\chi}{1 + \chi f_{\nu 1}} \tag{14.8}$$

$$f_w = g\left[\frac{1 + c_{w3}^6}{g^6 + c_{w3}^6}\right]^{1/6} \tag{14.9}$$

$$g = r + c_{w2}\left(r^6 - r\right) \tag{14.10}$$

$$r = \min\left[\frac{\hat{\nu}}{\hat{S}\kappa^2 d^2},\; 10\right] \tag{14.11}$$

$$f_{t2} = c_{t3}\exp\left(-c_{t4}\chi^2\right) \tag{14.12}$$

$$W_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} - \frac{\partial u_j}{\partial x_i}\right) \tag{14.13}$$

with $c_{t4} = 0.5$ and $c_{w1} = \dfrac{c_{b1}}{\kappa^2} + \dfrac{1 + c_{b2}}{\sigma}$.
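As a sanity check on the auxiliary relations above, the damping functions can be evaluated directly; a small sketch assuming the standard value c_v1 = 7.1, which the chapter does not list explicitly:

```python
# Damping functions of the Spalart-Allmaras model (Equations 14.5, 14.6, 14.8).
# c_v1 = 7.1 is the standard published value, assumed here.
C_V1 = 7.1

def f_v1(chi: float) -> float:
    """Viscous damping of the eddy viscosity (Equation 14.5)."""
    return chi**3 / (chi**3 + C_V1**3)

def f_v2(chi: float) -> float:
    """Auxiliary function entering S-hat (Equation 14.8)."""
    return 1.0 - chi / (1.0 + chi * f_v1(chi))

# chi = nu_hat / nu: far from walls chi grows large and f_v1 -> 1, so
# mu_t = rho * nu_hat * f_v1 tends to rho * nu_hat; near walls f_v1 -> 0.
for chi in (0.1, 1.0, 10.0, 100.0):
    print(chi, round(f_v1(chi), 4), round(f_v2(chi), 4))
```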
14.2.2 k–ε model

The transport equations for the turbulent kinetic energy k and its dissipation rate ε are:

$$\rho\frac{\partial k}{\partial t} + \rho u_j \nabla k = \nabla\left[\left(\mu + \frac{\mu_T}{\sigma_k}\right)\nabla k\right] + \mu_T P - \rho\varepsilon \tag{14.14}$$

$$\rho\frac{\partial \varepsilon}{\partial t} + \rho u_j \nabla\varepsilon = \nabla\left[\left(\mu + \frac{\mu_T}{\sigma_\varepsilon}\right)\nabla\varepsilon\right] + \frac{\varepsilon}{k}\left(C_{\varepsilon 1}\mu_T P - \rho C_{\varepsilon 2}\varepsilon\right) \tag{14.15}$$

where the eddy viscosity is

$$\mu_T = \rho C_\mu \frac{k^2}{\varepsilon} \tag{14.17}$$

The model constants take their standard values.
14.2.3 SST k–ω model

The eddy viscosity is

$$\nu_T = \frac{a_1 k}{\max\left(a_1 \omega,\; S F_2\right)} \tag{14.18}$$

and the transport equations for k and ω are:

$$\frac{\partial k}{\partial t} + U_j\frac{\partial k}{\partial x_j} = P_k - \beta^* k\omega + \frac{\partial}{\partial x_j}\left[\left(\nu + \sigma_k\nu_T\right)\frac{\partial k}{\partial x_j}\right] \tag{14.19}$$

$$\frac{\partial \omega}{\partial t} + U_j\frac{\partial \omega}{\partial x_j} = \alpha S^2 - \beta\omega^2 + \frac{\partial}{\partial x_j}\left[\left(\nu + \sigma_\omega\nu_T\right)\frac{\partial \omega}{\partial x_j}\right] + 2\left(1 - F_1\right)\sigma_{\omega 2}\frac{1}{\omega}\frac{\partial k}{\partial x_i}\frac{\partial \omega}{\partial x_i} \tag{14.20}$$

The blending functions are

$$F_2 = \tanh\left[\left(\max\left(\frac{2\sqrt{k}}{\beta^*\omega y},\; \frac{500\nu}{y^2\omega}\right)\right)^2\right] \tag{14.21}$$

$$F_1 = \tanh\left[\left(\min\left(\max\left(\frac{\sqrt{k}}{\beta^*\omega y},\; \frac{500\nu}{y^2\omega}\right),\; \frac{4\sigma_{\omega 2} k}{CD_{k\omega} y^2}\right)\right)^4\right] \tag{14.22}$$

$$S^2 = 2S_{ij}S_{ij}, \qquad S_{ij} = \frac{1}{2}\left(\partial_j u_i + \partial_i u_j\right) \tag{14.23}$$

$$G = \nu_T \frac{\partial U_i}{\partial x_j}\left(\frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i}\right) \tag{14.25}$$

$$CD_{k\omega} = \max\left(2\rho\sigma_{\omega 2}\frac{1}{\omega}\frac{\partial k}{\partial x_j}\frac{\partial \omega}{\partial x_j},\; 10^{-10}\right) \tag{14.26}$$

Each model constant φ is blended between its inner (k–ω) value φ₁ and its outer (k–ε) value φ₂:

$$\phi = \phi_1 F_1 + \phi_2\left(1 - F_1\right) \tag{14.27}$$

with $\alpha_1 = 5/9$, $\alpha_2 = 0.44$, $\beta_1 = 3/40$, $\beta_2 = 0.0828$, $\beta^* = 0.09$, $\sigma_{k1} = 0.85$, $\sigma_{k2} = 1$, $\sigma_{\omega 1} = 0.5$, and $\sigma_{\omega 2} = 0.856$.
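The constant blending of Equation 14.27 can be sketched in a few lines; a minimal example using the σ_k values listed above:

```python
# Blending of SST model constants (Equation 14.27):
# F1 = 1 near the wall (k-omega branch), F1 = 0 in the free stream
# (k-epsilon branch); intermediate F1 interpolates linearly.
def blend(phi_1: float, phi_2: float, F1: float) -> float:
    return phi_1 * F1 + phi_2 * (1.0 - F1)

sigma_k1, sigma_k2 = 0.85, 1.0   # inner and outer values from the list above
print(blend(sigma_k1, sigma_k2, 1.0))   # near-wall value: 0.85
print(blend(sigma_k1, sigma_k2, 0.0))   # free-stream value: 1.0
print(blend(sigma_k1, sigma_k2, 0.5))   # mid-blend: 0.925
```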
14.2.4 k–ε–ζ–f model

Incorporating Durbin's [29] elliptic relaxation concept, a new eddy-viscosity turbulence model comprising four equations, denoted k–ε–ζ–f, was developed by Hanjalic et al. [30]. The eddy viscosity is obtained in the following form:

$$\nu_t = C_\mu \zeta \frac{k^2}{\varepsilon} \tag{14.28}$$

moreover, the rest of the variables follow from the set of model equations:

$$\frac{\partial\left(\alpha_k\rho_k k_k\right)}{\partial t} + \frac{\partial}{\partial x_j}\left(\alpha_k\rho_k v_k k_k\right) = \frac{\partial}{\partial x_j}\left[\alpha_k\left(\mu_k + \frac{\mu_{kt}}{\sigma_k}\right)\frac{\partial k_k}{\partial x_j}\right] + \alpha_k\rho_k\left(P_k - \varepsilon_k\right) \tag{14.29}$$

$$\frac{\partial\left(\alpha_k\rho_k \varepsilon_k\right)}{\partial t} + \frac{\partial}{\partial x_j}\left(\alpha_k\rho_k v_k \varepsilon_k\right) = \frac{C^*_{\varepsilon 1} P_k \alpha_k \varepsilon_k - C_{\varepsilon 2}\alpha_k \varepsilon_k^2}{k_k} + \frac{\partial}{\partial x_j}\left[\alpha_k\left(\mu_k + \frac{\mu_{kt}}{\sigma_\varepsilon}\right)\frac{\partial \varepsilon_k}{\partial x_j}\right] \tag{14.30}$$

$$\frac{\partial\left(\alpha_k\rho_k \zeta_k\right)}{\partial t} + \frac{\partial}{\partial x_j}\left(\alpha_k\rho_k v_k \zeta_k\right) = \rho f - \rho\frac{\zeta}{k}P_k + \frac{\partial}{\partial x_j}\left[\alpha_k\left(\mu_k + \frac{\mu_{kt}}{\sigma_\zeta}\right)\frac{\partial \zeta_k}{\partial x_j}\right] \tag{14.31}$$

The elliptic relaxation function f satisfies

$$f - L^2\frac{\partial^2 f}{\partial x_j \partial x_j} = \left(C_1 + C_2\frac{P_k}{\varepsilon}\right)\frac{\frac{2}{3} - \zeta}{T} \tag{14.32}$$

with the turbulent time and length scales

$$T = \max\left[\min\left(\frac{k}{\varepsilon},\; \frac{a}{\sqrt{6}\,C_\mu\zeta\left|S\right|}\right),\; C_T\left(\frac{\nu}{\varepsilon}\right)^{1/2}\right] \tag{14.33}$$

$$L = C_L\max\left[\min\left(\frac{k^{3/2}}{\varepsilon},\; \frac{k^{1/2}}{\sqrt{6}\,C_\mu\zeta\left|S\right|}\right),\; C_\eta\left(\frac{\nu^3}{\varepsilon}\right)^{1/4}\right] \tag{14.34}$$

An additional modification to the ε-equation is that the constant $C_{\varepsilon 1}$ is dampened close to the wall:

$$C^*_{\varepsilon 1} = C_{\varepsilon 1}\left(1 + 0.045\sqrt{1/\zeta}\right) \tag{14.35}$$
14.3 Grid independence test and solution methodology

The computations have been performed with the commercial software AVL Fire [32] for the k–ζ–f turbulence model and with Fluent 6.3 (2016) [33] for the remaining models.
The grid independence check has been performed according to the ERCOFTAC [30] guidelines and as described in previous papers [34,35]. The optimum (numerically stable) grid has been determined through computations at different refinement levels near the curved surface (first grid point from the wall at 80, 40, and 20 µm). It has been found that when the grid resolves the viscous sublayer down to a y+ value below two (first grid point at 20 µm from the wall), the jet deflection angle becomes independent of the grid. In addition, an x+ value below 60 is required for a stable solution of the flow along the downstream direction. Four turbulence models have been employed: the Spalart–Allmaras (SA) model [26], the SST k–ω model [28], the k–ε model with enhanced wall treatment [27], and the k–ζ–f model [30].
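The first-cell height needed for a target y+ can be estimated before meshing from a flat-plate skin-friction correlation. The sketch below is only a rough pre-meshing estimate under assumed air properties and reference length; the chapter's figures of 20 µm and y+ < 2 come from its own geometry and velocities:

```python
import math

def first_cell_height(U: float, L: float, y_plus: float,
                      rho: float = 1.225, nu: float = 1.5e-5) -> float:
    """Estimate the wall-normal height (m) of the first cell for a target y+.

    Uses the flat-plate correlation Cf = 0.026 / Re_L**(1/7); the actual
    Coanda-surface boundary layer will differ, so treat this as a
    pre-meshing estimate only.
    """
    re_l = U * L / nu
    cf = 0.026 / re_l ** (1.0 / 7.0)
    tau_w = 0.5 * cf * rho * U * U        # wall shear stress (Pa)
    u_tau = math.sqrt(tau_w / rho)        # friction velocity (m/s)
    return y_plus * nu / u_tau

# e.g. U = 40 m/s, reference length 0.1 m, target y+ = 1:
# yields a spacing of a few micrometres
print(first_cell_height(40.0, 0.1, 1.0))
```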
In this chapter, the discretization error has been minimized by using the second-order upwind scheme for the momentum equation and for the modified turbulent viscosity in the Spalart–Allmaras (SA) model (Equation 14.3). Pressure and velocity have been coupled through the SIMPLE (Semi-Implicit Method for Pressure-Linked Equations) algorithm [36].
The first-order implicit method has been used to discretize the unsteady term. The advantage of the fully implicit scheme is that it is unconditionally stable with respect to the time step size; the time step has been taken as Δt = 1 × 10⁻³ s. Figure 14.3 (a–c) shows the residual RMS plots for three of the models; for the SA model the error is in the range 10⁻⁹–10⁻¹⁰, the lowest among the turbulence models compared.
(a) SA model: residuals (continuity, x-velocity, y-velocity, ν̂) between 1E−09 and 1E−10 over iterations 25,000–30,000. (b) SST k–ω model: residuals (continuity, x-velocity, y-velocity, k, ω) between 1E−05 and 1E−10 over iterations 71,000–76,000. (c) k–ε model: residuals (continuity, x-velocity, y-velocity, k, ε) between 1E−03 and 1E−06 over iterations 4,000–9,000.
Figure 14.3 Residual plot for different turbulence models: (a) SA, (b) SST k–ω, (c) k–ε model.
Moreover, the k–ζ–f model has been found to be more numerically stable than the other models; it has therefore been used for this computational study.

14.4 Results and discussion

For a V/STOL application the flow velocity will clearly reach the compressible range. In the present work, however, the study starts from incompressible flow and extends to the verge of compressible flow (M = 0.3).
For the geometry of Figure 14.1, the radius of exit curvature (R) is 101.566 mm, the exit throat diameter (b) is 40.163 mm, the inlet diameters are d1 = d2 = 56 mm, and the ratio b/R is 0.395. The average velocity and the velocity ratio are held constant for each computation, with average velocities in the range 20–40 m/s. Initial observation of the computational results shows that the jet deflection angle is a function of the velocity ratio. The flow is visualized through velocity contours in Figures 14.4 through 14.6 for selected cases. These contours reveal that the highest velocity occurs at the nozzle exit attached to the upper curvature: the upper nozzle velocity is designated V1 and the lower nozzle velocity V2, and since their ratio is always greater than one, the flow attaches near the upper exit curvature, where the maximum velocity occurs. Consequently, the boundary layer thickness is very small, of the order of micrometers.
In the second configuration, shown in Figure 14.2, the throat diameter (b) is 46 mm, the radius of curvature (R) is 179.543 mm, the inlet diameters are d1 = d2 = 77 mm, and the b/R ratio is 0.256. The velocity contours are shown in Figure 14.7. Complete attachment of the flow to the Coanda surface is seen for velocity ratios greater than 1.3. The reason for the larger attachment angle can be explained through the velocity vectors (Figure 14.8) and the velocity plots (Figures 14.9 and 14.10).
Figure 14.8 depicts the velocity profile at the exit of the nozzle. The subsequent plots in Figures 14.9 and 14.10 show the velocity profile near the exit and far from the exit, respectively. Near the exit, the pull of the low-velocity jet toward the high-velocity jet is weak; as the flow progresses, the effect becomes more pronounced. Far from the exit, the velocity profile attains a higher gradient, and in effect the low-velocity jet is drawn toward the high-velocity jet.
Figure 14.4 Velocity contours for V1/V2 = 1.3 (θ = 6.158°), 1.8 (θ = 11.943°), 2.5 (θ = 17.996°), and 6 (θ = 24.035°).
Therefore, it can be said that decreasing the b/R ratio from 0.395 to 0.256 yields a larger attachment angle. The jet adhesion angle is thus a strong function of the b/R ratio, a geometric parameter, which is the main controlling parameter for the jet adhesion angle.
After computation over the above-mentioned velocity ranges, the relation between the velocity ratio (V1/V2) and the jet deflection angle (θ) is plotted in Figure 14.11. For average velocities of 20, 25, and 30 m/s, the deflection angle has nearly the same value up to large velocity ratios. For Vav = 35 m/s, there is little difference in the jet
Figure 14.5 Velocity contours, including V1/V2 = 1.3 (θ = 6.729°) and 1.8 (θ = 12.346°).
deflection angle from V1/V2 = 4 onward; for lower velocity ratios it is nearly the same. In other words, the rate of increase of the deflection angle for an average velocity of 35 m/s is higher than for the lower velocities (20, 25, and 30 m/s) and for 40 m/s. For Vav = 40 m/s, the deflection angle is larger than for the other average velocities. The reason may lie in the fact that, at this inlet velocity, the exit velocity of the jet is quite high, near Mach number 0.3. Nevertheless, the slope is the same as for 20, 25, and 30 m/s over all velocity ratios. Now, the different behaviors of the slope for
Figure 14.6 Velocity contours for V1/V2 = 1.3 (θ = 6.912°), 1.8 (θ = 13.125°), 2.5 (θ = 19.545°), and 6 (θ = 26.822°).
average velocity 35 m/s at larger velocity ratios have been investigated in detail. Investigating the reason for this aberration requires examining the flow phenomena in detail; therefore, Reexit and Reflow have been calculated for the whole average-velocity range, as shown in Table 14.1.
An interesting phenomenon has been observed: for the average velocities 20, 25, 30, and 40 m/s, the exit Reynolds number is always higher than the flow Reynolds number over the exit curvature. This behavior points to laminarization of the flow; for most cases the flow Reynolds numbers are of the order of 10⁵ (we are using the word
Figure 14.7 Velocity contours for different velocity ratios and average velocities: V1/V2 = 2 at Vav = 30 m/s, V1/V2 = 4 at Vav = 25 m/s, V1/V2 = 1.3 at Vav = 35 m/s, and V1/V2 = 1.14 at Vav = 37.5 m/s.
Figure 14.8 Velocity vector at different positions of the outer wall of the Coanda surface for
V1/V2 = 1.14 and Vav = 37.5 m/s.
Figure 14.9 Velocity profile (velocity magnitude vs. position) near the exit of the nozzle at stations x = 0.05, 0.10, 0.15, and 0.20 m for V1/V2 = 1.14 and Vav = 37.5 m/s.
Figure 14.10 Velocity profile (velocity magnitude vs. position) far from the exit of the nozzle at stations x = 0.30, 0.40, 0.50, and 0.70 m for V1/V2 = 1.14 and Vav = 37.5 m/s.
Figure 14.11 Jet deflection angle θ versus velocity ratio V1/V2 for Vav = 20, 25, 30, 35, and 40 m/s.
Table 14.1 Exit and flow Reynolds numbers for different velocity ratios and average velocities

Vav = 35 m/s
  V1/V2    Re_exit      Re_flow
  1.3      2.695E+05    3.135E+05
  1.8      2.695E+05    3.135E+05
  2.5      2.302E+05    2.679E+05
  6.0      1.110E+05    1.291E+05

Vav = 30 m/s
  1.3      1.977E+05    1.572E+05
  1.8      2.310E+05    1.836E+05
  2.5      9.467E+04    7.525E+04
  6.0      7.989E+04    6.350E+04

Vav = 25 m/s
  1.3      7.870E+04    2.116E+04
  1.8      1.646E+05    8.599E+04
  2.5      7.888E+04    6.227E+04
  6.0      6.658E+04    7.012E+04

Vav = 20 m/s
  1.3      5.28E+04     1.435E+04
  1.8      1.32E+05     6.942E+04
  2.5      1.32E+05     1.004E+05
  6.0      1.31E+05     1.392E+05
"laminarization" because the flow retards on the curved surface relative to the flow at the exit of the nozzle). Very few cases fall below this range; but for the average velocity of 35 m/s, the flow accelerates over the curved surface, and consequently the flow Reynolds number is higher than the exit Reynolds number. We thus reach the point that the flow over the curved surface can exhibit two phenomena: one is laminarization and the other is acceleration. These two distinct behaviors give different characteristics of the jet deflection angle at higher velocity ratios. To view this behavior from another angle, the Mach number has been calculated at the exit flow velocity and found to be around 0.3. This is near compressible flow behavior, and for higher average velocities the exit Mach number is slightly above this value. These two flow behaviors should be further investigated experimentally.
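The two Reynolds numbers compared above follow the definitions given in the Nomenclature; a short sketch using the second geometry's dimensions and illustrative (not the chapter's) velocities:

```python
import math

NU_AIR = 1.5e-5        # kinematic viscosity of air (m^2/s), assumed value
B = 0.046              # exit throat diameter (m), second geometry
R = 0.179543           # Coanda surface radius (m), second geometry

def re_exit(v_exit_av: float) -> float:
    """Re at the nozzle exit: V_exit,av * b / nu (see Nomenclature)."""
    return v_exit_av * B / NU_AIR

def re_flow(v_flow_av: float, theta_deg: float) -> float:
    """Re over the curved surface: V_flow,av * R*theta / nu."""
    arc = R * math.radians(theta_deg)   # arc length R*theta (m)
    return v_flow_av * arc / NU_AIR

# Comparing the two numbers indicates laminarization (Re_exit > Re_flow)
# or acceleration over the curved surface (Re_flow > Re_exit).
print(re_exit(90.0), re_flow(70.0, 20.0))
```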
Our next attempt is to envisage the effect of laminarization and of flow acceleration upon the thrust. Here, thrust means the force exerted by the flow along the flow direction and normal to it. The thrust in the x-direction contributes to cruising, and the normal thrust in the y-direction contributes to the lift force of the aircraft. It has been calculated by multiplying the mass flux by the exit velocity, as shown in Table 14.2. The force normal to the flow direction can contribute to the lift force. It is observed from Table 14.2 that the normal force is maximum for velocity ratios in the range 1.8–2.5. This is a strong function of the flow phenomena over the curved surface depicted above. Therefore, the force that contributes to lift is maximized in the velocity-ratio range of 1.8–2.5. Through this computational study, the range of maximum force has been identified, which is vital for the design of such a nozzle for maximum lift force.
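Using the deflection angle, the exit thrust can be split into stream-wise and normal components. The decomposition by θ below is an illustrative assumption, since the chapter states only that the thrust is obtained from the mass flux times the exit velocity:

```python
import math

def thrust_components(m_dot: float, v_exit: float, theta_deg: float):
    """Thrust F = m_dot * v_exit, split into stream-wise (Fx) and
    normal (Fy) parts using the jet deflection angle theta.

    Splitting F with theta is an illustrative assumption here, not the
    chapter's stated method; m_dot in kg/s, v_exit in m/s, theta in degrees.
    """
    f = m_dot * v_exit
    theta = math.radians(theta_deg)
    return f * math.cos(theta), f * math.sin(theta)

# Illustrative values: 2 kg/s at 100 m/s deflected by 20 degrees.
fx, fy = thrust_components(m_dot=2.0, v_exit=100.0, theta_deg=20.0)
print(fx, fy)   # Fy/Fx equals tan(20 deg)
```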
Until now, we have discussed the influence of the given parameters on the flow. For the given velocity ratio at which maximum thrust is obtained, the jet deflection angle is a function of the b/R ratio.
For the velocity ratios 1.3, 2, and 4, complete attachment of the flow over the exit curvature is seen for the lower value of the b/R ratio. This parameter has a strong
Table 14.2 Thrust components for different velocity ratios and average velocities

Vav = 35 m/s
  V1/V2    Fx (N)     Fy (N)
  1.3      470.639    32.272
  1.8      470.634    56.322
  2.5      402.063    73.827
  6.0      193.852    57.003

Vav = 30 m/s
  1.3      295.984    23.385
  1.8      345.773    43.333
  2.5      141.717    27.383
  6.0      119.593    35.587

Vav = 25 m/s
  1.3      98.175     6.280
  1.8      205.371    28.274
  2.5      98.407     18.219
  6.0      83.056     23.674

Vav = 20 m/s
  1.3      52.668     3.587
  1.8      131.428    18.13
  2.5      131.229    24.462
  6.0      130.925    31.676
influence on the attachment of the flow and can be adjusted to obtain the desired propulsive thrust.
14.5 Conclusions
The flow behavior has been studied in detail, and the flow and geometric parameters for the design of such a nozzle have been identified through computational fluid dynamics analysis. The influence of the b/R ratio proves stronger than that of the other parameters; this parameter can therefore serve as one of the controlling elements in designing the nozzle for maximum lift and cruising velocity. The range of V1/V2 for maximum thrust has been defined. A correlation for thrust and deflection angle will be developed once an experimental database is available.
Acknowledgments
Some of the computational works of the present chapter were performed as part of Project
ACHEON (Aerial Coanda High Efficiency Orienting-jet Nozzle) with ref. 309041, sup-
ported by the European Union through the 7th Framework Programme during the stay
of the first author at UNIMORE Italy. Some of the computational studies were performed
recently using AVL-Fire. The first author acknowledges AVL List GmbH, Hans-List-Platz
1, A-8020, Graz, Austria for providing AVL AST software for research and development
purposes under the University Partnership Program.
Nomenclature
b = Exit throat diameter of the nozzle (m)
d1 = Diameter of the upper nozzle (m)
d2 = Diameter of the lower nozzle (m)
Fx = X component of the resultant thrust at the exit of the nozzle (N)
Fy = Y component of the resultant thrust at the exit of the nozzle (N)
P = Mean pressure (N/m2)
R = Radius of the Coanda surface attached with the nozzle exit (m)
Reexit = Reynolds number of the flow at the exit of the nozzle, Vexit,av × b/ν (–)
Reflow = Reynolds number of the flow at the curved surface of the nozzle, Vflow,av × Rθ/ν (–)
Ui = Reynolds averaged velocity tensor (m/s)
u’i = Fluctuating velocity tensor (m/s)
V1 = Velocity of the flow in upper nozzle (m/s)
V2 = Velocity of the flow in lower nozzle (m/s)
References
1. Yoshitani, N., Hashimoto, S.-I., Kimura, T., Motohashi, K., and Ueno, S., “Flight Control
Simulators for Unmanned Fixed-Wing and VTOL Aircraft,” ICROS-SICE International Joint
Conference 2009, August 18–21, 2009, Fukuoka International Congress Center, Japan.
2. Thomason, T., “Bell-Boeing JVX Tilt Rotor Program – Flight Test Program,” American Institute
of Aeronautics and Astronautics, 1983, AIAA Paper No. 83-2726.
3. Saeed, B., Gratton, G., and Mares, C., “A Feasibility Assessment of Annular Winged VTOL
Flight Vehicles,” Aeronautical Journal, Vol. 115, 2011, pp. 683–692.
4. Kim, H., Rajesh, G., Setoguchi, T., and Matsuo, S., “Optimization Study of a Coanda Ejector,”
Journal of Thermal Science, Vol. 15, No. 4, 2006, pp. 331–336.
5. Alvi, F., Strykowski, P., Krothapalli, A., and Forliti, D., “Vectoring Thrust in Multiaxis Using
Confined Shear Layers,” Journal of Fluids Engineering, Vol. 122, No. 1, 2000, pp. 3–13.
6. Trancossi, M., Dumas, A., Giuliani, I., and Baffigi, I., "Ugello capace di deviare in modo dinamico e controllabile un getto sintetico senza parti meccaniche in movimento e suo sistema di controllo," Patent No. RE2011A000049, Italy, 2011.
7. Freund J. B. and Mungal, M. G., “Drag and Wake Modification of Axisymmetric Bluff Bodies
Using Coanda Blowing,” Journal of Aircraft, Vol. 31, No. 3, May–June 1994, pp. 572–578.
8. Chng, T. L., Rachman, A., Tsai, H. M., and Zha, Ge-C., “Flow Control of an Airfoil via Injection
and Suction,” Journal of Aircraft, Vol. 46, No. 1, 2009, pp. 291–300.
9. Lee, D.-W., Hwang, J.-G., Kwon, Y.-D., Kwon, S.-B., Kim, G-Y., and Lee, D.-E., “A Study on the
Air Knife Flow with Coanda Effect,” Journal of Mechanical Science and Technology, Vol. 21, 2007,
pp. 2214–2220.
10. Lalli, F., Bruschi, A., Lama, R., Liberti, L., Mandrone, S., and Pesarino, V., “Coanda Effects in
Coastal Flows,” Coastal Engineering, Vol. 57, 2010, pp. 278–289.
11. Collis, S.S., Joslin, R.D., Seifert, A., and Theofilis, V., “Issues in Active Flow Control: Theory,
Control, Simulation, and Experiment,” Progress in Aerospace Science, Vol. 40, 2004, pp. 237–289.
12. Florin, F., Alexandru, D., Octavian, P., and Horia, D., "Control of Two-Dimensional Turbulent Wall Jet on a Coanda Surface," Proceedings in Applied Mathematics and Mechanics, Vol. 11, 2011, pp. 651–652.
13. Mabey, K., Smith, B., Whichard, G., and McKechnie, T., “Coanda-Assisted Spray Manipulation
Collar for a Commercial Plasma Spray Gun,” Journal of Thermal Spray Technology, Vol. 20, No. 4,
2011, pp. 782–790.
14. Kim, H., Rajesh, G., Setoguchi, T., and Matsuo, S., “Optimization Study of a Coanda Ejector,”
Journal of Thermal Science, Vol. 15, No. 4, 2006, pp. 331–336.
15. Vanierschot, M., Persoons, T., and Van den Bulck, E., “A New Method for Annular Jet Control
Based on Cross-Flow Injection,” Physics of Fluids, Vol. 21, 2009, pp. 025103-1–025103-9.
16. Young, T., “Outlines of Experiments and Inquires Respecting Sound and Light,” Philosophical
Transactions of Royal Society of London, Vol. 90, 1 January 1800, pp. 106–150.
17. Rayleigh, L., “On the Dynamics of Revolving Fluid,” Proceedings of Royal Society of London,
Series A, Vol. 93, No. 648, 1 March, 1917, pp. 148–154.
18. Newman, B. G., The Deflexion of Plane Jets by Adjacent Boundaries, in Coanda Effect, In
Boundary Layer and Flow Control, edited by G. V. Lachmann, Vol. 1, Pergamon Press, Oxford,
1961, pp. 232–264.
19. Carpenter, P. W., and Green, P. N., “The Aeroacoustics and Aerodynamics of High-Speed
Coanda Devices, Part 1: Conventional Arrangement of Exit Nozzle and Surface,” Journal of
Sound and Vibration, Vol. 208, No. 5, 1997, pp. 777–801.
20. Trancossi, M., “An Overview of Scientific and Technical Literature on Coanda Effect Applied
to Nozzles,” SAE Technical Papers No. 2011-01-2591, Issn 0148-7191, 2011.
21. Wille, R., and Fernholtz, H., “Report on the First European Mechanics Colloquium, on the
Coanda Effect,” Journal of Fluid Mechanics, Vol. 23, No. 4, 1965, pp. 801–819.
22. Subhash, M., and Dumas, A., “Computational Study of Coanda Adhesion over Curved
Surface,” SAE International Journal of Aerospace, 2013, paper number: 13ATC-0018/2013-01-2302
(Accepted for Publication).
23. Fekete, G. I., “Coanda Flow of a Two-Dimensional Wall Jet on the Outside of a Circular
Cylinder,” Mechanical Engineering Research Laboratories, Rept. 63-11, McGill University,
1963.
24. Neuendorf, R., and Wygnansky, I., “On a Turbulent Wall Jet Flowing over a Circular Cylinder,”
Journal of Fluid Mechanics, Vol. 381, 1999, pp. 1–25.
25. Patankar, U., and Sridhar, K., “Three-Dimensional Curved Wall Jets,” Journal of Basic
Engineering, Vol. 94, No. 2, 1972, pp. 339–344.
26. Spalart, P. R., and Allmaras, S. R., “A One-Equation Turbulence Model for Aerodynamic
Flows,” AIAA Paper No. 92-0439, 1992.
27. Launder, B. E., and Sharma, B. I., “Application of the Energy Dissipation Model of Turbulence
to the Calculation of Flow Near a Spinning Disc,” Letters in Heat and Mass Transfer, Vol. 1, No.
2, 1974, pp. 131–138.
Chapter fourteen: Computational study of the Coanda flow for V/STOL 283
28. Menter, F. R., “Two-Equation Eddy-Viscosity Turbulence Models for Engineering Applications,”
AIAA Journal, Vol. 32, No. 8, August 1994, pp. 1598–1605.
29. Durbin, P. A., “Separated Flow Computations with the k-ε-v2 Model,” AIAA Journal, Vol. 33,
1995, pp. 659–664.
30. Hanjalic, K., Popovac, M., and Hadziabdic, M., “A Robust Near-Wall Elliptic-Relaxation Eddy-
Viscosity Turbulence Model for CFD,” International Journal of Heat Fluid Flow, Vol. 25, No. 6,
2004, pp. 1047–1051.
31. ANSYS Fluent User manual, 2016.
32. AVL-Fire User manual, 2014.
33. Casey, M., and Wintergerste, T., “ERCOFTAC Special Interest Group on ‘Quality and Trust in
Industrial CFD’ Best Practice Guidelines,” Version 1.0, January 2000.
34. Rizzi, A., and Vos, J., “Towards Establishing Credibility in Computational Fluid Dynamics,”
AIAA Journal, Vol. 36, No. 5, 1998, pp. 668–675.
35. Celik, I., Li, J., Hu, G., and Shaffer, C., “Limitations of Richardson Extrapolation and Some
Possible Remedies,” Journal of Fluids Engineering, Vol. 127, July 2005, pp. 795–805.
36. Patankar, S. V., and Spalding, D. B., “A Calculation Procedure for Heat, Mass and Momentum
Transfer in Three-Dimensional Parabolic Flows,” International Journal of Heat and Mass Transfer,
Vol. 15, 1972, pp. 1787–1806.
Chapter fifteen
Introduction to collocation method
Contents
15.1 Introduction 285
15.2 Collocation method 286
15.3 B-spline 287
  15.3.1 B-spline of degree zero 287
  15.3.2 First-degree (linear) B-spline 288
  15.3.3 Second-degree (quadratic) B-spline 288
15.4 Characteristics of B-spline basis functions 289
15.5 Types of B-spline 289
  15.5.1 Trigonometric B-spline basis functions 289
  15.5.2 Exponential B-spline basis functions 290
15.6 Methodology: Collocation method using B-spline basis function 290
15.7 Numerical solution of advection diffusion equation using collocation method 292
  15.7.1 Using B-spline basis functions 292
  15.7.2 Using trigonometric B-spline basis functions 293
15.8 Numerical example 295
References 296
15.1 Introduction
Due to the wide occurrence and applicability of ordinary and partial differential equations in various branches of science and engineering, a variety of nonlinear systems of initial and boundary value problems have been studied extensively in the literature. Many mathematical models of engineering problems can be expressed as partial differential equations, for example in describing the physics of various phenomena in science, the physical laws of fluid flow and diffusion in transport problems, electromagnetic waves, neural networks, tissue engineering, and quantum phenomena. These are some of the application areas in which the phenomena or processes of interest are naturally described as initial and boundary value problems. Since it is not always feasible to obtain analytical solutions of the resulting model equations, advanced numerical methods are needed.
A variety of numerical methods are available for obtaining approximate solutions of partial differential equations. Two of the most popular techniques are the finite difference method and the finite element method. In the finite difference method, the solution is computed at a finite number of points by approximating the derivatives at each of the selected points; the accuracy of the method depends on the refinement of the grid of points at which the solution is evaluated [1]. In the finite element method, the domain is divided into a finite number of elements, with nodes allocated at predefined locations on the element boundaries [2]. The elements and nodes form a mesh that can be refined to minimize the error.
In the last few years, the collocation method, a type of finite element method, has emerged as a popular technique for solving ordinary and partial differential equations. The method was developed from the finite element method using concepts of the finite difference method, and has been applied with many different types of basis functions with the aim of obtaining the best possible numerical solutions of linear and nonlinear mathematical problems. It involves satisfying the differential equation, to some tolerance, at a selected finite number of points, called collocation points.
In this chapter, the collocation method is discussed using B-spline basis functions in both standard and trigonometric form. A numerical problem involving an advection diffusion equation is solved to describe the application of the method in detail, and the results are presented in tables of absolute and maximum absolute errors.
15.2 Collocation method
In the collocation method, the numerical solution of a differential equation is obtained as a linear combination of basis functions with unknown coefficients to be determined. In this approach, a given function is approximated by a polynomial at collocation points chosen in some predefined way, either uniformly or nonuniformly. Consider the application of this approach to a general differential equation of the form
f(x, u, u_x) = 0    (15.1)
to be solved by the collocation method on the domain [x_L, x_R], with the known boundary conditions

u(x_L) = ϕ_0,   u(x_R) = ϕ_1    (15.2)

The approximate solution is expressed as a linear combination of basis functions:

U = Σ_{i=1}^{N} c_i ϕ_i(x)    (15.3)
Here, the c_i are the unknowns to be calculated, and N is the total number of domain partitions. The number of partitions also affects the performance of the method: the more domain partitions, the closer the approximate solution is to the exact solution.
To apply the approach of the collocation method, the approximated solution value at the
boundary is taken from the boundary conditions, and the solution is obtained at internal
node points.
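The procedure above can be sketched in a few lines of code. The example below is purely illustrative (the boundary value problem, the polynomial basis, and all names are choices made for this sketch, not taken from the chapter): it solves u″ = −1 with u(0) = u(1) = 0 by collocating at two interior points, using basis functions that already satisfy the boundary conditions.

```python
import numpy as np

# Basis functions vanishing at x = 0 and x = 1, and their second derivatives
phi = [lambda x: x * (1 - x), lambda x: x**2 * (1 - x)]
d2phi = [lambda x: -2.0, lambda x: 2.0 - 6.0 * x]

# Collocation points: the equation u'' + 1 = 0 is enforced exactly here
pts = [1.0 / 3.0, 2.0 / 3.0]

# Assemble and solve the collocation system A c = b
A = np.array([[d2(xk) for d2 in d2phi] for xk in pts])
b = np.full(len(pts), -1.0)
c = np.linalg.solve(A, b)

def U(x):
    """Approximate solution U(x) = sum_i c_i phi_i(x)."""
    return sum(ci * p(x) for ci, p in zip(c, phi))
```

Because the exact solution x(1 − x)/2 lies in the span of the chosen basis (c = (1/2, 0)), the collocation equations recover it exactly; in general, the residual of the differential equation vanishes only at the collocation points.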
Chapter fifteen: Introduction to collocation method 287
15.3 B-spline
The theory of B-spline functions is well established for obtaining approximate numerical solutions of boundary value problems, whether ordinary or partial differential equations, owing to their distinct properties.
Schoenberg [3], in 1946, was the first researcher to use the term B-spline ("B" refers to basis) in his mathematical research. He described the B-spline as a short form of basis spline, representing a smooth, piecewise polynomial. The concept of the B-spline is an extended form of splines with some additional properties: a B-spline basis function is a spline function defined upon a knot sequence x_i, having minimal support with respect to a given degree, smoothness, and domain partition.
The first definition of the B-spline basis functions was given by Schumaker [4] using the idea of divided differences. Subsequently, a recurrence relation was obtained independently by Cox [5] and de Boor [6] in the early 1970s to compute B-splines of various orders and degrees. This recursive formula calculates the mth B-spline basis function of the lth degree by implementation of Leibniz's theorem, and can be stated as follows:

B_{m,l}(x) = V_{m,l} B_{m,l−1}(x) + (1 − V_{m+1,l}) B_{m+1,l−1}(x)    (15.4)

where V_{m,l} = (x − x_m)/(x_{m+l} − x_m).

This is the well-known Cox–de Boor recursion formula, which calculates a B-spline basis function of a particular degree as a linear combination of basis functions of smaller degree. Here B_{m,l}(x) is the mth B-spline basis function of degree l, and x is a parameter variable.
The recurrence relation (15.4) can be used with l = 1, starting from the zero-degree B-splines defined below, to generate the first-degree B-splines, and then in turn to construct the higher-order basis functions: the basis function B_{m,l}(x) for degree l ≥ 1 can hence be written as a linear combination of two basis functions of degree l − 1.
15.3.1 B-spline of degree zero

B_{m,0}(x) = 1,   x ∈ [x_m, x_{m+1})
           = 0,   otherwise    (15.5)
From the definition, a zero-degree B-spline is a function that takes the value one on the half-open interval [x_m, x_{m+1}) and is zero at all other points. The appearance of the zero-degree B-spline is as presented in Figure 15.1.
15.3.2 First-degree (linear) B-spline

B_{m,1}(x) = (x − x_m)/(x_{m+1} − x_m),         x ∈ [x_m, x_{m+1})
           = (x_{m+2} − x)/(x_{m+2} − x_{m+1}),  x ∈ [x_{m+1}, x_{m+2})    (15.6)
           = 0,                                  otherwise

15.3.3 Second-degree (quadratic) B-spline

B_{m,2}(x) = (x − x_m)² / ((x_{m+2} − x_m)(x_{m+1} − x_m)),   x ∈ [x_m, x_{m+1})
           = (x − x_m)(x_{m+2} − x) / ((x_{m+2} − x_m)(x_{m+2} − x_{m+1}))
             + (x_{m+3} − x)(x − x_{m+1}) / ((x_{m+3} − x_{m+1})(x_{m+2} − x_{m+1})),   x ∈ [x_{m+1}, x_{m+2})    (15.7)
           = (x_{m+3} − x)² / ((x_{m+3} − x_{m+1})(x_{m+3} − x_{m+2})),   x ∈ [x_{m+2}, x_{m+3})
           = 0,   otherwise
From the definition it can be concluded that this second-degree B-spline basis function is nonzero over three consecutive knot spans; its curve is depicted in Figure 15.3.
Using a similar approach, the formula for higher-degree B-splines can be obtained.
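The Cox–de Boor recursion translates directly into code. The following sketch (the function name and the knot choice are illustrative assumptions, not from the chapter) evaluates B_{m,l}(x) starting from the zero-degree case, and its output can be checked against the closed forms (15.6) and (15.7):

```python
def bspline(m, l, x, t):
    """Evaluate the B-spline basis function B_{m,l}(x) on the knot
    sequence t using the Cox-de Boor recursion."""
    if l == 0:
        # Zero-degree B-spline: indicator function of the half-open knot span
        return 1.0 if t[m] <= x < t[m + 1] else 0.0
    # Linear combination of two basis functions of degree l - 1
    left = (x - t[m]) / (t[m + l] - t[m]) * bspline(m, l - 1, x, t)
    right = (t[m + l + 1] - x) / (t[m + l + 1] - t[m + 1]) * bspline(m + 1, l - 1, x, t)
    return left + right

knots = list(range(11))   # uniform integer knots 0, 1, ..., 10
```

On these knots, bspline(0, 2, 1.0, knots) returns 0.5, which matches the second piece of (15.7) evaluated at x = 1, and the quadratic basis functions sum to one away from the ends of the knot range (partition of unity).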
15.5 Types of B-spline
B-spline basis functions are also available in trigonometric and exponential forms. The most commonly used basis functions are the cubic B-splines (B-splines of degree three), because they are symmetric with respect to the origin. The trigonometric and exponential B-spline basis functions of degree three are given below.
15.5.1 Trigonometric B-spline basis functions

TB_m(x) = (1/w) p³(x_m),   x ∈ [x_m, x_{m+1})
        = (1/w) [p(x_m)(p(x_m)q(x_{m+2}) + q(x_{m+3})p(x_{m+1})) + q(x_{m+4})p²(x_{m+2})],   x ∈ [x_{m+1}, x_{m+2})
        = (1/w) [q(x_{m+4})(p(x_{m+1})q(x_{m+3}) + q(x_{m+4})p(x_{m+2})) + p(x_m)q²(x_{m+3})],   x ∈ [x_{m+2}, x_{m+3})    (15.8)
        = (1/w) q³(x_{m+4}),   x ∈ [x_{m+3}, x_{m+4})
where
h = (b − a)/n is the step size for the domain x ∈ [a, b], and
p(x_m) = sin((x − x_m)/2),   q(x_m) = sin((x_m − x)/2),   w = sin(h/2) sin(h) sin(3h/2).
This is a piecewise cubic trigonometric function with geometric properties such as C² continuity, nonnegativity, and partition of unity.
15.5.2 Exponential B-spline basis functions

EB_m(x) = b_2((x_{m−2} − x) − (1/p) sinh(p(x_{m−2} − x))),   x ∈ [x_{m−2}, x_{m−1})
        = a_1 + b_1(x_m − x) + c_1 exp(p(x_m − x)) + d_1 exp(−p(x_m − x)),   x ∈ [x_{m−1}, x_m)
        = a_1 + b_1(x − x_m) + c_1 exp(p(x − x_m)) + d_1 exp(−p(x − x_m)),   x ∈ [x_m, x_{m+1})    (15.9)
        = b_2((x − x_{m+2}) − (1/p) sinh(p(x − x_{m+2}))),   x ∈ [x_{m+1}, x_{m+2})

where
a_1 = phc/(phc − s),   b_1 = (p/2)(c(c − 1) + s²)/((phc − s)(1 − c)),   b_2 = p/(2(phc − s)),
c_1 = (1/4)(e^{−ph}(1 − c) + s(e^{−ph} − 1))/((phc − s)(1 − c)),
d_1 = (1/4)(e^{ph}(c − 1) + s(e^{ph} − 1))/((phc − s)(1 − c)),
c = cosh(ph),   s = sinh(ph),   and h = (b − a)/n is the step size for the domain x ∈ [a, b].
15.6 Methodology: Collocation method using B-spline basis function
The approximate solution is written as

U(x, t) = Σ_{j=m−l+2}^{m+l−2} c_j B_j(x)    (15.10)

Here, l defines the degree of the B-spline basis functions, m defines the number of collocation points, and the c_j are constants to be calculated from the resulting matrix system, which can be solved using any numerical method.
The formula for the cubic B-spline basis function, constructed from the definition and the second-degree B-spline basis function, was first given by Prenter [7] to solve partial differential equations:

B_{m,3}(x) = (1/h³)(x − x_{m−2})³,   x ∈ [x_{m−2}, x_{m−1})
           = (1/h³)[(x − x_{m−2})³ − 4(x − x_{m−1})³],   x ∈ [x_{m−1}, x_m)
           = (1/h³)[(x_{m+2} − x)³ − 4(x_{m+1} − x)³],   x ∈ [x_m, x_{m+1})    (15.11)
           = (1/h³)(x_{m+2} − x)³,   x ∈ [x_{m+1}, x_{m+2})
           = 0,   otherwise
The cubic B-spline basis function defined above is nonzero over four knot spans and is depicted in Figure 15.4. From this definition, the values of B_m(x) and of its first and second derivatives at the node points can be tabulated as in Table 15.1.
By substituting l = 3 in (15.10), the solution can be approximated as

U(x, t) = Σ_{j=m−1}^{m+1} c_j B_j(x)
It is evident that the nonzero part of Bm is localized to a small neighborhood of xm, namely,
in the interval xm− 2 < xm < xm+ 2 . Because of this, only Bm− 1 , Bm , Bm+ 1 contribute to the value
of U at xm. Using the values of basis functions at the node points from Table 15.1 in Equation
(15.13), the approximate solution and its derivatives up to second order can be determined
in terms of parameters cm′ s that can be written as
Table 15.1 Values of B_m(x) for the cubic B-spline and its derivatives at the nodal points

            x_{m−2}   x_{m−1}   x_m       x_{m+1}   x_{m+2}
B_m(x)      0         1         4         1         0
B_m′(x)     0         3/h       0         −3/h      0
B_m″(x)     0         6/h²      −12/h²    6/h²      0
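The entries of Table 15.1 can be checked numerically by implementing the piecewise definition (15.11) directly. The sketch below (all names and the choice h = 1 are illustrative) evaluates the cubic B-spline centered at x_m = 2 at the five knots, and approximates the first and second derivatives at x_m by central differences, which is legitimate since the cubic B-spline is C²:

```python
def cubic_bspline(x, xm, h=1.0):
    """Cubic B-spline B_{m,3}(x) of (15.11) on uniform knots
    x_{m-2}, ..., x_{m+2} centered at xm with spacing h."""
    k = [xm + j * h for j in (-2, -1, 0, 1, 2)]
    if k[0] <= x < k[1]:
        v = (x - k[0]) ** 3
    elif k[1] <= x < k[2]:
        v = (x - k[0]) ** 3 - 4 * (x - k[1]) ** 3
    elif k[2] <= x < k[3]:
        v = (k[4] - x) ** 3 - 4 * (k[3] - x) ** 3
    elif k[3] <= x < k[4]:
        v = (k[4] - x) ** 3
    else:
        v = 0.0
    return v / h ** 3

# Nodal values should reproduce the row B_m(x) of Table 15.1: 0, 1, 4, 1, 0
vals = [cubic_bspline(x, 2.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]

# Central differences for B' and B'' at the central node x_m = 2 (h = 1)
e = 1e-5
d1 = (cubic_bspline(2 + e, 2.0) - cubic_bspline(2 - e, 2.0)) / (2 * e)
d2 = (cubic_bspline(2 + e, 2.0) - 2 * cubic_bspline(2.0, 2.0)
      + cubic_bspline(2 - e, 2.0)) / e**2
```

With h = 1, d1 comes out approximately 0 and d2 approximately −12, matching the tabulated B_m′(x_m) = 0 and B_m″(x_m) = −12/h².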
U(x_m, t) = c_{m−1} + 4c_m + c_{m+1}
hU′(x_m, t) = 3(c_{m+1} − c_{m−1})    (15.14)
h²U″(x_m, t) = 6(c_{m−1} − 2c_m + c_{m+1})
15.7 Numerical solution of advection diffusion equation using collocation method
15.7.1 Using B-spline basis functions
Consider the advection diffusion equation

u_t + αu_x = βu_xx    (15.15)

Applying the Crank–Nicolson time discretization gives

(u^{n+1} − u^n)/Δt = β(u_xx^{n+1} + u_xx^n)/2 − α(u_x^{n+1} + u_x^n)/2    (15.16)
Substituting the nodal values (15.14) into (15.16) yields, for m = 0, …, N,

c_{m−1}^{n+1}(1 − 6z − 3y) + c_m^{n+1}(4 + 12z) + c_{m+1}^{n+1}(1 − 6z + 3y) = u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n    (15.17)

Here, y = αΔt/(2h) and z = βΔt/(2h²).
Eliminating c_{−1}^{n+1} and c_{N+1}^{n+1} by means of the boundary conditions gives, at the boundary nodes,

c_0^{n+1}(36z + 12y) + c_1^{n+1}(6y) = u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n − ϕ_0(1 − 6z − 3y)    (15.18)

c_{N−1}^{n+1}(−6y) + c_N^{n+1}(36z − 12y) = u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n − ϕ_1(1 − 6z + 3y)    (15.19)
The complete system can be written in matrix form as Ac^{n+1} = r^n, where A is the tridiagonal matrix

[ 36z + 12y     6y            0             ⋯             0           ]
[ 1 − 6z − 3y   4 + 12z       1 − 6z + 3y                             ]
[               ⋱             ⋱             ⋱                         ]
[               1 − 6z − 3y   4 + 12z       1 − 6z + 3y               ]
[ 0             ⋯             0             −6y           36z − 12y   ]

and r^n is the vector with entries

u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n − ϕ_0(1 − 6z − 3y)   (first row),
u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n                      (interior rows),
u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n − ϕ_1(1 − 6z + 3y)   (last row).
15.7.2 Using trigonometric B-spline basis functions
For the trigonometric cubic B-spline, the nodal values (Table 15.2) are

a_1 = sin²(h/2) / (sin(h) sin(3h/2)),   a_2 = 2/(1 + 2cos(h)),
a_3 = −3/(4 sin(3h/2)),   a_4 = 3/(4 sin(3h/2)),
a_5 = 3(1 + cos(h)) / (16 sin²(h/2)(2cos(h/2) + cos(3h/2))),
a_6 = −3cos²(h/2) / (sin²(h/2)(2 + 4cos(h)))
Using the linear combination formula to write the approximate solution with trigonometric B-spline basis functions, the approximate solution and its derivatives up to second order can be determined in terms of the time-dependent parameters c_m as

U(x_m, t) = a_1 c_{m−1} + a_2 c_m + a_1 c_{m+1}
U′(x_m, t) = a_4 c_{m−1} + a_3 c_{m+1}
U″(x_m, t) = a_5 c_{m−1} + a_6 c_m + a_5 c_{m+1}
On substituting these values of the basis functions into Equation (15.16), the system can be written, for m = 0, …, N, as

c_{m−1}^{n+1}(a_1 + ηa_4 − γa_5) + c_m^{n+1}(a_2 − γa_6) + c_{m+1}^{n+1}(a_1 + ηa_3 − γa_5) = u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n    (15.20)

Here, γ = βΔt/2 and η = αΔt/2. Eliminating c_{−1}^{n+1} with the boundary condition at x_0 gives

c_0^{n+1}(γ(a_2a_5/a_1 − a_6) − η(a_2a_4/a_1)) + c_1^{n+1}η(a_3 − a_4) = u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n − (ϕ_0/a_1)(a_1 − γa_5 + ηa_4)    (15.21)
Table 15.2 Value of Bm(x) for trigonometric cubic B-spline and its derivatives at the nodal points
xm−2 xm−1 xm xm+1 xm+2
TBm ( x) 0 a1 a2 a1 0
TBm′ ( x) 0 a3 0 a4 0
TBm′′ ( x) 0 a5 a6 a5 0
Similarly, eliminating c_{N+1}^{n+1} at x_N gives

c_{N−1}^{n+1}η(a_4 − a_3) − c_N^{n+1}(γ(a_2a_5/a_1 − a_6) − η(a_2a_3/a_1)) = u^n − (αΔt/2)u_x^n + (βΔt/2)u_xx^n − (ϕ_1/a_1)(a_1 − γa_5 + ηa_3)    (15.22)
15.8 Numerical example
The B-spline basis functions are widely used to solve various linear and nonlinear ordinary and partial differential equations; see Refs. [9–16]. This chapter is an effort to discuss the basics of these basis functions and their implementation for solving differential equations.
To gain insight into the method, consider the problem with α = 0 and β = 1 in Equation (15.15), which reduces to the heat equation

u_t = u_xx,   0 ≤ x ≤ 1,  t > 0

with boundary conditions u(0, t) = 0, u(1, t) = 0 and initial condition u(x, 0) = sin(πx). The exact solution of the equation is u(x, t) = exp(−π²t) sin(πx).
The numerical solution of this equation is obtained by the collocation method using standard B-spline and trigonometric B-spline basis functions at t = 1. To assess the accuracy of the obtained solutions, errors are calculated for both types of basis functions and are reported in Table 15.3. The maximum absolute errors are reported in Table 15.4 for two combinations of time step and domain partition: first with time step 0.01 and 40 domain partitions for t = 1 to t = 3, and then with time step 0.001 and 160 domain partitions at the same time levels.
It can be concluded from Tables 15.3 and 15.4 that both forms of the B-spline basis functions give solutions comparable with the exact solution. In the case of the standard B-spline
Table 15.4 Maximum absolute errors using B-spline and trigonometric B-spline
Time B-spline Trigonometric B-spline
At N = 40 and ∆t = 0.01
1 2.0112E-04 2.4165E-05
2 1.2937E-05 3.2392E-07
3 1.0714E-06 2.7468E-08
At N = 160 and ∆t = 0.001
1 2.4242E-08 6.5252E-06
2 2.8568E-11 6.1191E-10
3 1.5320E-15 4.4115E-14
basis functions, the solution improves markedly as the number of domain partitions is increased and the time step reduced, while in the case of the trigonometric B-spline the solution also improves, but more gradually.
References
1. W. Zahra, Numerical Treatment of Boundary Value Problems Using Spline Functions, LAP Lambert Academic Publishing, Germany, 2010.
2. K. S. Surana, J. N. Reddy, The Finite Element Method for Boundary Value Problems, CRC Press,
Taylor & Francis Group, Boca Raton, FL, 2016.
3. I. J. Schoenberg, Contribution to the problem of approximation of equidistant data by analytical
functions, Quarterly Applied Mathematics, 4, 1946, 45–99.
4. L. L. Schumaker, Spline Functions: Basic Theory, Wiley, New York, 1981.
5. M. G. Cox, The numerical evaluation of B-splines, Journal of the Institute of Mathematical
Applications, 10, 1972, 134–149.
6. C. de Boor, A Practical Guide to Splines, Springer Verlag, New York, 1978.
7. P. M. Prenter, Splines and Variational Methods, John Wiley & Sons, New York, 1975.
8. D. U. Von Rosenberg, Methods for Solution of Partial Differential Equations, Vol. 113, American
Elsevier Publishing Inc., New York, 1969.
9. M. K. Kadalbajoo, L. P. Tripathi, A. Kumar, A cubic B-spline collocation method for a numerical
solution of the generalized Black–Scholes equation, Mathematical and Computer Modelling, 55
(3–4), 2012, 1483–1505.
10. A. K. Khalifa, K. R. Raslan, H. M. Alzubaidi, A collocation method with cubic B-splines for solv-
ing the MRLW equation, Journal of Computational and Applied Mathematics, 212 (2), 2008, 406–418.
11. R. Pourgholi, Applications of cubic B-splines collocation method for solving nonlinear inverse
parabolic partial differential equations, Numerical Methods for Partial Differential Equations, 33 (1),
2017, 88–104.
12. M. Gholamian, J. Saberi-Nadjafi, Cubic B-splines collocation method for a class of partial
integro-differential equation, Alexandria Engineering Journal, 2017. doi:10.1016/j.aej.2017.06.004.
13. G. Arora, V. Joshi, A computational approach using modified trigonometric cubic B-spline for
numerical solution of Burgers’ equation in one and two dimensions, Alexandria Engineering
Journal, 2017. doi:10.1016/j.aej.2017.02.017.
14. G. Arora, V. Joshi, A computational approach for solution of one dimensional parabolic partial
differential equation with application in biological processes, Ain Shams Engineering Journal, 2016.
doi:10.1016/j.asej.2016.06.013.
15. M. Abbas, A. A. Majid, A. I. Md. Ismail, A. Rashid, Numerical method using cubic trigonomet-
ric B-spline technique for nonclassical diffusion problems, Abstract and Applied Analysis, 2014,
Article ID 849682, 11 pages.
16. O. Ersoy, I. Dag, The exponential cubic B-spline collocation method for the Kuramoto–Sivashinsky equation, Filomat, 30 (3), 2016, 853–861. doi:10.2298/FIL1603853E.
Chapter sixteen
Rayleigh’s approximation method
Contents
16.1 Introduction 297
16.2 Problem formulation and its solution 299
  16.2.1 Solution for the lower highly anisotropic half-space 301
  16.2.2 Solution for the upper fluid-saturated poroelastic half-space 302
16.3 Boundary conditions 305
16.4 Solution of the first-order approximation of the corrugation 305
16.5 Solution for second-order approximation of the corrugation 307
16.6 Special case of a simple harmonic interface 309
16.7 Particular cases for special case 310
16.8 Energy distribution 312
16.9 Numerical discussion and results 313
  16.9.1 Effect of corrugation amplitude 314
  16.9.2 Effect of corrugation wavelength 316
  16.9.3 Effect of frequency factor 317
  16.9.4 Influence of initial stress parameter on poroelastic half-space 318
  16.9.5 Influence of initial stress parameter on highly anisotropic half-space 321
16.10 Concluding remarks 324
References 325
16.1 Introduction
In recent years, the scattering of elastic waves by obstacles present in the medium has drawn considerable attention from researchers across the globe, because such investigations enable us to unravel deep subsurface structures of immense operational use in oil exploration, earthquake engineering, and much more. Various types of materials impart distinct propagation behavior to waves underneath the earth. During SH-wave propagation, the boundaries present between the layers cause the individual waves to be reflected or transmitted through the interface depending on the angle of incidence; this defines the distributive characteristics of the interface. Furthermore,
the boundaries are mostly irregular or corrugated, which further increases the complexity of the investigation. The variation of the wave propagation also depends largely on the physical characteristics of the medium; therefore, propagation through such layers can also reveal important facts about faults and anticlinal structures beneath the earth. The phenomena of reflection and transmission are the principal concepts behind geophysics and seismology, and oil and gas exploration companies have used them for years to detect the accumulation of hydrocarbons beneath the earth. Literature already exists on the reflection and transmission of SH-waves, such as Ewing et al. (1957), Keith et al. (1977), and Aki and Richards (2002). Fokkema (1980) investigated these phenomena using time-harmonic waves, and Pal and Chattopadhyay (1984) studied the stress-free boundary between two incompressible materials using the reflection and transmission phenomena.
Nowadays, reflection and refraction through porous media have become a core subject of investigation because of their dynamic behavior, and a separate field of study has emerged concerning propagation through porous media. A typical porous medium contains pores, usually filled with fluid. Such materials are often characterized by their porosity, defined as the ratio of the volume of void space to the total volume; porosity values range from 0 to 1. The connected pore space enables filtration of the pore fluid through the porous medium.
media. Pumice, sandstone, and soil are some of the naturally occurring porous materi-
als found in the earth. The dynamic nature of such porous materials is the major aspect
of rock study, which is effectively used in seismic exploration for detailed investiga-
tion of subsurface structures to explore sedimentary basins for hydrocarbon produc-
tion. Deresiewicz (1961) first studied the boundary effects on the wave propagation in a
liquid-filled porous media. Wu et al. (1990) investigated reflection and refraction of elastic
waves from a fluid-saturated porous solid boundary. Sharma and Gogna (1992) used the
plane harmonic waves to investigate the reflection and refraction phenomena through
an interface between an elastic solid and a liquid-saturated porous media by making
purposeful use of the asymptotic approximation of dissipation function. Tajuddin and
Haussaini (2005) analyzed the reflection phenomena of plane waves at the boundaries
of a liquid-filled poroelastic half-space. Tomar and Arora (2007) studied the reflection
and refraction phenomena of elastic waves through an elastic/porous solid filled with
immiscible fluids.
Wave propagation through an anisotropic medium is fundamentally different from that through an isotropic medium. In seismology, a propagating medium is said to be anisotropic if the phase velocity varies with factors such as the direction of wave propagation, the direction of particle motion, the orientation of the material, and the stress and strain of the propagating medium. The anisotropic properties of the material contribute significantly to the reflection and refraction coefficients, and information about these coefficients can help us understand the mechanical properties of the medium. Anisotropy also arises from the presence of thin laminates arranged in a particular order; other factors such as micro-fracturing and the orientation of minerals can also result in a general anisotropy. Since it is normally difficult to derive a general anisotropy from a specific anisotropy, the anisotropy in a wave propagation problem should be of the general type. These general problems have motivated the present study. Crampin (1977) was the first researcher to differentiate anisotropy from isotropy; he established that the variation in velocity due to anisotropy is one of the many anomalies that can occur in a medium. The concepts of reflection and transmission in the anisotropic half-space have been the basis of geological
Chapter sixteen: Rayleigh’s approximation method 299
study to explore continental margins for mineral exploration. Daley and Hron (1979)
investigated the ellipsoidal anisotropic media to derive reflection and transmission coef-
ficients for seismic waves. Rokhlin et al. (1986) studied this wave scattering phenomena
of elastic waves on a plane interface lying between two generally anisotropic media.
Then, Thomsen (1988) published a paper on reflection seismology over azimuthally
anisotropic media.
Rayleigh (1907) made the first attempt to find the solution to the reflection problem of light or sound incident perpendicularly on an uneven boundary surface. Sato (1955) then applied Rayleigh's concept to elastic waves, and this work was later extended by Asano (1960, 1961, 1966). Abubakar (1962a–c) studied the scattering of elastic waves incident on a corrugated interface by utilizing the perturbation technique, and Saini and Singh (1977) studied the effect of anisotropy on the reflection of SH-waves at an interface. In general terms, Rayleigh's method approximates the exponential term associated with the corrugated interface: for the first-order approximation of the corrugation the linear terms are retained, and since the amplitude and slope of the corrugated boundary are assumed to be very small, the higher-order terms are neglected. Much further literature has been published on Rayleigh's method applied to elastic wave scattering at corrugated interfaces, such as Tomar and Saini (1997), Tomar et al. (2002), Tomar and Kaur (2003), and Tomar and Singh (2007). Tomar and Kaur (2007) then investigated the behavior of the SH-wave at a corrugated interface lying between a dry sandy half-space and an anisotropic elastic half-space.
In the present chapter, utilizing Rayleigh's approximation method, an attempt has been made to study the reflection and refraction pattern at a corrugated interface sandwiched between an initially stressed fluid-saturated poroelastic half-space and a highly anisotropic half-space. Here, the highly anisotropic half-space is considered to be triclinic. Closed-form formulae for the reflection and refraction coefficients have been derived. Rayleigh's method has been effectively used to derive first- and second-order approximations of the coefficients. Some special cases have also been deduced. The energy ratios of the reflected and refracted waves are also presented. Various two-dimensional plots have been drawn to show the effects of parameters such as the initial stress parameter, corrugation amplitude, corrugation wavelength, and frequency factor.
The corrugated interface z = ζ is represented by the Fourier series
\[
\zeta = \sum_{n=1}^{\infty}\left(\zeta_n e^{in\lambda x} + \zeta_{-n} e^{-in\lambda x}\right) \tag{16.1}
\]
Here, ζ_n and ζ_{−n} are the Fourier expansion coefficients, λ is the wave number, n is the series expansion order, and the wavelength of the corrugation is 2π/λ.
300 Advanced Mathematical Techniques in Engineering Sciences
Figure 16.1 Geometry of the problem: the corrugated interface ζ(x) separating the upper initially stressed fluid-saturated poroelastic half-space F2 from the lower highly anisotropic half-space F1; the incident SH-wave at angle e gives rise to the regularly reflected wave (B, e), irregularly reflected waves (B1, e1) and (B1′, e1′), the regularly refracted wave (D0, f), and irregularly refracted waves (D1, f1) and (D1′, f1′).
\[
\zeta_1 = \zeta_{-1} = \frac{d}{2}, \qquad \zeta_{\pm n} = \frac{c_n \mp i s_n}{2}, \quad n = 2, 3, \ldots \tag{16.2}
\]
If the shape of the corrugated interface is represented by only one cosine term, i.e., ζ = d cos λx, then 2π/λ and d are the wavelength and amplitude of the corrugation, respectively. Let u_i, v_i, and w_i (i = 1, 2) be the displacement components along the x, y, and z directions, respectively.
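As a quick numerical illustration of the Fourier representation in Equation (16.1), the sketch below (with purely illustrative values for d and λ) evaluates ζ(x) from its coefficients and confirms that the single-cosine choice ζ±1 = d/2 of Equation (16.2) reproduces ζ = d cos λx:

```python
import numpy as np

def corrugation(x, coeffs, lam):
    """Evaluate zeta(x) = sum_n (zeta_n e^{i n lam x} + zeta_{-n} e^{-i n lam x}).

    coeffs maps each order n >= 1 to the pair (zeta_n, zeta_{-n}).
    """
    z = np.zeros_like(x, dtype=complex)
    for n, (zp, zm) in coeffs.items():
        z += zp * np.exp(1j * n * lam * x) + zm * np.exp(-1j * n * lam * x)
    return z.real  # a real boundary has zeta_{-n} = conj(zeta_n)

# Single-cosine interface of Equation (16.2): zeta_1 = zeta_{-1} = d/2
d, lam = 0.3, 2.0                                  # illustrative values only
x = np.linspace(0.0, 2 * np.pi / lam, 200)         # one corrugation wavelength
zeta = corrugation(x, {1: (d / 2, d / 2)}, lam)
assert np.allclose(zeta, d * np.cos(lam * x))      # recovers d*cos(lam*x)
```

Higher-order boundary shapes are obtained simply by adding further (ζ_n, ζ_{−n}) pairs to the dictionary.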
For the propagation of SH-waves, it is assumed that
\[
u_i = 0, \quad w_i = 0, \quad v_i = v_i(x, z, t), \quad \frac{\partial}{\partial y} \equiv 0 \tag{16.4}
\]
Indices 1 and 2 stand for the highly anisotropic half-space and the fluid-saturated poroelastic half-space, respectively. The first and second partial derivatives with respect to time are represented as ∂_t and ∂_tt, respectively. Moreover, ∂_z and ∂_zz stand for d/dz and d²/dz², respectively.
\[
\begin{aligned}
T_{11}^t &= F_{11}e_{11} + F_{12}e_{22} + F_{13}e_{33} + F_{14}e_{23} + F_{15}e_{13} + F_{16}e_{12},\\
T_{22}^t &= F_{12}e_{11} + F_{22}e_{22} + F_{23}e_{33} + F_{24}e_{23} + F_{25}e_{13} + F_{26}e_{12},\\
T_{33}^t &= F_{13}e_{11} + F_{23}e_{22} + F_{33}e_{33} + F_{34}e_{23} + F_{35}e_{13} + F_{36}e_{12},\\
T_{23}^t &= F_{14}e_{11} + F_{24}e_{22} + F_{34}e_{33} + F_{44}e_{23} + F_{45}e_{13} + F_{46}e_{12},\\
T_{13}^t &= F_{15}e_{11} + F_{25}e_{22} + F_{35}e_{33} + F_{45}e_{23} + F_{55}e_{13} + F_{56}e_{12},\\
T_{12}^t &= F_{16}e_{11} + F_{26}e_{22} + F_{36}e_{33} + F_{46}e_{23} + F_{56}e_{13} + F_{66}e_{12}.
\end{aligned}
\tag{16.5}
\]
Here, Tijt , Fij, and eij are the components of the stress tensor, stiffness coefficients, and
components of the strain tensor, respectively.
The equations of motion in the absence of body forces are given by Biot (1965),
where ρ1 denotes the mass density, and ωx, ωy, and ωz are the rotational components
given by
\[
\omega_x = \frac{1}{2}\left(\partial_y w - \partial_z v\right), \quad \omega_y = \frac{1}{2}\left(\partial_z u - \partial_x w\right), \quad \omega_z = \frac{1}{2}\left(\partial_x v - \partial_y u\right)
\]
Now, using Equations (16.4)–(16.6), we get the governing equation of motion and the
stress–strain components
\[
\partial_x T_{21}^t + \partial_z T_{23}^t - \frac{S_{11}^t}{2}\,\partial_{xx} v_1 = \rho_1\,\partial_{tt} v_1 \tag{16.7}
\]
and
\[
\partial_{zz} V - 2ik_1\mu_1\,\partial_z V + k_1^2\mu_2\left(\frac{\omega^2}{k_1^2\beta_1^2} - 1 + \xi\right)V = 0 \tag{16.10}
\]
where
\[
\mu_1 = \frac{F_{46}}{F_{44}}, \quad \mu_2 = \frac{F_{66}}{F_{44}}, \quad \xi = \frac{S_{11}^t}{F_{66}}, \quad \text{and} \quad \beta_1^2 = \frac{F_{66}}{\rho_1}
\]
\[
v_1(x, z, t) = \left(A_0 e^{i\Omega_0 z} + B_0 e^{-i\Omega z}\right)e^{i(\omega t - k_1 x)} \tag{16.11}
\]
where
\[
\Omega_0 = k_1\left(\mu_1 + \sqrt{\mu_1^2 + \mu_2\left(\cot^2 e + \xi\right)}\right) \quad \text{and} \quad \Omega = k_1\left(-\mu_1 + \sqrt{\mu_1^2 + \mu_2\left(\cot^2 e + \xi\right)}\right)
\]
Hence, the displacement for the lower half-space is given by
\[
u_2 = 0, \quad w_2 = 0, \quad v_2 = v_2(x, z, t) \quad \text{and} \quad U_2 = 0, \quad W_2 = 0, \quad V_2 = V_2(x, z, t) \tag{16.13}
\]
\[
T_{23}^p = 2Ge_{23}, \qquad T_{31}^p = 2Ge_{31},
\]
where A, C, D, F, G, K, M, and N are the material constants; T_{ij}^p are the components of the stress tensor acting on the solid phase of the poroelastic material; E = div U is the fluid volumetric strain; and σ = −fP is the stress acting on the fluid phase of the poroelastic material, in which P is the pressure in the fluid and f is the porosity of the poroelastic material.
With the help of Equations (16.13) and (16.14), the equation of motion for SH-wave
propagation in fluid-saturated initially stressed poroelastic medium in the absence of
body forces and the viscoelasticity of the fluid based on Biot (1956a,b, 1962, 1965) can be
written as
where ω_ij = ½(u_{i,j} − u_{j,i}); S_{11}^p is the horizontal initial stress; and ρ11, ρ22, and ρ12 take into account the inertial effects of the moving fluid and are related to the densities of the solid part ρ_s, the fluid part ρ_f, and the aggregate medium ρ2 by ρ2 = ρ11 + 2ρ12 + ρ22 = ρ_s + f(ρ_f − ρ_s). Moreover, the following inequalities hold for the dynamic coefficients:
\[
\rho_{11} > 0, \quad \rho_{12} \le 0, \quad \rho_{22} > 0, \quad \rho_{11}\rho_{22} - \rho_{12}^2 > 0
\]
On further simplification, Equation (16.15) results in
\[
\left(N + \frac{S_{11}^p}{2}\right)\partial_{xx} v_2 + G\,\partial_{zz} v_2 = d'\,\partial_{tt} v_2 \tag{16.16}
\]
where
\[
d' = \rho_{11} - \frac{\rho_{12}^2}{\rho_{22}}
\]
To solve Equation (16.16), we assume v2(x, z, t) = V(z) e^{i(ωt − k2x)}; applying this in Equation (16.16), we get
\[
\partial_{zz} V + \eta^2 V = 0, \tag{16.17}
\]
where
\[
\eta = k_2\sqrt{\mu_1'\left(-1 + \mu_2'\operatorname{cosec}^2 f\right)}, \quad \mu_1' = \frac{N}{G}\left(1+\xi_1\right), \quad \mu_2' = d_p\left(1+\xi_1\right)^{-1}, \quad \xi_1 = \frac{S_{11}^p}{2N}, \quad \text{and} \quad d_p = \gamma_{11} - \frac{\gamma_{12}^2}{\gamma_{22}}
\]
\[
V(z) = C_0 e^{-i\eta z} + D_0 e^{i\eta z} \tag{16.18}
\]
The displacement for the upper initially stressed poroelastic half-space, i.e., F2, is then given as
\[
v_2(x, z, t) = \left(C_0 e^{-i\eta z} + D_0 e^{i\eta z}\right)e^{i(\omega t - k_2 x)} \tag{16.19}
\]
where C0 and D0 are constants, and k2 is the wave number defined by the law of refraction
k2 : k1 = sin e : sin f
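The parameter set defined beneath Equation (16.17) can be bundled into a small helper. In this sketch the γij denote the mass coefficients ρij normalized by the aggregate density ρ2, and all numerical inputs are hypothetical placeholders rather than the chapter's data:

```python
import math

def eta_poroelastic(k2, f, G, N, S11p, gamma11, gamma12, gamma22):
    """Vertical wavenumber eta of Equation (16.17) from the parameters
    defined beneath it (gamma_ij: mass coefficients rho_ij / rho_2)."""
    xi1 = S11p / (2.0 * N)                    # initial stress parameter
    dp = gamma11 - gamma12 ** 2 / gamma22     # dynamical coefficient d_p
    mu1p = (N / G) * (1.0 + xi1)
    mu2p = dp / (1.0 + xi1)
    # cosec^2 f = 1 / sin^2 f
    return k2 * math.sqrt(mu1p * (-1.0 + mu2p / math.sin(f) ** 2))

# Illustrative (hypothetical) inputs only
eta = eta_poroelastic(k2=2.0, f=math.radians(20.0), G=1.387e9, N=2.774e9,
                      S11p=0.2e9, gamma11=0.9, gamma12=-0.001, gamma22=0.1)
```

For refraction angles small enough that the radicand stays positive, η is real and the refracted wave in (16.19) is propagating; a negative radicand would signal an evanescent wave.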
Let us assume that a plane SH-wave of unit amplitude propagates in the lower half-space (F1) and is incident at the corrugated interface z = ζ, making an angle e with the z-axis. Due to the corrugation at the interface, the reflection and refraction phenomena will be affected, and the incident SH-wave will give rise to (1) a regularly reflected and a regularly transmitted wave at angles e and f with the z-axis, in the lower (F1) and upper (F2) half-spaces, respectively; (2) a spectrum of nth-order irregularly reflected and irregularly refracted waves at angles e_n and f_n on the left side of the regularly reflected and regularly refracted waves, respectively; and (3) a similar spectrum of irregularly reflected and irregularly refracted waves at angles e_n′ and f_n′ on the right side of the regularly reflected and regularly refracted waves, respectively, at the corrugated interface.
The angle of refraction f is related to the angle of incidence e through Snell's law:
\[
\frac{\sin e}{\beta_1} = \frac{\sin f}{\beta_2} \tag{16.20}
\]
The angles e_n, e_n′, f_n, and f_n′ are given by the following spectrum theorem of Abubakar (1962a–c):
\[
\sin e_n - \sin e = \frac{n\lambda\beta_1}{\omega}, \quad \sin e_n' - \sin e = -\frac{n\lambda\beta_1}{\omega}, \quad
\sin f_n - \sin f = \frac{n\lambda\beta_2}{\omega}, \quad \sin f_n' - \sin f = -\frac{n\lambda\beta_2}{\omega} \tag{16.21}
\]
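Snell's law (16.20) and the spectrum theorem (16.21) translate directly into code. In the sketch below the wave speeds, corrugation wave number, and frequency are illustrative only; the helper raises an error when an nth-order wave becomes evanescent (|sin| > 1):

```python
import math

def spectral_angles(e, beta1, beta2, lam, omega, n=1):
    """Angles of the n-th order spectrum from Snell's law (16.20) and the
    spectrum theorem (16.21); raises if a wave becomes evanescent."""
    sin_f = beta2 * math.sin(e) / beta1          # Snell's law (16.20)
    sines = {
        "f":    sin_f,
        "e_n":  math.sin(e) + n * lam * beta1 / omega,
        "e_n'": math.sin(e) - n * lam * beta1 / omega,
        "f_n":  sin_f + n * lam * beta2 / omega,
        "f_n'": sin_f - n * lam * beta2 / omega,
    }
    for name, s in sines.items():
        if abs(s) > 1.0:
            raise ValueError(f"{name} is evanescent (sin = {s:.3f})")
    return {name: math.asin(s) for name, s in sines.items()}

# Illustrative (hypothetical) inputs only
angles = spectral_angles(e=math.radians(30.0), beta1=2.5, beta2=2.0,
                         lam=0.5, omega=10.0)
```

Note that increasing the order n or the corrugation wave number λ pushes the spectral sines toward ±1, so only finitely many spectral orders propagate for a given frequency.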
The total displacement in the lower highly anisotropic half-space (F1) is then given by the sum of the incident, regularly reflected, and irregularly reflected waves:
\[
v(x,z,t) = \left[ B_0 e^{i\Omega_0 z} + B e^{-i\Omega z} + \sum_{n=1}^{\infty} B_n e^{-i\Omega_n z} e^{-in\lambda x} + \sum_{n=1}^{\infty} B_n' e^{-i\Omega_n' z} e^{in\lambda x} \right] e^{i\omega\left(t - \frac{x\sin e}{\beta_1}\right)} \tag{16.22}
\]
where
\[
\Omega_n = \frac{\omega\sin e_n}{\beta_1}\left(-\mu_1 + \sqrt{\mu_1^2 + \mu_2\left(\cot^2 e_n + \xi\right)}\right) \quad \text{and} \quad
\Omega_n' = \frac{\omega\sin e_n'}{\beta_1}\left(-\mu_1 + \sqrt{\mu_1^2 + \mu_2\left(\cot^2 e_n' + \xi\right)}\right)
\]
Similarly, the total displacement in the upper initially stressed fluid-saturated poroelastic half-space (F2) is the sum of the regularly refracted and irregularly refracted waves:
\[
v_2(x,z,t) = \left[ D_0 e^{i\eta z} + \sum_{n=1}^{\infty} D_n e^{i\eta_n z} e^{-in\lambda x} + \sum_{n=1}^{\infty} D_n' e^{i\eta_n' z} e^{in\lambda x} \right] e^{i\omega\left(t - \frac{x\sin f}{\beta_2}\right)} \tag{16.23}
\]
where
\[
\eta_n = \frac{\omega\sin f_n}{\beta_2}\sqrt{\mu_1'\left(-1 + \mu_2'\operatorname{cosec}^2 f_n\right)} \quad \text{and} \quad
\eta_n' = \frac{\omega\sin f_n'}{\beta_2}\sqrt{\mu_1'\left(-1 + \mu_2'\operatorname{cosec}^2 f_n'\right)}
\]
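The refracted field of Equation (16.23) can be assembled numerically once truncated at a finite order. The sketch below keeps only the n = 1 terms; all amplitudes and vertical wavenumbers are hypothetical placeholders:

```python
import cmath, math

def v2_first_order(x, z, t, D0, D1, D1p, eta, eta1, eta1p, lam, omega, f, beta2):
    """Refracted SH displacement of Equation (16.23) truncated at n = 1."""
    # common plane-wave carrier e^{i omega (t - x sin f / beta2)}
    carrier = cmath.exp(1j * omega * (t - x * math.sin(f) / beta2))
    series = (D0 * cmath.exp(1j * eta * z)
              + D1 * cmath.exp(1j * eta1 * z) * cmath.exp(-1j * lam * x)
              + D1p * cmath.exp(1j * eta1p * z) * cmath.exp(1j * lam * x))
    return series * carrier

# All amplitudes and vertical wavenumbers below are hypothetical placeholders
v = v2_first_order(x=1.0, z=0.5, t=0.0, D0=1e-3, D1=3e-5, D1p=1.6e-4,
                   eta=2.0, eta1=2.2, eta1p=1.8, lam=0.5,
                   omega=10.0, f=math.radians(14.0), beta2=2.0)
assert abs(v) <= 1e-3 + 3e-5 + 1.6e-4   # triangle-inequality bound
```

Since all exponentials have unit modulus for real wavenumbers, the field magnitude is bounded by the sum of the amplitude moduli, which the assertion checks.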
The constants B0 and B are the reflection and refraction coefficients at the plane interface, respectively, and the constants Bn, Bn′ and Dn, Dn′ are the reflection and refraction coefficients, respectively, for the first-order approximation of the corrugation. All these constants are determined from the boundary conditions at the interface.
16.3 Boundary conditions
The boundary conditions at the corrugated interface z = ζ ensure the continuity of
displacement and stress, i.e.,
v1 = v2 (16.24)
\[
B_0 e^{i\Omega_0\zeta} + B e^{-i\Omega\zeta} + \sum_{n=1}^{\infty} B_n e^{-i\Omega_n\zeta} e^{-in\lambda x} + \sum_{n=1}^{\infty} B_n' e^{-i\Omega_n'\zeta} e^{in\lambda x}
= D_0 e^{i\eta\zeta} + \sum_{n=1}^{\infty} D_n e^{i\eta_n\zeta} e^{-in\lambda x} + \sum_{n=1}^{\infty} D_n' e^{i\eta_n'\zeta} e^{in\lambda x} \tag{16.26}
\]
and
\[
\begin{aligned}
& B_0\left[\Omega_0\left(F_{44} - \zeta'F_{64}\right) - \frac{\omega\sin e}{\beta_1}\left(F_{46} - \zeta'F_{66}\right)\right]e^{i\Omega_0\zeta}
+ B\left[-\Omega\left(F_{44} - \zeta'F_{64}\right) - \frac{\omega\sin e}{\beta_1}\left(F_{46} - \zeta'F_{66}\right)\right]e^{-i\Omega\zeta} \\
&\quad + \sum_{n=1}^{\infty} B_n e^{-i\Omega_n\zeta}e^{-in\lambda x}\left[-\left(F_{44} - \zeta'F_{64}\right)\Omega_n - \left(\frac{\omega\sin e}{\beta_1} + n\lambda\right)\left(F_{46} - \zeta'F_{66}\right)\right] \\
&\quad + \sum_{n=1}^{\infty} B_n' e^{-i\Omega_n'\zeta}e^{in\lambda x}\left[-\left(F_{44} - \zeta'F_{64}\right)\Omega_n' - \left(\frac{\omega\sin e}{\beta_1} - n\lambda\right)\left(F_{46} - \zeta'F_{66}\right)\right] \\
&= D_0\left[G\eta + \zeta'N\frac{\omega\sin f}{\beta_2}\right]e^{i\eta\zeta}
+ \sum_{n=1}^{\infty} D_n e^{i\eta_n\zeta}e^{-in\lambda x}\left[\eta_nG + n\lambda\zeta'N + \zeta'N\frac{\omega\sin f}{\beta_2}\right] \\
&\quad + \sum_{n=1}^{\infty} D_n' e^{i\eta_n'\zeta}e^{in\lambda x}\left[\eta_n'G - n\lambda\zeta'N + \zeta'N\frac{\omega\sin f}{\beta_2}\right]
\end{aligned}
\tag{16.27}
\]
From Equations (16.26) and (16.27), the reflection and refraction coefficients of the nth order of approximation of the corrugated interface can be determined.
In view of Equation (16.28), using Equations (16.26) and (16.27) and collecting the terms independent of x and ζ on both sides, we obtain
\[
B_0 + B = D_0 \tag{16.29}
\]
These equations provide the values of the reflection and refraction coefficients of the regularly reflected and refracted SH-wave at a plane interface.
Solving Equations (16.29) and (16.30), we have
\[
\frac{B}{B_0} = \frac{-G\eta + F_{44}\Omega_0 - \dfrac{F_{46}\,\omega\sin e}{\beta_1}}{G\eta + F_{44}\Omega + \dfrac{F_{46}\,\omega\sin e}{\beta_1}}, \tag{16.31}
\]
\[
\frac{D_0}{B_0} = \frac{F_{44}\Omega_0 + F_{44}\Omega}{G\eta + F_{44}\Omega + \dfrac{F_{46}\,\omega\sin e}{\beta_1}} \tag{16.32}
\]
Equations (16.31) and (16.32) give the reflection and refraction coefficients of SH-waves at a
plane interface between initially stressed fluid-saturated poroelastic half-space and highly
anisotropic half-space.
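These closed-form ratios are easy to evaluate numerically. In the sketch below the inputs are hypothetical placeholders (not the chapter's material data); the assertion checks the displacement-continuity identity 1 + B/B0 = D0/B0 that follows from Equation (16.29):

```python
import math

def plane_interface_coefficients(F44, F46, G, eta, Omega0, Omega, omega, e, beta1):
    """B/B0 and D0/B0 from Equations (16.31)-(16.32) for a plane interface."""
    denom = G * eta + F44 * Omega + F46 * omega * math.sin(e) / beta1
    refl = (-G * eta + F44 * Omega0 - F46 * omega * math.sin(e) / beta1) / denom
    refr = F44 * (Omega0 + Omega) / denom
    return refl, refr

# Illustrative (hypothetical) inputs only
refl, refr = plane_interface_coefficients(
    F44=5.0, F46=0.4, G=1.4, eta=2.0, Omega0=3.0, Omega=2.8,
    omega=10.0, e=math.radians(30.0), beta1=2.5)

# Continuity of displacement (16.29), B0 + B = D0, implies 1 + B/B0 = D0/B0
assert abs(1.0 + refl - refr) < 1e-12
```

The identity holds term by term: adding the numerator of (16.31) to the common denominator cancels the Gη and F46 terms and leaves exactly F44(Ω0 + Ω), the numerator of (16.32).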
In order to find the solutions of the first-order approximation for the coefficients Bn and Dn, we collect the coefficients of e^{−inλx} on both sides of Equations (16.26) and (16.27), where
\[
b_n = -F_{44}\Omega_n - F_{46}\left(\frac{\omega\sin e}{\beta_1} + n\lambda\right), \qquad d_n = \eta_nG,
\]
\[
t_1 = n\lambda\left(-F_{46}\Omega_0 + F_{66}\frac{\omega\sin e}{\beta_1}\right) + \Omega_0\left(-F_{44}\Omega_0 + F_{46}\frac{\omega\sin e}{\beta_1}\right),
\]
\[
t_2 = n\lambda\left(F_{46}\Omega + F_{66}\frac{\omega\sin e}{\beta_1}\right) + \Omega\left(-F_{44}\Omega - F_{46}\frac{\omega\sin e}{\beta_1}\right),
\]
\[
t_3 = -G\eta^2 + \frac{n\lambda N\omega\sin f}{\beta_2}
\]
Equating the coefficients of e^{inλx}, we obtain the first-order approximation for the coefficients Bn′ and Dn′, where
\[
b_n' = -F_{44}\Omega_n' - F_{46}\left(\frac{\omega\sin e}{\beta_1} - n\lambda\right), \qquad d_n' = \eta_n'G,
\]
\[
t_4 = n\lambda\left(F_{46}\Omega_0 - F_{66}\frac{\omega\sin e}{\beta_1}\right) + \Omega_0\left(-F_{44}\Omega_0 + F_{46}\frac{\omega\sin e}{\beta_1}\right),
\]
\[
t_5 = n\lambda\left(-F_{46}\Omega - F_{66}\frac{\omega\sin e}{\beta_1}\right) + \Omega\left(-F_{44}\Omega - F_{46}\frac{\omega\sin e}{\beta_1}\right),
\]
\[
t_6 = -G\eta^2 - \frac{n\lambda N\omega\sin f}{\beta_2}
\]
From Equations (16.33) to (16.36), we obtain the reflection and refraction coefficients of the irregularly reflected and refracted waves for the first-order approximation, where
\[
\Pi_n^+B_n = i\zeta_{-n}\left[(d_n - b_n)\left(-\Omega_0 + \Omega\frac{B}{B_0} + \eta\frac{D_0}{B_0}\right) + (t_3 - \eta b_n)\frac{D_0}{B_0} - (t_2 + b_n\Omega)\frac{B}{B_0} + \left(-t_1 + b_n\Omega_0\right)\right],
\]
\[
\Pi_n^+D_n = i\zeta_{-n}\left[(t_3 - \eta b_n)\frac{D_0}{B_0} - (t_2 + b_n\Omega)\frac{B}{B_0} + \left(-t_1 + b_n\Omega_0\right)\right],
\]
\[
\Pi_n^-B_n' = i\zeta_n\left[(d_n' - b_n')\left(-\Omega_0 + \Omega\frac{B}{B_0} + \eta\frac{D_0}{B_0}\right) + (t_6 + \eta b_n')\frac{D_0}{B_0} - (t_5 + b_n'\Omega)\frac{B}{B_0} + \left(-t_4 - b_n'\Omega_0\right)\right],
\]
\[
\Pi_n^-D_n' = i\zeta_n\left[(t_6 + \eta b_n')\frac{D_0}{B_0} - (t_5 + b_n'\Omega)\frac{B}{B_0} + \left(-t_4 - b_n'\Omega_0\right)\right],
\]
\[
\Pi_n^+ = (d_n - b_n), \qquad \Pi_n^- = (d_n' - b_n')
\]
Substituting Equations (16.1) and (16.38) into Equations (16.26) and (16.27) and comparing the terms independent of x, the coefficients of e^{−inλx}, and those of e^{inλx} separately on both sides of the equations thus obtained, we get the following system of six equations, which on solving gives the reflection and refraction coefficients for the second-order approximation:
\[
\begin{aligned}
& B_0\left(1 - \Omega_0^2\zeta_{-n}\zeta_n\right) + B\left(1 - \Omega^2\zeta_{-n}\zeta_n\right) - i\Omega_n\zeta_nB_n - i\Omega_n'\zeta_{-n}B_n'
= D_0\left(1 - \eta^2\zeta_{-n}\zeta_n\right) + i\eta_n\zeta_nD_n + i\eta_n'\zeta_{-n}D_n', \\[4pt]
& i\Omega_0\zeta_{-n}B_0 - i\Omega\zeta_{-n}B + \left(1 - \Omega_n^2\zeta_{-n}\zeta_n\right)B_n - \frac{\Omega_n'^2\zeta_{-n}^2}{2}B_n'
= i\eta\zeta_{-n}D_0 + \left(1 - \eta_n^2\zeta_{-n}\zeta_n\right)D_n - \frac{\eta_n'^2\zeta_{-n}^2}{2}D_n', \\[4pt]
& i\Omega_0\zeta_nB_0 - i\Omega\zeta_nB - \frac{\Omega_n^2\zeta_n^2}{2}B_n + \left(1 - \Omega_n'^2\zeta_{-n}\zeta_n\right)B_n'
= i\eta\zeta_nD_0 - \frac{\eta_n^2\zeta_n^2}{2}D_n + \left(1 - \eta_n'^2\zeta_{-n}\zeta_n\right)D_n',
\end{aligned}
\]
\[
\begin{aligned}
& B_0\left[iF_{44}\Omega_0\left(1-\Omega_0^2\zeta_{-n}\zeta_n\right) + \frac{iF_{46}\,\omega\sin e}{\beta_1}\left(-1+\Omega_0^2\zeta_{-n}\zeta_n\right)\right] \\
&\quad + B\left[iF_{44}\Omega\left(-1+\Omega^2\zeta_{-n}\zeta_n\right) + \frac{iF_{46}\,\omega\sin e}{\beta_1}\left(-1+\Omega^2\zeta_{-n}\zeta_n\right)\right] \\
&\quad + B_n\left[-F_{44}\Omega_n^2\zeta_n + n\lambda\left(-1+\frac{\Omega_n^2\zeta_{-n}\zeta_n}{2}\right)\left(\zeta_n\Omega_nF_{64} + n\lambda\zeta_nF_{66} + \frac{\zeta_nF_{66}\,\omega\sin e}{\beta_1}\right) + F_{46}\Omega_n\zeta_n - n\lambda\,\frac{F_{46}\,\omega\sin e}{2\beta_1}\right] \\
&\quad + B_n'\left[-F_{44}\Omega_n'^2\zeta_{-n} + n\lambda\left(1-\frac{\Omega_n'^2\zeta_{-n}\zeta_n}{2}\right)\right] \\
&= D_0\,iG\eta\left(1-\eta_n^2\zeta_{-n}\zeta_n\right) \\
&\quad + D_n\left[-G\eta_n^2\zeta_n - n\lambda N\left\{n\lambda\zeta_n\left(-1+\frac{\eta_n^2\zeta_{-n}\zeta_n}{2}\right) + \frac{\omega\sin f}{\beta_2}\,\zeta_n\left(-1+\frac{\eta_n^2\zeta_{-n}\zeta_n}{2}\right)\right\}\right] \\
&\quad + D_n'\left[-G\eta_n'^2\zeta_{-n} - n\lambda N\left\{n\lambda\zeta_{-n}\left(1+\frac{\eta_n'^2\zeta_{-n}\zeta_n}{2}\right) + \frac{\omega\sin f}{\beta_2}\,\zeta_{-n}\left(-1+\frac{\eta_n'^2\zeta_{-n}\zeta_n}{2}\right)\right\}\right],
\end{aligned}
\]
\[
\begin{aligned}
& B_0\left[F_{64}\,n\lambda\,\Omega_0\zeta_{-n}\left(1-\frac{\Omega_0^2\zeta_{-n}\zeta_n}{2}\right) - \frac{n\lambda F_{66}\,\omega\sin e\,\zeta_{-n}}{\beta_1} - F_{44}\Omega_0^2\zeta_{-n}\left(1+\frac{\Omega_0^2\zeta_{-n}\zeta_n}{2}\right) + \frac{\Omega_0F_{46}\,\omega\sin e\,\zeta_{-n}}{\beta_1}\right] \\
&\quad + B_n\left[F_{64}\,n\lambda\,\Omega_n\zeta_{-n}\left(-1+\Omega_n^2\zeta_{-n}\zeta_n\right) + iF_{64}\,n\lambda\left(-1+\Omega_n^2\zeta_{-n}\zeta_n\right) - \frac{F_{46}\,\omega\sin e}{\beta_1}\left(1-\Omega_n^2\zeta_{-n}\zeta_n\right)\right] \\
&\quad + B_n'\left[i\Omega_n'^2\zeta_{-n}^2\left(-n\lambda F_{64} + \frac{F_{44}\Omega_n'}{2}\right) + iF_{66}\,n\lambda\,\Omega_n'\zeta_{-n}^2\left(n\lambda - \frac{F_{66}\,\omega\sin e}{\beta_1} - \frac{F_{46}\Omega_n'}{2}\right)\right] \\
&= D_0\left[-G\eta^2\zeta_{-n} - \frac{n\lambda N\zeta_{-n}\,\omega\sin f}{\beta_2}\left(-1+\frac{\eta^2\zeta_{-n}\zeta_n}{2}\right)\right] \\
&\quad + D_n\left[-\frac{i\eta_n^2}{2}\,G\eta_n\zeta_{-n}^2 + \frac{i\,n\lambda N}{\beta_2}\left\{n\lambda\beta_2\zeta_n^2 + \omega\sin f\,\eta_n\zeta_n^2\right\}\right] + D_n'\,iG\eta_n'\left(1-\eta_n'^2\zeta_{-n}\zeta_n\right),
\end{aligned}
\]
\[
\zeta_n = \zeta_{-n} =
\begin{cases}
\dfrac{d}{2}, & n = 1 \\[4pt]
0, & n = 2, 3, \ldots
\end{cases}
\]
In this case, 2π/λ is the wavelength and d is the amplitude of the corrugation. Thus, the
reflection and refraction coefficients for the first-order approximation of the corrugation
can be obtained by setting n = 1 in Equation (16.37), and we obtain
where
\[
\Pi_1^+B_1 = \frac{id}{2}\left[(d_1-b_1)\left(-\Omega_0+\Omega\frac{B}{B_0}+\eta\frac{D_0}{B_0}\right)+(t_3'-\eta b_1)\frac{D_0}{B_0}-(t_2'+b_1\Omega)\frac{B}{B_0}+\left(-t_1'+b_1\Omega_0\right)\right],
\]
\[
\Pi_1^+D_1 = \frac{id}{2}\left[(t_3'-\eta b_1)\frac{D_0}{B_0}-(t_2'+b_1\Omega)\frac{B}{B_0}+\left(-t_1'+b_1\Omega_0\right)\right],
\]
\[
\Pi_1^-B_1' = \frac{id}{2}\left[(d_1'-b_1')\left(-\Omega_0+\Omega\frac{B}{B_0}+\eta\frac{D_0}{B_0}\right)+(t_6'+\eta b_1')\frac{D_0}{B_0}-(t_5'+b_1'\Omega)\frac{B}{B_0}+\left(-t_4'-b_1'\Omega_0\right)\right],
\]
\[
\Pi_1^-D_1' = \frac{id}{2}\left[(t_6'+\eta b_1')\frac{D_0}{B_0}-(t_5'+b_1'\Omega)\frac{B}{B_0}+\left(-t_4'-b_1'\Omega_0\right)\right],
\]
\[
\Pi_1^+ = (d_1-b_1), \qquad \Pi_1^- = (d_1'-b_1')
\]
where
\[
\Omega_1 = \frac{\omega\sin e_1}{\beta_1}\left(-\mu_1+\sqrt{\mu_1^2+\mu_2\left(\cot^2 e_1+\xi\right)}\right), \quad
\Omega_1' = \frac{\omega\sin e_1'}{\beta_1}\left(-\mu_1+\sqrt{\mu_1^2+\mu_2\left(\cot^2 e_1'+\xi\right)}\right),
\]
\[
\eta_1 = \frac{\omega\sin f_1}{\beta_2}\sqrt{\mu_1'\left(-1+\mu_2'\operatorname{cosec}^2 f_1\right)} \quad \text{and} \quad
\eta_1' = \frac{\omega\sin f_1'}{\beta_2}\sqrt{\mu_1'\left(-1+\mu_2'\operatorname{cosec}^2 f_1'\right)}
\]
Case I: When the lower half-space becomes an isotropic elastic medium without initial stress, i.e., F44 = F66 = μ, F46 = 0, S11t = 0, and the upper half-space remains the initially stressed fluid-saturated poroelastic half-space, then Equation (16.39) reduces to
\[
\Pi_1^+B_1 = \frac{id}{2}\left[(d_1-b_1)\left(-\Omega_0+\Omega\frac{B}{B_0}+\eta\frac{D_0}{B_0}\right)+(t_3'-\eta b_1)\frac{D_0}{B_0}-(t_2'+b_1\Omega)\frac{B}{B_0}+\left(-t_1'+b_1\Omega_0\right)\right],
\]
\[
\Pi_1^+D_1 = \frac{id}{2}\left[(t_3'-\eta b_1)\frac{D_0}{B_0}-(t_2'+b_1\Omega)\frac{B}{B_0}+\left(-t_1'+b_1\Omega_0\right)\right],
\]
\[
\Pi_1^-B_1' = \frac{id}{2}\left[(d_1'-b_1')\left(-\Omega_0+\Omega\frac{B}{B_0}+\eta\frac{D_0}{B_0}\right)+(t_6'+\eta b_1')\frac{D_0}{B_0}-(t_5'+b_1'\Omega)\frac{B}{B_0}+\left(-t_4'-b_1'\Omega_0\right)\right],
\]
\[
\Pi_1^-D_1' = \frac{id}{2}\left[(t_6'+\eta b_1')\frac{D_0}{B_0}-(t_5'+b_1'\Omega)\frac{B}{B_0}+\left(-t_4'-b_1'\Omega_0\right)\right],
\]
\[
\Pi_1^+ = (d_1-b_1), \qquad \Pi_1^- = (d_1'-b_1')
\]
where
\[
b_1 = -\mu\Omega_1, \quad d_1 = \eta_1G, \quad
t_1' = \lambda\,\frac{\mu\omega\sin e}{\beta_1} + \Omega_0\left(-\mu\Omega_0\right), \quad
t_2' = \lambda\,\frac{\mu\omega\sin e}{\beta_1} + \Omega\left(-\mu\Omega\right),
\]
\[
t_3' = -G\eta^2 + \frac{\lambda N\omega\sin f}{\beta_2}, \quad
b_1' = -\mu\Omega_1', \quad d_1' = \eta_1'G, \quad
t_4' = -\lambda\,\frac{\mu\omega\sin e}{\beta_1} + \Omega_0\left(-\mu\Omega_0\right),
\]
\[
t_5' = -\lambda\,\frac{\mu\omega\sin e}{\beta_1} + \Omega\left(-\mu\Omega\right), \quad
t_6' = -G\eta^2 - \frac{\lambda N\omega\sin f}{\beta_2},
\]
\[
\Omega_1 = \frac{\omega\sin e_1}{\beta_1}\sqrt{\cot^2 e_1}, \quad
\Omega_1' = \frac{\omega\sin e_1'}{\beta_1}\sqrt{\cot^2 e_1'}, \quad
\eta_1 = \frac{\omega\sin f_1}{\beta_2}\sqrt{\mu_1'\left(-1+\mu_2'\operatorname{cosec}^2 f_1\right)}, \quad
\eta_1' = \frac{\omega\sin f_1'}{\beta_2}\sqrt{\mu_1'\left(-1+\mu_2'\operatorname{cosec}^2 f_1'\right)},
\]
\[
\mu_1 = 0, \quad \mu_2 = 1, \quad \xi = 0, \quad \beta_1^2 = \frac{\mu}{\rho_1}, \quad
\mu_1' = \frac{N}{G}\left(1+\xi_1\right), \quad \mu_2' = d_p\left(1+\xi_1\right)^{-1}, \quad
\xi_1 = \frac{S_{11}^p}{2N}, \quad d_p = \gamma_{11}-\frac{\gamma_{12}^2}{\gamma_{22}}
\]
Equation (16.40) is deduced for the case when the SH-wave is incident at a corrugated interface between an initially stressed fluid-saturated poroelastic half-space and an isotropic elastic half-space.
Case II: When the upper half-space becomes an isotropic elastic medium without initial stress and without poroelasticity, i.e., S11p = 0, d_p → 1, N = G = μ_p, and the lower half-space is considered as the highly anisotropic half-space, then Equation (16.39) reduces to
\[
\Pi_1^+B_1 = \frac{id}{2}\left[(d_1-b_1)\left(-\Omega_0+\Omega\frac{B}{B_0}+\eta\frac{D_0}{B_0}\right)+(t_3'-\eta b_1)\frac{D_0}{B_0}-(t_2'+b_1\Omega)\frac{B}{B_0}+\left(-t_1'+b_1\Omega_0\right)\right],
\]
\[
\Pi_1^+D_1 = \frac{id}{2}\left[(t_3'-\eta b_1)\frac{D_0}{B_0}-(t_2'+b_1\Omega)\frac{B}{B_0}+\left(-t_1'+b_1\Omega_0\right)\right],
\]
\[
\Pi_1^-B_1' = \frac{id}{2}\left[(d_1'-b_1')\left(-\Omega_0+\Omega\frac{B}{B_0}+\eta\frac{D_0}{B_0}\right)+(t_6'+\eta b_1')\frac{D_0}{B_0}-(t_5'+b_1'\Omega)\frac{B}{B_0}+\left(-t_4'-b_1'\Omega_0\right)\right],
\]
\[
\Pi_1^-D_1' = \frac{id}{2}\left[(t_6'+\eta b_1')\frac{D_0}{B_0}-(t_5'+b_1'\Omega)\frac{B}{B_0}+\left(-t_4'-b_1'\Omega_0\right)\right],
\]
\[
\Pi_1^+ = (d_1-b_1), \qquad \Pi_1^- = (d_1'-b_1')
\]
where
\[
\Omega_1 = \frac{\omega\sin e_1}{\beta_1}\left(-\mu_1+\sqrt{\mu_1^2+\mu_2\left(\cot^2 e_1+\xi\right)}\right), \quad
\Omega_1' = \frac{\omega\sin e_1'}{\beta_1}\left(-\mu_1+\sqrt{\mu_1^2+\mu_2\left(\cot^2 e_1'+\xi\right)}\right),
\]
\[
\eta_1 = \frac{\omega\sin f_1}{\beta_2}\sqrt{-1+\operatorname{cosec}^2 f_1}, \quad
\eta_1' = \frac{\omega\sin f_1'}{\beta_2}\sqrt{-1+\operatorname{cosec}^2 f_1'},
\]
\[
\mu_1 = \frac{F_{46}}{F_{44}}, \quad \mu_2 = \frac{F_{66}}{F_{44}}, \quad \xi = \frac{S_{11}^t}{F_{66}}, \quad \beta_1^2 = \frac{F_{66}}{\rho_1}, \quad
\eta = k_2\sqrt{-1+\operatorname{cosec}^2 f},
\]
\[
\mu_1' = 1, \quad \mu_2' = 1, \quad \xi_1 = 0, \quad \text{and} \quad d_p = \gamma_{11}-\frac{\gamma_{12}^2}{\gamma_{22}}
\]
Equation (16.41) is deduced for the case when the SH-wave is incident at a corrugated interface between an isotropic elastic half-space and a highly anisotropic half-space.
16.8 Energy distribution
It is apparent that when a plane SH-wave is incident on any surface, the energy of the incident wave is distributed among the reflected and refracted waves. The energy flux for the incident wave and each of the individual reflected and refracted waves can be obtained by multiplying the total energy per unit volume by the wave velocity and the area of the wavefront. In our case, the total energy per unit volume is twice the mean kinetic energy density. Also, the wavefront area is proportional to the cosine of the angle between the wave and the normal. Therefore, by Snell's law and the spectrum theorem, the energy equation for each of the individual waves, i.e., the incident, regularly reflected and refracted, and irregularly reflected and refracted SH-waves for the nth-order approximation of the corrugation, can be written as (Abubakar 1962b, Tomar and Kaur 2007)
\[
1 = \left|\frac{B}{B_0}\right|^2 + \sum_{n=1}^{\infty}\frac{\cos e_n}{\cos e}\left|\frac{B_n}{B_0}\right|^2 + \sum_{n=1}^{\infty}\frac{\cos e_n'}{\cos e}\left|\frac{B_n'}{B_0}\right|^2 + \frac{\rho_2\beta_2\cos f}{\rho_1\beta_1\cos e}\left|\frac{D_0}{B_0}\right|^2 + \sum_{n=1}^{\infty}\frac{\rho_2\beta_2\cos f_n}{\rho_1\beta_1\cos e}\left|\frac{D_n}{B_0}\right|^2 + \sum_{n=1}^{\infty}\frac{\rho_2\beta_2\cos f_n'}{\rho_1\beta_1\cos e}\left|\frac{D_n'}{B_0}\right|^2 \tag{16.42}
\]
The energy distribution at the plane interface between the two different types of half-spaces can be deduced from Equation (16.42) by setting the coefficients Bn, Dn, Bn′, and Dn′ to zero, as they depend on the corrugation amplitude:
\[
1 = \left|\frac{B}{B_0}\right|^2 + \frac{\rho_2\beta_2\tan e}{\rho_1\beta_1\tan f}\left|\frac{D_0}{B_0}\right|^2
\]
\[
\sum_{i=1}^{6} E_i \approx 1
\]
Here, E1 and E2 are the energy ratios of the regularly reflected and regularly refracted waves. An energy ratio is defined as the ratio of the energy of a reflected/refracted wave to the energy of the incident wave. Similarly, E3, E5 and E4, E6 can be defined as the energy ratios of the irregularly reflected waves and irregularly refracted waves, respectively, for the first-order approximation of the corrugation. Thus, the energy ratios are given as
\[
E_1 = \left|\frac{B}{B_0}\right|^2, \quad E_2 = \frac{\rho_2\beta_2\cos f}{\rho_1\beta_1\cos e}\left|\frac{D_0}{B_0}\right|^2, \quad E_3 = \frac{\cos e_1}{\cos e}\left|\frac{B_1}{B_0}\right|^2,
\]
\[
E_4 = \frac{\rho_2\beta_2\cos f_1}{\rho_1\beta_1\cos e}\left|\frac{D_1}{B_0}\right|^2, \quad E_5 = \frac{\cos e_1'}{\cos e}\left|\frac{B_1'}{B_0}\right|^2, \quad E_6 = \frac{\rho_2\beta_2\cos f_1'}{\rho_1\beta_1\cos e}\left|\frac{D_1'}{B_0}\right|^2
\]
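The six ratios above can be evaluated together in a short routine. The amplitude-ratio moduli and angles below are hypothetical placeholders; with physically consistent inputs, the energy balance Σ Ei ≈ 1 of the previous section should be recovered:

```python
import math

def energy_ratios(ratios, rho1, beta1, rho2, beta2, e, f, e1, f1, e1p, f1p):
    """E1..E6 above; `ratios` holds the moduli of B/B0, D0/B0, B1/B0,
    D1/B0, B1'/B0, D1'/B0 (in that order)."""
    B, D0, B1, D1, B1p, D1p = ratios
    imp = rho2 * beta2 / (rho1 * beta1)          # impedance-like prefactor
    return (B ** 2,
            imp * math.cos(f) / math.cos(e) * D0 ** 2,
            math.cos(e1) / math.cos(e) * B1 ** 2,
            imp * math.cos(f1) / math.cos(e) * D1 ** 2,
            math.cos(e1p) / math.cos(e) * B1p ** 2,
            imp * math.cos(f1p) / math.cos(e) * D1p ** 2)

# Hypothetical amplitude-ratio moduli and angles, for illustration only
Es = energy_ratios((0.9988, 0.0011, 3e-5, 3e-5, 8e-5, 1.6e-4),
                   rho1=2.6e3, beta1=2.5e3, rho2=1.9e3, beta2=1.2e3,
                   e=math.radians(30.0), f=math.radians(14.0),
                   e1=math.radians(40.0), f1=math.radians(18.0),
                   e1p=math.radians(22.0), f1p=math.radians(11.0))
```

Each ratio is non-negative by construction, so a computed total far from unity is a useful sanity check on the amplitude ratios fed in.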
For the numerical computations, the following values of the material constants have been considered:
\[
G = 0.1387 \times 10^{10}\ \text{N/m}^2, \quad N = 0.2774 \times 10^{10}\ \text{N/m}^2, \quad \rho_{11} = 1.926137 \times 10^{3}\ \text{kg/m}^3,
\]
\[
\rho_{12} = -0.002137 \times 10^{3}\ \text{kg/m}^3, \quad \rho_{22} = 0.215337 \times 10^{3}\ \text{kg/m}^3
\]
Figure 16.2 Variation of modulus of amplitude ratio (B1/B0) with respect to angle of incidence for different values of amplitude of corrugation (d = 0.1, 0.2, 0.3).
Figure 16.3 Variation of modulus of amplitude ratio (D1/B0) with respect to angle of incidence for different values of amplitude of corrugation (d = 0.1, 0.2, 0.3).
gets decreased with increasing incident angles. Where corrugation amplitude (d) is concerned, the amplitude ratios experience a positive effect over most of the incident-angle range. Furthermore, the ratio (D1/B0) in Figure 16.3 shows behavior similar to that of Figure 16.2 with respect to incidence angle and corrugation amplitude. Here, the effect of corrugation amplitude is less pronounced, as indicated by the smaller spacing between the curves.
Figures 16.4 and 16.5 have been plotted to discuss the variation of ( B1′ /B0 ) and ( D1′ /B0 ) against the incident angle while considering the corrugation amplitude (d) as the affecting parameter. In Figure 16.4, the amplitude ratio ( B1′ /B0 ) increases gradually throughout the incident-angle range. However, with increasing values of d, the ratio ( B1′ /B0 ) decreases. The amplitude ratio ( D1′ /B0 ) has the same characteristics as ( B1′ /B0 ), as is clearly visible in Figure 16.5.
Figure 16.4 Variation of modulus of amplitude ratio ( B1′ /B0 ) with respect to angle of incidence for different values of amplitude of corrugation (d = 0.1, 0.2, 0.3).
Figure 16.5 Variation of modulus of amplitude ratio ( D1′ /B0 ) with respect to angle of incidence for different values of amplitude of corrugation (d = 0.1, 0.2, 0.3).
Figure 16.6 Variation of modulus of amplitude ratio (B1/B0) with respect to angle of incidence for different values of wavelength of corrugation (λ = 1.0, 2.0, 3.0).
Figure 16.7 Variation of modulus of amplitude ratio (D1/B0) with respect to angle of incidence for different values of wavelength of corrugation (λ = 1.0, 2.0, 3.0).
Figures 16.8 and 16.9. In Figure 16.8, ( B1′ /B0 ) decreases for the smaller incident angles until it attains its minimum. On further increase in incident angle, the amplitude ratio increases and reaches its maximum; on a still further increase in angle, the ratio decreases. However, it is observed that with increasing values of λ, the value of ( B1′ /B0 ) increases for most incident angles. A very different case is observed in Figure 16.9, where ( D1′ /B0 ) starts increasing from 0° onward and then decreases afterward. The wavelength λ has a positive influence on ( D1′ /B0 ), but the influence diminishes at higher incident angles.
Figure 16.8 Variation of modulus of amplitude ratio ( B1′ /B0 ) with respect to angle of incidence for different values of wavelength of corrugation (λ = 1.0, 2.0, 3.0).
Figure 16.9 Variation of modulus of amplitude ratio ( D1′ /B0 ) with respect to angle of incidence for different values of wavelength of corrugation (λ = 1.0, 2.0, 3.0).
Figure 16.10 Variation of modulus of amplitude ratio (B1/B0) with respect to angle of incidence for different values of frequency factor (ωd/β1).
Figure 16.11 Variation of modulus of amplitude ratio (D1/B0) with respect to angle of incidence for different values of frequency factor (ωd/β1 = 0.2, 0.3, 0.4).
the effect of (ωd/β1) is much more pronounced, as the spacing between the curves is much larger. However, (B1/B0) decreases at smaller incident angles and increases at higher angles. Similar behavior is observed in Figure 16.11 for (D1/B0). In Figure 16.12, the amplitude ratio first decreases slightly with incident angle, then increases, and finally decreases. The frequency factor has a favorable influence on ( B1′ /B0 ) for most of the incidence-angle range. In Figure 16.13, the value of ( D1′ /B0 ) increases initially and then decreases with increasing incident angle. Moreover, the frequency factor has a positive influence on the amplitude ratios throughout the incident-angle range.
Figure 16.12 Variation of modulus of amplitude ratio ( B1′ /B0 ) with respect to angle of incidence for different values of frequency factor (ωd/β1).
Figure 16.13 Variation of modulus of amplitude ratio ( D1′ /B0 ) with respect to angle of incidence for different values of frequency factor (ωd/β1 = 0.2, 0.3, 0.4).
Figure 16.14 Variation of modulus of amplitude ratio (B/B0) with respect to angle of incidence for different values of initial stress parameter (ξ1 = 0.0, 0.2, 0.4) associated with the poroelastic half-space.
Figure 16.15 Variation of modulus of amplitude ratio (D0/B0) with respect to angle of incidence for different values of initial stress parameter (ξ1 = 0.0, 0.2, 0.4) associated with the poroelastic half-space.
Figure 16.16 Variation of modulus of amplitude ratio (B1/B0) with respect to angle of incidence for different values of initial stress parameter (ξ1 = 0.0, 0.2, 0.4) associated with the poroelastic half-space.
Figure 16.17 Variation of modulus of amplitude ratio (D1/B0) with respect to angle of incidence for different values of initial stress parameter (ξ1 = 0.0, 0.2, 0.4) associated with the poroelastic half-space.
Figure 16.18 Variation of modulus of amplitude ratio ( B1′ /B0 ) with respect to angle of incidence for different values of initial stress parameter (ξ1 = 0.0, 0.2, 0.4) associated with the poroelastic half-space.
Figure 16.19 Variation of modulus of amplitude ratio ( D1′ /B0 ) with respect to angle of incidence for different values of initial stress parameter (ξ1 = 0.0, 0.2, 0.4) associated with the poroelastic half-space.
half-space. The amplitude ratios vary with the incident angle, but it is very much evident from all the figures that the initial stress parameter values (ξ1 = 0.0, 0.2, and 0.4) have no influence on the amplitude ratios, as all the curves overlap each other. Thus, it can be concluded that the initial stress parameter related to the poroelastic half-space does not have any prominent effect on the amplitude ratios.
Figure 16.20 Variation of modulus of amplitude ratio (B/B0) with respect to angle of incidence for different values of initial stress parameter (ξ = 0.0, 0.2, 0.4) associated with the highly anisotropic half-space.
Figure 16.21 Variation of modulus of amplitude ratio (D0/B0) with respect to angle of incidence for different values of initial stress parameter (ξ = 0.0, 0.2, 0.4) associated with the highly anisotropic half-space.
Figure 16.22 Variation of modulus of amplitude ratio (B1/B0) with respect to angle of incidence for different values of initial stress parameter (ξ = 0.0, 0.2, 0.4) associated with the highly anisotropic half-space.
Figure 16.23 Variation of modulus of amplitude ratio (D1/B0) with respect to angle of incidence for different values of initial stress parameter (ξ = 0.0, 0.2, 0.4) associated with the highly anisotropic half-space.
Figure 16.24 Variation of modulus of amplitude ratio ( B1′ /B0 ) with respect to angle of incidence for different values of initial stress parameter (ξ = 0.0, 0.2, 0.4) associated with the highly anisotropic half-space.
Figure 16.25 Variation of modulus of amplitude ratio ( D1′ /B0 ) with respect to angle of incidence for different values of initial stress parameter (ξ = 0.0, 0.2, 0.4) associated with the highly anisotropic half-space.
that both ratios experience the same influence. In both graphs, the amplitude ratios decrease gradually with increasing incidence angle, but they increase with the initial stress parameter related to the highly anisotropic half-space.
Similarly, (B1/B0) in Figure 16.22 and (D1/B0) in Figure 16.23 also behave identically. The amplitude ratios start to increase from smaller incidence angles and remain indistinguishable up to about 15°, irrespective of the initial stress parameter value. With a further increase in incidence angle, the effect of the initial stress parameter associated with the highly anisotropic half-space is clearly recognizable, and the ratios increase with the stress parameter.
The amplitude ratio ( B1′ /B0 ) in Figure 16.24 is not affected considerably by the initial stress parameter up to 20°. On further increasing the angle, the effect of the initial stress parameter becomes clearly visible, and ( B1′ /B0 ) increases. Besides, in Figure 16.25, ( D1′ /B0 ) starts to increase from 0° and continues to increase up to a certain angle, after which it decreases gradually. Further, the amplitude ratios increase with the initial stress parameter related to the highly anisotropic half-space.
16.10 Concluding remarks
A comprehensive investigation has been done to study the reflection and refraction
phenomena of a plane SH-wave through a corrugated interface sandwiched between
an initially stressed fluid-saturated poroelastic half-space and a highly anisotropic half-
space. Rayleigh’s method of approximation has been effectively utilized to derive first- and
second-order approximations of the coefficients. A rigorous analysis of the reflection and refraction coefficients against various parameters, such as corrugation amplitude, corrugation wavelength, frequency factor, and the initial stress parameters associated with both half-spaces, has been carried out. Each of the individual parameters has been discussed separately and in detail. The critical findings of this descriptive study can be worthwhile to the fields of geophysics and geology, and may furnish valuable assistance to geoscientists for the proper interpretation of geological structures.
References
Abubakar, I. Scattering of plane elastic waves at rough surfaces – I. Proc. Camb. Philos. Soc. 58 (1962a):
136–157.
Abubakar, I. Reflection and refraction of plane SH-waves at irregular interfaces – I. J. Phys. Earth 10.1
(1962b): 1–14.
Abubakar, I. Reflection and refraction of plane SH-waves at irregular interfaces – II. J. Phys. Earth 10.1 (1962c): 15–20.
Aki, K., Richards, P.G. Quantitative Seismology, 2nd ed. University Science Books, Sausalito (2002).
Asano, S. Reflection and refraction of elastic waves at a corrugated boundary surface. Part-I. The
case of incidence of SH-wave. Bull. Earthq. Res. Inst. 38.2 (1960): 177–197.
Asano, S. Reflection and refraction of elastic waves at a corrugated boundary surface. Part-II. Bull.
Earthq. Res. Inst. 39.3 (1961): 367–466.
Asano, S. Reflection and refraction of elastic waves at a corrugated interface. Bull. Seismol. Soc. Am.
56.1 (1966): 201–221.
Biot, M.A. Theory of elastic waves in a fluid saturated porous solid. I. Low frequency range. J. Acoust.
Soc. Am. 28 (1956a): 168–178.
Biot, M.A. Theory of elastic waves in a fluid saturated porous solid. II. High frequency range.
J. Acoust. Soc. Am. 28 (1956b): 179–191.
Biot, M.A. Mechanics of deformations and acoustic propagation in porous media. J. Appl. Phys. 33
(1962): 1482–1489.
Biot, M.A. Mechanics of Incremental Deformations, Wiley, New York (1965).
Crampin, S. A review of the effects of anisotropic layering on the propagation of seismic waves.
Geophys. J. Int. 49.1 (1977): 9–27.
Daley, P.F., Hron, F. Reflection and transmission coefficients for seismic waves in ellipsoidally anisotropic media. Geophysics 44.1 (1979): 27–38.
Deresiewicz, H. The effect of boundaries on wave propagation in a liquid-filled porous solid: II.
Love waves in a porous layer. Bull. Seismol. Soc. Am. 51.1 (1961): 51–59.
Ewing, W.M., Jardetzky, W.S., Press, F. Elastic Waves in Layered Media. Lamont Geological Observatory
Contribution. McGraw-Hill, New York (1957).
Fokkema, J.T. Reflection and transmission of elastic waves by the spatially periodic interface between
two solids (theory of the integral-equation method). Wave Motion 2.4 (1980): 375–393.
Keith, C.M., Crampin, S. Seismic body waves in anisotropic media: Reflection and refraction at a
plane interface. Geophys. J. Int. 49.1 (1977): 181–208.
Lord Rayleigh, O. M. On the dynamical theory of gratings. Proc. R. Soc. Lond. A 79.532 (1907):
399–416.
Pal, A.K., Chattopadhyay, A. The reflection phenomena of plane waves at a free boundary in a pre-stressed elastic half-space. J. Acoust. Soc. Am. 76.3 (1984): 924–925.
Rokhlin, S.I., Bolland, T.K., Adler, L. Reflection and refraction of elastic waves on a plane interface between two generally anisotropic media. J. Acoust. Soc. Am. 79.4 (1986): 906–918.
Saini, S.L., Singh, S.J. Effect of anisotropy on the reflection of SH-waves at an interface. Geophys. Res.
Bull. 15.2 (1977): 67–73.
Sato, R. The reflection of elastic waves on corrugated surface. Zisin 8.1 (1955): 8–22.
Sharma, M.D., Gogna, M.L. Reflection and refraction of plane harmonic waves at an interface
between elastic solid and porous solid saturated by viscous liquid. Pure and Appl. Geophys. 138
(1992): 249–266.
Tajuddin, M., Hussaini, S.J. Reflection of plane waves at boundaries of a liquid filled poroelastic half-
space. J. Appl. Geophys. 58.1 (2005): 59–86.
Thomsen, L. Reflection seismology over azimuthally anisotropic media. Geophysics 53.3 (1988): 304–313.
Tiersten, H.F. Linear Piezoelectric Plate Vibrations, Plenum Press, New York (1969).
Tomar, S.K., Arora, A. Reflection and transmission of elastic waves at an elastic/porous solid saturated by two immiscible fluids. Int. J. Solids Struct. 43 (2006): 1991–2013 [Erratum, ibid 44, 5796–5800 (2007)].
Tomar, S.K., Kaur, J. Reflection and transmission of SH-waves at a corrugated interface between two laterally and vertically heterogeneous anisotropic elastic solid half-spaces. Earth, Planets and Space 55.9 (2003): 531–547.
Tomar, S.K., Kaur, J. SH-waves at a corrugated interface between a dry sandy half-space and an anisotropic elastic half-space. Acta Mech. 190 (2007): 1–28.
Tomar, S.K., Kumar, R., Chopra, A. Reflection and refraction of SH-waves at a corrugated interface between transversely isotropic and visco-elastic solid half spaces. Acta Geophys. Pol. 50.2 (2002): 231–249.
Tomar, S.K., Saini, S.L. Reflection and refraction of SH-waves at a corrugated interface between two dimensional transversely isotropic half spaces. J. Phys. Earth 45.5 (1997): 347–362.
Tomar, S.K., Singh, S.S. Quasi-P-waves at a corrugated interface between two laterally dissimilar monoclinic half spaces. Int. J. Solids Struct. 44.1 (2007): 197–228.
Wu, K.Y., Xue, Q., Adler, L. Reflection and transmission of elastic waves from a fluid-saturated porous solid boundary. J. Acoust. Soc. Am. 87.6 (1990): 2349–2358.
Index
Coanda flow, for vertical and short takeoffs and landings (V/STOL) (Cont.)
    SST k–ω model, 268–269
    grid independence test and solution methodology, 270–273
    introduction, 265–267
    results and discussion, 273–281
Coanda, Henry, 266
Collocation method, 286
    introduction, 285–286
    numerical example, 295–296
    numerical solution of advection diffusion equation using, 292–295
    using B-spline basis functions
        characteristics of, 289
        exponential B-spline basis functions, 290
        first-degree (linear) B-spline, 288
        methodology, 290–292
        second-degree (quadratic) B-spline, 288–289
        to solving differential equations, 285–296
        trigonometric B-spline basis functions, 289–290
        types, 289–290
        zero-degree B-spline, 287–288
Collocation points, 286
Complex convolution theorem, 12–13
Comprehensive R Archival Network (CRAN), 252
“Conada” effect, 266
Concept drift, 195–198
Conflicting bifuzzy number (CBFN), 113
Conflicting bifuzzy set (CBFS), see Time-dependent conflicting bifuzzy set (CBFS)
Constrained nonparametric maximum likelihood estimation (CNPMLE), 135–138
    multiple failure-occurrence time data case, 137–138
    single failure-occurrence time data case, 135–137
Control theory, 54
Conventional reliability of a system, 112
Convergence of Fourier series, 42
Convex conflicting bifuzzy set (CBFS), 113
Convolution theorem, 11–12
Cost ratio, 132
Cox–de Boor recursion formula, 287
Cox–Lewis (CL) NHPP model, 142
Cox–Lewis process, 132
Crank–Nicolson scheme, 292
Crisp and fuzzy sets, comparison of, 62
Cuckoo search algorithm (CSA), 228
(α,β)-Cut of a time-dependent CBFS, 113

D

Damping theorem, 9
Data analysis, 54
    failure time data analysis of repairable system, 129–146
    Kernel estimators for, 177–200
    YouTube view count, 156–163
Decision tree (DT) models
    building with R programming tools, 254–255
    for modeling fertility in Murrah bulls, 253
    results and discussion, 258, 259
Deferred Cesàro means, 49
Defuzzification, 61
Degree of approximation, 51–52
Delta function, 14–15
    of first order, 15
    of second order, 15
Derringer’s desirability function method, 96–97
Desirability function approach, for simultaneous optimization, 96–97
Differentiation theorem
    for image, 10–11
    for original, 9–10
Diffusion of innovation, 165
Digital image processing, 54
Dirac delta function, 14–15
Dirichlet’s theorem, 41
Doetsch, G., 2
Dual-market innovation diffusion model (DMIDM), 168, 171, 174, 175
Dual-market models, 166
Dual-response surface methodology
    for simultaneous optimization of pulp yield and viscosity of pulp cooking process, 91–109
Dynamic balance margin (DBM), of biped robot, 230–232

E

Early market adopters, 166
Early market adoption model, 169
Efficient solution, see Pareto optimal solution
18-Degrees of Freedom (DOF) biped robot, 229
Energy distribution, 312–313
ε-SVR model, 252
eps-regression method, 252
Equal fuzzy sets, 59
Evolutionary and nature-inspired optimization algorithms, 228
(1 − α)-expectation tolerance limit
    lower statistical, 217–219
    upper statistical, 219–222
Exponential B-spline basis functions, 290

F

Failure time data analysis of repairable system
    introduction, 129–131
    model description, 131
    nonparametric estimation methods
        constrained nonparametric ML estimator, 135–138
        Kernel-based approach, 138–142
    numerical examples
        real example with multiple minimal repair data sets, 144–146
Q

Quadratic B-spline basis function, 288–289
Quadratic quality criterion, 13

R

Ramanujan-Fourier series, 54
Random forest (RF) models
    for prediction of fertility in Murrah bulls, 255–256
    results and discussion, 259–260
randomForest package, 256
Rayleigh’s approximation method
    boundary conditions, 305
    energy distribution, 312–313
    introduction, 297–299
    numerical discussion and results, 313–324
        corrugation amplitude effect, 314–315
        corrugation wavelength effect, 316–317
        frequency factor effect, 317–318, 319
        initial stress parameter on highly anisotropic half-space, 321–324
        initial stress parameter on poroelastic half-space, 318–321
    particular cases for special case, 310–312
    problem formulation and its solution, 299–304
    on reflection/refraction phenomena of plane SH-wave, 297–324
    solution for first-order approximation of corrugation, 305–307
    solution for lower highly anisotropic half-space, 301–302

S

Second bias theorem, 8
Second-degree (quadratic) B-spline, 288–289
Sedov, L.I., 2
Shifted unit function, 13
SH-wave propagation, 298–299
Signal processing, 54
Similarity theorem, 7
SIMPLE (Semi-Implicit Method for Pressure Linked Equation) algorithm, 271
Simultaneous optimization of multiple characteristics methodology
    data collection and modeling, 100–104
    Derringer’s desirability function method, 96–97
    dual-response surface methodology, 99–100
    fuzzy logic approach, 98–99
    introduction, 91–96
    optimization, 104–106
    of pulp cooking process, 91–109
    Taguchi’s loss function approach, 97–98
    validation, 106
Single failure-occurrence time data case, 133–134
Single-objective optimization problem, 73–74
Single step function/single jump function, 13
Small order, 43
Smoothing parameter, 139
    kernel estimator with modification of, 181–182
Social networks, 149
Soft computing techniques
    adaptive neuro fuzzy inference system (ANFIS), 66
    applications, 67–68