Modern Optimization Methods for Science, Engineering and Technology
Edited by
G R Sinha
Myanmar Institute of Information Technology Mandalay, Myanmar
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording
or otherwise, without the prior permission of the publisher, or as expressly permitted by law or
under terms agreed with the appropriate rights organization. Multiple copying is permitted in
accordance with the terms of licences issued by the Copyright Licensing Agency, the Copyright
Clearance Centre and other reproduction rights organizations.
Permission to make use of IOP Publishing content other than as set out above may be sought
at [email protected].
G R Sinha has asserted his right to be identified as the author of this work in accordance with
sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
DOI 10.1088/978-0-7503-2404-5
Version: 20191101
IOP ebooks
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available
from the British Library.
US Office: IOP Publishing, Inc., 190 North Independence Mall West, Suite 601, Philadelphia,
PA 19106, USA
Dedicated to my late grandparents, my teachers and Revered Swami Vivekananda.
Contents
Preface
Acknowledgements
Editor biography
List of contributors
Editor biography
G R Sinha
Dr G R Sinha is Adjunct Professor at the International Institute of Information
Technology (IIIT) Bangalore and is currently deputed as a Professor
at Myanmar Institute of Information Technology (MIIT),
Mandalay, Myanmar. He obtained his BE (electronics engineering)
and MTech (computer technology) with a Gold Medal from the
National Institute of Technology, Raipur, India. He received his
PhD in electronics and telecommunications engineering from
Chhattisgarh Swami Vivekanand Technical University (CSVTU), Bhilai, India.
He has published 227 research papers in various international and national
journals and conferences. He is an active reviewer and editorial member of more
than 12 reputed international journals such as IEEE's Transactions on Image
Processing, Elsevier’s Computer Methods and Programs in Biomedicine, etc. He
has been Dean of Faculty and Executive Council Member of CSVTU India and is
currently a member of the Senate of MIIT. Dr Sinha has been appointed as an ACM
Distinguished Speaker in the field of DSP for the years 2017–20. He has also been
appointed as an Expert Member for the Vocational Training Program by Tata
Institute of Social Sciences (TISS) for two years (2017–19). He has been the
Chhattisgarh Representative of the IEEE MP Sub-Section Executive Council for
the last three years. He has served as a Distinguished Speaker in Digital Image
Processing for the Computer Society of India (2015). He also served as
Distinguished IEEE Lecturer on the IEEE India council for the Bombay section.
He has been a Senior Member of IEEE for many years.
He is the recipient of many awards, such as the TCS Award 2014 for Outstanding
Contributions in the Campus Commune of TCS, R B Patil ISTE National Award
2013 for Promising Teacher by ISTE New Delhi, Emerging Chhattisgarh Award
2013, Engineer of the Year Award 2011, Young Engineer Award 2008, Young
Scientist Award 2005, IEI Expert Engineer Award 2007, ISCA Young Scientist
Award 2006, and the nomination and awarding of the Deshbandhu Merit
Scholarship for five years. He has authored six books, including Biometrics
published by Wiley India, a subsidiary of John Wiley, and Medical Image
Processing, published by Prentice Hall of India. He is a consultant for various skill
development initiatives of NSDC, Government of India. He is a regular referee of
project grants under the DST-EMR scheme and several other schemes of the
Government of India. He has delivered many keynote/invited talks and chaired
many technical sessions at international conferences in Singapore, Myanmar,
Bangalore, Mumbai, Trivandrum, Hyderabad, Mysore, Allahabad, Nagercoil,
Nagpur, Kolaghat, Yangon, Meikhtila and many other places. His special session
on ‘Deep Learning in Biometrics’ was included in the IEEE International
Conference on Image Processing in 2017. He is a Fellow of IETE New Delhi and
a member of international professional societies such as IEEE, ACM and many
other national professional bodies such as ISTE, CSI, ISCA and IEI. He is a
member of various committees of the university and has been Vice President of the
Computer Society of India for the Bhilai chapter for two consecutive years. He has
supervised eight PhD scholars and 15 MTech scholars. His research interests include
image processing and computer vision, optimization methods, employability skills,
outcome based education (OBE), etc.
List of contributors
Sirajuddin Ahmed
Jamia Millia Islamia
New Delhi
India
Rajesh Chamorshikar
Bhilai Steel Plant
Bhilai
Chhattisgarh
India
Siddharth Choubey
SSTC-SSGI
CSVTU
Bhilai
Chhattisgarh
India
Abha Choubey
SSTC-SSGI
CSVTU
Bhilai
Chhattisgarh
India
Sien Deng
Department of Mathematical Sciences
Northern Illinois University
DeKalb, IL
USA
Santosh R Desai
Electronics and Instrumentation Engineering
BMS College of Engineering
Basavangudi
Bangalore
India
Somesh Kumar Dewangan
SSTC-SSGI
CSVTU
Bhilai
Chhattisgarh
India
Vladimir Gorbunov
Moscow Institute of Electronic Technology
Moscow
Russia
Shankru Guggari
Department of Computer Applications
BMS College of Engineering
Bengaluru
Karnataka
India
Glenn Harris
Department of Mathematical Sciences
Northern Illinois University
DeKalb, IL
USA
Vandana Khare
CMR College of Engineering and Technology
Hyderabad
Telangana
India
Myo Khaing
University of Computer Studies (Magway)
Magway
Myanmar
Ajay Kulkarni
Medi-Caps University
Indore
Madhya Pradesh
India
Rahul Kumar
Department of Information Technology
National Institute of Technology Raipur
Raipur
Chhattisgarh
India
Bonya Mukherjee
Bhilai Steel Plant
Bhilai
Chhattisgarh
India
Pushkala Narasimhan
PG Department of Commerce
NMKRV College for Women
Bangalore
Karnataka
India
Jyotiprakash Patra
SSTC-SSGI
CSVTU
Bhilai
Chhattisgarh
India
Jyothi Pillai
Department of Information Technology
Bhilai Institute of Technology
Durg
Chhattisgarh
India
Rajendra Prasad
Indian Institute of Technology Roorkee
Roorkee
Uttarakhand
India
Sachin Puntambekar
Medi-Caps University
Indore
Madhya Pradesh
India
Subrahmanian Ramani
Bhilai Steel Plant
Bhilai
Chhattisgarh
India
K C Raveendranathan
Rajadhani Institute of Engineering and Technology
Thiruvananthapuram
Kerala
India
Arpana Rawal
Bhilai Institute of Technology
Durg
Chhattisgarh
India
Mridu Sahu
Department of Information Technology
National Institute of Technology Raipur
Raipur
Chhattisgarh
India
Mamta Singh
Sai College
Bhilai
Chhattisgarh
India
G R Sinha
Myanmar Institute of Information Technology
Mandalay
Myanmar
Kostiantyn Tkachuk
Igor Sikorsky Kyiv Polytechnic Institute
Kiev
Ukraine
Oksana Tverda
Igor Sikorsky Kyiv Polytechnic Institute
Kiev
Ukraine
Sergij Vambol
State Ecological Academy of Postgraduate Education and Management
Berdyansk State Pedagogical University
Berdyansk
Ukraine
Viola Vambol
Berdyansk State Pedagogical University
Berdyansk
Ukraine
K A Venkatesh
Myanmar Institute of Information Technology
Mandalay
Myanmar
Chapter 1
Introduction and background to
optimization theory
G R Sinha
Wikipedia suggests that the optimization problem was studied in the seventeenth
and eighteenth centuries, including with regard to Kepler’s law, Bernoulli’s theorem,
Leibniz’s calculus of variations, etc. In calculus, probability theory, mechanics,
chemical equilibrium and many other fields, optimization theorems and principles
evolved in the nineteenth and twentieth centuries. In the twentieth century, the
traveling salesman problem, which is one of the most popular among optimization
problems, was studied and worked upon. Now, optimization theory is not only
limited to STEM and allied branches but the concept is widely used in all fields, such
as financial management, economics, life sciences, genetics, population studies and
several others. We can use the evolution of human beings as an example, with
human ancestors being more ape-like. Evolution then continued and human beings
can be seen in the present context as the best possible form of the species of its kind
that natural selection could produce. The human species is now educated and is
trying to become even faster with the help of computing speed, although the thinking
capacity of the brain is unlimited. Thought processes and their outcomes are being improved
continuously. This is a simple case of optimization. A person can become a better
intellectual or individual using some methods, thought processes or techniques—
employing some type of optimization concept to perform better.
Let us understand optimization problems with the help of an example. A system is
defined by the following equations:
M = ax1 + bx2 (1.1)
P = Mx1 + c, (1.2)
where x1 and x2 are two input variables, a and b are constants, M is an intermediate
output, c is one more constant and P is the final output or response of the system.
Now, with a given set of conditions we are interested in obtaining the optimum
value of P, which can be achieved in two different ways:
• Optimizing system performance by optimizing the internal parameters and
constants which are characteristics of the system.
• Optimizing by adjusting or optimizing all individual variables or selected
variables present in the system.
Optimization is expected to produce the best possible final value; otherwise, further
optimization methods need to be applied to improve the results. Here arises the
need for a robust approach that can suit all requirements and conditions.
Another situation for using optimization in the above example may be due to some
error introduced in the final value. If the error in the final value is more than the
tolerance limit (the allowed limit) then the error can be minimized by using a
suitable optimization method.
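To make the two routes concrete, the following minimal sketch (not from the text; the constants, ranges and grid search are illustrative assumptions) optimizes the final response P of the example system by adjusting the input variables while the internal constants are held fixed:

import numpy as np

def response(x1, x2, a=1.0, b=2.0, c=0.5):
    """Final output P of the example system: M = a*x1 + b*x2, P = M*x1 + c."""
    M = a * x1 + b * x2          # intermediate output, equation (1.1)
    return M * x1 + c            # final response, equation (1.2)

# Route two: optimize the input variables over a grid, constants held fixed.
grid = np.linspace(-5.0, 5.0, 201)
X1, X2 = np.meshgrid(grid, grid)
P = response(X1, X2)
i, j = np.unravel_index(np.argmin(P), P.shape)
print(f"min P = {P[i, j]:.3f} at x1 = {X1[i, j]:.2f}, x2 = {X2[i, j]:.2f}")
# Route one would instead sweep the constants a, b and c for fixed inputs.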
In simple terms, optimization means changing something, some parameter, so that a
method, system or device performs better. The term robustness relates to attempting a
process again and again, since the solution can never be the optimum; rather, it can
only be optimal. So, researchers will always be engaging in
experiments, simulations, modeling and study so that a better result or
performance is obtained. Robustness has long been a challenging research problem in
all fields of science and technology irrespective of application. This could be in the field
of automobiles, manufacturing, computing, or information and communication
technology (ICT) enabled services, all of which need continuous improvement. If we
are to discuss any particular example of robustness then let us discuss image processing
and computer vision based applications [1, 2]. This is an important application because
computer vision is used in almost all modern advancements. The prominent authors of
image processing admit that there is no general theory of image processing, which
means that whatever is done in a novel manner becomes new in the area of image
processing. Obtaining a robust approach for any component of image processing, such
as image de-noising, segmentation or pattern matching, is challenging, and it is very
difficult to claim that the obtained solution is a robust one. The robustness depends on
optimizing one, a few or all of the parameters involved in the research. The extensive
research on image processing suggests that robustness is a permanent research
problem; a few research contributions can be seen in [1–6], where optimization of
performance was attempted but not as a robust approach.
Combining the two important terms, robust optimization is an emerging area of
study and research where robustness is investigated among a number of optimization
problems and solutions in real-time industries and practice. The optimization process
employs a number of methods for improving the performance of a particular task
and different optimization methods produce different performance metrics and
therefore robustness needs to be investigated among the optimization methods.
The best possible optimization method that can result in optimum performance of a
system will be referred to as a robust optimization method. If the method is robust
then it should also be used for a variety of tasks irrespective of different performance
evaluation parameters.
1. The definition is taken from https://www.lexico.com/en/definition/optimization and slightly modified.
2. The definition is taken from https://dictionary.cambridge.org/dictionary/english/optimization and slightly modified.
Variables are the most important and guiding factors deciding the result being
produced by the optimization and constraints are certain conditions under which
optimization achieves its desired goal (its objective).
1.2.2 Objectives
An objective means a goal to be achieved. Every STEM performance improvement
will have a certain goal to achieve, maybe an error to minimize at some specific level.
In all such cases, optimization is used with the important purpose to achieve the goal
that is set by the system response or method. Thus, the optimization method is
assigned a minimum goal that is required to be achieved. As an example, a few
images are presented in figure 1.1 which are the results of breast cancer segmentation
and detection. The results were obtained using an optimized method that combined
gray level clustering enhancement and adaptive threshold segmentation (GLCE-
ATS). If we look at four different segmentation results which all highlight breast
cancer masses, they appear the same and it is very difficult to determine the cases in
which more cancerous elements (malignant elements) are present. The optimization
method has the objective of detecting the maximum number of malignant masses
present in the breast mammograms [8].
The objective of optimization is also to minimize the error or the value of a
suitable function e(x) or f(x): y = minimize {f(x)} or y = minimize {e(x)}.
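Numerically, this can look like the following sketch (the error function e(x) here is a made-up stand-in, and scipy is assumed to be available):

from scipy.optimize import minimize_scalar

e = lambda x: (x - 2.0) ** 2 + 1.0             # hypothetical error function
res = minimize_scalar(e, bounds=(0.0, 5.0), method="bounded")
print(res.x, res.fun)                           # ~2.0 and ~1.0, i.e. y = min e(x)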
Figure 1.1. Results of breast cancer detection (panels (a)–(d)), with different results for different
optimization parameters used in the GLCE-ATS method.
Here the goal is to minimize the value of the error signal. The minimum or
optimum value has to be obtained as Y and the optimization has to be achieved
under the conditions given in equation (1.4). These conditions are bounds which are
apparently lower and upper bounds in the case of x2 and x1, respectively. The error
signal or value needs to be minimized between x1 and x2. In different types of
optimization problems and applications, various types of bounds and constraints are
used. The optimization methods can be generalized but the constraints are specific to
the applications or problems being addressed. Therefore, the constraints are
subjective, whereas the optimization methods are objective for the applications. In
[7], for example, achieving equilibrium requires a number of constraints in the
implementation of the optimization.
The optimization problem is formulated at the beginning as per the need, and the
optimization method is selected after certain steps, such as defining the design
variables, constraints and bounds.
The methods of optimization will fall under different categories, but the basis of
choosing a suitable method will be the above two factors, that is, the constraints and
bounds being either linear or non-linear. The optimization methods are further
classified as:
• Single variable or multi-variable methods.
• Constrained or non-constrained methods.
• Linear or non-linear optimization methods.
• Single objective or multi-objective optimization methods.
• Specialized or general-purpose methods.
• Traditional or non-traditional methods.
numerical analysis and feature based analysis in various signal processing and
control applications.
In the fast changing dynamic environments of various emerging applications of
signal processing, control, transport, etc, optimization is employed where the
methods, models and problems are dynamic in nature. Telecommunications,
artificial intelligence based advancements, financial analysis and several others fields
are rapidly changing with a wide range of dynamics wherein the problems and
models of optimization used also change rapidly. In such applications, the methods
which are traditionally used also need modification and dynamic updates [10]. There
are a number of optimization methods which are recommended for such
applications, a few of which are: ant colony optimization, evolutionary optimization,
genetic algorithms, neural networks and swarm intelligence based methods, etc. In
the context of dynamic optimization, the quantitative measures of performance to
assess how the optimization method is working become essential. Such measures are
required in all optimization methods, but in a dynamically changing environment
the assessment or evaluation is particularly important [1–3, 9, 10]. Some of the
important measures which are widely used, irrespective of type of optimization
problem and application, are:
• Cumulative fitness and mean fitness.
• Off-line error.
• Robustness.
• Diversity.
• Standard deviation.
• Time.
• Computational complexity.
• Average error.
• Average best function value.
• Current best evolution.
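As a sketch of how two of these measures can be computed from a run history (under the common definitions of best-so-far fitness and off-line error; the numbers below are hypothetical):

import numpy as np

best_per_gen = np.array([5.2, 3.1, 2.4, 2.4, 1.9])  # best value found in each generation
optimum = 1.5                                        # assumed known optimum of the problem

best_so_far = np.minimum.accumulate(best_per_gen)    # current best evolution
mean_fitness = best_so_far.mean()                    # mean (best-so-far) fitness
offline_error = (best_so_far - optimum).mean()       # off-line error
print(mean_fitness, offline_error)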
comfort level has to be increased in the field of ergonomics, then the optimization
utilizes the structural properties of the system or device whose ergonomics require
improvement. Figure 1.3 shows a simple and typical structural diagram of the outer
surface of an airplane where the airfoil or blade design and their optimization are
important factors in the performance of the flight. This is just a symbolic diagram,
as there are thousands of small, medium and high level structures present in an
airplane.
Structural optimization is divided into various types based on the structural
attributes. A few major structural optimization methods or techniques are as
follows:
1. Shape optimization: This is based on the different shapes of systems or
products.
2. Area and volume optimization: Area and volume decide the performance of a
huge number of products in the transport and control based applications,
and therefore the performance could be improved by optimizing either the
area or volume or both or some other similar variables.
3. Size optimization: This may include length, width and other similar
dimensional features in the optimization of the system's structural performance.
4. Topological optimization: The topology represents the overall interconnection
of the various components in a system, and how those components are
structured together determines the system's performance, particularly in the
automobile and similar sectors; the role of optimization is to improve the
topology of the system architecture [11].
The topological optimization of structures under seismic loads, structural
optimization for steel plants and reliability optimization are examples
of structural optimization methods. Topological optimization is implemented
following a certain workflow of operations; one such flow diagram
can be seen in figure 1.4. The need for optimization arises and the structural
properties are studied. The optimization problem is identified and a
suitable method is implemented that utilizes the structural attributes in the
process of optimization. The optimization results in some post-processing
with the application of an optimization method which is sensitive to the
structural features of the system.
Figure 1.4. Flow diagram of topological optimization: structural analysis, identification of the optimization
problem, selection of the optimization method and application of the optimization.
where X1, X2, X3, Y1, Y2, Y3, Z1, Z2, Z3 are variables and P1, P2 and P3 are
constants. The equations are linear equations and the values of P1, P2 and P3 may be
any real values. The representation as shown by the three equations can also be
represented as a vector:
• X = [X1 X2 X3]; Y = [Y1 Y2 Y3]; Z = [Z1 Z2 Z3]; and P = [P1 P2 P3].
• The coefficients used in the three equations can also be expressed as vectors:
A1 = [1 1 1]; B = [2 2 1.5]; and C = [1.5 3 5].
⎡1  2    1.5⎤ ⎡X⎤   ⎡P1⎤
⎢1  2    3  ⎥ ⎢Y⎥ = ⎢P2⎥ .   (1.8)
⎣1  1.5  5  ⎦ ⎣Z⎦   ⎣P3⎦
There are many other ways in which vector representation can replace the set of
linear equations. This representation of equations is more appropriately called a
matrix representation. When matrix representation comes into the analysis of any
optimization theory then all of its properties, such as determinant, rank, character-
istic equation, eigenvalues, eigenvectors, etc, become very important to understand.
Since the matrix and its properties at this level fall under the elementary knowledge
of matrices we will not discuss them further here. Thus, linear optimization can
work on:
• Linear equations.
• Vector representations of linear equations.
• Matrix representations.
Matrix representations provide an efficient way to express the operations and also
support the analysis of systems better than the other approaches, as far as the
implementation of linear optimization is concerned [16]. Linear representation based
optimization includes:
• The steepest descent method.
• The conjugate gradient method [17].
• The normal equation method.
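A small sketch contrasting two of these listed approaches on the matrix form Ax = b (illustrative data; scipy is assumed to be available):

import numpy as np
from scipy.sparse.linalg import cg

A = np.array([[4.0, 1.0], [1.0, 3.0]])       # symmetric positive definite matrix
b = np.array([1.0, 2.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b) # normal equation method
x_cg, info = cg(A, b)                        # conjugate gradient (info == 0 on success)
print(x_normal, x_cg)                        # both approximate the same solution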
Figure 1.5. Vector addition: the vectors x and y combine to give x + y.
Any vector space can also have its vector sub-spaces. The concepts of vector space
and sub-spaces are utilized in the optimization methods for linear optimization in
several applications [18–24]. Vector space includes vector addition which is a
common mathematical tool in vector algebra. This can be seen in figure 1.5.
References
[1] Sinha G R, Raju K S, Patra R, Aye D W and Khin D T 2018 Research studies on human
cognitive ability Int. J. Intell. Defen. Supp. Syst. 5 298–304
[2] Sinha H, Meshram M R and Sinha G R 2018 BER performance analysis of MIMO-OFDM
over wireless channel Int. J. Pure Appl. Math. 118 195–206
[3] Patel B C and Sinha G R 2011 Comparative performance evaluation of segmentation
methods in breast cancer images Int. J. Mach. Intell. 3 130–3
[4] Sinha G R 2015 Fuzzy Based Medical Image Processing Advances in Medical Technologies
and Clinical Practice (AMTCP) Book Series (Hershey, PA: IGI Global)
[5] Patel B C and Sinha G R 2010 Early detection of breast cancer using self-similar fractal
method Int. J. Comput. Appl. 10 39–43
[6] Sinha K and Sinha G R 2013 Comparative analysis of optimized K-means and C-means
clustering methods for segmentation of brain MRI images for tumor extraction Proc. of Int.
Conf. on Emerging Research in Computing, Information, Communication and Applications
(Amsterdam: Elsevier), pp 619–25
[7] Prékopa A 1980 On the development of optimization theory Am. Math. Month. 87 527–42
[8] Patel B C and Sinha G R 2018 Mass segmentation and feature extraction of mammographic
breast images in computer-aided diagnosis PhD Thesis Chhattisgarh Swami Vivekanand
Technical University Bhilai
[9] Chong E K P and Zak S H 2001 An Introduction to Optimization (New York: Wiley)
[10] Cruz C, González J R and Pelta D A 2011 Optimization in dynamic environments: a survey
on problems, methods and measures Soft Comput. 15 1427–48
[11] Labanada S R 2015 Mathematical programming methods for large-scale topology
optimization problems PhD Thesis Technical University of Denmark
[12] Ahmad H A 2012 The best candidates method for solving optimization problems J. Comput.
Sci. 8 711–5
[13] Luo Z Q and Yu W 2006 An introduction to convex optimization for communications and
signal processing IEEE J. Select. Area Commun. 24 1426–38
[14] Bezhovski Z 2015 The historical development of search engine optimization Inform. Know.
Manage. 5 91–6
[15] Luenberger D G 1969 Optimization by Vector Space Methods (New York: Wiley)
[16] Absil P A, Mahony R and Sepulchre R 2008 Optimization Algorithms on Matrix Manifolds
(Princeton, NJ: Princeton University Press)
[17] Abrudan T, Eriksson J and Koivunen V 2009 Conjugate gradient algorithm for optimization
under unitary matrix constraint Signal Process. 89 1704–14
[18] Ehrgott M and Gandibleux X 2003 Multiple Criteria Optimization: State of the Art
Annotated Bibliographic Surveys (New York: Kluwer/Academic)
[19] Boumal N 2014 Optimization and estimation on manifolds PhD Thesis Princeton University
https://web.math.princeton.edu/~nboumal/papers/boumal_optimization_and_estimation_on_manifolds_phd_thesis.pdf
[20] Hartley R I and Kahl F 2009 Global optimization through rotation space search Int. J.
Comput. Vis. 82 64–79
[21] Gallier J and Quaintance J 2018 Fundamentals of linear algebra and optimization Technical
Report University of Pennsylvania
[22] Nocedal J and Wright S J 2006 Numerical Optimization 2nd ed (Berlin: Springer)
[23] Dubourg V 2011 Adaptive surrogate models for reliability analysis and reliability-based
design optimization PhD Thesis Université Blaise Pascal https://www.phimeca.com/IMG/
pdf/these_dubourg_2011.pdf
[24] Simon D 2013 Evolutionary Optimization Algorithms: Biologically-Inspired and Population-
Based Approaches to Computer Intelligence (Hoboken, NJ: Wiley)
Chapter 2
Linear programming
K A Venkatesh
Optimization is the key to success in any field. In general, every resource, such as
time, availability of skilled work force, etc, is finite; allocating scarce resources to the
requested needs in an optimal way is one of the important tasks in every operation.
Linear programming is one of the mathematical programming models which has
roots in ancient mathematics, namely algebra, in particular the solving of systems of
simultaneous linear equations, linear algebra and specifically matrix operations. The
potential applications of the linear programming problem (LPP) are enormous:
supply chain management, organ donor and recipient matching, communications
network design, aviation industries, financial engineering, network optimization,
smart grids, decision making and so on. LPP is a special case of optimization
problems, which look for maximum or minimum values for a single objective
function or multiple objective functions with a set of constraints; the constraints may
be equations or inequalities. In LPP, the variables in the objective function and
constraints are linearly related.
2.1 Introduction
The study of modeling with linear equations began its journey with the birth of
algebra. As a natural extension, many real-world problems can be modeled as a
system of simultaneous linear equations. Let us begin with the example of a high
school level mathematics problem.
Example 2.1. Three pens and two pencils cost $190 and two pens and three pencils
cost $180. Find the cost of each pen and pencil.
To model as a system, we use the letters x , y to denote the unknown quantities
and these are known as variables. The above problem can be written as
3x + 2y = 190
2x + 3y = 180.
The above system, in matrix notation, can be written as AX = b, where A is the
coefficient matrix of order 2 (in general n), and X and b are column vectors of order
2 × 1 (in general n × 1). To solve the system, there are many methods from the theory of matrices. In
general, the methods are classified into two categories, namely direct methods and
iterative methods. Methods such as Cramer's rule, Gauss elimination, Gauss–Jordan and matrix
decomposition are examples of direct methods, and the Gauss–Jacobi, Gauss–Seidel and
SOR methods are examples of iterative methods. All the mentioned methods work
well as long as the coefficient matrix is a square one. Suppose that the coefficient
matrix is a rectangular one, that is, the system has a greater number of variables than
the number of equations; we cannot then solve it in the usual sense. Suppose the
system has k variables in m equations (k > m); set k − m variables to zero and then
solve for the remaining variables. Such solutions are called basic solutions. The
number of basic solutions is at most C(k, m) = k!/(m!(k − m)!), the number of ways of
choosing the m basic variables.
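The enumeration of basic solutions can be sketched directly (illustrative data, with m = 2 equations in k = 3 variables; numpy assumed available):

from itertools import combinations
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [2.0, 3.0, 1.0]])            # m = 2 equations, k = 3 variables
b = np.array([190.0, 180.0])
m, k = A.shape

for basis in combinations(range(k), m):    # C(k, m) choices of basic variables
    B = A[:, basis]
    if np.linalg.matrix_rank(B) == m:      # skip singular bases
        x = np.zeros(k)
        x[list(basis)] = np.linalg.solve(B, b)
        print(basis, x)                    # one basic solution per basis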
A single entity which consists of a linear expression, called the objective function,
with finitely many linear inequalities, called linear constraints and non-negative
restrictions, is called the linear programming problem (LPP). The general form of
the LPP is as follows.
Objective function:
Optimize Z = c1x1 + c2x2 + ⋯ + cnxn. (2.1)
Subject to the constraints:
AX (⩽, =, ⩾) b (2.2)
X ⩾ 0, (2.3)
where A is an m × n matrix, X is a column vector of order n × 1 and b is a column
vector of order m × 1.
The basic solutions which satisfy the given set of constraints are called basic
feasible solutions. The basic feasible solution which optimizes the objective function
is called the optimal basic feasible solution.
All LPPs fall into one of three categories: feasible and bounded, unbounded, and
infeasible. If the given LPP is feasible and bounded then it has at least one optimal
solution.
Prior to Dantzig’s simplex method to solve the LPP, there was an enumerative
algorithm to find a feasible solution to the given set of constraints, but this method
was not a viable model because it inherently generates additional inequalities. In
1963, Dantzig developed the simplex method and led the development of a new
branch of study, called LPP.
The simplex method principle is that rather than working with inequalities, you
transform the inequalities into equalities by introducing slack variables (if the sign of
the inequality is ⩽) or surplus variables (if the sign of the inequality is ⩾). This
augmented form of the LPP is called the standard form. If the objective function is of
maximization type, then all constraints of ⩾ type are first converted into ⩽ type by
multiplying both sides by −1.
Example 2.2. A manufacturer produces three products P1, P2 and P3 in one of its
plants. Each product requires a certain amount of time on each of two machines,
given in table 2.1 (h unit−1).
The goal of the problem is to identify the optimal number of units of P1, P2, P3 to
produce so that the earned profit is maximized. Let x , y, z be the number of units of
P1, P2, P3 to be produced, respectively.
From the given scenario, the objective is to maximize the profit. Therefore the
objective function is given by
Maximize Z = 3x + 4y + 2z,
subject to the machine-time constraints implied by table 2.1:
7x + 8y + 5z ⩽ 40
8x + 12y + 7z ⩽ 35
x, y, z ⩾ 0.
Table 2.1. Machine time required and profit per unit.

Product                          M1    M2    Profit per unit
P1                               7     8     $3
P2                               8     12    $4
P3                               5     7     $2
Available m/c time (h week−1)    40    35
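As a cross-check of this model (a sketch; the chapter itself works with tables and the Excel Solver), scipy's linprog can solve it directly, negating the profit coefficients because linprog minimizes:

from scipy.optimize import linprog

c = [-3, -4, -2]                  # maximize 3x + 4y + 2z -> minimize the negation
A_ub = [[7, 8, 5],                # machine M1: at most 40 h per week
        [8, 12, 7]]               # machine M2: at most 35 h per week
b_ub = [40, 35]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, -res.fun)            # optimal production plan and maximum profit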
Note that here the warehouses and retail stores are known as pure supply nodes and
pure consumption nodes. The transportation problem can be represented by a
complete bi-partite graph. In certain scenarios, some nodes can be supply as well as
consumption nodes, such models are called transshipment models and the nodes
which are supply and demand nodes are called transshipment nodes.
Example 2.3. A firm has three warehouses W1, W2 and W3 and four retail outlets
R1, R2, R3 and R4. The unit transportation cost of shifting from Wi to Rj and the
maximum possible supply and the demand requirements are given in the table 2.3.
The company wishes to minimize the transportation cost.
Solution: the total demand is 5700 = the total supply, and hence the problem is
balanced. Writing xij for the number of units shipped from Wi to Rj, the
corresponding LPP model is given below the tables.
Table 2.3. Unit transportation costs, supplies and demands.

        R1    R2    R3    R4    Supply
W1      6     4     8     4     2000
W2      7     21    4     6     1200
W3      10    4     9     7     2500
Demand  1500  2000  1000  1200
Table 2.4. Optimal shipment plan obtained with the Excel Solver.

        R1    R2    R3    R4    Shipped   Supply
W1      1300  0     0     700   2000      ⩽ 2000
W2      200   0     1000  0     1200      ⩽ 1200
W3      0     2000  0     500   2500      ⩽ 2500
        1500  2000  1000  1200
Objective function:
Minimize Z = 6x11 + 4x12 + 8x13 + 4x14 + 7x21 + 21x22 + 4x23 + 6x24
           + 10x31 + 4x32 + 9x33 + 7x34.
Supply constraints:
x11 + x12 + x13 + x14 ⩽ 2000
x21 + x22 + x23 + x24 ⩽ 1200
x31 + x32 + x33 + x34 ⩽ 2500.
Demand constraints:
x11 + x21 + x31 ⩾ 1500
x12 + x22 + x32 ⩾ 2000
x13 + x23 + x33 ⩾ 1000
x14 + x24 + x34 ⩾ 1200
xij ⩾ 0; 1 ⩽ i ⩽ 3 and 1 ⩽ j ⩽ 4.
Using the Excel solver we obtain the optimal values of xij and the minimum
transportation cost and the same is presented in table 2.4.
The minimal transportation cost is 27 500.
The shipment details are:
• W1 to R1 is 1300 units and W1 to R4 is 700 units.
• W2 to R1 is 200 units and W2 to R3 is 1000 units.
• W3 to R2 is 2000 units and W3 to R4 is 500 units.
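The same result can be reproduced with a short script (a sketch, not the book's Excel workflow), building the supply and demand constraints over the flattened 3 × 4 variable grid:

import numpy as np
from scipy.optimize import linprog

cost = np.array([[6, 4, 8, 4],
                 [7, 21, 4, 6],
                 [10, 4, 9, 7]], dtype=float)
supply = [2000, 1200, 2500]
demand = [1500, 2000, 1000, 1200]
m, n = cost.shape

A_sup = np.zeros((m, m * n))              # row sums: shipments out of each warehouse
for i in range(m):
    A_sup[i, i * n:(i + 1) * n] = 1
A_dem = np.zeros((n, m * n))              # column sums: shipments into each store
for j in range(n):
    A_dem[j, j::n] = 1

res = linprog(cost.ravel(),
              A_ub=np.vstack([A_sup, -A_dem]),   # supply <=, demand >= (negated)
              b_ub=supply + [-d for d in demand])
print(res.fun)                             # 27500.0, matching table 2.4
print(res.x.reshape(m, n))                 # optimal shipment plan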
Example 2.4. (Transshipment problem) A firm has two plants P1 and P2 which
produce switch gears, two carry and forward (C&F) agents (C1 and C2) and three
dealers (D1, D2 and D3). P1 and P2 produce 1500 and 2500 switch gears,
respectively, and the demands at the dealers are 1900, 1200 and 900, respectively. All
goods can reach the dealers through any of the two C&F agents. The unit
transportation costs between nodes are shown in the network diagram (figure 2.1).
The objective of the organization is to minimize the transportation cost.
Here P1 and P2 are pure supply nodes, C1, C2, D3 are transshipment nodes and
D1, D2 are pure demand nodes. Using the buffer, we will convert this problem into a
transportation problem.
Now we have nodes P1, P2, C1, C2, D3 as supply nodes and C1, C2, D1, D2, D3 are
demand nodes. The supply at the transshipment nodes equals the buffer, and
demand at the transshipment nodes equals demand plus buffer, where the buffer can
be total supply or total demand. The objective function is the same as in the
transportation problem. The transshipment cost table is presented in table 2.5.
The assignment problem is a special case of the transportation problem. There are
m jobs and n bidders who are available to complete the jobs (n ⩾ m, or
vice versa). Every job is assigned to only one bidder and every bidder will do only one
job. The problem is therefore modeled with a square cost matrix: in both cases we add
dummy rows/columns as per the need and assign unity as the supply and demand for
both nodes. Let Cij be the cost quoted by the ith bidder for doing the jth job. While
formulating this scenario as a linear programming model, change the inequalities in
both constraints to equalities and solve as usual.
Table 2.5. Transshipment cost table (B = buffer).

        C1   C2   D1    D2    D3       Supply
P1      3    2    *     *     *        1500
P2      1    4    *     *     *        2500
C1      0    *    5     2     3        B
C2      6    0    4     3     8        B
D3      *    *    *     4     0        B
Demand  B    B    1900  1200  900 + B
Replace (*) with a sufficiently large value and solve as in the transportation problem.
Table 2.6. Costs quoted by the contractors for jobs J1–J3.

Contractor      J1     J2    J3
Contractor C1   123    78    67
Contractor C2   158    65    75
Contractor C3   200    50    35
Contractor C4   100    30    40
Contractor C5   225    80    68
Table 2.7. Cost table with dummy jobs J4 and J5 added to make the problem square.

Contractor      J1     J2    J3    J4    J5    Supply
Contractor C1   123    78    67    0     0     1
Contractor C2   158    65    75    0     0     1
Contractor C3   200    50    35    0     0     1
Contractor C4   100    30    40    0     0     1
Contractor C5   225    80    68    0     0     1
Demand          1      1     1     1     1
No.   Investment           Return (%)
1     Info Systems         6
2     BTS                  11
3     Advanced Systems     9
4     Pearless Insurance   5.50
5     Loyal Insurance      6
6     Commercial Bank      4.30
7     Agro Bank            4.10
Constraints (with I, B, A, P, L, CB and AB denoting the amounts invested in the
seven options above):
I + B + A + P + L + CB + AB = 1000000
I + B + A ⩽ 500000
CB + AB ⩽ 500000
P + L ⩽ 500000
CB + AB ⩽ 0.2(I + B + A)
I ⩽ 0.55(CB + AB )
I , B, A , P , L , CB , AB ⩾ 0.
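A sketch of this model in code, assuming the figures in the table above are annual percentage returns to be maximized (the variable order is I, B, A, P, L, CB, AB):

from scipy.optimize import linprog

c = [-0.06, -0.11, -0.09, -0.055, -0.06, -0.043, -0.041]   # negated returns
A_ub = [
    [1, 1, 1, 0, 0, 0, 0],              # I + B + A <= 500000
    [0, 0, 0, 0, 0, 1, 1],              # CB + AB   <= 500000
    [0, 0, 0, 1, 1, 0, 0],              # P + L     <= 500000
    [-0.2, -0.2, -0.2, 0, 0, 1, 1],     # CB + AB   <= 0.2(I + B + A)
    [1, 0, 0, 0, 0, -0.55, -0.55],      # I         <= 0.55(CB + AB)
]
b_ub = [500000, 500000, 500000, 0, 0]
A_eq = [[1, 1, 1, 1, 1, 1, 1]]          # the whole 1000000 is invested
b_eq = [1000000]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(res.x, -res.fun)                   # allocation and the maximum total return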
5. Identify the entering/pivot variable (the variable that has the most negative
value in the z-row, the denominator should be positive).
6. Identify the leaving variable (among the ratios between the solution column
values and the entering variable column values, choose the minimal one).
7. Create the new table, as in the Gauss–Jordan method, for the next iteration.
8. If optimality is not yet reached, repeat steps 4–6.
Example 2.6. Maximize Z = 3x + 2y + z (the coefficients shown in table 2.10),
subject to
2x + 4y − 3z ⩽ 25
−x + 2y + 2z ⩽ 35
x, y, z ⩾ 0.
Using the Excel Solver add-in, we solve this problem. Tables 2.10 and 2.11 show the
entry, the solution obtained and the answer for the problem.
From the tables, the optimal values are x = 155, y = 0 and z = 95, and the
maximum value is 560. In this problem, both constraints are of a binding nature,
that is the obtained solution of the decision variables satisfies the constraints and
hence the solution obtained is the optimal basic feasible solution. The solution
report generated by the Excel Solver is given in table 2.12.
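For readers without Excel, the same optimum can be verified with a few lines (a sketch using scipy rather than the Solver):

from scipy.optimize import linprog

res = linprog([-3, -2, -1],                       # maximize 3x + 2y + z
              A_ub=[[2, 4, -3], [-1, 2, 2]],
              b_ub=[25, 35])
print(res.x, -res.fun)                             # [155, 0, 95] and 560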
Table 2.10. Solver worksheet before solving.

                                           x     y     z     LHS        RHS
Coefficients of the objective function     3     2     1
Optimal values of the decision variables
Constraint 1                               2     4     −3    0     ⩽   25
Constraint 2                               −1    2     2     0     ⩽   35
Table 2.11. Solver worksheet after solving.

                                           x     y     z     Z          RHS
Coefficients of the objective function     3     2     1
Optimal values of the decision variables   155   0     95    560
Constraint 1                               2     4     −3    25    ⩽   25
Constraint 2                               −1    2     2     35    ⩽   35
Table 2.12. Solution report generated by the Excel Solver (variable cells and constraints sections; not
reproduced in full here).
5. In phase 2 start with the original objective function subject to the constraints
in the final iteration of phase 1, without the artificial variables column and
solve as usual to obtain the optimal basic feasible solution.
Example 2.7. Obtain the optimal basic feasible solution for the LPP using the two-
phase method.
Minimize Z = 4x + y ,
subject to
3x + y = 3
4x + 3y ⩾ 6
x + 2y ⩽ 4
x , y ⩾ 0.
Phase 1
Minimize w = A1 + A2 ,
subject to the constraints
3x + y + A1 = 3
4x + 3y − S1 + A2 = 6
x + 2y + S2 = 4
x , y , S1, S2, A1, A2 ⩾ 0.
Phase 2
Minimize Z = 4x + y,
subject to the constraints
Table 2.13. Initial table of phase 1.

B variables   x   y   S1   S2   A1   A2   Solution
W             7   4   −1   0    0    0    9
A1            3   1   0    0    1    0    3
A2            4   3   −1   0    0    1    6
S2            1   2   0    1    0    0    4
Table 2.14. Final iteration of phase 1.

B variables   x   y   S1    S2   A1    A2    Solution
Z             0   0   1/5   0    1     1     18/5
x             1   0   1/5   0    3/5   1/5   3/5
y             0   1   −3/5  0    4/5   3/5   6/5
S2            0   0   1     1    1     1     1
Table 2.15. Initial table of phase 2.

B variables   x   y   S1    S2   Solution
Z             0   0   1/5   0    18/5
x             1   0   1/5   0    3/5
y             0   1   −3/5  0    6/5
S2            0   0   1     1    1
x + 0y + (1/5)S1 + 0S2 = 3/5
0x + y − (3/5)S1 + 0S2 = 6/5
0x + 0y + S1 + S2 = 1
x, y, S1, S2 ⩾ 0.
In the final iteration of phase 1, the basic variables identified are x, y and S2.
Again express the basic variables in terms of non-basic variables to obtain the
new objective function of phase 2. The initial table of phase 2 is shown in table 2.15.
On solving this, we obtain x = 2/5, y = 9/5 and the minimum value of Z is 17/5.
2.5 Duality
Each LPP is associated with another LPP called the dual linear problem, whereas
the original problem is called the primal problem. In this section, we will see the
conversion of primal to dual and dual to primal. If one of the primal or dual
problems has an optimal solution the other also has an optimal solution. One can
obtain the optimal values of a primal problem by solving the dual problem and
vice versa.
Steps involved in writing the dual of the given LPP:
1. Convert all constraints into ⩽ type, if the objective function is maximization;
otherwise convert all constraints into ⩾ type.
2. Convert all the constraints into the equalities type by adding slack/surplus
variables.
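As a minimal added illustration of the construction (a sketch, not the chapter's own worked example): the primal problem Maximize Z = 3x + 2y subject to x + y ⩽ 4, x + 3y ⩽ 6 and x, y ⩾ 0 has the dual Minimize W = 4u + 6v subject to u + v ⩾ 3, u + 3v ⩾ 2 and u, v ⩾ 0. There is one dual variable per primal constraint, the objective coefficients and the right-hand sides are exchanged, and the coefficient matrix is transposed.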
Example 2.9. Solve the following LPP and answer the following questions,
Maximize Z = 8x + 9y + 6z ,
subject to
5x + 4y + 2z ⩽ 125
x + 2y + z ⩽ 55
x + 2y + 3z ⩽ 30
x , y , z ⩾ 0.
1. Find the range of optimality of the coefficients of the objective function.
2. Find the shadow prices of all the constraints.
3. If the rhs of the first constraint is changed to 120, what will happen to the
optimal solution?
The obtained optimal solution from solving using Excel is given in table 2.16. The
optimal values of x = 21.67, y = 4.17 and z = 0. The maximum value of Z = 210.83.
The first and third constraints are binding, whereas the second one is not binding
and it indicates that there are 25 units of unused resources available.
The sensitivity report generated by the Excel Solver is given in table 2.17.
Table 2.16. Optimal solution obtained with the Excel Solver (objective cell value 210.83; not reproduced in
full here).
Table 2.17. Sensitivity report generated by the Excel Solver (variable cells and constraints sections; not
reproduced in full here).
The optimality range for C1 is 4.5 ⩽ C1 ⩽ 10.125. The optimality range for C2 is
7.69 ⩽ C2 ⩽ 16. The optimality range for C3 is C3 ⩽ 8.83, with no lower bound.
Within these ranges the optimal solution remains unchanged.
The shadow price for each constraint is given in the table and the range for
optimality is also presented. The proposed change in the rhs of the first constraint
(from 125 to 120) is within the range, but the optimal value will decrease by
5 × 1.17 = 5.85. Such a sensitivity report supports the decision maker considerably.
Example 2.10. Find the shortest path between nodes 1–4 in the network, the link
cost/distance between the nodes are given as: Cost(1,2) = 3; Cost(1,3) = 4; Cost
(2,3) = 2; Cost(2,4) = 6; Cost(3,4) = 4 (figure 2.2).
Figure 2.2. The network for example 2.10, with the arc costs listed above.
With decision variables xij = 1 if arc (i, j) lies on the chosen path and 0 otherwise,
the model is:
Minimize Z = 3x12 + 4x13 + 2x23 + 6x24 + 4x34,
subject to the flow-balance constraints
x12 + x13 = 1 (node 1)
x12 − x23 − x24 = 0 (node 2)
x13 + x23 − x34 = 0 (node 3)
x24 + x34 = 1 (node 4)
xij ⩾ 0.
Solving gives x13 = 1, x34 = 1 and all other xij = 0, i.e. the shortest path 1–3–4
with cost 8.
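A compact cross-check of this formulation (a sketch with scipy; variable order x12, x13, x23, x24, x34):

from scipy.optimize import linprog

c = [3, 4, 2, 6, 4]                      # arc costs
A_eq = [[1, 1, 0, 0, 0],                 # node 1: one unit leaves the source
        [-1, 0, 1, 1, 0],                # node 2: flow in equals flow out
        [0, -1, -1, 0, 1],               # node 3: flow in equals flow out
        [0, 0, 0, -1, -1]]               # node 4: one unit reaches the sink
b_eq = [1, 0, 0, -1]
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 5)
print(res.x, res.fun)                     # x13 = x34 = 1: path 1-3-4 with cost 8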
Note that if we cannot construct the ratio after identifying the leaving variable, that
is, all values in the leaving variable row are non-negative, then the problem has an
infeasible solution.
Further reading
[1] Dantzig G B 1963 Linear Programming and Extensions (Princeton, NJ: Princeton University
Press)
[2] Dantzig G B and Thapa M N 1997 Linear Programming (New York: Springer)
[3] Bazaraa M S, Jarvis J J and Sherali H D 2005 Linear Programming and Network Flows
(New York: Wiley Blackwell)
[4] Hillier F S and Lieberman G J 2005 Introduction to Operations Research 8th edn (New York:
McGraw Hill)
[5] Winston W L and Albright S C 2007 Practical Management Science 3rd edn (Boston, MA:
Cengage South Western)
[6] Winston W L 1996 The teachers’ forum: management science for MBA at Indiana
University INFORMS J. Appl. Anal. 26 105–11
[7] Anderson D R, Sweeny D J, Williams T A, Camm J D and Cochran J J 2016 An Introduction
to Management Science: Quantitative Approaches to Decision Making 14th edn (Boston, MA:
Cengage Learning)
2-19
Modern Optimization Methods for Science, Engineering and Technology
[8] Murty K G, Kabadi S N and Chandrasekaran R 2000 Infeasibility analysis for linear
systems: a survey Arab. J. Sci. Eng. 25 3–18
[9] Greenberg H J 1993 How to analyze the results of linear programs—part 1: preliminaries
INFORMS J. Appl. Anal. 23 56–67
[10] Greenberg H J 1993 How to analyze the results of linear programs—part 2: price interpretation
INFORMS J. Appl. Anal. 23 97–114
[11] Greenberg H J 1993 How to analyze the results of linear programs—part 3: infeasibility
diagnosis INFORMS J. Appl. Anal. 23 120–39
Chapter 3
Multivariable optimization methods for risk
assessment of the business processes of
manufacturing enterprises
Vladimir Gorbunov
The rapid rate of development of science and technology opens up broad prospects
for the development of innovative businesses. However, opening a new business is
associated with investment costs for the organization and development of this
business. Before incurring these costs, you should make sure that the business idea in
question is realistic. This can be achieved using special programs that allow you to
create a mathematical model of the business and evaluate its possible future
outcomes. However, business is associated with many random sources of data.
The business model must be able to work with random variables as input data. This
chapter discusses the possibility of forming such a model using the example of the
program ‘E-Project’. This program allows one, on the basis of the input source data,
to determine the cash flow of the project during the entire period of its
implementation and the business performance indicators, and to build charts that reflect the internal
and external data flows of the project. The program allows you to perform risk
accounting and calculation of business process parameters taking into account these
factors. The project manager has the opportunity to analyze the impact of anti-risk
measures on the model and decide on the feasibility of their implementation. The
decision is made on the basis of the project performance indicators, quantitative
assessment of risk factors and the value of anti-risk costs.
3.1 Introduction
Optimization is the mathematical problem of maximizing or minimizing some
function of one or more variables under constraints. Optimization is the basis of
economic analysis of production processes. Typically, the value to be optimized is
L = F (U ).
In this case, any change in the values of the control parameters affects the change in
the value of L, since the control parameters are directly included in the expression of
the optimization criterion and thus change the output parameters of the process,
which depend on the control ones.
If the random perturbations are large enough and they must be taken into
account, then experimental and statistical methods should be used to obtain a model
of the object in the form of a function that is valid only for the studied local area, and
the optimality criterion will take the form:
L = F (X , U ).
If the objective function F(x1, x2, …, xn) has continuous partial derivatives in its
arguments, then putting the partial derivatives of F with respect to each xi equal to
zero and solving the n equations together,
(dF/dxi) = 0, i = 1, 2, …, n,
provides the stationary points, i.e. the candidates for the extrema of the objective function.
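Symbolically, this condition can be set up as in the following sketch (F here is an arbitrary illustrative function, and sympy is assumed to be available):

import sympy as sp

x1, x2 = sp.symbols("x1 x2")
F = (x1 - 1) ** 2 + (x2 + 2) ** 2 + x1 * x2        # hypothetical objective
stationary = sp.solve([sp.diff(F, x1), sp.diff(F, x2)], [x1, x2])
print(stationary)                                   # the candidate extremum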
One-parameter optimization provides a search of the extrema of the functions of
one variable. The limiting condition is the unimodality of the objective function on
the studied interval. The search for the minimum (or maximum) of the objective
function is performed by different methods, which differ in the speed of finding the
desired value. One such method is the dichotomy method. It uses the values of the
objective function when searching for its extremum.
For example, take the function F(x) (figure 3.1). It is necessary to find x ,
delivering the minimum (or maximum) of the function F(x) on the interval [a, b] with
a given accuracy ε , i.e. to find
x = arg min F (x ), x ∈ [a , b ].
At each step of the search process, divide the segment [a, b] in half, x = (a + b)/2—
the coordinate of the middle of the segment [a, b]. Calculate the value of the function
F(x) in the neighborhood ±ε of the computed point x:
F1 = F (x − ε ),
F2 = F (x + ε ).
Compare F1 and F2 and discard one of the halves of the segment [a, b] (figure 3.1).
When searching for a minimum:
• If F1 < F2, then discard the segment [x, b], then b = x.
• Otherwise, drop the segment [a, x], then a = x.
The division of the segment [a, b] continues until its length is less than the specified
accuracy ∣b − a∣ ⩽ ε . The algorithm of this method is presented in figure 3.2.
In the output, x is the coordinate of the point at which the function F(x) has a
minimum (or maximum), FM is the value of the function F(x) at that point.
The optimality criterion is called complex if it is necessary to set the extremum of
the objective function for several changing variables. The procedure for solving the
optimization problem usually includes additional restrictions on these changing
parameters. In this case, the solution of the optimization problem is reduced to the
sequential execution of the following operations: drawing up a mathematical model
of the optimization object; selection of the optimality criterion and determination of
the objective function; the establishment of possible restrictions on the variables; and
the choice of the optimization method, which will find the extreme values of the
required quantities.
The methods of multidimensional optimization are usually distinguished by the
type of information they need in the process:
• Direct search methods (zero-order methods) that need only the values of the
target function.
• Gradient methods (first order methods) which additionally require first order
partial derivatives of the objective function.
• Newtonian methods (second order methods) that use second order partial
derivatives.
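For contrast with the zero-order dichotomy search above, a minimal first-order (gradient) method sketch with a fixed step size (the test function is illustrative):

import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10000):
    """First-order method: step against the gradient until stationarity."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # gradient ~ 0: stationary point reached
            break
        x = x - lr * g
    return x

# F(x, y) = (x - 1)^2 + 2*(y + 3)^2, so grad F = [2(x - 1), 4(y + 3)].
grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 3)])
print(gradient_descent(grad, [0.0, 0.0]))  # approaches [1, -3]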
The main parts of the methodology are given in the diagram. The diagram shows
the functions of the system in geometrical rectangles, and also the existing relations
between the functions and the external environment. The rectangles represent
specific processes, functions, works or tasks that have a purpose and lead to a
marked result. Arrows indicate the interaction processes between them and the
external environment.
The model includes three types of documents (graphical charts, glossary, text),
which refer to each other. Information about the system is displayed in the graphical
diagrams with blocks and arrows, and their connections. The blocks represent the
basic functions of the model elements. These functions can be broken down
(decomposed) into their component parts and presented in the form of more detailed
charts. The decomposition process continues until the subject is described at the level
of detail necessary to achieve the objectives of a specific project. The glossary is
created and maintained by a set of definitions, key words and explanations for each
element of the chart and describes the essence of each element. The text gives an
additional description of the operation of the system.
Often there are cases where some arrows do not make sense to continue
considering in the child diagrams below a certain level in the hierarchy, or vice
versa—individual blocks have no practical value above a certain level. On the other
hand, sometimes you need to get rid of the separate ‘conceptual’ arrows and stop
using them deeper than a certain point.
Standard IDEF0 introduces the concept of tunneling to solve such problems. The
designation ‘tunnel’ in the form of two parentheses around the beginning of the arrow
indicates that this arrow is not inherited from the parent functional block and appears
(from the ‘tunnel’) only on this chart. In turn, the same designation around the end of
the arrow in close proximity to the block receiver refers to the fact that the arrow is a
child of this block diagram, and if displayed will not be considered in the diagram.
The business process model allows financial models to determine the cost
characteristics of processes and their interaction in time. The financial plan of the
project should reflect all costs associated with its preparation, take into account the
cost of manufactured goods or services, and determine income from the sale of
goods or services. The projects differ in the time interval start-up of the business, its
development and completion. During the project the prices of goods, raw materials
and debt capital can change and, in financial terms, these changes must be taken into
account.
3.3 The market and specific risks, the features of their account
A financial plan is an important document containing detailed information about
cash flow for current operations as well as the investment and financing activities of
the enterprise. The manager may obtain information from this plan about:
• The sources of funds and directions of their use.
• The excess cash in the accounts of the enterprise, and the extent to which
expansion of the enterprise is financed from its own and borrowed funds.
• Any additional loans.
When making a report on cash flows for previous periods the entrepreneur uses
the actual data available in the accounting records. The preparation of the forecast
for the coming period requires more detailed analysis of the current situation and the
trends of its change. Often entrepreneurs plan the distribution of funds for the best
and worst cases and also for the most real situation.
A small enterprise can take into account the probabilistic nature of the factors
influencing its activity. In this case, the calculation of the indicators for the future
period is not reduced to the calculation of the three variants of development. In
order to obtain correct results, we have to calculate a single process, which
mathematically defines the probability of achieving the possible result of the activity
of the enterprise depending on the identified characteristics of the involved
processes. Such forecasting will enable the entrepreneur to choose the right develop-
ment strategy for the enterprise, ensuring the timely payment of the accepted debt.
Values such as the price of the traded goods or services, the cost of materials or
components, labor costs, etc, are connected through mathematical dependencies
with the performance of the company. The variation of the initial indicators requires
reproducing the calculations. In this regard, there is a need for an automated means
of calculation for established procedures to instantly output when the source data
change.
To help developers speed up the preparation of the business plan and provide the
necessary information for the investor to know the level of quality of the design
documents, the entrepreneur uses specialized programs for financial planning.
Traditionally, such systems are developed versions of document templates. The
most popular systems provide capabilities similar to those of Excel, and all of their
value for the user lies in a well-chosen list of topics. Filling it out, he or she will obtain
a more or less acceptable financial plan. For example, the most popular system in this
group is the program Business Plan Pro, with hundreds of thousands of users.
The ability of some of the financial planning programs to account for risk factors
is important because of the acceleration of the rates of economic development in
every industry. An example of such a program is ‘E-Project’ [2, 3]. The program has
the following features:
• Incorporates a probabilistic calculation of the business processes.
• Uses a widespread and reliable software package.
• The ability to navigate freely in the calculation methodology, adapted to
specific user requests by creating their own forms of source data and
calculation algorithms.
• The ability to enter data in the form of arbitrary shapes, and the results of
calculations in the form of required reports. There are good editors for the
formation of forms and reports.
• Performs an advanced analysis of the credit-worthiness of the project, i.e. the dependence of the calculation results on changes in loan terms.
• A probabilistic calculation of the financial risk of the project depending on
the probability characteristics of the source data.
• The results of the calculations are in the form of tables and diagrams.
• Has the ability to convert data into HTML format.
The input form 'Sales' answers the questions of how much and when the manufactured products listed in the form 'Products' will be sold. The sales in this form are entered as numbers of products whose characteristics are presented in the form 'Products'.
The form ‘Finance’ is used to enter the initial data on the financial activities and
considers such indicators as equity, loans, repayment of loans, interest on loans,
grants and government funding, and the payment of dividends.
The input form 'Taxes' specifies the source data for the calculation of tax planning. The procedure of payment and the amount of taxes depend on the legal form of the enterprise and the accepted forms of accounting and tax reporting. In this form, the authors of the draft enter the adopted tax rates for the primary taxable base indicators: payroll, income, imputed income, profit and property.
The form 'Staff' is used to determine the financial flows associated with the wages of staff receiving a set salary. This form indicates the period during which each employee will work on the project, his or her position and set salary. Roles can be planned without names when the names of the professionals who will be involved in the project are not yet known.
To calculate the impact of risks the software uses the input form ‘Common
project risks’. All risks of the project can be divided into two categories: market and
special. Market risks form due to fluctuations in price parameters, i.e. the instability
of the market. These indicators can change from their average value both upwards
and downwards. Such risks are characterized by the standard parameters of random distributions (expectation, variance), which are determined from the data entered via the input forms. The second type (special risks) is associated with a specific situation in the project which can occur with some probability and will then cause corresponding financial changes.
To assess the impact of special risks the following characteristics are considered: the period of manifestation of the risk factor, the probability P of a risk situation and the financial cost Q upon its occurrence. Each risk is assigned one of the accepted categories: technical, organizational, financial, environmental, technological, etc. The program includes measures to reduce the impact of risks. The effect of these measures is to change the parameters P and Q into P1 and Q1. For each such measure the form records, as an investment activity, its cost Q2 and its execution time.
Table 3.1 reflects this approach to risk assessment and measures. A Boolean variable K can take the value 0 or 1. When K = 1 the measure is carried out and the risk is characterized by the parameters P1, Q1; if K = 0 the measure is not carried out and the risk retains the parameters P, Q.
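A minimal sketch of this bookkeeping in Python (the parameter values below are illustrative assumptions, not figures from the text) compares the expected financial loss of one risk for K = 0 and K = 1:

# Expected financial loss for one risk, with (K = 1) and without (K = 0) the
# anti-risk measure; P, Q, P1, Q1 and Q2 follow the notation of table 3.1.
P, Q = 0.30, 100.0    # probability and cost of the risk situation (assumed)
P1, Q1 = 0.10, 60.0   # the same after the anti-risk measure (assumed)
Q2 = 5.0              # cost of carrying out the measure (assumed)

def expected_loss(K):
    # K = 1: the measure is performed, so the risk costs P1*Q1 plus Q2
    # K = 0: the measure is not performed, so the expected loss is P*Q
    return P1 * Q1 + Q2 if K == 1 else P * Q

print(expected_loss(0), expected_loss(1))  # 30.0 versus 11.0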
Entering data in these forms ends with the formation of the financial model of the project. The results for the generated model can be observed in output reports produced according to the selected algorithm; the results are given as tables (the main menu button 'Reports') or graphics (the main menu button 'Chart').
Figures 3.5 and 3.6 show examples of output graphs.
Modern enterprises should take into account in their plans all tangible and
intangible factors that can affect their condition in the future. The result of such
A section of the strategic plan relating to the internal business processes necessary to achieve the planned results defines the main innovations. The events section is aimed at the development of new products and the improvement of the competitiveness of the enterprise. The performance section characterizes the processes contributing to the efficiency of the enterprise, the receipt of the planned financial results, the improvement of internal technology and production processes, and meeting the changing demands of buyers. Insufficient attention to the internal processes can lead to the activation of the risks associated with these processes.
The term development here implies the structure that the organization must implement to ensure its development and growth in the long term. Long-term success and prosperity require a constant updating of processes taking into account new technical developments. The successful development of the organization involves its human resources, information systems and organizational procedures. To support its activities in the market, the company should invest in staff development, information technology, systems and procedures. Insufficient attention to these factors can lead to the emergence of risk factors associated with the personnel and the growth of the company.
NPV = ∑_{t=1}^{T} CF_t/(1 + i)^t + CF_{T+1}/[(i − g)(1 + i)^T],   (3.1)

where NPV is the net present value of future cash flows, T is the number of settlement periods within the planning horizon, CF_t is the cash flow over period t, CF_{T+1} is the cash flow of the first period after the planning horizon (the terminal period), i is the value of the discount rate and g is the growth rate of cash flow in the post-planning period (percent per annum) [4].
In the general case, the calculation of the NPV considers the cash flow generated over the project lifecycle (during the planning horizon). Sometimes the cash flow of the post-planning period is omitted in the calculation of the NPV for outside investors (with the aim of increasing the reliability of the calculations). In formula (3.1) the first term characterizes the cash flow from the project during the planning horizon, and the second the cash flow in the post-planning period. The cost of the cash flow in the post-planning period is calculated using the Gordon model [5]. The discount rate represents the average return that an investor could receive from alternative investments comparable to the project under consideration.
The calculation (choice) of the discount rate is based on:
• Ways of taking into account inflation when calculating cash flow.
• The project participant for which NPV is calculated.
• Information about project components.
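As a hedged illustration of equation (3.1), here is a minimal Python sketch (assuming the cash flows are given per settlement period, with the last value being the terminal-period flow CF_{T+1}):

def npv_with_terminal(cash_flows, i, g):
    # Equation (3.1): NPV over the planning horizon plus a Gordon-model
    # terminal value; i is the discount rate per period and g the growth
    # rate of cash flow in the post-planning period (g < i is required).
    *horizon, cf_terminal = cash_flows
    T = len(horizon)
    pv_horizon = sum(cf / (1 + i) ** t for t, cf in enumerate(horizon, start=1))
    pv_terminal = cf_terminal / ((i - g) * (1 + i) ** T)  # Gordon model term
    return pv_horizon + pv_terminal

# Assumed figures: three planning periods plus one terminal-period flow
print(round(npv_with_terminal([100, 120, 130, 135], i=0.15, g=0.03), 1))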
Over time the project inevitably changes the factors that determine the value of the rate. For example, in the start-up phase the business may experience a steady decline in risk as the risk of 'non-realization of the project' is reduced. After the payback period, the risk for investors associated with a possible 'return' of funds is also reduced to zero. At the same time, other changes can lead to an increased exposure to risk factors, which is equivalent to an increase in the discount rate. Therefore, the basic assumption in the calculation of equation (3.1) is usually the adoption of a constant discount rate throughout the entire project lifecycle when preparing preliminary calculations.
The simplified model published in the journal Business Valuation Review [7] for emerging markets may be used to build value models of the company and its elements. This model is based on the assumption that the yield on government Eurobonds reflects the risks associated with investing in the share capital of 'ideal companies', i.e. companies with no flaws. The disadvantages of a real company correspond to the risks specific to the company and its specific business. These flaws are expressed as premiums added to the discount rate due to these risk factors (see table 3.2). For the practical application of this model to risk management, the risk-free rate is increased in accordance with table 3.2. Each allowance should be determined taking into account the possibility of its appearance in the project and the consequences of the risk under consideration.
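A sketch of this build-up of the discount rate, with hypothetical premium values standing in for the entries of table 3.2 (the table itself is not reproduced here):

# Hypothetical build-up: risk-free Eurobond yield plus 'deviation from the
# ideal company' premiums; all numbers below are illustrative assumptions.
risk_free_rate = 0.06
premiums = {
    "key-person dependence": 0.02,
    "company size": 0.03,
    "financial structure": 0.02,
    "diversification of customers and products": 0.01,
}
discount_rate = risk_free_rate + sum(premiums.values())
print(f"discount rate = {discount_rate:.2%}")  # 14.00% in this example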
The method of expert estimation consists in using the experience of experts in the analysis of the project, taking into account the influence of diverse qualitative factors. The formal peer-review process often comes down to the following. The management of the project (firm) develops a list of evaluation criteria in the form of expert (polling) sheets containing questions. A weighting coefficient is assigned (or calculated) for each criterion and is not disclosed to the experts. The experts should have information about the evaluated project and, through examination, should be able to
analyze the questions and mark the chosen answer. Next, the completed expert sheets and the results of the examination are processed on the basis of well-known computer packages for the processing of statistical information.
Table 3.2. Estimates of allowances for 'deviation from the ideal' corporate risks.
In practice, risk analysis and decision-making often do not require quantitative characteristics. What is important is comparative analysis, in which experts assess the occurrence of risk events on a simplified scale of gradations. For example, each event is placed on a chart whose axes are 'impact' and 'probability'. The chart consists of nine cells, each of which corresponds to a single pair of estimates (figure 3.7). For example, an event assessed as 'low impact, low probability' is displayed in the lower left cell of the chart, and an event assessed as 'low impact, high likelihood' is displayed in the lower right cell, etc.
The whole chart is divided into three approximately equal parts. The three cells of the chart located at the bottom left form an area of insignificant risk. The three cells at the upper right form an area of significant risk. The remaining part of the chart (three cells) is an area of medium risk. Thus, the risk associated with event A is insignificant, the risk for event B is average and the risk of event C is substantial. The resulting charts, on which expert assessments of all risk events are plotted, are called risk maps. Such a map shows what risk events can take place, what the correlation between different types of risks is and which risks should be given maximum attention (in this example, risk event C). This approach is widespread in the practice of risk management for companies in the real world. Risk managers typically use three or five (rarely seven) grades for the probability and materiality of exposure. The described chart is a convenient way to visualize risk. In practice, there are other ways of visualization, such as using a circular or a color chart.
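The zone logic of such a 3 × 3 map can be sketched as follows; the grade names and scoring are assumptions that merely reproduce the three areas described above:

# 3x3 risk-map classification (impact x probability): the three lower-left
# cells are insignificant, the three upper-right cells are significant and
# the anti-diagonal cells in between form the medium-risk area.
GRADES = {"low": 0, "medium": 1, "high": 2}

def risk_zone(impact, probability):
    score = GRADES[impact] + GRADES[probability]
    if score <= 1:
        return "insignificant"
    if score >= 3:
        return "significant"
    return "medium"

print(risk_zone("low", "low"), risk_zone("medium", "medium"), risk_zone("high", "high"))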
Sensitivity analysis and scenario analysis are sequential steps in quantitative risk analysis; the latter allows us to avoid some of the shortcomings of sensitivity analysis. The scenario method is most effective when the number of possible values of NPV is finite. However, in the risk analysis of an investment project the expert faces an unlimited number of different scenarios. The simulation method of assessing the individual risk of the project helps to solve this problem; the basis of this method is the probabilistic assessment of the occurrence of various circumstances. By using specialized software packages for the calculation of the economic efficiency of projects, the evaluation of the impact of risks is obtained in the form of output tables and graphs reflecting the impact of risk factors on the project output.
Figure 3.7. Risk map: events placed by impact (weak, average, strong) versus probability (low, medium, high); events A, B and C illustrate the insignificant, medium and significant risk areas.
In sensitivity analysis, the sensitivity of the project to its main parameters is analyzed by modifying one input parameter of the project at a time. By sequentially changing project parameters, the developer identifies the input variables that most strongly affect the project result. For these variables, measures are developed to reduce their impact. The aim of the project sensitivity analysis is to determine the impact of varying factors on the financial result of the project. The most common method used for sensitivity analysis is simulation. NPV is used as the integrated indicator of the financial result of the project. The result of a sensitivity analysis can be reduced to conclusions such as: the project can withstand a reduction of the sale price by 16%, a reduction of the sales volume by 11% and an increase of direct costs by 14%.
Figure 3.8 shows a graph of the impact on the net present value of changes in sales volume, sales price and direct costs.
Sensitivity analysis does not measure risk; it only assesses the influence of various factors on the results of the financial activities of the enterprise. The analysis allows one to identify the factors that most affect profit and then to consider activities that control these factors or neutralize their influence. Sensitivity analysis (determination of the critical point) can also be performed without special computer programs, based on the conditions of break-even production. Consider this analysis with an example.
In the calculation for a predetermined period from the start of series production, the total costs of production P are defined as

P = V*M + F,

where F and V are, respectively, the fixed costs and the variable costs per unit, and M is the quantity produced. Turnover O is defined as the product of the price of the product C and the quantity M:

O = C*M.
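Setting O = P gives C·M* = V·M* + F, so the break-even quantity is M* = F/(C − V). A minimal sketch of this critical-point calculation (the example numbers are assumed):

def break_even_quantity(F, V, C):
    # Quantity M* at which turnover O = C*M equals total cost P = V*M + F
    if C <= V:
        raise ValueError("price must exceed the variable cost per unit")
    return F / (C - V)

print(break_even_quantity(F=50000.0, V=12.0, C=20.0))  # 6250.0 units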
Figure 3.8. Graph of the evaluation of the sensitivity of the project to the sales, the cost of materials and
constant costs.
σ² = ∑_{i=1}^{n} (Ri − R̄)² * Pi,

where σ² is the variance, Ri is the specific value of the possible variants of the expected income for the financial transactions, R̄ is the average expected value of income for the financial transactions, Pi is the potential frequency (probability) of obtaining separate variants of the expected income on financial transactions and n is the number of observations.
Dispersion does not give a complete picture of the deviations ΔX = X − R̄, which are more relevant for risk evaluation. However, dispersion allows establishing a link between linear and quadratic deviations using the well-known Chebyshev inequality: the probability that a random variable X deviates from its expectation by more than a given tolerance ε > 0 does not exceed its variance D divided by ε², i.e.

P(∣X − R̄∣ > ε) ⩽ D/ε².
This shows that a small variance corresponds to a small risk: the values of X are likely to lie within the ε-neighborhood of the expected value.
The root mean square (standard) deviation is one of the most common measures used in assessing the level of individual financial risk, as it determines the degree of absolute variability. It is calculated by the following formula:

σ = √( ∑_{i=1}^{n} (Ri − R̄)² * Pi ).
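A small worked sketch of R̄, σ² and σ for a hypothetical discrete distribution of expected income (all numbers are assumptions):

import math

incomes = [200.0, 450.0, 600.0]   # possible income variants Ri (assumed)
probs = [0.25, 0.50, 0.25]        # their probabilities Pi (assumed)

r_bar = sum(r * p for r, p in zip(incomes, probs))                    # mean R̄
variance = sum((r - r_bar) ** 2 * p for r, p in zip(incomes, probs))  # σ²
sigma = math.sqrt(variance)                                           # σ
print(f"mean = {r_bar:.1f}, variance = {variance:.1f}, std = {sigma:.1f}")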
In order to fully describe the risk using the VAR measure, one must first specify the probability (small enough to consider the event 'almost' impossible), or the level of confidence associated with this probability value. If the probability is set as 5%, this means a confidence level of 95% (100% − 5%) and the result is written in the form VAR95% (pronounced 'VAR at the 95% level'). The level of 95% is rather arbitrary; each individual sets this level based on their attitude to unlikely events and their understanding of what is considered an 'almost' impossible event. Therefore, VAR can be used with other levels of confidence, e.g. 90% or 99% (when talking about VAR90% or VAR99%). In addition, in practice VAR is assessed or calculated over the time horizon of the game (financial operation). Therefore, when speaking about risk, VAR determines the minimum financial result that can be obtained with a certain confidence level during a certain period of time.
Here is an example. The statement ‘evaluation of the VAR of the risk of lower
returns during the next week is minus 2% at a confidence level of 95%’ or briefly ‘a
week VAR95% = −2%’ means that:
• With a probability of 95%, the yield of the planned operation will be at
least −2% for the week.
• With a probability of 95%, the loss for the week will not exceed 2%.
• A weekly loss of over 2% is possible with a probability of 5%.
These statements are of practical importance. In the vast majority of cases the probability distribution of the results of economic games is not known. However, it is often possible to estimate some characteristics of the unknown distribution, in particular the expected outcome and the variance. One can then assume that the unknown distribution is close to normal and estimate VAR using equation (3.2). This assumption is close to the truth for games in the financial markets, as the prices of many important assets are determined by many random factors that are often inconsistent and conflicting. Even if the probability distribution of each of these random factors is not normal, their joint effect will tend to a normal distribution.
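Equation (3.2) itself falls on a page not reproduced above; under the normality assumption just described, a standard normal-approximation VAR can be sketched as follows (the example inputs are assumed):

from statistics import NormalDist

def var_normal(expected_return, sigma, confidence=0.95):
    # The return threshold that is met or exceeded with the given confidence
    # level: worse outcomes occur with probability 1 - confidence.
    z = NormalDist().inv_cdf(1.0 - confidence)  # about -1.645 for 95%
    return expected_return + z * sigma

# A weekly VAR95% of about -2% corresponds, e.g., to mu = 0.5%, sigma = 1.52%
print(f"{var_normal(0.005, 0.0152, 0.95):.3%}")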
Most of the results of typical business processes are built from combinations of the operations of addition, subtraction and multiplication, which can be described by equations (3.3)–(3.6).
In the E-Project package a normal distribution of the output is accepted. This assumption is possible when the business process involves a large number of random variables and the final result is a complex combination of these input effects. It is possible to estimate the difference between the results of modeling the business process using the analytical model and using simulation. For example, for input data following the triangular (Triang) law (figure 3.10), the distribution of the output can be represented by a normal distribution (figure 3.11) [7, 8].
This figure shows that when business processes are modeled with various distributions of the input parameters, the output process can be described by the normal law. The example shows a normal distribution and a Weibull distribution; the standard deviations of all the distributions differ by a negligible amount. The approximation of these distributions by the normal law becomes more accurate as more processes affect the output parameters. The final output distributions, built according to the dependences of equations (3.2)–(3.5) for non-symmetric input data, will have less asymmetry than the input distributions.
Figure 3.12 presents the simulation result for a mixture of input data distributed over six different laws; the comparison of the obtained distribution with the normal law again shows close agreement.
Figure 3.11. The distribution of the output parameter of the business process for Triang distribution
input data.
Figure 3.12. The distribution of the output of the business process when aggregating input data from various distributions.
The accuracy of the analytic calculation of the output parameters of business processes depends on the accuracy of determining the statistical characteristics of the input parameters.
An important step in risk management is reducing the cost of risks and of anti-risk actions, where anti-risk measures modify the characteristics of the risk factors. The optimization of risks can be reduced to evaluating the cost of anti-risk measures and the change in the impact of the risk factors resulting from the application of those measures. With a limited budget, not all anti-risk measures can be performed; those with the greatest effect are selected. The feasibility of carrying out particular anti-risk activities can be accounted for through an integrated assessment of the financial results of the project with different combinations of the parameter K (table 3.1). E-Project uses a special add-in that, given the adopted criteria and restriction conditions, chooses the optimal value of the parameter K for all the risks involved.
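One possible sketch of such a selection under a limited budget is a greedy ranking of measures by their net effect (the measure names and all numbers are invented for illustration; the actual E-Project add-in is not described in this detail):

# Each measure changes a risk from (P, Q) to (P1, Q1) at cost Q2 (table 3.1
# notation); its net effect is the reduction in expected loss minus its cost.
measures = [
    ("supplier failure",   0.30, 100.0, 0.10, 60.0,  5.0),
    ("equipment breakage", 0.20, 200.0, 0.15, 80.0, 12.0),
    ("staff turnover",     0.40,  50.0, 0.20, 40.0,  4.0),
]
budget = 10.0

def net_effect(m):
    _, p, q, p1, q1, q2 = m
    return p * q - (p1 * q1 + q2)

selected, spent = [], 0.0
for m in sorted(measures, key=net_effect, reverse=True):
    if net_effect(m) > 0 and spent + m[5] <= budget:
        selected.append(m[0])   # set K = 1 for this risk
        spent += m[5]
print(selected, spent)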
Activities with the greatest efficiency are planned first [9]. Figure 3.13 shows the dependence of the financial losses on the implementation of the 24 anti-risk measures, together with the costs of implementing these measures. Various algorithms can be proposed to automate the selection of anti-risk measures: for example, choosing the set of measures that fits within the allocated budget, or choosing the measures whose effectiveness exceeds a threshold. The result of this optimization is presented in figure 3.14, which shows the financial losses from the original risks and the same indicator for the project after the anti-risk measures.
The proposed method allows us to assess the impact of risk factors on the
efficiency of the project, to assess the impact on financial performance anti-risk
activities and to choose those which provide the greatest effect according to the
chosen criterion of project evaluation.
Figure 3.13. Losses from risks and the cost of measures against them.
Figure 3.14. Losses from the original risks and from the risks after anti-risk measures.
3.5 Conclusion
The risk-based business process model allows one to plan the development of a business more accurately. It provides the opportunity to develop, before the project starts, activities that improve the final result of the project at minimal cost.
References
[1] Sommerville I 2001 Software Engineering 6th edn (Reading, MA: Addison-Wesley), 693 p
[2] Gorbunov V L 2013 Business Planning with Risk and Efficiency Assessment of Projects:
Scientific-Practical Manual (Moscow: RIOR: Infra-M) p 248
Chapter 4
Nonlinear optimization methods—overview and
future scope
Somesh Kumar Dewangan, Siddharth Choubey, Jyotiprakash Patra
and Abha Choubey
not necessarily a feasible or globally optimal solution. The two types of optimization methods have their advantages and disadvantages. Therefore, combining deterministic and heuristic techniques is suggested for handling large-scale optimization problems to find a global optimum. This chapter attempts to provide future research directions in the area of deterministic and heuristic methods to improve the computational effectiveness of finding a globally optimal solution for various real-world application issues. As new ideas are introduced into model design, model examination, model changes, global constraint handling and hybridization, the advantages of this new optimization method can be used without expert knowledge. As optimization problems become increasingly complex, hybrid optimization algorithms and programming language concepts become essential.
4.1 Introduction
This section provides an overview of optimization and non-linear programming (NLP), together with a few case studies.
4.1.1 Optimization
Optimization can be applied to existing or specifically built scientific models. The idea is that one may want to locate an extremum of a model by varying a few parameters or factors [1]. A typical motivation for discovering suitable parameter values arises in selection choices in design optimization.
Optimization is the process of achieving the best outcome under given conditions. In design, development, support, etc, engineers need to make choices. The objective of every such choice is either to limit costs or to increase advantages [2].
The costs and advantages can normally be expressed as a function of certain design factors. Thus, optimization is the process of finding the conditions that give the maximum or the minimum value of this function under the applicable conditions. It is also commonly established as a hidden guideline in the investigation of numerous complex choice or allocation problems. Utilizing optimization theory, one solves a complex choice problem, involving the choice of values for a number of interrelated variables, by concentrating on a single target function, designed to evaluate the execution and measure the quality of the choice. This one function is maximized (or minimized, depending upon the details) subject to constraints that may confine the values of the choice variables. In this way a reasonable single component of a problem can be isolated and described by a function, be it a benefit or risk in a business setting, or speed or distance in a physical problem [3].
No single strategy is available to handle all optimization problems efficiently.
Thus, various methods are used to address different kinds of problems.
Linear and nonlinear optimization methods address the following problem: finding numerical values for a given set of variables that satisfy certain criteria expressed through a target function. The solution is the feasible combination of variable values at which the target function achieves its minimum. An example is the dietary regimen problem: finding a combination of foods which fulfills the health requirements at minimum expense. In contrast to traditional problems of applied mathematics, the majority of which originate in materials science, linear and nonlinear optimization problems for the most part lack solutions given by closed formulae, and must be solved through numerical methods, with calculations performed on PCs [4].
In numerous numerical programming applications, linear optimization assump-
tions or approximations may allow proper depiction of the problem over the range
of variables being considered. In nonlinear optimization, nonlinear functions

f(x1, x2, x3, …, xn)

of the decision variables are used. If the feasible solution space is bounded by nonlinear constraints then the method used to find a feasible solution is called non-linear programming (NLP). A general NLP minimizes the objective function f(x) subject to

gk(x) ⩽ 0 for k = 1, …, m,
lk(x) = 0 for k = 1, …, p,

where x is the design vector, f(x) is the objective function and gk(x), lk(x) are the constraint functions.
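As a hedged illustration (the chapter does not prescribe a solver; SciPy is an assumption here), a small NLP with one nonlinear inequality constraint and one equality constraint can be posed and solved as:

from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2  # nonlinear objective f(x)
g = lambda x: x[0] ** 2 + x[1] ** 2 - 4.0            # g(x) <= 0: inside a circle
l = lambda x: x[0] - x[1] + 1.0                      # l(x) = 0: on a line

# SciPy's 'ineq' convention is fun(x) >= 0, so g(x) <= 0 is passed as -g(x)
res = minimize(f, x0=[0.0, 0.0],
               constraints=[{"type": "ineq", "fun": lambda x: -g(x)},
                            {"type": "eq", "fun": l}])
print(res.x, res.fun)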
4.1.2 NLP
NLP is similar to linear optimization in that it has a goal, general requirements and variable constraints. The difference is that a nonlinear program incorporates at least one nonlinear function, which could be the target function or some or all of the constraints.
In nonlinear optimization, models are intrinsically much more difficult to solve. There are several reasons for this, briefly described as follows:
1. In numerical methods for tackling nonlinear optimization we have restricted data about the problem, which makes it difficult to distinguish a local optimum from a global optimum. Typically the only accessible data at a point x are the values of the target function (and perhaps its derivatives) at x. These data are sufficient to determine when you are near a minimum or maximum, yet there is no way to determine whether an alternative and better local optimum exists elsewhere [10, 11].
2. In a linear optimization system there is a finite number of points to search for the optimum solution: the corner points of the feasible polytope. In contrast, in a nonlinear optimization system the optimum solution could be anywhere: at a corner point, along an edge of the feasible space, or in the interior of the feasible space [15].
3. When nonlinear constraints are considered, there might be several diverse feasible regions, regardless of whether one can locate the optimum inside a specific feasible region.
4. In nonlinear problems, specific initial conditions may produce diverse final
solutions, where there might be various distinctive minima, and beginning
at some other point may produce an alternative final solution point and
target function value.
5. In a linear optimization system one will either discover a point that fulfills
every one of the requirements or it will specifically confirm that no feasible
points exist anywhere. However, in nonlinear procedures there is no such
certainty.
6. In linear optimization methods, an initial routine finds a solution that fulfills all of the constraints and thereafter these are never violated. In a nonlinear optimization strategy, however, finding a solution that fulfills the nonlinear constraints is difficult in itself; even if one is found at some point, feasibility may again be violated when the calculation moves to another point with a better value of the objective [16].
7. In a linear optimization strategy there are only a few possible outcomes: a globally optimal solution point, a feasible but unbounded problem, or an infeasible problem.
8. There is a huge number of complex numerical hypotheses and various solution algorithms. The reason why NLP is so much more difficult than LP is the possible nonlinearity of the functions involved. There are also considerably more reasons why NLP is difficult, which depend on increasingly practical considerations.
9. It is difficult to decide the suitable conditions that result in obtaining the
best possible solution.
10. Various calculations converge at certain conditions and provide the
optimum solution.
11. Different conditions used by different users may produce distinctive solutions.
In nonlinear optimization, a few things which were considered difficult in the past can now generally be achieved effectively:
• Derivatives. Derivative functions are regularly required by solvers (the programs that attempt to find the optimum solution). There were two methods: numerical estimation by finite differences, or a modeling system which can generate code for the derivatives.
• Input formats. At one point every solver had its own specific input format for describing the model, and if one solver was not successful on the current problem it was a tedious and error-prone process to recode the model into an alternative format so that a different solver could be tried [17, 18].
Ax = b.

Scientific model. This model operates on vectors v = (v1, v2, v3, …, vn)ᵀ, where 0 ⩽ vi ⩽ 1:

min { (1/2) xᵀV x : f ᵀx = λ, Dx = d, x ⩾ 0 }.
4.2 Convex analysis
Convex analysis is used in cases where the nonlinear optimization problem has a convex objective function [15]. For 0 ⩽ λ ⩽ 1,

x = λx1 + (1 − λ)x2

is called a convex combination of the two points x1, x2. A set C ⊂ IRⁿ is convex if, for any two points x1, x2 ∈ C, every convex combination of x1 and x2 also belongs to C. In other words, the line segment connecting any two points of a convex set is contained in the set [22]. A parabola f(x) = ax² + bx + c with a > 0 is a simple example of a convex function.
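A quick numerical spot check of this property for a parabola with a > 0 (sampled λ values only, not a proof):

# f(lam*x1 + (1 - lam)*x2) <= lam*f(x1) + (1 - lam)*f(x2) for 0 <= lam <= 1
def f(x, a=2.0, b=-1.0, c=0.5):
    return a * x * x + b * x + c

x1, x2 = -3.0, 4.0
for k in range(11):
    lam = k / 10.0
    assert f(lam * x1 + (1 - lam) * x2) <= lam * f(x1) + (1 - lam) * f(x2) + 1e-12
print("convex-combination inequality holds at all sampled points")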
Explanation 4.7. A function f(x) is called concave if, for every y and z and for 0 ⩽ λ ⩽ 1,

f(λy + (1 − λ)z) ⩾ λf(y) + (1 − λ)f(z).
max −cᵀx + ∑_{j=1}^{m} yj gj(x)
subject to ∑_{j=1}^{m} yj ∇gj(x) = c,
yj ⩾ 0, j = 1, …, m.
This was derived from statistical mechanics for discovering least-cost answers to immense optimization problems [7]. It generalizes hill-climbing methods and discards their standard drawback, the dependence of the resulting solution on the starting point, while asymptotically guaranteeing delivery of an optimal solution. This is accomplished by introducing a probability ρ of accepting a candidate move, with ρ = 1 for improving moves.
i.e. for r1 > r2 > 0, the minimizer z(r1) serves as the starting point for approximating the minimum of P(z, r2).
References
[1] Klafszky E 1976 Geometric Programming, Seminar Notes 11.1976 (Budapest: Hungarian
Committee for Systems Analysis)
[2] Moore C 1993 Braids in classical gravity Phys. Rev. Lett. 70 3675–9
[3] Chenciner A and Montgomery R 2000 A remarkable periodic solution of the three-body problem in the case of equal masses Ann. Math. 152 881–901
[4] Coleman J O 1998 Systematic mapping of quadratic constraints on embedded fir filters to
linear matrix inequalities Proc. of 1998 Conf. on Information Sciences and Systems
[5] Spergel D N 2000 A new pupil for detecting extrasolar planets, arXiv: astro-ph/0101142
[6] Ho J K 1975 Optimal design of multi-stage structures: a nested decomposition approach
Comput. Struct. 5 249–55
[7] Karmarkar N K 1984 A new polynomial–time algorithm for linear programming
Combinatorica 4 373–95
[8] Anstreicher K M 1990 A standard form variant, and safeguarded linesearch, for the modified
Karmarkar algorithm Math. Program. 47 337–51
[9] Jarre F 1990 Interior-point methods for classes of convex programs Technical report SOL 90-
16 Systems Optimization Laboratory, Department of Operations Research, Stanford
University, CA
[10] Han C-G, Pardalos P M and Ye Y 1991 On interior-point algorithms for some entropy
optimization problems Working paper Computer Science Department, Pennsylvania State
University, University Park, PA
[11] Kortanek K O and No H 1992 A second order affine scaling algorithm for the geometric
programming dual with logarithmic barrier Optimization 23 501–7
[12] Dantzig G B 1987 Origins of the simplex method Technical report SOL 87–5 https://apps.
dtic.mil/dtic/tr/fulltext/u2/a182708.pdf
[13] Lin Y 2002 Bellman’s principle of optimality and its generalizations General Systems Theory:
A Mathematical Approach (Kluwer), pp 135–61
[14] http://web.math.ku.dk/~moller/undervisning/MASO2010/notes/LKKT.pdf
[15] Vanderbei R J 1999 LOQO user’s manual—version 3.10 Optimiz. Methods Soft. 12 485–514
[16] National Electrical Manufacturers Association (NEMA) 2018 Volt/VAR optimization
improves grid efficiency Technical report http://assets.fiercemarkets.net/public/sites/energy/
reports/voltvarreport.pdf
[17] den Hertog D, Roos C and Terlaky T 1993 The linear complementarity problem, sufficient
matrices and the criss–cross method Linear Algebra Appl. 187 1–14
[18] Jarre F 1994 Interior-point methods via self-concordance or relative Lipschitz condition
(Würzburg: Habilitationsschrift)
[19] Lustig I J, Marsten R E and Shanno D F 1994 Interior point methods for linear
programming: computational state of the art Oper. Res. Soc. Am. J. Comput. 6 1–14
[20] Baxter J and Bartlett P L 2001 Infinite-horizon policy-gradient estimation J. Artif. Intell.
Res. 15 319–50
[21] Coleman J O and Scholnik D P 1999 Design of nonlinear-phase FIR filters with second-
order cone programming Proc. of 1999 Midwest Sym. on Circuits and Systems
[22] Bertsekas D P 1995 Nonlinear Programming (Belmont, MA: Athena Scientific)
[23] Jarre F 1996 Interior-point methods for convex programming ed T Terlaky Interior-Point
Methods for Mathematical Programming (Dordrecht: Kluwer), pp 255–96
[24] Chenciner A, Gerver J, Montgomery R and Simó C 2002 Simple choreographic motions of
N bodies: a preliminary study Geometry, Mechanics, and Dynamics ed P Newton, P Holmes
and A Weinstein (New York: Springer)
[25] Broucke R 2003 New orbits for the n-body problem Proc. of Conf. on New Trends in
Astrodynamics and Applications
[26] Vanderbei R J, Spergel D N and Kasdin N J 2003 Circularly symmetric apodization via star-
shaped masks Astrophys. J. 599 686–94
[27] Vanderbei R J, Spergel D N and Kasdin N J 2003 Spiderweb masks for high contrast
imaging Astrophys. J. 590 593–603
[28] Kasdin N J, Vanderbei R J, Spergel D N and Littman M G 2003 Extrasolar planet finding
via optimal apodized and shaped pupil coronagraphs Astrophys. J. 582 1147–61
[29] Duan Y, Chen X, Houthooft R, Schulman J and Abbeel P 2016 Benchmarking
deep reinforcement learning for continuous control Int. Conf. on Machine Learning (ICML)
pp 1329–38
[30] Sen P K and Lee K 2014 Conservation voltage reduction technique: an application guideline
for smarter grid IEEE Trans. Indus. 52 2122–8
[31] Lobo M S, Vandenberghe L, Boyd S and Lebret H 1998 Applications of second-order cone
programming Technical report Electrical Engineering Department, Stanford University
http://rutcor.rutgers.edu/~alizadeh/CLASSES/12fallSDP/Papers/socp.pdf
[32] Bendsøe M P, Ben-Tal A and Zowe J 1994 Optimization methods for truss geometry and
topology design Struct. Optimiz. 7 141–59
Chapter 5
Implementing the traveling salesman
problem using a modified ant colony
optimization algorithm
Zar Chi Su Su Hlaing, G R Sinha and Myo Khaing
In this chapter, the main modifications induced by ant colony optimization (ACO)
are presented. In fact, the traveling salesman problem (TSP) could easily be solved
by using an improved version of the ant colony optimization method. The same is
attempted in this chapter. There are two main ideas in the proposed algorithm for
the modification of the ant algorithm. The first phase involves defining the candidate
set which is applied to the construction of the ant algorithm. The solution
construction phase includes defining the value of exploitation parameter q0. The
second phase focuses on the variation of pheromone information that is used to
adapt the heuristic parameter automatically throughout the algorithm runs.
Additionally, a local search algorithm is applied to all the solutions that are
produced by ACO and thus the performance of ACO is improved for TSP related
applications.
elements which are possible at each and every step, after which one element is chosen and added to the current solution. It can therefore be seen that in most algorithms utilizing the ant concept, the evaluation of utilities is done over the reachable elements; if appropriate subsets of elements are not constructed properly, techniques employing ACO can perform poorly due to large runtimes.
At the solution construction phase, slow convergence can arise because the ants scan the set of all possible state elements (cities) before choosing a particular one, and the probability that most ants visit the same state is very small. The computational time of each scanning step of the algorithm can then be large. This situation typically occurs when the TSP size is large. Nevertheless, system performance and efficiency can be improved in a number of real-world applications of ACO, even when large TSPs are considered. In summary, the major problems of ACO include slow convergence, long runtimes and execution times, and becoming trapped in local optima.
To consider the above mentioned TSP, there is a limited number of elements
(cities), called the candidate list (CL), which can reduce the scanning set of possible
elements. The CL is a limited set of possible neighborhood elements from the current
element. CLs are created statically or dynamically using available knowledge of the
problems. Using a CL, ACO algorithms can reduce the search space of problems in
the solution construction phase.
The local search procedure employs CL strategies that help to improve the solutions and outcomes generated by ACO. However, implementing this optimization has its own constraints for strategies related to the 2-Opt and 3-Opt local search heuristics, because using them directly in the construction phase of ACO is not very appropriate. These strategies aim to improve the ant colony system (ACS) so that candidate set based strategies can successfully be applied in the construction phase.
There are two important approaches used for solving general-purpose problems in which there is a strong relationship between the elements: the Delaunay graph candidate set approach and the nearest neighbor approach, both of which exploit relationships between elements such as those in the TSP. In all such strategies a static candidate set is derived a priori and does not receive any updates or modifications while the ACO metaheuristic runs. Dynamic candidate set strategies, on the other hand, update the set throughout the search process.
biased search is performed only for cities in the CL. If all cities from this list have
already been visited, one of the remaining cities is chosen.
Transition probability
Ants choose the next city from the current city using the state transition rule, which is the same as for the ACS. The rules are taken from the ACS of Dorigo and Stutzle. In this stage, an ant iteratively moves from city to city. Some of the time an ant selects a city using the random proportional action choice rule as before; otherwise the ant chooses the 'best' city based on the visibility and the pheromone trail. Formally, an ant k positioned on city x chooses a city y to move to according to
y = argmax_{u∈Jk(x)} [τ(x, u)]^α · [η(x, u)]^β  if q ⩽ q0,  and  y = Y otherwise,   (5.1)
where q is a number which has been randomly generated in the range between 0 and
1, q0 is a constant, Jk(x) is the set of cities which are not visited by the kth ant and Y
is a random city according to
Pk(x, y) = [τ(x, y)]^α · [η(x, y)]^β / ∑_{u∈Jk(x)} [τ(x, u)]^α · [η(x, u)]^β  if y ∈ Jk(x),  and  Pk(x, y) = 0 otherwise,   (5.2)
where Pk(x, y) is the probability of choosing city y from city x, Jk(x) is the set of cities not yet visited by ant k, τ(x, y) is the pheromone value for moving from city x to city y, η(x, y) is the local heuristic (visibility) for moving from city x to city y, equal to 1/d(x, y), α determines the significance of the pheromone information and β determines the significance of the heuristic visibility.
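A compact Python sketch of this state transition rule (the data structures are assumptions: tau and eta are dictionaries keyed by city pairs, and unvisited plays the role of Jk(x)):

import random

def choose_next_city(x, unvisited, tau, eta, alpha, beta, q0):
    # Attractiveness of each candidate edge, [tau]^alpha * [eta]^beta
    weights = {u: (tau[x, u] ** alpha) * (eta[x, u] ** beta) for u in unvisited}
    if random.random() <= q0:                 # exploitation, equation (5.1)
        return max(weights, key=weights.get)
    total = sum(weights.values())             # biased exploration, equation (5.2)
    r, acc = random.random() * total, 0.0
    for u, w in weights.items():
        acc += w
        if acc >= r:
            return u
    return u   # guard against floating-point underrun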
10% of the total time and exploiting 90% of the time. This further suggests that the relative significance of exploitation compared to biased exploration is determined by q0, while the desirability of the edges is determined by the values of β and ρ.
In the algorithm suggested in this chapter, the next city is chosen by the ants with
more exploration at the beginning and the ants tend to have more exploitation at the
end of the algorithm. The information which is accumulated by exploitation values
is utilized properly. The values of q0 and β are permitted to vary and the appropriate
exploitation can be achieved. A new value for q0 is updated after every iteration of
the suggested scheme, which can be understood through
q0 = IterationCounter/MaxIteration.   (5.3)
τ(x, y) ← (1 − ρ) · τ(x, y) + ρ · (1/Lgb),   (5.5)

where ρ is the pheromone decay parameter and Lgb is the length of the global best tour; the deposit 1/Lgb is applied only to the edges of the global best tour. As an effect of this rule, the ants' search concentrates around the best known tour and exploitation increases.
cities as the distance between cities increases. While constructing the solution, an ant computes the probability of moving from city i to city j and, based on these transfer probabilities, chooses the next city. If a city is not found by an ant, that city is removed from the CL, but this limits the selection capability of the ant and the list subsequently becomes smaller. While moving, all different cities which are found by the ants are added to the list. When the algorithm uses the candidate set strategy, a few factors become very important:
• The number of edges of the global optimal solution contained in the candidate set.
• The restriction the candidates place on the scope of selection.
This procedure gives the algorithm high speed using the DCL strategy, which further helps in obtaining an improved computational time and improved quality of solution. Experimental studies have shown that the proposed method results in an improvement of solution quality and a significant performance gain.
enhanced on the chosen path, so that the entropy value slowly starts to decrease. If the entropy keeps decreasing it may reach zero without the search having found the global best tour; this situation is termed premature convergence. To deal with this type of difficulty, which arises from behavioral defects, some complex optimization problems require colony optimization that can handle entropy-related issues. Discussing entropy is necessary for the dynamic updating of the heuristic parameters, which in turn can control the value of the entropy.
Shannon's theory of entropy, from information theory, is very popular in this area and is also known as Shannon's information theorem [3]. It was introduced in 1948 and is usually defined as a measure of the uncertainty associated with disorder in a system. The entropy represents the information associated with the probability of the occurrence of an event. Why is this concept relevant here? Because the ACO algorithm has a path selection procedure and the selection of the path is not certain, i.e. there is uncertainty. Therefore, we propose estimating the entropy information in ACO as the variation of the pheromone matrix, each trail being a random variable in the pheromone matrix. This entropy is defined as
H(X) = −∑_{i=1}^{r} Pi log Pi,   (5.8)
where Pi is the probability of occurrence of each trail in the pheromone matrix. For a symmetric TSP with n cities there are n(n − 1)/2 distinct pheromone trails, so r = n(n − 1)/2. Initially the probability of each trail is the same, and H attains its maximum value Hmax, given as

Hmax = −∑_{i=1}^{r} Pi log Pi = −∑_{i=1}^{r} (1/r) log(1/r) = log r.   (5.9)
H′ = 1 − (Hmax − Hcurrent)/Hmax,   (5.11)
where H ′ is the value of entropy of the pheromone matrix that is currently being
considered.
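A small sketch of this entropy measure, following equations (5.8), (5.9) and (5.11), for a pheromone matrix given as a flat list of the r = n(n − 1)/2 trail strengths:

import math

def normalized_entropy(trails):
    total = sum(trails)
    probs = [t / total for t in trails]                        # Pi of each trail
    h_current = -sum(p * math.log(p) for p in probs if p > 0)  # equation (5.8)
    h_max = math.log(len(trails))                              # equation (5.9)
    return 1 - (h_max - h_current) / h_max                     # equation (5.11)

print(normalized_entropy([0.5] * 10))         # uniform trails: maximum, 1.0
print(normalized_entropy([5.0] + [0.1] * 9))  # concentrated trails: lower value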
Figure 5.1. A 2-Opt local search. Edges ab and cd from the graph illustrated in (a) were deleted and the tour was completed again by inserting edges ac and bd, thus resulting in the valid tour shown in graph (b).
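In list form, the 2-Opt move of figure 5.1 amounts to reversing the tour segment between the two deleted edges; a minimal sketch:

# Deleting edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]) and
# reconnecting the tour is the same as reversing the segment i+1 .. j.
def two_opt_move(tour, i, j):
    return tour[: i + 1] + tour[i + 1 : j + 1][::-1] + tour[j + 1 :]

print(two_opt_move(["a", "b", "c", "d"], 0, 2))
# ['a', 'c', 'b', 'd']: edges ab and cd replaced by ac and bd, as in figure 5.1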
1. Initialize
Find pheromone trails τ0 using a nearest neighbor heuristic
for every edge(x, y)
Set τ(x, y) = τ0
end for
2. Calculate the maximum entropy
globalBestTour ← ϕ
globalBestTourLength ← ∞
determine candidate_list strategy
3. Place the n ants randomly on their starting cities
for k = 1 to n ants do
Add the starting city of the kth ant to its tabu list
end for
localBestTour ← ϕ
localBestTourLength ← ∞
Define the value of exploitation parameter q0
4. for k = 1 to n do /*n is the number of ants */
q ← random
repeat
Find the next unvisited node j for kth ant from kth ant’s current node i by using state
transition rule with CL
Append j into the kth ant’s tabu list
Perform local pheromone update
until tour has been completed by ant k
Apply local search (2-Opt or 2.5 Opt) to improve tour
Compute tourLength of kth ant
if tourLength < localBestTourLength then
localBestTourLength ← tourLength
localBestTour ← best tour found
end if
end for
if localBestTourLength < globalBestTourLength then
globalBestTourLength ← localBestTourLength
globalBestTour ← localBestTour
end if
5. for each edge (r, s) belonging to the global best tour
Perform global pheromone update
end for
6. for every pheromone τ(r, s)
Compute value of entropy for current pheromone trails
5-11
Modern Optimization Methods for Science, Engineering and Technology
end for
Update the heuristic parameter
7. // Check end condition
if (end_condition = true or iteration = MaxIteration)
print global best tour
exit
else
go to 3.
end if
Figure 5.2. Flowchart of the proposed algorithm: after parameter initialization, the construction and update steps repeat until the maximum iteration is reached.
Figure 5.3. IACO for a simple TSP: panels (a)–(h) show the stepwise construction of tours over the cities P, Q, R, S, T, U, V and W.
The tour length generated by an ant is calculated by adding the lengths of the edges between each two successive cities of the tour. This process is carried out by each ant, so at the end of the iteration there are five tours generated by the five ants. The local search improvement is then applied to improve the tours constructed by the ants.
The shortest of these tours is selected as the best tour and the edges that form this tour are updated using the global update formula in equation (5.5). Then we calculate the current entropy by analyzing the pheromone information in order to update the heuristic parameter β. The ants are then placed randomly again for a second iteration and the exploitation parameter q0 is redefined. The algorithm continues until the maximum number of iterations is reached, by which point it has found the global best tour.
begin
InitializeData
Calculate MaxEntropy
while (not termination) do
ConstructSolutions
ApplyLocalSearch
UpdateStatistics
UpdatePheromoneTrails
ComputeCurrentEntropy
UpdateHeuristicParameter
end-while
end
In data initialization: (i) read the TSP instance; (ii) calculate the distance matrix of the read TSP instance; (iii) define and compute the CLs for all cities; (iv) place the ants randomly on their starting cities; (v) initialize the algorithm's parameters; and (vi) include some variables that keep track of statistical information, such as the number of iterations, the best solution found (best tour length) and the best tour.
Procedure: InitializeData
begin
ReadTSPInstance
ComputeDistances
ComputeCandidateLists
InitializeAnts
InitializeParameters
InitializeStatistics
end
The following two construction steps are repeated until the tour is finished by all the ants. When exploiting, the procedure ConstructExploitDecisionRule is used; otherwise the procedure ConstructExploreDecisionRule is used. In ConstructExploitDecisionRule, one major change is that while choosing the next city we look for an unvisited city within the candidate list of the current city. Another change is the necessity of dealing with situations where all cities in the CL have already been visited by ant k. In that case the variable node keeps its initial value of −1 and a city beyond the CL is chosen. We choose the city with the maximum product of the pheromone value and heuristic information, [τij]^α[ηij]^β. ConstructExploreDecisionRule contains the same two changes as ConstructExploitDecisionRule; the exploring procedure chooses the next unvisited city according to the action choice rule in equation (5.2).
Procedure: ConstructSolutions
begin
q0 ← IterCounter/MaxIterations
for k = 1 to m ants do
curNode ← ant k's starting node
repeat
q ← random number
if (q < q0) then
newNode ← ConstructExploitDecisionRule(k, curNode)
else newNode ← ConstructExploreDecisionRule(k, curNode)
end if
Add newNode to ant k's tour
LocalUpdatingRule(curNode, newNode)
curNode ← newNode
until ant k completes tour
end for
end
Procedure: ConstructExploitDecisionRule
begin
node ← −1 // CandidateListConstructionRule
max_product ← 0.0
for j = 1 to DCL do
if city j in the candidate list is not visited by the kth ant then
product ← [τij]α[ηij]β
if product > max_product then
max_product ← product
node ← j /* city with maximal ταηβ */
end if
end if
end for
if (node == −1) then // all candidate-list cities visited: choose outside CL
for j = 1 to n do
if city j outside the candidate list is not visited by the kth ant then
product ← [τij]α[ηij]β
if product > max_product then
max_product ← product
node ← j /* city with maximal ταηβ */
end if
end if
end for
end if
return node
end
Procedure: ConstructExploreDecisionRule
begin
sum_probability ← 0.0 // CandidateListConstructionRule
node ← −1
for j = 1 to DCL do
if city j in the candidate list is not visited by the kth ant then
partial_product[j] ← [τij]α[ηij]β /* pow(pheromone, α) * pow(1/distance, β) */
sum_probability ← sum_probability + partial_product[j]
end if
end for
rno ← random number
cumulative ← 0.0
for j = 1 to DCL do
if city j in the candidate list is not visited by the kth ant then
cumulative ← cumulative + partial_product[j]/sum_probability
if cumulative >= rno then
node ← j
break
end if
end if
end for
if node = −1 then // all candidate-list cities visited: choose outside CL
node ← the unvisited city j (j = 1 to n) with maximal [τij]α[ηij]β
end if
return node
end
It is evident that the use of a CL strongly reduces the computation time the ants need to construct solutions, since considerably fewer cities have to be scanned when the ants choose the next city.
The next step is local pheromone updating, which is triggered after an ant has moved to the next city.
begin
value ← (1 − ρ)*pheromone[curNode][newNode] + ρ*τ0
pheromone[curNode][newNode] ← value
pheromone[newNode][curNode] ← value
end
Once the solutions are constructed, the generated tours may be improved by a local search procedure (for example 2-Opt or 2.5-Opt). The next step in an iteration of the algorithm is the pheromone update (global pheromone updating). This is implemented by the procedure UpdatePheromoneTrails, which comprises two pheromone update phases: pheromone evaporation and pheromone deposit. Pheromone evaporation decreases the value of the pheromone trails on the best path by a constant pheromone decay factor ρ. The pheromone deposit adds pheromone, proportional to the inverse of the best path length, to the edges belonging to the best tour constructed by the ants. ComputeCurrentEntropy computes the entropy of the current pheromone matrix to be used in the next step. UpdateHeuristicParameter
dynamically updates the heuristic parameter based on the entropy value of current
pheromone information.
Procedure: UpdatePheromoneTrails
begin
for i = 1 to n − 1 do
tau ← 1/ bestLength
evaporation ← (1 − ρ) * pheromone[bestPath[i]][bestPath[i + 1]]
deposition ← ρ * tau;
pheromone[bestPath[i]][bestPath[i + 1]] ← evaporation + deposition
pheromone[bestPath[i + 1]][bestPath[i]] ← pheromone[bestPath[i]][bestPath[i + 1]]
end for
end
Procedure: ComputeCurrentEntropy
begin
sum ← 0.0
current ← 0.0
max_entropy ← Math.log(noOfNodes*(noOfNodes-1)/2)
for i = 2 to n do
for j = 1 to i − 1 do
sum ← sum + pheromone[i][j]
end for
end for
for i = 2 to n do
for j = 1 to i − 1 do
current ← current + (−(pheromone[i][j]/sum) * Math.log(pheromone[i][j]/sum))
end for
end for
current ← 1 − ((max_entropy − current)/max_entropy)
end
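A compact Python version of this entropy computation might look as follows; it assumes a symmetric pheromone matrix and, as in the pseudocode, counts only the n(n − 1)/2 undirected edges of the lower triangle.

    import math

    def entropy_ratio(pheromone, n):
        """Normalized entropy of the pheromone matrix: 1 for a uniform
        distribution, approaching 0 as the trails converge on few edges."""
        edges = [pheromone[i][j] for i in range(1, n) for j in range(i)]
        total = sum(edges)
        current = -sum((t / total) * math.log(t / total) for t in edges if t > 0)
        max_entropy = math.log(n * (n - 1) / 2)
        return current / max_entropy  # equals 1 - (max_entropy - current)/max_entropy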
The TSP is a classical optimization problem which has been used for the
evaluation of the performance of ant algorithms. This chapter presents experiments
on TSPs comparing the performance of ant colony optimization and the modified
ant colony optimization approach with CL strategy based on entropy. The
implementations use the ACS version of the ant algorithm. The proposed method
includes a candidate set applied to the construction phase and an entropy-based
extension to the update phase for the algorithm's heuristic parameter.
where \bar{X} denotes the mean tour length and n is the number of runs. The sample
variance σ² is the second sample central moment and is defined by

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2,   (5.13)

where σ is the standard deviation. An optimization algorithm is more robust and
more stable if the value of the standard deviation of a performance criterion over a
number of simulation runs is small. The related quantity σ², known as the variance,
serves the same role in the evaluation of a performance criterion: the smaller the
value of σ², the narrower the range of performance values and the more stable the
swarm behaves.
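For example, the standard deviation and variance of the tour lengths over a set of runs can be computed directly; the run results below are hypothetical values used only to show the calculation.

    import statistics

    tour_lengths = [426, 426, 427, 428, 426, 427]  # hypothetical run results
    mean = statistics.fmean(tour_lengths)
    variance = statistics.pvariance(tour_lengths)  # population variance, as in (5.13)
    sigma = variance ** 0.5
    print(f"mean = {mean:.2f}, variance = {variance:.3f}, sigma = {sigma:.3f}")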
The benchmark instances are given with varying complexity and difficulty. Only
strongly connected instances of the TSP were chosen from the TSPLIB, and only
symmetric Euclidean TSPs were analyzed, referenced as EUC_2D, meaning that
distances (or weights) between nodes are expressed in a Euclidean 2D coordinate
system. The coordinates are decimal numbers (doubles).
Table 5.1. Parameter settings.

Parameter            Value
α                    1
β                    2–5
ρ                    0.1
No. of ants          10
No. of iterations    20
Figure 5.4. Results for the eil51 instance using local search (2-Opt): tour length versus number of runs (TourLength, Optimal and Average).
Figure 5.5. The best-so-far solution of the tour length of the eil51 instance.
Figure 5.6. The tour best result of each iteration for the eil51 instance.
0.14% from the optimal distance. The proposed algorithm's results vary from 426
(0% deviation) to 428 (0.47% deviation); about half of the runs converged to the
optimal, and the average tour length of the runs is very close to the optimal.
Figures 5.5–5.7 show the analysis results for the best-so-far tour, the iteration-best
tour and the standard deviation of the tour length, respectively.
Figure 5.7. Standard deviation of tour length for the eil51 instance.
Figure 5.8. Results on the eil51 instance without using local search: tour length versus number of runs (TourLength, Optimal and Average).
In figure 5.8, when the proposed algorithm does not use local search optimization
to improve the tour, the results change dramatically. In this test case the number of
iterations was changed to 100 and the remaining parameter values were kept as in
table 5.1. The best solution found over 20 trials is 429, which deviates by 0.7% from
the optimal solution of 426. The results vary between 429 and 453, with an average
tour length of 440.7 (0.7%, 6.34% and 3.45% deviation, respectively).
Figure 5.9. Results on the eil51 instance with local search (2-Opt): tour length versus number of runs (TourLength, Optimal and Average).
Table 5.2. Comparison of the final best solution and convergence number between IACO and DSMACS.
The analysis results of the ACS are also shown in figure 5.9. The ACS
algorithm runs the eil51 test case with the parameter value β = 5 and the other
parameter values as in table 5.1. ACS obtains an average tour length of 428.25 when
using local search optimization, which is 2.25 higher than the optimal of 426 (a
deviation, or degree of approximation, of 0.53%). The ACS algorithm's results vary
from 427 (0.24% deviation) to 430 (0.94% deviation) and more than half of the runs
are below the average length of the runs.
Figure 5.10. Convergence speed of tour length (best-so-far tour length versus number of iterations, Improved ACO) for (a) eil51, (b) berlin52 and (c) st70 TSPs.
average distance of the trials. The third column shows the relative error, also called
the degree of approximation ((best solution − optimum)/optimum), which measures
how much the best solution deviates from the optimal distance, and the fourth
column is the standard deviation.
In all cases, the proposed algorithm shows better performance than the ACS
algorithm. The experiment shows that the improved ant colony algorithm proposed
in this approach achieved better results for TSPs, and the efficiency of its solutions is
better than that of the ant colony algorithm.
Looking at the above tables, the deviation (relative error) from the optimal
solutions can be seen, and the differences in the deviations between the IACO
algorithm and the ACS are compared. Moreover, the differences in the average distance between the IACO
and ACS are even more obvious. Figures 5.11–5.20 present the comparison of
convergence results of the tour length for oliver30, eil51, eil76, eil101, berlin52,
kroA100, lin105, pr144, rat99 and st70, respectively. The tested TSP datasets were
executed for 20 trials, with the same number of iterations for each dataset. The best
values obtained from a trial are given for each dataset as in the figures. These figures
show the best solution found since the start of the algorithm run and the iteration at
which it was found; the distance illustrated in each graph is the best distance found
so far over the iterations.
In figure 5.21 the performance relative to the optimal solution, indexed at 100
(also called the degree of accuracy), can be seen for IACO and ACS. The IACO
algorithm obtains better results than the ACS and all of its results are strikingly
close to the optimal solution.
The analysis results for the larger TSP instances can be seen in table 5.13. These
problems were tested with 20 iterations for 30 trials. The number of ants used for all
these test cases is 10. The parameters are as follows: α = 1, β is a dynamic value of the
algorithm and the pheromone decay parameter ρ is 0.1. However, β = 5 is used for
the ACS, with the other parameters the same as for the proposed algorithm. Both of these
Figures 5.11–5.20 (selection). Convergence of the best-so-far tour length versus number of iterations, Improved ACO versus ACS.
Figure 5.21. The accuracy (%) of IACO and ACS performance compared to the optimal solution across the TSP instances.
algorithms tested all the problems with local search optimization (2-Opt). The
results are shown as the best length, average tour length and relative errors of the
algorithms. Most results of the IACO algorithm achieved the optimal solutions.
Therefore, the results of the IACO algorithm are more satisfactory than those of the
ACS algorithm. Furthermore, the results of both algorithms are compared to the
optimal solution via the degree of accuracy. In figure 5.22 it can be seen that most of
the proposed algorithm's results achieve the optimal solution and the remainder
deviate only slightly from the optimum. The log graphs are also presented.
5.13.1 Analysis of ants starting at different cities versus the same city
The algorithm is very effective at finding a good solution with a good tour length
when the ants start from different cities. Generally, cities are similar or identical in
implementations of the TSP, and before the algorithm matures, at the beginning of
the tour, ants can freely select any appropriate city for the next movement. Ants may
thus use the cities at short distances first, and when they reach the end of the tour
almost all cities will have been covered. Once the cities with small edges (short
distances) are covered, the ants are also required to reach the next cities over
somewhat longer edges. The method of selecting a starting city is random in this
work, and the search can therefore proceed in multiple directions of the space. The
possibility of obtaining better results, while handling both short and long edges, is
greater.
Table 5.13. Tour length results and relative errors (deviation) on several TSP instances.

TSP       Optimum   IACO best    IACO average   IACO relative error   ACS best     ACS average   ACS relative error
problem   (1)       length (2)   tour length    ((2)−(1))/(1)         length (3)   tour length   ((3)−(1))/(1)
kroD100   21 294    21 294       21 379.6       0%                    21 309       21 584.7      0.07%
kroE100   22 068    22 068       22 135.77      0%                    22 116       22 284.67     0.22%
kroA150   26 524    26 524       26 806.5       0%                    26 820       27 189.47     1.08%
pr76      108 159   108 159      108 303.5      0%                    108 304      108 723.4     0.13%
pr124     59 030    59 030       59 110.67      0%                    59 076       59 228.23     0.08%
pr152     73 682    73 682       73 772.73      0%                    73 818       74 243.67     0.18%
pr226     80 369    80 377       80 628.16      0.01%                 80 524       80 959.67     0.19%
rat195    2323      2339         2356.17        0.69%                 2352         2379.27       1.25%
Figure 5.22. Accuracy (%) of IACO and ACS performance compared to the optimal solution across the TSP instances.
Table 5.14. The effect of the same starting city and a random starting city.

TSP        Optimal   Best length   Average   Relative error    Best length   Average   Relative error
problem    (1)       (2)           length    ((2)−(1))/(1)     (3)           length    ((3)−(1))/(1)
The effect of all ants starting from the same randomly chosen city can be seen in
table 5.14. The eil51 (51-city) problem was tested and the best possible path was
sought using 100 iterations in 20 trials. For this data, the best result was obtained
when the ants started from the same randomly chosen city. In the case of the best
solution, the distance is greater than the optimal distance by 2.0. The results with
distances greater than 4.0 above the optimum are also highlighted.
Figure 5.23. Tour length result for the eil51 instance with 10 ants and 100 iterations (TourLength and Average versus number of runs).
Figure 5.24. Tour length distance achieved for eil51 using 10 ants and 100 iterations (best-so-far tour length versus number of iterations, Improved ACO).
The algorithm was run for 100 iterations, where each iteration used 10 ants, and
the best solution was found after 60 iterations. The best value was quite close to the
optimum value (figure 5.24).
Summarized results for the eil51 TSP instance:
Figure 5.25. Tour length result on the eil51 instance with 100 ants and 10 iterations (TourLength and Average versus number of runs).
Figure 5.26. Tour length distance achieved for eil51 using 100 ants and 10 iterations (best-so-far tour length versus number of iterations, Improved ACO).
The algorithm was run for 10 iterations, where each iteration used 100 ants, and
the best solution was found at iteration 10. The best value was not far from the
optimum value.
Summarized results for the eil51 TSP instance:
It can also be seen that the solution result is better when increasing the number of
iterations rather than the number of ants.
An alternative way to analyze varying numbers of ants and iterations is also
presented in the following figures, which suggest how the chance of obtaining a
good result can be increased. Figure 5.27 shows the analysis result for a number of
ants that increases throughout the iterations (for example, the first iteration uses 1
ant, the second 21 ants, then 41, 61 and so on, with the last iteration using 181 ants);
this test iterates 10 times. Figure 5.28 shows the result of increasing the number of
iterations to 200, with each iteration using 10 ants. The average length with the
increasing number of ants is 545.96 and that with the increasing number of
iterations is 457.89. It can be seen that increasing the number of iterations obtains
better quality results than increasing the number of ants.
Figure 5.29. Comparison of the tour length result of IACO with and without the candidate list for oliver30 (best-so-far tour length versus number of iterations).
Figure 5.30. Comparison of the tour length result of IACO with and without the candidate list for eil51 (best-so-far tour length versus number of iterations).
Figures 5.29–5.31 show the results with and without the CL for oliver30, eil51
and berlin52, respectively. With the candidate list we achieved considerable
improvements in searching ability, with the number of iterations reduced and less
time taken.
5.15 Conclusions
Research studies suggest that ACO is a modern and emerging research field in the
area of optimization and is used for a number of engineering problems, in particular
in the age of artificial life and operations research. One of the most important
applications is swarm intelligence, used in applications related to AI. Simulations of
ant colonies and the exploration of their foraging behavior resulted in the birth of
ACO, whose principal element is pheromone information. This chapter has
discussed the basics of ACO, establishing a metaheuristic structure which provides
a number of options for implementation in the design of algorithms. Various
alternative approaches were explained briefly, highlighting successful algorithms
such as MMAS and ACS. ACO algorithms are an emerging field and are considered
state-of-the-art algorithms for solving combinatorial optimization problems.
One major contribution of the chapter is suggesting IACO, an improved ant
colony optimization algorithm designed for addressing combinatorial optimization
problems such as TSPs. The search space, dynamic updates, computation time and
best possible tours have been addressed with a substantial amount of results and
discussion. The heuristic parameter and the selection of its values have been
discussed in terms of how the different values affect the
Figure 5.31. Comparison of the tour length result of IACO with and without the candidate list for berlin52 (best-so-far tour length versus number of iterations).
performance of the system. The suggested approach proved to produce much better
results compared to a single approach used to solve TSPs. Optimal results are
obtained, and some solutions to the problems were found to be optimal. An
interesting finding regarding the time taken suggests that the time required to find
the solution can be reduced if flexible and dynamic updates are properly applied.
References
[1] Colorni A, Dorigo M and Maniezzo V 1991 Distributed optimization by ant colonies Proc.
of ECAL91—European Conf. on Artificial Life (Paris, Amsterdam: Elsevier) pp 134–42
[2] Colorni A, Dorigo M and Maniezzo V 1992 An investigation of some properties of an ant
algorithm Proc. of the Parallel Problem Solving from Nature Conf. (PPSN 92) (Brussels,
Amsterdam: Elsevier) pp 509–20
[3] Shannon C E 1948 A mathematical theory of communication Bell Syst. Tech. J. 27 379–423
[4] Pintea C-M and Dumitrescu D 2005 Improving ant system using a local updating rule Proc.
of the Seventh Int. Symp. and Numeric Algorithms for Scientific Computing (SYNASC’05)
(Piscataway, NJ: IEEE)
[5] Wang C-X, Cui D-W, Zhang Y-K and Wang Z-R 2006 A novel ant colony system based on
Delaunay triangulation and self-adaptive mutation for TSP Int. J. Inform. Technol. 12 89–99
[6] Hung K S, Su S F and Lee S J 2007 Improving ant colony optimization for solving traveling
salesman problem J. Adv. Comput. Intell. Intell. Inform. 11 433–42
[7] Gambardella L M and Dorigo M 1995 Ant-Q: a reinforcement learning approach to the
traveling salesman problem Proc. of the Twelfth Int. Conf. on Machine Learning (San
Francisco, CA: Morgan Kaufmann) pp 252–60
[8] Dorigo M, Maniezzo V and Colorni A 1996 The ant system: optimization by a colony of
cooperating agents IEEE Trans. Syst. Man Cyber. B 26 29–41
[9] Dorigo M and Gambardella L M 1997 Ant colony system: a cooperative learning approach
to the traveling salesman problem IEEE Trans. Evolut. Comput. 1 1–24
[10] Dorigo M and Gambardella L M 1997 Ant colonies for the traveling salesman problem
BioSystems 43 73–81
[11] Dorigo M, Birattari M and Stützle T 2006 Ant colony optimization—artificial ants as a
computational intelligence technique IEEE Comput. Intell. Mag. 1 28–39
[12] Dorigo M and Stützle T 2004 Ant colony optimization (Cambridge, MA: MIT Press)
[13] Dorigo M and Stützle T 2002 The ant colony optimization metaheuristic: algorithms,
applications, and advances Handbook of Metaheuristics ed F Glover and G Kochenberger
(Amsterdam: Kluwer)
[14] Randall M and Montgomery J 2002 Candidate set strategies for ant colony optimisation Ant
Algorithms, 3rd International Workshop, ANTS 2002, Proceedings ed M Dorigo, G di Caro
and M Sampels (Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics) vol 2463) (London: Springer),
pp 243–49
[15] Stützle T and Hoos H H 1997 MAX–MIN ant system and local search for the traveling
salesman problem Proc. of the 1997 IEEE Int. Conf. on Evolutionary Computation
(ICEC’97) (Piscataway, NJ: IEEE), pp 309–14
[16] TSPLIB 2005 TSPLIB: Library of Sample Instances for the TSP University of Heidelberg,
Department of Computer Science http://iwr.uniheidelberg.de/groups/comopt/software/TSPLIB95/tsp/
[17] Hlaing Z C S S and Khine M A 2011 An ant colony optimization for solving traveling
salesman problem Int. Conf. on Information Communication and Management (ICICM),
IPCSIT vol 16 (Singapore: IACSIT)
[18] Hlaing Z C S S and Khine M A 2011 Solving traveling salesman problem by using improved
ant colony optimization algorithm Int. J. Inform. Educ. Technol. 1 404–49
Chapter 6
Application of a particle swarm
optimization technique in a motor imagery
classification problem
Rahul Kumar and Mridu Sahu
In many real-world problems one needs to find the best solution from all feasible
solutions to solve any particular problem. Finding the best among all solutions is a
basic theme of optimization. Minimization of the time and space to solve real
problems is still a challenging task in many areas such as the biomedical, behavior
and prediction sciences, etc. Optimization establishes relationships among problem
objectives, predefined constraints on these and the targeted variables. Optimization
techniques have been widely employed by researchers to improve the performance of
computers in many cognitive detection tasks. This chapter presents the application of
particle swarm optimization (PSO) in motor imagery (MI). In MI, the dynamic state
of the brain changes with the imagined action of different body parts. The brain–computer
interface (BCI) provides a conduit between the brain and computer, and it performs
classification of MI action, which is very helpful for those people who are paralyzed
due to high-level spinal cord injury and are not able to perform any muscular
activities. The quality of signals and their performance on classifiers are still crucial
challenges in MI classification. For classification various steps are required, such as
preprocessing, feature extraction, selection, etc. The presented chapter includes
various feature extraction techniques and selections for the classification of MI. For
the recording of brain activity electroencephalography (EEG) is normally used. This
is the non-invasive mode of recording brain signals. The recorded brain signal is
noisy, complex data, and several preprocessing steps are involved to improve the
quality of the signals. The transformation of the signal from one domain to
another is called signal transformation, which has been done in the current chapter
through wavelet transformation; this also helps in extracting features from these
signals. EEG signal nonlinearity is a serious problem in finding a solution for the
detection of many diseases and disorders. Fuzzy logic is a powerful tool for handling
nonlinearity. This chapter focuses on the classification of EEG signal using an
adaptive neuro-fuzzy inference system (ANFIS). The discrete wavelet transform
(DWT) has been used here for useful feature extraction.
Also, in this chapter, PSO has been applied for optimizing the network parameters
of ANFIS. In standard ANFIS all parameters are updated using a gradient descent
algorithm. The problem present in gradient descent is that when the search space is
large, the complexity of gradient computations is high. The PSO method is inspired by
the biological nature of bird flocking and fish schooling, which is applied here to tune
the parameters of the network. Normally PSO iteratively refines candidate solutions
for the specified problem. It is a metaheuristic, i.e. a high-level set of rules
independent of the problem. PSO can optimize the functional module, tune the
neuro and fuzzy systems and modify the rule set generated from various systems. In
the presented study, PSO minimizes the mean square error (MSE) by
tuning the parameters of membership functions of the fuzzy inference system (FIS)
and increases the performance for classification of right hand and foot movement
detections. PSO possesses a central advantage on the implementation side, as it is easy
to implement and requires very few parameters. Two models have been proposed: one is based on
PSO and the other is gradient descent for comparative analysis. The results confirm
that the PSO based model gives better accuracy than the others.
6.1 Introduction
Optimization plays an important role in our day-to-day lives. In many disciplines,
such as the scientific, social, economics and engineering fields, optimization is used
to find a desirable solution to problems by adjusting their parameters. The main aim
of optimization is finding the best solution among all feasible solutions that are
available to us. A feasible solution satisfies all constraints in optimization problems.
Currently the problems in optimization are multi-objective and multidisciplinary.
To solve such complex problems, not only gradient descent based optimization is
used; evolutionary algorithms such as genetic algorithms and particle swarm
optimization (PSO) are also employed. Over the years, many
methods have been developed for optimization. The PSO method is a popular
method of optimization which is inspired by the social behavior of birds and fish.
PSO is used in many fields, such as the medical field, engineering, social economics,
etc. In medicine, biomedical engineering is one of the important fields which focuses
on the study of the brain and its behavior (how the brain works in different
environments). In this field, the design of a stable and reliable brain–computer
interface (BCI), which provides the basic connection between a brain and computer,
is a major challenge. To design a BCI, various problems need to be overcome, and
PSO can be used to solve them. Channel selection is an important step, and a major
challenge, in recording brain signals; PSO is widely used for selecting the minimum
number of channels. During the recording of signals, a lot of noise and artifacts are
generated that affect the performance of the model. To
remove those artifacts and noise, PSO based filters are used. Feature extraction
methods are used to obtain the best possible characteristics from the brain signal for
dimensionality reduction. Feature selection is a method which is applied after
extraction to filter out irrelevant and redundant features. PSO based feature
selection is a widely used technique to reduce the dimensionality.
Human motor activity is characterized by notable adaptability; a human can perform
a number of undertakings, for example walking forward and backward, running,
dancing and shuffling, producing a wide range of activities. We appear to be able to
produce a practically limitless stream of movements to achieve objectives in the
surrounding environment [1]. The motor system is based on brain activity; however,
due to spinal cord injury and other neurological diseases, motor ability can be
damaged [2]. People may be paralyzed and unable to perform any muscular activity.
To restore the damaged motor function, a concept called MI is used, which is
fundamentally a mental task in which patients perform the motor task in their mind
without performing any physical task. This concept is used with a brain–computer
interface (BCI), which provides a connection between the brain and the external
environment (figure 6.1) [3]. An MI based BCI performs the classification of
MI action such as movement of the hands and feet and other motor related action. It
provides one-way communication for a patient who is suffering from a motor
disability. BCI is categorized into two types: invasive and non-invasive. In invasive
BCI, electrodes are implanted in the head by cranial surgery in order to measure
brain activity. This method provides a higher signal-to-noise ratio (SNR), but its
downsides are a higher infection rate and other side effects. To counter the negative
effects of the above method, non-invasive BCI was introduced, but it has a lower
SNR. This calls for the development of more precise, non-invasive BCI in order to
accurately measure brain activity.
In the BCI, people produce a different brain activity pattern by performing the
MI action and the pattern is identified and converted into a command and control
select the dominating features. The fourth step used k-fold cross-validation, in which
the dataset is divided into two parts, one for training and the other for testing, before
the classification algorithm is performed. The results demonstrate that the proposed
technique improves precision more than other comparable classification algorithms.
Using this method the authors achieved significant improvements in terms of
sensitivity and positive predictive value, and the MI task classification accuracy
reached up to 80.9848%.
Hsu [13] proposed an automatic artifact elimination method where feature
selection is performed using a quantum-behaved PSO algorithm from a set of
extracted features. Optimal features are selected through QPSO and these selected
features are used as input into the SVM. The author proposed automatic artifact
elimination to remove the EOG artifacts and improve the classification accuracy. At
the same time a number of features such as spectral power, asymmetry ratio, MFFV
and other features are calculated and then combined together. After that QPSO is
used to select the features which enhance the classification accuracy.
Xu et al [14] used a PSO based CSP method for feature extraction. Two parameters,
the frequency band and the time interval, affect the performance of CSP, so the
authors used PSO to find the optimal values of these parameters, which improve the
discriminative ability of CSP.
Filters are used to improve the quality of signals by removing noise and
irrelevant data. The performance of a filter depends on parameters such as the
frequency band and time interval, among others. To improve the performance, such
parameters need to be optimized, and several researchers have used PSO to optimize
them.
Ahirwal et al [15] studied an adaptive noise canceler for the improvement of EEG
signal filtering. Different versions of PSO were used to design the adaptive noise
canceler, and a comparative analysis was performed on different parameters for a
varied range of particles and inertia weights.
The remaining sections of this chapter are as follows: in section 6.2 the
fundamentals of PSO are briefly explained and in section 6.3 the proposed model
is described. Some results of experiments are presented in section 6.4 and we provide
some remarks and conclusions in the last section.
particle driven equation [24, 25], to keep the particle velocity, or accelerate the
particles using an adaptation technique or randomized method. These proposed
modifications work well and have the ability to avoid falling into local minima.
However, PSO still does not guarantee finding a global solution in high-dimensional
search spaces.
The position vector represents the location of a particle in the search space and
the velocity shows in which direction and at what speed (inertia of movement) the
particle moves. In each iteration, these two vectors are updated with the two update
equations. The new velocity is calculated from the current velocity multiplied by a
variable called the inertia w, plus two further components weighted by the positive
coefficients c1 and c2. The first term is called the inertia term, as it maintains the
current velocity and direction of movement. The second component is the cognitive
component, also known as the individual component, because it considers the
particle's best position and its current position. The third component is the
social component; here the particle calculates the distance between its current
position and the best position for the whole swarm.
The cognitive and social components have a great impact on the movement of a
particle, and this impact can be changed by tuning the coefficients. The inertia weight
tunes the exploration and exploitation and is normally decreased from 0.9 to 0.4 or
0.2. In real-world problems, when searching for global optima, variation in the inertia
parameter balances exploration and exploitation. To demonstrate the impact
of this parameter in PSO, consider a simple objective function called the sphere
function (equation (6.3)) and calculate the average objective value for different
ranges of inertia weight ([0.9 0.2], [0.9 0.3], [0.9 0.5], [0.5 0.3]) with c1 = c2 = 2. It
was observed that the function converges most quickly for an adaptive inertia
weight in the range [0.9 0.2]. Figure 6.2 shows the convergence of the function for
the various ranges of inertia weight,

f(x) = \sum_i x_i^2.   (6.3)
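The sketch below, written under the assumption of the standard PSO velocity and position updates, minimizes the sphere function with c1 = c2 = 2 and the inertia weight decreased linearly from 0.9 to 0.2, mirroring the setting described above; it is an illustration rather than the chapter's implementation.

    import random

    def pso_sphere(dim=10, n_particles=30, iters=200, c1=2.0, c2=2.0,
                   w_max=0.9, w_min=0.2):
        """Minimal PSO minimizing f(x) = sum(x_i^2) with adaptive inertia."""
        f = lambda x: sum(v * v for v in x)
        pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
        vel = [[0.0] * dim for _ in range(n_particles)]
        pbest = [p[:] for p in pos]
        gbest = min(pbest, key=f)[:]
        for t in range(iters):
            w = w_max - (w_max - w_min) * t / (iters - 1)  # linearly decreasing inertia
            for i in range(n_particles):
                for d in range(dim):
                    r1, r2 = random.random(), random.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive term
                                 + c2 * r2 * (gbest[d] - pos[i][d]))     # social term
                    pos[i][d] += vel[i][d]
                if f(pos[i]) < f(pbest[i]):
                    pbest[i] = pos[i][:]
                    if f(pbest[i]) < f(gbest):
                        gbest = pbest[i][:]
        return gbest, f(gbest)

    best, value = pso_sphere()
    print(f"best objective value: {value:.6f}")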
subject to

g_i(x_1, x_2, x_3, x_4, \ldots, x_{n-1}, x_n) \geqslant 0, \quad i = 1, 2, 3, \ldots, m   (6.5)

p(x) = \begin{cases} 0 & \text{if } x \in S \\ +\infty & \text{otherwise.} \end{cases}   (6.9)
The penalty function returns 0 if the solution x is feasible, in which case there is no
penalty. If x is an infeasible solution, the penalty function penalizes it by returning
an objective value greater than the actual one.
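In code this is commonly realized by wrapping the objective, as in the hedged sketch below, where a large finite constant stands in for +∞ and the constraint functions g_i(x) ⩾ 0 follow equation (6.5); the names are illustrative.

    def penalized(f, constraints, big=1e12):
        """Return a penalized objective: feasible points keep their true value,
        infeasible points receive a huge value standing in for +infinity."""
        def wrapped(x):
            if all(g(x) >= 0 for g in constraints):
                return f(x)          # x in S: no penalty
            return big               # x outside S: penalized
        return wrapped

    # Usage with the PSO sketch above: minimize the sphere subject to x_0 >= 1.
    constraints = [lambda x: x[0] - 1.0]
    objective = penalized(lambda x: sum(v * v for v in x), constraints)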
6.3.1.1 Dataset
The dataset was acquired from BNCI Horizon-2020 [28]. The dataset relates to two-
class MI, with signals recorded for right hand and foot movement. As per the
Graz-BCI training paradigm, a single session is carried out for recording,
training and feedback. A session has a total of eight runs; of these, five are used for
training and three for testing. With 20 trials per set, there are, per class, 50 trials for
training and 30 for testing. In this recording process, participants had the task of
performing sustained (5 s) kinesthetic MI as instructed by a cue. The EEG signal is
recorded using Ag/AgCl electrodes at a 512 Hz sampling frequency using a
bio-signal amplifier. Electrodes are positioned as per the 10–20 standard. Dataset
processing is shown in figure 6.6.
\phi_{j,k}(n) = 2^{j/2} h(2^j n - k)   (6.10)

where n = 0, 1, 2, 3, …, N − 1, j = 0, 1, 2, 3, …, J − 1, k = 0, 1, 2, 3, …, 2^j − 1 and
J = \log_2 M. M represents the length of the EEG signal. The approximate coefficients A_i
and detailed coefficients D_i are calculated using the following equations:

A_i = \frac{1}{M} \sum_n x(n) \phi_{j,k}(n)   (6.12)

D_i = \frac{1}{M} \sum_n x(n) \psi_{j,k}(n).   (6.13)
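As a sketch of this decomposition, the PyWavelets package computes the five-level coefficients directly; the chapter does not name the mother wavelet, so db4 below is purely an illustrative assumption. Averaging each coefficient band yields the six features (A5, D5, …, D1) per channel reported in table 6.1.

    import numpy as np
    import pywt  # PyWavelets

    def dwt_features(signal, wavelet="db4", level=5):
        """Five-level DWT of one EEG channel; returns the mean of the
        approximation (A5) and detail (D5..D1) coefficient bands."""
        coeffs = pywt.wavedec(np.asarray(signal, dtype=float), wavelet, level=level)
        # wavedec returns [A5, D5, D4, D3, D2, D1]
        return [float(np.mean(c)) for c in coeffs]

    features = dwt_features(np.random.randn(512 * 5))  # hypothetical 5 s trial at 512 Hz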
6.3.2 Classification
ANFIS is one of the neuro-fuzzy models. NNs and fuzzy logic are individually
useful systems, but when the complexity of the model increases it is difficult to
calculate the membership functions and fuzzy rules, which encouraged the development
of another approach that has the tendencies of both systems. This approach is called the
adaptive neuro-fuzzy system, which gives the advantages of both NN and fuzzy
logic. One advantage of fuzzy logic is that fuzzy rules describe real-world problems
well. The second is interpretability: it is easy to explain how every single output
value of the fuzzy system is generated. The problem with a fuzzy system is that it
needs expert knowledge to create the rules and it takes a long time
to tune the parameters (membership functions). Both problems arise simply because
a fuzzy system cannot tune itself. An NN, on the other hand, trains well, but it is
remarkably difficult to use prior knowledge about the considered system. To
overcome the disadvantages of these two systems, researchers combined the NN and
fuzzy logic systems; the hybrid system ANFIS was proposed by Jang [31].
where x, y and A_i, B_i represent the given inputs and antecedents, respectively. The f_i
comprise f_1 and f_2, expressed in terms of p_i, q_i and r_i, whose values are found through the training
process. The structure for two rules is shown in figure 6.9. Circles represent fixed
nodes while squares represent adaptive nodes. The five layers of ANFIS are described below.
The first layer receives the incoming crisp inputs and determines the rules for each
crisp input. All the nodes are adaptive nodes. This layer is also called the fuzzification
layer. The output of this layer is given as
O_i^1 = \mu_{A_i}(x), \quad i = 1, 2,   (6.14)

\mu_{A_i}(x) = \exp\left[ -\left( \frac{x - c_i}{2a_i} \right)^2 \right],   (6.15)
O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2.   (6.17)
The nodes in the fourth layer are adaptive and their output is the product of the
input from the third layer and a first-order polynomial:

O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i), \quad i = 1, 2.   (6.18)
The fifth layer consists of a single fixed node which aggregates all incoming signals.
Hence, the overall output is
O^5 = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{w_1 + w_2}.   (6.19)
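Equations (6.14)–(6.19) condense into a short forward pass. The sketch below implements a two-rule, two-input first-order Sugeno ANFIS with the Gaussian membership function of equation (6.15); the parameter names and values are placeholders.

    import numpy as np

    def gauss_mf(x, c, a):
        """Gaussian membership function of equation (6.15)."""
        return np.exp(-((x - c) / (2.0 * a)) ** 2)

    def anfis_forward(x, y, premise, consequent):
        """Layers 1-5 for two rules. premise: [(cA, aA, cB, aB)] per rule;
        consequent: [(p, q, r)] per rule."""
        w = np.array([gauss_mf(x, cA, aA) * gauss_mf(y, cB, aB)
                      for (cA, aA, cB, aB) in premise])      # layer 2: firing strengths
        w_bar = w / w.sum()                                  # layer 3: equation (6.17)
        f = np.array([p * x + q * y + r for (p, q, r) in consequent])
        return float(np.dot(w_bar, f))                       # layers 4-5: (6.18)-(6.19)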
The root mean square error used as the performance measure is

\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2 }.   (6.21)
6.3.2.3 ANFIS-PSO
Neuro-FIS is optimized by adapting the membership function parameters and the
consequent parameters, so that the objective function is minimized. The back-propagation
algorithm is a widely used adaptation method for FIS that recursively solves
the optimization problem. The difficulty with this algorithm is that it can become
trapped in local minima. To overcome this problem, evolutionary algorithms such as
the genetic algorithm and PSO have been used [32, 33]. In this proposed method we
perform two-class MI classification based on ANFIS-PSO. Here PSO is used to tune
the antecedent and consequent parameters of ANFIS and the performance accuracy
is examined. The total number of antecedent parameters is the sum of all the
parameters in each membership function. The antecedent parameters (a_i, c_i) are those
of the Gaussian membership function shown in equation (6.15). The consequent
parameters are the output parameters (p_i, q_i and r_i) shown in equation (6.18).
The mathematical model of ANFIS is quite similar to an NN in which the weights are
updated according to an error criterion. In ANFIS the antecedent parameters, which
are associated with the membership functions, are calculated. In the rules shown in
equation (6.1) the ‘IF’ part contains the membership function parameters and the
‘THEN’ part contains the linear output variables. These parameters are treated as
weights; the performance of ANFIS depends on the structural parameters and the
training-related parameters. In this chapter, the ANFIS system is trained by evolutionary
algorithms (EAs) such as PSO in order to minimize the classification error.
The whole process of this method is shown in figure 6.10. As is evident, the
optimization of ANFIS is based on four steps:
1. Initialize the ANFIS system parameters.
2. Evaluate the output of the initial ANFIS and calculate the MSE by
comparing the actual output value and target value.
3. If the output is not good enough, a learning process is performed in order to
minimize the MSE.
4. Using PSO, the membership functions are optimized in order to minimize the error.
1. Data loading: Load the EEG data acquired from BNCI Horizon-2020.
2. Generating basic FIS: Generate the initial fuzzy inference system.
3. Training ANFIS: Tune the parameters of the Gaussian membership function and optimize the
MSE value using PSO.
4. Classification: Apply ANFIS-PSO on the training and testing data.
5. Performance evaluation: Calculate the MSE value for classification on the training and
testing data.
END.
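A minimal sketch of the training step is given below: PSO searches over a flat vector holding the antecedent and consequent parameters, minimizing the MSE, with the table 6.2 settings. Here build_anfis is a hypothetical factory that unpacks the vector into a model such as the forward pass sketched earlier; everything is illustrative rather than the authors' code.

    import numpy as np

    def mse_of(params, data, targets, build_anfis):
        """MSE of the ANFIS whose parameters are packed into `params`."""
        model = build_anfis(params)
        preds = np.array([model(x, y) for (x, y) in data])
        return float(np.mean((preds - targets) ** 2))

    def pso_tune(n_params, objective, n_particles=25, iters=100, w=1.0, c1=1.0, c2=2.0):
        """PSO over the ANFIS parameter vector (settings follow table 6.2)."""
        pos = np.random.uniform(-1.0, 1.0, (n_particles, n_params))
        vel = np.zeros_like(pos)
        pbest = pos.copy()
        pbest_val = np.array([objective(p) for p in pos])
        gbest = pbest[pbest_val.argmin()].copy()
        for _ in range(iters):
            r1 = np.random.rand(n_particles, n_params)
            r2 = np.random.rand(n_particles, n_params)
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = pos + vel
            vals = np.array([objective(p) for p in pos])
            improved = vals < pbest_val
            pbest[improved] = pos[improved]
            pbest_val[improved] = vals[improved]
            gbest = pbest[pbest_val.argmin()].copy()
        return gbest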
Table 6.1. Average of the approximate and detailed coefficients for channel C3 of subject 1.

Dataset   D1   D2   D3   D4   D5   A5

Table 6.2. Parameters used during ANFIS-PSO training.

Parameter                        Value
Population size                  25
Inertia weight                   1
Personal learning coefficient    1
Global learning coefficient      2

Table 6.3. Performance of ANFIS (training and testing).
6.4 Results
The acquired EEG signals are segmented according to the given trials in the dataset.
We use the DWT to decompose the EEG into five-level wavelet coefficients. The
average approximate and detailed coefficients for channel C3 of subject 1 are
shown in table 6.1.
The extracted features are used as input for ANFIS and ANFIS-PSO. For
ANFIS-PSO different parameters are used during the training. The description of all
the parameters is provided in table 6.2. In order to perform the classification, the
data are divided into two parts, one for training and the other for testing; here 80%
of the data are used for training and 20% of the data are used for testing.
The predicted RMSE and MSE values for the three subjects are shown in tables 6.3
and 6.4. Figures 6.11 and 6.12 show the training and testing performance of subject
1 for ANFIS, and figures 6.13 and 6.14 show the training and testing performance
of subject 1 for ANFIS-PSO.
Table 6.4. Performance of ANFIS-PSO (training and testing).
6.5 Conclusion
In this chapter the classification of the two-class MI action of right hand and foot
movement is performed using ANFIS and ANFIS-PSO. Before classification,
preprocessing methods such as segmentation and feature extraction were also
applied. The DWT method was used to extract features; in this way we calculate six
features for each channel (C3, Cz, C4) and a total of 18 features for each subject.
These features are applied as input to the classifier. It is found that PSO performed
well for tuning the parameters of ANFIS, with increased accuracy.
References
[1] Mulder T 2007 Motor imagery and action observation: cognitive tools for rehabilitation J.
Neural Transm. 114 1265–78
[2] Kübler A, Kotchoubey B, Kaiser J, Wolpaw J R and Birbaumer N 2001 Brain–computer
communication: unlocking the locked Psychol. Bull. 127 358
[3] Lotte F, Congedo M, Lécuyer A, Lamarche F and Arnaldi B 2007 A review of classification
algorithms for EEG-based brain–computer interfaces J. Neural Eng. 4 R1
[4] Pfurtscheller G, Neuper C, Flotzinger D and Pregenzer M 1997 EEG-based discrimination
between imagination of right and left hand movement Electroencephalogr. Clin. Neurophysiol.
103 642–51
[5] Chiappa S and Bengio S 2004 HMM and IOHMM modeling of EEG rhythms for
asynchronous BCI systems European Symposium on Artificial Neural Networks ESANN
[6] Millan J R and Mouriño J 2003 Asynchronous BCI and local neural classifiers: an overview
of the adaptive brain interface project IEEE Trans. Neural Syst. Rehabil. Eng. 11 159–61
[7] Penny W D, Roberts S J, Curran E A and Stokes M J 2000 EEG-based communication: a
pattern recognition approach IEEE Trans. Rehabil. Eng. 8 214–5
[8] Qin L, Ding L and He B 2004 Motor imagery classification by means of source analysis for
brain–computer interface applications J. Neural Eng. 1 135
[9] Wei Q and Wang Y 2011 Binary multi-objective particle swarm optimization for channel
selection in motor imagery based brain–computer interfaces 2011 4th International
Conference on Biomedical Engineering and Informatics (BMEI) vol 2 (Piscataway, NJ:
IEEE), pp 667–70
[10] Hasan B A S, Jan J Q and Zhang Q 2010 Multi-objective evolutionary methods for channel
selection in brain computer interface: some preliminary experimental results IEEE Congress
on Evolutionary Computation (CEC) pp 1–6
[11] Lv J and Liu M 2008 Common spatial pattern and particle swarm optimization for channel
selection in BCI 3rd International Conference on Innovative Computing Information and
Control (Piscataway, NJ: IEEE), p 457
[12] Kumar S U and Hannah Inbarani H 2017 PSO-based feature selection and neighborhood
rough set-based classification for BCI multiclass motor imagery task Neural Comput. Appl.
28 239–58
[13] Hsu W-Y 2013 Application of quantum-behaved particle swarm optimization to motor
imagery EEG classification Int. J. Neural Syst. 23 1350026
[14] Xu P, Liu T, Zhang R, Zhang Y and Yao D 2014 Using particle swarm to select frequency
band and time interval for feature extraction of EEG based BCI Biomed. Sig. Process.
Control 10 289–95
[15] Ahirwal M K, Kumar A and Singh G K 2012 Analysis and testing of PSO variants through
application in EEG/ERP adaptive filtering approach Biomed. Eng. Lett. 2 186–97
[16] Eberhart R and Kennedy J 1995 A new optimizer using particle swarm theory MHS’95.
Proceedings of the Sixth International Symposium on Micro Machine and Human Science
(Piscataway, NJ: IEEE), pp 39–43
[17] Omran M, Engelbrecht A and Salman A 2005 Particle swarm optimization method for
image clustering Int. J. Pattern Recogn. Artif. Intell. 19 297–322
[18] Kennedy J and Eberhart R C 1997 A discrete binary version of the particle swarm algorithm
1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational
Cybernetics and Simulation vol 5 (Piscataway, NJ: IEEE), pp 4104–8
[19] Bergh F V D and Engelbrecht A P 2006 A study of particle swarm optimization particle
trajectories Info. Sci. 176 937–71
[20] Singh N and Singh S B 2012 Personal best position particle swarm optimization J. Appl.
Comput. Sci. Math. 12 69–76
[21] Kumar A, Singh B K and Patro B 2016 Particle swarm optimization: a study of variants and
their applications Int. J. Comput. Appl. 135 24–30
[22] Poli R, Kennedy J and Blackwell T 2007 Particle swarm optimization Swarm Intell. 1 33–57
[23] Angeline P 1998 Evolutionary optimization versus particle swarm optimization: philosophy
and performance difference Int. Conf. Evolution. Program. 1447 601–10
[24] Parsopoulos K and Vrahatis M 2001 Particle swarm optimizer in noisy and continuously
changing environments Artificial Intelligence and Soft Computing ed M H Hamza
(Anaheim, CA: IASTED/ACTA Press) pp 289–94
[25] Hendtlass T 2005 WoSP: a multi-optima particle swarm algorithm 2005 IEEE Congress on
Evolutionary Computation vol 1 (Piscataway, NJ: IEEE), pp 727–34
[26] Silva A, Neves A and Costa E 2002 Chasing the swarm: a predator–prey approach to
function optimisation Proc. of MENDEL 2002, the 8th Int. Conf. on Soft Computing (Brno,
Czech Republic)
[27] Parsopoulos K E and Vrahatis M N 2002 Particle swarm optimization method for
constrained optimization problems Front. Artif. Intell. Appl. 76 214–20
[28] Steyrl D, Scherer R, Förstner O and Müller-Putz G R 2014 Motor imagery brain–computer
interfaces: random forests vs regularized LDA-non-linear beats linear Proc. of the 6th Int.
Brain-Computer Interface Conf. pp 241–4
[29] Ren W, Han M, Wang J, Wang D and Li T 2016 Efficient feature extraction framework for
EEG signals classification 2016 Seventh International Conference on Intelligent Control and
Information Processing (ICICIP) (Piscataway, NJ: IEEE), pp 167–72
[30] Jahankhani P, Kodogiannis V and Revett K 2006 EEG signal classification using wavelet
feature extraction and neural networks IEEE John Vincent Atanasoff 2006 International
Symposium on Modern Computing (JVA’06) (Piscataway, NJ: IEEE), pp 120–4
[31] Jang J-S R 1993 ANFIS: adaptive-network-based fuzzy inference system IEEE Trans. Syst.
Man Cybernet. 23 665–85
[32] Catalão J P d S, Pousinho H M I and Mendes V M F 2011 Hybrid wavelet-PSO-ANFIS
approach for short-term electricity prices forecasting IEEE Trans. Power Syst. 26 137–44
[33] Ghomsheh V S, Shoorehdeli M A and Teshnehlab M 2007 Training ANFIS structure with
modified PSO algorithm Mediterranean Conference on Control and Automation (Piscataway,
NJ: IEEE), pp 1–6
Chapter 7
Multi-criterion and topology optimization using
Lie symmetries for differential equations
Sailesh Kumar Gupta
Lie addressed the unification problem of all the apparently different integration
methods for differential equations and investigated some elegant and simple
algebraic structures, commonly known as continuous transformation groups, which
hold the key to the problem. This connection with differential equations naturally
led to the study of the topological manifolds associated with group structure.
Interestingly, the manifolds needed for solving different aspects related to differential
equation systems can be thought of, locally, as open subsets of an n-dimensional
Euclidean space in which the local coordinates can be chosen freely.
The key concept turns out to be that of symmetry or the infinitesimal generator
associated with the group. Once the generators associated with a system of equations
become available, many applications become immediate. For ordinary differential
equations (ODEs), the symmetries and their generators always lead to the possibility
of going a step forward in the integration of the equation, and if one has a
sufficiently large number of symmetries, the complete integration of the differential
equation is guaranteed in most cases by the method of quadrature alone. For most
partial differential equations (PDEs), one cannot write down the general solution
but has to rely on various ansatz such as the similarity solutions, traveling waves,
separable solutions and so on. These ansatz methods usually lead to the problem of
solving some ODE generated in the process. These methods involve nothing more
than looking for solutions that are invariant under a particular group of symmetries
associated with the equations. Further, it is well documented in the associated
literature that any linear combination of the infinitesimal generators associated with
a given system of equations also leads to some group invariant solutions for the same
system of equations. Thus, we are left with infinitely many group invariant solutions
for the system. Hence, we have to devise a mechanism to classify only the
inequivalent solutions, and the corresponding problem leads to the optimization
problem for the group invariant solutions.
7.1 Introduction
Accurate formulations of different physical problems can be achieved with the help
of nonlinear ordinary or partial differential equations (ODEs and PDEs). A vast
literature exists on all of the solution methods, such as analytical, numerical or
approximate, but the concept of Lie symmetry groups provides a complete analytical
understanding of all these apparently different solution methods. Lie [1] began the
unification of all these different methods and founded the concept of continuous
transformation groups or Lie groups [2–4]. He found that the mathematically
challenging nonlinear conditions generated under the action of a continuous trans-
formation group, to preserve the invariance of a differential equation system, could
be systematically replaced by some simple linear conditions associated with the
generators of the group. The connection of Lie groups with the differential equations
naturally led to the study of the topological manifolds [5–7] associated with the group
structure. These manifolds can be thought of, at least locally, as open subsets of
an n-dimensional Euclidean space in which the coordinates can be chosen freely.
The idea of the infinitesimal generator of the group plays an important
role in the process. The infinitesimal generators are like vector fields on a given
manifold and their flows coincide with the one-parameter groups that they generate.
Thus, the starting point of the study of a differential equation using the group
techniques basically requires familiarity with two aspects. First, the idea of flow in a
vector field and, second, the infinitesimal criterion of invariance for the system under
the action of group transformations. The prolongation formula of vector fields
becomes the main tool in calculating the symmetry groups of a system [2, 3], and
requires the introduction of spaces which include the derivatives of the different
dependent variables, along with the independent and dependent variables, as
coordinates of the manifold. The space of all these variables is called the jet space. Once
the symmetry groups are calculated for a system, many applications become
immediate. For the ODEs, the existence of Lie symmetries always leads to the
possibility of going a step forward in the integration of the equation, and if one has a
sufficiently large number of symmetries, then the complete integration of the
differential equation is guaranteed in most cases by the method of quadrature alone.
Thus, it becomes very important to ascertain the possibility of maximum reduction in
order of the equation in this case. When it comes to the PDEs, usually one obtains
several arbitrary functions which are not very useful in practical applications. For
most PDEs, one cannot write down the general solution but has to rely on various
ansatz such as the similarity solutions, traveling waves, separable solutions and so on.
These ansatz methods usually lead to solving some ODEs generated in the process.
These methods involve nothing more than looking for solutions that are invariant
under a particular symmetry group of transformations for the system. Further, it is a
well known fact that for a given system of differential equations possessing
symmetries, one can have infinitely many group invariant solutions. This is due to
the fact that any linear combination of the symmetry generators of the system again
leads to a symmetry associated with the same system. Hence, we have to devise a
mechanism to classify only the inequivalent solutions, and the problem leads to
the optimization of the associated group invariant solutions for the system. The
classified solutions give the optimal group invariant solutions [2, 3].
Thus, in this chapter we will try to address the different issues discussed so far
with examples. We start with a brief introduction of the fundamentals of topological
manifolds and then establish the connection between the groups and differential
equations [3]. We discuss the methods to calculate the group invariant solutions for
the differential equations and classify them to establish the corresponding optimized
solutions. Finally, concluding remarks are given with further reading suggestions.
• Inverses: For each k ∈ S , there exists a unique inverse k −1, such that
k · k −1 = k −1 · k = e.
a · (b · z ) = (a · b ) · z . (7.1)
\frac{dz^k}{d\varepsilon} = \xi^k(z), \quad k = 1, \ldots, m.   (7.3)
Now it is guaranteed from the theory of differential equations that for smoothly
varying ξ k (z ) we can have a unique solution to the initial value problem of equation
(7.3) with the initial condition
ϕ(0) = z0. (7.4)
Again, if we are given a smoothly varying function h(z) with domain z ∈ Z, then h · A
becomes a smooth vector field, with (h \cdot A)|_z = h(z) A|_z and h \cdot A = \sum_i h(z) \xi^i(z) \frac{\partial}{\partial z^i}.
ϕ(ϵ) = (ϕ^1(ϵ), …, ϕ^m(ϵ)). The corresponding tangent vector to the manifold Z at
each point z = ϕ(ϵ) of C is given by the derivative \frac{d\phi}{d\epsilon} = \left( \frac{d\phi^1}{d\epsilon}, \ldots, \frac{d\phi^m}{d\epsilon} \right). Now, for the
field A ∈ Z, the curve C through the point z ∈ Z is given by α(ϵ, z). We call it the
flow of A and it has the properties
α(β, α(ϵ , z )) = α(β + ϵ , z ), z ∈ Z, (7.5)
for all β, ϵ ∈ R with
α(0, z ) = z (7.6)
and
\frac{d}{d\epsilon} \alpha(\epsilon, z) = A|_{\alpha(\epsilon, z)}   (7.7)
for all $\epsilon$ where defined. Comparing equations (7.5) and (7.6) with equations (7.1) and
(7.2), we find the similarity between the group action of the Lie group $\mathbb{R}$ on $Z$ and
the respective flow of $A$. The transformations defined above are popularly called
one-parameter transformation groups and $A$ is called the infinitesimal generator.
Now, making use of Taylor's theorem, we can write
$$\alpha(\epsilon, z) = z + \epsilon\,\xi(z) + O(\epsilon^2),$$
where $(\xi^1(z), \ldots, \xi^m(z))$ are the coefficients of $A$.
The curves $C$ for the vector field $A$ are the orbits of the one-parameter group
action. Conversely, if a one-parameter transformation group $\alpha(\varepsilon, z)$ acts on $Z$, then
the corresponding generator is obtained by evaluating (7.7) at $\varepsilon = 0$ and is given by
$$A|_z = \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} \alpha(\varepsilon, z).$$
This notation allows us to rewrite the properties of the flow of $A$ listed above, for all
$z \in Z$, as
$$\exp[(\beta + \epsilon)A]z = \exp(\beta A)\exp(\epsilon A)z, \qquad \exp(0A)z = z, \qquad \frac{d}{d\epsilon}[\exp(\epsilon A)z] = A|_{\exp(\epsilon A)z}.$$
The process of differentiation can be continued and, if substituted into the Taylor
series assuming convergence, we obtain the Lie series for the action of the flow on $h$,
which can be written as
$$h(\exp(\epsilon A)z) = \sum_{k=0}^{\infty} \frac{\epsilon^k}{k!}\, A^k(h)(z).$$
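As a quick check of the Lie series (a worked example, not from the text), take the one-dimensional field $A = z\,\frac{d}{dz}$ on $\mathbb{R}$, whose flow is $\exp(\epsilon A)z = e^{\epsilon}z$, and let $h(z) = z$. Since $A(h) = z\,h'(z) = z$, we have $A^k(h)(z) = z$ for every $k$, so
$$h(\exp(\epsilon A)z) = \sum_{k=0}^{\infty}\frac{\epsilon^k}{k!}\,z = e^{\epsilon}z,$$
which agrees with the flow computed directly.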
for any $(z, v^{(n)}) \in X^{(n)}$. Now, any vector field in $X$ is given by
$$A = \sum_{i=1}^{p} \xi^i(z, v)\,\frac{\partial}{\partial z^i} + \sum_{\alpha=1}^{q} \phi_\alpha(z, v)\,\frac{\partial}{\partial v^\alpha}.$$
Hence we can say that $\xi^i(z, v)$ and $\phi_\alpha(z, v)$ determine the coefficients of the $n$th
prolongation of $A$, which can be written as
$$\mathrm{pr}^{(n)}A = \sum_{i=1}^{p} \xi^i(z, v)\,\frac{\partial}{\partial z^i} + \sum_{\alpha=1}^{q}\sum_{J} \phi_\alpha^J(z, v)\,\frac{\partial}{\partial v_J^\alpha}.$$
symmetry groups $S$ attached to it. Now, the group transformations $g_\varepsilon = \exp(\epsilon A)$ for
the vector fields $A$ can be written as
$$(\tilde{z}, \tilde{v}) = g_\varepsilon(z, v) = (\Theta_\varepsilon(z), v),$$
with the components $\Theta^i_\varepsilon(z)$ satisfying the equation
$$\frac{d\Theta^i_\varepsilon(z)}{d\varepsilon}\bigg|_{\varepsilon=0} = \xi^i(z),$$
where
$$\phi^j(z, v^{(1)}) = \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0}\, \sum_{\kappa=1}^{p} \frac{\partial \Theta^{\kappa}_{-\varepsilon}}{\partial \tilde{z}^j}\bigl(\Theta_\varepsilon(z)\bigr)\, v_\kappa,$$
with the assumption $g_\varepsilon^{-1} = g_{-\varepsilon}$ in the domain of definition. As the functions are
assumed to be smooth, the order of differentiation can be interchanged and,
simplifying, we can write
$$\phi^j(z, v, v_z) \equiv \phi^j(z, v^{(1)}) = -\sum_{\kappa=1}^{p} \frac{\partial \Theta^{\kappa}}{\partial z^j}\, v_\kappa,$$
which provides the expression for $\mathrm{pr}^{(1)}A$ in (7.8). Now the above expression can be
written in terms of the total derivative of $\phi$ with respect to $z$ as
$$\phi^j(z, v^{(1)}) = D_j\,\phi(z, v) = \frac{\partial \phi}{\partial z^j} + v_j\,\frac{\partial \phi}{\partial v}.$$
7.3.2.1 Definition
Given $H(z, v^{(n)})$, the expression
$$D_i H = \frac{\partial H}{\partial z^i} + \sum_{\alpha=1}^{q}\sum_{J} v^{\alpha}_{J/i}\,\frac{\partial H}{\partial v^{\alpha}_J}, \tag{7.9}$$
where, for $J = (j_1, \ldots, j_\kappa)$,
$$v^{\alpha}_{J/i} = v^{\alpha}_{J,i} = \frac{\partial^{\kappa+1} v^{\alpha}}{\partial z^i\,\partial z^{j_1}\cdots\partial z^{j_\kappa}},$$
represents the $i$th total derivative of $H$. In (7.9) the summation domain of $J$ goes up to order $n$.
becomes a vector field in the jet space $Z^{(n)}$, with $1 \leqslant j_\kappa \leqslant p$, $1 \leqslant \kappa \leqslant n$; then the
coefficient functions $\phi_\alpha^J$ of $\mathrm{pr}^{(n)}A$ are given by
$$\phi_\alpha^J(z, v^{(n)}) = D_J\left(\phi_\alpha - \sum_{i=1}^{p} \xi^i v_i^\alpha\right) + \sum_{i=1}^{p} \xi^i v^{\alpha}_{J/i}. \tag{7.10}$$
Proof. The theorem will be proved in general by the simple and effective method of
induction. We note that the second jet space $X^{(1+1)}$ is a subspace of the first jet space
$(X^{(1)})^{(1)}$; to be precise, the first-order derivatives of $v_J^\alpha$ give the second-order
derivatives of $v^\alpha$. Similarly, the space $X^{(n+1)}$ can be treated as a subspace of
$(X^{(n)})^{(1)}$, the first jet space of the $n$th jet space. Thus we can say that the vector fields
on the manifold $X^{(n-1)}$ are given by $\mathrm{pr}^{(n-1)}A$, which can then be prolonged to
$(X^{(n-1)})^{(1)}$ by use of the first-order prolongation formula. The resulting vector field is
then restricted to $Z^{(n)}$ and its subspaces, which in turn determine the expression for
$\mathrm{pr}^{(n)}A$. In the process, the new $n$th-order coordinates in $(X^{(n-1)})^{(1)}$ are given by
$v^{\alpha}_{J/\kappa} = \partial v_J^\alpha/\partial z^\kappa$, where $J = (j_1, \ldots, j_{n-1})$, $1 \leqslant \kappa \leqslant p$ and $1 \leqslant \alpha \leqslant q$. At this point we
use the definition of the total derivative so that the coefficient of $\partial/\partial v^{\alpha}_{J/\kappa}$ in
$(\mathrm{pr}^{(n-1)}A)^{(1)}$ becomes
$$\phi_\alpha^{J/\kappa} = D_\kappa \phi_\alpha^J - \sum_{i=1}^{p} D_\kappa \xi^i\, v^{\alpha}_{J/i}. \tag{7.11}$$
Now, if we are able to prove that (7.10) solves (7.11) in closed form then the proof is
complete. Using induction, we find
$$\begin{aligned}
\phi_\alpha^{J/\kappa} &= D_\kappa\left\{ D_J\left(\phi_\alpha - \sum_{i=1}^{p}\xi^i v_i^\alpha\right) + \sum_{i=1}^{p}\xi^i v^{\alpha}_{J/i} \right\} - \sum_{i=1}^{p} D_\kappa \xi^i\, v^{\alpha}_{J/i} \\
&= D_\kappa D_J\left(\phi_\alpha - \sum_{i=1}^{p}\xi^i v_i^\alpha\right) + \sum_{i=1}^{p}\left(D_\kappa \xi^i\, v^{\alpha}_{J/i} + \xi^i v^{\alpha}_{J/i\kappa}\right) - \sum_{i=1}^{p} D_\kappa \xi^i\, v^{\alpha}_{J/i} \\
&= D_\kappa D_J\left(\phi_\alpha - \sum_{i=1}^{p}\xi^i v_i^\alpha\right) + \sum_{i=1}^{p}\xi^i v^{\alpha}_{J/i\kappa},
\end{aligned}$$
with $v^{\alpha}_{J/i\kappa} = \dfrac{\partial^2 v_J^\alpha}{\partial z^i\,\partial z^\kappa}$. Thus the form of $\phi_\alpha^{J/\kappa}$ is like (7.10), which completes the proof. □
7.3.3 Criterion of maximal rank and infinitesimal invariance for differential equations
The given system $\Lambda$ of equations $E_\nu$ possesses the Jacobian matrix
$J_E(z, v^{(n)}) = (\partial E_\nu/\partial z^i,\ \partial E_\nu/\partial v_J^\alpha)$, and the maximal rank condition states that,
whenever $E(z, v^{(n)}) = 0$, this matrix is of maximal rank $l$.
Proof. Using theorem 7.1 and the condition of maximal rank for the associated
Jacobian matrix, the proof is immediate [3].
$$\phi^{zzz} = D_z^3\phi - v_z D_z^3\xi - v_t D_z^3\eta - 3v_{zz}D_z^2\xi - 3v_{zt}D_z^2\eta - 3v_{zzz}D_z\xi - 3v_{zzt}D_z\eta.$$
Similar expressions can be obtained for the other coefficients as well. Once this is done,
one needs to substitute them all into (7.14), making sure to replace $v_t$ by $-(vv_z + v_{zzz})$ in
all the expressions before simplifying. The working rule for analyzing these equations is
to start from the coefficients of the highest order derivatives equated to zero.
Proceeding in this manner, we see the coefficient of $v_{zzt}$ is $D_z\eta$; thus the solution of
$D_z\eta = 0$ shows that $\eta$ depends on $t$ only. Next, the coefficient of $v_{zz}^2$ gives $\xi_v = 0$,
and the coefficient of $v_{zzz}$ gives $\eta_t = 3\xi_z$. Thus we obtain $\xi = \frac{1}{3}\eta_t z + \sigma(t)$. Next, the
vanishing of the coefficient of $v_{zz}$ makes $\phi$ linear in $v$, with coefficients depending on $t$
only. Next, $v_z$ gives the equation
$$-\xi_t - v(\phi_v - \eta_t) + v(\phi_v - \xi_z) + \phi = 0,$$
where the $c$ are arbitrary constants, and hence we obtain the four-dimensional vector
fields for the KdV given by
$$A_1 = \frac{\partial}{\partial z}, \qquad A_2 = \frac{\partial}{\partial t}, \qquad A_3 = t\frac{\partial}{\partial z} + \frac{\partial}{\partial v}, \qquad A_4 = 3t\frac{\partial}{\partial t} + z\frac{\partial}{\partial z} - 2v\frac{\partial}{\partial v}.$$
Now we give an example of a group invariant solution for the KdV equation. We
consider the group of scaling symmetries $(z, t, v) \to (\varepsilon z, \varepsilon^3 t, \varepsilon^{-2}v)$, generated by the
generator $A_4$ of the KdV equation. We can find functions $F(z, t, v)$ for any of its
generators $A_i$, such that they satisfy the equations $A_i F \equiv 0$. This equation can be
solved for each of the generators by the method of characteristics and can be written as
$$\frac{dz}{\xi(z, t, v)} = \frac{dt}{\eta(z, t, v)} = \frac{dv}{\phi(z, t, v)}. \tag{7.15}$$
The solutions give the invariants for the system corresponding to the generator
under consideration. The invariants are then treated as new variables for equation
(7.13). Using the procedure enumerated before, the global invariants of $A_4$ in the
upper half space $(t > 0)$ for the KdV are given by $y = t^{-1/3}z$ and $u = t^{2/3}v$.
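To see where these invariants come from (a short verification, not spelled out in the text), apply (7.15) to $A_4 = 3t\,\partial_t + z\,\partial_z - 2v\,\partial_v$:
$$\frac{dz}{z} = \frac{dt}{3t} = \frac{dv}{-2v} \;\Longrightarrow\; t^{-1/3}z = \text{const}, \qquad t^{2/3}v = \text{const},$$
which gives exactly $y = t^{-1/3}z$ and $u = t^{2/3}v$.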
Now, treating $u$ as the function $u(y)$, we obtain the reduced equation
$$u_{yyy} + uu_y - \frac{1}{3}yu_y - \frac{2}{3}u = 0.$$
This equation gives solutions in terms of the second Painlevé transcendent [3], given
by the equation
$$w_{yy} = \frac{1}{3}w^3 + \frac{1}{3}yw + \kappa,$$
with $u = w_y - \frac{1}{6}w^2$ and $\kappa$ a constant. Thus, the final similarity solution for the KdV
can be written in terms of the second Painlevé transcendent.
7.4.1 Adjoint representation for the cKdV and optimization of the group generators
The cylindrical Korteweg–de Vries (cKdV) equation originates in different physical
situations in non-planar cylindrical geometry and is given by
$$v_t + \frac{v}{2t} + vv_z + v_{zzz} = 0. \tag{7.16}$$
The cKdV has the four-dimensional Lie point symmetry generators [8, 9] given by
$$A_1 = \frac{\partial}{\partial z}, \qquad A_2 = 3t\frac{\partial}{\partial t} + z\frac{\partial}{\partial z} - 2v\frac{\partial}{\partial v}, \qquad A_3 = 2t^{1/2}\frac{\partial}{\partial z} + t^{-1/2}\frac{\partial}{\partial v},$$
$$A_4 = 2zt^{1/2}\frac{\partial}{\partial z} + 4t^{3/2}\frac{\partial}{\partial t} + \left(zt^{-1/2} - 4vt^{1/2}\right)\frac{\partial}{\partial v}.$$
The first step in obtaining optimal group invariant solutions of equation (7.16),
following the procedure given in [3], is to find the adjoint representation of the
generators. Now, for any $n$-dimensional symmetry algebra $S$ generated by the vector
fields $\{A_1, A_2, \ldots, A_n\}$, we can have equivalent elements
$$\left\{ A = \sum_{i=1}^{n} a_i A_i, \quad w = \sum_{i=1}^{n} b_i A_i \right\} \in S,$$
if any one of the following conditions is satisfied:
• For g ∈ G we have a transformation Adg (w) = A , where Adg is the adjoint of
g and is written as Adg (w) = g −1wg .
• There is a constant ϑ , such that A = ϑw .
The Lie series provides an effective tool for finding the adjoint system of any
symmetry group [3], which can be constructed using the relation
$$\mathrm{Ad}_g(\exp(\epsilon A))w = w - \epsilon[A, w] + \frac{\epsilon^2}{2!}[A, [A, w]] - \frac{\epsilon^3}{3!}[A, [A, [A, w]]] + \cdots,$$
where $[A, w] = Aw - wA$ is the commutator of the two vector fields. Tables 7.1
and 7.2 give the commutator table and the adjoint system for the cKdV, respectively.
The $(i, j)$th entry in table 7.1 gives $[A_i, A_j]$ and in table 7.2 the $(i, j)$th entry
indicates $\mathrm{Ad}_g(\exp(\epsilon A_i))A_j$.
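As an illustrative check of table 7.2 (a computation left implicit in the text), take $A = A_2$ and $w = A_1$; since $[A_2, A_1] = -A_1$, every iterated commutator returns $\pm A_1$ and the Lie series collapses to
$$\mathrm{Ad}_g(\exp(\epsilon A_2))A_1 = A_1 + \epsilon A_1 + \frac{\epsilon^2}{2!}A_1 + \cdots = e^{\epsilon}A_1,$$
which is precisely the corresponding entry of table 7.2.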
Proof. We take any vector $A_i \in S$. Let $\psi = \{z, t, v\}$ denote the set of all variables
for the cKdV equation and let $T(\varepsilon)$ denote the symmetry mapping generated by the
vector $A_i$, such that
$$T(\varepsilon): \psi \mapsto \exp(\varepsilon A_i)\psi = \tilde{\psi}.$$
We further define the action of $T \equiv T(\varepsilon)$ on any smooth function $F(\psi)$ by
$$TF(\psi) = F(T\psi) = F(\tilde{\psi}). \tag{7.17}$$
Now, using (7.17) and assuming the existence of $T^{-1}$, we can write
$$\tilde{A}F(\tilde{\psi}) = TAF(\psi) = TAT^{-1}F(\tilde{\psi}). \tag{7.18}$$
Using the inverse transform theorem we can write $F(\tilde{\psi}) = h(\psi)$ and $F(\psi) = h(\tilde{\psi})$;
this is always true for group invariant solutions generated by Lie symmetry
groups [2, 3] and hence, from (7.18), we can write
Table 7.1. The commutator table for the cKdV.

        A1        A2          A3        A4
A1      0         A1          0         A3
A2      -A1       0           A3/2      (3/2)A4
A3      0         -A3/2       0         0
A4      -A3       -(3/2)A4    0         0

Table 7.2. The adjoint system for the cKdV.

Adg     A1           A2               A3             A4
A1      A1           A2 - εA1         A3             A4 - εA3
A2      e^ε A1       A2               e^(-ε/2) A3    e^(-3ε/2) A4
A3      A1           A2 + (ε/2)A3     A3             A4
A4      A1 + εA3     A2 + (3/2)εA4    A3             A4
$$\tilde{A}h(\tilde{\psi}) = TAT^{-1}h(\psi). \tag{7.19}$$
Now, a function $h(z, t, v)$ is an invariant for the cKdV if it satisfies the equation
$$A_i h \equiv 0. \tag{7.20}$$
Hence, using (7.19) and (7.20), we can say that if $f = h(z)$ is an invariant solution for
$A$ then $\tilde{f} = h(\tilde{z})$ will be an invariant solution for $\tilde{A}$, when
$$\tilde{A} = TAT^{-1} = \exp(\epsilon A_i)\,A\,\exp(-\epsilon A_i) \equiv \mathrm{Ad}_{A_i}(A). \qquad\square$$
as a representative of the class. The set of all inequivalent classes gives the optimized
subalgebra for the system.
Now, following the procedure in [3] and using tables 7.1 and 7.2, we will find
the optimal system of subalgebras and their corresponding representatives for
equation (7.16). The main idea is to simplify as many of the coefficients $c_i$ of the vector
as possible by applying the adjoint maps given in table 7.2. For the
cKdV, the function $\zeta(A) = c_2$ is found to remain invariant under the action of the
full adjoint group $(\mathrm{Ad}_g)$, which can be expressed as
$$\zeta(\mathrm{Ad}_g(A)) = \zeta(A), \qquad A \in S, \quad g \in G.$$
Thus, at first we must start by considering different values of $c_2$. We suppose $c_2 \neq 0$
and set $c_2 = 1$. The vector is then acted on by $\mathrm{Ad}_g(\exp(-\frac{2}{3}c_4A_4))$ so that
the coefficient of $A_4$ vanishes and we obtain a new vector
$$A' = \mathrm{Ad}_g\!\left(\exp\!\left(-\frac{2}{3}c_4A_4\right)\right)A = c_1'A_1 + c_3'A_3 + A_2$$
for some other constants $c_1'$, $c_3'$ depending on the original constants $c_1$, $c_3$ and $c_4$. The
new vector is acted on by $\mathrm{Ad}_g(\exp(-2c_3'A_3))$ so that it is transformed to $A'' = c_1''A_1 + A_2$,
which can again be transformed by $\mathrm{Ad}_g(\exp(c_1''A_1))$ so that the coefficient of $A_1$
vanishes and the new vector $A'''$ is equivalent to $A_2$. Thus we see that, under the adjoint
action, any vector $A$ with $c_2 \neq 0$ generates subalgebras that are equivalent to the
subalgebra spanned by $A_2$. The remaining vectors have $c_2 = 0$ and are of the form
$$A = c_1A_1 + c_3A_3 + c_4A_4.$$
Next we suppose $c_4 \neq 0$ and put $c_4 = 1$. The vector is then transformed by
$\mathrm{Ad}_g(\exp(c_3A_1))$ so that the coefficient of $A_3$ vanishes and we obtain the new
transformed vector
$$A' = c_1'A_1 + A_4.$$
No further simplification is possible for this vector and therefore the coefficient $c_1'$
can only be chosen as $0$ or $\pm 1$. Hence, any one-dimensional subalgebra with $c_2 = 0$ and
$c_4 \neq 0$ is spanned by either
$$A_4, \qquad A_4 + A_1 \qquad \text{or} \qquad A_4 - A_1.$$
All the remaining cases can be treated similarly [9] and we obtain an optimized
system of subalgebras for the cKdV spanned by the vectors
$$A_1, \quad A_2, \quad A_3, \quad A_4, \quad A_4 + A_1, \quad A_4 - A_1.$$
Again we note that $(z, t, v) \to (-z, t, -v)$ gives the discrete symmetry associated
with the cKdV, which maps $A_4 - A_1$ to $A_4 + A_1$ and thus reduces the number of
inequivalent subalgebras for the cKdV equation to five, i.e.
$$A_1, \quad A_2, \quad A_3, \quad A_4, \quad A_4 + A_1, \tag{7.21}$$
which gives the optimal system of subalgebras for the cylindrical Korteweg–de Vries
(cKdV) equation.
7.4.2 Calculation of the optimal group invariant solutions for the cKdV
Having found the optimized system of one-dimensional subalgebras for the cKdV,
we are now in a position to calculate the corresponding optimized system of group
invariant solutions in this case. However, it must be noted that each solution will
have a singularity at t = 0. Now we proceed with each of the generators in the
optimal subalgebra (7.21):
i. $A_1 = \partial/\partial z$
The invariants for this generator are given by $y = t$, $u = v$ and the
corresponding reduced equation is
$$\frac{du}{dy} + \frac{u}{2y} = 0.$$
Separating variables and integrating gives the solution $u(y) = a/\sqrt{y}$, where $a$ is a constant.
Thus, in terms of the original variables the solution becomes
$$v = at^{-1/2}. \tag{7.22}$$
iv. For $A_4$, the invariants are $y = zt^{-1/2}$ and $u = vt - z/2$. The reduced equation,
after two integrations, gives
$$\left(\frac{du}{dy}\right)^2 = -\frac{u^3}{3} + mu + n,$$
where $m$ and $n$ are arbitrary constants. It has the solution $u = -12\wp(y)$, where
the Weierstrass elliptic function $\wp$ [4] satisfies the equation
$$\left(\frac{d\wp}{dy}\right)^2 = 4\wp^3 - g\wp - g_1, \tag{7.26}$$
where $g$ and $g_1$ are the invariants of $\wp(y)$. Now, if $a$, $b$ and $c$ are the roots of
equation (7.26), then we can have the following situations:
• When $a < b < c$, the solution for $t > 0$ gives the cnoidal wave solution [4]
$$v = \frac{z + 2\delta}{2t} + \frac{2\delta}{s^2 t}\,\mathrm{dn}^2\!\left(\sqrt{\frac{\delta}{6s^2}}\; zt^{-1/2},\; s\right), \tag{7.27}$$
$$v = \frac{z + 2a}{2t} + \frac{1}{t}(c - a)\,\mathrm{sech}^2\!\left(\sqrt{\frac{c - a}{12}}\; zt^{-1/2}\right). \tag{7.28}$$
v. For $A_4 + A_1$, the invariants are $y = zt^{-1/2} + 1/(4t)$ and $u = vt - \frac{1}{2}yt^{1/2} - \frac{1}{8}t^{-1/2}$.
The resulting reduced equation can be integrated to give the first Painlevé
transcendent,
$$\frac{d^2u}{dy^2} + \frac{u^2}{2} - \frac{y}{4} = \sigma,$$
where $\sigma$ is a constant. The corresponding solution in terms of $v$ is given by
$$v = \frac{zt^{-1/2}}{2} + \frac{1}{8t} + \frac{t^{-3/2}}{8} + \frac{1}{t}\, u\!\left(zt^{-1/2} + \frac{1}{4t}\right). \tag{7.30}$$
References
[1] Lie S 1888 Theorie der Transformationsgruppen (Leipzig: Teubner)
[2] Ovsiannikov L V 1982 Group Analysis of Differential Equations (New York: Academic)
[3] Olver P J 1993 Applications of Lie Groups to Differential Equations (New York: Springer)
[4] Ibragimov N H 1994 Lie Group Analysis of Differential Equations vol 1 (Boca Raton, FL:
CRC Press)
[5] Cartan E 1930 La Théorie des Groupes Finis et Continus et l'Analysis Situs (Mém. Sci. Math.
42) (Paris: Gauthier-Villars)
[6] Chevalley C 1946 Theory of Lie groups I (Princeton, NJ: Princeton University Press)
[7] Warner F W 1971 Foundations of Differentiable Manifolds and Lie Groups (Berlin: Springer)
[8] Zakharov N S and Korobeinikov V P 1980 Group analysis of the generalized Korteweg–de
Vries–Burgers equation J. Appl. Math. Mech. 44 668
[9] Gupta S K and Ghosh S K 2017 Classification of optimal group-invariant solutions:
cylindrical Korteweg–de Vries equation J. Optim. Theory Appl. 173 763–9
[10] Palais R S 1957 A global formulation of the Lie theory of transformation groups Memoirs of
the American Mathematical Society Number 22 (Providence, RI: American Mathematical
Society)
[11] Patera J, Winternitz P and Zassenhaus H 1975 Continuous subgroups of the fundamental
groups of physics. I. General method and the Poincaré group J. Math. Phys. 16 1597–614
[12] Bluman G W and Cole J D 1969 The general similarity solution of the heat equation
J. Math. Mech. 18 1025–42
[13] Ames W F 1965 Nonlinear Partial Differential Equations in Engineering (New York:
Academic)
[14] Clarkson P A and Kruskal M D 1989 New similarity reductions of the Boussinesq equation
J. Math. Phys. 30 2201–13
[15] Levi D and Winternitz P 1989 Nonclassical symmetry reduction: example of the Boussinesq
equation J. Phys. A 22 2915–24
[16] Nucci M C and Clarkson P A 1992 The nonclassical method is more general than the direct
method for symmetry reductions. An example of the Fitzhugh–Nagumo equation Phys. Lett.
A 164 49–56
[17] Galaktionov V A 1990 On new exact blow-up solutions for nonlinear heat conduction
equations with source and applications Diff. Int. Eq. 3 863–74
IOP Publishing
Chapter 8
Learning classifier system
Kapil Kumar Nagwanshi
The learning classifier system (LCS) is a technique which utilizes the power of
genetic algorithms and machine learning to make decisions based on specific rules.
The general components of LCSs include rule based and decision systems, learning
algorithms and rule discovery systems. The current chapter aims to discuss different
learning classifier systems for optimization. To understand each of these classifier
systems some datasets are required. MATLAB® has been utilized to simulate the
behavior and track the use of each of these classifiers. Examples of such learning
systems include tree, K-nearest-neighbor, support vector machine, discriminant,
Bayes' and ensemble classifiers. The degree of each of these classifiers has also been
chosen to describe the performance. Principal component analysis may be used to
reduce the learning dimension to obtain a faster result. Multicore CPUs and GPUs
play a significant role in speeding up the system's performance. Note that classifiers
can be used with every classification problem. This chapter also helps readers to choose
the best classifier for their work. Some other tools are also discussed at the end of the
chapter. The performance of the system depends on the dataset used. Case studies
suggest the type of result, such as a scatter plot, confusion matrix, parallel plot and
ROC curves. Later in this chapter, Python is introduced for developing the learning
classification. This chapter also aims to describe cloud-based products such as
BigML® and Microsoft® AzureML® for the optimization of classification problems.
The chapter ends with concluding remarks including benchmarking.
8.1 Introduction
The Internet is full of data. If you want to know some details about any particular
topic, it will be able to suggest anything from a few items to a very long list. From
a downloaded search, concluding a result can be a straightforward task in a few
cases, but for all other cases it will be a complex problem. There are a significant
8.2 Background
In the year 2000 Holland et al [6] presented the answer to what a learning classifier
system was, according to the best researchers of the time. Holland divided the
classifier algorithm into three parts: the first part addresses parallelism and
coordination, the second part gives the credit assignment and the third part describes
the rule discovery. Then he described how classifier systems deal with these issues,
followed by rule implementation and a description of future directions. Booker
described the importance of classification mechanisms in parallelism as well as
standardization, which allows the construction of a block technique for data processing
through the use of differentiation in conflict management.
Riolo [6] found that the main features of such a classifier system should be as
follows: (i) A communications message board. (ii) A representation of understanding
based on rules. (iii) A contest to activate guidelines, biased by inputs, past
performance and anticipated outcome projections. (iv) Parallel firing of rules, with
endogenously emerging consistency and coordination of operation as a rapidly
growing state created and preserved by the bid handling dynamics; explicit conflict
resolution is implemented only at the effector interface. (v) Temporal difference
techniques of some sort are used to allocate credit, e.g. the traditional bucket-brigade
algorithm, profit-sharing systems, Q-learning algorithms, etc. Note that various types
of credit may be assigned simultaneously, e.g. traditional strength anticipating payoff
based on past performance, some measure of payoff accuracy or consistency [7], or a
certain measure of the capacity to forecast a subsequent state [8, 9]. (vi) Heuristics
suitable for the
modeling scheme drive the discovery of rules. Examples include activated bonding to
detect surprise-triggered forecasts of asynchronous causal links [10–12], or
traditional GAs with mutation and recombination. Holmes et al [13] adapted LCS
for the design, implementation and evaluation of EpiCS, for knowledge discovery in
the epidemiological monitoring of a national child automobile passenger protection
program. The paper concluded with significant detail on regression analytics.
Urbanowicz and Moore [3] explain that if a problem is of a high degree of
complexity, then LCS can solve it efficiently. LCS has been applied in
biology, computer science, artificial intelligence, machine learning, evolutionary
biology, evolutionary computation, reinforcement learning, supervised learning,
evolutionary algorithms and GAs to build the solution domain as a learning
classifier system. Zang et al [14] utilized XCS with memory conditions (absent in
standard XCS) to solve ‘maze environments and the perceptual aliasing problem’
with robust output.
8.3.2 BigML®
BigML® is an advanced tool for machine learning, which provides a collection of
robustly engineered algorithms designed to solve real-life problems. Algorithms
based on supervised learning have been utilized to solve classification,
regression and time series forecasting while unsupervised learning has been used to
provide cluster analysis, anomaly detection, topic modeling and principal compo-
nents analysis by implementing a single, standardized framework. In this chapter,
the author has chosen BigML for its applicability for understanding the character-
istic features of different classifiers. BigML provides a broad set of modeling
scenarios to analyze and compare, which makes analysis easy. One can directly
open an evaluation account to understand BigML through the webpage https://
bigml.com [18–20].
Figure 8.1. Graduate admission dataset with (a) 61 classes and (b) 8 classes.
% Read the admission data and split it 70:30 into training and testing tables
admissiontable = readtable('Admission_Predict.csv');
save('admissiontable.mat', 'admissiontable');   % keep a copy on disk
load('admissiontable.mat');
[m, n] = size(admissiontable);
P = 0.70;                          % fraction of rows used for training
idx = randperm(m);                 % random permutation of the row indices
Training = admissiontable(idx(1:round(P*m)), :);
save('Training.mat', 'Training');
Testing = admissiontable(idx(round(P*m)+1:end), :);
save('Testing.mat', 'Testing');
After running the above code, the admissions table is split into the Training
table with 350 rows and the Testing table with 150 rows. On the other hand,
AzureML provides a Dataset Split module which by default splits the dataset into a
50:50 ratio of training:testing. The subscriber can change this value by selecting this
module.
Before going to test each of the above stated models it is required to load the
Admission prediction training dataset using the following MATLAB command (see
section 8.4.1 on splitting dataset for the sample dataset):
» load Training;
After running the above command the variable names were modified as GREScore,
TOEFLScore, UniversityRating, SOP, LOR, CGPA, Research and
ChanceOfAdmit to make them valid MATLAB identifiers. The original names are
saved in the VariableDescriptions property. Now the stated table (Dataset) is
available for training of the classification learners. Subsequently, we can now start
the classification learner app and start the new session from the workspace with the
admissiontable dataset available in the workspace. The app will automatically
select the predictors and response (by default the last column). One can select any
number of predictors from the available list. After selecting predictors and
responses, it is necessary to validate the dataset by pressing the Start Session
button. By default, the cross-validation is five-fold to protect against overfitting of
the data.
The Classification Learner starts with the scatter plotting of the data. After this
step, it is necessary to select the classifier(s) to see the classification performance. A
scatter plot is useful to check the predictability of the response. The plot has the
option to choose different x and y parameters. Figure 8.2(a)–(f) show the scatter plot
to examine seven variables for predicting the response by choosing different options
on the different x and y parameters under Predictors to reflect the distribution of
response variable ChanceOfAdmit in different colors.
Note: In MATLAB the classification learner app does not support the regression
classifier, and for this purpose there is a separate app called the regression learner.
The regression learner is not able to give prediction in the form of a confusion matrix
and ROC directly, so to see this the output of AzureML has been used.
Figure 8.2. Showing the response in color coded format with respect to GRE versus other predictors.
Table 8.2. The confusion matrix and results obtained from AzureML. (ALL: average log loss.)
Predicted class
Class 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 ALL Precision Recall
0.3 0 0 0 2 1 0 0 0 2.55 0 0
0.4 0 0 0 0 2 0 0 0 2.87 0 0
0.5 0 0 0 4 4 1 0 0 1.88 0 0
0.6 0 0 0 3 12 0 0 0 1.33 0.33 0.2
0.7 0 0 0 0 21 6 1 0 1.13 0.47 0.75
0.8 0 0 0 0 5 10 3 0 1.14 0.45 0.56
0.9 0 0 0 0 0 5 16 0 0.84 0.67 0.76
1 0 0 0 0 0 0 4 0 2.59 0 0
ROC (see section 8.6.2 on the receiver operating characteristic). Further, the output
can be taken as the .csv file shown in table 8.2 to download the results of a different
set of algorithms.
It is assumed that the reader is aware of the logistic function, therefore, the author
does not intend to show the logistic function plot, although readers can go to https://
www.geeksforgeeks.org/understanding-logistic-regression/ for more details. One can
also assume that the target variable is categorical. Based on the number of classes,
logistic regression can be categorized as follows: (i) Binomial logistic regression is
used if there are two possible target classes 0 or 1, for example admitted versus not
admitted. (ii) Multinomial logistic regression is used when the target variable can
have three or more possible classes which are not ordered, i.e. the types have no
quantitative significance, for example IPL cricket teams in India can be ‘Chennai Super
King’ versus ‘Rajasthan Royals’ versus ‘Kings Eleven’. (iii) Ordinal logistic regression
is used if target classes are ordered, for example a test score can be described as ‘very
poor’, ‘poor’, ‘good’ or ‘very good’. For this purpose, each category can be given a
score, such as 0, 1, 2 and 3.
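As a minimal sketch (not taken from the text), the following MATLAB commands fit a binomial logistic regression to the admission data, assuming the Training table of section 8.4.1 is available; the 0.72 cut-off used to binarize ChanceOfAdmit is an arbitrary illustrative choice:

load Training;
Training.Admitted = Training.ChanceOfAdmit >= 0.72;    % hypothetical binary target
mdl = fitglm(Training, 'Admitted ~ GREScore + TOEFLScore + CGPA', ...
    'Distribution', 'binomial', 'Link', 'logit');      % logistic link function
phat = predict(mdl, Training);                         % posterior probabilities in (0, 1)
yhat = phat >= 0.5;                                    % classify at the 0.5 threshold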
Now we come to linear classification. Here the characteristic function h takes
binary values and the shape of its curve is a step. Second, for linear regression
the characteristic function h takes real values and the shape of its curve is
a ramp. Finally, for logistic regression the characteristic function h takes
continuous values in (0, 1) and the shape of its curve is an S-shaped sigmoid.
Logistic regression can also be presented as an algorithm: a classification technique
based on supervised learning that lets us approximate uncertain posteriors with a
differentiable decision function. For logistic regression, w(0) can be
Figure 8.5. Logistic regression function for four inputs with inputs x0 = b and w0 = 1.
0. It makes sense to start at η = 0.5 because it is the most uncertain state. Moreover,
local minima are not the issue for logistic regression that they are for other models
such as neural networks, since the logistic optimization problem is smooth and convex.
Generally speaking, we use a blend of requirements as a stop condition; let us use the
ubiquitous threshold (Δw < threshold), for example. Stochastic gradient descent
is at times a stronger strategy: it is extremely efficient and often produces
excellent results. The conjugate gradient method is the best of the derivative-based
techniques; the idea is to exploit second-order information without explicitly
calculating the Hessian.
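The following sketch (an illustration under the stated assumptions, not code from the chapter) implements batch gradient descent on the logistic loss with the threshold stop condition discussed above; X is an n-by-d predictor matrix and y an n-by-1 vector of 0/1 labels:

function w = logit_gd(X, y, eta, tol)
% Batch gradient descent for logistic regression (illustrative sketch).
Xb = [ones(size(X, 1), 1) X];               % prepend a bias column
w = zeros(size(Xb, 2), 1);                  % start from w(0) = 0
for iter = 1:10000
    p = 1 ./ (1 + exp(-Xb*w));              % sigmoid outputs, 0.5 at w = 0
    dw = -eta * Xb' * (p - y) / numel(y);   % step along the negative gradient
    w = w + dw;
    if norm(dw) < tol                       % the ubiquitous threshold stop condition
        break;
    end
end
end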
In MATLAB based experimentation on the same dataset, figure 8.6 exhibits
different graphical results for the LR classifier. The responses have been plotted for
predictor versus residual, predicted response versus residual, true response versus
predicted response, etc. The confusion matrix obtained from the Evaluate model as a
.csv file, shown in table 8.2, also gives output in the form of average log loss,
precision and recall. In this way, one can use a logistic regression classifier to obtain
and validate the applicability of the algorithms used (figure 8.7).
classification algorithm in figure 8.8(b)–(c). Algorithm 2 gives the pseudocode for the
tree classifier. Initially, it begins by placing the most fitting attribute of the dataset
at the root of the tree, followed by splitting the training set into subsets such that
each subset comprises data with the same attribute value. This operation is applied to
each subset recursively until leaf nodes are found in all branches of the tree.
1. Arrange the fittest attribute using attribute selection measures (ASM), such as Gini index,
information gain or gain ratio to split the records of the dataset at the root of the tree.
2. Make that attribute a decision node and divide the training set into subsets.
3. Every subset comprises data with the same value for an attribute.
4. Repeat step 1 and step 2 on each subset until it finds leaf nodes in all the branches of the tree.
All the internal nodes in a decision tree act like decision nodes which contain
features or attributes, while leaf nodes give the response or outcome through
branches which act as decision rules (see figure 8.8(a) and (b)). A topmost decision
node is known as the root of the tree.
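A minimal sketch of growing such a tree in MATLAB (assuming the Training table of section 8.4.1; the option values mirror the settings reported later in table 8.4) is:

load Training;
% Grow a classification tree; 'gdi' selects Gini's diversity index as the
% attribute selection measure and MaxNumSplits bounds the number of splits.
tree = fitctree(Training, 'ChanceOfAdmit', ...
    'SplitCriterion', 'gdi', 'MaxNumSplits', 20);
view(tree, 'Mode', 'graph');   % tree diagram, as in figure 8.8(c)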
The attribute selection measure (ASM), or splitting rule, is a method for selecting the
splitting criterion that best partitions the data. It enables us to determine breakpoints
for the tuples at a given node. For a given dataset, ASM assigns a score to each feature
and chooses the feature with the best score as the splitting attribute. The information
gain, gain ratio and Gini index are some common choice criteria [22, 23].
Information gain is a contraction in entropy that measures the distinction between the
pre-split entropy and the dataset's mean post-split entropy, based on the given attribute
values. Suppose that $p_i$ is the probability that an arbitrary tuple of $D$ belongs to class
$C_i$. The mean quantity of information needed to identify the class tag of a tuple in $D$ is
$\mathrm{Info}(D)$, given by equation (8.3). In equation (8.4), $\mathrm{Info}_A(D)$ is the anticipated
information required to classify a tuple of $D$ based on the partitioning by attribute $A$,
$|D_j|/|D|$ characterizes the weight of the $j$th partition, and $v$ is the number of discrete
values of attribute $A$; the attribute $A$ with the peak information gain is chosen as the
dividing attribute at node $N$. Equation (8.5) determines $\mathrm{Gain}(A)$:
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i \tag{8.3}$$
$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \tag{8.4}$$
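A minimal sketch (not from the chapter) that evaluates equations (8.3)–(8.4) and the resulting gain for one discrete attribute might read:

function g = info_gain(labels, attr)
% Information gain obtained by splitting the class labels on attribute attr.
infoD = entropy_(labels);                     % Info(D), equation (8.3)
vals = unique(attr);
infoA = 0;
for j = 1:numel(vals)
    Dj = labels(attr == vals(j));             % jth partition
    infoA = infoA + numel(Dj)/numel(labels) * entropy_(Dj);   % equation (8.4)
end
g = infoD - infoA;                            % Gain(A), equation (8.5)
end

function h = entropy_(labels)
p = histcounts(categorical(labels)) / numel(labels);   % class probabilities
p = p(p > 0);
h = -sum(p .* log2(p));
end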
Figure 8.8. Classification tree algorithm: (a) sample rules; (b) sample tree diagram in BigML; and (c) sample
tree diagram in MATLAB (pruning level 9 of 13).
of any student, it has zero $\mathrm{Info}(D)$. Due to such tidy partitioning, it maximizes the
information gain $\mathrm{Gain}(A)$ and creates useless partitioning. The gain ratio
$\mathrm{GainRatio}(A)$ given in equation (8.7) can be used to resolve this problem; it
handles the bias issue by using the split information $\mathrm{SplitInfo}_A(D)$ of equation (8.6)
to normalize the information gain. For attribute $A$, the maximum value of the gain
ratio $\mathrm{GainRatio}(A)$ is chosen as the splitting attribute:
$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|} \tag{8.6}$$
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}. \tag{8.7}$$
The Gini index provides another way of splitting the decision tree. For a given
partition of the data $D$ into $D_1$ and $D_2$, the Gini index of $D$ can be estimated by
$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\mathrm{Gini}(D_2), \tag{8.8}$$
with
$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2. \tag{8.9}$$
1. Estimate the d-dimensional mean vectors for the distinct classes from the dataset.
2. Compute the scatter matrices (in-between-class and within-class scatter matrix).
3. Compute the eigenvectors (e1, e2, …, ed) and corresponding eigenvalues (λ1, λ2, …, λd) for the
scatter matrices.
4. Sort the eigenvectors by decreasing eigenvalues and choose k eigenvectors with the largest
eigenvalue to form a d × k-dimensional matrix W (every column represents an eigenvector).
5. Use this d × k eigenvector matrix to transform the samples onto the new subspace. This can be
summarized by the matrix multiplication: Y[n, k] = X[n, d] × W[d, k] (where X is an n × d-
dimensional matrix representing the n samples, and Y contains the transformed n × k-
dimensional samples in the new subspace).
Readers can also use the RSS feed by Raschka to go through each of the above
steps in detail using Python [27]. LDA is a classification technique that is easy and
efficient. Many extensions and variants to the method are available because it is easy
and well known. Some popular extensions lead to: (i) regularized discriminant
analysis (RDA), which introduces regularization in variance estimation which is
effectively covariance, moderating the impact of various factors on LDA; (ii) flexible
discriminant analysis (FDA), where nonlinear input components such as splines are
used; and (iii) quadratic discriminant analysis (QDA), in which, when there are
numerous input factors, each category utilizes its own variance or covariance
estimate.
The initial design was termed as discriminant analysis of the linear discriminant
or Fisher’s discriminant analysis. Multiple discriminant analysis has been linked to
the multiclass variant. All of these are now called linear discriminant analysis [28].
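In MATLAB, both the basic method and two of these variants are available through fitcdiscr; the following sketch (same assumptions as before, with an illustrative regularization value) fits an LDA model to the admission data:

load Training;
% 'linear' gives LDA and 'quadratic' gives QDA; Gamma in [0, 1] adds the
% covariance regularization used by RDA-style variants.
lda = fitcdiscr(Training, 'ChanceOfAdmit', ...
    'DiscrimType', 'linear', 'Gamma', 0.1);
err = resubLoss(lda);   % resubstitution classification error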
The kernel describes the similarity or measure of distance between new data and the
support vectors. The dot-product is the similarity measure used for linear SVM, or a
linear kernel, since the distance is a linear combination of the inputs. Other kernels,
such as polynomial kernels and radial kernels, can be used to transform the input space
into higher dimensions. This is called the kernel trick. It can be advantageous to use
more complex kernels, as they allow the lines separating the classes to be curved or
even more complex; this in turn can lead to more accurate classifiers. Further, SVM
classifiers can be divided into the following. (i) Polynomial kernel SVM, where
we can use a polynomial kernel instead of the dot-product, for instance
$K(x, x_i) = 1 + \bigl(\sum (x \times x_i)\bigr)^d$, and where the degree $d$ of the polynomial must be
supplied to the learning procedure by hand. When $d = 1$ this reduces to the linear
kernel. The polynomial kernel allows curved lines in the input space. (ii) Radial kernel
SVM, which is more complicated; for example, $K(x, x_i) = e^{-\gamma \sum (x - x_i)^2}$ with $\gamma$ as
some learning parameter. A worthy default value for $\gamma$ is 0.1, and $\gamma$ lies in $0 < \gamma < 1$.
The radial kernel is quite local and can generate complicated regions inside the feature
space, such as closed polygons in 2D space.
An optimization method must be used to solve the SVM model. A numerical
optimization method could be used to search for the hyperplane coefficients, but this is
inefficient and is not the strategy used in widely deployed SVM implementations such as
LIBSVM. You could use stochastic gradient descent if you implement the algorithm as
an exercise. Specialized optimization procedures exist that re-formulate the
optimization problem as a quadratic programming problem. The sequential minimal
optimization (SMO) approach is the most common technique for training an SVM: it
divides the problem into sub-problems that can be solved analytically rather than
numerically.
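A minimal sketch of both kernels in MATLAB (same assumptions as before; the binary target is the illustrative 0.72 cut-off used earlier, since fitcsvm is a two-class learner):

load Training;
y = Training.ChanceOfAdmit >= 0.72;                % hypothetical binary target
X = Training{:, {'GREScore', 'TOEFLScore', 'CGPA'}};
svmPoly = fitcsvm(X, y, 'KernelFunction', 'polynomial', ...
    'PolynomialOrder', 3);                         % curved class boundaries
svmRbf = fitcsvm(X, y, 'KernelFunction', 'rbf', ...
    'KernelScale', 1/sqrt(0.1));                   % scale s with gamma = 1/s^2 = 0.1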
Many methods are available to improve the efficiency and speed of nearest-neighbor
classification. One strategy is pre-sorting of the training set (using, for example,
kd-trees or Voronoi cells). Another alternative is to select a subset of the training
data so as to approximate the Bayes' ranking in accordance with the 1-NN rule
(through the selected subgroup). As k can then be restricted to 1 and redundant data
points removed from the training set, substantial speed gains can be achieved. These
methods of data editing can also enhance accuracy through the removal of
misclassified points.
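A minimal k-NN sketch in MATLAB (same assumptions as before; the settings mirror those reported in table 8.4):

load Training;
knn = fitcknn(Training, 'ChanceOfAdmit', ...
    'NumNeighbors', 10, 'Distance', 'euclidean', ...
    'DistanceWeight', 'squaredinverse');
cv = crossval(knn, 'KFold', 5);    % five-fold cross-validation
acc = 1 - kfoldLoss(cv);           % estimated accuracy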
8.6 Performance
The dataset has been converted to obtain the visualization of results in a simple
form, and this is the reason why the results are not up to the mark. Figure 8.9 shows
the prediction response in color coded format with respect to GRE versus
TOEFLScore, where the legends mean: ∙ = correct data and × = incorrect data.
Figure 8.9. Showing the prediction response in color coded format with respect to GRE versus TOEFLScore
(• = correct data and × = incorrect data).
In this figure, readers can compare the visuals of the decision tree, SVM, NN,
discriminant analysis and ensemble with the original data.
Similarly, figure 8.10 shows the prediction response in color coded format of two
classes with respect to GRE versus TOEFLScore. This figure exhibits the results of
scatter plots for the SVM classifier only. In this set of plots, the author has recorded
Figure 8.10. Showing the prediction response in color coded format of two classes with respect to GRE versus
TOEFLScore (• = correct data and × = incorrect data).
predictions for pairs of classes (0.3–1, 0.4–0.9, 0.5–0.8 and 0.6–0.7) so that
readers may be able to see the responses and count the correct and incorrect
responses as well. Readers can produce the plots for all the available classifiers. All
three tools discussed support different styles of scatter plots.
¹ The definitions used in this chapter have been taken from the website https://classeval.wordpress.com/introduction/basic-evaluation-measures/.
Table 8.3. Confusion matrix and performance measurements for different classifiers.
Negative (predicted): false negative (FN) and true negative (TN); total number of
non-matches = TN + FN; NPV = TN/(TN + FN); FPR = 1 − TNR.
N = FP + TN. (8.13)
Figure 8.11 gives the confusion matrix for (a) decision tree, (b) SVM, (c) NN, (d)
discriminant analysis, (e) ensemble and (f) logistic regression. This confusion matrix
is plotted for the number of observations or frequency of occurrence. Diagonal
responses have been shown in green, i.e. if the true class and predicted classes are the
same, and red observations show the true classes that were confused with predicted
classes. The confusion matrix for the number of observations for the logistic
regression algorithm has been taken from the AzureML tool.
Definition 1. Sensitivity, recall or true positive rate (TPR) 'is calculated as the number of
correct positive predictions divided by the total number of positives. ... The best
sensitivity is 1.0, whereas the worst is 0.0':
$$\mathrm{TPR} = \frac{\mathrm{TP}}{P} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{8.14}$$
Definition 2. Specificity or true negative rate (TNR) ‘is calculated as the number of
correct negative predictions divided by the total number of negatives. ... The best
specificity is 1.0, whereas the worst is 0.0’:
$$\mathrm{TNR} = \frac{\mathrm{TN}}{N} = \frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}}. \tag{8.15}$$
Definition 3. Miss rate or false negative rate (FNR) is the ratio between false
negative and false negative + true positive predictions:
$$\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}}. \tag{8.16}$$
$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}. \tag{8.17}$$
$$\mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}. \tag{8.18}$$
Definition 6. False discovery rate (FDR) ‘is calculated as the number of false
positive predictions divided by the total false positive and true positive predictions.
The best FDR is 0.0, while the worst is 1.0. It can also be calculated as (1 − PPV)’:
² Definitions are taken from https://classeval.wordpress.com/introduction/basic-evaluation-measures/.
$$\mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TP}} = 1 - \mathrm{PPV}. \tag{8.19}$$
Definition 7. Fall-out or false positive rate (FPR) ‘is calculated as the number of
incorrect positive predictions divided by the total number of negatives. The best false
positive rate is 0.0 whereas the worst is 1.0’:
$$\mathrm{FPR} = \frac{\mathrm{FP}}{N} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = 1 - \mathrm{TNR}. \tag{8.20}$$
Definition 8. Informedness is the distance from (i.e. measured perpendicular to) the
random line joining coordinate (0,0) and coordinate (1,1) in the ROC. You can then
use markedness as the second metric to identify your classification system’s
general value:
Definition 9. Error rate (ERR) ‘is calculated as the number of all incorrect
predictions divided by the total number of the dataset. The best error rate is 0.0,
whereas the worst is 1.0’:
$$\mathrm{ERR} = \frac{\mathrm{FP} + \mathrm{FN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} = \frac{\mathrm{FP} + \mathrm{FN}}{P + N}. \tag{8.23}$$
Definition 10. Accuracy (ACC) ‘is calculated as the number of all correct predic-
tions divided by the total number of the dataset. The best accuracy is 1.0, whereas
the worst is 0.0. It can also be calculated by 1 – ERR’:
$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{P + N} = 1 - \mathrm{ERR}. \tag{8.24}$$
Figure 8.12 shows the confusion matrix for PPV versus FDR, and figure 8.13
shows the confusion matrix for TPR versus FNR. We only explain the confusion
matrix for four algorithms and the remaining two algorithms are left as an exercise
for the reader. The matrix in figure 8.12 determines the horizontal characteristics of
the predicted classes. The green result is known as the positive predictive value (see
definition 5), while the false discovery rate (see definition 6) is represented by the red
shaded results. The matrix in figure 8.13 determines the vertical characteristics of
true classes. The green shaded result is known as the TPR (see definition 1), while
FNRs (see definition 3) are in red.
8.7 Conclusion
This chapter covered an introduction and background to LCS. The focus of this
chapter is to understand the LCS practically, therefore, MATLAB, BigML and
AzureML have been discussed to understand how the results from learning
classifiers can be obtained. The dataset and its processing, such as splitting into
training and testing datasets has been covered in detail with MATLAB code as well
as AzureML procedure. Six learning classifiers, namely logistic regression classifiers,
decision tree classifiers, discriminant analysis classifiers, support vector machines
classifiers, nearest neighbor classifiers and ensemble classifiers have been discussed.
The logistic regression classifier is included in MATLAB as an independent classifier
app, so the author has used AzureML, as well as MATLAB, to address the various
performance metrics. The remaining five classifiers have been tested using MATLAB's
Classification Learner app.
In the latter half of this chapter, different visualizations of performance have been
shown, which include scatter plots, prediction plots, tree diagrams, code snippets,
confusion matrices, parallel plots and receiver operating characteristics. Table 8.4 at
Classifier                Accuracy     Prediction speed (obs/s)   Training time (s)   Remarks
Decision trees            52.8         46 000                     0.89                #Split 20, Gini's diversity index
Discriminant analysis     58.2         29 000                     0.69                Linear, full covariance exponential kernel, MSE = 0.0048
Logistic regression       0.069 RMSE   12 000                     15.41               MAE = 0.0522, R squared = 0.80, basis function is constant
Support vector machines   60.8         5200                       1.93                Linear kernel, box constraint level = 1, multiclass method one-vs-one
Nearest neighbor          57           17 000                     2.64                N = 10, Euclidean distance, squared inverse distance weight
Ensemble                  59           2400                       3.96                Subspace ensemble, dimension 4; discriminant learning, 30 learners
the end of this chapter has been provided on the basis of experimentation carried out
by the author for the readers to justify their choices of algorithms. The LCS system
of MATLAB is a potent tool for researchers to obtain results quickly, and hence the
comparison of different classifiers is meaningful in the chapter. BigML is a purely
automated tool with almost negligible user intervention. AzureML is a robust tool
with a high degree of flexibility.
Acknowledgments
The author is grateful to his colleague Dr S P Dubey from Rungta College of
Engineering and Technology, Bhilai, India, who provided insight and expertise that
greatly assisted in the research, although they may not agree with all of the
interpretations/conclusions of this chapter. He is immensely grateful to his mentor
Dr G R Sinha, who always motivates him for research and development, and
academic excellence.
References
[1] Holland J 1976 Adaptation Progress in Theoretical Biology ed R Rosen and F M Snell
(New York: Academic)
[2] Goldberg D E and Holland J H 1988 Genetic algorithms and machine learning Mach. Learn.
3 95–9
[3] Urbanowicz R J and Moore J H 2009 Learning classifier systems: a complete introduction,
review, and roadmap J. Artif. Evol. Appl. 2009 736398
[4] Butz M V 2015 Learning classifier systems Springer Handbook of Computational Intelligence
(Berlin: Springer) pp 961–81
[5] Acharya M S, Armaan A and Antony A S 2019 A comparison of regression models for
prediction of graduate admissions ICCIDS 2019: IEEE Int. Conf. on Computational
Intelligence in Data Science (Chennai, India)
[6] Holland J H et al 2000 What Is a Learning Classifier System? Learning Classifier Systems
(Berlin: Springer)
[7] Wilson S W 1995 Classifier fitness based on accuracy Evol. Comput. 3 149–75
[8] Riolo R L 1991 Lookahead planning and latent learning in a classifier system Proceedings of
the First International Conference on Simulation of Adaptive Behavior on From Animals to
Animats (Cambridge, MA: MIT Press), pp 316–26
[9] Stolzmann W 1998 Anticipatory classifier systems Genet. Program. 98 58–64
[10] Holland J H 1983 Escaping brittleness Proc. Second Int. Workshop on Machine Learning
[11] Robertson G G and Riolo R L 1988 A tale of two classifier systems Mach. Learn. 3 139–59
[12] Holland J H, Holyoak K J, Nisbett R E and Thagard P R 1989 Induction: Processes of
Inference, Learning, and Discovery (Cambridge, MA: MIT Press)
[13] Holmes J H, Durbin D R and Winston F K 2000 The learning classifier system: an
evolutionary computation approach to knowledge discovery in epidemiologic surveillance
Artif. Intell. Med. 19 53–74
[14] Zang Z, Li D and Wang J 2015 Learning classifier systems with memory condition to solve
non-Markov problems Soft Comput 19 1679–99
[15] Zebin T, Scully P J and Ozanyan K B 2017 Inertial sensor based modelling of human activity
classes: feature extraction and multi-sensor data fusion using machine learning algorithms
eHealth 360° (Cham: Springer), pp 306–14
[16] Noor N Q M, Sjarif N N A, Azmi N H F M, Daud S M and Kamardin K 2017 Hardware
Trojan identification using machine learning-based classification J. Telecom. Electron.
Comput. Eng. (JTEC) 9 23–7
[17] Maleki M, Manshouri N and Kayikçioğlu T 2017 Application of PLSR with a comparison
of MATLAB classification learner app in using BCI 2017 25th Signal Processing and
Communications Applications Conf. (SIU)
[18] Nagwanshi K K and Dubey S 2018 Statistical feature analysis of human footprint for
personal identification using BigML and IBM Watson analytics Arab. J. Sci. Eng. 43
2703–12
[19] Kessel M, Ruppel P and Gschwandtner F 2010 BIGML: a location model with individual
waypoint graphs for indoor location-based services PIK-Praxis Informationsverar.
Kommunikation 33 261–7
[20] Zainudin Z and Shamsuddin S M 2016 Predictive analytics in Malaysian dengue data from
2010 until 2015 using BigML Int. J. Adv. Soft. Comput. Appl. 8 18–30
[21] Chappell D 2015 New White Paper: Introducing Azure Machine Learning A Guide for
Technical Professionals, Microsoft Corporation http://davidchappellopinari.blogspot.com/
2015/08/new-whitepaper-introducing-azure.html
[22] Mántaras R L D 1991 A distance-based attribute selection measure for decision tree
induction Mach. Learn. 6 81–92
[23] Liu W Z and White A P 1994 The importance of attribute selection measures in decision tree
induction Mach. Learn. 15 25–41
[24] Navlani A 2018 Decision Tree Classification in Python https://www.datacamp.com/com-
munity/tutorials/decision-tree-classification-python
[25] Hallinan J S 2012 Data mining for microbiologists Systems Biology of Bacteria ed C
Harwood and A Wipat (Methods in Microbiology vol 39) (Amsterdam: Elsevier) pp 27–79
[26] Tharwat A, Gaber T, Ibrahim A and Hassanien A E 2017 Linear discriminant analysis: a
detailed tutorial AI Commun. 30 169–90
[27] Raschka S 2014 Linear Discriminant Analysis—Bit by Bit https://sebastianraschka.com/
Articles/2014_python_lda.html
[28] Sawla S 2018 Linear Discriminant Analysis https://medium.com/@srishtisawla/linear-
discriminant-analysis-d38decf48105
[29] Li M, Zhen L and Yao X 2017 How to read many-objective solution sets in parallel
coordinates IEEE Comput. Intell. Mag. 12 88–100
[30] Hussain A and Vatrapu R 2014 Social data analytics tool (SoDaTo) Advancing the Impact of
Design Science: Moving from Theory to Practice ed M C Tremblay, VanderMeer D,
Rothenberger M, Gupta A and Yoon V (Cham: Springer)
IOP Publishing
Chapter 9
A case study on the implementation of six sigma
tools for process improvement
Bonya Mukherjee, Rajesh Chamorshikar and Subrahmanian Ramani
A blast furnace (BF) is one of the primary processes for the production of crude steel
in an ore based integrated steel plant (ISP). In a BF, iron ore is reduced to pure
molten iron by supplying hot air blasts through the tuyeres at the bottom of the
furnace, just above the hearth. A redox reaction takes place due to heat and mass
transfer and, along with the molten iron and slag that are tapped from the bottom of
the furnace, BF gas is emitted from the top of the BF as a by-product of the smelting
process. The raw BF gas carries a dust load of 25–40 g Nm−3, which is removed in
a gas cleaning plant (GCP) by spraying with water at high pressure through nozzles
located at various levels of a gas scrubber. The water removes the dust from the BF
gas and exits from the bottom of the scrubber in the form of slurry, which is cooled in a
cooling tower, sent to radial settling tanks (RSTs) for removal of dust and
then recycled back to the scrubber for cleaning of BF gas. The clean BF gas is then
fed into the gas network for consumption as fuel gas in various units of the plant.
Since April 2012, the ISP in question was facing the problem of a high concentration
of suspended solids in the clean water supply of the GCPs of BFs. The average
suspended solids concentration was 277 ppm against the specification of 100 ppm,
which was adversely affecting the functioning of the GCPs, leading to insufficient
removal of dust from BF gas. There were a number of factors that could have
impacted the concentration of total suspended solids (TSS) in the supply water to
GCPs. These factors were: insufficient removal of dust from BF gas in the dust
catchers of BFs; a high concentration of TSS in the fresh make-up water supply
itself; and insufficient and inefficient settling of dust and suspended solids in
the RSTs.
To narrow down the probable root cause/causes within the shortest possible time,
the authors decided to take up the above problem as a six sigma project. Based on
the past year’s performance data for the BF dust catchers, GCPs and RSTs, various
six sigma tools and techniques were used to find and filter out the critical inputs that
influence the output to a large extent. Thus, it was possible to quickly carry out a
root cause analysis (RCA) and it was found that the problem was mainly due to
certain operational lacunae in the RSTs and necessary remedial actions were taken.
The TSS levels in the supply water were effectively brought down to less than 100
ppm, and have been maintained at those levels since April 2013. The financial
benefits have been to the tune of Rs. 1 Crore per annum, as validated by the finance
department. Other benefits have included the reduced choking of nozzles/throttle
assembly and U-seals, as well as gas lines, reduced frequency of unscheduled/
emergency breakdowns/shutdowns in GCPs, and better removal of dust from BF
gas, leading to reduced specific fuel consumption and better furnace productivity at
the BF gas consumer end. The non-tangible benefits have been a renewed motivation
in the workforce, due to the elimination of non-value adding activities such as
attending to repetitive unscheduled repairs and sudden breakdowns.
9.1 Introduction
9.1.1 Generation and cleaning of BF gas
BF is one of the primary processes for the production of crude steel in an ore based
ISP. In a BF, counter current heat and mass exchange takes place for smelting of iron
ore using coal and coke to produce hot metal or molten iron in the liquid state. The
solid raw materials such as iron ore lumps, sinter and coke, and flux materials such as
limestone are charged from the top and a hot air blast is supplied through the tuyeres at
the bottom of the furnace, just above the hearth. A redox reaction takes place due to
heat and mass transfer and along with the molten iron and slag that are tapped from
the bottom of the furnace, BF gas is emitted from the top of the BF as a by-product of
the smelting process. This gas has a calorific value of 780–800 kcal Nm−3 and is
utilized as one of the primary fuel gases in an ISP. The raw BF gas carries a dust load
of 25–40 g Nm−3. This dust content has to be removed or minimized before the raw
BF gas can be used as a fuel in the various furnaces and stoves of an ISP.
The primary function of the GCPs of BF is to remove the suspended dust particles
from the raw BF gas, before it is supplied to the consumer as fuel. The raw BF gas
emerging from the BFs has a dust load of around 25–40 g Nm−3. More than 50%
of the dust is removed in the dry dust catcher. The raw BF gas containing the
remaining 10–15 g Nm−3 of dust then goes to the scrubber where the gas flows
from bottom to top and water is sprayed on it from nozzles at four levels at a rate of
800–1000 m3 h−1. After the scrubber, the raw BF gas passes through a set of parallel
venturi atomizers where again water is sprayed on the gas, to remove the remaining
dust. The dust content of clean BF gas is less than 5 mg N−1 m−3. The clean BF gas
is then fed into the gas network for consumption as fuel gas in various units of
the plant.
The authors decided to use a six sigma (DMAIC) process to bring about process
improvement by reducing the concentration of TSS in the supply water, which was
adversely affecting the operation and equipment health of the GCPs of the BFs.
Various six sigma tools and techniques were used to find and filter out the critical
inputs that influenced the output to a large extent.
Figure 9.1. Monthly TSS (in ppm) in the supply water of GCPs 1–6, December 2011 to December 2012 (monthly averages ranging from 58.0 to 380.0 ppm).
Figure 9.2. Graphical summary for the period April 2012–December 2012 before the start of the project (minimum 133.75, 1st quartile 153.75, median 210.00, 3rd quartile 350.63, maximum 678.00 ppm; 95% confidence interval for the mean: 139.74–404.88; for the median: 148.99–366.61).
However, a graphical summary (see figure 9.3) of the TSS data for the four
months before April 2012, i.e. December 2011 to March 2012 shows that the average
TSS was not only below 100 ppm, but also the distribution was normal, i.e. the
process capability was within the limits.
In a hypothesis test, a P-value establishes the significance of the results.
Hypothesis tests are used to establish the validity of a claim made about a dataset;
this claim is known as the null hypothesis.
The P-value is a number between 0 and 1 and can be interpreted as follows (a short
illustration is given after this list):
a. A P-value of less than 0.05 is strong evidence against the null hypothesis, and
hence the null hypothesis can be rejected.
b. A large P-value of more than 0.05 is weak evidence against the null
hypothesis, so the null hypothesis cannot be rejected outright and can be
further investigated statistically for validity.
c. P-values very close to 0.05 can go either way.
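A minimal sketch of such a test in Python is given below; the TSS samples and the threshold of 100 ppm are illustrative assumptions, not values taken from the project data.

# One-sample t-test of the null hypothesis "mean TSS = 100 ppm",
# applied to hypothetical monthly TSS values (ppm).
import numpy as np
from scipy import stats

tss = np.array([133.75, 148.0, 210.0, 350.0, 380.0, 321.25, 222.8])
t_stat, p_value = stats.ttest_1samp(tss, popmean=100.0)
print(f"t = {t_stat:.2f}, P-value = {p_value:.4f}")
# a P-value below 0.05 would reject the null hypothesis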
9.3.2 Measurement
In the measurement phase of a lean six sigma improvement process, the current
performance of the process, the magnitude of the problem and the probable factors
affecting the problem are measured.
Figure 9.3. Graphical summary for the period December 2011–March 2012 (N = 4; mean 70.483, standard deviation 15.197, minimum 58.000, median 65.800, maximum 92.333 ppm; 95% confidence interval for the mean: 46.302–94.664).
Various six sigma tools were used in the measurement phase, namely the process
flow diagram, time series plot, regression and correlation analysis, I/O worksheet
and cause and effect (C and E) matrix, to identify the causes of high TSS in the supply water.
Regression analysis (fitted line plot): Fitted line plots were drawn between the
measured TSS of the return water from GCPs 1–6 to RSTs 1 and 2 and the return
water to RSTs 3, 4 and 5 to check whether the TSS of the return water had any
bearing on the TSS of the supply water. A close correlation was found between the
TSS of the return water before RSTs and supply water after RSTs, as is evident from
figures 9.4 and 9.5.
A fitted line plot is a statistical technique for regression analysis (linear) to find the
best fit line for a set of data points. This is used when experimental data or historical
process data are plotted and the data points are scattered across the plot area.
The formula for fitted line plot or regression line is y = mx + b where m is the
slope and b is the y-intercept.
R2 or the coefficient of determination is a statistical measure of how close the data
are to the fitted line and how much the variation in dependent variables can be
explained by the variation in the independent variable.
An R2 value of 50% and above usually indicates a reasonably strong correlation between the
dependent and independent variables. A low R2 value usually fails to indicate any
significant correlation between them.
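As an illustration, the following Python sketch fits such a regression line and reports R2; the paired TSS readings are hypothetical stand-ins for the project data.

# Fitted line (least squares) between hypothetical return-water and
# supply-water TSS readings (ppm); R^2 is r**2 from scipy.stats.
import numpy as np
from scipy import stats

tss_return = np.array([1200, 1500, 1800, 2100, 2400, 2600])
tss_supply = np.array([150, 210, 280, 340, 430, 520])

m, b, r, p_value, std_err = stats.linregress(tss_return, tss_supply)
print(f"fitted line: y = {m:.4f}x + {b:.2f}")
print(f"R^2 = {r**2:.3f}, P-value = {p_value:.4f}")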
Figure 9.4. Fitted line plot between the average TSS of supply water and the average TSS of return water to RSTs 1 and 2, with an R2 value of 46.3% (adjusted R2 38.7%).
Figure 9.5. Fitted line plot between the average TSS of supply water and the average TSS of return water to RSTs 3, 4 and 5, with an R2 value of 55.9% (adjusted R2 49.6%).
Figure 9.6. Fitted line plot between the average TSS of supply water and the average number of dump cars from the dust catchers, with an R2 value of 11.8% (adjusted R2 0.0%).
Figure 9.7. Fitted line plot between the average TSS of return water to RSTs 1 and 2 and the average number of dump cars from the dust catchers, with an R2 value of 1.5% (adjusted R2 0.0%).
Figure 9.8. Fitted line plot between the average TSS of return water to RSTs 3, 4 and 5 and the average number of dump cars from the dust catchers, with an R2 value of 0.8% (adjusted R2 0.0%).
All three fitted line plots (see figures 9.6, 9.7 and 9.8) show that there is little or
no correlation between the average number of dump cars filled from the dust
catchers and the average TSS of either the return water or the supply water. Thus, the
dust catchers were eliminated from the scope of this project, and only the closed loop
circuit of water supply to the scrubbers of GCPs 1–6 was taken into consideration
for identifying critical independent variables that could significantly affect the
dependent variable, namely the TSS of the supply water.

Figure 9.9. Process flow diagram of the closed loop treatment circuit for GCP 1–6 supply water for the removal of total suspended solids.
Process flow diagram (PFD): Using a PFD, we created a visual representation of
the closed loop process flow: the supply of water to the GCP scrubbers, the exit of dust
laden return water to the common slime channel, its distribution to the different RSTs,
the settling and removal of the dust, the addition of make-up water to the overflow
water and the pumping back of the supply water to the scrubbers. The PFD helped us to
identify loops, hidden factories and non-value adding activities in the process (see
figure 9.9).
A process is a combination of sequential or closed loop activities (or sub-
processes) that culminate in an outcome in the form of a product or service. All
activities in a process can be categorized, in six sigma parlance, into the broad
categories of:
1. Value added: This activity in the process adds form, function and value to the
end product or service and is of value to the customer.
2. Non-value added: This activity in the process does not add any form, function
or value to the end product or service and can be eliminated.
3. Non-value added but necessary: This activity by itself may not add any value
but is necessary in the final sequence of activities in order to add value to the
final product, or may be deemed as value added for another concurrent,
dependent or symbiotic sub-process that leads to the final product or service.
A hidden factory refers to activities or sub-processes that reduce the quality or
efficiency of an operation or business process, and that may result in waste or poor
work because they remain hidden within day-to-day operations.
The hidden factories can be in the form of:
a. Production loss.
b. Compromised quality.
c. Reduced availability of equipment.
d. Delayed delivery.
Identifying the non-value added activities and the hidden factories can help to address
the operational issues that can adversely affect the quality of the process outcome.
Mapping of the process flow diagram of the closed loop treatment circuit for the
supply water of BF GCPs 1–6 for removal of TSS helped to identify two hidden
factories that were adversely affecting the TSS levels in the supply water.
The two existing hidden factories identified were:
a. The jamming of a valve plate in the common slime channel that was causing
uneven distribution of water into the RSTs, thereby increasing the water load
in RSTs 1 and 2 beyond their capacities, and reducing the water load in
RSTs 3, 4 and 5. This was in turn adversely affecting the TSS settling
capability of the RSTs 1 and 2.
b. Further, it was also found that RST 1 was not working properly, thereby
again impacting the removal of TSS from the return slime water.
Figure 9.10. Fishbone diagram to identify possible causes for high TSS in the supply water.
Failure modes and effects analysis: The input for FMEA was taken from the detailed
process map, the cause and effects matrix and the fishbone diagram. The important
processes and their outputs were tabulated under the process function column. These
outputs were analyzed for their modes of failure and their effects on the primary
metric. The risk priority numbers (RPNs) were calculated according to severity,
occurrence and detection. The processes with the highest RPNs were taken up for
further analysis (see table 9.3 for the FMEA).
After bringing about the above mentioned improvements, data were collected for the
TSS of return water to RSTs 1 and 2 and RSTs 3, 4 and 5, as well as the TSS of the
supply water at the pump house outlet, one day per week over a period of
two months. On the basis of the new data on the TSS of the supply water, a capability
analysis of the process was carried out, as shown in figure 9.11. With Cp and Pp values of
more than 1, the process capability was shown to have improved.
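A minimal sketch of the capability indices is given below; the weekly TSS values are hypothetical, and the within/overall distinction of Minitab-style output is collapsed into a single sample estimate of the standard deviation.

# Process capability indices for TSS with LSL = 0 and USL = 100 ppm.
import numpy as np

tss = np.array([58.0, 63.0, 69.0, 72.0, 75.0, 80.0, 73.0])  # hypothetical data
lsl, usl = 0.0, 100.0
mean, sigma = tss.mean(), tss.std(ddof=1)

cp = (usl - lsl) / (6 * sigma)       # potential capability
cpu = (usl - mean) / (3 * sigma)     # upper one-sided index
cpl = (mean - lsl) / (3 * sigma)     # lower one-sided index
cpk = min(cpu, cpl)                  # penalizes off-centre processes
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")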
Table 9.5. Why-why for the incorrect ratio of water and polyelectrolyte.
Table 9.7. Why-why for the deposits of mud and slabs in the slime channel.
Box plot analysis was also carried out for three sets of TSS data:
a. Average TSS of supply water for the period December 2011 to March 2012,
when the TSS value was meeting minimum requirements: TSS average
supplied—ok.
b. Average TSS of supply water for the period April 2012 to December 2012,
when TSS values were not meeting minimum requirements: TSS average
supplied—not ok.
c. Average TSS of the supply water for the period April 2013 to May 2013,
after carrying out the six sigma improvements: TSS average supplied—
improved.
Figure 9.11. Capability analysis of TSS average supplied for the period April–May 2013, post improvement (LSL = 0, USL = 100 ppm; sample mean 70.0, N = 7; within capability Cp = 1.10, Cpk = 0.66; overall capability Pp = 1.27, Ppk = 0.76).
Figure 9.12. Box plot analysis of the TSS of the supply water: historical, skewed and improved.
Figure 9.13. Probability plot of TSS average supplied: historical versus improved.
9.3.4 Control
Various control measures were adopted and practised to sustain the improved
process, and the same activities are periodically evaluated to reduce breakdowns.
The control plan is shown in table 9.10.
9.4 Conclusion
9.4.1 Financial benefits
The forecast financial benefit, on account of savings from reduced breakdowns
and reduced replacement of spares, as validated by the finance department, was
Rs. 0.992 crore per annum, and the actual savings from March 2013 to May 2013
were Rs. 24.8 lakhs.
5. The lifespan of the throttle assemblies (septum valves), nozzles, gate valves
and overflow throttles of the scrubber and atomizers will increase.
6. The BF gas leakage problem will be reduced. The safety of GCP operators as
well as the area surrounding GCPs will be ensured.
7. Better removal of dust from BF gas, causing less choking of valves and
burners at the consumer end and leading to reduced specific fuel consumption
and better furnace productivity.
IOP Publishing
Chapter 10
Performance evaluation and measures
K A Venkatesh and N Pushkala
Measuring performance is an art, and identifying the right tool to evaluate
performance is another Herculean task. Philosophically, anything and everything is
perception. Evaluating performance in terms of the perception of the promoters,
the internal customers/employees or the end users are different tasks. The goal of
this chapter is to introduce a tool which can address them all. In this chapter, we
introduce an elegant non-parametric tool, called data envelopment analysis (DEA),
in which the choice of inputs and outputs is based on the needs of the evaluator.
Among the many available tools, the growth of the DEA literature is unmatched. DEA is
an extension of linear programming, in which one solves a number of
linear programming problems in order to obtain relative efficiency scores. The
chapter first introduces the various paradigms in performance measurement for
decision making and ends with a discussion of the R tool for DEA.
Definitions
1. Let X be a set. A fuzzy set A′ is given by A′ = {(x, μA′(x)) ∣ x ∈ X}, where
μA′(x): X → [0, 1] is the membership function.
2. The support of the fuzzy set A′ is given by supp(A′) = {x ∈ X ∣ μA′(x) > 0}.
3. The α-cut of a fuzzy set A′ is defined as A′(α) = {x ∈ X ∣ μA′(x) ⩾ α},
∀ α ∈ [0, 1].
4. The height of A′ is given by h(A′) = supx∈X μA′(x). If h(A′) = 1, then the
fuzzy set is said to be normal, otherwise it is called subnormal.
Operations
The fundamental arithmetic operations, addition, subtraction, multiplication and
division, are obtained based on the extension principle and α-cut arithmetic.
Extension principle
Let a′, b′ be any two fuzzy numbers and let c be a particular value. The fundamental
operations are defined as:
1. μa′+b′(c) = supx,y∈X {min(μa′(x), μb′(y)) ∣ x + y = c}.
2. μa′−b′(c) = supx,y∈X {min(μa′(x), μb′(y)) ∣ x − y = c}.
α-cut arithmetic
Let r, s and t denote the infimum, the mode and the supremum of the two fuzzy numbers
a′ = [ar, as, at] and b′ = [br, bs, bt], respectively. Using α-cuts, one can define
the fundamental operations as follows (a small computational sketch follows the list):
1. a′(α) + b′(α) = [ar(α) + br(α), at(α) + bt(α)].
2. a′(α) − b′(α) = [ar(α) − bt(α), at(α) − br(α)].
3. a′(α) × b′(α) = [A, B], where A = min{ar(α)br(α), ar(α)bt(α), at(α)br(α), at(α)bt(α)}
and B = max{ar(α)br(α), ar(α)bt(α), at(α)br(α), at(α)bt(α)}.
4. a′(α)/b′(α) = [ar(α), at(α)] · [1/bt(α), 1/br(α)], provided 0 ∉ [br(α), bt(α)].
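The following Python sketch implements these interval operations for triangular fuzzy numbers; the numbers used are purely illustrative.

# Alpha-cut interval arithmetic for triangular fuzzy numbers
# a' = [a_r, a_s, a_t] (infimum, mode, supremum).
def alpha_cut(a, alpha):
    r, s, t = a
    return (r + alpha * (s - r), t - alpha * (t - s))

def add(ia, ib):
    return (ia[0] + ib[0], ia[1] + ib[1])

def sub(ia, ib):
    return (ia[0] - ib[1], ia[1] - ib[0])

def mul(ia, ib):
    p = [ia[0] * ib[0], ia[0] * ib[1], ia[1] * ib[0], ia[1] * ib[1]]
    return (min(p), max(p))

a, b = (1.0, 2.0, 3.0), (2.0, 4.0, 5.0)
ia, ib = alpha_cut(a, 0.5), alpha_cut(b, 0.5)
print(add(ia, ib), sub(ia, ib), mul(ia, ib))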
Let a1, a2, …, an be the attributes, let their relative importance be given by w1, w2, …,
wn, and let the pairwise comparisons be given in matrix form, called an association
matrix,

⎡ a11 ⋯ a1n ⎤
A = ⎢ ⋮ ⋱ ⋮ ⎥,
⎣ an1 ⋯ ann ⎦

where aij = 1/aji and aij · ajk = aik. Generally the ratio wi/wj is unknown; the
problem of AHP is to determine aij = wi/wj. Let W be the weight matrix given as

⎡ w1/w1 ⋯ w1/wn ⎤
W = ⎢ ⋮ ⋱ ⋮ ⎥,
⎣ wn/w1 ⋯ wn/wn ⎦

where w = [w1, w2, …, wn]T is a column vector. The system can be written as
(W − nI)w = 0. One can employ an eigenvalue method to solve the system
(W − nI)w = 0, and then obtain the comparative weights by finding the eigenvalue λ
of A that is largest in magnitude. For the consistency test we introduce two more
indices, called the consistency index (CI) and the consistency ratio (CR). The index CI is
given by CI = (λ − n)/(n − 1) and CR is given by CR = CI/(random CI).
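A minimal sketch of this eigenvalue method is given below, with an illustrative 3 × 3 comparison matrix; the random CI values are the commonly tabulated Saaty indices, included here as an assumption.

# AHP weights from the principal eigenvector, with CI and CR.
import numpy as np

RANDOM_CI = {3: 0.58, 4: 0.90, 5: 1.12}   # tabulated random indices (assumed)

def ahp_weights(A):
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(np.abs(eigvals))        # eigenvalue largest in magnitude
    lam = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # normalized comparative weights
    ci = (lam - n) / (n - 1)              # consistency index
    cr = ci / RANDOM_CI[n]                # consistency ratio
    return w, ci, cr

A = np.array([[1.0, 3.0, 5.0],            # pairwise comparisons, a_ij = 1/a_ji
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, ci, cr = ahp_weights(A)
print(w, ci, cr)                          # CR < 0.1 is conventionally consistent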
wi − Fuij (α )wj ⩽ 0.
The above two inequalities can be written in matrix notion as Sw ⩽ 0, where S is the
matrix of order 2m × n.
The linear membership function used to measure the consistency of the interval
judgement is

μt(St w) = 1 − (St w)/Tt if St w ⩽ Tt, and 0 otherwise,

where Tt is the tolerance parameter.
Definitions
1. For a given input i, if the system produces an output o, then the production
possibility set is given by T = {(i, o) ∣ i can produce o}. That is, for the
amount of input i, the system/firm produces the amount of output o. Note
that the inputs may be resources such as raw material and human resources,
and the outputs may be services, numbers of transactions and so on.
2. The Farrell input efficiency tells us the amount of reduction in input i that is
still able to produce the same amount of output o. It can be defined as
Fie = min{k ∣ ki can produce o}, where k ⩾ 0.
3. The Farrell output efficiency tells us the amount of increase in the output o, for
the same amount of input i, and is given by Foe = max{l ∣ i can produce lo},
where l ⩾ 0.
4. Let (i1, o1) and (i2, o2) be members of the production possibility set. Then
(i2, o2) dominates (i1, o1) iff i2 ⩽ i1, o2 ⩾ o1 and (i1, o1) ≠ (i2, o2).
5. A pair (i, o) is efficient in the production possibility set iff there is no member
(il, ol) ∈ T that dominates (i, o).
Using Farrell efficiency scores, the ranking of organizations is determined. There are
other methods to find efficiency scores based on distance, such as the Shephard distance.
In the literature there are also methods based on the simultaneous reduction of inputs
and expansion of outputs to obtain the efficiency scores. The efficiency obtained
following Farrell is called technical efficiency (TE). If we replace the input vector with a
cost vector and carry out the efficiency computation, the resulting measure is
called cost efficiency (CE); cost efficiency amounts to choosing the right candidate at
minimal cost. A further measure is allocative efficiency (AE), which can be
computed as AE = CE/TE. In the same way one can define revenue efficiency
and profit efficiency.
Based on the need, one can construct the production possibility set in terms of a
production function or a cost function with constant, increasing or decreasing returns
to scale (CRS, IRS, DRS), free disposal hull (FDH) or variable returns to scale (VRS).
ui ⩾ 0 (1 ⩽ i ⩽ n) and vj ⩾ 0 (1 ⩽ j ⩽ m),
where yik denotes the output k produced by DMU i, xij denotes the input j used by
DMU i, and ui and vj are the weights applied to outputs and inputs, respectively. The
objective of this fractional linear programming problem (10.2) is to obtain the
weights ui, vj which maximize the efficiency ratio of DMU0, the DMU being evaluated.
The model (10.2) can be converted to a multiplier model using the CCR
transformation:

Maxu,v z = u1y1,j + u2y2,j + … + unyn,j. (10.3)
The data for example 10.1, two inputs and three outputs for three DMUs, are:

DMU    Input 1  Input 2  Output 1  Output 2  Output 3
DMU1   5        14       9         4         16
DMU2   8        15       5         7         10
DMU3   7        12       4         9         13

so that, when evaluating DMU1, the input normalization constraint is

5v1 + 14v2 = 1.
∑j=1…k yrj λj ⩾ yr0, 1 ⩽ r ⩽ m
λj ⩾ 0, ∀ j.
The dual model (10.4) is always feasible, with z∗ = 1, λ0∗ = 1 and λj∗ = 0 for j ≠ 0. This
process has to be repeated as many times as there are DMUs.
Note that the optimal solutions occur on the boundary points and not all
boundary points are efficient. In such a scenario, we employ the modified model
(10.5) as
Max ∑i=1…n si− + ∑r=1…m sr+, (10.5)
subject to
∑j=1…k xij λj + si− = z xi0, 1 ⩽ i ⩽ n
∑j=1…k yrj λj − sr+ = yr0, 1 ⩽ r ⩽ m
λj ⩾ 0, 1 ⩽ j ⩽ k
si− ⩾ 0, 1 ⩽ i ⩽ n
sr+ ⩾ 0, 1 ⩽ r ⩽ m,
where sr+ and si− are slack variables.
DEA efficiency: Any DMU is efficient if and only if z ∗ = 1 and all slack variables
are zero.
Now we solve example 10.1 using the input oriented and output oriented
methods and present the efficiency scores for the three DMUs. The efficiency scores
of DMU1, DMU2 and DMU3 are 1, 0.7733 and 1, respectively. Clearly DMU2 is
inefficient.
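A minimal sketch of the input oriented envelopment model for this example, using scipy's linear programming solver, is given below; the data are those of example 10.1, and the variable ordering [z, λ1, …, λk] is an implementation choice.

# Input-oriented CCR envelopment model solved per DMU with linprog.
import numpy as np
from scipy.optimize import linprog

X = np.array([[5.0, 8.0, 7.0],       # input 1 for DMU1..DMU3
              [14.0, 15.0, 12.0]])   # input 2
Y = np.array([[9.0, 5.0, 4.0],       # output 1
              [4.0, 7.0, 9.0],       # output 2
              [16.0, 10.0, 13.0]])   # output 3

def ccr_input_efficiency(j0):
    n, k = X.shape                        # n inputs, k DMUs
    m = Y.shape[0]                        # m outputs
    c = np.r_[1.0, np.zeros(k)]           # minimize z over [z, lambda_1..k]
    A_in = np.hstack([-X[:, [j0]], X])    # sum_j x_ij l_j <= z x_i0
    A_out = np.hstack([np.zeros((m, 1)), -Y])  # sum_j y_rj l_j >= y_r0
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(n), -Y[:, j0]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + k))
    return res.fun

for j in range(X.shape[1]):
    print(f"DMU{j + 1}: efficiency = {ccr_input_efficiency(j):.4f}")
# prints 1.0000, 0.7733 and 1.0000, matching the scores quoted above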
The models given in (10.2) to (10.5) are CCR—i.e. input oriented. In the same
manner, one can define the output oriented models.
∑j=1…k yrj λj ⩾ yr0, 1 ⩽ r ⩽ m
∑j=1…k λj = 1
λj ⩾ 0, ∀ j.
BCC efficiency: Any DMU is BCC efficient if z∗ = 1, that is, there exists an optimal
solution with non-negative u and v; otherwise the DMU is inefficient.
As we did in the CCR model, in the case of an inefficient boundary point we
deploy the following model:
Max ∑i=1…n si− + ∑r=1…m sr+, (10.7)
subject to
∑j=1…k xij λj + si− = z xi0, 1 ⩽ i ⩽ n
∑j=1…k yrj λj − sr+ = yr0, 1 ⩽ r ⩽ m
∑j=1…k λj = 1
λj ⩾ 0, 1 ⩽ j ⩽ k
si− ⩾ 0, 1 ⩽ i ⩽ n
sr+ ⩾ 0, 1 ⩽ r ⩽ m,
where sr+ and si− are slack variables.
The BCC output oriented model is given as
w∗ = max w, (10.8)
subject to
∑j=1…k xij λj ⩽ xi0, 1 ⩽ i ⩽ n
∑j=1…k yrj λj ⩾ w yr0, 1 ⩽ r ⩽ m
∑j=1…k λj = 1
λj ⩾ 0, ∀ j.
Allahabad Bank: 6137.051, 46 378.76, 556 579.193, 1560 695.2, 188 849.46, 148.5
Andhra Bank: 8847.228, 0, 537 209.13, 416 888.62, 176 346.75, 166.5
Bank of Baroda: 37 564.451, 353 920.31, 1108 851.704, 2289 771.6, 440 612.77, 168
Bank of India: 26 310.955, 217 519.64, 1136 107.609, 3538 093.2, 417 964.7, 179.6
Bank of Maharashtra: 8189.764, 0, 362 308.724, 299 714.37, 130 529.86, 180.2
Bharatiya Mahila Bank Ltd: 56.178, 1350, 4788.136, 5.841, 1558.457, 32.9
Canara Bank: 20 118.673, 66 393.138, 1413 138.326, 2879 455.5, 440 221.35, 144.462
Central Bank of India: 16 584.535, 12 000, 888 200.49, 768 034.42, 258 878.97, 119.478
Corporation Bank: 12 213.76, 6000, 632 800.927, 805 599.41, 194 112.37, 187.9
Dena Bank: 5975.075, 0, 352 262.182, 390 862.62, 106 457.34, 146.2
IDBI Bank Limited: 15 856.281, 14 437.12, 989 994.323, 1983 065.7, 280 431.02, 251.8
Indian Bank: 5371.302, 62.942, 505 146.666, 297 525.23, 162 437.84, 153.1
Indian Overseas Bank: 18 253.872, 57 568.519, 757 189.041, 758 587.35, 235 172.95, 124.1
Oriental Bank of Commerce: 6947.381, 1656.375, 656 578.381, 403 696.29, 200 587.1, 168.873
Punjab and Sind Bank: 2507.299, 8000, 276 450.374, 149 244.64, 87 443.409, 162
Punjab National Bank: 22 238.461, 4690, 1525 444.026, 3357 959.2, 474 243.5, 135.9
Syndicate Bank: 8945.607, 112 963.74, 678 464.959, 813 299.55, 231 977.8, 146.1
UCO Bank: 5838.033, 45 147.936, 818 859.854, 593 404.56, 185 609.74, 126.8
Union Bank of India: 11 248.14, 3619.76, 877 728.066, 3937 164, 321 988.01, 155.1
United Bank of India: 5588.093, 21 000, 447 233.834, 115 839.1, 99 366.709, 123.7
Vijaya Bank: 5545.97, 301.612, 418 424.895, 172 765.8, 120 835.79, 145.7
State Bank of Bikaner and Jaipur: 5025.53, 0, 247 823.746, 502 362.9, 95 924.728, 121.5
State Bank of Hyderabad: 5191.821, 7006.436, 380 075.95, 694 043.33, 141 872.07, 147.1
State Bank of India: 150 809.19, 124 570.23, 4376 605.631, 9719 560.1, 1636 853.1, 141.1
State Bank of Mysore: 3882.42, 0, 201 239.554, 393 592.22, 71 277.658, 118.3
State Bank of Patiala: 2693.868, 0, 309 170.205, 302 991.46, 104 570.99, 128.8
State Bank of Travancore: 5879.217, 11 496.507, 360 618.29, 407 926.73, 96 088.79, 117.6
IOP Publishing
Chapter 11
Evolutionary techniques in the design of
PID controllers
Santosh Desai, Rajendra Prasad and Shankru Guggari
Currently, the world is witnessing great changes in almost all areas, including the
field of system engineering, with steeply increasing system complexity resulting in
outsized systems. Generally these systems are best described by a large number of
differential or difference equations to form a mathematical model and ease the
process of analysis, simulation, design and control. However, the said task is not as
easy as it seems to be. It is very grueling, sometimes not feasible, and often proves to
be a costly affair because of what may be called 'the curse of dimensionality'. Hence
model order reduction was born out of the necessity to provide simplified
models, thereby addressing the effects of higher dimensional models and allowing us
to cope with the demands of present day technology.
Today, various methods are being proposed by authors [1–9] and are available in
the literature. However, choosing the right method is one aspect which needs to be
considered. The area of application may also have an effect in addition to satisfying
the design requirements. Finally, this results in the need for approximations (low
order). Hence there is a need for a suitable technique to combat/improve the
drawbacks of the prevailing methods by developing/proposing new reduction
techniques satisfying the need of the hour. The proposed methods are not only
limited to continuous time systems, but can also be applied to systems presented in
discrete time. The same is justified by applying the proposed technique on several
numerical models. The outcomes are compared to other well-known methods when
subjected to specific test inputs. However, these simulations were performed to
obtain the behavior of the system in open loop configuration, but in most practical
cases some sort of controller always exists to control the system behavior. The design
of such a controller becomes a crucial task, in particular when the plant size is very
large. In that case, the size of the controller also increases, thereby resulting in
complicated and costly design. Apart from this, more computational time, and
difficulties in analysis, simulation and understanding of the system arise. Hence,
there is a need for a suitable lower order controller which can be derived from a
higher order controller, preserving the crucial dynamics of the original.
Furthermore, the derived reduced controller will be able to control the original
higher order system satisfactorily. This has resulted in the application of order
reduction methods to address controller reduction problems. In this chapter, the
design of a PID controller is achieved with the aid of evolutionary techniques such as
PSO and BBBC. Further, these techniques are extended to fine-tune the controller
parameters, namely λ (integral order) and μ (derivative order) in the fractional order
proportional integral derivative (FOPID) controller structure.
Figure 11.2. Original and reduced controller configuration with reference model.
appropriately with unity feedback and compared to RCL(s). The procedure carried
out in the controller reduction approach is also referred to as an indirect approach.
Here, the reduction process is carried out at the last stages and the possibility of
error propagation is ruled out. In the direct approach the issue of error creeps in
since the reduction is carried out during the initial stage of design.
In the present study, both direct and indirect design approaches have been
considered. An optimization technique based on big bang big crunch (BBBC)
theory [13] is used to tune the parameters of the PID controller for the said
purpose. This approach has yielded better results when compared to the HNA
assisted approach. In addition, PSO, another evolutionary method, is also used
for optimizing the design parameters.
helps in searching for and selecting the best possible set of parameters which satisfies
the design requirements. When the design task is completed, the closed loop
responses are then compared with the reference model M(s). The reference model
M(s), also called the specification model or standard model, is the desired closed loop
system. To conclude, M(s) meets all the desired performance specifications and acts
as the basis for comparison.
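As a concrete illustration, the following Python sketch implements the BBBC loop as described by Erol and Eksin [13]: candidates scattered at random (big bang) are collapsed to a fitness-weighted centre of mass (big crunch) and regenerated around it with a shrinking spread. The quadratic objective is only a stand-in for the controller tuning cost used in this chapter.

# Big bang-big crunch minimization of a stand-in cost function.
import numpy as np

def bbbc(f, dim=3, pop=75, iters=60, span=10.0):
    rng = np.random.default_rng(1)
    x = rng.uniform(-span, span, (pop, dim))      # initial big bang
    best, best_val = None, np.inf
    for k in range(1, iters + 1):
        vals = np.apply_along_axis(f, 1, x)
        if vals.min() < best_val:                 # track best candidate
            best_val = float(vals.min())
            best = x[vals.argmin()].copy()
        w = 1.0 / (vals + 1e-12)                  # weight by inverse cost
        centre = (w[:, None] * x).sum(axis=0) / w.sum()           # big crunch
        x = centre + rng.standard_normal((pop, dim)) * span / k   # new big bang
    return best, best_val

print(bbbc(lambda z: float(np.sum((z - 1.0) ** 2))))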
M(s) = (g0 + g1s + … + gus^u)/(h0 + h1s + h2s² + … + hvs^v); u < v. (11.2)
Step 2. Obtain the equivalent open loop specification model from the reference
model M(s) using
M̃(s) = M(s)/(1 − M(s)). (11.3)
Step 3. Consider the structure of the controller given by
Gc(s) = (p0 + p1s + … + pks^k)/(q0 + q1s + q2s² + … + qjs^j); k < j. (11.4)
Step 4. Compare the response of the system with the reference model to obtain the
unknown controller parameters by
Gc(s)Gp(s) = M̃(s)

or

Gc(s) = M̃(s)/Gp(s) = ∑i=0…∞ ei s^i, (11.5)
i=0
Step 6. RP(s) is obtained from GP(s) using the reduction method. Steps 4 and 5 are
repeated and the transfer function (closed loop) for RCL(s) (reduced order
model) is given by
RCL(s) = Rc(s)Rp(s)/(1 + Rc(s)Rp(s)). (11.8)
positions pbest [19] (personal best) and gbest (global best) found by the entire
swarm. pbest and gbest are iteratively updated for each particle, until a better or
more dominant solution (in terms of fitness) is found. This process continues, until
the maximum iterations are reached or specified criteria are met. The equation
governing the movement of the particles of the swarm is
vi = vi + c1r1(pi − xi) + c2r2(pg − xi), (11.9)
where
vi is the velocity vector of the ith particle,
xi is the position vector of the ith particle,
pi is the n-dimensional personal best of the ith particle found since initialization,
pg is the n-dimensional global best of the swarm found since initialization,
r1 and r2 are random numbers drawn uniformly from [0, 1],
c1 is the cognitive acceleration coefficient and
c2 is the social acceleration coefficient.
According to Eberhart and Shi [21], the optimal strategy is to initially set the inertia
weight w to 0.9 and reduce it linearly to 0.4, allowing initial exploration followed by
acceleration toward an improved global optimum.
The problems in velocity update equations (11.9) and (11.10) were addressed by
Clerc [15] by introducing constriction coefficient χ so as to result in
vi = χ [vi + c1r1(pi − xi ) + c2r2(pg − xi )], (11.12)
where χ is computed as

χ = 2/∣2 − ϕ − √(ϕ(ϕ − 4))∣, (11.13)

where ϕ = c1 + c2, ϕ > 4.
The velocity equation forms the main component of the PSO algorithm and makes it
unlikely that the swarm becomes stuck in local minima [22]. The basic PSO algorithm is as
follows:
Step 1. [Start.] Initialize the position, velocity and the personal best of each particle
in the swarm at random.
Step 2. [Evaluate fitness value.] For each iteration, the particles will move progres-
sively into the solution space. The action involves updating the particle velocity
and its movement, and assessing the fitness function for the current position.
Step 3. [Compare fitness function.] The fitness function of the current position and
gbest are compared. Repeat the above stages for all particles.
Step 4. [Maximum iteration.] Continue until the iteration limit or until the termination
criteria are reached. Stop, and return the best solution gbest, otherwise update
w and go to the next iteration.
Step 5. [Loop.] Go to step 2, evaluate fitness value.
The flowchart as shown in figure 11.3 indicates the process flow of PSO. Table 11.1
gives the typical parameters used for PSO in the present study.
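A minimal sketch of this loop with the table 11.1 settings (swarm size 20, 100 generations, c1 = c2 = 2, w decreasing linearly from 0.9 to 0.4) is given below; the sphere function is only a stand-in for the ISE objective used later in the chapter.

# Basic PSO with linearly decreasing inertia weight.
import numpy as np

def pso(f, dim=2, swarm=20, iters=100, c1=2.0, c2=2.0):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5.0, 5.0, (swarm, dim))      # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # personal bests
    pval = np.apply_along_axis(f, 1, x)
    gbest = pbest[pval.argmin()].copy()           # global best
    for t in range(iters):
        w = 0.9 - (0.9 - 0.4) * t / (iters - 1)   # inertia, 0.9 -> 0.4
        r1, r2 = rng.random((2, swarm, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        improved = val < pval                     # update personal bests
        pbest[improved] = x[improved]
        pval[improved] = val[improved]
        gbest = pbest[pval.argmin()].copy()
    return gbest, pval.min()

best, fval = pso(lambda z: float(np.sum(z ** 2)))  # sphere test function
print(best, fval)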
Table 11.1. Typical parameters used for PSO in the present study.

Parameter: Value
Swarm size: 20
Maximum generations: 100
c1, c2: 2, 2
wstart, wend: 0.9, 0.4
Gp(s) = (s⁵ + 8s⁴ + 20s³ + 16s² + 3s + 2)/(2s⁶ + 36.6s⁵ + 204.8s⁴ + 419s³ + 311.8s² + 67.2s + 4)

M(s) = (0.023s + 0.0121)/(s² + 0.21s + 0.0121).
Step 1. Consider M(s) and determine the equivalent open loop transfer function
using (11.3):
M̃(s) = (0.023s³ + 0.01693s² + 0.002819s + 0.0001464)/(s⁴ + 0.397s³ + 0.05137s² + 0.002263s).
Step 2. Let the desired controller be according to (11.5) and this is given by
Gc(s) = M̃(s)/Gp(s) = (1/s)(0.064707 + 0.767859s + 0.801795s² − 4.681159s³ + …).
Step 4. Gc(s) is compared to the power series expansion to obtain the parameters K1,
K2 and K3. This results in the PID controller
Gc(s) = (0.064707 + 0.767859s + 0.801795s²)/s.
Step 6. Apply the PSO technique to attain a second order reduced model from
GP(s). This is achieved by minimizing the ISE between Gp(s) and Rp(s)
using the formula

I = ∫₀^∞ [y(t) − yr(t)]² dt, (11.14)

where y(t) is the unit step response of Gp(s) and yr(t) is the unit step response of Rp(s).
Thus

Rppso(s) = (0.02555s + 0.01036)/(s² + 0.4756s + 0.01036).
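A minimal sketch of evaluating the ISE objective (11.14) for this pair of transfer functions with scipy.signal follows; the finite time horizon is an assumption used to approximate the infinite integral.

# ISE between step responses of the sixth order plant Gp(s) of
# example 11.1 and the reduced model Rp(s) obtained by PSO.
import numpy as np
from scipy import signal

Gp = signal.TransferFunction(
    [1, 8, 20, 16, 3, 2],
    [2, 36.6, 204.8, 419, 311.8, 67.2, 4])
Rp = signal.TransferFunction([0.02555, 0.01036],
                             [1, 0.4756, 0.01036])

t = np.linspace(0.0, 200.0, 4001)    # horizon approximating t -> infinity
_, y = signal.step(Gp, T=t)
_, yr = signal.step(Rp, T=t)
ise = float(np.sum((y - yr) ** 2) * (t[1] - t[0]))  # I ~ sum of sq. error * dt
print(f"ISE = {ise:.6f}")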
Step 10. The values of K1, K2 and K3 are obtained with the comparison of power
series coefficients:
Rcga(s) = (0.06471 + 0.714036s + 1.89926s²)/s.
Step 11. Obtain the transfer function of RCLPSO(s), RCLGA(s) RCLHNA(s) as per
step 6 in 11.1.1.1
Figure 11.4. Step responses of M(s), GCL(s) and RCLPSO(s) for example 11.1 (amplitude versus time).
Table 11.2. Qualitative comparison of original and reduced order plants in terms of time domain specifications
for example 11.1.
M̃(s) = (4s² + 16s + 16)/(s⁴ + 8s³ + 20s² + 16s).
Step 2. According to (11.5), the controller transfer function is given as
M̃(s) = Gc(s)Gp(s)

or

Gc(s) = M̃(s)/Gp(s) = (1/s)(0.8316 + 0.5313s − 0.2841s² + 0.1159s³ + …).
Step 3. Choose the controller structure as
Gc(s) = k(1 + k1s)/(s(1 + k2s)).
Step 4. Comparing the power series expansion coefficients with the controller structure
results in K = 0.8316, K1 = 1.1735 and K2 = 0.5347, and Gc(s) for the original
plant will be

Gc(s) = (0.976s + 0.8316)/(0.5347s² + s).
Figure 11.5. Step responses of M(s), GCL(s) and RCLPSO(s) for example 11.2 (amplitude versus time).
Table 11.3. Qualitative comparison of original and reduced order plants in terms of transient response
parameters for example 11.2.
Step 6. The third order reduced model using PSO is obtained by reducing the
transfer function (closed loop) by minimizing (11.14):
RCLPSO(s) = (0.8622s² + 2.05s + 0.9609)/(s³ + 3.258s² + 3.172s + 0.9609)

RCLGA(s) = (0.4844s² + 2.393s + 1.674)/(s³ + 3.233s² + 4.045s + 1.674)

RCLHNA(s) = (0.9633s² + 3.88s + 1014)/(1.176s³ + 1.404s² + 1190s + 1013).
The comparison of the step responses of M(s), GCL(s) and RCLPSO(s) is depicted
in figure 11.5. These responses are found to be comparable in terms of transient
response specifications with the responses from other methods in table 11.3. It
can be concluded that the result obtained by PSO is comparable.
GP(s) = (s⁵ + 8s⁴ + 20s³ + 16s² + 3s + 2)/(2s⁶ + 36.6s⁵ + 204.8s⁴ + 419s³ + 311.8s² + 67.2s + 4)

M(s) = (0.023s + 0.0121)/(s² + 0.21s + 0.0121).
Step 1. Following steps 1 to 5 of example 11.1 in section 11.1.2.1.1, we obtain

GCL(s) = (0.8228s⁷ + 7.349s⁶ + 22.66s⁵ + 29.02s⁴ + 16.03s³ + 4.981s² + 1.728s + 0.1294)/(1.823s⁷ + 25.65s⁶ + 125.1s⁵ + 238.5s⁴ + 172s³ + 38.58s² + 3.728s + 0.1294).
Step 2. The second order model is obtained after the original system is reduced using
the proposed BBBC by minimizing (11.14) to obtain
RPBBBCOA(s) = (0.0233s + 0.01176)/(s² + 0.2035s + 0.01176).
Step 3. Similarly, using PSO, GA and HNA, the reduced transfer functions are

Rppso(s) = (0.02555s + 0.01036)/(s² + 0.4756s + 0.01036)

Rpga(s) = (0.01414s + 0.009369)/(s² + 0.1436s + 0.009369)

Rphna(s) = (0.0113s + 0.0736)/(s² + 0.1436s + 0.009369).
RCBBBC(s) = M̃(s)/RPBBBC(s) = (1/s)(0.06191 + 0.76252s + 0.5764s² − …).
Step 6. K1, K2 and K3 of the controller are obtained using power series coefficients
RCBBBC(s) = (0.06191 + 0.7625s + 0.5764s²)/s.
Step 7. The second order models are obtained by reducing Gp(s) and GC(s) (PSO, HNA,
BBBC, GA) in closed loop, resulting in

RCLBBBC(s) = (0.01343s³ + 0.02455s² + 0.01041s + 0.000728)/(1.013s³ + 0.2281s² + 0.02217s + 0.000728)

RCLPSO(s) = (0.04853s³ + 0.03861s² + 0.00933s + 0.0006703)/(1.049s³ + 0.2142s² + 0.01969s + 0.0006703)

RCLGA(s) = (0.01165s³ + 0.07664s² + 0.005604s + 0.004762)/(1.012s³ + 0.417s² + 0.0792s + 0.004762)

RCLHNA(s) = (0.05177s³ + 0.0445s² + 0.007102s + 0.0006152)/(1.052s³ + 0.1885s² + 0.0167s + 0.0006152).
The response of M(s), GCL(s) and RCLBBBC(s) for a given step input is plotted in
figure 11.6. Further, the outcomes are compared with other methods as listed in
table 11.4.
Example 11.4. A sixth order stable practical system (example 11.2 in section
11.1.2.1.2) is considered:
Gp(s) = (248.05s⁴ + 1483.3s³ + 91931s² + 468730s + 634950)/(s⁶ + 26.24s⁵ + 1363.1s⁴ + 26803s³ + 326900s² + 859170s + 528050)

M(s) = 4/(s² + 4s + 4).
Figure 11.6. Step responses of M(s), GCL(s) and RCLBBBC(s) for example 11.3 (amplitude versus time).
Table 11.4. Qualitative comparison of original and reduced order plants in terms of transient response
parameters for example 11.3.
Figure 11.7. Step responses of M(s), GCL(s), RCL3BBBC(s) and RCL2BBBC(s) for example 11.4 (amplitude versus time).
Step 2. GCL(s) is reduced to third and second order using BBBC by minimizing
(11.14) and is given by
Step 3. The reduced third order systems obtained by PSO, GA and HNA are

RCL3PSO(s) = (0.8622s² + 2.05s + 0.9609)/(s³ + 3.258s² + 3.172s + 0.9609)

RCL3GA(s) = (0.4844s² + 2.393s + 1.674)/(s³ + 3.233s² + 4.045s + 1.674)

RCL3HNA(s) = (0.9633s² + 3.88s + 1014)/(1.176s³ + 1.404s² + 1190s + 1013).
The comparison of the step responses of M(s), GCL(s), RCL3BBBC(s) and RCL2BBBC(s)
is depicted in figure 11.7. It is seen that GCL(s), RCL3BBBC(s) and RCL2BBBC(s) almost
overlap each other and are competitive. The same can be concluded by examining
table 11.5, which compares the responses from other methods in terms of transient
domain specifications. It can be concluded that the result obtained by BBBC is
outstanding.
Table 11.5. Qualitative comparison of original and reduced order plants in terms of transient response
parameters.
Figure 11.8. Fractional order PID and the classical PID (integer and fractional order).
where bj, ai are scalar constants, n is the order of the numerator and m is the order of
the denominator polynomial. The FOPID controller GFOPID(s) in the form of
(11.15) is the objective of this proposal. Later, the response of Gp(s) connected in
series with GFOPID(s) (with unity feedback) injected with a unit step input is obtained.
The closed loop response meets the specified time domain parameters, i.e. the peak
overshoot MP, the settling time TS, the rise time Tr, the steady state error (SSE) and
the integral square error (ISE).
Gp(s) = 40/(2s³ + 10s² + 82s + 10).
where y(t) is the unit step response of the higher order model and yr(t) is the unit step
response of the reduced order model at time t, and the peak overshoot is given by

Mp = exp(−σdπ/ωd) × 100. (11.18)
Step 7. The transfer function in the closed loop configuration (for λ = 0.9, μ = 1.1) is
given by
Figure 11.18 illustrates the path traced by λ and μ for a population size of 75, with
60 iterations and a reduction rate of 0.75. The values obtained in Matlab (2011b) at
every iteration of the optimization process using BBBC are used for plotting the path.
Here, another approach, optimally tuning the FOPID parameters using BBBC
[34–37], is proposed. The results yielded are compared to GA [32].
Figure 11.9. Comparison of unit step responses (FOPIDBBBC and FOPIDGA controllers for λ = 0.9 and μ = 1.1).
The values of Mp, Ts, ISE and ITAE are taken into account for the qualitative
analysis, and it is observed that the proposed technique performs comparatively better.
The FOPID controller also yielded good results compared to the integer order PID
controller. Further, researchers can continue tuning the values of λ and μ to two
or three decimal digits for better responses.
Figure 11.12. Unit step response for the FOPID controller (λ < 1).
Figure 11.13. Unit step response for FOPID controller (λ < 1, μ < 1).
11.3 Conclusion
In this chapter, the task of designing both PID and FOPID controllers has been
accomplished successfully. In the case of the PID controller, both direct and indirect
methods of controller design are considered in this work. The reduction of the closed
Figure 11.14. Unit step response for FOPID controller (λ > 1).
Figure 11.15. Unit step response for FOPID controller (λ > 1, μ > 1).
loop system is then performed using PSO and BBBC by reducing the error between
the reference model and the reduced model. Later, unknown controller parameters
are found. The suitability of the proposed methods is justified by solving numerical
examples from the literature available. The step responses illustrate the value of the
proposed method.
Figure 11.16. Unit step response for FOPID controller (λ > 1, μ < 1).
Figure 11.17. Unit step response for FOPID controller (λ < 1, μ > 1).
In the case of the FOPID controller, the values of λ and μ are tuned using
BBBC and the results are compared to the existing technique. Qualitative
comparisons in terms of settling time, peak overshoot, ISE and ITAE are tabulated,
in addition to the step responses, to justify the proposed method.
Figure 11.18. Path traced by λ and μ during the BBBC optimization (iterations versus λ and μ).
References
[1] Aguirre L A 1991 New algorithm for closed-loop model matching Electron. Lett. 27 2260–2
[2] Bandyopadhyay B, Unbehauen H and Patre B M 1998 A new algorithm for compensator
design for higher-order system via reduced model Automatica 34 917–20
[3] Goddard P J and Glover K 1998 Controller approximation: approaches for preserving H∞
performance IEEE Trans. Autom. Control 43 858–71
[4] Mukherjee S 1997 Design of compensators using reduced order models Int. Conf. on
Computer Applications in Electrical Engineering, Recent Advances (CERA-97) (Roorkee:
Indian Institute of Technology Roorkee), pp 529–36
[5] Pal J and Sarkar P 2002 An algebraic method for controller design in delta domain Int. Conf.
on Computer Applications in Electrical Engineering, Recent Advances (CERA-01) (Roorkee:
Indian Institute of Technology Roorkee), pp 441–9
[6] Prasad R, Pal J and Pant A K 1990 Controller design using reduced order models 14th
National Systems Conf. (NSC-1990) (Aligarh: Aligarh Muslim University), pp 182–6
[7] Wang G, Sreeram V and Liu W Q 2001 Performance preserving controller reduction via
additive perturbation of the closed-loop transfer function IEEE Trans. Autom. Control 46
771–5
[8] Ramesh K, Nirmalkumar A and Gurusamy G 2009 Design of discrete controller via novel
model order reduction technique Int J Elect Power Eng. 3 163–8
[9] Hemmati R, Boroujeni S M S, Delafkar H and Boroujeni A S 2011 PID controller
adjustment using PSO for multi area load frequency control Aust. J. Basic Appl. Sci. 5
295–302
[10] Anderson B D O and Liu Y 1989 Controller reduction: concepts and approaches IEEE
Trans. Autom. Control 34 802–12
11-26
Modern Optimization Methods for Science, Engineering and Technology
[11] Wilson D A and Mishra R N 1979 Design of low order estimators using reduced models Int.
J. Control 29 447–56
[12] Yousuff A and Skelton R E 1984 A note on balanced controller reduction IEEE Trans.
Autom. Control 29 254–7
[13] Erol O K and Eksin I 2006 New optimization method: Big Bang–Big Crunch Adv. Eng.
Software 37 106–11
[14] Hassan R, Cohanim B, de Weck O and Venter G 2005 A comparison of particle swarm
optimization and the genetic algorithm 46th AIAA/ASME/ASCE/AHS/ASC Structures,
Structural Dynamics and Materials Conf. (Austin, TX)
[15] Clerc M 1999 The swarm and the queen: towards a deterministic and adaptive particle
swarm optimization ICEC 1999 (Washington, DC) pp 1951–7
[16] Kennedy J and Eberhart R 1995 Particle swarm optimization Proc. IEEE Int. Conf. on
Neural Networks vol 4 pp 1942–8
[17] Engelbrecht A P 2007 Computational Intelligence: An Introduction 2nd edn. (Chichester: Wiley)
[18] Kennedy J, Eberhart R C and Shi Y 2001 Swarm Intelligence vol 10 (San Francisco, CA:
Morgan Kaufmann)
[19] Bahrepour M, Mahdipour E, Cheloi R and Yaghoobi M 2009 SUPER-SAPSO: a new SA-
based PSO algorithm Applic. Soft Comput. 58 423–30
[20] Shi Y and Eberhart R C 1998 A modified particle swarm optimizer IEEE Int. Conf. on
Evolutionary Computation (Piscataway, NJ: IEEE), pp 69–73
[21] Eberhart R C and Shi Y 2000 Comparing inertia weights and constriction factors in particle
swarm optimization Congress on Evolutionary Computation 2000 (San Diego, CA) pp 84–8
[22] Clerc M 2006 Particle Swarm Optimization (London: ISTE)
[23] Jamshidi M 1983 Large Scale Systems Modelling and Control Series vol 9 (Amsterdam:
North Holland)
[24] Prasad R 1989 Analysis and design of control systems using reduced order models PhD
Thesis University of Roorkee, India
[25] Valerio D and da Costa J S 2006 Tuning-rules for fractional PID controller Fract. Diff. Appl.
2 28–33
[26] Abu-Al-Nadi D, Othman M K A and Zaer S A-H 2011 Reduced order modeling of linear
MIMO systems using particle swarm optimization The Seventh Int. Conf. on Autonomic and
Autonomous Systems (ICAS 2011)
[27] Zhang J, Wang N and Wang S 2004 A developed method of tuning PID controllers with
fuzzy rules for integrating processes 2004 American Control Conf. (Boston, MA) pp 1109–14
[28] Podlubny I 1999 Fractional-order systems and PID controllers IEEE Trans. Autom. Control
44 208–14
[29] Caponetto R, Fortuna L and Porto D 2004 A new tuning strategy for a non-integer order
PID controller First IFAC Workshop on Fractional Differentiation and its Applications
(Bordeaux)
[30] Petras I 1999 The fractional order controllers: methods for their synthesis and application
J. Electr. Eng. 50 284–8
[31] Cao J-Y and Cao B-G 2006 Design of fractional order controller based on particle swarm
optimization Int. J. Control Autom. Syst. 4 775–81
[32] Padhee S, Gautam A, Singh Y and Kaur G 2011 A novel evolutionary tuning method for
fractional order PID controller Int. J. Soft Comput. Eng. (IJSCE) 1 1–9
11-27
Modern Optimization Methods for Science, Engineering and Technology
[33] Das S, Pan I, Das S and Gupta A 2012 Improved model reduction and tuning of
fractional-order PIλDμ controllers for analytical rule extraction with genetic
programming ISA Trans. 51 237–61
[34] Desai S R and Prasad R 2012 PID controller design using BB-BCOA optimized reduced
order model IJCA Special Issue on Advanced Computing and Communication Technologies
for HPC Applications ACCTHPCA 5 32–7
[35] Desai S R and Prasad R 2013 A novel order diminution of LTI systems using big bang big
crunch optimization and routh approximation Appl. Math. Modell. 37 8016–28
[36] Desai S R and Prasad R 2014 Novel technique of optimizing FOPID controller parameters
using BBBC for higher order system IETE J Res. 60 211–7
[37] Desai S R and Prasad R 2013 A new approach to order reduction using stability equation
and big bang big crunch optimization Syst. Sci. Control Eng. 2013 20–7
IOP Publishing
Chapter 12
A variational approach to substantial efficiency
for linear multi-objective optimization problems
with implications for market problems
Glenn Harris and Sien Deng
12.1 Introduction
Often in problems of engineering, industry, economics, and the like, decisions are
made concerning optimizing multiple objectives. However, these objectives are not
always measured on a common scale. For example, a genuine analysis of a
company's profit, worker happiness, public image and time spent on a project
cannot necessarily all be put in terms of a single currency. These are multi-objective
optimization problems, and they form a very important area of optimization.
nontrivial increase in the value of one currency traded for a negligible decrease of
value of one of the other two currencies. An intrepid investor may spot this
possibility and, with the advent of instant trading and the lightning speed at which
information travels in the modern age, see an opportunity for exploitation. Evidence
of this speed goes as far back as 2004 when electronic currency trading took place at
two centralized electronic broker systems between parties in under a second [1]. In
order to see how this manipulation might be achieved, let f1, f2 and f3 be the
objective functions of countries 1, 2 and 3, respectively, and consider this a
maximization problem. Let x be the properly efficient solution at which the system
is currently. Let {yn}n∈ℕ be some other collection of efficient solutions such that
f1(x) < f1(yn), f2(x) > f2(yn) and f3(x) > f3(yn), but

(f1(yn) − f1(x))/(f3(x) − f3(yn)) > n.

Even if as n → ∞ the gain in the first objective is minuscule, the loss in the other is
that much smaller.
The investor could profit by exchanging a large sum of currency from country 3’s
to country 1’s currency, then implement actions that would ensure the state of affairs
yn occurs. They would then change their currency back to country 3’s. They could,
depending on the size of the sum of money they exchanged and the state yn they
aimed for, achieve practically any level of profit they wanted. Not only that, but
given that computer programs can be implemented to carry out instant transactions
repeatedly over a short period of time, it could be the case that this operation can be
carried out multiple times over the course of seconds or minutes pushing between the
states yn and x. In between those swaps, they would be purchasing and selling
currencies between the two countries resulting in profit for the investor. This could
occur repeatedly until no one is willing to buy or sell their currency anymore,
effectively skewing the global currency market driving prices up and seizing up the
system that would otherwise be needed for international investment. This may not
even be immediately detectable as yn may be a state very near x. Knowing when the
state of affairs does not have a bounded trade-off between all pairs of contravening
objectives helps to determine what sorts of regulation (fines/fees) to apply to block
the misuse. Either that or the currency brokers could know when a sort of extra
premium should be charged for a currency exchange between countries whose
marginal trade-off is not bounded in order to stop investors from taking advantage.
Another realistic example, which one could call an ‘option scheme’, is rooted in
the stock market. An option is a financial derivative contract which gives the holder
the right (but not the obligation) to buy or sell an asset at a given price by
a certain date. A savvy business entity could purchase and sell stock options
appropriately and then nudge the market one way or the other to make significant
gains in one stock to negligible losses in another. So a person could sell the option to
buy from the company that stands to lose negligibly beneath that loss. They could
also short a stock knowing full well what they will do later will not change the stock
values notably, then use those funds to buy the option to sell the stocks of the
company that stands to gain a noticeable amount (in relation to the negligible loss).
Admittedly, these options may be harder to come by, but it is still a possibility that a
large or unlimited number of options may be offered in certain circumstances.
Indeed, in 2014 the total dollar value of the Chicago Board Options Exchange, a
powerful options trading place, was roughly 570 billion dollars [2] and in 2016 it was
roughly 666 billion dollars [3]. In addition to options, the scheme may be applicable
to other derivative products as well. There seems to be an almost unlimited potential
for derivatives in the form of credit default swaps, with an overall market value of
between 45 and 62 trillion dollars in 2007 [4], with a large portion of them
responsible for exacerbating the housing market crisis in 2008. Moreover when
these swaps are on securitized debt they can be exercised for partial losses and do not
require an entire collapse [4]. That is an important point for the scheme because the
exchanges are on marginal amounts, but high in volume.
As long as there is a customer, one can sell the option to buy a stock. Such an
option scheme would probably require more than diminutive gains, because they
may not trigger the stock option to sell the rising stock. They can still be small
though, such as $0.10 or $0.02 but perhaps not $0.00003. It may also be difficult for
the exploiter to manipulate stock values, but it is not impossible. Companies can
misinform or withhold information, as in the case of Enron projecting estimated
profits up to 20 years out, even though there were obvious concerns about those
profits [5]. Commodities holders and bankers can withhold or saturate stored
commodities or capital, resulting in altered valuation of stocks. These are only
some of the ways people could contribute to revaluing stocks, which could be
considered as components of the vectors in the feasible set of solutions. While
purchasing stocks can be done manually and valued individually by humans, many
stocks are evaluated using models. These values are recalculated over the day
instantly by computers. That, in conjunction with actions on the options being transacted almost instantly (within milliseconds) on modern stock systems [6], makes it possible to repeatedly take advantage of the option scheme. This could be
considered a misuse of stock options that could degrade trust in the market system
resulting in stagnation or depression. Not knowing this potential for certain
solutions means that stock options may not be valued correctly.
These examples show that there are many situations when properly efficient solutions to multi-objective problems are not enough. In multi-objective market problems it is more desirable to have a solution whose trade-off is bounded for all pairs of objectives that have opposing outcomes when switching to another solution. This is where substantial efficiency enters. Substantial efficiency is a natural extension of proper efficiency, requiring that the trade-off bound apply to all the objectives that stand to suffer.
The sorts of machinations given above make knowing when a solution to a
problem is or is not substantially efficient of interest to many groups. Investors,
business people and fortune seekers may want to know when a solution to a problem
is not substantially efficient in order to exploit that to potentially build a treasure
trove. Governing bodies and institutions may want to ensure that a solution to a
problem they are solving is a substantially efficient solution in order to avoid such
actions by said investors. Knowing when the current state is not substantially
efficient could result in a governing body taxing such option transactions, making
the option scheme impossible. Or in the currency case, a governing body could make
small changes to the value of a currency, or note that more premiums should be charged.
12.2 Background
This section collects the terminology required for understanding the later sections. Everything that follows is set in real Euclidean vector spaces. The following notation is for describing and comparing vectors, along with some topological notation to be used.
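For orientation, the trade-off quotient below is the one appearing in Geoffrion's definition of proper efficiency [10]: a feasible x* is properly efficient if it is efficient and there exists M > 0 such that for each i and each x ∈ X with f_i(x) < f_i(x*), there is some j with f_j(x) > f_j(x*) for which

\frac{f_i(x^*) - f_i(x)}{f_j(x) - f_j(x^*)} \le M.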
The quotient on the left may be referred to as the trade-off quotient from here on out.
Efficient solutions that are not properly efficient are simply called improperly
efficient. ◊
Solutions that are properly efficient cannot do any 'better' in any single component without doing proportionally 'worse' in at least one other component. Proper efficiency has been shown to be of interest to many in the optimization community and within industry as well. For instance, in [11] Seinfeld and McBride state that proper efficiency helps to avoid anomalous solutions in their problem-solving process that could arise if just any efficient solution were chosen.
Another example: in [12] Belenson and Kapur explicitly state that an efficient solution on its own may not be desirable and instead opt for properly efficient solutions, as they avoid some anomalies as well. In [13], Borwein characterized properly efficient points in terms of tangent cones. In [14], Benson and Morin developed necessary and sufficient conditions for the presence of properly efficient solutions, and related those conditions to the stability of specified single-objective maximization problems. Also, in [15] Wendell and Lee used properly efficient points in their generalization of results from LMOPs to non-linear cases relying on duality. Further, in [16], Benson furthered Borwein's work by defining properly efficient points in terms of projection cones. In [17], Liu extends the concept of proper efficiency to ϵ-proper efficiency. In [18], Jiang and Deng extend it to α-proper efficiency. These are just a few of the many considerations relevant to proper efficiency; the area has garnered much more attention than this brief record suggests.
A well-known fact, following [14] corollary 1 or [18] corollary 3.2, is the following.
Proposition 12.1 In an LMOP, every efficient solution is properly efficient.
Some helpful notation is provided to distinguish the different sets of solutions.
Notation 12.2. Let E(P ) ≔ {x ∈ X : x is efficient for the problem (P )} and let
P(P ) ≔ {x ∈ X : x is properly efficient for the problem (P )}.
The following definition of a tangent cone, which can be thought of as the set of all directions pointing from x into X , can be found in [19].
This was a short overview of what is necessary for what follows. Good references
that can be used for further variational analysis and optimization are [20] and [19].
Further information concerning multi-objective optimization is located in [21] and [22].
Solutions that are substantially efficient have the property that one cannot do any 'better' in any single component without doing proportionally 'worse' in all components that stand to suffer.
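In symbols, and following the 'strongly properly efficient' formulation of [24]: x* ∈ X is substantially efficient (SE) if it is efficient and there exists M > 0 such that for every i and every x ∈ X with f_i(x) < f_i(x*), every index j with f_j(x) > f_j(x*) satisfies

\frac{f_i(x^*) - f_i(x)}{f_j(x) - f_j(x^*)} \le M.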
The definition was stated again in [24] under the name 'strongly properly efficient'. In that paper Khaledian, Khorram and Soleimani-Damaneh rediscovered the concept of substantial efficiency, giving examples and studying the notion in different senses. One was the sense of Geoffrion, which is the sense presented here. The other was the sense of Benson, using cones.
Some more work was done in [25] where Pourkarimi and Karimi provided two
characterizations for substantial efficiency, one rooted in a scalar problem and the
other based on a stability concept. They also introduced the concept of quasi-substantial efficiency and presented two similar characterizations. Quasi-substantially efficient solutions can have varying orders, with each order determining the type of bound on the trade-off quotients.
A fact that will be used, adapted from [23, p 135], is the following.
Proposition 12.2. For any problem (P), if the number of efficient solutions is finite
then all the efficient solutions are substantially efficient.
The following are trivial facts based on the definitions needed before moving
forward to the new results.
Proposition 12.3. If the problem (P) has only two objective functions (i.e. f(x) = (f_1(x), f_2(x))) then any properly efficient solution is actually substantially efficient.
Remark. It is apparent that S(P ) ⊆ P(P ) ⊆ E(P ) for any problem (P).
\frac{f_i(x^*) - f_i(x)}{f_j(x) - f_j(x^*)} = \frac{m_i x^* - b_i - m_i x + b_i}{m_j x - b_j - m_j x^* + b_j} = \frac{m_i (x^* - x)}{m_j (x - x^*)} \le \frac{-m_i}{m_j}.    (12.1)

Take M = \max_{i,j \in \{1, \ldots, n\}} \frac{-m_i}{m_j} to obtain a bound that works for all viable combinations of i and j. This is enough to say that any efficient solution is actually substantially efficient. ■
This leads one to wonder if substantial efficiency is equivalent to proper efficiency
in general LMOPs. However, despite every efficient solution being properly efficient
in LMOPs, it is not always the case that every efficient solution is automatically a
substantially efficient solution. A counter-example is provided.
Example 12.1. Consider the LMOP
Minimize: f(x) = f(x_1, x_2, x_3) = (f_1(x), f_2(x), f_3(x)) = (x_1 + x_2 - x_3,\; x_1 - x_2,\; -x_1 + x_2 + x_3),    (12.2)
subject to: x = (x_1, x_2, x_3) \ge 0.
which is minimized when x1, x2 = 0. The solution x* = (0, 0, 0) has that property
and so it is properly efficient.
Now it will be shown that x* is not substantially efficient. Let M > 1 be any potential bound on the trade-off quotients. Consider the solution x = (1, 1 - \frac{1}{M}, 5), which gives f(x) = (-3 - \frac{1}{M}, \frac{1}{M}, 5 - \frac{1}{M}). Since f_1(x) < f_1(x^*) and f_2(x) > f_2(x^*), if x* is SE then it should be the case that the trade-off quotient between f_1 and f_2 is bounded by M. However, that is not the case:

\frac{f_1(x^*) - f_1(x)}{f_2(x) - f_2(x^*)} = \frac{3 + \frac{1}{M}}{\frac{1}{M}} = 3M + 1 > M.    (12.4)

So given any M > 1, an x can be found so that the trade-off quotient is greater than M. So x*, while efficient and properly efficient, cannot be SE. □
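A small numerical sketch makes the failure concrete. The following Python snippet (illustrative only; the function names are ours) evaluates the trade-off quotient of (12.4) for several candidate bounds M and shows it always lands at 3M + 1 > M:

```python
# Numerical check of example 12.1: for each candidate bound M, the point
# x = (1, 1 - 1/M, 5) pushes the f1-versus-f2 trade-off quotient past M,
# so x* = (0, 0, 0) cannot be substantially efficient.
def f(x1, x2, x3):
    return (x1 + x2 - x3, x1 - x2, -x1 + x2 + x3)

f_star = f(0, 0, 0)  # objective values at x* = (0, 0, 0)

for M in (2, 10, 100, 1000):
    fx = f(1, 1 - 1 / M, 5)
    # (f1(x*) - f1(x)) / (f2(x) - f2(x*)), the quotient in (12.4)
    quotient = (f_star[0] - fx[0]) / (fx[1] - f_star[1])
    print(M, quotient)  # prints 3*M + 1, which always exceeds M
```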
So it is not the case that every efficient solution is SE in LMOPs, but it would be
nice if a substantially efficient solution were present in any given problem. That
would give problem analysts a target to aim for when solving problems. However, it
is not always the case that a substantially efficient solution exists.
and a feasible solution (x1, y1, z1), where x1 < y1 and z1 > y1 − x1 > 0. Whether or not
x1 = y1, f2 and f3 can be made smaller without influencing any other objective by
taking z2 = y1 − x1. So an efficient solution must have its third component equal
to the second minus the first. Observe that f1 (x1, y1, z2 ) = −x1 + y1 > 0 and
f3 (x1, y1, z2 ) = z2 − x1 + y1 = 2y1 − 2x1 > 0 since y1 > x1. Both can be lessened by
taking y2 = x1. This will not affect f2 since f2 will stay at 0. Also f4 and f5 will remain
unchanged since x1 has not changed. So a new solution that will lower the first three
components has the form (x1, y2 , z2 ) = (x1, x1, 0). So if a solution is to be efficient it
must have the form (x , x , 0) and f (x , x , 0) = (0, 0, 0, x , −x ). From this it is seen
that every solution of the form (x , x , 0) is efficient as a change in x will result in a
fair exchange in the fourth and fifth component of f.
However, none of these efficient solutions turn out to be substantially efficient.
Two cases will be examined for the efficient solution (x , x , 0). First when 0 < x ⩽ 1
and second when x = 0.
Case 1. Assume that 0 < x ⩽ 1 and let M > 0. Take ϵ so that (x − ϵ, x − ϵ, ϵ²) ∈ X and ϵ < \frac{1}{M}. Then

f_4(x, x, 0) = x > x - \epsilon = f_4(x - \epsilon, x - \epsilon, \epsilon^2)

and

f_3(x - \epsilon, x - \epsilon, \epsilon^2) = \epsilon^2 > 0 = f_3(x, x, 0),

so the trade-off quotient takes the form

\frac{f_4(x, x, 0) - f_4(x - \epsilon, x - \epsilon, \epsilon^2)}{f_3(x - \epsilon, x - \epsilon, \epsilon^2) - f_3(x, x, 0)} = \frac{x - (x - \epsilon)}{\epsilon^2 - 0} = \frac{\epsilon}{\epsilon^2} = \frac{1}{\epsilon} > \frac{1}{1/M} = M.    (12.5)
A simple criterion for the existence of SE solutions has eluded the authors. The scarcity of SE solutions leads to investigating what may be responsible for their irregularity; that could reveal what types of problems do have SE solutions. The following theorem helps one to understand why SE solutions are not ubiquitous.
Theorem 12.1. If (L) is an LMOP such that S(L) = ∅ then either X is unbounded in at least one component or for each x* ∈ E(L) there is an m_j perpendicular to a non-zero vector in T_X(x*).
Proof. Note that the definition of (L) ensures X is closed and convex. Let x* ∈ E(L) but x* ∉ S(L); then for all M > 0 there exist x_M ∈ X and indices i_M, j_M ∈ {1, …, n} with

m_{i_M} \cdot x^* > m_{i_M} \cdot x_M,
m_{j_M} \cdot x^* < m_{j_M} \cdot x_M,    (12.7)
\frac{m_{i_M} \cdot (x^* - x_M)}{m_{j_M} \cdot (x_M - x^*)} > M.

This defines a sequence {x_k}_{k∈ℕ}. Moreover, by the pigeonhole principle it is possible to pass to a subsequence so that the objective indices i and j are fixed, because there are only finitely many combinations of two elements from a collection of size n but infinitely many x_k. Let M_k be the corresponding bounds of that subsequence.
There are then two cases, since

\lim_{k \to \infty} \frac{m_i \cdot (x^* - x_k)}{m_j \cdot (x_k - x^*)} > \lim_{k \to \infty} M_k = \infty.    (12.8)

Case 1. m_i \cdot (x^* - x_k) \to \infty.
In this case m_i \cdot x^* - m_i \cdot x_k > N_k for some scalar sequence {N_k}_{k∈ℕ} with N_k → ∞. Since m_i \cdot x^* is fixed, m_i \cdot x_k must go to −∞ to stay ahead of N_k as N_k → ∞. Since m_i is fixed, this is only possible if at least one of the components of x_k goes to ±∞.

Case 2. \lim_{k \to \infty} m_j \cdot (x_k - x^*) = 0.
In this case either x_k → x ≠ x*, x_k → x*, or ∥x_k∥ → ∞ with m_j ⊥ \lim_{k \to \infty} \frac{x_k - x^*}{\|x_k - x^*\|}. If ∥x_k∥ → ∞ then X is unbounded in some component. If x_k → x then, since X is convex, every point between x_k and x* will be in X. In particular \frac{x_k}{k} + \frac{(k-1)x^*}{k} ∈ X. This forms a sequence in X with

\frac{\frac{x_k}{k} + \frac{(k-1)x^*}{k} - x^*}{\frac{1}{k}} = x_k - x^* \to x - x^*.    (12.9)

The definition of a tangent cone states that x − x* ∈ T_X(x*). So m_j is perpendicular to x − x* since m_j · (x − x*) = 0.
by definition. So

\frac{m_i \cdot (x^* - x_k)/\|x_k - x^*\|}{m_j \cdot (x_k - x^*)/\|x_k - x^*\|} = \frac{m_i \cdot (x^* - x_k)}{m_j \cdot (x_k - x^*)} > M_k.    (12.11)

Observe that

m_i \cdot \frac{x^* - x_k}{\|x_k - x^*\|} = \cos(\theta_i)\,\|m_i\|\,\frac{\|x^* - x_k\|}{\|x_k - x^*\|} = \cos(\theta_i)\,\|m_i\| \le \|m_i\|,    (12.12)

where θ_i is the angle between m_i and (x* − x_k). So the numerator of the left-most quotient in (12.11) is bounded above. This means that in order for (12.11) to hold,

\lim_{k \to \infty} m_j \cdot \frac{x_k - x^*}{\|x_k - x^*\|} = m_j \cdot \lim_{k \to \infty} \frac{x_k - x^*}{\|x_k - x^*\|} = 0.    (12.13)
Corollary 12.1. Let (L) be an LMOP as in definition 12.2, and let x* be an efficient
solution. If X is bounded and there is no mj perpendicular to any non-zero vector in
TX (x*) then x* ∈ X is substantially efficient.
Proof. If, in theorem 12.1, it were assumed that x* is not SE, then at the end of the proof a contradiction would be reached, implying that x* must have been SE. ■
Observe that X is a closed bounded irregular tetrahedron. Let x* = (0, 2, 2), and
note that x* is the unique maximizer of f3 and unique minimizer of f1 in X , showing
that x* is efficient. Since x* is efficient, the corollary gives a sufficient condition to
check for substantial efficiency. The first condition is that X be bounded, which it is.
The second condition to check is if m1, m2 and m3 are not perpendicular to any
non-zero vector in TX (x*). Since
TX (x*) = {(x1, x2 , x3): x1 + x2 + x3 ⩾ 0, x2 , x3 ⩽ 0, x1 ⩾ 0}
and mi (1) has the opposite sign of both mi (2) and mi (3) for all i, it would be
impossible for there to be x ∈ TX (x*) with x · mi = 0 for any i unless x = (0, 0, 0).
That is to say that there is no non-zero x in the tangent cone perpendicular to any of
the mi . So the corollary then implies that x* = (0, 2, 2) is substantially efficient.
To demonstrate how the corollary compares to a direct method, the direct method for showing substantial efficiency will also be given. Again, note that x* is the unique minimizer of f_1 and f_2 and the unique maximizer of f_3. So it only needs to be shown that there exists M > 0 such that for every x ∈ X \ {x*},

\frac{f_3(x^*) - f_3(x)}{f_1(x) - f_1(x^*)} = \frac{6 - (-x_1 + 2x_2 + x_3)}{x_1 - x_2 - x_3 - (-4)} \le M    (12.16)

and

\frac{f_3(x^*) - f_3(x)}{f_2(x) - f_2(x^*)} = \frac{6 - (-x_1 + 2x_2 + x_3)}{2x_1 - x_2 - 2x_3 - (-6)} \le M.    (12.17)
The comparison of these two methods shows that there are circumstances when
looking at the tangent cone to check substantial efficiency is easier than a direct
method. The same problem gives a non-substantially efficient solution. This example
is given in the hope of explaining why, as stipulated in corollary 12.1, no non-zero vector in the tangent cone can be perpendicular to any m_j.
Example 12.4. Within the same problem framework as example 12.3 consider the
solution y* = (2, 0, 2) which is also efficient with f (2, 0, 2) = (0, 0, 0). This is seen
because
f_1(x) = x_1 - x_2 - x_3 < 0 \;\Rightarrow\; x_1 < x_2 + x_3 \;\Rightarrow\; f_3(x) > 0.    (12.20)
Given M > 0, consider the point y_M = \left(2, \frac{2+M}{1+M}, 1\right) ∈ X and note that

f_1(y_M) = 2 - \frac{2+M}{1+M} - 1 < 0 = f_1(y^*)

and

f_2(y_M) = 4 - \frac{2+M}{1+M} - 2 > 0 = f_2(y^*).

So

\frac{f_2(y^*) - f_2(y_M)}{f_1(y_M) - f_1(y^*)} = \frac{-4 + \frac{2+M}{1+M} + 2}{2 - \frac{2+M}{1+M} - 1} = \frac{-2(1+M) + 2 + M}{(1+M) - 2 - M} = M.    (12.23)

This is true for every M > 0, so y* cannot be substantially efficient.
This relates to the corollary. Note that \lim_{M \to \infty}(y_M - y^*) = (0, 1, -1), and (0, 1, -1) ∈ T_X(y^*) = \{(x_1, x_2, x_3) ∈ ℝ³ : x_1 + x_2 + x_3 ⩾ 0, x_1, x_3 ⩽ 0 and x_2 ⩾ 0\}. This is the type of vector perpendicular to the m_j mentioned in the corollary. □
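The computation in (12.23) is easy to confirm numerically. The sketch below is our own illustrative code, using the objective slopes read off from (12.16) and (12.17); it reproduces the quotient M for several values of M:

```python
# Numerical companion to example 12.4: at y* = (2, 0, 2) the quotient of
# (12.23) equals M for every candidate bound M, so no bound can work.
def f1(x):
    # slope m1 = (1, -1, -1), as in (12.16)
    return x[0] - x[1] - x[2]

def f2(x):
    # slope m2 = (2, -1, -2), as in (12.17)
    return 2 * x[0] - x[1] - 2 * x[2]

y_star = (2, 0, 2)
for M in (1, 5, 50, 500):
    y_M = (2, (2 + M) / (1 + M), 1)
    quotient = (f2(y_star) - f2(y_M)) / (f1(y_M) - f1(y_star))
    print(M, quotient)  # matches M up to floating-point error
```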
Checking every efficient point's tangent cone in the pursuit of SE solutions may be too difficult. So, seeing the limitations of that approach, the authors turn to another way to characterize SE solutions. Since substantial efficiency bounds the trade-offs of pairs of objectives, it is reasonable to look at all the restrictions of the problem to two objectives. The next proposition, for a general problem, connects substantial efficiency to proper efficiency when a problem is restricted.
Proposition 12.5. Consider a problem (P) (X not necessarily closed). If x* ∈ X is
such that the restriction of (P) to any two objective functions has x* as a properly
efficient solution then x* is SE for (P).
Proof. If x* is properly efficient when (P) is restricted to any two objective functions then x* must be efficient for (P) on its own. Indeed, if there exists an x ∈ X with f_i(x) < f_i(x*) then for any other index j, f_j(x) > f_j(x*); otherwise x* would not be efficient with regard to the restriction of (P) to the ith and jth components.

Let M_{a,b} be the bound of the trade-off quotient from the properly efficient criterion when (P) is restricted to the ath and bth objective components. Let M = \max_{a,b \in \{1, 2, \ldots, m\},\, a \ne b} M_{a,b}. Then for each objective component f_i and x ∈ X satisfying f_i(x) < f_i(x*), f_j(x*) < f_j(x) implies

\frac{f_i(x^*) - f_i(x)}{f_j(x) - f_j(x^*)} \le M.

So x* is substantially efficient. ■
The converse does not hold in general; however, with a slight alteration, proposition 12.5 can be made necessary and sufficient. First a counter-example is given.
Example 12.6. Consider the minimization problem (P) where f(x) = (−x, −x + 1, x) and X = [0, 1]. Note that every feasible solution is efficient. Also note that x* = 0 is a substantially efficient solution. Indeed, if x is such that f_i(x*) > f_i(x) and f_j(x*) < f_j(x) then either i = 1 or 2, and j = 3. Also

\frac{f_1(x^*) - f_1(x)}{f_3(x) - f_3(x^*)} = \frac{f_2(x^*) - f_2(x)}{f_3(x) - f_3(x^*)} = \frac{0 - (-x)}{x - 0} = 1 \le 2,    (12.24)
\frac{f_i(y) - f_i(x)}{f_j(x) - f_j(y)} > 1.    (12.25)

But then

\frac{m_i \cdot y - m_i \cdot x}{m_j \cdot x - m_j \cdot y} > 1
\;\Rightarrow\; m_i \cdot y - m_i \cdot x > m_j \cdot x - m_j \cdot y    (12.26)
\;\Rightarrow\; m_i \cdot y + m_j \cdot y > m_i \cdot x + m_j \cdot x,
but then this again contradicts how y is defined. So y must be substantially efficient.
(Using theorem 12.2.) Whenever (L) is restricted to two objective functions, y will be an efficient solution and thus properly efficient as well. Using theorem 12.2, it must be the case that y is an SE solution. ■
This is a very particular set of problems for which SE solutions can be guaranteed
to exist. The rarity of the assurance of existence leads one to wonder what can be
said of the set of SE solutions when they do exist. The next theorem ties the topology
of the collection of SE solutions to the trade-off quotient’s bounds.
Theorem 12.3. Consider the problem (P). If the collection of trade-off quotient
bounds {Mx}x∈S(P ) is bounded (i.e. there is a uniform trade-off quotient bound M > 0
that can be applied for any x ∈ S(P )) then S(P ) is closed.
Proof. Assume S(P) is not closed; thus there is a limit point of S(P) that S(P) does not contain. Let {x_n}_{n∈ℕ} ⊆ S(P) be a sequence that converges to x ∉ S(P). Since X is closed, x ∈ X \ S(P). Let M ∈ ℝ with M > 2 be a common bound for the trade-off quotients of every element of {x_n}_{n∈ℕ}.
but

\frac{f_{i_n}(x) - f_{i_n}(z_n)}{f_{j_n}(z_n) - f_{j_n}(x)} > n.

Since there are infinitely many n ∈ ℕ but only finitely many combinations of i_n and j_n, by the pigeonhole principle a subsequence can be passed to in order to keep the i_n and j_n fixed. Do so to obtain a fixed i and j for which

f_i(x) - f_i(z_n) > 0 \quad \text{but} \quad \frac{f_i(x) - f_i(z_n)}{f_j(z_n) - f_j(x)} > n.

Then pass to another subsequence so that f_i(x) is either always greater than or always less than f_i(x_n), and so that f_j(x) is likewise either always greater than or always less than f_j(x_n). Now there are some cases to consider.
Case 1. fi (xn ) > fi (x ) and f j (xn ) > f j (x ).
Choose N large enough that f j (zM ) > f j (xN ). Then
which is a contradiction.
Case 2. f_i(x_n) > f_i(x) and f_j(x_n) < f_j(x).
Choose N large enough so that \frac{f_j(z_{2M}) - f_j(x)}{f_j(z_{2M}) - f_j(x_N)} < 2, f_j(z_{2M}) - f_j(x_N) > 0, f_i(x_N) - f_i(z_{2M}) > 0 and f_j(x) < f_j(x_N) + \epsilon, where \epsilon < \frac{f_j(z_{2M}) - f_j(x)}{2M}. Then

f_i(x_N) - f_i(z_{2M}) > f_i(x) - f_i(z_{2M}) > 2M\,(f_j(z_{2M}) - f_j(x)) > 2M\,(f_j(z_{2M}) - f_j(x_N) - \epsilon)

\Rightarrow\; M \ge \frac{f_i(x_N) - f_i(z_{2M})}{f_j(z_{2M}) - f_j(x_N)} > 2M - \frac{2M\epsilon}{f_j(z_{2M}) - f_j(x_N)} > 2M - 2M\,\frac{f_j(z_{2M}) - f_j(x)}{2M\,(f_j(z_{2M}) - f_j(x_N))} > 2M - 2.    (12.28)
Lemma 12.1. Let a, b ∈ ℝⁿ with a ∦ b and 1 > ϵ > 0 be given. Then there is a δ > 0 such that there exists c ∈ ∂N_ϵ(0) with a · (−c) > δ∥a∥ϵ while b · c = \frac{1}{m}∥b∥ϵ for a choice of m that is large.

Proof. For this proof let ∠xy denote the positive angle between x and y, so that ∠xy = ∠yx. Let Q be the plane containing both a and b. If a ⊥ b then take c ∈ ∂N_ϵ(0) ∩ Q so that ∠a(−c) < 45° and ∠bc < 90°, as close to 90° as necessary to obtain \cos(∠bc) = \frac{1}{m} for some large choice of m. Then δ = \frac{\sqrt{2}}{2} will suffice because

a \cdot (-c) = \cos(\angle a(-c))\,\|a\|\,\|c\| > \frac{\sqrt{2}}{2}\|a\|\epsilon = \delta\|a\|\epsilon,    (12.31)

and

b \cdot c = \cos(\angle bc)\,\|b\|\,\|c\| = \frac{1}{m}\|b\|\epsilon.

On the other hand, if a is not perpendicular to b, take z to be the unit vector perpendicular to b in Q with a · z > 0. So z and a are on the same 'side' of b, so to speak. Let θ = min{∠ab, ∠za}. Then take c to be the vector in ∂N_ϵ(0) ∩ Q such that ∠c(−z) < θ/2 and b · c = \frac{1}{m}\|b\|\|c\| = \frac{1}{m}\|b\|\epsilon, with m being any value for which \frac{1}{m} < \cos(θ/2). So c is a vector in Q nearly perpendicular to b, with magnitude ϵ, whose angle with −z measures between 0 and θ/2. So −c is within an angle of θ/2 of z and is also within the same plane as a and b. Since ∠za = |90° − ∠ab| ≠ 0°, then

\angle a(-c) \le \begin{cases} \angle za + \frac{\theta}{2} & \text{if } 0° < \angle ab \le 45° \\ \angle za + \frac{\theta}{2} & \text{if } 45° < \angle ab < 90° \\ \angle za - \frac{\theta}{2} & \text{if } 90° < \angle ab < 180° \end{cases} = \begin{cases} 90° - \angle ab + \frac{1}{2}\angle ab & \text{if } 0° < \angle ab \le 45° \\ 90° - \angle ab + \frac{1}{2}(90° - \angle ab) & \text{if } 45° < \angle ab < 90° \\ \angle ab - 90° - \frac{1}{2}(\angle ab - 90°) & \text{if } 90° < \angle ab < 180°. \end{cases}    (12.32)

In every case ∠a(−c) is a fixed angle less than 90°. This means that one may set \cos(\angle a(-c)) =: \delta > 0, so a · (−c) = δ∥a∥∥−c∥ = δ∥a∥ϵ. ■
A shifted variant of lemma 12.1 is used in the following proposition.
Proposition 12.6. Consider the LMOP (L) with all m_i non-parallel and spanning a subspace of dimension at least n, and an open feasible set X ⊆ ℝⁿ with n ⩾ 2. Then S(L) ∩ X° consists of isolated points. That is, S(L) ∩ X° contains no limit points.
Proof. Let (x_n)_{n∈ℕ} ⊆ S(L) ∩ X° be some sequence converging to a point x ∈ ℝⁿ. The goal is to show that x ∉ S(L) ∩ X°. It may be assumed that x ∈ X° and x ∈ E(L), or else x ∉ S(L) ∩ X° automatically. For every x_n there is a corresponding M_n such that for all i and y ∈ X with f_i(x_n) > f_i(y),

\frac{f_i(x_n) - f_i(y)}{f_j(y) - f_j(x_n)} \le M_n

for all j with f_j(y) > f_j(x_n). Since x ∈ X° and X° is open, there exist ϵ > 0 and N such that for all n > N, the neighborhoods N_{ϵ/2}(x_n) ⊂ N_ϵ(x) ⊆ X°.

The fact that the m_i span a subspace of dimension at least n shows that there is no vector v ∈ ℝⁿ perpendicular to each m_i. Therefore, for each x_n it must be the case that for some j_n, f_{j_n}(x_n) − f_{j_n}(x) = m_{j_n} · (x_n − x) > 0; but since each x_n and x itself are efficient, there is some other i_n for which f_{i_n}(x) − f_{i_n}(x_n) > 0. Since there are infinitely many x_n and only finitely many combinations of i_n and j_n, by the pigeonhole principle there must be a subsequence where the i_n match and the j_n match. Pass to that subsequence and fix i and j to be those values, so that f_i(x) > f_i(x_n) and f_j(x) < f_j(x_n) for all n.
For each n > N, using lemma 12.1 take y_n ∈ ∂N_{ϵ/2}(x_n) so that m_i · (x_n − y_n) > δ∥m_i∥∥x_n − y_n∥ > 0 for some fixed δ, and m_j · (y_n − x_n) = \frac{1}{n}∥m_j∥∥y_n − x_n∥. The freedom of choice of y_n, and the fact that m_i and m_j cannot be scalar multiples of one another, ensure that such a choice of y_n is possible. Note that ∥x_n − y_n∥ = \frac{ϵ}{2}.

Take \frac{δϵ}{3} > γ > 0 and n large enough so that ∥x − x_n∥ < γ. This means that f_i(x) > f_i(x_n) − γ∥m_i∥ because f_i is continuous. Similarly, f_j(x) > f_j(x_n) − γ∥m_j∥. Now observe that f_i(x) > f_i(x_n) > f_i(y_n) and f_j(x) < f_j(x_n) < f_j(y_n). Now consider the trade-off quotient

\frac{f_i(x) - f_i(y_n)}{f_j(y_n) - f_j(x)} \ge \frac{f_i(x_n) - \gamma\|m_i\| - f_i(y_n)}{f_j(y_n) - f_j(x_n) + \gamma\|m_j\|} > \frac{\delta\|m_i\|\frac{\epsilon}{2} - \gamma\|m_i\|}{\frac{1}{n}\|m_j\|\frac{\epsilon}{2} + \gamma\|m_j\|} > 0.    (12.33)

Since the choice of γ was arbitrary, and γ → 0 implies n → ∞, the whole trade-off quotient \frac{f_i(x) - f_i(y_n)}{f_j(y_n) - f_j(x)} → ∞ as γ → 0. This means that the trade-off quotient cannot be bounded, and thus the sequence x_n does not converge to an SE solution. ■

This quickly leads to a corollary.
Corollary 12.3. Consider the LMOP (L) with all m_i non-parallel and spanning a subspace of dimension at least n, and with a closed feasible set X ⊆ ℝⁿ with n ⩾ 2. If x ∈ S(L) is not isolated then x must lie on the boundary of the feasible collection.
One may wonder under what circumstances an SE solution would be found in the interior. The following shows what an SE point in the interior would entail.
Proposition 12.7. For a problem (L) with X closed and bounded, if there exists an
x ∈ X ° that is SE, then all y ∈ X are SE.
Proof. Let x ∈ X° be SE and assume that y ∈ X is not SE. First note that y must be efficient, for if it is not, then f(y) ⩾ f(x) implies that m_i · y ⩾ m_i · x for all i and m_j · y > m_j · x for some j. So for some very small ϵ > 0 it must be that x + ϵ(x − y) ∈ X, since x is in the interior of X. But that implies f(x) ⩾ f(x + ϵ(x − y)), with strict inequality in the jth component. That implies x is not efficient, which is a contradiction, so y must be efficient.
Now if y is not substantially efficient, then there exist sequences

{z_k}_{k∈ℕ} ⊆ X and {i_k}_{k∈ℕ}, {j_k}_{k∈ℕ} ⊆ {1, …, n}

for which

\frac{f_{i_n}(y) - f_{i_n}(z_n)}{f_{j_n}(z_n) - f_{j_n}(y)} > n.

As earlier, by the pigeonhole principle a subsequence can be passed to in order to hold i_n and j_n fixed at i and j.

So given any N > 0 there exists an ϵ > 0 for which x + ϵ(z_N − y) ∈ X. Note that since the objectives are linear, f_i(x) − f_i(x + ϵ(z_N − y)) = ϵ(f_i(y) − f_i(z_N)) > 0. Similarly it must be that f_j(x + ϵ(z_N − y)) − f_j(x) = ϵ(f_j(z_N) − f_j(y)) > 0. However,
Example 12.7. Let X = {(x_1, x_2, x_3) ∈ ℝ³ : x_2, x_3 ⩾ 0} be the collection of feasible solutions. Let

F(x) = (f_1(x), f_2(x), f_3(x)) = (m_1 · x, m_2 · x, m_3 · x),
m_1 = (0, 1, 1),
m_2 = (0, 1, 2),
m_3 = (0, 2, −1)

be the vector-valued function to be minimized over X. It is evident that the set S = {(x_1, 0, 0) ∈ ℝ³ : x_1 ∈ ℝ} is within the collection of efficient solutions. In order to check the substantial efficiency of any given s ∈ S, only points x ∈ X where f_3(x) < f_3(s) need to be considered, because there is no x ∈ X where f_1(x) < f_1(s) or f_2(x) < f_2(s). If x has f_3(x) < f_3(s) then f_1(x) > f_1(s) = 0 and f_2(x) > f_2(s) = 0. So f_1(s) = f_2(s) = f_3(s) = 0, x_3 > 2x_2 and x_2, x_3 ⩾ 0 give

\frac{f_3(s) - f_3(x)}{f_1(x) - f_1(s)} = \frac{-2x_2 + x_3}{x_2 + x_3} \le 1    (12.35)

and

\frac{f_3(s) - f_3(x)}{f_2(x) - f_2(s)} = \frac{-2x_2 + x_3}{x_2 + 2x_3} \le 1.    (12.36)

So the trade-off quotients are uniformly bounded and every s ∈ S is substantially efficient.
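The two bounds can also be spot-checked by sampling, as in the sketch below. Note that the slope m_1 = (0, 1, 1) is inferred from the denominators in the quotients above; the snippet itself is only an illustration:

```python
# Sampled check of (12.35) and (12.36): over points with x3 > 2*x2 >= 0,
# both trade-off quotients stay at or below 1. The slope m1 = (0, 1, 1)
# is inferred from the text; m2 and m3 are as stated.
import random

m1, m2, m3 = (0, 1, 1), (0, 1, 2), (0, 2, -1)

def dot(m, x):
    return sum(mi * xi for mi, xi in zip(m, x))

random.seed(0)
worst = 0.0
for _ in range(10_000):
    x2 = random.uniform(0.0, 10.0)
    x3 = random.uniform(2 * x2 + 1e-9, 2 * x2 + 10.0)  # forces f3(x) < f3(s) = 0
    x = (random.uniform(-10.0, 10.0), x2, x3)
    q1 = -dot(m3, x) / dot(m1, x)  # (f3(s) - f3(x)) / (f1(x) - f1(s))
    q2 = -dot(m3, x) / dot(m2, x)  # (f3(s) - f3(x)) / (f2(x) - f2(s))
    worst = max(worst, q1, q2)

print(worst)  # never exceeds 1
```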
12.5 Conclusion
To conclude, a brief history of SE solutions was provided, followed by some new information regarding substantial efficiency in general and in the context of linear problems. It was shown that SE solutions do not always exist in linear problems, even when the feasible space is bounded. Substantial efficiency was also shown to connect the perpendicularity of the tangent cone to the direction vectors of the objective functions in linear problems. Various restrictive existence criteria were provided to ensure a substantially efficient point in very specific cases. Finally, information about the topology of the set of SE solutions was presented. The theme of the new results was that finding SE solutions is difficult but not impossible. The insight gained was that SE solutions are worth pursuing, but when they cannot be found, analysts and decision makers need to be aware of the potential for disproportionate trade-offs to occur between some objectives, which can have negative consequences.
Engineers can find uses for SE solutions, as they may find the potential anomaly of an unbounded trade-off between any of their objectives disconcerting. The implications for market analysts are many if their problems can be made into LMOPs. First, the knowledge that SE solutions are rare and difficult to detect should keep the analyst forever on guard against market manipulators, and could call for perpetual regulation through things such as taxation, fines, premiums and time constraints on transactions.
References
[1] Chaboud A P, Chernenko S V, Howorka E, Iyer R S K, Liu D and Wright J 2004 The high-
frequency effects of US macroeconomic data releases on prices and trading activity in the
global interdealer foreign exchange market International Finance Discussion Papers 823,
Board of Governors of the Federal Reserve System (US) https://EconPapers.repec.org/
RePEc:fip:fedgif:823
[2] CBOE 2014 2014 CBOE market statistics Technical report, CBOE Global Markets
[3] CBOE 2016 2016 CBOE market statistics Technical report, CBOE Global Markets
[4] Brunnermeier M K 2009 Deciphering the liquidity and credit crunch 2007–2008 J. Econ.
Perspect. 23 77–100
[5] Healy P M and Palepu K G 2003 The fall of Enron J. Econ. Perspect. 17 3–26
[6] Wah E and Wellman M P 2013 Latency arbitrage, market fragmentation, and efficiency: a
two-market model Proceedings of the Fourteenth ACM Conference on Electronic Commerce,
EC ‘13 (New York: ACM), pp 855–72
[7] Pareto V 1971 Manual of Political Economy (New York: Kelley)
[8] Koopmans T C 1951 Efficient allocation of resources Econometrica: J. Economet. Soc. 19
455–65
[9] Kuhn H W and Tucker A W 1950 Nonlinear programming Proc. Second Berkeley Symp. on
Mathematical Statistics and Probability pp 481–92
[10] Geoffrion A M 1968 Proper efficiency and the theory of vector maximization J. Math. Anal.
Appl. 22 618–30
[11] Seinfeld J and McBride W 1970 Optimization with multiple performance criteria, application
to minimization of parameter sensitivities in a refinery model Ind. Eng. Chem. Process Des.
Develop. 9 53–7
[12] Belenson S and Kapur K 1973 An algorithm for solving multicriterion linear programming
problems with examples Oper. Res. Q. 24 65–77
[13] Borwein J 1977 Proper efficient points for maximizations with respect to cones SIAM J.
Control Optimization 15 57–63
[14] Benson H and Morin T 1977 The vector maximization problem: proper efficiency and
stability SIAM J. Appl. Math. 32 64–72
[15] Wendell R and Lee D 1977 Efficiency in multiple objective optimization problems Math.
Program. 12 406–14
[16] Benson H 1979 An improved definition of proper efficiency for vector maximization with
respect to cones J. Math. Anal. Appl. 71 232–41
[17] Liu J 1999 ϵ-properly efficient solutions to nondifferentiable multiobjective programming
problems Appl. Math. Lett. 12 109–13
[18] Jiang Y and Deng S 2014 Enhanced efficiency in multi-objective optimization J. Optim.
Theory Appl. 162 577–88
[19] Rockafellar R T and Wets R J B 1998 Variational Analysis (Berlin: Springer)
[20] Chen G, Huang X and Yang X 2005 Vector Optimization, Set-Valued and Variational
Analysis (Berlin: Springer)
[21] Ehrgott M 2005 Multicriteria Optimization 2nd edn (Berlin: Springer)
[22] Miettinen K M 1999 Nonlinear Multiobjective Optimization (Norwell, MA: Kluwer
Academic)
[23] Kaliszewski I 1994 Quantitative Pareto Analysis by Cone Separation Technique (Norwell,
MA: Kluwer Academic)
[24] Khaledian K, Khorram E and Soleimani-damaneh M 2016 Strongly proper efficient
solutions: efficient solutions with bounded trade-offs J. Optim. Theory Appl. 168 864–83
[25] Pourkarimi L and Karimi M 2016 Characterization of substantially and quasi-substantially efficient solutions in multiobjective optimization problems Turk. J. Math. 41 293–304
IOP Publishing
Chapter 13
A machine learning approach for engineering
optimization tasks
Arpana Rawal, Mamta Singh and Jyothi Pillai
The evolving world continuously tunes itself according to the famous underlying Darwinian principle of the 'survival of the fittest'. This naturally occurring, successful optimization system continues to inspire philosophers, analysts and practitioners to address real-world optimization tasks in our daily lives as well. While dealing with the combinations of possible worlds that satisfy combinations of constraints, researchers often attempt to build preference relations over possible worlds and try to find a best possible world according to the preferences. The preference for choosing among a set of available alternatives is often based on minimizing some negative performance parameter, say an error, or maximizing some favorable parameter, say accuracy. Both constraints and alternatives act as functioning world parameters in optimization terminology.
The conventional taxonomy of optimization methods has seen a multitude of
mathematical programming techniques formulated based on either the types of
objectives or constraints [1]. The principles of mathematical programming/optimi-
zation are borrowed from the field of operations research (OR) and are found to be
suitable for analyzing a set of model functioning parameters under a precise set of
constraints so as to minimize the defined objective function of the model.
Optimization tasks are inherently combinatorial in nature and can be described
with a finite set of constraints and more importantly have a cost function attached to
them. The main goal of carrying out optimization tasks is to find the problem’s
solution or the parameters of the problem in order to arrive at a ‘good’ or
‘acceptable’ solution, if not the ‘best’ solution. Some degree of intelligence ought
to be applied to arrive at optimal, good or acceptable solutions, because of the
infeasibility of applying brute-force to determine all possible solutions in problem
search spaces. One of the most significant trends in the realm of artificial intelligence (AI) during the past fifteen years has been the integration of optimization methods throughout. Virtually all machine learning (ML) algorithms arrive at their solutions by combining models with optimization methods.
Table 13.1. Successful optimization implementations in machine learning domains.

Machine learning domain | Application domain | Case study (instance) | Optimization technique
Combinatorial methods (exact + approximate methods) | Network routing problems | Traveling salesman problems | Continuous approximation methods
Deep neural networks | Network routing problems | Traveling salesman problems | Continuous approximation methods
Constraint satisfaction | Network routing problems | Traveling salesman problems | Heuristic search methods
Multi-attribute utility theory (MAUT) | Decision-making problems | Product line consolidation and selection (family of staplers) | Statistical methods
Bayesian classifiers | Decision-making problems | Candidate recruitment | Heuristic search methods
Statistical ML (convex optimization) | Classification problems | Text classification | Continuous approximation methods
Bayesian networks + non-negative matrix factorization (NMF-FE) | Classification problems | fMRI brain imaging | Hidden Markov models
SVM classifiers | Classification problems | Classification (imbalanced datasets) | Mathematical programming
Multi-objective learning | Classification problems | Multiple-digit classification, scene understanding (cityscape identification) | Perceptual task learning, gradient descent search
Probabilistic inference (ranking) models | Classification problems | Search cost optimization for early prediction of good protein torsion angles for protein structure reconstruction | Polynomial (linear) programming
Reinforcement learning | Planning and control | Job-shop scheduling problems | Mathematical programming
Deep neural networks | Perceptual task learning | Speech and image recognition | Gradient descent search
Optimization remains indispensable even amid the paradigm shift of data science and analytics. This is due to the fact that the
fundamental natural and human resources still remain finite along with the legal and
ethical bounds on real-world complex problem-solving frontiers. Many decades of
early works saw conventional AI and ML algorithms solve real-world optimization
problems using both OR and ML analysis. These works were the consequence of the
need to arrive at the most suitable option from a family of ML models that could
perform fairly well according to some minimum estimate of the generalization error
based on the given training data. This search typically involves some combination of
data pre-processing, optimization and heuristics; these are still constrained by three
popular sources of error: error distribution introduced by learning bias, noise in
datasets and the difficulty of the search problem that underlies the given modeling
problem. These standard optimization packages suffered from efficiency and
scalability issues, owing to the large-scale ML model designs. This in turn led to
the need to design suitable optimization methods using ML heuristics. This is rightly
described as ‘the interplay of optimization and machine learning’ [6].
Optimization lies at the heart of ML, and most ML problems reduce to optimization problems. Readers are assumed to have sound knowledge of operations research, mathematical programming and baseline ML algorithms. The chapter does not intend to deal with the mathematical foundations of optimization types, statistical methods or meta-heuristics, nor does it drill down into the optimization classification hierarchy; rather, it attempts to explore the depth to which optimization has pervaded different ML tasks, based on an assimilation of the exhaustive works performed by ML analysts. The aim of this chapter is to explore the increasing degree of coupling observed between available optimization techniques and ML models with the help of case studies in diverse domains. The authors seek to examine the cross-boundaries of the following types of works in the literature. The first category considers how small changes to existing ML models, using methods such as multi-kernel learning, ranking, clustering and structured learning, have adapted optimization tasks into newer dimensions. The second category explores how extended sets of optimization methods have enhanced the scalability and efficiency of ML models, which has also promoted the use of newer exploratory domains. We have already introduced chronologically the classification hierarchy of optimization methods as applied in diversified domains of ML tasks. The rest of the chapter is organized as follows. Section 13.3 deals with popular optimization methods used for ML in supervised learning. Section 13.4 explains some case studies revealing how optimal solutions can be used to address the most challenging task of feature selection in ML.
There is no globally optimal method for classification. The classification results are also
altered by the noise in the data, that is, redundant records, incorrect records, missing
records, outliers and so forth. All these problems affect the accuracy of a classifier.
probabilities. Even if the Bayesian learning algorithm does not explicitly manipulate
probabilities, it can help in identifying the model characteristics under which the
learning behaves optimally [8]. Such Bayesian optimality can be observed in the
learning of problems with continuous-valued domains, where Bayesian analysis is
used for minimizing the squared error between the output hypothesis predictions
exhibiting maximum likelihood estimates. The reader is referred to the basic
concepts of probability densities over normal distributions for understanding
Bayesian classifiers in [8].
Table 13.2 provides the binomial distribution of 11 patient instances in age group 1 and 17 patient instances in age group 2, whose weights need to be computed using Bayesian optimization over the maximum likelihood hypothesis. The likelihood estimate computations, P(d_i ∣ h, x_i), can be seen in table 13.3 for a set of different iterating probabilities h(.). The h(.) value of maximum likelihood is declared the MLE and is assigned as the weight for learning the fuzzy weighted associative classifier.
Table 13.2. Binomial outcomes d_i for the patient instances in the two age groups.

Age group | Instances with d_i = 0 | Instances with d_i = 1
I | 1–8 | 9–11
II | 1–9, 14 | 10–13, 15–17
Table 13.3. Likelihood estimates for the iterating probabilities h(.).

Patient_age_group_type | Prior probability (h(x_i)) | (1 − h(x_i)) | Training sample size (m) | h(.) (iterating probabilities) | h_ML (MLE)
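The grid-search flavor of this computation is easy to sketch in Python. The snippet below is illustrative only (the function names are ours); it iterates candidate probabilities h(.) and keeps the one maximizing the binomial likelihood, with the positive counts read off table 13.2 (3 of 11 in age group 1, 7 of 17 in age group 2):

```python
# A minimal sketch of the table 13.3 computation: iterate candidate
# probabilities h(.) and declare the likelihood-maximizing one the MLE.
from math import comb

def binomial_likelihood(h, k, m):
    # P(D | h): probability of k positive outcomes in m Bernoulli trials
    return comb(m, k) * h**k * (1 - h)**(m - k)

def mle_over_grid(k, m, grid):
    # The iterating probability with maximum likelihood is the MLE h_ML
    return max(grid, key=lambda h: binomial_likelihood(h, k, m))

grid = [i / 100 for i in range(1, 100)]
print(mle_over_grid(3, 11, grid))   # age group 1 -> 0.27, close to 3/11
print(mle_over_grid(7, 17, grid))   # age group 2 -> 0.41, close to 7/17
```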
Table 13.4. Bayesian optimization over the maximum likelihood hypothesis. Adapted as illustration from [10] © 2018 IOP Publishing. Reproduced with permission. All rights reserved.

Work quality: 5 = Always exceeds the set standards; 4 = Sometimes exceeds the set standards; 3 = Always meets the set standards; 2 = Sometimes does not meet the set standards; 1 = Always achieves less than the set standards.

Work quantity: 5 = The quantity, volume, frequency or speed of completion of work always exceeds the set standards; 4 = The quantity, volume, frequency or speed of completion of work sometimes exceeds the set standards; 3 = The quantity, volume, frequency or speed of completion of work always meets the set standards; 2 = The quantity, volume, frequency or speed of completion of work is sometimes insufficient to meet the set standards; 1 = The quantity, volume, frequency or speed of completion of work is always less than the set standards.

Knowledge and work skills: 5 = Very good at mastering and understanding the knowledge and skills for the range of duties and responsibilities; 4 = Good at mastering and understanding the knowledge and skills for the range of duties and responsibilities; 3 = Masters and understands the knowledge and skills for the range of duties and responsibilities; 2 = A lack of mastering and understanding of the knowledge and skills for the range of duties and responsibilities; 1 = Very lacking in mastering and understanding the knowledge and skills for the range of duties and responsibilities.

Initiative: 5 = Very fast, precise and correct in acting to carry out and complete duties without waiting for orders and instructions; 4 = Very fast, precise and correct in acting to carry out and complete duties, and sometimes waits for instructions that are general in nature; 3 = Very fast, precise and correct in acting to carry out and complete duties, and sometimes waits for relatively detailed instructions; 2 = Slow in acting to carry out and complete duties, and always waits for detailed instructions; 1 = Slow in acting to carry out and complete duties, and always waits for very detailed instructions.

Cooperation: 5 = Active in helping and supporting all co-workers, gives very serious attention to the achievement of group/company goals; 4 = Active in helping and supporting co-workers in certain sections/groups only, gives serious attention to the achievement of group/company goals; 3 = Needs to be reminded to help and support co-workers, needs to pay serious attention to the achievement of group/company goals; 2 = Not enthusiastic in helping or supporting colleagues despite being reminded/reprimanded, little attention to the achievement of group/company goals; 1 = Provides no help or support for colleagues and/or always becomes an obstacle in achieving group/company goals.

Reliability: 5 = Always willing to be assigned outside duties and responsibilities, and results are as expected; 4 = Always willing to be assigned outside duties and responsibilities, results are relatively close to what is expected; 3 = Always willing to be assigned outside duties and responsibilities, results are less than expected; 2 = Always willing to be assigned outside duties and responsibilities, results are rarely as expected; 1 = Always unwilling to be assigned outside duties and responsibilities, results are rarely as expected.

Learning ability and willingness: 5 = A great passion to learn for self-development and improvement of work ability, earnestly applies what has been learned in duties; 4 = A good spirit to learn for self-development and improvement of work ability, seriously wants to apply what has been learned in duties; 3 = Sufficient interest to learn for self-development and improvement of work ability, but needs to request support from superiors to apply what has been learned in duties; 2 = Less enthusiasm for learning for self-improvement and improvement in work ability, and/or needs to be controlled/reminded to always apply what has been learned in duties; 1 = Little interest in learning for self-development and improvement of work ability, although has been reminded.

Attendance: 5 = Never late, never leaves early, no work lost; 4 = Performance appraisal is affected by early logout and/or 1 day lost work; 3 = Performance appraisal is affected by 3–5 late arrivals and early departures, and/or 2–3 days lost work; 2 = Performance appraisal is affected by late logins and/or 4–5 days lost work; 1 = Performance appraisal is affected by habitual late logins and/or more than 5 days lost work.

Planning and organizing: 5 = Excellent ability in setting task priorities and placing and managing existing resources; 4 = Good ability in setting task priorities and placing and managing existing resources; 3 = Fairly good ability in setting task priorities and placing and managing existing resources; 2 = Less able to set task priorities and less effective in placing and managing existing resources; 1 = Not able to assign task priorities and manage existing resources.

Controlling: 5 = Good at monitoring and controlling all resources very optimally; 4 = Good at monitoring and controlling all resources optimally enough; 3 = Sometimes needs guidance to monitor and control all resources, but less than optimally; 2 = Always needs guidance to monitor and control all resources, but less than optimally; 1 = Always needs a reminder to simply monitor and control resources.

Decision making: 5 = Makes a decision on a problem very quickly and accurately, and the results can be justified; 4 = Makes a decision on a problem quickly and accurately, and the results can be justified; 3 = Capable of making a decision on a problem quickly and accurately, although often needs support and direction; 2 = Sometimes needs a great deal of support and direction to make decisions quickly and accurately, and/or decisions are sometimes less justified; 1 = Always slow in making decisions, even if they have been given guidance, and their directives or decisions are not justifiable.

Development of subordinates: 5 = Very attentive in the improvement and development of subordinates and/or has a planned program for staff; 4 = Attentive in the improvement and development of subordinates, but lacks a planned program for staff; 3 = Sufficient attention on the improvement and development of subordinates, but does not have a planned program for staff; 2 = Lack of attention on the improvement and development of subordinates, and does not have a planned program for staff; 1 = No attention on the improvement and development of subordinates, and does not have a planned program for staff.
Table 13.5. The ranked criteria for candidate assessment. Adapted as illustration from [10] © 2018 IOP Publishing. Reproduced with permission. All rights reserved.

Criterion 1: Has a good ability to understand, a high morale and a satisfactory quality of work.
Criterion 2: The results obtained are very effective with a near perfect level of accuracy.
Criterion 3: Self-organization and subordinates are optimal, and is responsible for all the work done.
Criterion 4: Has a high level of creativity and initiative in working with the task of applying the knowledge mastered.
Criterion 5: Has the courage to make decisions and dares to be accountable for all the risks that may occur.
These indicators are: (a) work quality, (b) work quantity, (c) knowledge and work
skills, (d) initiative, (e) cooperation, (f) reliability, (g) learning ability and willing-
ness, (h) attendance, (i) planning and organizing, (j) controlling, (k) decision making
and (l) development of subordinates, each expressed with five-point rated descriptive
options that can closely measure the participating candidate. The indicators and their rating scale of responses are pre-defined by the HR department of company
X after analyzing several factors that are sure to influence the decision-making
process, including the past experience of the candidate, cognitive differences, age
and individual differences, belief in personal relevance and lastly their escalation of
commitment. Another constraint imposed by the HR department is the fulfillment
of the criteria in selecting candidates according to the needs of the position.
The criteria are defined in ascending order of preference for recruiting candidates
for top managerial positions of the company, as can be seen in table 13.5, as a
consequence of previous review; this study is internal and has been recognized by the
management of the company.
The understanding of Bayes' optimal classification in the case study refers to the fulfillment of the recommendation criteria by any contesting candidate (K_j) for whom the probability P(K_j ∣ D) is maximum. This most probable classification of instance k_i can be computed from the expression for Bayesian optimal classification:

\operatorname*{argmax}_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D).    (13.5)
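A compact sketch of equation (13.5) in Python may help. The probability tables here are placeholders (the chapter's actual numbers live in tables 13.8 and 13.9), and all names are ours:

```python
# Bayes optimal classification, equation (13.5): pick the label v that
# maximizes the sum over hypotheses h of P(v | h) * P(h | D).
def bayes_optimal(p_v_given_h, p_h_given_d):
    scores = {
        v: sum(cond[h] * p_h_given_d[h] for h in p_h_given_d)
        for v, cond in p_v_given_h.items()
    }
    return max(scores, key=scores.get), scores

# Uniform P(h_i | D) = 1/5 over the five criteria, as in table 13.8.
p_h_given_d = {h: 1 / 5 for h in ("h1", "h2", "h3", "h4", "h5")}

# Placeholder P(v_j | h_i) values, purely for illustration.
p_v_given_h = {
    "recommend":        {"h1": 0.6, "h2": 0.7, "h3": 0.5, "h4": 0.8, "h5": 0.9},
    "do not recommend": {"h1": 0.4, "h2": 0.3, "h3": 0.5, "h4": 0.2, "h5": 0.1},
}

label, scores = bayes_optimal(p_v_given_h, p_h_given_d)
print(label, scores)  # "recommend" wins under these placeholder numbers
```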
It should be noted that in the case study experiments the set of five criteria is
defined as a set of five hypotheses hi in the whole hypothesis space H. While the
candidates were being personally judged, the managers felt that some of the
decision-making indicators were relatively more important and occupied the top priority in view of the fulfillment of the specific criteria defined in table 13.5.
Hence, distinct sets of preferred indicators needed to be screened while interviewing
candidates under distinct criteria, in order to decide whom to recommend for the
related vacant positions. This causes the formulation of a criteria-specific feature-
vector mapping table that could be drawn from the response value patterns that were
captured by the managers during the candidates’ interview procedures, as can be
seen in table 13.6.
Eventually, each of the candidates was assessed with above-average values of responses in a set of not more than 6–7 indicators, as listed in table 13.7. The rest of
the responses are assumed to be ‘0’ or ‘—’ (do not care) for obtaining the resulting
table of naïve Bayesian probability computations for the contesting candidates
under each criterion of assessment.
The posterior probabilities of each candidate being recommended into each
criterion needs the computation of separate hypothesis combinations, governed by
the distinct set of feature vectors (acting as evidential support) mentioned above. For
example, the posterior probability of candidate Kj, P(Kj ∣ D) can be computed by the
expression that follows from equation (13.5) over all five hypothesis (criterion)
combinations.
Each P(Kj ∣ hi) is the conditional probability of candidate Kj being recommended
in criterion (hypothesis combination) hi in our case study; P(hi ∣ D) is the frequency
of occurrence of the relative sample of criterion hi assessed over each of the
12 indicators. The latter component can be visualized as the consequence of training
Table 13.6. The criteria-specific feature-vector mapping (an 'X' marks the indicators screened under each criterion).

Criterion | a b c d e f g h i j k l
h1 | X X X X X X
h2 | X X X X X X X
h3 | X X X X X X X X
h4 | X X X X X X X X X
h5 | X X X X X X X X
Table 13.7. Candidate responses over the 12 indicators, with the recommended criterion and posterior probability.

Candidate | Criterion | a | b | c | d | e | f | g | h | i | j | k | l | P(K_j ∣ D)
K1 | 1 | 4 | 3 | 3 | — | — | — | 5 | 5 | 3 | — | — | — | 0.7138
K2 | 4 | 4 | — | 5 | 5 | — | — | 3 | 5 | — | — | 3 | 3 | 0.7188
K3 | 2 | 5 | 5 | — | — | — | 4 | — | 4 | 3 | — | 3 | — | 0.7119
K4 | 3 | 3 | — | — | 5 | — | — | — | 5 | 5 | — | 4 | 5 | 0.6853
K5 | 5 | — | 3 | — | 5 | 3 | — | 3 | 4 | — | 3 | 5 | — | 0.7875
K6 | 4 | 3 | — | 5 | 5 | 3 | — | 5 | 5 | — | 3 | — | — | 0.7478
K7 | 1 | 4 | 4 | 3 | — | — | — | 4 | 5 | 3 | — | — | — | 0.7069
K8 | 3 | 3 | — | 3 | 4 | 5 | — | — | 4 | 5 | — | 4 | 5 | 0.816
K9 | 5 | — | 3 | — | 4 | — | 3 | 3 | 4 | — | — | 5 | — | 0.6958
K10 | 2 | 5 | 4 | 3 | — | — | 4 | — | 4 | — | — | 3 | — | 0.678
the recruitment domain with past employee promotion histories and is available in
table 13.8;
P_{h_5}(K_5 \mid D) = \sum P(K_5 \mid h_5)\, P(h_5 \mid D).    (13.6)
It can be observed that the candidates were assessed under five different criteria
using the relevant set of decision-making indicators, and their posterior probabilities
of falling in those criteria were computed using the conditional probabilities of
table 13.8 relevant to those criteria. These observations are listed in table 13.9 along
with the recommended criterion fulfilled by each candidate Kj.
The candidate K5, who is ranked one under criterion 5, was recommended for the
top managerial position for decision making in the company’s management, while
candidate K6, who satisfies criterion 4 with maximum marginal probability (0.7478)
Table 13.8. Naïve Bayesian conditional probability distribution over 12 indicators, five criteria. Adapted as illustration from [10] © 2018 IOP Publishing. Reproduced with permission. All rights reserved.

Criterion | P(h_i) | a | b | c | d | e | f | g | h | i | j | k | l
1 | 1/5 | 9/58 | 8/58 | 7/58 | 1/58 | 1/58 | 1/58 | 10/58 | 11/58 | 7/58 | 1/58 | 1/58 | 1/58
2 | 1/5 | 11/59 | 10/59 | 4/59 | 1/59 | 1/59 | 9/59 | 1/59 | 9/59 | 4/59 | 1/59 | 7/59 | 1/59
3 | 1/5 | 7/75 | 1/75 | 4/75 | 8/75 | 11/75 | 1/75 | 1/75 | 10/75 | 11/75 | 1/75 | 9/75 | 11/75
4 | 1/5 | 8/69 | 1/69 | 11/69 | 11/69 | 4/69 | 1/69 | 9/69 | 11/69 | 1/69 | 4/69 | 4/69 | 4/69
5 | 1/5 | 7/48 | 1/48 | 1/48 | 10/48 | 4/48 | 4/48 | 7/48 | 9/48 | 1/48 | 4/48 | 11/48 | 1/48
Table 13.9. Naïve Bayesian posterior probability computations for candidates K1–K11.
was recommended as the assistant to the elected head of office. The candidates K8, K3 and K1, satisfying criteria 3, 2 and 1 with maximum a posteriori probabilities 0.816, 0.712 and 0.714, respectively, will be placed around the factory with responsibility for operating machines and the necessary job-scheduling tasks. The reader is referred to [10] for the details of the other objectives of the mentioned case study.
where each jth component falls either as an element of binary class 1 or class 0 for
any mth solution instance xm. Hence the two partitions constructed out of Ij(t) can be
defined by the two sequences:
I 1j (.) = {(x k
j , ) ( )
f (x k ) : x jk , f (x k ) ∈ I j (t ), x jk = 1}, (13.12)
I j0(.) = {(x k
j , ) ( )
f (x k ) : x jk , f (x k ) ∈ I j (t ), x jk = 0}. (13.13)
In other words, in order to build a prediction model, we need to map n such vectors,
$I_j(t)$, $j \in \overline{1, n}$, into the interval [0, 1], estimating the two conditional
probabilities $P(j \in C^1 \mid I_j(t))$ and $P(j \in C^0 \mid I_j(t))$, and assigning the
class label whose probability value is greater. This binary classification problem can be
solved with a variety of statistical methods. The case study classifies such a problem
using the logistic regression method, a widely used statistical technique that is, for this
reason, often included among combinatorial optimization methods, and whose sigmoidal
functional representation is defined as
$h_\theta(X) = \dfrac{1}{1 + e^{-(\theta_0 + \theta X)}}$.  (13.14)
Usually, the maximum likelihood estimate (MLE) metric Θ is used to fit the
parameters of the above-mentioned dataset {xk, f(xk)} and can be obtained as
$h_{\theta(t)}(I_j(t)) = \dfrac{1}{1 + \exp\left[ \sum_{(1, f(x^k)) \in I_j^1(t)} \theta(t) f(x^k) - \sum_{(0, f(x^k)) \in I_j^0(t)} \theta(t) f(x^k) \right]}$.  (13.15)
$D_j^1(i) = \min\left( \left\{ f(x^k) : (x_j^k, f(x^k)) \in I_j(t),\ x_j^k = 1 \right\} \right)$.  (13.18)
D_j^1(i)  D_j^0(i)  D^1(t)−D^0(t)  O(.)  Θ·Δ    h_{Θ=0.5}  Θ·Δ     h_{Θ=0.2}  Θ·Δ    h_{Θ=0.15}  Θ·Δ   h_{Θ=0.1}  Θ·Δ    h_{Θ=0.05}
1395      1366      29             0     14.5   0.00       5.80    0.00       4.35   0.01        2.9   0.05       1.45   0.19
1368      1400      −32            1     −16    1.00       −6.40   1.00       −4.8   0.99        −3.2  0.96       −1.6   0.83
1366      1438      −72            1     −36    1.00       −14.40  1.00       −10.8  1.00        −7.2  1.00       −3.6   0.97
1373      1366      7              0     3.5    0.03       1.40    0.20       1.05   0.26        0.7   0.33       0.35   0.41
1379      1365      14             1     7      0.00       2.80    0.06       2.1    0.11        1.4   0.20       0.7    0.33
1365      1389      −24            1     −12    1.00       −4.80   0.99       −3.6   0.97        −2.4  0.92       −1.2   0.77
computed for various probabilities P(.) = {0.5, 0.2, 0.15, 0.1, 0.05}, and the model is
found to fit best for the predicted probability 0.5, as can be seen from the calculations
performed in the same table.
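The tabulated h values are consistent with the logistic output $h_\Theta = 1/(1 + e^{\Theta \Delta})$ applied to the $\Delta = D^1(t) - D^0(t)$ column. A minimal sketch, under that assumed reading, reproduces the entries above:

```python
import math

# Delta = D^1(t) - D^0(t) values from the rows of the table above.
deltas = [29, -32, -72, 7, 14, -24]
thetas = [0.5, 0.2, 0.15, 0.1, 0.05]

def h(theta, delta):
    # Logistic output 1 / (1 + exp(theta * delta)); a strongly negative
    # delta pushes the prediction toward class 1, a positive one toward 0.
    return 1.0 / (1.0 + math.exp(theta * delta))

for d in deltas:
    print([round(h(t, d), 2) for t in thetas])
```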
instance space and associate them with classification labels 1 … C (whether
Boolean, multiple valued or continuous) according to the target function c. A
feature xj can be considered relevant to the target concept c if there exists a
pair of samples s and s′ (s, s′ ∈ S) such that c(s) ≠ c(s′) and s and s′ differ
only in the value of xj in their feature vectors.
• Extending the concept of relevance with respect to the distribution of samples
in instance space, a feature xj can be considered strongly relevant to a sample
set S if there exists a pair of samples s and s′ (s, s′ ∈ S) such that c(s) ≠ c(s′)
and s and s′ differ only in the value of xj. Correspondingly, a feature xj is said
to be weakly relevant to the sample S, where S = f(c, D), if it is possible to
remove a subset of the other features so that xj becomes strongly relevant.
Several other works have also concluded that a feature can be regarded as
irrelevant if it is conditionally independent of the class labels, which means
that a feature having no influence on the class labels can be discarded [19–22].
Even if a feature is independent of the input data, it cannot be independent of
the class label if a suitable learning model is to be constructed.
The vast spectrum of feature selection algorithms for dealing with datasets that
contain large numbers of irrelevant attributes has been characterized into
'wrapper', 'filter' and 'embedded' approaches, based on the relation between the
selection scheme and the basic induction algorithm. Irrespective of the method
adopted, a convenient paradigm for viewing many of these algorithms is as a
sub-optimal procedure that computes a subset of the possible features through a
heuristic search, with each state in the search space specifying a subset of the
possible features. Instead of directly evaluating all 2^N feature subsets of the
N-dimensional instance space, we adopt one of the following approaches to
implement our heuristics:
1. One might begin with nothing or a finite set of attributes representing a
starting point in the search space and successively add attributes; this
approach is called forward selection.
2. One might start with all attributes and successively remove them on the basis
of some heuristics; this approach is known as backward elimination.
This is nicely explained by Blum and Langley through a step-wise constructed state-
transition diagram of partially ordered feature search space [12].
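A minimal Python sketch of the first approach (greedy forward selection) follows; the subset-evaluation function score is a hypothetical stand-in for whatever criterion the induction algorithm supplies (e.g. cross-validated accuracy). Backward elimination is the mirror image, starting from the full attribute set and removing one attribute at a time.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection: start from the empty subset and repeatedly
    add the single feature that most improves score(subset); stop when no
    candidate improves the current best score."""
    selected = []
    best = score([])                      # baseline: score of the empty subset
    remaining = set(features)
    while remaining and (max_features is None or len(selected) < max_features):
        gains = {f: score(selected + [f]) for f in remaining}
        f_best = max(gains, key=gains.get)
        if gains[f_best] <= best:         # no improvement: local optimum reached
            break
        selected.append(f_best)
        best = gains[f_best]
        remaining.remove(f_best)
    return selected, best
```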
Filter methods use variable ranking techniques as the principal criterion for variable
selection by ordering; hence, they are used in the pre-processing step, wherein features
ranked above a threshold by a suitable ranking criterion are selected. They can be
applied before classification to filter out the less relevant variables. Two well-known
filter methods are RELIEF and its variants [13] and FOCUS [14]. While RELIEF
computations provide the relevance weighting of each feature with reference to its
class label, the FOCUS algorithm iterates over all feature subsets to arrive at a
minimal set of features that can provide a consistent labeling of the training data.
The reader is advised to go through these methods [13, 14] before working on
optimal feature-extraction methods.
The major difference between the two filter methods is that RELIEF computations
need an extra heuristic to compute a threshold for selecting the final feature
subset. Although effective in removing redundant features, RELIEF assigns high
relevance weightings to all combinations of highly correlated features in the given
feature vector, while the FOCUS algorithm is very sensitive to noise and
inconsistencies residing in the training dataset. Another, less popular, category,
the wrapper methods, performs a search through the space of feature subsets using
the estimated accuracy obtained after 'wrapping' the particular selection around
the induction (learning) model. These methods are computationally expensive.
Moreover, both kinds of feature-extraction method lose their practicality with an
exponential rise in the dimensionality of the total feature vector given at the input.
$P(\text{unfit} \mid \{x_1, x_2, x_3, x_4\}) = \dfrac{\sum_{i=1}^{4} p(\text{unfit})\, p\!\left(\frac{x_i}{\text{unfit}}\right)}{\sum_{i=1}^{4} p(\text{fit})\, p\!\left(\frac{x_i}{\text{fit}}\right) + \sum_{i=1}^{4} p(\text{unfit})\, p\!\left(\frac{x_i}{\text{unfit}}\right)}$.  (13.20)
These NB a posteriori computations on class labels were further used to arrive at
the relative relevance of the attributes contributing to success/failure grades in the
second phase. Here, the individual portions of the summative numerator components
of expressions (4.2) and (4.3) that pertain to a single attribute reflected the
posterior effect of that attribute in evaluating relative attribute fitness/unfitness.
Comparisons among such individual portions helped in ranking the attributes of the
experimental feature vector. In this way, attribute precedence relations were obtained
by revisiting these individual numerator components, i.e. the average fitness
(average_fit(xi, tj)) and average unfitness (average_unfit(xi, tj)) of the students owing
to the degree of involvement of each attribute.
Kira and Rendell [13] designed the RELIEF algorithm, which assigns a relevance
weight to each attribute of the feature vector by computing the difference between
the selected test instance and its nearest-hit and nearest-miss training instances.
Assuming the training instances are denoted by a p-dimensional feature vector X,
the RELIEF algorithm makes use of the p-dimensional Euclidean distance to select
the 'near-hit' and 'near-miss' instances from the training dataset. If the test
instance xj is predicted as a positive instance, then the near-hit instance is
assigned as Z+ and the near-miss instance is assigned as Z−. The reverse happens
if the test instance is predicted as a negative instance: the nearest negative
training neighbor is assigned Z+ and the nearest positive training neighbor,
possessing the opposite class value, is assigned the Z− component. These
components are computed in preparation for the weight updates described in
expression (13.21). This weight update operation was performed on each of the
participating attributes in the experimental feature vector. The updated attribute
weights act as rank values of the attributes when sorted in increasing order of
relevance. The author also endorses the nearest-neighbor approach to finding the
'nearest-hit' and 'nearest-miss' training instances for computing the weight
updates defined above.
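A compact sketch of this weight update is shown below, assuming a two-class dataset held in NumPy arrays X (n × p) and y; the sampling scheme and normalization are simplified relative to [13].

```python
import numpy as np

def relief(X, y, n_iter=100, rng=None):
    """Basic RELIEF: for each sampled instance find its nearest hit (Z+)
    and nearest miss (Z-) by Euclidean distance, then reward features that
    separate the classes and penalize features that vary within a class."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        same = (y == y[i]); same[i] = False      # candidates for the near-hit
        diff = (y != y[i])                       # candidates for the near-miss
        d = np.linalg.norm(X - X[i], axis=1)
        hit = X[np.where(same)[0][np.argmin(d[same])]]
        miss = X[np.where(diff)[0][np.argmin(d[diff])]]
        w += (X[i] - miss) ** 2 - (X[i] - hit) ** 2
    return w / n_iter   # sorting these weights ranks the attributes
```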
Sun and Wu [15], in their in-depth study of feature selection methods, proved that
the highly successful RELIEF algorithm solves a convex optimization problem with a
margin-based objective function [15]. As it was observed that the RELIEF model
could filter out neither redundant attributes nor weakly relevant ones, this case
study provided a variant of the approach. Attribute precedence relations were thus
introduced as an innovative mining metric for the personalized counseling of
students, and the performance of such a hybrid feature-extraction-cum-ranking
model is evaluated by extending the experiments with the RELIEF method.
The above feature-vector ranking steps were taken still further by computing an
optimal set of fitness precedence relations from the two precedence relation sets
obtained from the average probabilities of fitness and unfitness. Initially, the
attribute precedence relation of unfitness is converted into an equivalent attribute
precedence relation of fitness by simply reversing the increasing order of attributes,
so that the least unfit attribute becomes the most fit in the equivalent fitness
precedence, and vice versa. These two sets of fitness precedence relations were then
compared in order to identify a consistent position j, defined either as the exact
corresponding position j occupied by attribute xi in both relations or, at most, as
one of the adjoining position combinations (j, j + 1) or (j − 1, j). In this way, the
attributes find their final positions of optimized fitness precedence owing to the
heuristics applied over their consistent, valid and conflicting positions.
Having observed increased model accuracies in the latter set of experiments, it
could be concluded that feature-vector ranking becomes more robust for the
Bayesian-driven hybrid FE approach if optimized attribute precedence relations of
fitness are used instead of conventional attribute precedence relations of fitness
(table 13.12).
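A rough sketch of the position-consistency check described above follows; the study's exact conflict-resolution heuristic is not specified here, so inconsistent attributes are simply flagged. The attribute names and orderings in the usage example are hypothetical.

```python
def consistent_positions(fit_order, unfit_order):
    """Compare a fitness precedence relation with the fitness relation
    obtained by reversing the unfitness relation; a position is consistent
    if it is identical in both relations or differs by at most one."""
    equiv = list(reversed(unfit_order))   # least unfit becomes most fit
    result = {}
    for j, attr in enumerate(fit_order):
        k = equiv.index(attr)
        result[attr] = ('consistent', j) if abs(j - k) <= 1 else ('conflict', (j, k))
    return result

print(consistent_positions(['x3', 'x1', 'x4', 'x2'], ['x2', 'x4', 'x3', 'x1']))
```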
the average correlation between classifiers and s is the overall strength of the
classifiers. Thus, the ensemble pruning dealt with in the case study initially
formulates the ensemble error function by computing strength and diversity
measurements for a classification ensemble followed by minimizing this approximate
ensemble error function through a quadratic integer programming formulation.
At first, this ensemble pruning method was tested on twenty-four UCI repository
datasets with Adaboost as the ensemble generation technique and was reported to
perform favorably over two other metric-based ensemble pruning algorithms:
diversity-based pruning and Kappa pruning, picked as the benchmarks. This was
followed by testing the same subset selection procedure in the form of a cross-
domain classifier-sharing strategy on a publicly available marketing dataset—here it
was a catalog marketing dataset from the direct marketing association (DMEF
academic dataset three, specialty catalog company, Code 03DMEF)—to select a
good subset of classifiers from the entire ensemble for each problem domain. The
essence of sharing classifiers is sharing common knowledge among different but
closely related problem domains. In that study, classifiers trained from different but
closely related problem domains are pooled together and then a subset of them is
selected and assigned to each problem domain. The computational results show that
the selected subset performs as well as, and sometimes better than, including all
elements of the ensemble. The reader is referred to Zhang et al [17] for the details of
the methodology in both benchmarking and empirical experiments.
where xi is the ith variable, Y is the output (class labels), cov() is the covariance and
var() is the variance. Correlation ranking can only detect linear dependencies
between the variable and target. Another feature selection metric measures the
dependence of two variables and is based on information entropy of class label c,
which is defined as
$H(Y) = -\sum_y p(y) \log(p(y))$,  (13.23)

while the conditional entropy of class label c, given attribute xj, can be defined as

$H(Y \mid X) = -\sum_x \sum_y p(x, y) \log(p(y \mid x))$,  (13.24)
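Equations (13.23) and (13.24) translate directly into code. A minimal sketch follows (taking logarithms base 2, so entropies are in bits), with the information gain H(Y) − H(Y|X) as the resulting ranking criterion; the toy attribute and label vectors are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_y p(y) log p(y), as in equation (13.23)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y | X) = -sum_x sum_y p(x, y) log p(y | x), as in equation (13.24),
    computed as the p(x)-weighted entropy of each conditional slice."""
    n = len(xs)
    total = 0.0
    for x in set(xs):
        group = [y for xi, y in zip(xs, ys) if xi == x]
        total += (len(group) / n) * entropy(group)
    return total

# Information gain of a binary attribute with respect to the class label:
x = [0, 0, 1, 1, 1, 0]
y = ['+', '+', '-', '-', '+', '+']
print(entropy(y) - conditional_entropy(x, y))
```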
$\mathrm{ass}(s) = \begin{cases} 1, & \mathrm{classify}(s) = s_c \\ 0, & \text{otherwise}, \end{cases}$  (13.27)
Definition. Another metric of feature discrimination is the F-score; the larger the
F-score, the more discriminative the feature. Given training vectors $x^k$, if the
number of instances in the jth dataset is $n_j$, then the F-score of the ith feature is
defined as

$F(s_i) = \dfrac{\sum_{j=1}^{m} (\bar{x}_{i,j} - \bar{x}_i)^2}{\sum_{j=1}^{m} \dfrac{1}{n_j - 1} \sum_{k=1}^{n_j} \left( x_{i,j}^k - \bar{x}_{i,j} \right)^2}$,  (13.28)

where $\bar{x}_i$ and $\bar{x}_{i,j}$ are the averages of the ith feature over the whole dataset
and over the jth dataset, respectively; $x_{i,j}^k$ is the ith feature of the kth instance in
the jth dataset; m is the number of datasets; $j = 1, 2, \ldots, m$; and $k = 1, 2, \ldots, n_j$.
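Below is a minimal sketch of equation (13.28), assuming the conventional 1/(n_j − 1) within-class normalization and at least two instances per dataset; the toy two-class values are illustrative.

```python
import numpy as np

def f_score(feature_by_class):
    """F-score of a single feature; feature_by_class holds one 1-D array of
    feature values per class/dataset j (n_j instances each). The numerator
    measures between-class scatter, the denominator within-class scatter."""
    groups = [np.asarray(g, dtype=float) for g in feature_by_class]
    grand_mean = np.concatenate(groups).mean()
    between = sum((g.mean() - grand_mean) ** 2 for g in groups)
    within = sum(((g - g.mean()) ** 2).sum() / (len(g) - 1) for g in groups)
    return between / within

# Two-class toy example: well-separated values give a large F-score.
print(f_score([[1.0, 1.2, 0.9], [2.1, 2.0, 2.3]]))
```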
References
[1] Rao S S 2009 Engineering Optimization: Theory and Practice (New York: Wiley)
[2] Trevisan L 2011 Combinatorial optimization: exact and approximate algorithms CS261:
Optimization and Algorithmic Paradigms Lecture Notes (Stanford, CA: Stanford University)
[3] Collette Y and Siarry P 2003 Multi-Objective Optimization: Principles and Case Studies (New
York: Springer)
[4] Thomas A, Sharma H R and Sharma S 2014 A Tool Design of Subjective Question Answering
Using Text Mining (Saarbrücken: Lambert Academic)
[5] Fourment L, Duclocix R, Marie S, Ejday M, Monnereau D, Masse T and Montmitonnet P
2010 Mono and multiobjective optimization techniques applied to a large range of industrial
test cases using metamodel assisted evolutionary algorithms AIP Int. Conf. Proc. 1252 833–40
[6] Bennett K P and Parrado-Hernández E 2006 The interplay of optimization and machine learning
research J. Mach. Learn. Res. 7 1265–81
[7] Sener O and Koltun V 2018 Multi-task learning as multi-objective optimization Proc. of the
32nd Conf. on Neural Information Processing Systems (NeurIPS) (Montreal, Canada) pp
1–15
[8] Niculescu R S, Mitchell T M and Rao R B 2006 Bayesian network learning with parameter
constraints J. Mach. Learn. Res. 7 1357–83
[9] Soni S and Vyas O P 2011 Performance evaluation of weighted associative classifier in health
care data mining and building fuzzy weighted associative classifier International Conference
on Parallel Distributed Computing Technologies and Applications Communications in
Computer and Information Science vol 203 (Berlin: Springer), pp 224–37
[10] Kadar J A, Agustono D and Napitupala D 2018 Optimization of candidate selection using
naïve Bayes: case study in company X J. Phys: Conf. Ser. 954 012028
[11] Shylo O and Shams H 2018 Boosting binary optimization via binary classification: a case
study of job shop scheduling arXiv:1808.10813v1
[12] Blum A and Langley P 1997 Selection of relevant features and examples in machine
learning Artif. Intell. 97 245–71
[13] Kira K and Rendell L 1992 The feature selection problem: traditional methods and a new
algorithm 10th National Conference on Artificial Intelligence (San Francisco: Morgan
Kaufmann), pp 128–34
[14] Almuallim H and Dietterich T G 1991 Learning with many irrelevant features 9th National
Conference on Artificial Intelligence (Cambridge, MA: MIT Press), pp 547–52
[15] Sun Y and Wu D 2008 A RELIEF based feature extraction algorithm Conf. Proc. of the
SIAM Int. Conf. on Data Mining (Atlanta, GA) pp 188–95
[16] Singh M 2017 Prediction of academic performance of students using machine learning
techniques Doctoral thesis Dr C V Raman University, Kota Bilaspur, Chhattisgarh, India
[17] Zhang Y, Burer S and Street W N 2006 Ensemble pruning via semi-definite programming
J. Mach. Learn. Res. 7 1315–38
[18] Wu S, Hu Y, Wang W, Feng X and Shu W 2013 Application of global optimization methods
for feature selection and machine learning Math. Prob. Eng. 2013 241517
[19] Chandrashekar G and Sahin F 2014 A survey on feature selection methods Comput. Electr.
Eng. 40 16–28
[20] Dietterich T G 2000 Ensemble methods in machine learning ed J Kittler and F Roli MCS
2000, LNCS 1857 (Berlin: Springer), pp 1–15
[21] Koller D and Sahami M 1996 Toward optimal feature selection Conf. Proc. of Machine
Learning pp 1–14
[22] Guyon I and Elisseeff A 2003 An introduction to variable and feature selection J. Mach.
Learn. Res. 3 1157–82
Chapter 14
Simulation of the formation process of spatial
fine structures in environmental safety
management systems and optimization of the
parameters of dispersive devices
Sergij Vambol, Viola Vambol, Nadeem Ahmad Khan, Kostiantyn Tkachuk,
Oksana Tverda and Sirajuddin Ahmed
The topical applied scientific issue of creating control systems for ecological
safety through the use of dispersive devices is considered. For suppression of the
formation processes of toxic substances and limitation of their distribution in the
atmosphere during extraction, processing and transportation of bulk materials
(which produce dust) and during fire suppression and thermal waste treatment, an
analysis of the systems that use spatial fine structures is offered. The results of
numerical simulation of these ecologically hazardous technological processes are
described. The physical model of controlling such processes is based on injecting a
cooling liquid using dispersive devices. Mathematical models of the gas and
dispersed phases of spatial fine structures are developed. The mathematical
formulation of the conservation laws for viscous gas (steam) is achieved through
the Navier–Stokes equations; for drops, it is given as an equation of the balance of
forces that affect the drop and equalize the inertia force and the resultant forces
of gravity and aerodynamic resistance. For dispersion of the fluid, irrigation systems
of the nozzle type, atomizers and centrifugal atomizers have been suggested. The
dependence of the ability to create effective fine structures on the characteristics of
the technical devices in the context of natural and man-made hazards of different
origins is presented. Using the numerical simulation of the formation processes of
spatial fine structures, the most efficient modes of supplying liquid to various
hazardous factors are defined.
time, such emergency situations are characterized by the entry into the atmosphere
of a significant amount of carbon monoxide, carbon dioxide, soot, etc, but no special
measures are taken to localize (prevent the spread of) these substances [16–18].
The use of thermal methods for the disposal of solid waste products improves the
environment by reducing the number of landfills. However, the process of thermal
utilization itself is accompanied by harmful emissions into the environment [19, 20].
This process can be ecologically effective in the case of preventing the formation of
highly toxic substances (such as dioxins and furans) at the stage of thermo-chemical
treatment of waste. The study of the mechanism of the formation of dioxins during
heat treatment of waste was given much attention in [21, 22], and in [23] the authors
proposed a purification system, scientifically substantiated its effectiveness and
confirmed this experimentally. It should be emphasized that it is difficult to
implement such a system for economic reasons.
The most modern and promising methods of waste disposal are based on the use
of high-temperature treatment [19, 24]. This is due to the fact that under conditions
of high temperature (1200 °C and above), dioxins and other highly toxic substances
decompose into simple fragments [20]. However, the mechanisms of dioxin re-
formation have also been investigated. Re-formation of dioxins during cooling of
the high-temperature multicomponent gas is observed if the gas temperature is in
the range from 450 °C down to 300 °C. The process of their formation is affected by
the presence of chlorine and oxygen, as well as by the rate of gas cooling [20]. This
fact gives us reason to believe that the waste recycling process can be made more
environmentally friendly.
The object of this research is finely dispersed multiphase structures. The subject of
the research is the dependence of the parameters of ecologically effective finely
dispersed water structures on the technical features of the devices used for their
creation.
The interaction of the phases was taken into account using the 'drop—source in a cell'
model. In accordance with this model, the presence of particles in the flow manifests
itself through an additional source of momentum in the Reynolds-averaged Navier–
Stokes equations, which are closed by a semi-empirical k-ε turbulence model.
Thus, to fully describe the creation process of spatial fine-dispersed structures, it is
necessary to consider models of the gas phase, the dispersed phase and the transition
process—the model of interfacial interaction.
For the closure of the system of Reynolds-averaged equations (14.1), (14.2), the
Launder–Spalding k-ε turbulence model was used [30]. The transport equations for
the kinetic energy of turbulence k and its dissipation rate ε are

$\dfrac{\partial}{\partial x_i}(\rho k u_i) = \dfrac{\partial}{\partial x_j}\left[\left(\mu + \dfrac{\mu_t}{\sigma_k}\right) \dfrac{\partial k}{\partial x_j}\right] + G - \rho \varepsilon$,  (14.3)

$\dfrac{\partial}{\partial x_i}(\rho \varepsilon u_i) = \dfrac{\partial}{\partial x_j}\left[\left(\mu + \dfrac{\mu_t}{\sigma_\varepsilon}\right) \dfrac{\partial \varepsilon}{\partial x_j}\right] + C_{1\varepsilon} \dfrac{\varepsilon}{k} G - C_{2\varepsilon} \rho \dfrac{\varepsilon^2}{k}$,  (14.4)
where ρ is density, k is the kinetic energy of turbulence, ui is the projection of the
averaged gas velocity on the axis of a three-dimensional rectangular Cartesian
coordinate system, xj are the coordinates of that coordinate system, ε is the
turbulence kinetic energy dissipation rate and σk, σε, C1ε, C2ε are empirical
coefficients. G is the term characterizing the production of the kinetic energy of
turbulence due to shear stresses and is defined by the formula
$G = -\rho\, \overline{u_i' u_j'}\, \dfrac{\partial u_j}{\partial x_i}$.
Turbulent viscosity is determined by the Kolmogorov–Prandtl formula [29]
$\mu_t = C_\mu \rho \dfrac{k^2}{\varepsilon}$,  (14.5)
where Cμ is the empirical coefficient.
To determine Sf in equation (14.2) an interfacial interaction model was used.
$y^* = \dfrac{\rho C_\mu^{1/4} k_P^{1/2} y_P}{\mu}$,  (14.8)

where U_P is the average gas velocity at point P, k_P is the kinetic energy of
turbulence at point P, τ_w is the friction stress on the wall, ρ is the gas density,
y_P is the distance between point P and the wall, and μ is the dynamic viscosity.
Next, it is necessary to solve equation (14.3) for k in the near-wall cells and in
the entire computational domain. At the same time, the boundary condition on the
wall was set for k as

$\dfrac{\partial k}{\partial n} = 0$,  (14.9)
where n is the local coordinate normal to the wall.
The turbulence kinetic energy generation G and its dissipation rate ε, which enter
the source term of equation (14.4), are calculated in the near-wall cells based on
the local equilibrium hypothesis: under this assumption, the production of
turbulence kinetic energy and its dissipation rate in the near-wall control volume
are equal. Equation (14.4) for ε is therefore not solved in the near-wall cells;
instead, the turbulence kinetic energy dissipation rate is determined by the formula

$\varepsilon_P = \dfrac{C_\mu^{3/4} k_P^{3/2}}{\kappa y_P}$,  (14.10)
Under the above assumptions, the behavior of the dispersed phase (water
droplets) is conveniently considered in the Lagrangian description. For sprayed
fluids, the Rosin–Rammler expression is the generally accepted droplet size
distribution [32]. The entire range of initial droplet sizes was divided into
discrete intervals, each of which is represented by the average initial diameter for
which the trajectory calculation is performed. In addition, each simulated drop is
actually a ‘package’ of drops with identical trajectories. The Rosin–Rammler
equation describes the distribution of droplets in size, and the droplets’ mass
fraction with a diameter greater than d is described by the formula
$Y_d = e^{-(d/\bar{d})^n}$,  (14.11)
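Below is a minimal sketch of equation (14.11) and the 'package' discretization described above; the mean diameter (100 μm) and exponent (n = 3) follow the atomizer case considered later in this chapter, while the interval edges are arbitrary illustrative values.

```python
import numpy as np

def rosin_rammler_fraction(d, d_mean, n):
    """Mass fraction of droplets with diameter greater than d,
    Y_d = exp(-(d / d_mean)**n), equation (14.11)."""
    return np.exp(-(np.asarray(d, dtype=float) / d_mean) ** n)

def discretize_spray(d_mean, n, edges):
    """Split the droplet size range into discrete intervals and return, for
    each interval, its representative (average) diameter and the mass
    fraction falling inside it, one simulated 'package' per interval."""
    edges = np.asarray(edges, dtype=float)
    y = rosin_rammler_fraction(edges, d_mean, n)
    mass = y[:-1] - y[1:]                  # fraction between adjacent edges
    mid = 0.5 * (edges[:-1] + edges[1:])   # representative diameter
    return mid, mass

mid, mass = discretize_spray(d_mean=100e-6, n=3,
                             edges=np.linspace(20e-6, 300e-6, 8))
print(np.round(mid * 1e6, 1))   # package diameters, um
print(np.round(mass, 3))        # mass fraction per package
```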
Substituting the expressions (14.13) and (14.14) into equation (14.12), taking into
account that the mass of a spherical drop and the area of its average cross section
are determined by the formulas

$m_p = \rho_p \dfrac{\pi d_p^3}{6}$,  (14.15)

and projecting the vectors of both parts of equation (14.12) onto the axes of the
fixed Cartesian coordinate system, we obtain the system of motion equations of the
drop in the form

$\dfrac{d u_{pj}}{dt} = g_j - \dfrac{3 \rho C_R}{4 \rho_p d_p}\,(u_{pj} - u_j) \left[ \sum_j (u_{pj} - u_j)^2 \right]^{1/2}$,  (14.17)
where j = 1, 2, 3.
To calculate the trajectory of a drop, we supplement the system (14.17) with the
following equation:

$\dfrac{d x_{pj}}{dt} = u_{pj}$,  (14.18)

where j = 1, 2, 3 and x_{pj} are the Cartesian coordinates of the drop.
At droplet speeds for which compressibility can be neglected, the aerodynamic
resistance coefficient CR of a spherical droplet is an unambiguous function of the
relative Reynolds number

$\mathrm{Re}_p = \dfrac{\rho d_p \lvert u - u_p \rvert}{\mu}$,  (14.19)
where dp is the droplet diameter and μ is the dynamic viscosity of gas.
To approximate the dependency $C_R(\mathrm{Re}_p)$, the empirical Zhen–Trizek formula is
used [33]:

$C_R = \dfrac{24}{\mathrm{Re}_p} + \dfrac{6}{1 + \mathrm{Re}_p} + 0.27$.  (14.20)
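A minimal explicit-Euler sketch of equations (14.17)–(14.20) for a single droplet 'package' is shown below; the gas and droplet properties, launch velocity and uniform gas field are assumed values for illustration only, not parameters from the study.

```python
import numpy as np

RHO_GAS, MU_GAS, RHO_DROP = 1.2, 1.8e-5, 1000.0   # assumed air/water values
G = np.array([0.0, 0.0, -9.81])                    # gravity, m s^-2

def drag_coefficient(re_p):
    """Empirical approximation of equation (14.20)."""
    return 24.0 / re_p + 6.0 / (1.0 + re_p) + 0.27

def step(xp, up, u_gas, dp, dt):
    """One Euler step of (14.17)-(14.18): relative Reynolds number (14.19),
    drag coefficient (14.20), then velocity and position updates."""
    rel = up - u_gas
    speed = np.linalg.norm(rel)
    re_p = RHO_GAS * dp * speed / MU_GAS
    cr = drag_coefficient(max(re_p, 1e-12))        # guard against Re = 0
    acc = G - 3.0 * RHO_GAS * cr / (4.0 * RHO_DROP * dp) * rel * speed
    return xp + dt * up, up + dt * acc

xp, up = np.zeros(3), np.array([20.0, 0.0, 10.0])  # hypothetical launch
for _ in range(1000):                              # 1 s of flight, dt = 1 ms
    xp, up = step(xp, up, np.array([3.0, 0.0, 0.0]), dp=200e-6, dt=1e-3)
print(xp)
```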
where Θ is the angle of inclination of the nozzle to the horizon, and x and y are the
horizontal and vertical Cartesian coordinates, respectively.
When simulating atomizer irrigation, the coordinates of the points of emission of
drops xpj0 were chosen at the centers of the faces of the computational cells
belonging to the output section of the atomizer. It has been assumed that the drops’
initial velocity and the local gas velocity in the outlet section of the atomizer are
equal: upj0 = uj.
In all cases, the diameters of droplets dp were specified by a histogram of the
initial distribution of droplets by size, constructed using equation (14.11).
1. Definition of the discrepancy for each solved differential equation. In this case,
as a rule, to achieve the convergence of the entire solution, it is necessary for
each difference equation to provide the specified level of the residual.
2. Integral discrepancy. In this case, a uniform criterion is determined for all
equations, which allows analyzing the convergence of the solution.
$\dfrac{\sum R_i^2}{V^2} \leqslant \varepsilon$,  (14.26)

where Ri is the error of the difference equations modeling the transfer of the
independent variables, V is the volume of the control space and ε is the
convergence criterion.
A numerical solution can be considered converged if one of the following
conditions is met:
• Condition (14.26) is satisfied.
• The solution no longer changes with the continuation of the iterations.
$\tau_a = \dfrac{\rho_p d_p^2}{18 \mu}$,  (14.28)

where dp is the droplet diameter, μ is the dynamic viscosity of the gas and ρp is
the drop density.
For the numerical simulation of the specific situation of eliminating the dust
cloud, we assume:
1. Bulk material is taken from the center of the pile.
2. When material is collected, a spherical dust cloud with a radius of 2 m is
formed.
3. Water is delivered to the dust cloud by an atomizer, which can be installed in
three positions:
• lower position (at ground level, that is, 6 m below the level of backfill).
• average position (6 m above ground level, that is, at the level of
backfill).
• top position (12 m above ground level, that is, 6 m above the level of
backfill).
atomizers [37]. Using this approach, it is possible to present options for transporting
fine droplets to the zone of occurrence of natural or man-made hazards, taking dust
suppression with various sprayers as an example: single-phase jet-centrifugal
sprayers, two-phase sprayers and atomizers.
Figure 14.4. The histogram of the distribution of droplets by size with irrigation using a fire-hose.
As a result of the calculations, it was found that at a feed angle of 45°, the
compact jet reaches the surface of the coal pile without disintegrating. Therefore, for
further analysis, four variants with a feed angle of 45° were selected (table 14.1,
figures 14.5–14.10).
From figure 14.5 it can be seen that in variant no. 1 most of the drops, with the
exception of the smallest and largest, fall into the dust cloud. In this case, all the
drops are precipitated within the storage area.
From figure 14.6 it can be seen that in variant no. 2 all the drops fall into the dust
cloud. However, the smallest droplets are carried by the wind (see figure 14.7)
outside the storage area.
From figure 14.8 it can be seen that in variant no. 3, none of the droplets reach the
dust cloud—they all are blown back by the headwind (see figure 14.9). At the same
time, most drops are deposited within the storage area. The smallest drops are
carried by headwinds outside the storage area.
Variant  Feed angle  Water supply parameter  Wind speed (m s−1)  Wind direction
1        45°         40                      0                   —
2        45°         25                      3                   tailwind
3        45°         40                      3                   headwind
4        45°         40                      3                   cross wind
Figure 14.5. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 1).
Figure 14.6. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 2).
Figure 14.7. Air velocity vectors in the plane of symmetry of the computational domain, painted in accordance
with the absolute value of the velocity, in m s−1 (variant no. 2).
Figure 14.8. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 3).
Figure 14.9. Air velocity vectors in the plane of symmetry of the computational domain, painted in accordance
with the absolute value of the velocity, m s−1 (variant no. 3).
Figure 14.10. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 4).
From figure 14.10 it can be seen that in variant no. 4 part of the representative
droplets reaches the dust cloud. At the same time, most drops are deposited within
the storage area. The smallest drops are carried by the side wind outside the
storage area.
According to the results of numerical simulations, it was found that in variant no.
1, the total residence time of all the drops in the dust cloud was 21.6 s. At the same
time, the mass fraction of droplets that did not fall into the dust cloud is 7.7%. In
variant no. 2, the total residence time of all the drops in the dust cloud was 23 s. At
the same time, the mass fraction of droplets that did not fall into the dust cloud is
1.1%. For variant no. 4, the total residence time of all drops in a dust cloud was 9.2 s.
At the same time, the mass fraction of droplets that did not fall into the dust cloud is
22.4%. The obtained data allow us to further consider the issue of optimizing the
irrigation of a dust cloud with a fire-hose.
• The average median diameter of the droplets in the spray is 100 μm.
• The exponent in the Rosin–Rammler formula is 3.
Figure 14.11. The histogram of the distribution of droplets by size with irrigation using an atomizer.
Variant  Position  Inclination angle  Water supply parameter  Wind speed (m s−1)  Wind direction
1        bottom    45°                10                      0                   —
2        middle    45°                10                      0                   —
3        top       45°                10                      0                   —
4        top       0°                 10                      0                   —
5        bottom    45°                3                       1                   tailwind
6        middle    45°                3                       1                   tailwind
7        middle    45°                5                       1                   tailwind
8        middle    45°                10                      1                   headwind
9        middle    45°                20                      1                   headwind
10       top       0°                 10                      1                   headwind
11       top       0°                 20                      1                   headwind
12       bottom    60°                20                      1                   cross wind
13       middle    60°                20                      1                   cross wind
14       top       60°                20                      1                   cross wind
15       top       60°                20                      1                   cross wind
Figure 14.12. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 5).
Figure 14.13. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 6).
Figure 14.14. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 7).
Figure 14.15. Trajectories of ‘packages’ of drops, painted in accordance with their diameter, in meters (variant no. 8).
Figure 14.16. Trajectories of ‘packages’ of drops, painted according to their diameter, in meters (variant no. 10).
Figure 14.17. Trajectories of ‘packages’ of drops, colored according to their diameter, in meters (variant no. 11).
Figure 14.18. Trajectories of ‘packages’ of drops, colored according to their diameter, in meters (variant no. 12).
in the zone of reverse air currents that occur behind the leeward slope of the pile, are
deposited on its surface.
From figure 14.17 it can be seen that in variant no. 11 large drops reach the dust
cloud and settle on the leeward slope of the pile. Small drops are enveloped by the
headwind and carried by it beyond the limits of the storage area.
From figure 14.18 it can be seen that in variant no. 12 only a fraction of the
droplets reach the dust cloud. Most of the larger droplets are deposited on the
leeward side of the pile before they reach the dust cloud. The smallest drops are
turned back by the headwind and carried by it outside the storage area.
According to the results of the numerical simulation, in variant no. 13 only an
insignificant proportion of the large drops reaches the dust cloud. Most of the
droplets do not reach it: the largest drops are deposited on the leeward side of the
pile without reaching the dust cloud, and the smaller drops are turned around by the
headwind and carried away from the storage area. In variant no. 14, some of the
droplets fall into the dust cloud; even so, all the drops are carried by the cross wind
outside the storage area. In variant no. 15, none of the droplets reach the dust
cloud—they are all carried by the side wind outside the storage area.
The most preferable were the results in variants no. 5 and no. 6. In variant no. 5,
the total residence time of all droplets in a dust cloud was 175.6 s. At the same time,
the mass fraction of droplets that did not fall into the dust cloud was 27.8%. In
variant no. 6, the total residence time of all droplets in the dust cloud was 164.4 s. At
the same time, the mass fraction of droplets that did not fall into the dust cloud is
70.8%.
The obtained data allow us to further consider the issue of optimizing the
irrigation of a dust cloud using an atomizer.
Within the framework of the developed numerical model, the estimated values of
the total residence time of the droplets in the working area were obtained and the
proportion of drops that did not fall into the working area was established.
Numerical simulation of the presented mathematical model made it possible to
determine the most effective water-spraying modes using various devices for dust
suppression under different wind directions. The results suggest that the use of
dispersed water structures to suppress dust is effective. At the same time, environ-
mental safety management is achieved by the right choice of the technical device and
the mode of supplying the process fluid (or water).
The phase interaction was taken into consideration through the discrete
'particle—source in a cell' model. This model assumes that the droplet presence in
the flow is expressed by additional terms in the conservation equations of the
continuous phase [40], as described in section 14.2.3. When calculating the
trajectories of the droplets, the momentum, mass and temperature of the droplet
'packets', which change as they move, were tracked. These values entered the gas
phase calculation as the source terms Sm, Sq and Sfi.
By estimating the droplet mass change as it passes through each control volume
of the geometric model, the mass transfer between dispersed and continuous phases
were calculated as
$\Delta S_m = \sum \left( \dfrac{\Delta m_p}{m_{p0}}\, \dot{m}_{p0} \right)$,  (14.29)
where Δmp is the change in the drop mass in the control volume, m p0 is the initial
mass of a drop and ṁ p0 is the initial droplets’ mass flow rate.
By estimating the droplet momentum change as it passes through each control volume
of the geometric model, the momentum transfer between the continuous and dispersed
phases was calculated using formula (14.25). By estimating the drop enthalpy
change as it passes through each control volume of the geometric model, the heat
transfer between the continuous and dispersed phases was calculated as
$\Delta S_q = \sum \left[ \dfrac{m_p}{m_{p0}} c_p \Delta T_p + \dfrac{\Delta m_p}{m_{p0}} \left( -L + \int_{T_0}^{T_p} c_{pi}(T)\, dT \right) \right] \dot{m}_{p0}$,  (14.30)
where mp is the average drop mass in the control volume, cp is the heat capacity of
the drops, ΔTp is the change in the temperature of the drops in the control volume,
L is the latent heat of evaporation, cpi is the heat capacity of the fluid vapor, Tp is
the drop temperature at the outlet of the control volume and T0 is the enthalpy
reference temperature.
Since the gas phase affects the dispersed phase, the reverse effect of the dispersed
phase on the continuum was also taken into account [39].
To numerically simulate the instantaneous cooling of a hot gas stream, a
fragment of the heat exchanger space was investigated. This fragment is bounded
by the heat exchanger walls, and its inlet and outlet sections (figure 14.19).
Centrifugal nozzles are built in the heat exchanger to disperse the fluid. The purpose
of this numerical simulation is to investigate the possibility of instantaneous cooling
of a gas stream from 1200 °C to 300 °C.
Figure 14.19. Fragment of the heat exchanger space which was investigated: (a) right side view and (b) isometry.
Table 14.3. The values of the parameters of water injection through nozzles [24].
variants (modes) of water supply. The values of the parameters of water injection
through nozzles are presented in table 14.3 [24].
By integrating the gas phase equations [38, 39], gas parameters can be calculated
at any studied fragment point of the heat exchanger. So we can control the gas flow
temperature and speed in different heat exchanger sections. In addition, most
rational parameters of the liquid supply by sprays can be determined.
For the numerical integration of a system of partial differential equations with
specific boundary conditions, the equations must be discretized. The control volume
method was applied to discretize the equations in space [41] on an unstructured
computational grid, which contains polyhedral elementary volumes—cells. The
droplet volume distribution in the sprayed liquid stream is based on data from [26]
and is described by the Rosin–Rammler equation.
The droplets' heat and mass transfer is described by two models—evaporation and
boiling. The evaporation model is valid until the droplet reaches the boiling point
Tbp; once the boiling point is reached, the drop's heat and mass transfer is
determined by the boiling rate.
The evaporation model suggests that the rate of evaporation of a drop is
determined by Fick’s law:
$\dfrac{d m_\nu}{dt} = A_\nu D \rho \dfrac{dc}{dr}$,  (14.31)
where mν is the vapor mass, ρ is the gas density, D is the coefficient of binary
diffusion of vapor in gas, c is the vapor concentration, $A_\nu = \pi d_p^2$ is the
evaporation surface area, t is time and r is a radial coordinate.
Separating the variables and integrating equation (14.31) with the boundary
conditions c = cs at r = rp and c = c∞ at r = r∞, we obtain
$\dfrac{d m_\nu}{dt} = \beta A_\nu \rho (c_s - c_\infty)$,  (14.32)
where cs is the volume concentration of vapor at the surface of the drop, rp is the
radius of a drop, c∞ is a volume concentration of vapor in the ambient gas and β is
the experimentally determined evaporation rate.
The results of experimental studies are usually presented as criterial dependences
Sh(Re, Sc), where Sh is the Sherwood number, defined as

$\mathrm{Sh} = \dfrac{\beta d_p}{D}$.  (14.33)
Taking into account (14.33) and the evaporation area of a spherical drop,
equation (14.32) takes the form

$\dfrac{d m_\nu}{dt} = \mathrm{Sh}\, \rho D \pi d_p (c_s - c_\infty)$.  (14.34)
For an approximation of the criterial dependence Sh(Re, Sc), the empirical
Ranz–Marshall formula was used [42]:

$\mathrm{Sh} = 2 + 0.6\, \mathrm{Re}^{0.5} \mathrm{Sc}^{0.33}$.  (14.35)
Suppose that the partial pressure of vapor at the surface is equal to the saturated
vapor pressure psat at the drop temperature Tp. In this case, the vapor concentration
at the surface of the drop can be calculated as

$c_s = \dfrac{M_\nu p_{sat}}{M_\nu p_{sat} + M (p - p_{sat})}$,  (14.36)

where M is the molecular mass of the gas, Mν is the molecular mass of the vapor and
p is the absolute pressure of the gas–vapor mixture.
Differentiating the drop mass (14.15) with respect to time relates the evaporation
rate to the rate of decrease of the drop diameter:

$\dfrac{d m_p}{dt} = 0.5\, \pi \rho_p d_p^2 \dfrac{d(d_p)}{dt}$.  (14.37)

Considering that $\dfrac{d m_p}{dt} = -\dfrac{d m_\nu}{dt}$, from (14.34) and (14.37) we obtain

$\dfrac{d(d_p)}{dt} = -\dfrac{2\,\mathrm{Sh}\, \rho D}{\rho_p d_p}\,(c_s - c_\infty)$.  (14.38)
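A minimal sketch of the evaporation-rate chain (14.33)–(14.36) is given below; all property values (saturation pressure, Reynolds and Schmidt numbers, gas density, diffusivity, molecular masses) are illustrative assumptions, not data from the study.

```python
import math

def surface_vapor_concentration(p_sat, p, m_vapor, m_gas):
    """Volume concentration of vapor at the drop surface, equation (14.36)."""
    return m_vapor * p_sat / (m_vapor * p_sat + m_gas * (p - p_sat))

def evaporation_rate(dp, re_p, sc, rho_gas, diff, cs, cinf):
    """Vapor mass flow from one drop: Sherwood number from the
    Ranz-Marshall correlation (14.35), then dm_v/dt from (14.34)."""
    sh = 2.0 + 0.6 * math.sqrt(re_p) * sc ** 0.33
    return sh * rho_gas * diff * math.pi * dp * (cs - cinf)

# Hypothetical 100 um water drop in hot gas (all properties assumed):
cs = surface_vapor_concentration(p_sat=47e3, p=101e3, m_vapor=18.0, m_gas=29.0)
print(evaporation_rate(dp=100e-6, re_p=5.0, sc=0.6, rho_gas=0.5,
                       diff=3e-5, cs=cs, cinf=0.0))
```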
When the gas is cooled, the temperature of a drop of the dispersed fluid, before
reaching the boiling point, changes in accordance with the heat balance

$m_p c_p \dfrac{d T_p}{dt} = \alpha A_\nu (T_\infty - T_p) + L \dfrac{d m_\nu}{dt}$,  (14.39)
$\Theta = \dfrac{\rho_p d_p^2 c_p}{6\,\mathrm{Nu}\,\lambda}$.  (14.43)
Based on the assumption of a complete analogy between heat and mass transfer, the
criterial dependence Nu(Re, Pr) for the Nusselt number was approximated by a
relationship similar to formula (14.35):

$\mathrm{Nu} = 2 + 0.6\, \mathrm{Re}^{0.5} \mathrm{Pr}^{0.33}$.  (14.44)
After the droplet temperature becomes equal to the boiling point, instead of
equation (14.38) the equation for the rate of boiling is applied:

$\dfrac{d(d_p)}{dt} = -\dfrac{4\lambda}{\rho_p c_{p\infty} d_p}\left(1 + 0.23\,\mathrm{Re}_p^{0.5}\right) \ln\left[1 + \dfrac{c_{p\infty}(T_\infty - T_p)}{L}\right]$,  (14.45)
Table 14.4. The results of numerical simulation of instantaneous cooling of high-temperature gas for
variant no. 1.
Table 14.5. The results of numerical simulation of instantaneous cooling of high-temperature gas for variant
no. 2 [24].
Table 14.6. The results of numerical simulation of instantaneous cooling of high-temperature gas for variant
no. 3 [24].
Figure 14.20. Trajectories of ‘packages’ of drops, painted in accordance with the time of their stay, in seconds
(variant no. 2, isometry).
Figure 14.21. Gas temperature distribution over heat exchanger sections, in °C (variant no. 2, isometry).
(table 14.5). Thus, the structure and parameters of the gas–droplet flow as a whole
correspond to the set purpose and variant no. 2 of water supply can be considered
satisfactory.
In variant no. 3, water drops retain their initial momentum almost unchanged across
the entire width of the channel until they collide with its walls, after which they
follow trajectories determined mainly by the rebound phenomenon. Accordingly, the
symmetry of the droplet trajectories is preserved, which is due to the extremely
low sensitivity of very large droplets (≈514 μm) not only to small perturbations
(fluctuations), but also to the averaged gas velocity. At the same time, droplets have
a strong influence on the gas flow, which in this case remains almost symmetrical. It
is also observed that water droplets, as a result of the convective heat exchange and
evaporation, take temperatures from 20 °C to 57 °C, without reaching the boiling
point of water, 100 °C.
The drops do not have time to evaporate fully, not only before the entrance to the
narrow section of the heat exchanger (section 1), but even by the outlet of section 4.
Although the drops only partially evaporate within the entire heat exchanger, they
fill the volume of the flow passage fairly evenly. As a result, the numerical value of
the average mass fraction of water vapor gH2O.av in the vapor–gas mixture in
section 1 is slightly less than the equilibrium value. Accordingly, the numerical
value of the average temperature tav of the gas–vapor mixture at section 1 is 64 °C
higher than the equilibrium value (455 °C).
The temperature of the gas–vapor mixture and the mass fraction of water vapor
in it are also fairly evenly distributed over all sections, particularly in section 1,
where the coefficients of uniformity are γT = 0.9501 and γH2O = 0.9443 (see table 14.6).
At the same time, the maximum temperature of the gas–vapor mixture in the sections
exceeds the equilibrium temperature (454 °C) by 263 °C, 114 °C, 68 °C and 43 °C,
respectively. Thus, the gas–droplet flow structure and parameters do not ensure the
achievement of the intended purpose, and consequently variant no. 3 of the water
supply cannot be considered satisfactory.
Obviously, recognizing variant no. 2 of the water supply as satisfactory does not
exclude the existence of an even better version; in searching for it, a corresponding
optimization problem can be formulated and solved in the future.
Further, a numerical experiment to determine the cooling time of the gas stream
to the required temperature was carried out. Considering the movement of the gas
elementary volume at the heat exchanger fragment, figure 14.22 shows the control
jet of the gas stream for various modes of dispersed fluid supply and figure 14.23
shows the Z-coordinate dependence graph of the gas elemental volume on time τ.
From the graph Z(τ), the moments of time were determined at which the
elementary volume of gas reaches the next section i. The average tav, minimum
tmin and maximum tmax gas temperatures, the average temperature deviation and the
uniformity coefficient of the gas temperature distribution in the control sections, as
well as the residence time Δτ, the cooling Δt and the average cooling rate Δt/Δτ of
the gas elementary volume between adjacent sections i and (i − 1), were determined.
Figure 14.23. The Z-coordinate dependence graph of the gas elementary volume on the movement time along
the heat exchanger.
The total gas elementary volume existence time at the heat exchanger for the
investigated variants is 1.32 s, 6.42 s and 2.37 s, respectively. The maximum gas
cooling is observed during its contact with the dispersed fluid droplets, which
evaporate:
• Variant no. 1 is characterized by a lowering of the gas temperature to 472 °C
within 0.47 s. This is observed between sections C and E.
• Variant no. 2 is characterized by a gas temperature of 436 °C between sections C
and E. This temperature is reached within 1 s of the gas's stay in the heat exchanger.
• Variant no. 3 is characterized by a gas temperature of 454 °C between sections A
and E. This temperature is reached within 1.6 s of the gas's stay in the heat
exchanger.
The investigation shows that the maximum gas cooling rate is reached between
sections C and D. According to the results of numerical simulation, it can be seen
that for variant no. 1 it is −1919 °C s−1, for variant no. 2 it is −1269 °C s−1 and for
variant no. 3 it is −975 °C s−1. The rest of the time is taken to mix the gas with water
vapor. The dependence of the gas temperature on the residence time of the gas
stream in the heat exchanger is shown in figure 14.24 and the histogram of the
average gas cooling rate in the areas located between the control sections i and (i − 1)
is shown in figure 14.25.
Thus, variant no. 2 of the dispersed fluid supply is characterized by a short time to
establish an equilibrium state of the vapor–gas mixture, by a reduced fluid flow and
by the elimination of its accumulation in the heat exchanger. However, from the
point of view of ensuring environmental safety, the first option can be considered
the most satisfactory, since the cooling time of the gas stream from 1200 °C to
305.8 °C is 1.32 s.
Figure 14.24. The dependence of the gas flow temperature on the time of its cooling: red—variant no. 1;
blue—variant no. 2; black—variant no. 3.
Figure 14.25. The gas residence time between the control sections: red—variant no. 1; blue—variant no. 2;
gray—variant no. 3; A, B, C, D, E, 1, 2, 3, 4 are control sections.
However, these results do not exclude the possibility of determining the most
effective option, in the search for which a corresponding optimization problem can
be formulated and solved in the future.
References
[1] Kolesnyk V, Pavlychenko A, Borysovs’ka O and Buchavyy Y 2018 Formation of physic and
mechanical composition of dust emission from the ventilation shaft of a coal mine as a factor
of ecological hazard Solid State Phenom. 277 178–87
[2] Vambol S, Vambol V, Sundararajan M and Ansari I 2019 The nature and detection of
unauthorized waste dump sites using remote sensing Ecol. Questions 30 1–17
[3] Vambol S, Bogdanov I, Vambol V, Suchikova Y, Kondratenko O, Hurenko O and
Onishchenko S 2017 Research into regularities of pore formation on the surface of
semiconductors Eastern-Eur. J. Enterp. Techn. 3/5 37–44
[4] Kolesnik V Ye, Pavlichenko A V and Buchavy Y V 2016 Determination of dynamic
parameters of dust emission from a coal mine fan Nauk. Vis. Nat. Hirny. Univ. 2 81–7
[5] Tverda O, Plyatsuk L, Repin M and Tkachuk K 2018 Controlling the process of explosive
destruction of rocks in order to minimize dust formation and improve quality of rock mass
Eastern-Eur. J. Enterp. Techn. 3/10 35–42
[6] Sundararajan M, Sharma S N, Kumar R, Ansari I and Kumar G 2018 Estimation of SPM
emission in air environment through empirical modeling and remedies for dust control in and
around coal mining complexes National Seminar on Environmental Issues: Protection,
Conservation and Management (EIPCM) (Dhanbad, Jharkhand, 26–7 February 2016)
pp 207–14
[7] Tverda O, Tkachuk K and Davydenko Y 2016 Comparative analysis of methods to
minimize dust from granite mine dumps Eastern-Eur. J. Enterp. Techn. 2/10 40–6
[8] Kulyk M P 2014 Analiz ekolohichnoyi nebezpeky obyektiv teplovoyi enerhetyky ta metodiv
zmenshennya shkidlyvykh vykydiv Vis. Inzh. Aakad. Ukrayiny 2 253–8
[9] Vambol S, Vambol V, Sobyna V, Koloskov V and Poberezhna L 2018 Investigation of the
energy efficiency of waste utilization technology, with considering the use of low-temperature
separation of the resulting gas mixtures Energetika 64 186–95
(San Francisco, CA, 18–23 July 1999) (New York: American Society of Mechanical
Engineers), pp 1322
[27] Loitsianskii L G 1978 Mehanika Zhidkosti i Gaza (Moskva: Nauka), p 736
[28] Joshi M 2002 Failure of dust suppression systems at coal handling plants of thermal power
stations—a case study https://plant-maintenance.com/articles/dust_suppression.pdf
[29] Redko A, Dzhyoiev R, Davidenko A, Pavlovskaya A, Pavlovskiy S, Redko I and Redko O
2019 Aerodynamic processes and heat exchange in the furnace of a steam boiler with a
secondary emitter Alexandria Eng. J. 58 89–101
[30] Launder B E and Spalding D B 1974 The numerical computation of turbulent flows Comput.
Meth. Appl. Mech. Eng. 3 269–89
[31] Vambol S A, Skob Y U A and Nechiporuk N V 2013 Modelirovaniye sistemy upravleniya
ekologicheskoy bezopasnost’yu s ispol’zovaniyem mnogofaznykh dispersnykh struktur pri
vzryve metanovozdushnoy smesi i ugol’noy pyli v podzemnykh gornykh vyrabotkakh
ugol’nykh shakht Vestnik Kazan. Tekhnol. Un-ta 16/24 168–74
[32] SAS IP 2011 Using the Rosin–Rammler diameter distribution method https://sharcnet.ca/
Software/Fluent14/help/flu_ug/flu_ug_sec_discrete_diameter.html
[33] Kostyuk V Ye 1988 K vyboru approksimiruyushchego vyrazheniya dlya koeffitsiyenta
aerodinamicheskogo soprotivleniya kapli Nauch.-metod. Mater. Teor. Aviats. Dvig.: Sbor.
Nauch. Trud. KHVVAIU 6 13–21
[34] Tsipenko A V 2004 Matematicheskaya model’ dispersnogo neravnovesnogo potoka s
bol’shoy doley zhidkosti v sople s uchetom plenki, stolknoveniy i aerodinamicheskogo
drobleniya kapel’ Moskva, NII NT, 46 s
[35] Krou D 1982 Chislennyye modeli techeniy gaza s nebol’shim soderzhaniyem chastits Ser.
Teor. Osn. Inz. Rasch. 104 114–22
[36] Vambolʹ S O 2012 Systema upravlinnya ekolohichnoyu bezpekoyu pry vykorystanni
pylopryhnichuyuchykh system zroshennya u protsesi navantazhennya ta rozvantazhennya
sypkykh materialiv u portakh Otk. Ynform. Komp’yut. Ynteh. Tekhnol. 55 161–67
[37] Vambolʹ S A 2012 Yssledovanye chyslennym metodom protsessa postanovky dyspersnoy
vodyanoy zavesy v systemakh upravlenyya ékolohycheskoy bezopasnosty Ekol. Bez.: Prob.
Shlyak. Vyris. 2 154–59
[38] Vambol V V 2014 Matematicheskoye modelirovaniye gazovoy fazy okhlazhdeniya gener-
atornogo gaza ustanovki utilizatsii otkhodov zhiznedeyatel’nosti Ekol. Bez. 6 148–52
[39] Vambol V V 2015 Modelirovanie gazodinamicheskih protsessov v bloke ohlazhdeniia
generatornogo gaza ustanovki dlia utilizatsii othodov Tehnol. Tehnos. Bezo.: Intern.-Zhur.
1 http://ipb.mos.ru/ttb/index.html
[40] Vambol V V 2015 Matematicheskoye opisaniye protsessa okhlazhdeniya generatornogo
gaza pri utilizatsii otkhodov zhiznedeyatel’nosti Tekhnol. Audit Rez. Proiz. 2/4 23–9
[41] Fletcher K 1991 Vychislitel’nyye metody v dinamike zhidkostey М.: Mir 1 504
[42] Shervud T, Pigford R and Uilki C 1988 Massoperedacha (Moscow: Mashinostroenie), p 600
Chapter 15
Future directions: IoT, robotics and AI based
applications
K C Raveendranathan
The recent innovations in the information age, where data is considered the ‘new oil’,
point toward rapid all pervasive developments in the Internet of Things (IoT),
robotics and artificial intelligence (AI) based applications. Data science has evolved
in the last few decades as a promising field of vast opportunities and challenges,
which encompasses all the endeavors of mankind. As raw data evolves into
information and intelligence through several data processors, its value is multiplied
many-fold. In this chapter, we primarily focus on future directions in the disruptive
technologies such as IoT and its importance in building smarter systems for a brave,
new and smarter world, where robotics and AI based applications play a pivotal role
in every human activity. IoT is a coinage of Kevin Ashton of MIT in 1999, and is, in
general, any network of smart, connected devices which can be controlled from
anywhere across the globe through the Internet. It should be emphasized that despite
its potential advantages, such as being an enabler of global, remote connectivity and
thus aiding use of our home appliances which are smart enough to connect to the
Internet, its vulnerability to cyber attacks cannot be overlooked by the intelligent
designer, developer, or even the end user. Robotics and its principles were known
to technology evangelists and to end-users from the days of the science fiction play
R.U.R. (Rossum’s Universal Robots) by the Czech science fiction author Karel Čapek
(and his brother Josef Čapek, who actually coined the term ‘robot’) in the 1920s. The
principles of AI were first suggested by Herbert A Simon, Marvin Minsky, Allen
Newell and John McCarthy in a 1955 proposal that led to the 1956 Dartmouth
conference; however, the credit for the term is rightfully attributed to the latter. If we
take a count of the devices/appliances/other smart things connected to the Internet by
the year 2025, it will reach the astounding figure of 34.2 billion worldwide (projected
data from IoT Analytics), compared to the present status of 17.8 billion worldwide in
2018, inclusive of IoT devices. Even taking these figures at face value, it is likely that
the actual numbers will exceed the projections by 2025 unless some other disruptive
technology evolves. Robotics has
matured enough with developments in interdisciplinary technologies such as mecha-
tronics and AI, such that fully automated vehicular systems and other means of
transport have become the order of the day. It is seemingly unpredictable, with the
current rate of innovations in mechatronics and AI, in which direction products and
processes will evolve. However, it is arguably going to be more toward a level playing
field, where the tools and techniques are primarily AI based, starting from the raw/
semi-processed data pumped by the IoT devices, communicated through the Internet,
and processed by the most advanced and inexpensive signal processors (both passive
and active). AI has grown to a mature technology in machine intelligence, and there
have been several more recent developments, such as machine learning (ML) and deep
learning, and several types of artificial neural networks (ANNs). Convolutional neural networks (CNNs), recurrent neural networks (RNNs) and deep neural networks (DNNs) fall into this category.
15.1 Introduction
In our information age, data is considered to be the new oil. The advent of the
Internet and its associated technologies has given way to huge data proliferation
from a number of sources. The IoT is an ecosystem of connected mechanical and
digital devices, physical objects, or living organisms, including human beings, that
are given a unique identity and have the ability to communicate and internetwork
together over a network, without needing human intervention. The IoT devices that
are interconnected through the Internet can be categorized into three groups:
1. Devices that receive and retransmit information.
2. Devices that receive information, and then process and act on it.
3. Devices that do both of the above.
Each of the above categories of things has enormous advantages associated with it. Examples of the first category, namely devices that collect and send information to the Internet, include all sorts of sensors: temperature, motion, moisture, air quality, or even optical (light) sensors. These sensors
enable the user to gather environmental information such as temperature, humidity,
air quality, etc, from their surroundings autonomously and communicate through
the wired or wireless Internet, which will, in turn, allow the end-users to make apt
and timely decisions. Just like the human sensory organs such as the nose, ear,
tongue and eyes help human beings perceive the surrounding world, these ‘smart
sensors’ enable various IoT devices to do the same. Also, the devices that receive and
act on sensory information can be operated remotely, and this is one of the major merits of the IoT. It is interesting to note that IoT devices also greatly contribute to the proliferation of data; this will eventually open up a new avenue for extensive study and research in data sciences, namely data analytics.
The third category of devices—the ones that collect information and retransmit,
as well as receive information and act on it—is the true goal of the IoT. To cite an
example, a soil moisture sensor in an agricultural farm can sense the moisture in the
soil and in turn decide when to switch on the irrigation system to water the plants.
Note that this is done in an intelligent fashion without the intervention of the farmer.
Obviously, the IoT improves the efficiency of operation of devices which are
connected to the Internet. The major outcome of the IoT and associated technologies is data proliferation. The IoT generates large volumes of real-time data that must be processed either locally or at a distant data center. Thus the proliferation of IoT
devices eventually leads to big data analytics.
The IoT is considered a disruptive technology because it has displaced several prevalent technologies. Further, several new applications have
evolved and proliferated with IoT, including distributed computing, wireless sensor
networks and so on. The principles of robotics have evolved and the technologies
matured over the past several decades. AI has created another paradigm shift in
modern-day computing. Robotics and AI, together, have produced several cutting edge
technologies and applications in the current millennium. The intelligence displayed by
man-made machines is termed AI or machine intelligence (MI). It is to be contrasted with the natural intelligence displayed by human beings as well as other types of living organisms, including plants. With AI technology one can build intelligent machines that can work and react much like human beings. It is interesting to note that we have
now entered into a new era of technological advancements, where machines have
started creating machines, with minimal or almost nil human aid. The machine
intelligence of such machines has to be ‘measured’ using a different gauge of MI.
In this chapter, we mainly focus on the future directions in IoT, robotics, and AI
based applications.
They are very intelligent machines with a very high machine intelligence quotient
(MIQ) that use AI algorithms at their core to improve the accuracy when operating
on patients. AI techniques are used in producing high-resolution digital images and
holograms that are crucial in several applications such as space-imagery, the
detection and prevention of forgery, etc [1].
with machines that could outperform them in every endeavor. Several modern-day
economists envisage that society will not be able to cope easily with the change.
People now realize that the dictum that AI technology which destroys existing jobs will eventually create new job opportunities is, in reality, too optimistic [5]. Questions such as whether the jobs replaced by AI systems will be offset by newly created ones, of course after workers acquire a new skill set, are highly relevant. A socio-economic study conducted in 2013 reported that about 47% of American workers held jobs at high risk of disappearing through automation within a few decades.
The highly relevant questions are: 'Will technology be able to create about 100 million jobs if these jobs are automated by AI technologies? [5] Will AI technology be able to create new jobs to compensate for those it has demolished or made irrelevant? Will these jobs be created fast enough to meet the rising demands of those who lost their earnings? What will be the state of employees whose skill sets cannot catch up with the advancements in modern technology? Will such workers lose their frugal existence in society?' So far, the answers to the above questions are far from certain.
A recent US study revealed that the employment potential of highly paid cognitive jobs and of low-paid service jobs that are yet to be automated, such as physical aid to the disabled and to the older generation and fast-food services, is growing fast. However, technology is puncturing the economy severely through the automation of mid-skill, working-class jobs. Since 2000, several million low-salary service jobs have disappeared, and the workers were either forced to leave the labor force or to accept low-income jobs that often pay meager sums, without
any other perks or privileges. It is apparent that firms in the communications
technology sector save huge amounts otherwise spent on salary by hiring temporary
employees on a contract basis instead of full-time regular staff. This led to the 'gig economy', a job market in which short-term contracts and flexible hours with almost no benefits to the employees have flourished. Automation driven by massive AI based applications has decoupled job creation from economic growth, resulting in economic growth accompanied by an alarming rise in unemployment and large-scale shrinkage in the incomes of workers, and thus in greater inequality in society. It
may also be noted that the newer technologies have created a ‘winner-takes-all’
situation, where the loser can hardly survive. In 1990, the three largest corporations
in Detroit were worth US$65 billion and had a labor force of 1.2 million. By 2016,
the three largest corporations in the Silicon Valley were worth US$1.5 trillion but
could accommodate only a mere 190 000 workers [5]. Hence the modern industrial
trend is larger communication companies and fewer jobs. If this is true, then AI
technology must create nearly 100 million new jobs to balance the gap in the labor
market.
The hope rests on advances in the domain of AI technologies creating enough new jobs for the majority of the labor force, so that everyone can earn a living as 'existential beings in their society'. On the other hand, if too few new jobs are created to accommodate even the newly up-skilled jobseekers, unemployment rates will be very high, a situation termed by Vardi [5] 'a state of violent uprising'. Considering the volatile state of the labor market, educational institutions
for engineering must focus on training their student community with the right skill
set which the innovations in AI demand. Employees will need to continually upgrade and hone their skills by attending appropriate training programs for better prospects in the job market. Even then, 'the need to adapt and train for new jobs will become more challenging as AI continues to automate a greater variety of tasks', as very rightly pointed out by Vardi.
15.1.2.1 Some recent trends on workforce lay-offs, redeployment and new hires
A recent article that appeared in IEEE Spectrum [8] revisited the statistics of jobs lost
due to lay-offs, redeployment after up-skilling, and the newly created jobs. The study
mainly focused on the Silicon Valley area in the US and went on to add that most of
the jobs lost due to lay-offs in various companies are compensated for by the hiring of
new technical and non-technical staff against the vacancies that were created by lay-
offs. Thus the new technologies enable the creation of new jobs and most of the
workforce who lost their jobs could easily find new avenues for employment.
the use of smaller on-board computers in robots. These are used to attend to tasks
that need real-time processing, such as the control of sensors/actuators or stepper
motors. The grouping of low-level and high-level reasoning was first explored as ‘the
concept of remote-brain’ in 1996 at the University of Tokyo. The global knowledge
base for robots is a depository of objects, actions or environments. A robot can download an object's description and user manual even the first time it encounters that particular object, or plan a route through an environment earlier traversed by another robot. One major disadvantage of using a cloud-based architecture is the chance of a loss of connectivity during a transaction: even if the robot uses the cloud services for basic functionality only, it will then fail to do anything tangible. This is an acceptable constraint if a backup is
provided to meet that contingency. At present, the stability of network connectivity
is no longer an issue. Acquiring a new stable infrastructure is affordable compared
to the cost of a robot with full embedded intelligence. Now the real problem pertains
to the latency of the network [7].
Cloud robotics usually comprise two types of communications, namely machine-
to-machine (M2M) and machine-to-cloud (M2C). The first tier of communication,
M2M, implies that the robots are in a collaborative computing environment termed
an 'ad hoc cloud', which allows them to jointly share computationally intensive tasks by pooling the resources needed and exchanging information for collaboration and, more importantly, connects robots that are out of range of the cloud access point and helps them communicate with the cloud. In short, most of the communications in this tier are restricted to M2M. In the second tier of communication,
i.e. the M2C level, the cloud can release resources for computation and storage on-
demand. The robots in M2M mode communicate and share resources for a task that
is beyond their capability or which is common among themselves. This is illustrated
in figure 15.1.
Figure 15.1. M2M and M2C communication modes. Image source: [9].
As mentioned before, this in effect acts as a ‘remote-brain’ along with the shared
memory of actions, acquired skills, and information already learned and obtained
from the cloud.
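As a rough illustration of this two-tier structure, the following Python sketch shows how a robot might decide whether a task runs locally, is shared with peers in the ad hoc cloud (M2M tier), or is offloaded to the infrastructure cloud (M2C tier). The function and threshold names are illustrative assumptions, not part of any cited system.

def dispatch(task_cost, local_capacity, peer_capacities, cloud_reachable):
    """Decide where a task of a given computational cost should execute."""
    if task_cost <= local_capacity:
        return 'local'                 # cheap enough for the on-board computer
    # M2M tier: pool resources with nearby robots in the ad hoc cloud.
    if task_cost <= local_capacity + sum(peer_capacities):
        return 'ad hoc cloud (M2M)'
    # M2C tier: request computation and storage from the cloud on demand.
    if cloud_reachable:
        return 'cloud (M2C)'
    return 'defer'                     # contingency when connectivity is lost

print(dispatch(task_cost=8.0, local_capacity=2.0,
               peer_capacities=[1.5, 2.5], cloud_reachable=True))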
Although the domain of cloud computing is quite new, there are multiple projects based on its principles and those of robotics. The major advantages of cloud computing are its scalability, massive storage, resilience and capacity for parallelism. The methods used in AI, such as object grasping, image processing and neural networks, are heavily computationally intensive. Hence it is not viable to use them on the wide range of robots which do not possess enough on-board computing resources. With cloud robotics it is possible to realize the 'remote-brain', an older concept in which the entire hardware and software are separate [7].
15.4 Innovative solutions for a smart society using AI, robotics and
the IoT
AI, robotics and the IoT are attracting widespread attention, as they are expected to
be technologies that affect society to a great extent in the future. These innovative
technologies have the potential to build seamless communication and a symbiotic
society between humans and robots, and a safe and secure networked society [16].
Various components of smart solutions include borderless communication and symbiotic communication between human beings and robots (machines).
The smart solutions for a smarter society discussed in this section include
automatic speech translation systems, a robot based dam inspection system, and a large-scale image-based security system. Speech recognition in noisy environments results in inaccurate translation. The beam-forming technology demonstrated
process, deliver, analyze and visualize data from sensors located on mobile units.
The applications of CarTel include traffic mitigation, road surface monitoring and
hazard detection, vehicular networking and so on [17].
Precision agriculture experiments with large-scale farming practices, products, fundamental geographic information about farmland, micro-climate information and other data. The 'wireless underground sensor network' (WUSN) project was developed at the University of Nebraska-Lincoln Cyber-Physical Networking Lab, where Agnelo R Silva and Mehmet C Vuran developed a novel cyber-physical system through the integration of center pivot irrigation systems with wireless underground sensor networks, i.e. CPS for precision agriculture [18]. The WUSNs consist of wirelessly
connected underground sensor nodes that communicate through the soil.
The health cyber-physical systems (HCPS) will replace traditional health devices
working on an individual basis. With interconnected sensors and networks, various
health devices work together to detect the patient’s physical condition in real time.
This is particularly useful for patients who are critically ill, such as patients with
heart disease, strokes, etc. The portable terminal devices carried by the patient can
detect the patient’s condition at any time and send a timely warning or prediction of
critical conditions in advance. In addition, the real-time activation of health
equipment and data delivery systems would be much more beneficial for patients
with critical conditions [17]. The proposed standard CPS architecture is illustrated in
figure 15.3.
The six standard CPS architecture modules are the sensing module, data
management module (DMM), next-generation Internet, service aware modules
(SAM), application module (AM), sensors and actuators. The purpose of the
sensing module is to send a communication request to the DMM and receive its acknowledgment of the request. Once the handshake between the DMM and the sensing module is done, the transmission of the sensed data from the sensing nodes to the DMM commences. Here, the bridge between the cyber world and the physical world is provided by noise reduction and data normalization techniques. Through quality of service (QoS) routing, data are transferred to SAMs using the next-generation Internet.
Figure 15.3. The proposed standard cyber-physical system architecture. Image source: [17].
15.4.2 IoT architecture, its enabling technologies, security and privacy, and
applications
Fog/edge computing has been integrated with the IoT for a clear purpose: to enable computing devices deployed at the network edge to improve the user experience and the resilience of services in the case of system failures. The advantages of fog/edge computing are its inherent distributed architecture and its closeness to the end-users. Thus, faster response and greater QoS for IoT based applications can be provided. This makes fog/edge computing-based IoT very attractive for IoT deployment [19].
Note that with fog/edge computing, the massive data generated by different kinds of IoT devices can be processed at the network edge instead of being transmitted to the centralized cloud infrastructure, thereby respecting bandwidth and energy consumption constraints. Since fog/edge computing devices are organized following a distributed
architecture model, they can process and store data in networked edge devices,
which are close to end-users. Thus they can provide services with faster response and
greater quality, compared to cloud computing. Thus, fog/edge computing is more
amenable to be integrated with IoT devices, to provide efficient and secure services
for a large number of end-users. In short, fog/edge computing-based IoT can be
considered as the future of IoT infrastructure.
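As a minimal sketch of this idea (the sensor bounds and summary fields are assumptions for illustration), an edge node might filter and aggregate raw readings locally and forward only a compact summary to the cloud:

from statistics import mean

def summarize_at_edge(readings, lower=-40.0, upper=85.0):
    """Drop out-of-range samples at the edge and return a compact summary."""
    valid = [r for r in readings if lower <= r <= upper]
    return {'count': len(valid), 'mean': mean(valid),
            'min': min(valid), 'max': max(valid)}

raw = [21.3, 21.7, 400.0, 22.1, 21.9]   # 400.0 is a faulty sample
print(summarize_at_edge(raw))           # only this summary travels upstream

Only the summary crosses the backhaul link, which is how edge processing relieves the bandwidth and energy constraints noted above.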
It is well known that both CPS and the IoT try to achieve a close connection
between the cyber and physical worlds. In particular, the CPS and IoT can measure
the status information of the physical components via smart sensors without human
intervention. In both cases, the measured state information can be shared through
communication networks. Based on the analysis of measured status information,
both CPS and IoT can provide secure, efficient and intelligent services to the end-
users. The existing efforts on the CPS and IoT applications have been expanded to
similar application domains, such as smart grids, smart transportation, smart cities,
and so on. In CPS, the sensor/actuator layer, communication layer, and application
or control layers are present. The sensor/actuator layer is used to collect real-time
data and execute the commands. The communication layer delivers data to the
upper layers and commands to the lower layers. The function of the application or
control layer is to analyze the data and make decisions based on it [19]. Note that the
CPS is a vertical architecture. On the other hand, the IoT is a cluster of finely
internetworked devices in large numbers which are used to monitor and control
devices by using modern interconnection technologies in cyber space. Specifically,
the crux of IoT lies in ‘interconnection’. The main objective of IoT is to interconnect
various heterogeneous networks so that the data collection, resource sharing,
analysis and management can be carried out smoothly across networks.
Figure 15.4 illustrates the typical integration of IoT with CPS. The basic
difference between CPS and IoT is that CPS is considered a system, whereas the
IoT is considered an ‘Internet’.
The common requirements for both are real-time, reliable, and secure data
transmission and retrieval. The distinct requirements for CPS and IoT can be
summarized as follows. For CPS, effective, reliable, accurate and real-time controls
are the primary goals. For IoT, resource sharing and management, data sharing and
management, interfacing among different networks, massive-scale data and big data
collection and storage, data mining, data aggregation and information extraction,
and very high QoS networking are important requirements. Applications of the
integrated IoT and CPS include smart grids, intelligent transport systems (smart
transportation) and smart cities.
The smart grid is an integrated technology of IoT and CPS. The smart grid has
been developed to replace the traditional power grid to provide reliable and efficient
energy to consumers [20]. Distributed energy resources are introduced to improve
the utilization of distributed energy in electric vehicles to improve the capability of
energy storage and reduce the emission of CO2. In this, smart energy meters and
duplex communication networks are introduced to achieve the effective interactions
between customers and utility providers. Using IoT devices, a huge number of smart
meters can be deployed in houses and buildings connected to the communication
networks in the smart grid. Smart meters can monitor energy generation, storage
and consumption. They are capable of interacting with power utility providers to
report the energy demand information of customers and receive real-time electricity
pricing for customers. With the aid of fog/edge computing infrastructure, the huge
chunks of data collected from smart meters can be stored and effectively processed
so that the efficient operation of the smart grid is possible. With dynamic
information processing on the load conditions and the power generated, the utility
providers can optimize the energy dispatched in the grid. The customers can
optimize their energy consumption, resulting in the improvement of resource
utilization and the reduction of cost [19]. This is a truly win-win situation.
Figure 15.5. The implementation of cloud robotics in an industrial environment. Image source: [23].
widely to automate factory operation remotely. It became the successor to the first
(mechanization of production using water and steam power), the second (mass
production with electric power) and the third (use of electronics to automate
production) industrial revolutions. The term 'industrial Internet' was first coined by engineers at General Electric in 2012, to describe new efforts in which
industrial equipment such as wind turbines, jet engines and MRI machines connect
over networks to exchange data. Thus, the processing of data for industries
including energy, transportation and healthcare became quite efficient [24].
15.5 The human 4.0 or the Internet of skills (IoS) and the tactile
Internet (zero delay Internet)
One can predict the fast emergence of an entirely new, more vibrant Internet that capitalizes on the latest developments in 5G and ultra-low delay networking, as well as the innovations in AI and robotics. This novel Internet will enable the delivery of
skills in digital form. The delivery of physical experiences remotely (and globally) is
made possible by the Internet of skills (IoS), which will revolutionize operations and
servicing capabilities for industries. In general, it will be a quantum jump in the way
we communicate, teach, learn and interact with our surroundings. It will be a brave
new world where our best engineers can service cars instantaneously around the
world over the tactile Internet; or anybody can be taught how to paint by the best
available artists globally. At an estimated revenue of US$20 trillion per annum
worldwide, which is approximately 20% of today’s global gross domestic product
(GDP), it will be a technology enabler for skill set delivery—thus a very timely
technology for service driven economies across the globe [25]. The transformation to
the Internet of skills is illustrated in figure 15.6.
The IoS will democratize labor, in the same way as the Internet has democratized
the dissemination of knowledge. The core technologies used in the Internet of skills
are (a) ultra-fast data networks (zero delay Internet or tactile Internet), (b) haptic
encoders (both kinesthetic and tactile) and (c) edge AI (to beat the light limit).
Figure 15.6. The evolution of the Internet of skills (human 4.0). Image source: [25].
to a problem-free way of life. The digital voice assistants offered by Amazon and Google are some of the incredible gadgets which will make life easier. The development of amazing IoT solutions will become ever more fashionable, given consumers' purchasing power and their appreciation of advancing innovations [26]. The gap between the IoT and AI will continue to diminish, leading to amazing growth in the functionality of gadgets and a consequent operational excellence. With explicit client intelligence, the same IoT gadget will be equipped to offer client-specific experiences. IoT applications will become more intelligent, cleverly observing their environment, correcting flaws and autocorrecting operational glitches. Growth in client-specific innovations will provide the necessary base for an era of personal AI [26].
The dependence on the cloud for inexpensive, localized computing power should
not be ignored. The advent of cloud robotics will result in numerous applications.
Also, more recent developments in deep learning and ML will continue to support
several AI applications, including those in image processing.
In the era of rapidly evolving technologies, big data analytics and IoT are the two
leading radical technologies which can explicitly modify the style of business
operations. Both technologies are still in their nascent stages and hold massive
potential and opportunities, and pose several unresolved challenges. The two
technologies can be coupled together for more efficient implementation and can
help all applications by making smarter decisions [27].
ML is a modern science which enables computers to work autonomously; no explicit programming is needed. The technology deploys algorithms that can train on, and improve with, the data fed to them. Over the years, ML has matured and made possible the concept of self-driving autonomous cars. ML is also a technology enabler for more effective web searches, spam-free email, practical speech recognition software, personalized marketing and so on. Today, ML is increasingly deployed in credit card fraud detection, personalized advertising through pattern recognition, personalized shopping and entertainment recommendations, determining cab arrival times and pick-up locations, and finding routes on maps [28]. The five recent technological trends are (a) the creation
of more jobs in data science, (b) new approaches to data security, (c) robotic process
automation (industry 4.0), (d) improved IT operations and (e) transparency in
decision making [28].
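A minimal sketch of the 'no explicit programming' idea, assuming scikit-learn is installed; the four-message dataset is invented purely for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ['win a free prize now', 'meeting agenda attached',
          'cheap loans click here', 'lunch tomorrow?']
labels = ['spam', 'ham', 'spam', 'ham']

# No hand-written rules: the model infers word weights from the data.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(['free loans, click now']))   # -> ['spam']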
It is interesting to note that the essence of ML and AI lies in ANNs. DNNs
enhance the learning capability of ANNs by adding more hidden layers to the
network. CNNs are a class of deep, feed-forward (not recurrent) artificial neural
networks that are applied to analyze visual imagery. CNNs are usually composed of a set of
layers that can be grouped by their functionalities. CNNs consist of feature learning
layers and classification layers apart from the input and output layers.
With the advent of the ANN, ML has taken a giant leap in recent times. ANNs
are biologically inspired computational models capable of far exceeding the performance of earlier forms of AI. The CNN is one of the most
impressive forms of ANN architecture. CNNs are primarily used to solve difficult
image-driven pattern recognition tasks with their precise yet simple architecture. A
simplified method of getting started with ANNs is provided by CNNs [29]. Note that
first generation ANNs were shallow, in the sense that apart from the input and
output layers they contained at most one hidden layer. In contrast, DNNs contain
several hidden layers.
CNNs differ from other forms of ANNs in that instead of focusing on the entirety
of the problem domain, knowledge about the specific type of input is exploited. This in
turn allows for a much simpler network architecture to be set up [29].
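The layer grouping described above can be made concrete with a small sketch in Keras (assuming TensorFlow is installed; the layer sizes are illustrative choices, not a recommended architecture):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),           # e.g. grayscale images
    layers.Conv2D(16, 3, activation='relu'),   # feature learning layers
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),                          # classification layers
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()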
References
[1] Banks J 2018 The human touch—practical and ethical implications of putting AI and
robotics to work for patients IEEE Pulse 9 15–8
[2] Fujita M 2019 AI x robotics: technology challenges and opportunities in sensors, actuators,
and integrated circuits Proc. of the 2019 IEEE Inter. Solid-State Circ. Conf. (ISSCC 2019)
pp 276–7
[3] Kunze L et al 2018 Artificial intelligence for long-term robot autonomy: a survey IEEE Robot. Autom. Lett. 3 4023–30
[4] Wogu I A P et al 2017 Artificial intelligence, alienation and ontological problems of other
minds: a critical investigation into the future of man and machines Proc. of the 2017 Int.
Conf. on Computing, Networking and Informatics (ICCNI)
[5] Davey T 2017 Artificial intelligence and the future of work: an interview with Moshe Vardi
Future of Life https://futureoflife.org/2017/06/14/artificial-intelligence-and-the-future-of-
work-an-interview-with-moshe-vardi/
[6] Hawking S, Tegmark M, Russell S and Wilczek F 2014 Transcending complacency on super-
intelligent machines Huffington Post http://huffingtonpost.com/stephen-hawking/artificial-
intelligence_b_5174265.html
[7] Lorencik D and Sincak P 2013 Cloud robotics: current trends and possible use as a service
Proc. of the IEEE 11th Int. Symp. on Applied Machine Intelligence and Informatics (SAMI
2013) pp 85–8
[8] Guizzo E 2011 Robots with their heads in the clouds IEEE Spectrum http://spectrum.ieee.
org/robotics/humanoids/robots-with-their-heads-in-the-clouds
[9] Hu G, Tay W P and Wen Y 2012 Cloud robotics: architecture, challenges and applications
IEEE Network 26 21–8
[10] RoboEarth Project http://roboearth.org/
[11] The Robotic Operating System (ROS) http://ros.org/wiki/
[12] Arumugam R et al 2010 DAvinCi: a cloud computing framework for service robots Proc. of
the 2010 IEEE Int. Conf. on Robotics and Automation (ICRA) (3–7 May) pp 3084–9
[13] Apache Hadoop http://hadoop.apache.org/
[14] O’Leary D E 2017 Emerging white-collar robotics: the case of Watson analytics IEEE Intell.
Syst. 32 63–7
[15] IBM 2014 Introduction to IBM Watson: Analytics, Data Loading and Data Quality, IBM
Document Version 2.0, December 16, 2014 https://docplayer.net/10919185-Introduction-to-
ibm-watson-analytics-data-loading-and-data-quality.html
[16] Yukitake T 2017 Innovative solutions toward future society with AI, robotics, and IoT Proc.
of 2017 Symp. on VLSI Circuits pp C16–9
[17] Ahmed S H, Kim G and Kim D 2013 Cyber physical system: architecture, applications and
research challenges Proc. of the IFIP Wireless Days Conf. (WD’13) pp 1–5
[18] Silva A R and Vuran M C 2010 (CPS)2: integration of center pivot systems with wireless
underground sensor networks for autonomous precision agriculture Proc. of the 1st ACM/
IEEE Int. Conf. on Cyber-Physical Systems pp 79–88
[19] Lin J et al 2017 A survey on Internet of Things: architecture, enabling technologies, security and privacy, and applications IEEE Internet of Things J. 4 1125–42
[20] NIST 2016 NIST & The Smart Grid Accessed: 12 April 2019 https://nist.gov/engineering-
laboratory/smart-grid/about-smart-grid/nist-and-smart-grid
[21] Nayyar A, Batth R S and Nagpal A 2018 Internet of robotic things: driving intelligent
robotics of future—concept, architecture, applications and technologies Proc. of the 2018 4th
Int. Conf. on Computing Sciences pp 151–60
[22] Adebayo A O, Chaubey M S and Numbu L P 2019 Industry 4.0: the fourth industrial
revolution and how it relates to the application of Internet of things (IoT) J.
Multidiscip. Eng. Sci. Stud. 5 2477–82
[23] Wan J et al 2016 Cloud robotics: current status and open issues IEEE Access 4 2797–807
[24] Kehoe B et al 2015 A survey of research on cloud robotics and automation IEEE Trans. Autom. Sci. Eng. 12 398–409
[25] Dohler M et al 2017 Internet of skills, where robotics meets AI, 5G and the tactile Internet Proc. of the 2017 European Conf. on Networks and Communications (EuCNC) pp 1–5
[26] Dialani P 2019 AIOPS: the integration of AI and IoT https://analyticsinsight.net/aiops-the-
integration-of-ai-and-iot/
[27] Sarkar S 2017 How to build IoT solutions with big data analytics https://analyticsinsight.net/
how-to-build-iot-solutions-with-big-data-analytics/
[28] Some K 2018 Top 5 machine learning trends of 2018 https://analyticsinsight.net/top-5-
machine-learning-trends-of-2018/
[29] O'Shea K and Nash R 2015 An introduction to convolutional neural networks arXiv:1511.08458
Chapter 16
Efficacy of genetic algorithms for
computationally intractable problems
Ajay Kulkarni and Sachin Puntambekar
A genetic algorithm is a metaheuristic method that has proved highly efficient in searching for optimal solutions to problems in the NP-hard category, which are often algorithmically solvable but computationally intractable.
A genetic algorithm is a probabilistic heuristic search technique motivated by the
principle of natural genetic systems. It aims to locate the global optimal solution for
a problem from the given solution space. Due to its population based approach, the
requirement for a fitness function rather than its derivatives and the probabilistic
nature of the operators, genetic algorithms possess the capability of exploring the
search space efficiently and efficaciously and are therefore applied to search for
optimal or near-optimal solutions of various optimization problems.
A genetic algorithm can be viewed as an abstract version of the evolutionary
process which operates on a population of artificial chromosomes. Each chromo-
some represents an encoded version of a candidate solution and is associated with a
fitness value which reflects the eminence of that chromosome as the solution to the
problem. Binary coding is a commonly used technique for encoding the solutions—
in this strategy the solution appears as a bit string which facilitates the subsequent
operations. These candidate solutions are further subjected to operations such as crossover and mutation to produce an offspring population that is expected to be better than the parent population.
Both crossover and mutation are nondeterministic in nature and are applied with
certain probabilities. The crossover operator is a mechanism for exchanging
information between chromosomes; this operator allows two parent chromosomes
to exchange their genetic characteristics in order to generate two offspring. Single-
point crossover, two-point crossover and uniform crossover are commonly used
exchange mechanisms for binary coded genetic algorithms. The mutation operator is a mechanism for introducing random variation into the offspring, typically by flipping bits with a small probability, so as to preserve genetic diversity.
16.1 Introduction
A genetic algorithm is a heuristic optimization search technique inspired by the
principles of natural genetic systems and aims to seek a global optimum solution for
a problem from the set of candidate solutions in an iterative manner. Due to its
population based approach, the requirement for a fitness function rather than its
derivatives and the probabilistic nature of operators, genetic algorithms possess the
potential to explore the search space efficiently and efficaciously for optimal or near-
optimal solution of the problem under consideration. Genetic algorithms and their
variants have been successfully applied in engineering fields to search for the
solution for optimization problems with significant complexities. The capability of
a genetic algorithm to solve the optimization problems of the nature of NP-hard and
NP-complete problems has been reflected in several research findings cited in the
literature. Genetic algorithms can be viewed as a technique to search for an optimal
solution of a problem by performing repeated iterations of a solution. To quantify
the process of iteration and to implement the concept of natural selection, genetic
algorithms rely on an objective function or fitness function. The fitness function
provides a measure to determine the candidate solution’s relative fitness and is used
by the genetic algorithm to evolve better solutions in subsequent iterations. Another
aspect of genetic algorithms is the concept of population: instead of operating on a single solution, the genetic algorithm operates simultaneously on a population of candidate solutions, known as individuals. This implicit parallelism makes the genetic algorithm a global search rather than confining it to a local area of
the search space. However, the size of the population is an issue of concern, as small
population size may lead to premature convergence while large size may result in
excessive computational time.
Also, to imitate the natural selection procedure, the genetic algorithm operates on
the encoded versions of the parameters to be optimized and not on the parameters
themselves. Parameter encoding transforms the actual optimization problem into
combinatorial optimization as the genetic algorithm is essentially a combinatorial
search technique.
A genetic algorithm maintains a population of solutions, known as chromosomes
or individuals, over the search space which is iteratively modified so as to drive this
population towards an optimal or near-optimal solution. Starting with some ran-
domly or heuristically selected initial population, the genetic algorithm generates a
new population at each iteration using the following steps: (i) calculation of a fitness
function value for each chromosome constituting the old (existing) population—this
value reflects the potency of each solution and can be considered as a meaningful
measure to analyze the claim of the algorithm to approach optima in successive
iterations; (ii) using the fitness value as selection measure the individuals are selected
for subsequent procedures of recombination and selection; (iii) selected individuals
(parents) are thereafter subjected to a genetic operation called crossover to generate
probably better new solutions (offspring); (iv) the newly generated solutions are further subjected to mutation, an operator which aims to preserve the genetic diversity in the vicinity of the candidate solutions; and (v) a new population is generated to replace
the existing one. Iterations are carried out until some terminating criterion is met [1].
Recombination and selection operations in genetic algorithms are nondetermin-
istic and are governed by some probabilistic rules instead of some deterministic
procedure; this nondeterministic nature facilitates the objective of preserving the
global explorative properties of the search. An important attribute of genetic
algorithms is that they require only a mathematical function acting as an objective
function for the considered problem and have no dependence on other character-
istics of the fitness function, such as the existence of its derivatives or differ-
entiability. This attribute allows the applicability of this algorithm even for problems
with non-smooth functions. Procedural details of the genetic algorithm are explained
by the flowchart shown in figure 16.1.
Figure 16.1. Flowchart of the genetic algorithm: initialize the population and evaluate fitness; perform selection; if the termination condition is satisfied, output the optimal solution, otherwise iterate.
$$x_i = x_i^l + \frac{x_i^u - x_i^l}{2^\beta - 1} \sum_{j=0}^{\beta-1} \gamma_j 2^j, \qquad (16.1)$$
where $x_i^l$ and $x_i^u$ represent the lower and upper bounds of $x_i$, respectively, $\beta$ represents the length of the binary representation of $x_i$, and $\sum_{j=0}^{\beta-1} \gamma_j 2^j$ is the decoded value of the binary substring. Substrings of length $\beta$ provide accuracy of the order of $1/2^\beta$ of the search space, so arbitrary precision can be achieved by using strings of appropriate length (a short decoding sketch follows the list of coding schemes below). Binary representation is a
commonly used encoding scheme in classical genetic algorithms and
is normally supported by two arguments. First, binary alphabets
maximize the level of implicit parallelism. Second, the encoding scheme
associated with a high-cardinality alphabet requires a larger population
size for the effective exploration of the search space which often reduces
the computational efficiency of the algorithms. However, the binary
representation has not been found to be appropriate when dealing with
a continuous search space with large dimensions and when a numerical
precision of considerable degree is required, as it handles continuous
problems as discrete ones. Application of binary coding for the problems
of a continuous domain requires decoding and repair algorithms to
transform the solution searched for by the genetic algorithm into a viable
solution of the original problem. Designing of decoding and repair
algorithms often complicates the implementation of genetic algorithms.
Real coding:
• The real coding scheme has been found to be appropriate for opti-
mization problems of continuous nature; it uses a floating point
representation for representing various parameters to be optimized.
As a result, each chromosomal string appears as a vector of floating
point numbers. The precision in this scheme is often decided by the
nature of the problem to be solved. A common approach in this scheme
for constructing chromosomes is to represent each variable to be
optimized by a gene and to keep the length of the chromosomal string
the same as that of the solution vector. Also, the value of a gene
representing a particular variable is restricted to the interval defined for
that variable and the genetic operators forced to preserve this require-
ment. Floating point representation of parameters is capable of representing large domains, whereas in binary implementations an increase in domain size results in a loss of precision for a fixed length of the chromosomes. Real coding also offers the feature of local tuning of the solution by exploiting the gradualness of functions of continuous variables, which is often difficult in the case of binary coding due to the problem of the Hamming cliff. In real coding, as the genotype and phenotype are identical, there is no requirement for coding and decoding, and so the speed of the algorithm increases.
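As a small illustration of binary decoding per equation (16.1) (the bounds and string length are arbitrary example values):

def decode(bits, x_low, x_up):
    """Map a binary substring to a real value in [x_low, x_up]; bits[0] is gamma_0."""
    beta = len(bits)
    value = sum(g * 2**j for j, g in enumerate(bits))   # decoded integer
    return x_low + (x_up - x_low) / (2**beta - 1) * value

print(decode([1, 0, 1, 1], x_low=-2.0, x_up=2.0))       # 4-bit example -> ~1.467

With real coding no such mapping is needed, since genotype and phenotype coincide, as noted above.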
to seed some high quality solutions into the initial population. However, this
inclusion induces the possibility of premature convergence.
4. Evaluation/fitness: As genetic algorithms try to impersonate the survival of
the fittest principle of nature to formulate a search process, it is necessary to
evaluate the fitness of a potential solution relative to others. This evaluation
is carried out using the fitness function. The fitness function is a mathemat-
ical evaluation function that computes and reflects the superiority of the
chromosome as a solution to the problem under consideration. The fitness
function allocates reproductive traits to an individual and acts as a measure
to be maximized in subsequent iterations. It implies that individuals with a
higher fitness function value usually have a better chance of participating in
subsequent stages of the algorithm. The algorithm is structured such that it
aims to increase the average population fitness in an iterative manner.
5. Selection: In a genetic algorithm, the fitness function value is used as an
assessor of the quality of the solution represented by a chromosome and the
average fitness is considered as a qualitative measure of the population
comprising a stipulated number of chromosomes. The selection mechanism
of a genetic algorithm utilizes the fitness as a guide to select individuals to
form a mating pool so as to participate in the process of evolution. Due to
fitness based criteria, the chromosomes with better fitness have a relatively
higher probability of being selected for subsequent operations than others.
As the selection mechanism is normally carried out with replacement, fit
chromosomes have a chance of being selected more than once, thus fitter
chromosomes participate more frequently in the mating process and possess a
relatively higher probability of surviving in succeeding iterations. During the
selection process key factors that are required to be balanced are selection
pressure and genetic diversity. Selection pressure describes the tendency to
select only the best individuals of the current population to participate in
subsequent steps; selection pressure controls the rate of convergence of the
genetic algorithm towards the optimum. Genetic diversity refers to the
maintenance of diversity in the solution population and is required to ensure
the effective exploration of solution space, which is often necessary during
the earlier stages of the optimization process. Very high selection pressure
results in a loss of genetic diversity due to which the genetic algorithm is
likely to undergo premature convergence with some local optimum. With too
low a selection pressure, the genetic algorithm may not converge to an
optimum solution in adequate computational time. In order to ensure the
convergence of a genetic algorithm to a global optimum solution in
reasonable time, an appropriate balance between the selective pressure and
genetic diversity is required to be maintained by the selection mechanism.
There exist several selection mechanisms in the genetic algorithm literature,
such as roulette wheel selection, stochastic remainder sampling, stochastic
universal sampling, linear rank selection, exponential rank selection, tourna-
ment selection and truncation selection [1]. Roulette wheel (fitness pro-
portional) selection is the conventional selection method in which each
candidate is assigned a slot on the roulette wheel with size proportional to its
fitness, thus the candidates with higher fitness have a larger slot size than the
less fit individuals. The roulette wheel is spun an appropriate number of
times, each time selecting a candidate pointed at by the wheel pointer. As per
the scheme suggested by Goldberg [1], the roulette wheel selection mecha-
nism to select n individuals from a population size of n can be implemented
through the following steps:
i. Calculate the fitness value, $f_i$, for all the candidate solutions constituting the population.
ii. Calculate the probability (slot size) of selection for each solution, $p_i = f_i/f$, where $f = \sum_{j=1}^{n} f_j$.
iii. Calculate the cumulative probability, $q_i = \sum_{j=1}^{i} p_j$.
iv. Generate a random number, $r \in (0, 1]$.
v. Select the individual $s_1$ if $r < q_1$, otherwise select $s_i$ if $q_{i-1} < r \leqslant q_i$.
vi. Repeat steps iv and v $n$ times to select the $n$ candidates of the mating pool.
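A minimal implementation of steps i-vi, assuming strictly positive fitness values (the population and fitness values below are illustrative):

import random

def roulette(population, fitnesses, n):
    total = sum(fitnesses)                    # step ii: slot sizes p_i = f_i / f
    cum, acc = [], 0.0
    for f in fitnesses:                       # step iii: cumulative q_i
        acc += f / total
        cum.append(acc)
    pool = []
    for _ in range(n):                        # step vi: repeat n times
        r = random.uniform(0, 1)              # step iv
        for candidate, q in zip(population, cum):
            if r <= q:                        # step v
                pool.append(candidate)
                break
    return pool

print(roulette(['s1', 's2', 's3'], [1.0, 3.0, 6.0], n=5))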
In two-point crossover, two crossover points are selected along the length of
the chromosomes and the characteristics of the two parents are exchanged
between these two points to generate two offspring.
16-10
Modern Optimization Methods for Science, Engineering and Technology
according to the ones and zeros in the mask: if at a particular locus the mask bit is 1, then the allele is copied from parent one, and if the mask bit is 0 then the allele is copied from parent two. The second offspring is generated by reversing the process.
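The three exchange mechanisms can be sketched as follows for list-encoded binary chromosomes (the parent strings are illustrative):

import random

def one_point(a, b):
    k = random.randrange(1, len(a))
    return a[:k] + b[k:], b[:k] + a[k:]

def two_point(a, b):
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform(a, b):
    mask = [random.randint(0, 1) for _ in a]             # random binary mask
    c1 = [x if m else y for m, x, y in zip(mask, a, b)]  # mask bit 1 -> parent one
    c2 = [y if m else x for m, x, y in zip(mask, a, b)]  # reversed for offspring 2
    return c1, c2

p1, p2 = [1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0]
print(uniform(p1, p2))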
and is free from the issues of diversity and selection pressure; the drawback
associated with this technique is that it contains the risk of discarding the
good solutions from the existing population. An improved version of the
complete replacement scheme is replacement with elitism; this method
emphasizes preserving one or two of the best individuals from the existing
population and replacing the others. Elitism speeds up the performance of
the genetic algorithm but exerts a high selection pressure due to the
deterministic selection of relatively fit candidates. Another scheme for the
replacement is steady state or incremental replacement, which introduces the concept of population overlapping; in this scheme only a fraction G (known as the generation gap) of the existing population is replaced.
Selection of individuals from the existing population for replacement is an
issue of concern in steady state replacement. One technique is the replace-
ment of the worst individuals; however, it exerts a strong selection pressure
and often requires a large population size or high mutation rate to maintain
diversity. Another replacement scheme, which maintains the considerable
degree of diversity, is steady state with no duplicates. In this case, offspring
are not included in the population if they are mere duplicates of the existing
individuals. Other evolution strategy based replacement schemes are the (μ, λ) and (μ + λ) schemes. In the (μ, λ) scheme, λ( > μ) offspring are generated from the μ parents and the best μ of these newly generated individuals are selected as the successor population. In the (μ + λ) scheme, λ offspring are generated and combined with the μ parents to give a set of (μ + λ) individuals; from this set the best μ individuals are selected to constitute the successor population. This scheme also exerts a high selection pressure and requires a high mutation probability to maintain diversity.
8. Termination: The processes of selection, recombination and replacement
are iterated until some terminating criterion is satisfied. Commonly used termination criteria are: population convergence, where almost all the solutions become identical or nearly identical; stagnation of the best fitness score over successive iterations, which emphasizes the accuracy of the solution; and a fixed number of iterations, which offers a reasonable compromise between population convergence, accuracy of the solution and computation time (a sketch of elitist replacement with a simple termination test follows this list).
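The following sketch combines elitist generational replacement with a simple two-part termination test (an iteration budget or stagnation of the best fitness); all names and parameter values are illustrative:

def replace_with_elitism(old_pop, offspring, fitness, n_elite=2):
    """Keep the n_elite best of the old population; fill the rest with offspring."""
    elite = sorted(old_pop, key=fitness, reverse=True)[:n_elite]
    return elite + offspring[:len(old_pop) - n_elite]

def terminated(best_history, max_iters=200, patience=20, eps=1e-9):
    """Stop on a fixed iteration budget or when the best fitness stagnates."""
    if len(best_history) >= max_iters:
        return True
    if len(best_history) > patience:
        return best_history[-1] - best_history[-1 - patience] < eps
    return False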
obtained by fixing the allele of specific chromosome loci, thus it can be viewed as a
typical pattern which could be observed in some chromosomes. A schema thus
represents a similarity pattern which describes a subset of individuals with similar
features in some positions. A schema is a string of {1, 0, *} where a typical pattern is
described by using {1, 0} and * is used at the chromosome loci kept unspecified by
the pattern. For example, a schema λ = 101 * *011 describes a subset of chromo-
somes with elements {10100011, 10101011, 10110011, 10111011}, i.e. the elements
of this subset belong to λ . For a schema, commonly defined terms are the order and
length of schema. The order of λ , o(λ ) is the number of defined bits, whereas length λ ,
lλ is the difference of the allele positions of the first and last defined bits. For
λ = 101 * *011, o(λ ) = 6 and lλ = 7.
The schema theorem provides an explanation of how the schemata featured in
relatively fit chromosomes possess greater prospects of propagating through
successive populations with the evolution of the genetic algorithm. The theorem
can be formally stated as follows.
Schema theorem. Let λ represent a schema for binary coded chromosomes of length L and let m(λ, t) be the number of chromosomes in the current population (t) which belong to λ. Then the expected number of chromosomes of λ in the next generation (t + 1) is given by the formula
$$m(\lambda, t+1) \geqslant m(\lambda, t)\,\frac{f(\lambda)}{\bar{f}}\left(1 - p_c \frac{l_\lambda}{L-1}\right)(1 - p_m)^{o(\lambda)}, \qquad (16.5)$$
where $f(\lambda)$ is the average fitness of the chromosomes of λ and $\bar{f}$ is the average fitness of the population. Here, the term $m(\lambda, t)f(\lambda)/\bar{f}$ represents the expected number of chromosomes of λ after the selection process (fitness proportionate selection). It implies that schemata with relative fitness above one will continue to increase, whereas those with relative fitness below one will decrease; this is due to the selective pressure in fitness proportionate selection. The term $(1 - p_c\,l_\lambda/(L-1))$ represents the probability of survival of the chromosomes of λ after the crossover process; this term is high when $l_\lambda$ is low, and as $l_\lambda$ approaches $L - 1$ it approaches $(1 - p_c)$, indicating that disruption of the chromosomes becomes almost certain whenever crossover occurs. The term $(1 - p_m)^{o(\lambda)}$ represents the probability of survival of the chromosomes of λ after the mutation process, and is high when $o(\lambda)$ is low.
It indicates that schemata with short length, low order and above average fitness
receive increasing trials in successive iterations. These schemata are known as
building blocks and can be viewed as the partial solutions which provide a higher
fitness value to the chromosomes which contain these building blocks. It means that
a chromosome composed of such building blocks will be a near-optimal solution.
The consequence of this theoretical framework is the building block hypothesis,
which states that the genetic algorithm can approach the near-optimal solution
through appropriate selection and combination of building blocks. The notion of
building blocks is particularly useful when a heuristic insight of the search problem is
available. During the process of encoding, the desired factors could be encoded in
16-13
Modern Optimization Methods for Science, Engineering and Technology
neighboring loci, thereby yielding low order schemata. Crossover sites could also be adjusted heuristically so that chromosome disruption occurs only at locations that are worthwhile for the problem at hand. The procedure thus increases the
probability that the defined process will productively juxtapose building blocks and
rapidly approach an optimal solution [5].
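A worked numerical example makes the contrast concrete. Assume (values chosen purely for illustration) $m(\lambda, t) = 20$, relative fitness $f(\lambda)/\bar{f} = 1.2$, $p_c = 0.8$, $p_m = 0.01$ and $L = 8$. For a short, low-order schema with $l_\lambda = 1$ and $o(\lambda) = 2$, equation (16.5) gives
$$m(\lambda, t+1) \geqslant 20 \times 1.2 \times \left(1 - 0.8 \times \tfrac{1}{7}\right)(0.99)^{2} \approx 20.8,$$
so the schema is expected to gain representatives, whereas for a long, high-order schema with $l_\lambda = 7$ and $o(\lambda) = 6$,
$$m(\lambda, t+1) \geqslant 20 \times 1.2 \times \left(1 - 0.8 \times \tfrac{7}{7}\right)(0.99)^{6} \approx 4.5,$$
and the schema is rapidly disrupted despite its above average fitness.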
genetic algorithms; however, the nature of the search performed by the crossover
operator is mainly determined by the diversity of the population. For random and
diverse populations, typically at the commencement of the search, crossover
performs a widespread search and explores the search space, whereas for similar
populations, typically after several iterations once high fitness solutions have
developed, it searches the neighborhood of existing solutions. The crossover
probability also affects exploration and exploitation: a high crossover probability
quickly introduces new solutions into the search space, but risks missing good
solutions and failing to exploit existing ones, whereas a low crossover probability
preserves most of the existing solutions but may leave a considerable part of the
search space unexplored. Thus the crossover and mutation probabilities can be used
as control parameters to balance exploration and exploitation. Population diversity
is also used as a measure for achieving this balance: high genetic diversity indicates
that the genetic algorithm is in the exploration phase, whereas low genetic diversity
indicates that it is in the exploitation phase.
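As a hedged illustration of diversity as such a measure, the sketch below computes the normalized mean pairwise Hamming distance of a binary population; the representation (chromosomes as Python lists of 0/1) and the function names are our assumptions, not the chapter's.

from itertools import combinations

def hamming(a, b):
    # Number of loci at which two chromosomes differ.
    return sum(x != y for x, y in zip(a, b))

def diversity(population):
    # Mean pairwise Hamming distance, normalized by chromosome length.
    pairs = list(combinations(population, 2))
    length = len(population[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)

pop = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
print(diversity(pop))  # high value -> exploration phase; near 0 -> exploitation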
When two individuals lie further apart than the niche radius, the sharing
function returns a value of 0, implying that the individuals are in different niches. A
commonly used sharing function is
$$\mathrm{Sh}(d_{ij}) = \begin{cases} 1 - \dfrac{d_{ij}}{\rho}, & d_{ij} < \rho \\[4pt] 0, & \text{otherwise}, \end{cases} \qquad (16.6)$$
where $d_{ij}$ is the distance between the ith and jth strings in phenotype space and ρ is the
niche radius. For every string, the sharing function is evaluated against every other
string and the values are summed to obtain the niche count $m_i = \sum_j \mathrm{Sh}(d_{ij})$.
Thereafter the shared fitness of an individual is computed as $f_i^{*} = f_i/m_i$ and
is used in the selection mechanism instead of $f_i$. Now, if fewer strings lie near some
optimal solution, those strings will have a lower niche count and higher shared fitness
than the strings around other optima. Since the selection operator will then emphasize
these strings, their number will increase in subsequent iterations. This method
therefore allows multiple optima to coexist in the population. A commonly used
replacement strategy in this case is the overlapping population scheme: new offspring
are first added to the current population and shared fitness values are computed for
all candidates; the individuals (equal in number to the offspring) with the worst
shared fitness values are then eliminated from the population, and the shared fitness
values are recalculated for the next iteration [1, 4].
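A minimal sketch of this computation, assuming one-dimensional phenotypes and the triangular sharing function of equation (16.6) (the names share and shared_fitness are illustrative), is:

def share(d, rho):
    # Sharing function of equation (16.6).
    return 1.0 - d / rho if d < rho else 0.0

def shared_fitness(phenotypes, fitness, rho):
    shared = []
    for i, xi in enumerate(phenotypes):
        # Niche count m_i = sum_j Sh(d_ij), including the string itself.
        niche_count = sum(share(abs(xi - xj), rho) for xj in phenotypes)
        shared.append(fitness[i] / niche_count)   # f_i* = f_i / m_i
    return shared

x = [0.10, 0.12, 0.90]   # two strings crowd one niche, one sits alone
f = [1.0, 1.0, 1.0]
print(shared_fitness(x, f, rho=0.2))  # the lone string keeps the highest f*

Here the isolated string at 0.90 retains its full fitness while the two crowded strings are penalized to roughly 0.53 each, which is exactly the selection pressure toward multiple optima described above.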
References
[1] Goldberg D E 1989 Genetic Algorithms in Search, Optimization and Machine Learning
(Boston, MA: Addison-Wesley Longman)
[2] Reeves C R and Rowe J E 2002 Genetic Algorithms: Principles and Perspectives: A Guide to
GA Theory (Norwell, MA: Kluwer Academic)
[3] De Jong K 1988 Learning with genetic algorithms: an overview Mach. Learn. 3 121–38
[4] Deb K 1999 Introduction to genetic algorithms Sadhana 24 293–315
[5] McCall J 2005 Genetic algorithms for modelling and optimisation J. Comput. Appl. Math. 184
205–22
[6] Tang K S, Man K F, Kwong S and He Q 1996 Genetic algorithms and their applications
IEEE Signal Process. Mag. 13 22–37
[7] Renders J M and Flasse S P 1996 Hybrid methods using genetic algorithms for global
optimization IEEE Trans. Syst. Man Cybern. B 26 243–58
[8] Blanco A, Delgado M and Pegalajar M C 2001 A real coded genetic algorithm for training
recurrent neural networks Neural Netw. 14 93–105
[9] Alonge F, D'Ippolito F and Raimondi F M 2003 System identification via optimized wavelet
based neural networks IEE Proc. Control Theory Appl. 150 147–54
[10] Sahoo D and Dulikravich G S 2006 Evolutionary wavelet neural network for large
scale estimation in optimization Proc. Multidisciplinary Analysis and Optimization Conf.
(Portsmouth, VA) pp 1–11
[11] Awad M 2009 Optimization RBFNN parameters using genetic algorithms: applied on
function approximation Int. J. Comput. Sci. Secur. 4 295–307
[12] Shou-sheng L and Yong D 2010 An evolutionary wavelet network and its training method
Int. Conf. on Computer Application and System Modeling pp 379–83
[13] Aly A A 2011 PID parameters optimization using genetic algorithm technique for electro-
hydraulic servo control system Intell. Control Autom. 2 69–76
[14] Vishwakarma D D 2012 Genetic algorithm based weight optimization of artificial neural
network Int. J. Adv. Res. Electri. Electron. Instrum. Eng. 1 206–11
[15] Awad M 2014 Using genetic algorithms to optimize wavelet neural networks parameters
for function approximation Int. J. Comput. Sci. Issues 11 256–67
[16] Kulkarni A and Kumar A 2015 Structurally optimized wavelet network based adaptive
control for a class of uncertain underactuated systems with actuator saturation Int. J.
Hybrid Intell. Syst. 12 171–84
[17] Kopel A and Yu X H 2008 Optimize neural network controller design using genetic
algorithm Proc. 7th World Congress on Intelligent Control and Automation (Chongqing,
China) pp 2012–6
[18] Chiroma H, Noor A S M, Abdulkareem S, Abubakar A I, Hermawan A, Qin H, Hamza
M F and Herawan T 2017 Neural networks optimization through genetic algorithm
searches: a review Appl. Math. Inf. Sci. 11 1543–64
Chapter 17
A novel approach for QoS optimization in
4G cellular networks
Vandana Khare and G R Sinha
Due to the above-mentioned issues, FDMA and TDMA can only support first
generation (1G), second generation (2G) and third generation (3G) networks.
OFDMA networks are the only solution for the transmission of RT multimedia
traffic in fourth generation (4G) cellular and mobile networks. In OFDMA networks
a high data rate is possible for RT multimedia traffic, as the data rate depends on the
chip rate and the spreading factor. In the case of down-link transmission the
spreading factor ranges from 4 to 512, which indicates that the network can support
up to 512 users per tower or base station.
However, in practice each tower accommodates only 200–225 users, the remaining
codes being left free for handoff. Maintaining the QoS and its parameters, i.e.
throughput, delay and power consumption, is thus a very important task.
A basic comparison of the FDMA, TDMA, WCDMA and OFDMA networks with
respect to different parameters is given in table 17.1, which lists the data rates,
bandwidths and operating frequencies of each network. It can be observed that
power control is the single most important factor in maintaining the required QoS in
OFDMA networks.
Table 17.1. Basic comparison of FDMA, TDMA, WCDMA and OFDMA networks.

Parameter               1G                 2G              3G                                    4G
Introduced              1980               1993            2006                                  2012
Access system           FDMA               TDMA            WCDMA                                 OFDMA
Data rate               2.4–23.4 kbps      84 kbps         144 kbps to 2 Mbps                    100 Mbps
Bandwidth               20 kHz             25 MHz          25 MHz                                100 MHz
Operating frequencies   800 MHz            GSM: 900 MHz    1920–1980 MHz (UL),                   850 MHz (UL),
                                                           2110–2170 MHz (DL)                    1800 MHz (DL)
Limitations             Limited capacity   Slow speed      High power consumption                Complicated hardware
Applications            Voice calls        SMS             Video conferencing, mobile TV, GPS    High speed operations
conditions and meet the requirement of strong and secure data transmission
and reception by mobile users.
• Desired quality multimedia communication: The compatibility problem with
1G, 2G and 3G networks is entirely overcome by 4G using OFDMA
networks. This can be credited to the high data transmission rate: up to
10 Mbps for 3G and 100 Mbps for 4G mobile users.
• Cheaper communication cost: The Telecom Regulatory Authority of India
(TRAI) continually analyzes cost-related issues and maintains lower rates for
4G mobile communication users. Additionally, 4G provides multimedia
communication with global roaming to all mobile subscribers.
What is QoS?
In the case of cellular networks, high QoS implies that the network service
provider delivers satisfactory performance in terms of voice quality, low call
blocking and call dropping rates, and a high data transmission rate, in particular for
multimedia traffic transmission.
Why QoS?
Third generation networks, mainly WCDMA networks, must carry multimedia
traffic. Every user in such cellular networks transmits on the same frequency, which
reduces both the coverage area and the capacity of the network. Hence, maximum
network capacity is achieved only if all the users maintain the minimum
signal-to-interference ratio (SIR), i.e. sufficient throughput, and this is possible only
when QoS in the network is maintained. Several important parameters are
responsible for maintaining the required conditions.
Earlier work was mostly based on the signal to interference ratio, because at that
time only voice transmission was required. When GSM emerged, i.e. when both
voice and messaging entered the 2G network, power allocation came into the picture.
Currently, owing to multimedia transmission, throughput based resource allocation
has become an indispensable part of every application in 4G networks, which
require considerably higher speeds and, consequently, higher data rates.
All these parameters are used in CAC according to the requirements: interference
and power measurements are required for comparing signal strength, whereas
throughput measurement is required for comparing data rate or speed.
Fading can be defined as a reduction in signal strength. It is most easily countered
by increasing the transmitted power, which lowers the BER; the BER is defined as
the ratio of the number of bit errors to the total number of bits transmitted and
should be kept low in all conditions.
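As a small worked example (all numbers hypothetical), the BER follows directly from this ratio:

# Hypothetical counts for one measurement interval.
bit_errors = 12
bits_transmitted = 1_000_000
ber = bit_errors / bits_transmitted
print(f"BER = {ber:.1e}")  # BER = 1.2e-05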
Increasing the transmitted power, however, also increases interference, so fast
and accurate power control is required. Power control can be categorized into
two types:
• Open loop power control.
• Closed loop power control.
Open loop power control is based on the mobile unit and requires no feedback
from the base station. This type of power control is fast but not accurate.
Closed loop power control depends on feedback: the base station continuously
takes feedback from the mobile unit and adjusts the signal strength. This technique is
slower but accurate to a great extent. Closed loop power control is further classified
into two types:
• Inner loop power control.
• Outer loop power control.
In the case of inner loop power control, the base station checks the signal level of
a particular user by taking feedback from the mobile unit and adjusts it according to
the QoS requirement.
In the case of outer loop power control, the received power is controlled by
changing the target value, which is pre-decided based on the QoS requirement.
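The following sketch illustrates one inner loop iteration under simple assumptions: the fixed 1 dB step and 6 dB target SIR are common textbook choices rather than values from this chapter, and the SIR samples stand in for the feedback the base station would receive.

TARGET_SIR_DB = 6.0   # assumed QoS target
STEP_DB = 1.0         # assumed fixed power control step

def inner_loop_update(tx_power_db, measured_sir_db):
    # Raise transmit power when the reported SIR is below target,
    # lower it otherwise (classic up/down command logic).
    if measured_sir_db < TARGET_SIR_DB:
        return tx_power_db + STEP_DB
    return tx_power_db - STEP_DB

power = 20.0                        # dBm, hypothetical starting point
for sir in [3.0, 4.5, 6.5, 7.0]:    # hypothetical SIR feedback samples
    power = inner_loop_update(power, sir)
print(power)                        # 20 + 1 + 1 - 1 - 1 = 20.0 dBm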
The need for power control
Power control is very important in OFDMA networks for:
• Interference management.
• Connectivity management.
• Energy management.
In the case of interference management for down-link operation, the base station
continuously takes feedback and adjusts the signal strength, thereby controlling
interference in the system; as a result, the near–far problem is eliminated.
In the case of connectivity management, when a mobile user leaves one area and
enters another, the base station in the new area should be able to provide sufficient
signal strength for that user. Perfect connectivity management then becomes
possible, call dropping is reduced and the handoff problem is resolved.
In the case of energy management, fast and accurate power control reduces the
mobile unit's power consumption and hence improves battery life.
17.2.4.4 Scheduling
Scheduling plays a very important role in OFDMA networks: it monitors the traffic
and minimizes delay at the receiving end. It is designed for both RT and non-RT
traffic transmission and reception.
In OFDMA networks two types of scheduling are mainly used:
• Fixed scheduling.
• Adaptive scheduling.
Fixed scheduling means that once the packets are scheduled they are transmitted
at fixed times, so the delay is greater.
Adaptive scheduling is based on adaptability: the schedule changes with network
conditions. In the case of OFDMA networks, it is based
All three types of resources can be used, but the use of channel codes and power is
most popular. Using rate as the main resource is more beneficial for multimedia
traffic transmission, because speed management is very important for RT traffic in
4G networks, such as video telephony, live telecasts and online games; such
applications demand minimum delay (in milliseconds). So, if adaptive scheduling is
based on rate, delay will surely be reduced and network throughput will be
maintained according to the given QoS requirements.
RT traffic is based on a variable bit rate (VBR) and is mostly applicable to time
sensitive applications such as video telephony, audio telephony, mobile TV, online
games and video. NRT VBR traffic is useful for booking air tickets and bank
transactions. All these applications come under multimedia RT traffic transmission.
NRT traffic is based on a constant bit rate (CBR) and is typically used for NRT
applications such as SMS, Internet access, voice, etc.
RT traffic mainly suffers from delay because of the rapid development of mobile
communications; improving QoS is therefore an important task in such networks.
NRT traffic generally suffers from packet loss constraints, and QoS improvement is
not an issue in these types of networks.
Resource allocation with call admission control and the adaptive rate scheduling
scheme (RACAC-ARS) are employed. The adaptive rate scheduling scheme uses a
heuristic approach based on feedback control unit (FCU) logic: when a session
enters the network, a rate is assigned and then adjusted based on the feedback
obtained from the users already admitted to the network.
A heuristic-based FCU in the network records all the information related to the
rate of every user; on the basis of the rates of the operating users, the rate of each
incoming user's session is adjusted. The important notations used in the analysis of
adaptive rate scheduling are shown in table 17.3.
The process of adaptive rate scheduling is as follows:
1. When a session enters, the rate Rallocate for the incoming user is allocated.
After that, the time slot duration T and queue size Q of session i are assigned
and the arrival rate Rest-arrival is estimated. When the user arrives at the base
station (BS), the rate is adjusted based on the feedback (FB) obtained from the
already admitted sessions. At the same time, the FCU determines the delay
threshold Dthreshold. It also estimates the average scheduling delay Davg(i)
from the first user and checks the required condition: if Davg is less than or
equal to Dthreshold, the FCU pre-empts all the users.
2. If Davg is greater than Dthreshold, an error in the arrival rate estimation is
indicated. The rate Rallocate is then assigned and the arrival rate for the user is
re-estimated. Finally, the estimated rate is sent as feedback to the BS, based
on which the BS adjusts the rate for the incoming user.
17-11
Modern Optimization Methods for Science, Engineering and Technology
[Flowchart of the adaptive rate scheduling process: start; estimate Davg; branch on whether Davg > Dthreshold; end.]
Step 3. Estimate the arrival rate Rest-arrival when the user arrives.
Step 4. The BS adjusts the rate based on the FB obtained from the already admitted user.
Step 5. The FCU determines the delay threshold Dthreshold.
Step 6. Estimate the average scheduling delay Davg(i) from the first user and check
the following conditions. If Davg ⩽ Dthreshold, the FCU pre-empts all the users;
otherwise Davg > Dthreshold indicates an error in the arrival rate estimation, in
which case (see the sketch after this list):
• Assign the rate Rallocate and estimate the arrival rate for the user.
• Send the estimated rate as FB to the BS.
• The BS adjusts the rate for the incoming user.
• END
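A minimal sketch of this FCU decision is given below; the function name fcu_adjust and the sample numbers are illustrative assumptions, not part of the scheme's specification.

def fcu_adjust(r_allocate, d_avg, d_threshold, est_arrival_rate):
    # Step 6: compare the average scheduling delay with the threshold.
    if d_avg <= d_threshold:
        return r_allocate, "FCU pre-empts all users"
    # Davg > Dthreshold signals an arrival rate estimation error, so the
    # re-estimated rate is fed back to the BS for the incoming user.
    return est_arrival_rate, "BS adjusts rate for the incoming user"

rate, action = fcu_adjust(r_allocate=2.0, d_avg=8.0,
                          d_threshold=5.0, est_arrival_rate=2.6)
print(rate, "-", action)   # 2.6 - BS adjusts rate for the incoming user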
17.5 Conclusions
The 4G network is currently very popular and is regarded as the next generation of
mobile computing; its uses and advantages distinguish it from all peer technologies.
To keep its services available at all times, its QoS must be improved across various
parameters.
The RACAC scheme developed in this chapter improves network reliability by
reducing the number of sessions that are blocked due to inadequate resources. The
core idea of resource allocation in the scheme is to increase the data rate during
transmission according to user requirements, and simultaneously to improve
throughput for real-time multimedia traffic using variable bit rate traffic for
transmission. The resource allocation with CAC scheme for next generation
networks maximizes the number of user sessions while increasing the data
transmission rate. It also ensures, with the help of the ARS scheme, that a new
session does not violate the QoS of ongoing sessions. In the ARS scheme a
heuristic-based approach finds a solution close to the true one, and the overall
process depends on FCU logic. Thus the proposed scheme utilizes resources
efficiently and maximizes the spectral efficiency of the network. Analysis of the
results shows that the proposed scheme improves throughput and reduces delay for
4G cellular networks.