Optimization methodology based on neural networks and self-adaptive differential evolution algorithm applied to an aerobic fermentation process
Article info

Article history:
Received 13 November 2011
Received in revised form 22 May 2012
Accepted 1 August 2012
Available online 15 August 2012

Keywords:
Differential evolution
Neural network topology
Optimization
Sensitivity analysis
Fermentation process

Abstract

The determination of the optimal neural network topology is an important aspect when using neural models. Due to the lack of consistent rules, this is a difficult problem, which is solved in this paper using an evolutionary algorithm, namely Differential Evolution. An improved, simple, and flexible self-adaptive variant of the Differential Evolution algorithm is proposed and tested. The algorithm includes two initialization strategies (normal distribution, and normal distribution combined with the opposition-based principle) and a modified mutation principle. Because the methodology contains new elements, a specific name has been assigned to it: SADE-NN-1. In order to determine the most influential inputs of the models, a sensitivity analysis was applied. The case study considered in this work refers to the oxygen mass transfer coefficient in stirred bioreactors in the presence of n-dodecane as oxygen vector. The oxygen transfer in fermentation broths has a significant influence on the growth of the cultivated microorganisms, so the accurate modeling of this process is an important problem that has to be solved in order to optimize the aerobic fermentation process.

The neural networks predicted the mass transfer coefficients with high accuracy, which indicates that the proposed methodology performed well. The same methodology, with a few modifications, and with the best neural network models, was used for determining the optimal conditions for which the mass transfer coefficient is maximized.

A short review of the differential evolution methodology is given in the first part of this article, presenting the main characteristics and variants, with their advantages and disadvantages, and placing the proposed modifications within the existing directions of research.

© 2012 Elsevier B.V. All rights reserved.
each hidden layer needed to be previously set in order to determine the maximum length of each individual in the DE methodology. For the process optimization this is not necessary, because the neural model was already optimized using the data describing the process. In addition, no dataset is used, the maximum length of the individuals is very easy to compute, and the fitness function was set as the output of the neural model.

A short review is included in this paper to present the DE algorithm as a promising alternative tool for process optimization or for neural network model optimization. The review emphasizes some characteristics of DE, its advantages and disadvantages, its evolution, variants, and applications. The improved variant developed and applied in our work is presented in comparison with the existing variants, so that it is better framed in the actual state of the field. The motivation for adding this review to the current work is the efficiency and effectiveness of this algorithm for prediction and classification problems approached in the chemical engineering field, for both model (neural network) and process optimization. The case studies of this article are examples which sustain the above consideration, along with other works belonging to our group [20,22,23].
The paper is organized as follows: Section 2 presents the general principles of DE and a series of theoretical aspects regarding self-adaptation. In Section 3, the mechanism of combining the DE algorithm with neural networks is detailed. Section 4 contains a description of the proposed methodology, along with its motivation. In Section 5, the sensitivity analysis procedure and its importance in the modeling methodology are tackled. General information and the database describing the process are presented in Section 6. The results obtained from the simulations are detailed in Section 7. The last section concludes the paper.
2. Differential evolution algorithm

Since it was proposed by Storn and Price in 1995 [1], the DE algorithm has undergone a series of transformations, the literature presenting a multitude of variants, from simple modifications to hybrid versions. Although the list of modifications is large, the base principles are the same. The algorithm starts with a pool of potential solutions X = {x1, x2, ..., xNp}, where Np is the population size. As in every evolutionary algorithm, the steps of mutation, crossover, and recombination are performed until a stop criterion is reached. Finally, the solution is chosen to be the individual with the best fitness function.

2.1. Steps and general principles of DE

The general problem of an optimization algorithm is to find xi so that f(xi) is optimized, with xi = {xi,1, xi,2, ..., xi,D}, where xi,k is the kth characteristic of xi and D is the dimensionality of the function [24]. In the DE algorithm, f(xi) is considered to be the objective function and xi is one of the individuals of the population.

The optimization performed with the DE algorithm consists in locating the minimum of the objective function by determining x* for which:

\forall x_i \in S : f(x_i) \geq f(x^*) = f^*    (1)

where f is the objective function, f* is called the global minimum, and x* is the optimized parameter vector, called the minimum location set. Knowing that max{f(S)} = −min{−f(S)}, the restriction to minimization is without loss of generality, because the fitness function can be modified in order to accommodate the minimum [3].

The sections below describe the steps of the algorithm (initialization, mutation, crossover, selection), emphasizing particular characteristics of DE in the context of evolutionary algorithms (EAs).

2.1.1. Initialization

The initialization represents the first step of the algorithm. It is an important aspect because it can influence the outcome, improper initialization leading to high errors. For example, if all the individuals in the population are initialized as replicas of a single vector, uniform crossover and differential mutation will only generate more replicas [2]. As a result, the population does not evolve and the fitness of the best individuals remains constant.

The initialization of the population with constrained characteristics cannot be performed until all the boundaries are known. For most real-world problems, the existence of natural physical limits or logical constraints imposes different values for each parameter, and their initialization is a straightforward process. For parameters with no obvious limits, Price et al. [2] consider that the bounds must be set so as to encompass the optimum. Because the optimum values are not always known, for the unconstrained parameters the boundaries must be ignored after initialization, so that DE can explore the search space beyond the bounding box.

2.1.2. Mutation

In the context of EAs, mutation is seen as a change with a random element; in the DE case, however, a new individual is created by adding a scaled differential term to a base vector (individual):

\omega_i = \alpha + F \cdot \beta    (2)

where α is the base vector, β is the differential term, and F is the scaling factor. The differential term is determined by the difference of two distinct, randomly chosen vectors: β = x_k − x_p. The base vector is also randomly chosen and, in order to achieve good convergence speed and probability, Price et al. [2] indicate that all vectors used in the mutation step must be distinct. This enables the formation of a geometric triangle in the search space where the three vectors exist as vertices [25].

The role of the F parameter is to control the rate at which the population evolves. The predefined interval in which it can take values is (0,1) and, although no upper limit has been established for this parameter, values greater than 1 are seldom effective and require more computation time [2]. When F > 1 the differential term is scaled up, and when F < 1 it is scaled down. Scaling down can prevent some points from falling outside the optimum boundaries, while scaling up tends to increase the number of function evaluations [26].

In order to avoid premature convergence, F must be high enough to counteract the selection pressure [2]; moreover, Zaharie [27] demonstrated that when its values fall below a limit Flow, the population can converge even in the absence of selection pressure.
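To make Eq. (2) concrete, the sketch below implements the 'Rand/1' mutation step for a whole population. NumPy, the population shape, and the index-sampling helper are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def de_mutation_rand1(population: np.ndarray, F: float) -> np.ndarray:
    """'Rand/1' differential mutation, Eq. (2): omega_i = alpha + F * (x_k - x_p),
    with the base vector alpha and the two difference vectors all distinct."""
    Np, _ = population.shape
    mutants = np.empty_like(population)
    for i in range(Np):
        # three mutually distinct indices, also distinct from the target i
        r1, r2, r3 = np.random.choice(
            [j for j in range(Np) if j != i], size=3, replace=False)
        mutants[i] = population[r1] + F * (population[r2] - population[r3])
    return mutants
```

Here population[r1] plays the role of α and population[r2] − population[r3] the role of the differential term β from Eq. (2).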
2.1.3. Crossover

In this step, new individuals are created based on the current population and on the mutation vectors obtained in the previous step. The population created is called the trial population. While in the case of EAs the role of crossover is to combine features from different parents, in the DE algorithm the crossover allows the construction of offspring by mixing characteristics [28]. The level of construction performed by crossover cannot be achieved by mutation, because mutation creates diversity perfectly, in a random manner, but it cannot execute the construction function well [3].

Generally, two variants of crossover are used in the DE algorithm: binomial and exponential [2]. The binomial crossover is described by Eq. (3). The exponential crossover uses an initial start point (sp): all the characteristics up to the sp point are copied into the new vector from the mutation vector; then a random value between 0 and 1 is generated and, until it becomes bigger than Cr (the crossover rate), the characteristics from the current individual are copied to the trial vector. After that, the remaining characteristics are taken from the mutation vector.

u_{i,j} = \begin{cases} \omega_{i,j}, & \text{if } rand(0,1) < Cr \\ x_{i,j}, & \text{otherwise} \end{cases}    (3)

The difference between the two types of crossover consists in the position of the characteristics inherited from distinct individuals: if in the binomial type the components inherited from the mutant vector are arbitrarily selected, in the exponential type they are grouped into one or two compact sequences [28]. The use of exponential crossover over the binomial type increases the efficiency only for a small part of the existing problems [29], Qing [30] considering that crossover is not so important.

The control parameter Cr provides the means to exploit decomposability and to provide extra diversity [2]. The interval in which Cr can take values is (0, 1.0]. The optimal Cr depends on both the problem and the crossover type used. In the opinion of Davendra and Onwubolu [31], when using the binomial scheme, the best results are obtained with intermediate values; they also indicate the interval [0.8, 1.0] as containing the optimal values of the Cr parameter. Although these guidelines can be useful in some cases, choosing suitable values remains difficult for DE. The classical method for tuning the parameter values is trial and error but, because it is prone to errors and requires high computation time, in most cases it is not suitable.
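The two crossover forms can be sketched as follows. The binomial form follows Eq. (3); the exponential form is written here in its standard formulation (a compact run of components copied from the mutant, starting at a random point, with geometrically distributed length), which is one common reading of the description above. The names and the forced-copy safeguard jrand are illustrative assumptions.

```python
import numpy as np

def binomial_crossover(target, mutant, Cr):
    """Binomial crossover, Eq. (3): each component is inherited from the mutant
    with probability Cr; one forced position guarantees trial != target."""
    D = target.size
    mask = np.random.rand(D) < Cr
    mask[np.random.randint(D)] = True           # jrand: at least one mutant gene
    return np.where(mask, mutant, target)

def exponential_crossover(target, mutant, Cr):
    """Exponential crossover: starting at a random point sp, consecutive
    components are copied from the mutant while rand(0,1) < Cr."""
    D = target.size
    trial = target.copy()
    j = np.random.randint(D)                    # start point sp
    copied = 0
    while copied < D:
        trial[j] = mutant[j]                    # compact sequence from the mutant
        j = (j + 1) % D
        copied += 1
        if np.random.rand() >= Cr:
            break
    return trial
```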
2.1.4. Selection

In this step, the individuals from the current and trial populations compete with each other in order to survive to the next generation. This type of selection is called one-to-one survivor selection [32], the best-so-far solution always being retained because the population's current best individual is replaced only when a better one is found [2]. The comparison between the individuals from the two populations determines, in the DE algorithm, a tighter integration of recombination and selection than in other EAs [2].

For constrained problems, a different approach regarding the selection operator is proposed in [33,34]. The necessity of using penalty functions is eliminated, the choice of individual resulting from three rules [3]: (a) when both solutions are feasible, the individual with the lower objective function wins; (b) when there is one infeasible and one feasible individual, the feasible individual is chosen; (c) when both individuals are infeasible, the less infeasible one is preferred.
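A sketch of the one-to-one survivor selection for an unconstrained problem; minimization of the objective f is assumed, consistent with Eq. (1), and the array-based layout is an assumption of this sketch.

```python
import numpy as np

def one_to_one_selection(population, trials, f):
    """One-to-one survivor selection: each trial vector competes only with its
    own target, so the best-so-far solution can never be lost."""
    survivors = population.copy()
    for i in range(population.shape[0]):
        if f(trials[i]) <= f(population[i]):    # minimization; ties favor the trial
            survivors[i] = trials[i]
    return survivors
```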
2.2. Stop criteria

The mutation, crossover, and selection steps are repeated until a stop criterion is reached. Depending on the type of problem, various stop criteria can be applied. The most used criterion, and the one proposed in the classical DE algorithm, is the number of current generations reaching a predefined maximum number of generations (G). Due to the randomness factor involved in evolutionary algorithms, the disadvantage of this criterion consists in the trial-and-error methods applied to find a suitable G [35].

When the problem is constrained, the algorithm stops when all constraints are satisfied whereas, in the case of multi-objective optimization, because of the nature of conflicting objectives, it is not always clear when the search must stop [2]. Price et al. [2] enumerate a series of methods that can be used as stop criteria:

(a) met objectives—because there are problems for which the objective function's minimum is known, it is easy to determine when that objective is met and, consequently, to call off the optimization;
(b) population statistics—different information about the population can be used to create a stop criterion. For example, an optimization can be halted when the difference between the best and the worst objective function falls below a specified limit;
(c) limited time—there are cases when the computation time is a very important aspect and the optimization must stop, regardless of whether it found the optimal solution or not;
(d) human monitoring—in the case of time-consuming tasks, a human can monitor the progress and, in response to the perceived opportunities, the optimization can be altered;
(e) application specific—there are applications that have their own termination criteria, and it is suitable to use them over some general principles.

In our work, the stop condition is represented by a combined criterion. The algorithm stops when the current generation or the fitness function reaches the pre-established corresponding maximum or minimum values.
2.3. DE variants

DE is a powerful, effective, and robust algorithm. This statement is supported by the multitude of works [2,15,36,37] in which comparisons between DE and other algorithms indicated that, for various types of problems (such as unconstrained, multi-constraint nonlinear, or multi-objective), DE is better not only in terms of solution performance but also in terms of speed. Due to these properties, the DE algorithm is the first approach tested by various researchers when the problem is known to be difficult to solve. For example, in [15], a classic, simple, and unimproved DE-based methodology was compared with a GA method, the two being applied for the simultaneous topological and structural optimization of neural networks. The general performances of the two algorithms were similar, with better results obtained by DE for a series of modeled parameters.

By combining different types of mutation, crossover, recombination, and stop criteria, or by introducing new methods into the inner workings of the DE algorithm, new variants (also called strategies) are created. The main objective of this action is to improve the algorithm and to make it more powerful and more robust [38], because there are situations in which DE does not perform as well as expected [39]. The main problems that researchers encounter are:

(a) stagnation—the situation in which a population-based algorithm does not converge even to a suboptimal solution, while the population diversity is still high [40]. For DE, stagnation is the state where the population does not improve over a period of generations, the algorithm not being able to determine a new search space in order to find the optimal solution [31]. Various factors, such as the control parameters and the problem dimensionality, influence stagnation [3,31,40];
(b) premature convergence—the case in which the population diversity is lost. In all EAs, premature convergence arises when the characteristics of some highly rated individuals dominate the population, determining it to converge to a local optimum where the operators cannot produce any descendants better than their parents [41]. There is a close relation between premature convergence, loss of diversity, and population variance [42]. Preserving the population diversity helps to avoid premature convergence and stimulates the ability of the algorithm to follow the optimum [43];
(c) deterioration of performance—the performance deterioration of the algorithm is determined by the increase of the objective function's dimensionality [39];
(d) high computational time—a result of a high number of generations (G) combined with a high number of individuals (Np), because at least G·Np function evaluations are necessary. For most real-world problems, the evaluation of a candidate solution is not difficult, but it is time-consuming [25]. The long computational time also appears due to the stochastic nature of the DE algorithm [44]. A solution to this problem is to limit the algorithm to operate within an acceptable time interval and thus obtain an improved solution, although it may not be the global optimum [25];
(e) sensitivity/insensitivity to control parameters—the strategies determined so far by researchers are more or less sensitive to the control parameters. Empirical studies indicate that the more sensitive a strategy is, the better the solutions it can achieve [3].

In order to overcome these problems, new strategies were created. The main directions to improve DE are [38]:

(a) replacing the hand-tuning of control parameters with adaptive or self-adaptive mechanisms—hand-tuning of the control parameters can be time consuming due to the different influences of the data used to solve a specific problem and due to the adopted strategy. The introduction of adaptive or self-adaptive mechanisms resolves this aspect by inserting the parameters into the algorithm itself;
(b) introducing more mutation strategies during the optimization process—new, different mutation strategies can be created in order to overcome distinct problems such as the rotational invariance problem;
(c) hybridizing DE by combining it with other optimization techniques—hybridization combines different features of different methods in a complementary way, leading to more robust and effective optimization tools [45]. DE can be used with various methods such as opposition-based learning (OBL) [46], the energetic selection principle (ESP) [3], or particle swarm optimization (PSO) [47].

Storn [48] presents 10 variants of the algorithm. In order to indicate each variant, a coding 'Mode/DiffTerm/Cross' is used, where 'Mode' represents the mode in which the base vector of the mutation step is chosen, 'DiffTerm' represents the number of differential terms used for mutation, and 'Cross' is the type of crossover. The first two terms ('Mode' and 'DiffTerm') refer to the characteristics of the mutation step and the last one, 'Cross', is related to the crossover step. The base vector for the mutation step can be chosen randomly (Rand), as the best individual in the population (Best), or as a vector that lies on a line between the target and the best individual (Rand-to-Best). Regarding the number of differential terms used in the mutation phase, the information coded with 'DiffTerm', one (Eq. (2)) or two terms can be used, denoted with '1' and '2'. As explained earlier, two types of crossover can be applied in DE: binomial (Bin) and exponential (Exp).

For example, the first variant of the DE algorithm, which uses the random method of vector selection along with one differential term and binomial crossover, is called 'Rand/1/Bin'. The other versions are: 'Rand/2/Bin', 'Rand/1/Exp', 'Rand/2/Exp', 'Best/1/Bin', 'Best/2/Bin', 'Best/1/Exp', 'Best/2/Exp', 'Rand-to-Best/1/Bin', and 'Rand-to-Best/1/Exp'.

A few years later, Fan and Lampinen proposed two more strategies: trigonometric and directed. The trigonometric version (TDE) [49] introduces a new form of mutation into the original 'Rand/1/Bin'; the main difference between the older versions and TDE consists in the use of the objective function value in the mutation operation. The directed version supposes a modification that embeds an additional operation, directed mutation, in order to increase the convergence velocity [25].

Along with these variants, numerous strategies can be found in the open literature, their notations varying from researcher to researcher. Within this multitude of variants, the notation 'Mode/DiffTerm/Cross' proves to be insufficient, but a more appropriate notation has not been defined yet [32].

2.4. Parameter tuning and self-adaptation in the DE algorithm

The control parameters in DE are Cr, F, and Np, and they have an important role because they influence the effectiveness, efficiency, and robustness of the algorithm [38]. Each parameter has its own influence, not only on the algorithm itself, but also on the other control parameters. Because the determination of the optimal parameters is problem dependent, there is no single acceptable set of values that can be used for solving all problems. However, there are some generally accepted limits for each parameter and some guiding rules that can be applied in order to determine their optimal values.

Cr and F affect the convergence speed and the robustness of the search [50]. Cr controls the number of characteristics inherited from the mutant vector and can thus be interpreted as a mutation probability, while F influences the size of the perturbation and has a significant role in ensuring the population diversity. When strong convergence is used (Cr = 0.1), the contour matching property of the DE algorithm is lost and the search is performed along the main parameter axes, which is beneficial for separable objective functions [32]. Cr = 0.9 is recommended for near uni-modal problems or when fast convergence is desired [51]. Zaharie demonstrated that a lower limit of F exists (F_low), determined using Eq. (4), which depends on Cr and Np:

2F_{low}^2 - \frac{2}{Np} + \frac{Cr}{Np} = 0    (4)

If F < F_low, the population can converge even in the absence of selection pressure [2,27]. When Np is too small, stagnation appears, and when it is too big, the number of function evaluations rises, retarding the convergence. A correlation exists between Np and F, it being intuitively clear that a large Np requires a small F, because there is no need for large amplitudes when the population size is big [3].
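Rearranging Eq. (4) gives the critical value explicitly; the numerical values below are an illustrative choice of Np and Cr, not settings from this paper:

F_{low} = \sqrt{\frac{2 - Cr}{2\,Np}}, \qquad \text{e.g. } Np = 50,\ Cr = 0.9 \;\Rightarrow\; F_{low} = \sqrt{\frac{1.1}{100}} \approx 0.105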
The various rules imposed on the control parameters do not manage to determine optimal values, so different methods have been proposed for solving this problem. These methods can be classified into: (a) deterministic control—the parameters are found using a deterministic law, without any feedback information [3]; (b) adaptive control—the direction and/or magnitude of the parameter change is determined using feedback information [24]; (c) self-adaptive control—the parameters are encoded into the algorithm itself [3].

In self-adaptation, the concept of co-evolution, which is an effective approach to decompose complex structures and to achieve better performance, can be used to adapt the control parameters [52]. By reconfiguring itself, the evolutionary strategy is adapted to any general class of problems [24] and, in this manner, the generality of the algorithm is extended. Among the studies on the self-adaptability of DE algorithms is the paper of Brest et al. [24], where, for each individual of the new generation, the F and Cr parameters are computed as:

F_{i,G+1} = \begin{cases} F_l + rand_1 \cdot F_u, & \text{if } rand_2 < \tau_1 \\ F_{i,G}, & \text{otherwise} \end{cases}    (5)

Cr_{i,G+1} = \begin{cases} rand_3, & \text{if } rand_4 < \tau_2 \\ Cr_{i,G}, & \text{otherwise} \end{cases}    (6)

where F_l and F_u are the lower and upper limits of the F parameter, τ1 and τ2 are the probabilities to adjust F and Cr, and rand_i, i = 1...4, are uniform random values in the interval [0,1].
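A sketch of this per-individual update follows; the constants Fl = 0.1, Fu = 0.9 and τ1 = τ2 = 0.1 are the values commonly reported for this scheme and are quoted here as assumptions rather than settings taken from this paper.

```python
import random

def jde_update(F_old, Cr_old, Fl=0.1, Fu=0.9, tau1=0.1, tau2=0.1):
    """Self-adaptive update of F and Cr, Eqs. (5) and (6): with probability
    tau1 (resp. tau2) the parameter is resampled, otherwise it is inherited."""
    F_new = Fl + random.random() * Fu if random.random() < tau1 else F_old
    Cr_new = random.random() if random.random() < tau2 else Cr_old
    return F_new, Cr_new
```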
Zhang et al. [53] proposed a novel self-adaptive differential evolution algorithm (DMSDE) in which the population is divided into multiple groups of individuals. The difference between the objective functions of the individuals from the current group influences the scaling factor F and the crossover rate Cr, the strategy being constructed based on Eqs. (7) and (8):

F_{gi}^t = F_l + (F_u - F_l) \cdot \frac{f_{g,middle}^t - f_{g,best}^t}{f_{g,worst}^t - f_{g,best}^t}    (7)

where F_{gi}^t is the scaling factor of the ith vector of the gth group in the current generation t, F_l and F_u are the lower and upper limits of the F parameter, and f_{g,best}^t, f_{g,middle}^t, f_{g,worst}^t are the best, middle, and worst fitness functions of the three randomly selected vectors from the group g, in the generation t.

Cr_{gi}^t = \begin{cases} Cr_{gi}^t, & f_{gi}^t < \bar{f}_g^t \\ Cr_l + (Cr_u - Cr_l) \cdot \frac{f_{gi}^t - f_{g,min}^t}{f_{g,max}^t - f_{g,min}^t}, & f_{gi}^t \geq \bar{f}_g^t \end{cases}    (8)

where Cr_{gi}^t is the crossover rate of the individual i from the group g in the generation t, Cr_u and Cr_l are the upper and lower limits of the Cr parameter, f_{g,max}^t and f_{g,min}^t are the maximum and minimum values of the fitness functions of all the individuals in the group g at generation t, f_{gi}^t is the fitness of the individual i from the group g, and \bar{f}_g^t is the average fitness of all the individuals in the group g.

Recently, Pan et al. [51] created a new DE algorithm (SspDE) with self-adaptive trial vector generation strategies and control parameters. Three lists were used: a strategy list (SL), a mutation scaling factor list (FL), and a crossover rate list (CRL). Trial individuals were created during each generation by applying the standard mutation and crossover steps, which use the parameters in the lists associated with the target. If the trial was better than the target, the parameters were then inserted into the winning strategy list (wSL), winning F list (wFL), and winning Cr list (wCRL). SL, FL, and CRL were refilled after a predefined number of iterations, with a high probability from the winning lists or, otherwise, with randomly generated values. In this manner, the self-adaptation of the parameters followed the different phases of the evolution.

In general, the mathematical formulas used for self-adapting the parameters of the DE algorithm are not as complex as (7) and (8). In our work, the parameters of the algorithm evolve at the same time as the individuals, no new formula being applied to determine their values and no additional lists being used. In this manner, the complexity of the algorithm is kept approximately the same as that of the other, non-self-adaptive strategies. The idea behind the proposed strategy is to use the F and Cr parameters embedded in the target vector, the parameters being evolved in the same manner as all the characteristics of the individuals. Abbass [54] used the same mechanism of adaptation in the Self-adaptive Pareto Differential Evolution (SPDE) algorithm. The novelty of the methodology proposed in this paper consists in combining DE with the simple self-adaptive principle, with the opposition-based initialization, and with the modified mutation, and, after that, applying it to two types of optimization: neural network architecture and mass transfer coefficient when different broths are considered.

3. Differential evolution algorithm and neural network

In this work, the DE methodology was used to determine the topology and the internal parameters of neural networks. The advantages of using evolutionary algorithms (EAs) over other approaches consist in the ability to escape local optima, the ability to adapt in a changing environment, and robustness [55]. Evolution in ANNs can be performed at three different levels [56]:

(a) connection weights—the determination of the connection weights is also known as the training phase, and this step is usually formulated as the minimization of an error function such as the mean squared error (MSE);
(b) architectures—evolutionary approaches concerning architecture determination enable automatic ANN design without human intervention;
(c) learning rules—the adaptation of the learning rules is performed through evolution.

Another aspect regarding the evolution of ANNs consists in the period in which these levels of evolution are performed. In the literature, three cases are encountered: evolution of the weights, evolution of the architectures, and evolution of both weights and architecture. The methodology proposed in this work belongs to the last group, the determination of the weights and of the architecture being performed simultaneously.

When using evolutionary algorithms, several features can be encoded and co-evolved at the same time, the definition of performance becoming more flexible than the definition of an energy or error function [57]. Because EAs do not depend on gradient information like gradient-descent based algorithms, it is not necessary for the fitness function to be differentiable or even continuous [56].

In the neuro-evolution field, the coding (representation) used can be classified into three classes: direct, developmental, and implicit [57], and a specific terminology is used to represent the ANN structure and the individuals used in the evolutionary algorithms. Thus, evolutionary algorithms work with a population of genotypes, while the neural network is represented by a phenotype. In the direct encoding scheme, there is a one-to-one relationship between the phenotype and the genotype. Distinctively, the indirect encoding has a more compact representation, which tries to copy the principle of gene reusability from biological development [58]. The developmental encoding is based on the use of a genome that directs a process towards the construction of the network, allowing in this manner a more compact representation [59].

The DE algorithm, in combination with neural networks, is largely used to determine the optimal parameters of the network (training) and only scarcely for determining both the topology and the internal parameters. Plagianakos et al. [59] used the DE algorithm for training neural networks. The methodology was tested on the Encoder/Decoder feedforward neural network training problem, which is very close to a real-world pattern classification task. Various activation functions were used and, in each case, the percentage of success was 100%.

Subudhi and Jena [5,13] proposed a nonlinear system identification scheme using neural networks trained with DE and the Levenberg–Marquardt (LM) algorithm. DE was used to find approximate values near the global minimum, and LM, which is a faster convergent algorithm, continued the search (local search). The results obtained indicated that the methodology was better than other existing approaches for nonlinear system identification.

Zarth and Ludermir [60] developed a multimodal methodology in which parallel subpopulations of DE are used to train the networks and determine the optimal topology. Compared with other existing approaches found in the literature, the neural networks determined for three classification problems were less complex and had higher generalization capabilities.

Cruz-Ramirez et al. [61] developed a Memetic Pareto Differential Evolution Neural Network (MPDENN) methodology for the automatic design of neural network models with sigmoid basis units, applied to multi-classification tasks. The structure and the weights were determined by a Pareto DE approach augmented with a local search algorithm based on an improved resilient backpropagation algorithm.

In this work, the neural networks determined with the DE algorithm are fully connected multilayer feed-forward perceptrons (MLPs). The MLP network type was chosen due to its capability as universal approximator and its simple and flexible structure. The main goal of the neural network modeling was to predict the oxygen mass transfer coefficient in stirred bioreactors depending on the process conditions: biomass concentration, superficial air speed, volumetric fraction of oxygen vector, and specific power.

The DE algorithm was chosen for ANN optimization because, among the other EAs, it distinguishes itself as a powerful and robust algorithm with few control parameters. In the neuro-evolutionary field, GA was primarily used for training and topology determination, but this trend is now changing, DE beginning to replace GA since it is more powerful. Another reason for choosing DE over other EAs is the fact that its applications in the chemical engineering field, especially in combination with neural networks, are still scarce, the good results obtained in this study indicating that the specific properties of the proposed methodology (generality, flexibility, robustness) make it suitable for different problems.
4. Improved self-adaptive DE combined with neural networks (SADE-NN-1)

The methodology proposed and applied in this article combines different aspects of DE with ANNs in order to create a new and effective algorithm for two types of optimization: ANN topology and chemical processes. The neural network acts as a model, while the DE algorithm is the solving method for optimizing both the model and the process.

The improved simple self-adaptive differential evolution (SADE-NN-1) is a flexible algorithm that can work with various variants of DE as a base. The self-adaptive mechanism is a simple one, the main idea being that no new set of equations is necessary to determine the F and Cr corresponding to each individual. In our variant, F and Cr are evolved with the population itself, the new values being determined by the same equations as the ones used to create the new individuals. Fine-tuning of the DE parameters is a difficult problem, their values influencing the performance of the algorithm in a manner not yet fully understood. The most used technique is trial and error but, because it is prone to errors and requires long computational time due to the necessity of performing various runs with different combinations of values, we considered that a self-adaptive variant is a better choice.

The basic principle of the methodology is related to the observation that, as for the majority of problems solved with DE, the simultaneous optimization of the neural network topology and internal parameters is performed with greater success when self-adaptive variants are chosen [23]. In our SADE-NN-1 algorithm, the same formulas are used for evolving the individuals and the control parameters. For instance, in the case of the Rand/1/Bin variant combined with normal mutation, Eqs. (2) and (3) are applied for computing the values of the control parameters of each individual.
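One plausible reading of this mechanism is that each genotype simply carries two extra positions holding its own F and Cr, which pass through exactly the same mutation and crossover operators as the rest of the vector. The layout below is a sketch under that assumption, not code from the paper; the length D is illustrative.

```python
import numpy as np

D = 100  # length of the network-parameter part of the genotype (illustrative)

def make_individual():
    """Genotype = [network parameters ..., F, Cr]: the control parameters are
    appended to the vector and evolve together with the individual itself."""
    return np.concatenate([np.random.rand(D), np.random.rand(2)])

def control_parameters(individual):
    # the last two genes of the individual are read back as its own F and Cr
    return individual[-2], individual[-1]
```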
The activation functions have a significant contribution to the performance of a neural model. Complex dependencies between inputs and outputs can be modeled with high accuracy using different transfer functions. Consequently, eight types of activation functions were used: Linear, Hard Limit, Bipolar Sigmoid, Logarithmic Sigmoid, Tangent Sigmoid, Sinus, Radial Basis with a fixed center point at 0, and Triangular Basis. These represent only a fraction of the activation functions that can be used and were chosen in order to enlarge the mathematical complexity of the neural network model. In this manner, the complex interactions within the process can be modeled with great accuracy.
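A sketch of these eight transfer functions, written as plain NumPy callables; the exact parameterizations (e.g., the slope of the sigmoids, the width of the triangular basis) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# The eight activation functions named in the text; parameterizations are illustrative.
activations = {
    "Lin":     lambda x: x,                                 # Linear
    "Hardlim": lambda x: np.where(x >= 0, 1.0, 0.0),        # Hard Limit
    "Bipolar": lambda x: 2.0 / (1.0 + np.exp(-x)) - 1.0,    # Bipolar Sigmoid
    "Logsig":  lambda x: 1.0 / (1.0 + np.exp(-x)),          # Logarithmic Sigmoid
    "Tansig":  lambda x: np.tanh(x),                        # Tangent Sigmoid
    "Sin":     np.sin,                                      # Sinus
    "Radbas":  lambda x: np.exp(-x ** 2),                   # Radial Basis, center 0
    "Tribas":  lambda x: np.maximum(0.0, 1.0 - np.abs(x)),  # Triangular Basis
}
```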
As stated earlier, initialization is an important step of DE (and of every evolutionary algorithm) because the algorithm performance depends not only on the evolutionary process, but also on the initial population. All parameters are initialized using random values within the accepted ranges. For the weights, no limits are imposed, their initial values being randomly generated in the interval (0, 1). In the current work, two types of initialization were used: normal distribution, and normal distribution combined with OBL. In a series of studies, OBL was introduced into the DE algorithm in two phases – initialization and creation of new individuals – with encouraging results [46,62,63]. The mutation and crossover steps form the phase of creation of new individuals.

OBL was proposed in 2005 by Tizhoosh [64] and is based on the opposite number theory, where an opposite number is defined as:

\tilde{w} = a + b - w    (9)

where \tilde{w} is the opposite number and w ∈ [a, b].

Initially, the population is generated as usual, with the normal distribution. After that, the opposite individuals are calculated by applying Eq. (9) to each characteristic. From the reunion of the two populations, the Np individuals with the highest fitness functions are chosen. With higher-fitness individuals in the initialization step, the probability of obtaining better performance indexes at the end of the evolution rises.
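A sketch of this opposition-based initialization for a fitness function to be maximized; uniform sampling in [a, b] stands in for the distribution actually used, and all names are illustrative assumptions.

```python
import numpy as np

def obl_initialization(Np, D, fitness, a=0.0, b=1.0):
    """Generate Np individuals, build their opposites with Eq. (9)
    (w~ = a + b - w), and keep the Np fittest of the reunion."""
    population = a + (b - a) * np.random.rand(Np, D)
    opposites = a + b - population                    # Eq. (9), per characteristic
    union = np.vstack([population, opposites])
    scores = np.array([fitness(ind) for ind in union])
    return union[np.argsort(scores)[-Np:]]            # highest fitness survives
```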
Along with the self-adaptation and the OBL initialization, a modified mutation strategy was used in the proposed SADE-NN-1. It consists in ordering the individuals participating in the mutation phase based on their fitness function. For example, in the case of the 'Rand/1' versions, the three randomly chosen vectors x1, x2, x3 are ordered from the highest to the lowest fitness function, higher fitness values indicating better individuals. In this manner, the base vector is the fittest of the three, and the mutation becomes an operation in which the best individual from a group is changed in order to create a better individual. This approach is a greedy one and is based on the idea that better individuals can create better children. Comparisons between the simple self-adaptive version with classical mutation and the one with modified mutation are performed in order to determine which one is the best for the optimal neural network determination in the current process.

When the modified mutation scheme is used in combination with a 'Best/1/Cross' version, the idea behind 'Best/1/Cross' remains the same, the base vector being represented by the best individual in the generation. Only the vectors participating in the differential term are arranged based on their fitness function.
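The modified mutation can be sketched by sorting the three sampled indices by fitness before applying Eq. (2); fitness_values is assumed to be a NumPy array in which higher values mean better individuals, as in the text.

```python
import numpy as np

def modified_mutation_rand1(population, fitness_values, F):
    """'Rand/1' mutation with the modified scheme: the three randomly chosen
    vectors are ordered so that the fittest one serves as the base vector."""
    Np = population.shape[0]
    mutants = np.empty_like(population)
    for i in range(Np):
        idx = np.random.choice(
            [j for j in range(Np) if j != i], size=3, replace=False)
        r1, r2, r3 = idx[np.argsort(fitness_values[idx])[::-1]]  # descending fitness
        mutants[i] = population[r1] + F * (population[r2] - population[r3])
    return mutants
```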
Two aspects determine the flexibility of the proposed algorithm, SADE-NN-1. First, the type of DE variant considered as starting point can be modified by the end-user from the interface with just a simple click. Second, the types of initialization and mutation can be easily changed.

At the beginning of the algorithm, the base variant and the mutation scheme to be used, along with the maximum number of neurons in the hidden layers of the neural network, are chosen by the user. The network parameters influence the length of characteristics that an individual has, hence the necessity of setting this parameter before the algorithm starts.

During the run of the algorithm, the information between the DE and the developed ANN flows in two ways: from the DE to the network and vice versa. At each generation, new individuals representing newly encoded networks are created, and each network (determined after decoding the individual) calculates the fitness function for the DE. The algorithm then uses the fitness in order to determine the best individuals for the next generation.

Because of the specific nature of the DE algorithm, which uses real-value vectors, direct encoding was used. The vectors from the genotype have a specific structure and contain all the parameters of the neural networks: number of hidden layers, number of neurons in each hidden layer, weights, biases, and activation function of each neuron.

For the current methodology, at least two reasons explain the computation time: a large network needs a long vector, and the DE population contains Np such vectors. For example, an ANN with 4 inputs, 3 hidden layers, 20 neurons in the first hidden layer, 15 in the second, 10 in the third, and 5 outputs, coded as 4:20:15:10:5, has 4·20 + 20·15 + 15·10 + 10·5 = 580 weights and 4 + 20 + 15 + 10 + 5 = 54 neurons, so the vector containing all these data has 1 + 3 + 580 + 54·2 = 692 characteristics. Knowing that the majority of processes can be modeled with great success by networks having at most two hidden layers, in our work the maximum number of allowed hidden layers was limited to two. Taking into account that the length of each individual and the position of each neural network property are fixed during the evolution and are determined automatically based on the network topology and the application settings, the vector has the following structure:
- the first position in the vector is occupied by the number of hidden layers (Nl);
- the second and third positions are kept for the number of neurons in the first (Nh1) and second (Nh2) intermediate layers;
- the next positions are reserved for the weights. The number of reserved positions is equal to: number of inputs × maximum number of neurons in the first hidden layer + maximum number of neurons in the first hidden layer × maximum number of neurons in the second hidden layer + maximum number of neurons in the second hidden layer × number of outputs. Depending on the actual values of Nh1 and Nh2, unoccupied positions can exist. These are filled with noise data copied from the parents. In a sense, this noise data can be viewed as inactivated genes from the biological genome that become activated only in specific conditions;
- after that, the biases and activation functions for each neuron follow. A specific number of positions is reserved for the neuron parameters; it is equal to the maximum number of neurons allowed in the network multiplied by 2, since there are two parameters for each neuron.
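Following this layout, the fixed length of an individual can be computed from the user-set maxima. The helper below is hypothetical and, following the paper's own example, the input nodes are counted among the neurons; this accounting is an assumption of the sketch.

```python
def genotype_length(n_inputs, max_h1, max_h2, n_outputs):
    """Length of a direct-encoded individual: 1 position for Nl, 2 for
    (Nh1, Nh2), the reserved weight block, and 2 positions per neuron
    (bias and activation function)."""
    weights = n_inputs * max_h1 + max_h1 * max_h2 + max_h2 * n_outputs
    neurons = n_inputs + max_h1 + max_h2 + n_outputs
    return 1 + 2 + weights + 2 * neurons

# Example: 4 inputs, at most 10 neurons per hidden layer, 1 output:
# weights = 40 + 100 + 10 = 150, neurons = 25, length = 3 + 150 + 50 = 203
print(genotype_length(4, 10, 10, 1))  # 203
```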
An increased performance of the neural networks can be achieved by resolving two aspects: optimizing the architecture and normalizing the raw data [65]. In this article, data normalization was applied in order to ensure the best possible scenario in which the networks can evolve, and the optimization of the ANN architecture was realized with the DE algorithm.

A simplified logic schema of the proposed methodology is detailed in Fig. 1.

As can be observed from Fig. 1, the SADE-NN-1 methodology contains a module which performs the process optimization using an already determined neural network. This module was introduced according to the principle of code reusability, allowing the DE algorithm to run in the same manner for each type of optimization (process or neural model).

The stop criterion employed in the proposed methodology consists of a combination of two conditions, referring to the number of generations and to the fitness function, the algorithm stopping when at least one of them reaches a predefined value. In order for the mutation, crossover, and recombination steps to be repeated, the current generation must be smaller than the maximum number of generations allowed (G = 500) and the fitness must have a value smaller than 10^6. The limit of the fitness function was chosen based on the consideration that the methodology performs its maximization with a good ratio between running time and performance.

The type of the optimization and, indirectly, the manner in which the objective function is computed have a big influence on the complexity of the algorithm, since for neural network optimization the objective function depends on computing performance indexes based on a relatively high number of data (the length of the training data set). When the built-in functions provided by the .NET Framework (the framework used to implement the current methodology) are considered, the complexity is O(n^5 log(n)); when they are not, the complexity is O(n^4).

Concerning flexibility and scalability, different aspects can be considered. From the functional scalability point of view, due to the modular approach, the application allows an easy introduction of new modules and functionalities. If the aspect considered is load scalability, the methodology internally manages the resource consumption and the computational time by (i) automatically adjusting the lengths of the individuals of the optimization procedure based on the characteristics of the problem being solved and on the application settings, and (ii) stopping the optimization based on the performance of the solutions obtained at each step.

5. Sensitivity analysis

When creating models for specific problems, a key question is: which are the input variables and which of them are the most important? The answer to such a question is given by a procedure based on sensitivity analysis (SA). SA can be used to perform a variety of actions, such as: ranking the inputs based on their influence on the output, assessing changes in the output due to variations in the inputs, improving the quality of the computations, or limiting the use of a program [66]. For models represented by neural networks, SA can be used to examine whether the characteristics of each input have been learned well or to explore the sensitivity of the output to the variation of each input [67].

When performing SA, different methods can be utilized: adding noise to each input and observing the modification of the outputs, analyzing the derivatives of the fan-out weights of the input units, or applying the missing value problem [68].

In this work, the sensitivity of each output with respect to a selected input is defined as a differential coefficient and is applied for ranking the inputs based on their influence on the model's output, using quantified information. The method based on partial derivatives is encountered in the literature as 'PaD' and, by using this approach, two results can be obtained: the profile of the output for small input variations and a classification of the relative contribution of each variable [69].

s_{j,i} = \frac{\partial output_j}{\partial input_i}    (10)

where s_{j,i} is the sensitivity of the output j related to the input i. Knowing that, for a network with two hidden layers, output_j is simply defined (without biases) as in Eq. (11), its sensitivity related to each input pattern (m) becomes:

output_j = g\left(\sum_{h2=0}^{Nh2} w_{j,h2} \cdot g\left(\sum_{h1=0}^{Nh1} w_{h1,h2} \cdot g\left(\sum_{i=0}^{I} w_{h1,i} \cdot input_i\right)\right)\right)    (11)
where k_p represents the input of the p neuron, w_{p,h} is the weight between the pth and the hth neurons, and g(p) is the activation function corresponding to each p neuron.

Using s_{j,i}^m, a series of information about the influence of the input on the output can be obtained by plotting a set of graphs. One example of interpretation is that, if the partial derivative is negative for the specific input value given by the m input pattern, the output decreases while the considered input increases. The per-pattern contributions can also be aggregated over the M patterns of the data set:

s_{j,i} = \sum_{m=1}^{M} \left(s_{j,i}^m\right)^2    (13)

In order to obtain a reliable sensitivity analysis, the neural networks with the highest performances must be used; otherwise, confusing results may be obtained [70]. Another aspect linked to
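The PaD ranking of Eqs. (10) and (13) can be sketched numerically for any trained model: the partial derivative of Eq. (10) is approximated by central finite differences for each pattern, and the squares are accumulated as in Eq. (13). The callable model and the data matrix X are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def pad_sensitivities(model, X, eps=1e-4):
    """PaD input ranking: s_{j,i}^m ~ d(output_j)/d(input_i) per pattern m
    (Eq. (10)), accumulated as a sum of squares over M patterns (Eq. (13))."""
    M, D = X.shape
    s = np.zeros(D)
    for m in range(M):
        for i in range(D):
            x_plus, x_minus = X[m].copy(), X[m].copy()
            x_plus[i] += eps
            x_minus[i] -= eps
            deriv = (model(x_plus) - model(x_minus)) / (2 * eps)
            s[i] += deriv ** 2
    return s  # larger s[i] -> stronger influence of input i on the output
```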
Due to their specific properties, the increase of the biomass concentration of P. shermanii leads to the decrease of kLa. The intensification of mass transfer by the oxygen-vector can be described by means of the amplification factor, defined as the ratio between kLa in the presence of the oxygen-vector, (kLa)V, and in its absence, (kLa)0, for similar experimental conditions. The bacterial cells are adsorbed on the surface of the oxygen-vector, determining an initial reduction of (kLa)V/(kLa)0. Experimental data indicate that kLa increases when the n-dodecane concentration is lower than 0.1. The intensification of the specific power input induces the fine dispersion of n-dodecane and exhibits a favorable effect on the oxygen transfer rate. However, the increase of the energy dissipated by mechanical agitation has a contrary effect on kLa, which initially increases to a maximum value, followed by its reduction over a certain level of mixing intensity. As can be observed, different strategies and actions have distinct effects on the kLa factor, a specific neural network model being necessary to simulate its behavior as a function of the biomass concentration, superficial air velocity, specific power, and oxygen-vector volumetric fraction. These four inputs are related to the output variable, namely the kLa factor.

Using various DE strategies as a base, a series of simulations was performed in order to determine the best neural network model of the bacteria fermentation. The results obtained with the SADE-NN-1 methodology are listed in Table 1. The first column indicates the DE base variant. The two types of mutation used are referred to as 'Normal' and 'Modified'. Normal initialization and opposition-based initialization are combined with the different base variants and mutations. For each combination of initialization type – DE base variant – mutation type of the SADE-NN-1 algorithm, ten simulations were performed and the average values of the performance indexes were computed.

Table 2
Neuron characteristics of the best neural network MLP (4:5:1).

Type of layer | Neuron | Bias | Activation function
Hidden layer | H1.1 | 0.099 | Lin
Hidden layer | H1.2 | −0.241 | Tansig
Hidden layer | H1.3 | −0.625 | Bipolar
Hidden layer | H1.4 | 0.849 | Lin
Hidden layer | H1.5 | 0.341 | Sin
Output layer | O.1 | −0.229 | Tansig

The best neural networks obtained in each case were MLP (4:8:1) and MLP (4:5:1), chosen because they have the highest fitness values among all the neural networks determined. They correspond to Best/1/Bin as DE base variant with normal initialization and modified mutation, and to Best/1/Bin with opposition-based initialization and normal mutation, respectively.

As can be observed, the binomial versions tend to behave better than the exponential ones, and the opposition-based initialization gave better results than the classic versions. In all cases (except normal initialization with 'Rand/1/Exp' and opposition-based initialization with 'Best/1/Bin' and 'Rand-to-Best/1/Bin'), the modified mutation outperforms the classic mutation. The best neural networks determined with both initialization types were obtained when the base DE version was 'Best/1/Bin'. The best neural network, MLP (4:5:1), was obtained with the classic mutation form and opposition-based initialization.

All the neuron characteristics of the best network MLP (4:5:1), which was chosen based on the highest fitness function, are listed in Table 2. In order to identify each neuron, a notation indicating the layer and the number of the neuron is used. For example, H1.2 indicates the second neuron in the first hidden layer and O.1 is the first output neuron. Each neuron can have one of the eight
Table 1
Results of simulations obtained in the case of bacteria fermentation with the SADE-NN-1 methodology.

Initialization | DE base variant | Mutation type | Best: MSE training | Best: MSE testing | Best: Fitness | Best: Topology | Average: MSE training | Average: MSE testing | Average: Fitness
Normal | Rand/1/Exp | Normal | 0.029 | 0.027 | 34.035 | 4:10:1 | 0.033 | 0.031 | 29.323
Normal | Rand/1/Exp | Modified | 0.029 | 0.028 | 33.548 | 4:9:5:1 | 0.033 | 0.032 | 29.425
Opposition based | Rand/1/Exp | Normal | 0.02 | 0.022 | 49.474 | 4:7:1 | 0.023 | 0.025 | 41.569
Opposition based | Rand/1/Exp | Modified | 0.019 | 0.016 | 51.669 | 4:10:1 | 0.024 | 0.021 | 41.430

Fig. 2. Comparison between experimental data, phenomenological model and predictions of the model MLP (4:5:1) for the training data, in the case of bacteria fermentation.
activation functions, referred to with the following notations: Lin for the Linear function, Hardlim for the Hard Limit function, Bipolar for the Bipolar Sigmoid function, Logsig for the Logistic Sigmoid function, Tansig for the Tangent Sigmoid function, Sin for the Sinus function, Radbas for the Radial Basis function, and Tribas for the Triangular Basis function.

For the training data, the correlation between the ANN predictions and the experimental data had a value of 0.9878 and the root mean squared error (RMSE) was 0.0219, while for the testing data the correlation was 0.9563 and the RMSE was 0.0353. Using Eqs. (14) and (15), determined with a multi-regression method in Matlab for the training and the testing data sets, the mass transfer coefficient was also computed (phenomenological model). In this case, the correlation for the training set was −0.2329 and the RMSE was 0.2172, while for the testing data the correlation was −0.2214 and the RMSE was 0.2125.

The comparisons between the predictions of the neural network model MLP (4:5:1), the regression method, and the experimental data are shown in Figs. 2 and 3 for the training and testing data, respectively. As can be observed from the performance indexes and from the two figures, the neural network renders the training data better than the testing data. This fact is normal because the training set was used in the phase when the network learned the process characteristics, while the testing set was not previously 'seen' by the model. However, the small errors obtained in the testing phase indicate that the obtained network can accurately model the process considered as case study.

Figs. 2 and 3 (where the comparison is made point by point) and the error indexes previously given show that the neural model is closer to the experimental data than the phenomenological model obtained through regression. This proves that the proposed methodology is better than the other approaches used to predict the mass transfer coefficient.

After determining the best model of the process when the bacteria broth is used, a sensitivity analysis is performed in order to determine the influence of each parameter on the output of the neural model.
Fig. 3. Comparison between experimental data, phenomenological model and predictions of the model MLP (4:5:1) for the testing data, in the case of bacteria fermentation.
Table 5
Results of simulations obtained in the case of yeast fermentation with the SADE-NN-1 methodology.

Initialization | DE base variant | Mutation type | Best: MSE training | Best: MSE testing | Best: Fitness | Best: Topology | Average: MSE training | Average: MSE testing | Average: Fitness
Normal | Rand/1/Exp | Normal | 0.017 | 0.020 | 55.666 | 4:4:5:1 | 0.019 | 0.017 | 50.025
Normal | Rand/1/Exp | Modified | 0.016 | 0.016 | 60.914 | 4:10:1 | 0.018 | 0.017 | 53.729
Opposition based | Rand/1/Exp | Normal | 0.013 | 0.008 | 75.61 | 4:10:3:1 | 0.0147 | 0.013 | 66.563
Opposition based | Rand/1/Exp | Modified | 0.009 | 0.015 | 109.04 | 4:10:1 | 0.014 | 0.013 | 68.102
When the binomial crossover is used, the fitness is higher than the fitness corresponding to the exponential crossover. By comparing the normal initialization with the opposition based initialization, it can be observed that the former determined models with higher fitness values. Compared with the best fitness obtained in the case of bacteria (determined when the opposition based initialization, the 'Best/1/Bin' variant, and the classic mutation are used), the best fitness for the yeasts (using the normal initialization, the 'Best/1/Bin' variant, and the modified mutation) is higher. This means that the neural network modeling of the yeasts, with the neuron characteristics described in Table 6, gives more accurate results in the training and testing phases than the network that modeled the bacteria broth. The notations used in Table 6 are the same as the ones from Table 2.

Table 6
Neuron characteristics of the best neural network MLP (4:6:1).

Type of layer   Neuron   Bias     Activation function
Hidden layer    H1.1     −0.151   Sin
                H1.2      0.041   Lin
                H1.3     −0.106   LogSig
                H1.4     −0.134   Lin
                H1.5     −0.010   Lin
                H1.6      0.922   TanSig
Output layer    O.1       0.242   Sin

In Figs. 4 and 5, the differences between the experimental data, the phenomenological model results, and the predictions of the best neural network are shown for the training and testing data. Using the model MLP (4:6:1), for the training set RMSE = 0.00903 and the correlation is 0.9946, while for the testing set RMSE = 0.01411 and the correlation is 0.9842. For the phenomenological model, determined using the multi-regression methodology, RMSE = 0.2751 and correlation = 0.0505 for the training data, and RMSE = 0.2719 and correlation = 0.4284 for the testing data.

The small errors between the experimental data and the predictions obtained with the neural network show that the neural network model could follow the process with acceptable precision. In addition, the neural network is better than the phenomenological model, confirming that the proposed DE algorithm is better and more robust than the existing methodologies used for predicting the mass transfer coefficient.

A sensitivity analysis procedure is applied to the best neural network, MLP (4:6:1), in order to determine the influence of each input on the mass transfer coefficient. The sensitivity values are 5.514 for the first input, 5.134 for the second, 2.644 for the third, and 4.021 for the fourth. Consequently, the descending order of input influence on the output is 1, 2, 4, 3. Owing to the above discussion concerning the role of each considered parameter in the oxygen transfer process inside the broth, their order of importance is similar to the case of P. shermanii fermentation.

After that, with the best neural network model obtained, the process is optimized in order to determine the conditions for which the mass transfer coefficient is maximized when yeast broths are used. The same modifications and procedure are applied as in the previous optimization. A series of ten conditions that lead to the maximization of the considered parameter are listed in Table 7.

The differences between the fitness values, as in the previous case, are very small and around the global maximum. These results indicate that the proposed methodology is able to determine the various conditions that can lead to process optimization.

Due to the different behavior of yeasts in systems containing hydrocarbons, the results from Table 7 underline the negative influence of the biomass concentration on the oxygen transfer rate in this fermentation broth. Therefore, the maximum value of kLa is obtained at a low yeast amount and a high superficial air velocity, specific power input, and oxygen-vector concentration.
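Table 6 reports only the biases and the activation function of each neuron; the connection weights are not listed. A minimal sketch of how such a heterogeneous-activation MLP (4:6:1) is evaluated, with random placeholders standing in for the unreported weights:

import numpy as np

# Activation functions named in Table 6
ACTIVATIONS = {
    "Sin": np.sin,
    "Lin": lambda x: x,
    "LogSig": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "TanSig": np.tanh,
}

# Biases and activation types of the hidden and output neurons (Table 6);
# the connection weights are not reported, so random placeholders are used.
HIDDEN_BIAS = np.array([-0.151, 0.041, -0.106, -0.134, -0.010, 0.922])
HIDDEN_ACT = ["Sin", "Lin", "LogSig", "Lin", "Lin", "TanSig"]
OUTPUT_BIAS, OUTPUT_ACT = 0.242, "Sin"

def mlp_4_6_1(x, w_hidden, w_out):
    """Forward pass of the heterogeneous MLP (4:6:1): every hidden neuron
    applies its own activation function to its weighted input plus bias."""
    pre = w_hidden @ x + HIDDEN_BIAS                     # shape (6,)
    hidden = np.array([ACTIVATIONS[a](p) for a, p in zip(HIDDEN_ACT, pre)])
    return ACTIVATIONS[OUTPUT_ACT](w_out @ hidden + OUTPUT_BIAS)

rng = np.random.default_rng(0)
y = mlp_4_6_1(rng.random(4), rng.normal(size=(6, 4)), rng.normal(size=6))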
Fig. 4. Comparison between the experimental data, the phenomenological model, and the predictions obtained with MLP (4:6:1) for the training data, in the case of the yeast fermentation process.
Fig. 5. Comparison between the experimental data, the phenomenological model, and the predictions obtained with MLP (4:6:1) for the testing data, in the case of the yeast fermentation process.
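In the optimization step described above, the trained network itself provides the fitness: DE evolves candidate operating conditions and scores each one by the predicted mass transfer coefficient. A minimal sketch of this reuse, using a plain DE/rand/1/bin loop with fixed control parameters (the actual SADE-NN-1 run self-adapts them and uses the modified mutation; de_maximize and predict are illustrative names):

import numpy as np

def de_maximize(predict, bounds, pop_size=30, F=0.8, CR=0.9,
                generations=200, rng=None):
    """Sketch of a DE/rand/1/bin loop that maximizes the output of a trained
    neural model `predict` over the (scaled) input domain; the network's
    prediction serves directly as the fitness of each candidate."""
    rng = rng or np.random.default_rng()
    low, high = (np.asarray(b, dtype=float) for b in zip(*bounds))
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    fit = np.array([predict(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), low, high)
            cross = rng.random(low.size) < CR
            cross[rng.integers(low.size)] = True   # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            trial_fit = predict(trial)
            if trial_fit >= fit[i]:                # greedy, maximizing selection
                pop[i], fit[i] = trial, trial_fit
    best = int(np.argmax(fit))
    return pop[best], fit[best]

# Usage with a trained model: x_opt, kla_max = de_maximize(net.predict, [(0, 1)] * 4)

Repeating such a run from different seeds yields a set of near-equivalent optima, which is one way to obtain a series of optimal conditions such as those listed in Table 7.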
We cannot conclude which elements of the methodology are better, due to the different performance obtained in different situations; however, according to the process characteristics, we can say that the improvements added to the algorithm resulted in a generally higher efficiency of the methodology.

The sensitivity analysis applied to the best neural models ranks the inputs according to their influence on the output. The results are consistent with the experimental practice, showing the following descending order: biomass concentration, superficial air speed, specific power, and volumetric fraction of oxygen vector.

The DE optimization methodology was also applied for the determination of the optimal working conditions that lead to a maximum value of the oxygen mass transfer coefficient. The neural networks developed with SADE-NN-1 were the models included in the optimization procedures. Significant information, useful for experimental practice, was obtained.

The developed methodology proved to be flexible and efficient for the modeling and optimization of the oxygen mass transfer in stirred bioreactors. Because the characteristics of the problem being solved influence the performance of the methodology, further studies and analyses must be performed in order to determine the situations in which each modification gives better results.

Acknowledgments

This work was realized with the financial support of the EURODOC “Doctoral Scholarships for research performance at European level” project, financed by the European Social Fund and the Romanian Government, and of the “Partnership in priority areas – PN-II” program, supported by ANCS, CNDI – UEFISCDI, project PN-II-PT-PCCA-2011-3.2-0732, no. 23/2012.

References

[1] R. Storn, K. Price, Differential Evolution – A Simple and Efficient Adaptive Scheme for Global Optimization Over Continuous Spaces, Technical Report TR-95-012, Berkeley, 1995.
[2] K.V. Price, R.M. Storn, J.A. Lampinen, Differential Evolution. A Practical Approach to Global Optimization, Springer, Berlin, 2005.
[3] V. Feoktistov, Differential Evolution: In Search of Solutions, Springer, Berlin, 2006.
[4] S. Olafsson, Chapter 21: Metaheuristics, in: S.G. Henderson, B.L. Nelson (Eds.), Handbooks in Operations Research and Management Science: Simulation, vol. 13, North-Holland, Amsterdam, 2006, pp. 633–654.
[5] B. Subudhi, D. Jena, An improved differential evolution trained neural network scheme for nonlinear system identification, International Journal of Automation and Computing 6 (2009) 137–144.
[6] R. Angira, B.V. Babu, Performance of modified differential evolution for optimal design of complex and non-linear chemical processes, Journal of Experimental & Theoretical Artificial Intelligence 18 (2006) 501–512.
[7] U. Yüzgeç, Performance comparison of differential evolution techniques on optimization of feeding profile for an industrial scale baker's yeast fermentation process, ISA Transactions 49 (2010) 167–176.
[8] M.D. Kapadi, R.D. Gudi, Optimal control of fed-batch fermentation involving multiple feeds using Differential Evolution, Process Biochemistry 39 (2004) 1709–1721.
[9] B.V. Babu, P.G. Chakole, J.H. Syed Mubeen, Multiobjective differential evolution (MODE) for optimization of adiabatic styrene reactor, Chemical Engineering Science 60 (2005) 4822–4837.
[10] A.M. Gujarathi, B.V. Babu, Improved multiobjective differential evolution (MODE) approach for purified terephthalic acid (PTA) oxidation process, Materials and Manufacturing Processes 24 (2009) 303–319.
[11] B. Subudhi, D. Jena, Differential evolution Levenberg-Marquardt trained neural network scheme for nonlinear system identification, Neural Processing Letters 27 (2008) 285–296.
[12] B. Subudhi, D. Jena, A combined differential evolution and neural network approach to nonlinear system identification, in: TENCON 2008 – IEEE Region 10 Conference, IEEE, 2008, pp. 1–6.
[13] B. Subudhi, D. Jena, A differential evolution based neural network approach to nonlinear system identification, Applied Soft Computing 11 (2011) 861–871.
[14] C.W. Chen, D.Z. Chen, G.Z. Cao, An improved differential evolution algorithm in training and encoding prior knowledge into feedforward networks with application in chemistry, Chemometrics and Intelligent Laboratory Systems 64 (2002) 27–43.
[15] S. Curteanu, F. Leon, R. Furtuna, E.-N. Dragoi, N. Curteanu, Comparison between different methods for developing neural network topology applied to a complex polymerization process, in: The 2010 International Joint Conference on Neural Networks IJCNN, IEEE, 2010, pp. 1–8.
[16] D.M. Himmelblau, Accounts of experiences in the application of artificial neural networks in chemical engineering, Industrial & Engineering Chemistry Research 47 (2008) 5782–5796.
[17] D. Cascaval, A.I. Galaction, E. Folescu, M. Turnea, Comparative study on the effects of n-dodecane addition on oxygen transfer in stirred bioreactors for simulated bacterial and yeasts broths, Biochemical Engineering Journal 31 (2006) 51–56.
[18] J.S. Alford, Bioprocess control: advances and challenges, Computers & Chemical Engineering 30 (2006) 1464–1475.
[19] Y. Liu, W.L. Chen, Z.L. Gao, H.Q. Wang, P. Li, Adaptive control of nonlinear time-varying processes using selective recursive kernel learning method, Industrial & Engineering Chemistry Research 50 (2011) 2773–2780.
[20] S. Curteanu, H. Cartwright, Neural networks applied in chemistry. I. Determination of the optimal topology of multilayer perceptron neural networks, Journal of Chemometrics 25 (2011) 527–549.
[21] R. Furtuna, S. Curteanu, F. Leon, Multi-objective optimization of a stacked neural network using an evolutionary hyper-heuristic, Applied Soft Computing 12 (2012) 133–144.
[22] E.-N. Dragoi, S. Curteanu, C. Mihailescu, Modeling methodology based on Differential Evolution algorithm applied to a series of hydrogels, in: Proceedings of ECIT 2010 – 6th European Conference on Intelligent Systems and Technologies, 2010.
[23] E.-N. Dragoi, S. Curteanu, F. Leon, A.I. Galaction, D. Cascaval, Modeling of oxygen mass transfer in the presence of oxygen-vectors using neural networks developed by differential evolution algorithm, Engineering Applications of Artificial Intelligence 24 (2011) 1214–1226.
[24] J. Brest, B. Boskovic, S. Greiner, V. Zumer, M. Maucec, Performance comparison of self-adaptive and adaptive differential evolution algorithms, Soft Computing 11 (2007) 617–629.
[25] H.Y. Fan, J. Lampinen, A directed mutation operation for the differential evolution algorithm, Journal of Industrial Engineering – Theory, Applications and Practice 1 (2003) 6–15.
[26] M.M. Ali, Differential evolution with preferential crossover, European Journal of Operational Research 181 (2007) 1137–1147.
[27] D. Zaharie, Critical values for the control parameters of differential evolution algorithms, in: P. Ošmera, R. Matoušek (Eds.), Proceedings of MENDEL 2002, 8th International Conference on Soft Computing, 2002, pp. 62–67.
[28] D. Zaharie, Influence of crossover on the behavior of Differential Evolution algorithms, Applied Soft Computing 9 (2009) 1126–1138.
[29] J. Tvrdik, Adaptive differential evolution and exponential crossover, in: International Multiconference on Computer Science and Information Technology, IEEE, 2008, pp. 927–931.
[30] A. Qing, Differential Evolution: Fundamentals and Applications in Electrical Engineering, John Wiley & Sons, Singapore, 2009.
[31] D. Davendra, G. Onwubolu, Forward backward transformation, in: G. Onwubolu, D. Davendra (Eds.), Differential Evolution: A Handbook for Global Permutation-Based Combinatorial Optimization, Springer, Berlin, 2009, pp. 35–80.
[32] R. Storn, Differential evolution research – trends and open questions, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 1–31.
[33] J. Lampinen, Solving problems subject to multiple nonlinear constraints by the differential evolution, in: R. Matousek, P. Osmera (Eds.), Proceedings of MENDEL'01 – 7th International Conference on Soft Computing, 2001, pp. 50–57.
[34] E. Mezura-Montes, C.A. Coello Coello, E.I. Tun-Morales, Simple feasibility rules and differential evolution for constrained optimization, in: R. Monroy, G. Arroyo-Figueroa, L.E. Sucar, H. Sossa (Eds.), Proceedings of the Third Mexican International Conference on Artificial Intelligence, Springer, New York, 2004, pp. 707–716.
[35] K. Zielinski, R. Laur, Stopping criteria for differential evolution in constrained single-objective optimization, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 111–138.
[36] J. Brest, S. Greiner, B. Boskovic, M. Mernik, V. Zumer, Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems, IEEE Transactions on Evolutionary Computation 10 (2006) 646–657.
[37] M.E. Abdual-Salam, H.M. Abdul-Kader, W.F. Abdel-Wahed, Comparative study between Differential Evolution and Particle Swarm Optimization algorithms in training of feed-forward neural network for stock price prediction, in: The 7th International Conference on Informatics and Systems (INFOS), IEEE, 2010, pp. 1–8.
[38] J. Brest, Constrained real-parameter optimization with ε-self-adaptive differential evolution, in: E. Mezura-Montes (Ed.), Constraint-Handling in Evolutionary Optimization, Springer, Berlin, 2009, pp. 73–93.
[39] R. Thangaraj, M. Pant, A. Abraham, A simple adaptive Differential Evolution algorithm, in: World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE, 2009, pp. 457–462.
[40] F. Neri, V. Tirronen, Recent advances in differential evolution: a survey and experimental analysis, Artificial Intelligence Review 33 (2010) 61–106.
[41] E.S. Nicoara, Mechanisms to avoid premature convergence of genetic algorithms, Buletinul Universitatii Petrol-Gaze din Ploiesti 61 (2009) 87–96.
[42] D. Zaharie, A comparative analysis of crossover variants in differential evolution, in: M. Ganzha, M. Paprzycki, T. Pelech-Pilichowski (Eds.), Proceedings of the International Multiconference on Computer Science and Information Technology IMCSIT, 2007, pp. 171–181.
[43] D. Zaharie, Statistical properties of differential evolution and related random search algorithms, in: P. Brito (Ed.), Proceedings of the International Conference on Computational Statistics, Physica-Verlag, Heidelberg, 2008, pp. 473–485.
[44] C.W. Chiang, W.P. Lee, J.S. Heh, A 2-Opt based differential evolution for global optimization, Applied Soft Computing 10 (2010) 1200–1207.
[45] B. Qian, L. Wang, R. Hu, W.L. Wang, D.X. Huang, X. Wang, A hybrid differential evolution method for permutation flow-shop scheduling, International Journal of Advanced Manufacturing Technology 38 (2008) 757–777.
[46] S. Rahnamayan, H. Tizhoosh, M. Salama, Opposition-based differential evolution algorithms, in: IEEE Congress on Evolutionary Computation CEC 2006, IEEE, 2006, pp. 2010–2017.
[47] V. Ramesh, T. Jayabarathi, S. Asthana, S. Mital, S. Basu, Combined hybrid differential particle swarm optimization approach for economic dispatch problems, Electric Power Components and Systems 38 (2010) 545–557.
[48] R. Storn, On the usage of differential evolution for function optimization, in: M.H. Smith, M.A. Lee, J. Keller, J. Yen (Eds.), 1996 Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS), IEEE, 1996, pp. 519–523.
[49] H.Y. Fan, J. Lampinen, A trigonometric mutation operation to differential evolution, Journal of Global Optimization 27 (2003) 105–129.
[50] J. Ilonen, J.K. Kamarainen, J. Lampinen, Differential evolution training algorithm for feed-forward neural networks, Neural Processing Letters 17 (2003) 93–105.
[51] Q.K. Pan, P.N. Suganthan, L. Wang, L. Gao, R. Mallipeddi, A differential evolution algorithm with self-adapting strategy and control parameters, Computers & Operations Research 38 (2011) 394–408.
[52] C. Hu, X. Yan, An immune self-adaptive differential evolution algorithm with application to estimate kinetic parameters for homogeneous mercury oxidation, Chinese Journal of Chemical Engineering 17 (2009) 232–240.
[53] X. Zhang, W. Chen, C. Dai, W. Cai, Dynamic multi-group self-adaptive differential evolution algorithm for reactive power optimization, International Journal of Electrical Power 32 (2010) 351–357.
[54] H.A. Abbass, The self-adaptive Pareto differential evolution algorithm, in: Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02), IEEE, 2002, pp. 831–836.
[55] H.A. Abbass, A memetic Pareto evolutionary approach to artificial neural networks, in: M. Stumptner, D. Corbett, M.J. Brooks (Eds.), AI 2001: Advances in Artificial Intelligence, 14th Australian Joint Conference on Artificial Intelligence, Springer, London, 2001, pp. 1–6.
[56] X. Yao, Evolving artificial neural networks, Proceedings of the IEEE 87 (1999) 1423–1447.
[57] D. Floreano, P. Durr, C. Mattiussi, Neuroevolution: from architectures to learning, Evolutionary Intelligence 1 (2008) 47–62.
[58] J.B. Mouret, S. Doncieux, MENNAG: a modular, regular and hierarchical encoding for neural networks based on attribute grammars, Evolutionary Intelligence 1 (2008) 187–207.
[59] V. Plagianakos, D. Tasoulis, M. Vrahatis, A review of major application areas of differential evolution, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 197–238.
[60] A.M.F. Zarth, T.B. Ludermir, Optimization of neural networks weights and architecture: a multimodal methodology, in: Ninth International Conference on Intelligent Systems Design and Applications, 2009, pp. 209–214.
[61] M. Cruz-Ramirez, J. Sanchez-Monedero, F. Fernandez-Navarro, J. Fernandez, C. Hervas-Martinez, Memetic Pareto differential evolutionary artificial neural networks to determine growth multi-classes in predictive microbiology, Evolutionary Intelligence 3 (2010) 187–199.
[62] S. Rahnamayan, H. Tizhoosh, M. Salama, Opposition-based differential evolution (ODE) with variable jumping rate, in: IEEE Symposium on Foundations of Computational Intelligence, IEEE, 2007, pp. 81–88.
[63] S. Rahnamayan, H. Tizhoosh, M. Salama, Opposition-based differential evolution, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 155–171.
[64] H.R. Tizhoosh, Opposition-based learning: a new scheme for machine intelligence, in: International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, 2005, pp. 695–701.
[65] Q. Zhang, S. Sun, Weighted data normalization based on eigenvalues for artificial neural network classification, in: C. Leung, M. Lee, J. Chan (Eds.), Neural Information Processing, Springer, Berlin, 2009, pp. 349–356.
[66] L.M. Liebrock, Empirical sensitivity analysis for computational procedures, in: P.J. Williams, M.A. Friedman (Eds.), Proceedings of the Richard Tapia Celebration of Diversity in Computing Conference 2005, ACM, 2005, pp. 32–35.
[67] R. Tsaih, Sensitivity analysis, neural networks, and the finance, in: International Joint Conference on Neural Networks IJCNN'99, vol. 6, IEEE, 1999, pp. 3830–3835.
[68] A. Hunter, L. Kennedy, J. Henry, I. Ferguson, Application of neural networks and sensitivity analysis to improved prediction of trauma survival, Computer Methods and Programs in Biomedicine 62 (2000) 11–19.
[69] M. Gevrey, I. Dimopoulos, S. Lek, Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecological Modelling 160 (2003) 249–264.
[70] P.M. Szecowka, M.A. Mazurowski, A. Szczurek, B.W. Licznerski, On reliability of neural network sensitivity analysis applied for sensor array optimization, Sensors and Actuators B: Chemical 157 (2011) 298–303.
[71] M.A. Mazurowski, P.M. Szecowka, Limitations of sensitivity analysis for neural networks in cases with dependent inputs, in: IEEE International Conference on Computational Cybernetics 2006, IEEE, 2006, pp. 1–5.
[72] A.I. Galaction, D. Cascaval, M. Turnea, E. Folescu, Enhancement of oxygen mass transfer in stirred bioreactors using oxygen-vectors. 2. Propionibacterium shermanii broths, Bioprocess and Biosystems Engineering 27 (2005) 263–271.
[73] A.I. Galaction, D. Cascaval, C. Oniscu, M. Turnea, Predictions of oxygen mass transfer coefficients in stirred bioreactors for bacteria, yeasts and fungus broths, Biochemical Engineering Journal 20 (2004) 85–94.