Optimization methodology based on neural networks and self-adaptive differential evolution algorithm applied to an aerobic fermentation process
Article info

Article history:
Received 13 November 2011
Received in revised form 22 May 2012
Accepted 1 August 2012
Available online 15 August 2012

Keywords:
Differential evolution
Neural network topology
Optimization
Sensitivity analysis
Fermentation process

Abstract

The determination of the optimal neural network topology is an important aspect when using neural models. Due to the lack of consistent rules, this is a difficult problem, which is solved in this paper using an evolutionary algorithm, namely Differential Evolution. An improved, simple, and flexible self-adaptive variant of the Differential Evolution algorithm is proposed and tested. The algorithm includes two initialization strategies (normal distribution, and normal distribution combined with the opposition-based principle) and a modified mutation principle. Because the methodology contains new elements, a specific name has been assigned to it: SADE-NN-1. In order to determine the most influential inputs of the models, a sensitivity analysis was applied. The case study considered in this work refers to the oxygen mass transfer coefficient in stirred bioreactors in the presence of n-dodecane as oxygen vector. The oxygen transfer in fermentation broths has a significant influence on the growth of the cultivated microorganisms, so the accurate modeling of this process is an important problem that has to be solved in order to optimize the aerobic fermentation process.

The neural networks predicted the mass transfer coefficients with high accuracy, which indicates that the proposed methodology performed well. The same methodology, with a few modifications, and with the best neural network models, was used for determining the optimal conditions for which the mass transfer coefficient is maximized.

A short review of the differential evolution methodology is given in the first part of this article, presenting the main characteristics and variants, with their advantages and disadvantages, and placing the proposed modifications within the existing directions of research.

© 2012 Elsevier B.V. All rights reserved.
each hidden layer needed to be previously set in order to determine the maximum length of each individual in the DE methodology. For the process optimization this is not necessary, because the neural model was already optimized using the data describing the process. In addition, no dataset is used, the maximum length of the individuals is very easy to compute, and the fitness function was set as the output of the neural model.

A short review is included in this paper to present the DE algorithm as a promising alternative tool for process optimization or for neural network model optimization. The review emphasizes some characteristics of DE, its advantages and disadvantages, its evolution, variants, and applications. The improved variant developed and applied in our work is presented in comparison with the existing variants, so that it is better framed in the actual state of the field. The motivation for adding this review to the current work is the efficiency and effectiveness of this algorithm for prediction and classification problems approached in the chemical engineering field, for both model (neural network) and process optimization. The case studies of this article are examples which sustain the above consideration, along with other works belonging to our group [20,22,23].
The paper is organized as follows: Section 2 presents the general principles of DE and a series of theoretical aspects regarding self-adaptation. In Section 3, the mechanism of combining the DE algorithm with neural networks is detailed. Section 4 contains a description of the proposed methodology, along with its motivation. In Section 5, the sensitivity analysis procedure and its importance in the modeling methodology are tackled. General information and the database describing the process are presented in Section 6. The results obtained from the simulations are detailed in Section 7. The last section concludes the paper.
2. Differential evolution algorithm

Since it was proposed by Storn and Price in 1995 [1], the DE algorithm has undergone a series of transformations, the literature presenting a multitude of variants, from simple modifications to hybrid versions. Although the list of modifications is large, the base principles are the same. The algorithm starts with a pool of potential solutions X = {x1, x2, ..., xNp}, where Np is the population size. As in every evolutionary algorithm, the steps of mutation, crossover, and recombination are performed until a stop criterion is reached. Finally, the solution is chosen to be the individual with the best fitness function.

2.1. Steps and general principles of DE

The general problem of an optimization algorithm is to find xi so that f(xi) is optimized, with xi = {xi,1, xi,2, ..., xi,D}, where xi,k is the kth characteristic of xi and D is the dimensionality of the function [24]. In the DE algorithm, f(xi) is considered to be the objective function and xi is one of the individuals of the population.

The optimization performed with the DE algorithm consists in locating the minimum of the objective function by determining x* for which:

\forall x_i \in S : f(x_i) \geq f(x^*) = f^*    (1)

where f is the objective function, f* is called the global minimum, and x* is the optimized parameter vector, called the minimum location set. Knowing that max{f(S)} = −min{−f(S)}, the restriction to minimization is without loss of generality, because the fitness function can be modified in order to accommodate the minimum [3].

The sections below describe the steps of the algorithm (initialization, mutation, crossover, selection), emphasizing particular characteristics of DE in the context of evolutionary algorithms (EAs).

2.1.1. Initialization

The initialization represents the first step of the algorithm. It is an important aspect because it can influence the outcome, improper initialization leading to high errors. For example, if all the individuals in the population are initialized as replicas of a single vector, uniform crossover and differential mutation will only generate more replicas [2]. As a result, the population does not evolve and the fitness of the best individuals remains constant.

The initialization of the population with constrained characteristics cannot be performed until all the boundaries are known. For most real-world problems, the existence of natural physical limits or logical constraints imposes different values for each parameter, and their initialization is a straightforward process. For parameters with no obvious limits, Price et al. [2] consider that the bounds must be set so as to encompass the optimum. Because the optimum values are not always known, for the unconstrained parameters the boundaries must be ignored after initialization, so that DE can explore the search space beyond the bounding box.

2.1.2. Mutation

In the context of EAs, mutation is seen as a change with a random element; in the DE case, however, a new individual is created by adding a scaled differential term to a base vector (individual):

\omega_i = \alpha + F \cdot \beta    (2)

where α is the base vector, β is the differential term, and F is the scaling factor. The differential term is determined by the difference of two distinct, randomly chosen vectors: β = x_k − x_p. The base vector is also randomly chosen and, in order to achieve good convergence speed and probability, Price et al. [2] indicate that all vectors used in the mutation step must be distinct. This enables the formation of a geometric triangle in the search space where the three vectors exist as vertices [25].

The role of the F parameter is to control the rate at which the population evolves. The predefined interval in which it can take values is (0,1) and, although no upper limit has been established for this parameter, values greater than 1 are seldom effective and require more computation time [2]. When F > 1 the differential term is scaled up, and when F < 1 it is scaled down. Scaling down can prevent some points from falling outside the optimum boundaries, while scaling up tends to increase the number of function evaluations [26].

In order to avoid premature convergence, F must be high enough to counteract the selection pressure [2]; moreover, Zaharie [27] demonstrated that when its values fall below a limit Flow, the population can converge even in the absence of selection pressure.
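To make Eq. (2) concrete, the sketch below implements the 'Rand/1' mutation step for a whole population. NumPy, the population shape, and the index-sampling helper are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def de_mutation_rand1(population: np.ndarray, F: float) -> np.ndarray:
    """'Rand/1' differential mutation, Eq. (2): omega_i = alpha + F * (x_k - x_p),
    with the base vector alpha and the two difference vectors all distinct."""
    Np, _ = population.shape
    mutants = np.empty_like(population)
    for i in range(Np):
        # three mutually distinct indices, also distinct from the target i
        r1, r2, r3 = np.random.choice(
            [j for j in range(Np) if j != i], size=3, replace=False)
        mutants[i] = population[r1] + F * (population[r2] - population[r3])
    return mutants
```

Here population[r1] plays the role of α and population[r2] − population[r3] the role of the differential term β from Eq. (2).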
2.1.3. Crossover

In this step, new individuals are created based on the current population and on the mutation vectors obtained in the previous step. The population created is called the trial population. While in the case of EAs the role of crossover is to combine features from different parents, in the DE algorithm the crossover allows the construction of offspring by mixing characteristics [28]. The level of construction performed by crossover cannot be achieved by mutation, because mutation creates diversity perfectly, in a random manner, but it cannot execute the construction function well [3].

Generally, two variants of crossover are used in the DE algorithm: binomial and exponential [2]. The binomial crossover is described by Eq. (3). The exponential crossover uses an initial start point (sp): all the characteristics up to the sp point are copied into the new vector from the mutation vector; then a random value between 0 and 1 is generated and, until it becomes bigger than Cr (the crossover rate), the characteristics from the current individual are copied to the trial vector. After that, the remaining characteristics are taken from the mutation vector.

u_{i,j} = \begin{cases} \omega_{i,j}, & \text{if } rand(0,1) < Cr \\ x_{i,j}, & \text{otherwise} \end{cases}    (3)

The difference between the two types of crossover consists in the position of the characteristics inherited from distinct individuals: if in the binomial type the components inherited from the mutant vector are arbitrarily selected, in the exponential type they are grouped into one or two compact sequences [28]. The use of exponential crossover over the binomial type increases the efficiency only for a small part of the existing problems [29], Qing [30] considering that crossover is not so important.

The control parameter Cr provides the means to exploit decomposability and to provide extra diversity [2]. The interval in which Cr can take values is (0, 1.0]. The optimal Cr depends on both the problem and the crossover type used. In the opinion of Davendra and Onwubolu [31], when using the binomial scheme, the best results are obtained with intermediate values; they also indicate the interval [0.8, 1.0] as containing the optimal values of the Cr parameter. Although these guidelines can be useful in some cases, choosing suitable values remains difficult for DE. The classical method for tuning the parameter values is trial and error but, because it is prone to errors and requires high computation time, in most cases it is not suitable.
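The two crossover forms can be sketched as follows. The binomial form follows Eq. (3); the exponential form is written here in its standard formulation (a compact run of components copied from the mutant, starting at a random point, with geometrically distributed length), which is one common reading of the description above. The names and the forced-copy safeguard jrand are illustrative assumptions.

```python
import numpy as np

def binomial_crossover(target, mutant, Cr):
    """Binomial crossover, Eq. (3): each component is inherited from the mutant
    with probability Cr; one forced position guarantees trial != target."""
    D = target.size
    mask = np.random.rand(D) < Cr
    mask[np.random.randint(D)] = True           # jrand: at least one mutant gene
    return np.where(mask, mutant, target)

def exponential_crossover(target, mutant, Cr):
    """Exponential crossover: starting at a random point sp, consecutive
    components are copied from the mutant while rand(0,1) < Cr."""
    D = target.size
    trial = target.copy()
    j = np.random.randint(D)                    # start point sp
    copied = 0
    while copied < D:
        trial[j] = mutant[j]                    # compact sequence from the mutant
        j = (j + 1) % D
        copied += 1
        if np.random.rand() >= Cr:
            break
    return trial
```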
2.1.4. Selection

In this step, the individuals from the current and trial populations compete with each other in order to survive to the next generation. This type of selection is called one-to-one survivor selection [32], the best-so-far solution always being retained because the population's current best individual is replaced only when a better one is found [2]. The comparison between the individuals from the two populations determines, in the DE algorithm, a tighter integration of recombination and selection than in other EAs [2].

For constrained problems, a different approach regarding the selection operator is proposed in [33,34]. The necessity of using penalty functions is eliminated, the choice of individual resulting from three rules [3]: (a) when both solutions are feasible, the individual with the lower objective function wins; (b) when there is one infeasible and one feasible individual, the feasible individual is chosen; (c) when both individuals are infeasible, the less infeasible one is preferred.
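A sketch of the one-to-one survivor selection for an unconstrained problem; minimization of the objective f is assumed, consistent with Eq. (1), and the array-based layout is an assumption of this sketch.

```python
import numpy as np

def one_to_one_selection(population, trials, f):
    """One-to-one survivor selection: each trial vector competes only with its
    own target, so the best-so-far solution can never be lost."""
    survivors = population.copy()
    for i in range(population.shape[0]):
        if f(trials[i]) <= f(population[i]):    # minimization; ties favor the trial
            survivors[i] = trials[i]
    return survivors
```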
2.2. Stop criteria

The mutation, crossover, and selection steps are repeated until a stop criterion is reached. Depending on the type of problem, various stop criteria can be applied. The most used criterion, and the one proposed in the classical DE algorithm, is the number of current generations reaching a predefined maximum number of generations (G). Due to the randomness factor involved in evolutionary algorithms, the disadvantage of this criterion consists in the trial-and-error methods applied to find a suitable G [35].

When the problem is constrained, the algorithm stops when all constraints are satisfied whereas, in the case of multi-objective optimization, because of the nature of conflicting objectives, it is not always clear when the search must stop [2]. Price et al. [2] enumerate a series of methods that can be used as stop criteria:

(a) met objectives—because there are problems for which the objective function's minimum is known, it is easy to determine when that objective is met and, consequently, to call off the optimization;
(b) population statistics—different information about the population can be used to create a stop criterion. For example, an optimization can be halted when the difference between the best and the worst objective function falls below a specified limit;
(c) limited time—there are cases when the computation time is a very important aspect and the optimization must stop, regardless of whether it found the optimal solution or not;
(d) human monitoring—in the case of time-consuming tasks, a human can monitor the progress and, in response to the perceived opportunities, the optimization can be altered;
(e) application specific—there are applications that have their own termination criteria, and it is suitable to use them over some general principles.

In our work, the stop condition is represented by a combined criterion. The algorithm stops when the current generation or the fitness function reaches the pre-established corresponding maximum or minimum values.
2.3. DE variants

DE is a powerful, effective, and robust algorithm. This statement is supported by the multitude of works [2,15,36,37] in which comparisons between DE and other algorithms indicated that, for various types of problems (such as unconstrained, multi-constraint nonlinear, or multi-objective), DE is better not only in terms of solution performance but also in terms of speed. Due to these properties, the DE algorithm is the first approach tested by various researchers when the problem is known to be difficult to solve. For example, in [15], a classic, simple, and unimproved DE-based methodology was compared with a GA method, the two being applied for the simultaneous topological and structural optimization of neural networks. The general performances of the two algorithms were similar, with better results obtained by DE for a series of modeled parameters.

By combining different types of mutation, crossover, recombination, and stop criteria, or by introducing new methods into the inner workings of the DE algorithm, new variants (also called strategies) are created. The main objective of this action is to improve the algorithm and to make it more powerful and more robust [38], because there are situations in which DE does not perform as well as expected [39]. The main problems that researchers encounter are:

(a) stagnation—the situation in which a population-based algorithm does not converge even to a suboptimal solution, while the population diversity is still high [40]. For DE, stagnation is the state where the population does not improve over a period of generations, the algorithm not being able to determine a new search space in order to find the optimal solution [31]. Various factors, such as the control parameters and the problem dimensionality, influence stagnation [3,31,40];
(b) premature convergence—the case in which the population diversity is lost. In all EAs, premature convergence arises when the characteristics of some highly rated individuals dominate the population, determining it to converge to a local optimum where the operators cannot produce any descendants better than their parents [41]. There is a close relation between premature convergence, loss of diversity, and population variance [42]. Preserving the population diversity helps to avoid premature convergence and stimulates the ability of the algorithm to follow the optimum [43];
(c) deterioration of performance—the performance deterioration of the algorithm is determined by the increase of the objective function's dimensionality [39];
(d) high computational time—a result of a high number of generations (G) combined with a high number of individuals (Np), because at least G·Np function evaluations are necessary. For most real-world problems, the evaluation of a candidate solution is not difficult, but it is time-consuming [25]. The long computational time also appears due to the stochastic nature of the DE algorithm [44]. A solution to this problem is to limit the algorithm to operate within an acceptable time interval and thus obtain an improved solution, although it may not be the global optimum [25];
(e) sensitivity/insensitivity to control parameters—the strategies determined so far by researchers are more or less sensitive to the control parameters. Empirical studies indicate that the more sensitive a strategy is, the better the solutions it can achieve [3].

In order to overcome these problems, new strategies were created. The main directions to improve DE are [38]:

(a) replacing the hand-tuning of control parameters with adaptive or self-adaptive mechanisms—hand-tuning of the control parameters can be time consuming due to the different influences of the data used to solve a specific problem and due to the adopted strategy. The introduction of adaptive or self-adaptive mechanisms resolves this aspect by inserting the parameters into the algorithm itself;
(b) introducing more mutation strategies during the optimization process—new, different mutation strategies can be created in order to overcome distinct problems such as the rotational invariance problem;
(c) hybridizing DE by combining it with other optimization techniques—hybridization combines different features of different methods in a complementary way, leading to more robust and effective optimization tools [45]. DE can be used with various methods such as opposition-based learning (OBL) [46], the energetic selection principle (ESP) [3], or particle swarm optimization (PSO) [47].

Storn [48] presents 10 variants of the algorithm. In order to indicate each variant, a coding 'Mode/DiffTerm/Cross' is used, where 'Mode' represents the mode in which the base vector of the mutation step is chosen, 'DiffTerm' represents the number of differential terms used for mutation, and 'Cross' is the type of crossover. The first two terms ('Mode' and 'DiffTerm') refer to the characteristics of the mutation step and the last one, 'Cross', is related to the crossover step. The base vector for the mutation step can be chosen randomly (Rand), as the best individual in the population (Best), or as a vector that lies on a line between the target and the best individual (Rand-to-Best). Regarding the number of differential terms used in the mutation phase, the information coded with 'DiffTerm', one (Eq. (2)) or two terms can be used, denoted with '1' and '2'. As explained earlier, two types of crossover can be applied in DE: binomial (Bin) and exponential (Exp).

For example, the first variant of the DE algorithm, which uses the random method of vector selection along with one differential term and binomial crossover, is called 'Rand/1/Bin'. The other versions are: 'Rand/2/Bin', 'Rand/1/Exp', 'Rand/2/Exp', 'Best/1/Bin', 'Best/2/Bin', 'Best/1/Exp', 'Best/2/Exp', 'Rand-to-Best/1/Bin', and 'Rand-to-Best/1/Exp'.

A few years later, Fan and Lampinen proposed two more strategies: trigonometric and directed. The trigonometric version (TDE) [49] introduces a new form of mutation into the original 'Rand/1/Bin'; the main difference between the older versions and TDE consists in the use of the objective function value in the mutation operation. The directed version supposes a modification that embeds an additional operation, directed mutation, in order to increase the convergence velocity [25].

Along with these variants, numerous strategies can be found in the open literature, their notations varying from researcher to researcher. Within this multitude of variants, the notation 'Mode/DiffTerm/Cross' proves to be insufficient, but a more appropriate notation has not been defined yet [32].

2.4. Parameter tuning and self-adaptation in the DE algorithm

The control parameters in DE are Cr, F, and Np, and they have an important role because they influence the effectiveness, efficiency, and robustness of the algorithm [38]. Each parameter has its own influence, not only on the algorithm itself, but also on the other control parameters. Because the determination of the optimal parameters is problem dependent, there is no single acceptable set of values that can be used for solving all problems. However, there are some generally accepted limits for each parameter and some guiding rules that can be applied in order to determine their optimal values.

Cr and F affect the convergence speed and the robustness of the search [50]. Cr controls the number of characteristics inherited from the mutant vector and can thus be interpreted as a mutation probability, while F influences the size of the perturbation and has a significant role in ensuring the population diversity. When strong convergence is used (Cr = 0.1), the contour matching property of the DE algorithm is lost and the search is performed along the main parameter axes, which is beneficial for separable objective functions [32]. Cr = 0.9 is recommended for near uni-modal problems or when fast convergence is desired [51]. Zaharie demonstrated that a lower limit of F exists (F_low), determined using Eq. (4), which depends on Cr and Np:

2F_{low}^2 - \frac{2}{Np} + \frac{Cr}{Np} = 0    (4)

If F < F_low, the population can converge even in the absence of selection pressure [2,27]. When Np is too small, stagnation appears, and when it is too big, the number of function evaluations rises, retarding the convergence. A correlation exists between Np and F, it being intuitively clear that a large Np requires a small F, because there is no need for large amplitudes when the population size is big [3].
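Rearranging Eq. (4) gives the critical value explicitly; the numerical values below are an illustrative choice of Np and Cr, not settings from this paper:

F_{low} = \sqrt{\frac{2 - Cr}{2\,Np}}, \qquad \text{e.g. } Np = 50,\ Cr = 0.9 \;\Rightarrow\; F_{low} = \sqrt{\frac{1.1}{100}} \approx 0.105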
The various rules imposed on the control parameters do not manage to determine optimal values, so different methods have been proposed for solving this problem. These methods can be classified into: (a) deterministic control—the parameters are found using a deterministic law, without any feedback information [3]; (b) adaptive control—the direction and/or magnitude of the parameter change is determined using feedback information [24]; (c) self-adaptive control—the parameters are encoded into the algorithm itself [3].

In self-adaptation, the concept of co-evolution, which is an effective approach to decompose complex structures and to achieve better performance, can be used to adapt the control parameters [52]. By reconfiguring itself, the evolutionary strategy is adapted to any general class of problems [24] and, in this manner, the generality of the algorithm is extended. Among the studies on the self-adaptability of DE algorithms is the paper of Brest et al. [24], where, for each individual of the new generation, the F and Cr parameters are computed as:

F_{i,G+1} = \begin{cases} F_l + rand_1 \cdot F_u, & \text{if } rand_2 < \tau_1 \\ F_{i,G}, & \text{otherwise} \end{cases}    (5)

Cr_{i,G+1} = \begin{cases} rand_3, & \text{if } rand_4 < \tau_2 \\ Cr_{i,G}, & \text{otherwise} \end{cases}    (6)

where F_l and F_u are the lower and upper limits of the F parameter, τ1 and τ2 are the probabilities to adjust F and Cr, and rand_i, i = 1...4, are uniform random values in the interval [0,1].
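A sketch of this per-individual update follows; the constants Fl = 0.1, Fu = 0.9 and τ1 = τ2 = 0.1 are the values commonly reported for this scheme and are quoted here as assumptions rather than settings taken from this paper.

```python
import random

def jde_update(F_old, Cr_old, Fl=0.1, Fu=0.9, tau1=0.1, tau2=0.1):
    """Self-adaptive update of F and Cr, Eqs. (5) and (6): with probability
    tau1 (resp. tau2) the parameter is resampled, otherwise it is inherited."""
    F_new = Fl + random.random() * Fu if random.random() < tau1 else F_old
    Cr_new = random.random() if random.random() < tau2 else Cr_old
    return F_new, Cr_new
```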
Zhang et al. [53] proposed a novel self-adaptive differential evolution algorithm (DMSDE) in which the population is divided into multiple groups of individuals. The difference between the objective functions of the individuals from the current group influences the scaling factor F and the crossover rate Cr, the strategy being constructed based on Eqs. (7) and (8):

F_{gi}^t = F_l + (F_u - F_l) \cdot \frac{f_{g,middle}^t - f_{g,best}^t}{f_{g,worst}^t - f_{g,best}^t}    (7)

where F_{gi}^t is the scaling factor of the ith vector of the gth group in the current generation t, F_l and F_u are the lower and upper limits of the F parameter, and f_{g,best}^t, f_{g,middle}^t, f_{g,worst}^t are the best, middle, and worst fitness functions of the three randomly selected vectors from the group g, in the generation t.

Cr_{gi}^t = \begin{cases} Cr_{gi}^t, & f_{gi}^t < \bar{f}_g^t \\ Cr_l + (Cr_u - Cr_l) \cdot \frac{f_{gi}^t - f_{g,min}^t}{f_{g,max}^t - f_{g,min}^t}, & f_{gi}^t \geq \bar{f}_g^t \end{cases}    (8)

where Cr_{gi}^t is the crossover rate of the individual i from the group g in the generation t, Cr_u and Cr_l are the upper and lower limits of the Cr parameter, f_{g,max}^t and f_{g,min}^t are the maximum and minimum values of the fitness functions of all the individuals in the group g at generation t, f_{gi}^t is the fitness of the individual i from the group g, and \bar{f}_g^t is the average fitness of all the individuals in the group g.

Recently, Pan et al. [51] created a new DE algorithm (SspDE) with self-adaptive trial vector generation strategies and control parameters. Three lists were used: a strategy list (SL), a mutation scaling factor list (FL), and a crossover rate list (CRL). Trial individuals were created during each generation by applying the standard mutation and crossover steps, which use the parameters in the lists associated with the target. If the trial was better than the target, the parameters were then inserted into the winning strategy list (wSL), winning F list (wFL), and winning Cr list (wCRL). SL, FL, and CRL were refilled after a predefined number of iterations, with a high probability from the winning lists or, otherwise, with randomly generated values. In this manner, the self-adaptation of the parameters followed the different phases of the evolution.

In general, the mathematical formulas used for self-adapting the parameters of the DE algorithm are not as complex as (7) and (8). In our work, the parameters of the algorithm evolve at the same time as the individuals, no new formula being applied to determine their values and no additional lists being used. In this manner, the complexity of the algorithm is kept approximately the same as that of the other, non-self-adaptive strategies. The idea behind the proposed strategy is to use the F and Cr parameters embedded in the target vector, the parameters being evolved in the same manner as all the characteristics of the individuals. Abbass [54] used the same mechanism of adaptation in the Self-adaptive Pareto Differential Evolution (SPDE) algorithm. The novelty of the methodology proposed in this paper consists in combining DE with the simple self-adaptive principle, with the opposition-based initialization, and with the modified mutation, and, after that, applying it to two types of optimization: neural network architecture and mass transfer coefficient when different broths are considered.

3. Differential evolution algorithm and neural network

In this work, the DE methodology was used to determine the topology and the internal parameters of neural networks. The advantages of using evolutionary algorithms (EAs) over other approaches consist in the ability to escape local optima, the ability to adapt in a changing environment, and robustness [55]. Evolution in ANNs can be performed at three different levels [56]:

(a) connection weights—the determination of the connection weights is also known as the training phase, and this step is usually formulated as the minimization of an error function such as the mean squared error (MSE);
(b) architectures—evolutionary approaches concerning architecture determination enable automatic ANN design without human intervention;
(c) learning rules—the adaptation of the learning rules is performed through evolution.

Another aspect regarding the evolution of ANNs consists in the period in which these levels of evolution are performed. In the literature, three cases are encountered: evolution of the weights, evolution of the architectures, and evolution of both weights and architecture. The methodology proposed in this work belongs to the last group, the determination of the weights and of the architecture being performed simultaneously.

When using evolutionary algorithms, several features can be encoded and co-evolved at the same time, the definition of performance becoming more flexible than the definition of an energy or error function [57]. Because EAs do not depend on gradient information like gradient-descent based algorithms, it is not necessary for the fitness function to be differentiable or even continuous [56].

In the neuro-evolution field, the coding (representation) used can be classified into three classes: direct, developmental, and implicit [57], and a specific terminology is used to represent the ANN structure and the individuals used in the evolutionary algorithms. Thus, evolutionary algorithms work with a population of genotypes, while the neural network is represented by a phenotype. In the direct encoding scheme, there is a one-to-one relationship between the phenotype and the genotype. Distinctively, the indirect encoding has a more compact representation, which tries to copy the principle of gene reusability from biological development [58]. The developmental encoding is based on the use of a genome that directs a process towards the construction of the network, allowing in this manner a more compact representation [59].

The DE algorithm, in combination with neural networks, is largely used to determine the optimal parameters of the network (training) and only scarcely for determining both the topology and the internal parameters. Plagianakos et al. [59] used the DE algorithm for training neural networks. The methodology was tested on the Encoder/Decoder feedforward neural network training problem, which is very close to a real-world pattern classification task. Various activation functions were used and, in each case, the percentage of success was 100%.

Subudhi and Jena [5,13] proposed a nonlinear system identification scheme using neural networks trained with DE and the Levenberg–Marquardt (LM) algorithm. DE was used to find approximate values near the global minimum, and LM, which is a faster convergent algorithm, continued the search (local search). The results obtained indicated that the methodology was better than other existing approaches for nonlinear system identification.

Zarth and Ludermir [60] developed a multimodal methodology in which parallel subpopulations of DE are used to train the networks and determine the optimal topology. Compared with other existing approaches found in the literature, the neural networks determined for three classification problems were less complex and had higher generalization capabilities.

Cruz-Ramirez et al. [61] developed a Memetic Pareto Differential Evolution Neural Network (MPDENN) methodology for the automatic design of neural network models with sigmoid basis units, applied to multi-classification tasks. The structure and the weights were determined by a Pareto DE approach augmented with a local search algorithm based on an improved resilient backpropagation algorithm.

In this work, the neural networks determined with the DE algorithm are fully connected multilayer feed-forward perceptrons (MLPs). The MLP network type was chosen due to its capability as universal approximator and its simple and flexible structure. The main goal of the neural network modeling was to predict the oxygen mass transfer coefficient in stirred bioreactors depending on the process conditions: biomass concentration, superficial air speed, volumetric fraction of oxygen vector, and specific power.

The DE algorithm was chosen for ANN optimization because, among the other EAs, it distinguishes itself as a powerful and robust algorithm with few control parameters. In the neuro-evolutionary field, GA was primarily used for training and topology determination, but this trend is now changing, DE beginning to replace GA since it is more powerful. Another reason for choosing DE over other EAs is the fact that its applications in the chemical engineering field, especially in combination with neural networks, are still scarce, the good results obtained in this study indicating that the specific properties of the proposed methodology (generality, flexibility, robustness) make it suitable for different problems.
4. Improved self-adaptive DE combined with neural networks (SADE-NN-1)

The methodology proposed and applied in this article combines different aspects of DE with ANNs in order to create a new and effective algorithm for two types of optimization: ANN topology and chemical processes. The neural network acts as a model, while the DE algorithm is the solving method for optimizing both the model and the process.

The improved simple self-adaptive differential evolution (SADE-NN-1) is a flexible algorithm that can work with various variants of DE as a base. The self-adaptive mechanism is a simple one, the main idea being that no new set of equations is necessary to determine the F and Cr corresponding to each individual. In our variant, F and Cr are evolved with the population itself, the new values being determined by the same equations as the ones used to create the new individuals. Fine-tuning of the DE parameters is a difficult problem, their values influencing the performance of the algorithm in a manner not yet fully understood. The most used technique is trial and error but, because it is prone to errors and requires long computational time due to the necessity of performing various runs with different combinations of values, we considered that a self-adaptive variant is a better choice.

The basic principle of the methodology is related to the observation that, as for the majority of problems solved with DE, the simultaneous optimization of the neural network topology and internal parameters is performed with greater success when self-adaptive variants are chosen [23]. In our SADE-NN-1 algorithm, the same formulas are used for evolving the individuals and the control parameters. For instance, in the case of the Rand/1/Bin variant combined with normal mutation, Eqs. (2) and (3) are applied for computing the values of the control parameters of each individual.
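One plausible reading of this mechanism is that each genotype simply carries two extra positions holding its own F and Cr, which pass through exactly the same mutation and crossover operators as the rest of the vector. The layout below is a sketch under that assumption, not code from the paper; the length D is illustrative.

```python
import numpy as np

D = 100  # length of the network-parameter part of the genotype (illustrative)

def make_individual():
    """Genotype = [network parameters ..., F, Cr]: the control parameters are
    appended to the vector and evolve together with the individual itself."""
    return np.concatenate([np.random.rand(D), np.random.rand(2)])

def control_parameters(individual):
    # the last two genes of the individual are read back as its own F and Cr
    return individual[-2], individual[-1]
```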
The activation functions have a significant contribution to the performance of a neural model. Complex dependencies between inputs and outputs can be modeled with high accuracy using different transfer functions. Consequently, eight types of activation functions were used: Linear, Hard Limit, Bipolar Sigmoid, Logarithmic Sigmoid, Tangent Sigmoid, Sinus, Radial Basis with a fixed center point at 0, and Triangular Basis. These represent only a fraction of the activation functions that can be used and were chosen in order to enlarge the mathematical complexity of the neural network model. In this manner, the complex interactions within the process can be modeled with great accuracy.
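A sketch of these eight transfer functions, written as plain NumPy callables; the exact parameterizations (e.g., the slope of the sigmoids, the width of the triangular basis) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# The eight activation functions named in the text; parameterizations are illustrative.
activations = {
    "Lin":     lambda x: x,                                 # Linear
    "Hardlim": lambda x: np.where(x >= 0, 1.0, 0.0),        # Hard Limit
    "Bipolar": lambda x: 2.0 / (1.0 + np.exp(-x)) - 1.0,    # Bipolar Sigmoid
    "Logsig":  lambda x: 1.0 / (1.0 + np.exp(-x)),          # Logarithmic Sigmoid
    "Tansig":  lambda x: np.tanh(x),                        # Tangent Sigmoid
    "Sin":     np.sin,                                      # Sinus
    "Radbas":  lambda x: np.exp(-x ** 2),                   # Radial Basis, center 0
    "Tribas":  lambda x: np.maximum(0.0, 1.0 - np.abs(x)),  # Triangular Basis
}
```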
As stated earlier, initialization is an important step of DE (and of every evolutionary algorithm) because the algorithm performance depends not only on the evolutionary process, but also on the initial population. All parameters are initialized using random values within the accepted ranges. For the weights, no limits are imposed, their initial values being randomly generated in the interval (0, 1). In the current work, two types of initialization were used: normal distribution, and normal distribution combined with OBL. In a series of studies, OBL was introduced into the DE algorithm in two phases – initialization and creation of new individuals – with encouraging results [46,62,63]. The mutation and crossover steps form the phase of creation of new individuals.

OBL was proposed in 2005 by Tizhoosh [64] and is based on the opposite number theory, where an opposite number is defined as:

\tilde{w} = a + b - w    (9)

where \tilde{w} is the opposite number and w ∈ [a, b].

Initially, the population is generated as usual, with the normal distribution. After that, the opposite individuals are calculated by applying Eq. (9) to each characteristic. From the reunion of the two populations, the Np individuals with the highest fitness functions are chosen. With higher-fitness individuals in the initialization step, the probability of obtaining better performance indexes at the end of the evolution rises.
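A sketch of this opposition-based initialization for a fitness function to be maximized; uniform sampling in [a, b] stands in for the distribution actually used, and all names are illustrative assumptions.

```python
import numpy as np

def obl_initialization(Np, D, fitness, a=0.0, b=1.0):
    """Generate Np individuals, build their opposites with Eq. (9)
    (w~ = a + b - w), and keep the Np fittest of the reunion."""
    population = a + (b - a) * np.random.rand(Np, D)
    opposites = a + b - population                    # Eq. (9), per characteristic
    union = np.vstack([population, opposites])
    scores = np.array([fitness(ind) for ind in union])
    return union[np.argsort(scores)[-Np:]]            # highest fitness survives
```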
Along with the self-adaptation and the OBL initialization, a modified mutation strategy was used in the proposed SADE-NN-1. It consists in ordering the individuals participating in the mutation phase based on their fitness function. For example, in the case of the 'Rand/1' versions, the three randomly chosen vectors x1, x2, x3 are ordered from the highest to the lowest fitness function, higher fitness values indicating better individuals. In this manner, the base vector is the fittest of the three, and the mutation becomes an operation in which the best individual from a group is changed in order to create a better individual. This approach is a greedy one and is based on the idea that better individuals can create better children. Comparisons between the simple self-adaptive version with classical mutation and the one with modified mutation are performed in order to determine which one is the best for the optimal neural network determination in the current process.

When the modified mutation scheme is used in combination with a 'Best/1/Cross' version, the idea behind 'Best/1/Cross' remains the same, the base vector being represented by the best individual in the generation. Only the vectors participating in the differential term are arranged based on their fitness function.
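The modified mutation can be sketched by sorting the three sampled indices by fitness before applying Eq. (2); fitness_values is assumed to be a NumPy array in which higher values mean better individuals, as in the text.

```python
import numpy as np

def modified_mutation_rand1(population, fitness_values, F):
    """'Rand/1' mutation with the modified scheme: the three randomly chosen
    vectors are ordered so that the fittest one serves as the base vector."""
    Np = population.shape[0]
    mutants = np.empty_like(population)
    for i in range(Np):
        idx = np.random.choice(
            [j for j in range(Np) if j != i], size=3, replace=False)
        r1, r2, r3 = idx[np.argsort(fitness_values[idx])[::-1]]  # descending fitness
        mutants[i] = population[r1] + F * (population[r2] - population[r3])
    return mutants
```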
Two aspects determine the flexibility of the proposed algorithm, SADE-NN-1. First, the type of DE variant considered as starting point can be modified by the end-user from the interface with just a simple click. Second, the types of initialization and mutation can be easily changed.

At the beginning of the algorithm, the base variant and the mutation scheme to be used, along with the maximum number of neurons in the hidden layers of the neural network, are chosen by the user. The network parameters influence the length of characteristics that an individual has, hence the necessity of setting this parameter before the algorithm starts.

During the run of the algorithm, the information between the DE and the developed ANN flows in two ways: from the DE to the network and vice versa. At each generation, new individuals representing newly encoded networks are created, and each network (determined after decoding the individual) calculates the fitness function for the DE. The algorithm then uses the fitness in order to determine the best individuals for the next generation.

Because of the specific nature of the DE algorithm, which uses real-value vectors, direct encoding was used. The vectors from the genotype have a specific structure and contain all the parameters of the neural networks: number of hidden layers, number of neurons in each hidden layer, weights, biases, and activation function of each neuron.

For the current methodology, at least two reasons explain the computation time: a large network needs a long vector, and the DE population contains Np such vectors. For example, an ANN with 4 inputs, 3 hidden layers, 20 neurons in the first hidden layer, 15 in the second, 10 in the third, and 5 outputs, coded as 4:20:15:10:5, has 4·20 + 20·15 + 15·10 + 10·5 = 580 weights and 4 + 20 + 15 + 10 + 5 = 54 neurons, so the vector containing all these data has 1 + 3 + 580 + 54·2 = 692 characteristics. Knowing that the majority of processes can be modeled with great success by networks having at most two hidden layers, in our work the maximum number of allowed hidden layers was limited to two. Taking into account that the length of each individual and the position of each neural network property are fixed during the evolution and are determined automatically based on the network topology and the application settings, the vector has the following structure:
- the first position in the vector is occupied by the number of hidden layers (Nl);
- the second and third positions are kept for the number of neurons in the first (Nh1) and second (Nh2) intermediate layers;
- the next positions are reserved for the weights. The number of reserved positions is equal to: number of inputs × maximum number of neurons in the first hidden layer + maximum number of neurons in the first hidden layer × maximum number of neurons in the second hidden layer + maximum number of neurons in the second hidden layer × number of outputs. Depending on the actual values of Nh1 and Nh2, unoccupied positions can exist. These are filled with noise data copied from the parents. In a sense, this noise data can be viewed as inactivated genes from the biological genome that become activated only in specific conditions;
- after that, the biases and activation functions for each neuron follow. A specific number of positions is reserved for the neuron parameters; it is equal to the maximum number of neurons allowed in the network multiplied by 2, since there are two parameters for each neuron.
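Following this layout, the fixed length of an individual can be computed from the user-set maxima. The helper below is hypothetical and, following the paper's own example, the input nodes are counted among the neurons; this accounting is an assumption of the sketch.

```python
def genotype_length(n_inputs, max_h1, max_h2, n_outputs):
    """Length of a direct-encoded individual: 1 position for Nl, 2 for
    (Nh1, Nh2), the reserved weight block, and 2 positions per neuron
    (bias and activation function)."""
    weights = n_inputs * max_h1 + max_h1 * max_h2 + max_h2 * n_outputs
    neurons = n_inputs + max_h1 + max_h2 + n_outputs
    return 1 + 2 + weights + 2 * neurons

# Example: 4 inputs, at most 10 neurons per hidden layer, 1 output:
# weights = 40 + 100 + 10 = 150, neurons = 25, length = 3 + 150 + 50 = 203
print(genotype_length(4, 10, 10, 1))  # 203
```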
An increased performance of the neural networks can be achieved by resolving two aspects: optimizing the architecture and normalizing the raw data [65]. In this article, data normalization was applied in order to ensure the best possible scenario in which the networks can evolve, and the optimization of the ANN architecture was realized with the DE algorithm.

A simplified logic schema of the proposed methodology is detailed in Fig. 1.

As can be observed from Fig. 1, the SADE-NN-1 methodology contains a module which performs the process optimization using an already determined neural network. This module was introduced according to the principle of code reusability, allowing the DE algorithm to run in the same manner for each type of optimization (process or neural model).

The stop criterion employed in the proposed methodology consists of a combination of two conditions, referring to the number of generations and to the fitness function, the algorithm stopping when at least one of them reaches a predefined value. In order for the mutation, crossover, and recombination steps to be repeated, the current generation must be smaller than the maximum number of generations allowed (G = 500) and the fitness must have a value smaller than 10^6. The limit of the fitness function was chosen based on the consideration that the methodology performs its maximization with a good ratio between running time and performance.

The type of the optimization and, indirectly, the manner in which the objective function is computed have a big influence on the complexity of the algorithm, since for neural network optimization the objective function depends on computing performance indexes based on a relatively high number of data (the length of the training data set). When the built-in functions provided by the .NET Framework (the framework used to implement the current methodology) are considered, the complexity is O(n^5 log(n)); when they are not, the complexity is O(n^4).

Concerning flexibility and scalability, different aspects can be considered. From the functional scalability point of view, due to the modular approach, the application allows an easy introduction of new modules and functionalities. If the aspect considered is load scalability, the methodology internally manages the resource consumption and the computational time by (i) automatically adjusting the lengths of the individuals of the optimization procedure based on the characteristics of the problem being solved and on the application settings, and (ii) stopping the optimization based on the performance of the solutions obtained at each step.

5. Sensitivity analysis

When creating models for specific problems, a key question is: which are the input variables and which of them are the most important? The answer to such a question is given by a procedure based on sensitivity analysis (SA). SA can be used to perform a variety of actions, such as: ranking the inputs based on their influence on the output, assessing changes in the output due to variations in the inputs, improving the quality of the computations, or limiting the use of a program [66]. For models represented by neural networks, SA can be used to examine whether the characteristics of each input have been learned well or to explore the sensitivity of the output to the variation of each input [67].

When performing SA, different methods can be utilized: adding noise to each input and observing the modification of the outputs, analyzing the derivatives of the fan-out weights of the input units, or applying the missing value problem [68].

In this work, the sensitivity of each output with respect to a selected input is defined as a differential coefficient and is applied for ranking the inputs based on their influence on the model's output, using quantified information. The method based on partial derivatives is encountered in the literature as 'PaD' and, by using this approach, two results can be obtained: the profile of the output for small input variations and a classification of the relative contribution of each variable [69].

s_{j,i} = \frac{\partial output_j}{\partial input_i}    (10)

where s_{j,i} is the sensitivity of the output j related to the input i. Knowing that, for a network with two hidden layers, output_j is simply defined (without biases) as in Eq. (11), its sensitivity related to each input pattern (m) becomes:

output_j = g\left(\sum_{h2=0}^{Nh2} w_{j,h2} \cdot g\left(\sum_{h1=0}^{Nh1} w_{h1,h2} \cdot g\left(\sum_{i=0}^{I} w_{h1,i} \cdot input_i\right)\right)\right)    (11)
where k_p represents the input of the p neuron, w_{p,h} is the weight between the pth and the hth neurons, and g(p) is the activation function corresponding to each p neuron.

Using s_{j,i}^m, a series of information about the influence of the input on the output can be obtained by plotting a set of graphs. One example of interpretation is that, if the partial derivative is negative for the specific input value given by the m input pattern, the output decreases while the considered input increases. The per-pattern contributions can also be aggregated over the M patterns of the data set:

s_{j,i} = \sum_{m=1}^{M} \left(s_{j,i}^m\right)^2    (13)

In order to obtain a reliable sensitivity analysis, the neural networks with the highest performances must be used; otherwise, confusing results may be obtained [70]. Another aspect linked to
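The PaD ranking of Eqs. (10) and (13) can be sketched numerically for any trained model: the partial derivative of Eq. (10) is approximated by central finite differences for each pattern, and the squares are accumulated as in Eq. (13). The callable model and the data matrix X are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def pad_sensitivities(model, X, eps=1e-4):
    """PaD input ranking: s_{j,i}^m ~ d(output_j)/d(input_i) per pattern m
    (Eq. (10)), accumulated as a sum of squares over M patterns (Eq. (13))."""
    M, D = X.shape
    s = np.zeros(D)
    for m in range(M):
        for i in range(D):
            x_plus, x_minus = X[m].copy(), X[m].copy()
            x_plus[i] += eps
            x_minus[i] -= eps
            deriv = (model(x_plus) - model(x_minus)) / (2 * eps)
            s[i] += deriv ** 2
    return s  # larger s[i] -> stronger influence of input i on the output
```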
Due to their specific properties, the increase of the biomass concentration of P. shermanii leads to the decrease of kLa. The intensification of mass transfer by the oxygen-vector can be described by means of the amplification factor, defined as the ratio between kLa in the presence of the oxygen-vector, (kLa)V, and in its absence, (kLa)0, for similar experimental conditions. The bacterial cells are adsorbed on the surface of the oxygen-vector, determining an initial reduction of (kLa)V/(kLa)0. Experimental data indicate that kLa increases when the n-dodecane concentration is lower than 0.1. The intensification of the specific power input induces the fine dispersion of n-dodecane and exhibits a favorable effect on the oxygen transfer rate. However, the increase of the energy dissipated by mechanical agitation has a contrary effect on kLa, which initially increases to a maximum value, followed by its reduction over a certain level of mixing intensity. As can be observed, different strategies and actions have distinct effects on the kLa factor, a specific neural network model being necessary to simulate its behavior as a function of the biomass concentration, superficial air velocity, specific power, and oxygen-vector volumetric fraction. These four inputs are related to the output variable, namely the kLa factor.

Using various DE strategies as a base, a series of simulations was performed in order to determine the best neural network model of the bacteria fermentation. The results obtained with the SADE-NN-1 methodology are listed in Table 1. The first column indicates the DE base variant. The two types of mutation used are referred to as 'Normal' and 'Modified'. Normal initialization and opposition-based initialization are combined with the different base variants and mutations. For each combination of initialization type – DE base variant – mutation type of the SADE-NN-1 algorithm, ten simulations were performed and the average values of the performance indexes were computed.

Table 2
Neuron characteristics of the best neural network MLP (4:5:1).

Type of layer | Neuron | Bias | Activation function
Hidden layer | H1.1 | 0.099 | Lin
Hidden layer | H1.2 | −0.241 | Tansig
Hidden layer | H1.3 | −0.625 | Bipolar
Hidden layer | H1.4 | 0.849 | Lin
Hidden layer | H1.5 | 0.341 | Sin
Output layer | O.1 | −0.229 | Tansig

The best neural networks obtained in each case were MLP (4:8:1) and MLP (4:5:1), chosen because they have the highest fitness values among all the neural networks determined. They correspond to Best/1/Bin as DE base variant with normal initialization and modified mutation, and to Best/1/Bin with opposition-based initialization and normal mutation, respectively.

As can be observed, the binomial versions tend to behave better than the exponential ones, and the opposition-based initialization gave better results than the classic versions. In all cases (except normal initialization with 'Rand/1/Exp' and opposition-based initialization with 'Best/1/Bin' and 'Rand-to-Best/1/Bin'), the modified mutation outperforms the classic mutation. The best neural networks determined with both initialization types were obtained when the base DE version was 'Best/1/Bin'. The best neural network, MLP (4:5:1), was obtained with the classic mutation form and opposition-based initialization.

All the neuron characteristics of the best network MLP (4:5:1), which was chosen based on the highest fitness function, are listed in Table 2. In order to identify each neuron, a notation indicating the layer and the number of the neuron is used. For example, H1.2 indicates the second neuron in the first hidden layer and O.1 is the first output neuron. Each neuron can have one of the eight
Table 1
Results of simulations obtained in the case of bacteria fermentation with the SADE-NN-1 methodology.

Initialization | DE base variant | Mutation type | Best: MSE training | Best: MSE testing | Best: Fitness | Best: Topology | Average: MSE training | Average: MSE testing | Average: Fitness
Normal | Rand/1/Exp | Normal | 0.029 | 0.027 | 34.035 | 4:10:1 | 0.033 | 0.031 | 29.323
Normal | Rand/1/Exp | Modified | 0.029 | 0.028 | 33.548 | 4:9:5:1 | 0.033 | 0.032 | 29.425
Opposition based | Rand/1/Exp | Normal | 0.02 | 0.022 | 49.474 | 4:7:1 | 0.023 | 0.025 | 41.569
Opposition based | Rand/1/Exp | Modified | 0.019 | 0.016 | 51.669 | 4:10:1 | 0.024 | 0.021 | 41.430

Fig. 2. Comparison between experimental data, phenomenological model and predictions of the model MLP (4:5:1) for the training data, in the case of bacteria fermentation.
activation functions, referred to with the following notations: Lin for the Linear function, Hardlim for the Hard Limit function, Bipolar for the Bipolar Sigmoid function, Logsig for the Logistic Sigmoid function, Tansig for the Tangent Sigmoid function, Sin for the Sinus function, Radbas for the Radial Basis function, and Tribas for the Triangular Basis function.

For the training data, the correlation between the ANN predictions and the experimental data had a value of 0.9878 and the root mean squared error (RMSE) was 0.0219, while for the testing data the correlation was 0.9563 and the RMSE was 0.0353. Using Eqs. (14) and (15), determined with a multi-regression method in Matlab for the training and the testing data sets, the mass transfer coefficient was also computed (phenomenological model). In this case, the correlation for the training set was −0.2329 and the RMSE was 0.2172, while for the testing data the correlation was −0.2214 and the RMSE was 0.2125.

The comparisons between the predictions of the neural network model MLP (4:5:1), the regression method, and the experimental data are shown in Figs. 2 and 3 for the training and testing data, respectively. As can be observed from the performance indexes and from the two figures, the neural network renders the training data better than the testing data. This fact is normal because the training set was used in the phase when the network learned the process characteristics, while the testing set was not previously 'seen' by the model. However, the small errors obtained in the testing phase indicate that the obtained network can accurately model the process considered as case study.

Figs. 2 and 3 (where the comparison is made point by point) and the error indexes previously given show that the neural model is closer to the experimental data than the phenomenological model obtained through regression. This proves that the proposed methodology is better than the other approaches used to predict the mass transfer coefficient.

After determining the best model of the process when the bacteria broth is used, a sensitivity analysis is performed in order to determine the influence of each parameter on the output of the neural model.
Fig. 3. Comparison between experimental data, phenomenological model and predictions of the model MLP (4:5:1) for the testing data, in the case of bacteria fermentation.
Table 5
Results of simulations obtained in the case of yeast fermentation with the SADE-NN-1 methodology.

Initialization | DE base variant | Mutation type | Best: MSE training | Best: MSE testing | Best: Fitness | Best: Topology | Average: MSE training | Average: MSE testing | Average: Fitness
Normal | Rand/1/Exp | Normal | 0.017 | 0.020 | 55.666 | 4:4:5:1 | 0.019 | 0.017 | 50.025
Normal | Rand/1/Exp | Modified | 0.016 | 0.016 | 60.914 | 4:10:1 | 0.018 | 0.017 | 53.729
Opposition based | Rand/1/Exp | Normal | 0.013 | 0.008 | 75.61 | 4:10:3:1 | 0.0147 | 0.013 | 66.563
Opposition based | Rand/1/Exp | Modified | 0.009 | 0.015 | 109.04 | 4:10:1 | 0.014 | 0.013 | 68.102
When the binomial crossover is used, the fitness is higher than the fitness corresponding to the exponential crossover. By comparing the normal initialization with the opposition based initialization, it can be observed that the former determined models with higher fitness values. Compared with the best fitness obtained in the case of bacteria (determined when the opposition based initialization, the 'Best/1/Bin' variant, and the classic mutation are used), the best fitness for the yeasts (using the normal initialization, the 'Best/1/Bin' variant, and the modified mutation) is higher. This means that the neural network modeling of the yeasts, with the neuron characteristics described in Table 6, gives more accurate results in the training and testing phases than the network that modeled the bacteria broth. The notations used in Table 6 are the same as the ones from Table 2.

Table 6
Neuron characteristics of the best neural network MLP (4:6:1).

Type of layer   Neuron   Bias     Activation function
Hidden layer    H1.1     −0.151   Sin
                H1.2      0.041   Lin
                H1.3     −0.106   LogSig
                H1.4     −0.134   Lin
                H1.5     −0.010   Lin
                H1.6      0.922   TanSig
Output layer    O.1       0.242   Sin

In Figs. 4 and 5, the differences between the experimental data, the phenomenological model results, and the predictions of the best neural network are shown for the training and testing data. Using the model MLP (4:6:1), for the training set RMSE = 0.00903 and the correlation is 0.9946, while for the testing set RMSE = 0.01411 and the correlation is 0.9842. For the phenomenological model, determined using the multi-regression methodology, RMSE = 0.2751 and correlation = 0.0505 for the training data, and RMSE = 0.2719 and correlation = 0.4284 for the testing data.

The small errors between the experimental data and the predictions obtained with the neural network show that the neural network model could follow the process with acceptable precision. In addition, the neural network is better than the phenomenological model, confirming that the proposed DE algorithm is better and more robust than the existing methodologies used for predicting the mass transfer coefficient.

A sensitivity analysis procedure is applied to the best neural network, MLP (4:6:1), in order to determine the influence of each input on the mass transfer coefficient. The sensitivity values are 5.514 for the first input, 5.134 for the second, 2.644 for the third, and 4.021 for the fourth. Consequently, the descending order of input influence on the output is 1, 2, 4, 3. Owing to the above discussion concerning the role of each considered parameter in the oxygen transfer process inside the broth, their order of importance is similar to the case of P. shermanii fermentation.

After that, with the best neural network model obtained, the process is optimized in order to determine the conditions for which the mass transfer coefficient is maximized when yeast broths are used. The same modifications and procedure are applied as in the previous optimization. A series of ten conditions that lead to the maximization of the considered parameter are listed in Table 7.

The differences between the fitness values, as in the previous case, are very small and around the global maximum. These results indicate that the proposed methodology is able to determine the various conditions that can lead to process optimization.

Due to the different behavior of yeasts in systems containing hydrocarbons, the results from Table 7 underline the negative influence of the biomass concentration on the oxygen transfer rate in this fermentation broth. Therefore, the maximum value of kLa is obtained at a low yeast amount and a high superficial air velocity, specific power input, and oxygen-vector concentration.
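Table 6 reports only the biases and the activation function of each neuron; the connection weights are not listed. A minimal sketch of how such a heterogeneous-activation MLP (4:6:1) is evaluated, with random placeholders standing in for the unreported weights:

import numpy as np

# Activation functions named in Table 6
ACTIVATIONS = {
    "Sin": np.sin,
    "Lin": lambda x: x,
    "LogSig": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "TanSig": np.tanh,
}

# Biases and activation types of the hidden and output neurons (Table 6);
# the connection weights are not reported, so random placeholders are used.
HIDDEN_BIAS = np.array([-0.151, 0.041, -0.106, -0.134, -0.010, 0.922])
HIDDEN_ACT = ["Sin", "Lin", "LogSig", "Lin", "Lin", "TanSig"]
OUTPUT_BIAS, OUTPUT_ACT = 0.242, "Sin"

def mlp_4_6_1(x, w_hidden, w_out):
    """Forward pass of the heterogeneous MLP (4:6:1): every hidden neuron
    applies its own activation function to its weighted input plus bias."""
    pre = w_hidden @ x + HIDDEN_BIAS                     # shape (6,)
    hidden = np.array([ACTIVATIONS[a](p) for a, p in zip(HIDDEN_ACT, pre)])
    return ACTIVATIONS[OUTPUT_ACT](w_out @ hidden + OUTPUT_BIAS)

rng = np.random.default_rng(0)
y = mlp_4_6_1(rng.random(4), rng.normal(size=(6, 4)), rng.normal(size=6))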
Fig. 4. Comparison between the experimental data, the phenomenological model, and the predictions obtained with MLP (4:6:1) for the training data, in the case of the yeast fermentation process.
Fig. 5. Comparison between the experimental data, the phenomenological model, and the predictions obtained with MLP (4:6:1) for the testing data, in the case of the yeast fermentation process.
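In the optimization step described above, the trained network itself provides the fitness: DE evolves candidate operating conditions and scores each one by the predicted mass transfer coefficient. A minimal sketch of this reuse, using a plain DE/rand/1/bin loop with fixed control parameters (the actual SADE-NN-1 run self-adapts them and uses the modified mutation; de_maximize and predict are illustrative names):

import numpy as np

def de_maximize(predict, bounds, pop_size=30, F=0.8, CR=0.9,
                generations=200, rng=None):
    """Sketch of a DE/rand/1/bin loop that maximizes the output of a trained
    neural model `predict` over the (scaled) input domain; the network's
    prediction serves directly as the fitness of each candidate."""
    rng = rng or np.random.default_rng()
    low, high = (np.asarray(b, dtype=float) for b in zip(*bounds))
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    fit = np.array([predict(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), low, high)
            cross = rng.random(low.size) < CR
            cross[rng.integers(low.size)] = True   # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            trial_fit = predict(trial)
            if trial_fit >= fit[i]:                # greedy, maximizing selection
                pop[i], fit[i] = trial, trial_fit
    best = int(np.argmax(fit))
    return pop[best], fit[best]

# Usage with a trained model: x_opt, kla_max = de_maximize(net.predict, [(0, 1)] * 4)

Repeating such a run from different seeds yields a set of near-equivalent optima, which is one way to obtain a series of optimal conditions such as those listed in Table 7.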
We cannot conclude which elements of the methodology are better, due to the different performance obtained in different situations; however, according to the process characteristics, we can say that the improvements added to the algorithm resulted in a generally higher efficiency of the methodology.

The sensitivity analysis applied to the best neural models ranks the inputs according to their influence on the output. The results are consistent with the experimental practice, showing the following descending order: biomass concentration, superficial air speed, specific power, and volumetric fraction of oxygen vector.

The DE optimization methodology was also applied for the determination of the optimal working conditions that lead to a maximum value of the oxygen mass transfer coefficient. The neural networks developed with SADE-NN-1 were the models included in the optimization procedures. Significant information, useful for experimental practice, was obtained.

The developed methodology proved to be flexible and efficient for the modeling and optimization of the oxygen mass transfer in stirred bioreactors. Because the characteristics of the problem being solved influence the performance of the methodology, further studies and analyses must be performed in order to determine the situations in which each modification gives better results.

Acknowledgments

This work was realized with the financial support of the EURODOC “Doctoral Scholarships for research performance at European level” project, financed by the European Social Fund and the Romanian Government, and of the “Partnership in priority areas – PN-II” program, supported by ANCS, CNDI – UEFISCDI, project PN-II-PT-PCCA-2011-3.2-0732, no. 23/2012.

References

[1] R. Storn, K. Price, Differential Evolution – A Simple and Efficient Adaptive Scheme for Global Optimization Over Continuous Spaces, Technical Report TR-95-012, Berkeley, 1995.
[2] K.V. Price, R.M. Storn, J.A. Lampinen, Differential Evolution. A Practical Approach to Global Optimization, Springer, Berlin, 2005.
[3] V. Feoktistov, Differential Evolution: In Search of Solutions, Springer, Berlin, 2006.
[4] S. Olafsson, Chapter 21: Metaheuristics, in: S.G. Henderson, B.L. Nelson (Eds.), Handbooks in Operations Research and Management Science: Simulation, vol. 13, North-Holland, Amsterdam, 2006, pp. 633–654.
[5] B. Subudhi, D. Jena, An improved differential evolution trained neural network scheme for nonlinear system identification, International Journal of Automation and Computing 6 (2009) 137–144.
[6] R. Angira, B.V. Babu, Performance of modified differential evolution for optimal design of complex and non-linear chemical processes, Journal of Experimental & Theoretical Artificial Intelligence 18 (2006) 501–512.
[7] U. Yüzgeç, Performance comparison of differential evolution techniques on optimization of feeding profile for an industrial scale baker's yeast fermentation process, ISA Transactions 49 (2010) 167–176.
[8] M.D. Kapadi, R.D. Gudi, Optimal control of fed-batch fermentation involving multiple feeds using Differential Evolution, Process Biochemistry 39 (2004) 1709–1721.
[9] B.V. Babu, P.G. Chakole, J.H. Syed Mubeen, Multiobjective differential evolution (MODE) for optimization of adiabatic styrene reactor, Chemical Engineering Science 60 (2005) 4822–4837.
[10] A.M. Gujarathi, B.V. Babu, Improved multiobjective differential evolution (MODE) approach for purified terephthalic acid (PTA) oxidation process, Materials and Manufacturing Processes 24 (2009) 303–319.
[11] B. Subudhi, D. Jena, Differential evolution Levenberg-Marquardt trained neural network scheme for nonlinear system identification, Neural Processing Letters 27 (2008) 285–296.
[12] B. Subudhi, D. Jena, A combined differential evolution and neural network approach to nonlinear system identification, in: TENCON 2008 – IEEE Region 10 Conference, IEEE, 2008, pp. 1–6.
[13] B. Subudhi, D. Jena, A differential evolution based neural network approach to nonlinear system identification, Applied Soft Computing 11 (2011) 861–871.
[14] C.W. Chen, D.Z. Chen, G.Z. Cao, An improved differential evolution algorithm in training and encoding prior knowledge into feedforward networks with application in chemistry, Chemometrics and Intelligent Laboratory Systems 64 (2002) 27–43.
[15] S. Curteanu, F. Leon, R. Furtuna, E.-N. Dragoi, N. Curteanu, Comparison between different methods for developing neural network topology applied to a complex polymerization process, in: The 2010 International Joint Conference on Neural Networks IJCNN, IEEE, 2010, pp. 1–8.
[16] D.M. Himmelblau, Accounts of experiences in the application of artificial neural networks in chemical engineering, Industrial & Engineering Chemistry Research 47 (2008) 5782–5796.
[17] D. Cascaval, A.I. Galaction, E. Folescu, M. Turnea, Comparative study on the effects of n-dodecane addition on oxygen transfer in stirred bioreactors for simulated bacterial and yeasts broths, Biochemical Engineering Journal 31 (2006) 51–56.
[18] J.S. Alford, Bioprocess control: advances and challenges, Computers & Chemical Engineering 30 (2006) 1464–1475.
[19] Y. Liu, W.L. Chen, Z.L. Gao, H.Q. Wang, P. Li, Adaptive control of nonlinear time-varying processes using selective recursive kernel learning method, Industrial & Engineering Chemistry Research 50 (2011) 2773–2780.
[20] S. Curteanu, H. Cartwright, Neural networks applied in chemistry. I. Determination of the optimal topology of multilayer perceptron neural networks, Journal of Chemometrics 25 (2011) 527–549.
[21] R. Furtuna, S. Curteanu, F. Leon, Multi-objective optimization of a stacked neural network using an evolutionary hyper-heuristic, Applied Soft Computing 12 (2012) 133–144.
[22] E.-N. Dragoi, S. Curteanu, C. Mihailescu, Modeling methodology based on Differential Evolution algorithm applied to a series of hydrogels, in: Proceedings of ECIT 2010 – 6th European Conference on Intelligent Systems and Technologies, 2010.
[23] E.-N. Dragoi, S. Curteanu, F. Leon, A.I. Galaction, D. Cascaval, Modeling of oxygen mass transfer in the presence of oxygen-vectors using neural networks developed by differential evolution algorithm, Engineering Applications of Artificial Intelligence 24 (2011) 1214–1226.
[24] J. Brest, B. Boskovic, S. Greiner, V. Zumer, M. Maucec, Performance comparison of self-adaptive and adaptive differential evolution algorithms, Soft Computing 11 (2007) 617–629.
[25] H.Y. Fan, J. Lampinen, A directed mutation operation for the differential evolution algorithm, Journal of Industrial Engineering – Theory, Applications and Practice 1 (2003) 6–15.
[26] M.M. Ali, Differential evolution with preferential crossover, European Journal of Operational Research 181 (2007) 1137–1147.
[27] D. Zaharie, Critical values for the control parameters of differential evolution algorithms, in: P. Ošmera, R. Matoušek (Eds.), Proceedings of MENDEL 2002, 8th International Conference on Soft Computing, 2002, pp. 62–67.
[28] D. Zaharie, Influence of crossover on the behavior of Differential Evolution algorithms, Applied Soft Computing 9 (2009) 1126–1138.
[29] J. Tvrdik, Adaptive differential evolution and exponential crossover, in: International Multiconference on Computer Science and Information Technology, IEEE, 2008, pp. 927–931.
[30] A. Qing, Differential Evolution: Fundamentals and Applications in Electrical Engineering, John Wiley & Sons, Singapore, 2009.
[31] D. Davendra, G. Onwubolu, Forward backward transformation, in: G. Onwubolu, D. Davendra (Eds.), Differential Evolution: A Handbook for Global Permutation-Based Combinatorial Optimization, Springer, Berlin, 2009, pp. 35–80.
[32] R. Storn, Differential evolution research – trends and open questions, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 1–31.
[33] J. Lampinen, Solving problems subject to multiple nonlinear constraints by the differential evolution, in: R. Matousek, P. Osmera (Eds.), Proceedings of MENDEL'01 – 7th International Conference on Soft Computing, 2001, pp. 50–57.
[34] E. Mezura-Montes, C.A. Coello Coello, E.I. Tun-Morales, Simple feasibility rules and differential evolution for constrained optimization, in: R. Monroy, G. Arroyo-Figueroa, L.E. Sucar, H. Sossa (Eds.), Proceedings of the Third Mexican International Conference on Artificial Intelligence, Springer, New York, 2004, pp. 707–716.
[35] K. Zielinski, R. Laur, Stopping criteria for differential evolution in constrained single-objective optimization, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 111–138.
[36] J. Brest, S. Greiner, B. Boskovic, M. Mernik, V. Zumer, Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems, IEEE Transactions on Evolutionary Computation 10 (2006) 646–657.
[37] M.E. Abdual-Salam, H.M. Abdul-Kader, W.F. Abdel-Wahed, Comparative study between Differential Evolution and Particle Swarm Optimization algorithms in training of feed-forward neural network for stock price prediction, in: The 7th International Conference on Informatics and Systems (INFOS), IEEE, 2010, pp. 1–8.
[38] J. Brest, Constrained real-parameter optimization with ε-self-adaptive differential evolution, in: E. Mezura-Montes (Ed.), Constraint-Handling in Evolutionary Optimization, Springer, Berlin, 2009, pp. 73–93.
[39] R. Thangaraj, M. Pant, A. Abraham, A simple adaptive Differential Evolution algorithm, in: World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE, 2009, pp. 457–462.
[40] F. Neri, V. Tirronen, Recent advances in differential evolution: a survey and experimental analysis, Artificial Intelligence Review 33 (2010) 61–106.
[41] E.S. Nicoara, Mechanisms to avoid premature convergence of genetic algorithms, Buletinul Universitatii Petrol-Gaze din Ploiesti 61 (2009) 87–96.
[42] D. Zaharie, A comparative analysis of crossover variants in differential evolution, in: M. Ganzha, M. Paprzycki, T. Pelech-Pilichowski (Eds.), Proceedings of the International Multiconference on Computer Science and Information Technology IMCSIT, 2007, pp. 171–181.
[43] D. Zaharie, Statistical properties of differential evolution and related random search algorithms, in: P. Brito (Ed.), Proceedings of the International Conference on Computational Statistics, Physica-Verlag, Heidelberg, 2008, pp. 473–485.
[44] C.W. Chiang, W.P. Lee, J.S. Heh, A 2-Opt based differential evolution for global optimization, Applied Soft Computing 10 (2010) 1200–1207.
[45] B. Qian, L. Wang, R. Hu, W.L. Wang, D.X. Huang, X. Wang, A hybrid differential evolution method for permutation flow-shop scheduling, International Journal of Advanced Manufacturing Technology 38 (2008) 757–777.
[46] S. Rahnamayan, H. Tizhoosh, M. Salama, Opposition-based differential evolution algorithms, in: IEEE Congress on Evolutionary Computation CEC 2006, IEEE, 2006, pp. 2010–2017.
[47] V. Ramesh, T. Jayabarathi, S. Asthana, S. Mital, S. Basu, Combined hybrid differential particle swarm optimization approach for economic dispatch problems, Electric Power Components and Systems 38 (2010) 545–557.
[48] R. Storn, On the usage of differential evolution for function optimization, in: M.H. Smith, M.A. Lee, J. Keller, J. Yen (Eds.), 1996 Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS), IEEE, 1996, pp. 519–523.
[49] H.Y. Fan, J. Lampinen, A trigonometric mutation operation to differential evolution, Journal of Global Optimization 27 (2003) 105–129.
[50] J. Ilonen, J.K. Kamarainen, J. Lampinen, Differential evolution training algorithm for feed-forward neural networks, Neural Processing Letters 17 (2003) 93–105.
[51] Q.K. Pan, P.N. Suganthan, L. Wang, L. Gao, R. Mallipeddi, A differential evolution algorithm with self-adapting strategy and control parameters, Computers & Operations Research 38 (2011) 394–408.
[52] C. Hu, X. Yan, An immune self-adaptive differential evolution algorithm with application to estimate kinetic parameters for homogeneous mercury oxidation, Chinese Journal of Chemical Engineering 17 (2009) 232–240.
[53] X. Zhang, W. Chen, C. Dai, W. Cai, Dynamic multi-group self-adaptive differential evolution algorithm for reactive power optimization, International Journal of Electrical Power 32 (2010) 351–357.
[54] H.A. Abbass, The self-adaptive Pareto differential evolution algorithm, in: Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02), IEEE, 2002, pp. 831–836.
[55] H.A. Abbass, A memetic Pareto evolutionary approach to artificial neural networks, in: M. Stumptner, D. Corbett, M.J. Brooks (Eds.), AI 2001: Advances in Artificial Intelligence, 14th Australian Joint Conference on Artificial Intelligence, Springer, London, 2001, pp. 1–6.
[56] X. Yao, Evolving artificial neural networks, Proceedings of the IEEE 87 (1999) 1423–1447.
[57] D. Floreano, P. Durr, C. Mattiussi, Neuroevolution: from architectures to learning, Evolutionary Intelligence 1 (2008) 47–62.
[58] J.B. Mouret, S. Doncieux, MENNAG: a modular, regular and hierarchical encoding for neural networks based on attribute grammars, Evolutionary Intelligence 1 (2008) 187–207.
[59] V. Plagianakos, D. Tasoulis, M. Vrahatis, A review of major application areas of differential evolution, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 197–238.
[60] A.M.F. Zarth, T.B. Ludermir, Optimization of neural networks weights and architecture: a multimodal methodology, in: Ninth International Conference on Intelligent Systems Design and Applications, 2009, pp. 209–214.
[61] M. Cruz-Ramirez, J. Sanchez-Monedero, F. Fernandez-Navarro, J. Fernandez, C. Hervas-Martinez, Memetic Pareto differential evolutionary artificial neural networks to determine growth multi-classes in predictive microbiology, Evolutionary Intelligence 3 (2010) 187–199.
[62] S. Rahnamayan, H. Tizhoosh, M. Salama, Opposition-based differential evolution (ODE) with variable jumping rate, in: IEEE Symposium on Foundations of Computational Intelligence, IEEE, 2007, pp. 81–88.
[63] S. Rahnamayan, H. Tizhoosh, M. Salama, Opposition-based differential evolution, in: U. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, 2008, pp. 155–171.
[64] H.R. Tizhoosh, Opposition-based learning: a new scheme for machine intelligence, in: International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, 2005, pp. 695–701.
[65] Q. Zhang, S. Sun, Weighted data normalization based on eigenvalues for artificial neural network classification, in: C. Leung, M. Lee, J. Chan (Eds.), Neural Information Processing, Springer, Berlin, 2009, pp. 349–356.
[66] L.M. Liebrock, Empirical sensitivity analysis for computational procedures, in: P.J. Williams, M.A. Friedman (Eds.), Proceedings of the Richard Tapia Celebration of Diversity in Computing Conference 2005, ACM, 2005, pp. 32–35.
[67] R. Tsaih, Sensitivity analysis, neural networks, and the finance, in: International Joint Conference on Neural Networks IJCNN'99, vol. 6, IEEE, 1999, pp. 3830–3835.
[68] A. Hunter, L. Kennedy, J. Henry, I. Ferguson, Application of neural networks and sensitivity analysis to improved prediction of trauma survival, Computer Methods and Programs in Biomedicine 62 (2000) 11–19.
[69] M. Gevrey, I. Dimopoulos, S. Lek, Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecological Modelling 160 (2003) 249–264.
[70] P.M. Szecowka, M.A. Mazurowski, A. Szczurek, B.W. Licznerski, On reliability of neural network sensitivity analysis applied for sensor array optimization, Sensors and Actuators B: Chemical 157 (2011) 298–303.
[71] M.A. Mazurowski, P.M. Szecowka, Limitations of sensitivity analysis for neural networks in cases with dependent inputs, in: IEEE International Conference on Computational Cybernetics 2006, IEEE, 2006, pp. 1–5.
[72] A.I. Galaction, D. Cascaval, M. Turnea, E. Folescu, Enhancement of oxygen mass transfer in stirred bioreactors using oxygen-vectors. 2. Propionibacterium shermanii broths, Bioprocess and Biosystems Engineering 27 (2005) 263–271.
[73] A.I. Galaction, D. Cascaval, C. Oniscu, M. Turnea, Predictions of oxygen mass transfer coefficients in stirred bioreactors for bacteria, yeasts and fungus broths, Biochemical Engineering Journal 20 (2004) 85–94.