


International Journal of Machine Learning and Computing, Vol. 5, No. 1, February 2015

Training Artificial Neural Network Using Modification of Differential Evolution Algorithm
Ngoc Tam Bui and Hiroshi Hasegawa

Abstract—Training an artificial neural network (ANN) is an optimization task in which the goal is to find the optimal weight and bias set of the network. There are many traditional methods for training ANNs, such as the Back Propagation (BP) algorithm, Levenberg-Marquardt (LM), Quasi-Newton (QN), the Genetic Algorithm (GA), etc. Traditional training algorithms may get stuck in local minima, while global search techniques may approach the global minimum only very slowly. Recently, the differential evolution (DE) algorithm has been used in many practical cases and has demonstrated good convergence properties. The DE algorithm has several control parameters that are kept fixed throughout the entire evolution process; however, these control parameters have to be tuned, which is not easy. Therefore, in this research we apply an improved self-adaptive strategy for controlling the parameters of the differential evolution algorithm (ISADE-ANN) to train neural networks. Experimental results show that the new algorithm ISADE-ANN has higher precision and better performance than traditional training algorithms.

Index Terms—Neural network training, differential evolution, global search, local search, multi-peak problems.

Manuscript received September 20, 2014; revised November 21, 2014. This work was supported in part by the U.S. Department of Commerce under Grant BS123456 (sponsor and financial support acknowledgment goes here).
Ngoc Tam Bui is with the Graduate School of Engineering and Science, Shibaura Institute of Technology, Japan (e-mail: [email protected]).
Hiroshi Hasegawa is with the College of Systems Engineering and Science, Shibaura Institute of Technology, Japan (e-mail: [email protected]).
DOI: 10.7763/IJMLC.2015.V5.473

I. INTRODUCTION

Artificial Neural Networks (ANNs) are widely applied in many fields of science, in pattern classification, function approximation, optimization, pattern matching and associative memories [1], [2]. Currently, many algorithms are used to train ANNs, such as the back propagation (BP) algorithm, Levenberg-Marquardt (LM), Quasi-Newton (QN), the genetic algorithm (GA), the simulated annealing (SA) algorithm, the particle swarm optimization (PSO) algorithm, the hybrid PSO-BP algorithm [3], the hybrid ABC-BP algorithm [4], and so on. Back propagation (BP) learning can realize the training of feed-forward multilayer neural networks. The algorithm mainly revises the network weights according to gradient descent in order to reduce the error. This kind of method is simple to compute, but it still has many drawbacks when used alone, for example low training speed, easy trapping in local minimum points, and poor global searching ability. Although many improvements have been made in this respect, such as introducing a momentum parameter, they do not solve the problem at its root. The ISADE algorithm [5] can overcome these barriers of the BP algorithm.

In the latest version of ISADE [5], our work was to improve self-adaptive differential evolution. To do this, three of DE's mutation scheme operators are selected as candidates because of their good performance on problems with different characteristics. These three mutation scheme operators are applied to individuals in the current population with the same probability. The scaling factor F is calculated by ranking the population and applying a sigmoid function that depends on the rank number and the population size, and the crossover control parameter CR is also adaptively changed, instead of taking fixed values, to deal with different classes of problems.

This paper is organized in the following manner. Section II describes training an artificial neural network. Section III gives a brief introduction to DE and related work on DE. Section IV describes ISADE. Section V applies ISADE to training some artificial neural networks. Finally, a few conclusions are given in Section VI.

II. TRAINING ARTIFICIAL NEURAL NETWORK

The neural network is a large-scale self-organizing and self-adapting nonlinear dynamic system. Artificial neural network technology is an effective way to solve complex nonlinear mapping problems. Among the numerous neural network models, the feed-forward multilayer neural network is currently one of the most widely used; many studies show that a three-layer feed-forward neural network can approximate any continuous function, and each of its derivatives, with arbitrary accuracy.

An ANN consists of a set of processing elements (Fig. 1), also known as neurons or nodes, which are interconnected with each other [6]. In feed-forward neural network models, shown in Fig. 2, each node receives a signal from the nodes in the previous layer, and each of those signals is multiplied by a separate weight value. The weighted inputs are summed and passed through a limiting function which scales the output to a fixed range of values. The output of the limiter is then broadcast to all of the nodes in the next layer. Applying the input values to the inputs of the first layer allows the signals to propagate through the network, and the output values are read, where the output of a node can be described by (1):

y_j = f_j( \sum_i w_{ij} x_i + b_j )    (1)

where y_j is the output of node j, x_i is the ith input to node j, w_{ij} is the connection weight between node i and node j, b_j is the threshold (or bias) of node j, and f_j is the node transfer function. Usually, the node transfer function is a nonlinear function such as a sigmoid function, a Gaussian function, etc.
In this paper, the logarithmic sigmoid function (2) is used:

f(x) = \frac{1}{1 + e^{-x}}    (2)

Fig. 1. Processing unit of an ANN (neuron).

The optimization goal is to minimize the objective function by optimizing the network weights. The mean square error (MSE), given by (3), is chosen as the network error function:

E(\vec{w}(t)) = \frac{1}{L} \sum_{l=1}^{L} \sum_{k} (d_k - o_k)^2    (3)

where E(\vec{w}(t)) is the error at the tth iteration, \vec{w}(t) is the weight vector at the tth iteration, d_k and o_k represent, respectively, the desired and actual values of the kth output node, and L is the number of training patterns.

Fig. 2. Multilayer feed-forward neural network (MLP).
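To make (1)-(3) concrete, the following minimal sketch (our own illustration; the function and variable names such as forward and mse are not from the paper) evaluates a small feed-forward network with logistic sigmoid nodes and computes the MSE over a set of training patterns. The weights and biases are kept in plain NumPy arrays, which is exactly the quantity a population-based trainer such as DE optimizes.

```python
import numpy as np

def logsig(x):
    # Logarithmic sigmoid transfer function of (2).
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Eq. (1) applied layer by layer: weighted sum plus bias, then transfer function.
    h = logsig(w_hidden @ x + b_hidden)   # hidden-layer outputs
    return logsig(w_out @ h + b_out)      # output-layer outputs

def mse(patterns, targets, w_hidden, b_hidden, w_out, b_out):
    # Mean square error of (3) over L training patterns.
    L = len(patterns)
    err = 0.0
    for x, d in zip(patterns, targets):
        o = forward(x, w_hidden, b_hidden, w_out, b_out)
        err += np.sum((d - o) ** 2)
    return err / L

# Example: a randomly initialized 2-2-1 network evaluated on the XOR patterns of Table I.
rng = np.random.default_rng(0)
w_h, b_h = rng.normal(size=(2, 2)), rng.normal(size=2)
w_o, b_o = rng.normal(size=(1, 2)), rng.normal(size=1)
X = [np.array(p, dtype=float) for p in ((0, 0), (0, 1), (1, 0), (1, 1))]
D = [np.array([t], dtype=float) for t in (0, 1, 1, 0)]
print(mse(X, D, w_h, b_h, w_o, b_o))
```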
III. THE DIFFERENTIAL EVOLUTION ALGORITHM

Differential evolution (DE), proposed by Storn and Price [7], is a very popular evolutionary algorithm (EA). Like other EAs, DE is a population-based stochastic search technique. It uses mutation, crossover and selection operators at each generation to move its population toward the global optimum.
A. Initialization in DE

The initial population is generated uniformly at random between the lower boundary (LB) and the upper boundary (UB):

X^{G=0}_{i,j} = lb_j + rand_j(0,1) * (ub_j - lb_j)    (4)

where rand_j(0,1) is a random number in [0, 1].
B. Mutation Operation

In this process, DE creates a mutant vector V^G_i = (v^G_{i,1}, ..., v^G_{i,D}) for each individual X^G_i (called a target vector) in the current population at each generation. There are several variants of DE; according to [7], [8], some of the mutation schemes are as follows:

DE/rand/1:         V^G_{i,j} = X^G_{r1,j} + F * (X^G_{r2,j} - X^G_{r3,j})    (5)

DE/best/1:         V^G_{i,j} = X^G_{best,j} + F * (X^G_{r1,j} - X^G_{r2,j})    (6)

DE/rand/2:         V^G_{i,j} = X^G_{r1,j} + F * (X^G_{r2,j} - X^G_{r3,j}) + F * (X^G_{r4,j} - X^G_{r5,j})    (7)

DE/best/2:         V^G_{i,j} = X^G_{best,j} + F * (X^G_{r1,j} - X^G_{r2,j}) + F * (X^G_{r3,j} - X^G_{r4,j})    (8)

DE/rand to best/1: V^G_{i,j} = X^G_{r1,j} + F * (X^G_{best,j} - X^G_{r1,j}) + F * (X^G_{r2,j} - X^G_{r3,j})    (9)

where r1, r2, r3, r4 and r5 are distinct integers randomly selected from the range [1, NP] that are also different from i. The parameter F is called the scaling factor and amplifies the difference vectors. X_{best} is the best individual in the current population.
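The mutation schemes (5)-(9) can be realized compactly on a NumPy population, as in the sketch below. This is our own illustration (the helper name mutate and its signature are assumptions, not the authors' code), assuming the population is an NP x D array, best is the current best individual and F is a scalar.

```python
import numpy as np

def mutate(pop, best, F, i, rng, scheme="rand/1"):
    # pop: (NP, D) array of target vectors, best: (D,) best individual.
    # Requires NP >= 6 so that five distinct indices other than i exist.
    NP = len(pop)
    r1, r2, r3, r4, r5 = rng.choice([k for k in range(NP) if k != i],
                                    size=5, replace=False)
    if scheme == "rand/1":          # eq. (5)
        return pop[r1] + F * (pop[r2] - pop[r3])
    if scheme == "best/1":          # eq. (6)
        return best + F * (pop[r1] - pop[r2])
    if scheme == "rand/2":          # eq. (7)
        return pop[r1] + F * (pop[r2] - pop[r3]) + F * (pop[r4] - pop[r5])
    if scheme == "best/2":          # eq. (8)
        return best + F * (pop[r1] - pop[r2]) + F * (pop[r3] - pop[r4])
    if scheme == "rand-to-best/1":  # eq. (9)
        return pop[r1] + F * (best - pop[r1]) + F * (pop[r2] - pop[r3])
    raise ValueError(f"unknown scheme: {scheme}")
```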
C. Crossover Operation

After the mutation process, DE performs a binomial crossover operation on X^G_i and V^G_i to generate a trial vector U^G_i = (u^G_{i,1}, ..., u^G_{i,D}) for each particle i, as shown in (10):

u^G_{i,j} = v^G_{i,j}  if rand_j(0,1) <= CR or j = j_rand;   x^G_{i,j}  otherwise    (10)

where i = 1, ..., NP and j = 1, ..., D; j_rand is a randomly chosen integer in [1, D]; rand_j(0,1) is a uniformly distributed random number between 0 and 1 generated for each j; and CR in [0, 1] is called the crossover control parameter. Due to the use of j_rand, the trial vector U^G_i always differs from the target vector X^G_i.
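A minimal sketch of the binomial crossover of (10), using our own helper name, is:

```python
import numpy as np

def binomial_crossover(target, mutant, CR, rng):
    # Eq. (10): take the mutant component where rand_j <= CR, otherwise keep
    # the target component; index j_rand forces at least one mutant component through.
    D = len(target)
    j_rand = rng.integers(D)
    mask = rng.random(D) <= CR
    mask[j_rand] = True
    return np.where(mask, mutant, target)
```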

D. Selection Operation

The selection operator selects the better of the target vector X^G_i and the trial vector U^G_i to enter the next generation:

X^{G+1}_i = U^G_i  if f(U^G_i) <= f(X^G_i);   X^G_i  otherwise    (11)

where i = 1, ..., NP and X^{G+1}_i is the target vector in the next population.
E. Algorithm 1: The DE Algorithm

Requirements: Max_Cycles, number of particles NP, crossover constant CR and scaling factor F.

Begin
Step 1: Initialize the population
Step 2: Evaluate the population
Step 3: Cycle = 1
Step 4: While Cycle <= Max_Cycles, for each individual X^G_i do
Step 5: Mutation: DE creates a mutant vector V^G_i using equations (5) to (9), depending on the mutation scheme
Step 6: Crossover: DE creates a trial vector U^G_i using equation (10)
Step 7: Greedy selection: to decide whether it should become a member of generation G + 1 (the next generation), the trial vector U^G_i is compared to the target vector X^G_i using (11)
Step 8: Memorize the best solution found so far
Step 9: Cycle = Cycle + 1
Step 10: End while
Step 11: Return the best solution
End
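Putting the pieces together, Algorithm 1 can be written as the short loop below. It reuses the mutate and binomial_crossover helpers sketched above and is our own illustrative implementation (the function name de and its default settings are assumptions), not the authors' code.

```python
import numpy as np

def de(fitness, lb, ub, NP=40, F=0.5, CR=0.9, max_cycles=200, seed=0):
    # Canonical DE loop (Algorithm 1): initialize with (4), then mutate,
    # cross over and greedily select until the cycle budget is exhausted.
    rng = np.random.default_rng(seed)
    D = len(lb)
    pop = lb + rng.random((NP, D)) * (ub - lb)              # eq. (4)
    fit = np.array([fitness(x) for x in pop])
    for _ in range(max_cycles):
        best = pop[np.argmin(fit)]
        for i in range(NP):
            v = mutate(pop, best, F, i, rng, scheme="rand/1")   # eqs. (5)-(9)
            u = binomial_crossover(pop[i], v, CR, rng)          # eq. (10)
            fu = fitness(u)
            if fu <= fit[i]:                                    # eq. (11)
                pop[i], fit[i] = u, fu
    return pop[np.argmin(fit)], fit.min()
```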
F. Related Work on DE

This section reviews some papers that compared different extensions of DE with the original DE. After that, we concentrate on papers that deal with parameter control in DE.

There has been much research on controlling the search parameters of DE. The DE control parameters include NP, F and CR.

R. Storn and K. Price [7] argued that these three control parameters are not difficult to set for obtaining good performance. They suggested that NP should be between 5D and 10D, that F = 0.5 is a good initial choice and that values of F smaller than 0.4 or larger than 1.0 lead to performance degradation, and that CR can be set to 0.1 or 0.9.

Omar S. Soliman and Lam T. Bui [9] introduced a self-adaptive approach to the DE parameters using a variable step length generated by a Gaussian distribution; adaptive mutation amplification and crossover parameters were also introduced. These parameters are evolved during the optimization process.

A. K. Qin and P. N. Suganthan [10] proposed SaDE, with a new choice of learning strategy in which the two control parameters F and CR do not require predetermining. During evolution, the authors considered allowing F to take different random values in the range (0, 2] with a normal distribution of mean 0.5 and standard deviation 0.3 for different individuals in the current population. For CR, the authors assumed CR to be normally distributed with center CRm and standard deviation 0.1. The CR values associated with trial vectors that successfully enter the next generation are recorded. After the CR values have been resampled several times over a specified number of generations under the same normal distribution with center CRm and standard deviation 0.1, the authors recalculate CRm from all the recorded CR values corresponding to successful trial vectors during this period.

J. Liu and J. Lampinen [11] presented an algorithm based on Fuzzy Logic Control (FLC) in which the step length was controlled using a single FLC. Its two inputs were the linearly depressed parameter-vector change and the function-value change over the whole population between the current generation and the last generation.

J. Teo [12] proposed dynamic self-adaptive populations in differential evolution, in addition to self-adapting crossover and mutation rates, and showed that DE with self-adaptive populations produced highly competitive results compared to a conventional DE algorithm with static populations.

J. Brest et al. [13] presented another variant of the DE algorithm, jDE, which applies self-adaptive mechanisms to the control parameters: the step length F and the crossover rate CR are carried as factors in each parent vector and renewed as follows:

F^{G+1}_i = F_l + rand_1 * F_u  if rand_2 < tau_1;   F^G_i  otherwise    (12)

CR^{G+1}_i = rand_3  if rand_4 < tau_2;   CR^G_i  otherwise    (13)

where rand_1, rand_2, rand_3 and rand_4 are uniform random values in [0, 1], and tau_1 and tau_2 represent the probabilities of adjusting the factors F and CR, respectively; the authors set tau_1 = tau_2. Because F_l = 0.1 and F_u = 0.9, the new F takes a value in [0.1, 1.0] in a random manner, and the new CR takes a value in [0, 1]. F^{G+1}_i and CR^{G+1}_i are obtained before the mutation process.

Through reviewing the related work, we understood that it is difficult to select the DE learning strategies in the mutation operator and the DE control parameters. To overcome this drawback, in this research we propose the Improvement of Self-Adapting control parameters in Differential Evolution (ISADE), a new version of DE. The details of ISADE are presented in the next section.
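For reference, the jDE rules (12)-(13) amount to only a few lines of code. The sketch below is our own illustration; it assumes the commonly reported jDE setting tau_1 = tau_2 = 0.1 together with F_l = 0.1 and F_u = 0.9.

```python
import numpy as np

def jde_update(F_i, CR_i, rng, F_l=0.1, F_u=0.9, tau1=0.1, tau2=0.1):
    # Eqs. (12)-(13): occasionally resample F and CR for one individual,
    # otherwise inherit the previous values (jDE-style self-adaptation).
    if rng.random() < tau1:
        F_i = F_l + rng.random() * F_u     # new F in [0.1, 1.0]
    if rng.random() < tau2:
        CR_i = rng.random()                # new CR in [0, 1]
    return F_i, CR_i
```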


IV. IMPROVEMENT OF SELF-ADAPTING CONTROL PARAMETERS IN DIFFERENTIAL EVOLUTION

To achieve good performance on a specific problem using the original DE algorithm, we need to try all of the available learning strategies in the mutation operator (usually the five mentioned above) and fine-tune the corresponding critical control parameters NP, F and CR. From experiment we know that the performance of the original DE algorithm is highly dependent on the chosen strategies and parameter settings. Although we may find the most suitable strategy and the corresponding control parameters for a specific problem, this may require a huge amount of computation time. Also, during different stages of evolution, different strategies and corresponding parameter settings with different global and local search capabilities might be preferred. Therefore, to overcome this drawback, we attempt to develop a new version of the DE algorithm that can automatically adapt the learning strategies and parameter settings during evolution. The main ideas of the ISADE algorithm are summarized below.
A. Adaptive Selection of Learning Strategies in the Mutation Operator

ISADE probabilistically selects one out of several available learning strategies in the mutation operator for each individual in the current population. Hence, we should have several candidate learning strategies available to be chosen, and we also need a procedure for determining the probability of applying each learning strategy. In this research, we select three learning strategies in the mutation operator as candidates, "DE/best/1/bin", "DE/best/2/bin" and "DE/rand to best/1/bin", which are respectively expressed as:

DE/best/1:         V^G_{i,j} = X^G_{best,j} + F * (X^G_{r1,j} - X^G_{r2,j})    (14)

DE/best/2:         V^G_{i,j} = X^G_{best,j} + F * (X^G_{r1,j} - X^G_{r2,j}) + F * (X^G_{r3,j} - X^G_{r4,j})    (15)

DE/rand to best/1: V^G_{i,j} = X^G_{r1,j} + F * (X^G_{best,j} - X^G_{r1,j}) + F * (X^G_{r2,j} - X^G_{r3,j})    (16)

The reason for this choice is that these three strategies have been commonly used in the DE literature and are reported to perform well on problems with distinct characteristics [7], [8]. Among them, the "DE/rand to best/1/bin" strategy usually maintains good diversity, while the "DE/best/1/bin" and "DE/best/2/bin" strategies show good convergence, which we also observe in our experiments.

Since we have three candidate strategies here, the probability of applying each strategy to a particle in the current population is p_i, with the same value p_1 = p_2 = p_3 = 1/3. With this learning-strategy selection in the mutation operator, the procedure can gradually apply the most suitable learning strategy at different learning stages for the problem under consideration.
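With equal probabilities p_1 = p_2 = p_3 = 1/3, the strategy choice reduces to a uniform draw over the three candidates. A minimal sketch on top of the mutate helper above (the names below are our own) is:

```python
import numpy as np

ISADE_SCHEMES = ("best/1", "best/2", "rand-to-best/1")   # eqs. (14)-(16)

def isade_mutate(pop, best, F, i, rng):
    # Each candidate strategy is applied with the same probability 1/3.
    scheme = ISADE_SCHEMES[rng.integers(len(ISADE_SCHEMES))]
    return mutate(pop, best, F, i, rng, scheme=scheme)
```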

B. Adaptive Scaling Factor F

In the multi-point search of DE, particles move from their current points to new search points in the space of design variables. For example, as shown in Fig. 3, particle A requires only a slight change to the values of its design variables to reach the global optimum solution. On the other hand, particle B cannot reach the global optimum solution without a significant change, and particle C has landed in a local optimum solution. Such a situation, in which good individuals and poor individuals are intermingled, can occur at any time in the search process. Therefore, we have to recognize each individual's situation and provide a suitable design-variable generation process for each individual's situation in the design space.

Fig. 3. Example of individual situations (fitness value versus design variable x): individual A needs only a small step, individual B needs a large step, and individual C is caught in a local minimum near the global minimum point.
In the DE algorithm, the distance moved by a search point can be changed by controlling the F factor, which determines the neighborhood range. To do this, S. Tooyama and H. Hasegawa [14] proposed the APGA/VNC approach, in which a sigmoid function is used to control the neighborhood parameter. In this paper, we sort all the particles by evaluating their fitness. A ranked particle is labeled with its rank number and assigned an F that corresponds to this number. The formula for F based on the sigmoid function is as follows:

F_i = \frac{1}{1 + \exp( \alpha * (i - NP/2) / NP )}    (17)

where \alpha denotes the gain of the sigmoid function and i the rank of the ith particle in NP.

The shape of the F curve depends on the sign and gain of \alpha (Fig. 4). A particle with good fitness, such as particle A in Fig. 3, receives a small step size for the F factor, and vice versa. From this point of view, the ISADE method automatically adapts the F factor to obtain a suitable design-variable generation accuracy for each individual's situation and fitness. As a result, we believe that it will steadily provide a global optimum solution and reduce the calculation cost.

Fig. 4. Suggested sigmoid-based calculation of the F value.

For better performance of ISADE, the scale factor F should be high at the beginning to provide more exploration, and after a certain number of generations F needs to be small for proper exploitation. To implement this, we use a new approach to calculate the scale factor F as follows:

F^{iter}_{mean} = F_{min} + (F_{max} - F_{min}) * ( (iter_{max} - iter) / iter_{max} )^{n^{iter}}    (18)

where F_{min}, F_{max}, iter_{max}, iter and n^{iter} denote the lower and upper boundary conditions of F, the maximum generation, the current generation and the nonlinear modulation index, respectively. From our experiments we assign F_{min} = 0.15 and F_{max} = 1.55.

To control F^{iter}_{mean}, we vary the nonlinear modulation index n^{iter} with the generation as follows:

n^{iter} = n_{min} + (n_{max} - n_{min}) * iter / iter_{max}    (19)

where n_{max} and n_{min} are typically chosen in the range (0, 15]. After a number of experiments on the values of n_{max} and n_{min}, we found that the best choice for them is 0.2 and 6.0. The shape of the F^{iter}_{mean} curve, which depends on the iteration number and the nonlinear modulation index n^{iter}, is shown in Fig. 5.

Fig. 5. Scale factor F^{iter}_{mean} versus generation for varying n^{iter} (n_min = 0.2, n_max = 6.0).

We introduced a novel approach in which the scale factor F_i of each particle follows its fitness value through (17). Thus, in one generation the values of F_i (i = 1, ..., NP) are not the same for all particles in the population; rather, they vary across particles in each generation. Considering F^{iter}_{mean} of (18) as an average value assigned to each generation, the final value of the scale factor for each particle in each generation is calculated as follows:

F^{iter}_i = ( F_i + F^{iter}_{mean} ) / 2    (20)

where iter = 1, ..., iter_{max} and i = 1, ..., NP.
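The sketch below combines (17)-(20) into a single helper that returns the per-particle scale factor. It is our own illustration and, since these equations were partially garbled in the source, the exact expressions used here should be read as assumptions; in particular, the sigmoid gain alpha is not given a numerical value in the text, so the default below is only a placeholder.

```python
import numpy as np

def isade_scale_factors(rank, NP, iteration, iter_max,
                        alpha=6.0, F_min=0.15, F_max=1.55,
                        n_min=0.2, n_max=6.0):
    # rank: 0-based fitness rank of the particle (0 = best); alpha is a placeholder gain.
    # Per-particle sigmoid term of (17), driven by the particle's rank.
    F_i = 1.0 / (1.0 + np.exp(alpha * (rank - NP / 2.0) / NP))
    # Generation-level mean scale factor of (18) with the nonlinear
    # modulation index of (19), so F decays from F_max toward F_min.
    n_iter = n_min + (n_max - n_min) * iteration / iter_max
    F_mean = F_min + (F_max - F_min) * ((iter_max - iteration) / iter_max) ** n_iter
    # Final per-particle scale factor of (20): average of the two terms.
    return 0.5 * (F_i + F_mean)
```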


C. Adaptive Crossover Control Parameter CR

Ref. [15] suggested counting a success whenever a child substitutes its parent in the next generation. The minimum, maximum and median values over such a set of successes are used for this purpose:
- to detect a separable problem, choosing a binomial crossover operator with low values of CR;
- to detect non-separable problems, choosing a binomial crossover operator with high values of CR.

In this way, the algorithm is able to detect whether high values of CR are useful and, furthermore, whether a rotationally invariant crossover is required. A minimum base for CR around its median value is incorporated to avoid stagnation around a single value; Fig. 6 shows this principle. We therefore propose the following adaptive mechanism for the crossover. The control parameter CR is first adapted as follows:

CR^{G+1}_i = rand_1  if rand_2 < tau;   CR^G_i  otherwise    (21)

where rand_1 and rand_2 are uniform random values in [0, 1] and tau represents the probability of adjusting CR; as in [5], we assign tau = 0.10.

After that, CR is adjusted so that it takes one of three levels:

CR^{G+1}_i in {CR_{min}, CR_{med}, CR_{max}}    (22)

where CR_{min}, CR_{med} and CR_{max} denote the low, median and high values of the crossover parameter, respectively. From our experiments over many trials, we assign CR_{min} = 0.05, CR_{med} = 0.50 and CR_{max} = 0.95.
Fig. 6. Suggested CR values: for independent (separable) problems, low CR values near CR_min are highly suggested and high values are not recommended; for dependent (non-separable) problems the reverse holds, with CR_med as an intermediate base.

The purpose of our approach is that the user does not need to tune good values for F and CR, which are problem dependent. The rules for improving the self-adapting control parameters are quite simple; therefore the new version of the DE algorithm does not increase the time complexity in comparison with the original DE algorithm.
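A sketch of the CR adaptation of (21)-(22) is shown below. It is our own illustration: (21) follows the text directly, while the exact quantization rule of (22) was not legible in the source, so snapping CR to the nearest of the three levels is an assumption.

```python
import numpy as np

def isade_update_cr(CR_i, rng, tau=0.10,
                    CR_min=0.05, CR_med=0.50, CR_max=0.95):
    # Eq. (21): with probability tau, resample CR uniformly in [0, 1].
    if rng.random() < tau:
        CR_i = rng.random()
    # Eq. (22): restrict CR to one of three levels (CR_min, CR_med, CR_max).
    # The exact mapping was garbled in the source; snapping to the nearest
    # level is an assumption, not the authors' stated rule.
    levels = np.array([CR_min, CR_med, CR_max])
    return float(levels[np.argmin(np.abs(levels - CR_i))])
```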

D. Algorithm 2: The ISADE Algorithm

Requirements: Max_Cycles, number of particles NP.

Begin
Step 1: Initialize the population
Step 2: Evaluate and rank the population
Step 3: Cycle = 1
Step 4: While Cycle <= Max_Cycles, for each individual X^G_i do
Step 5: Adapt the scaling factor F by (17), (18) and (20)
Step 6: Adapt the crossover control parameter CR by (21) and (22)
Step 7: Mutation: adaptive selection of the learning strategy in the mutation operator
Step 8: Crossover: DE creates a trial vector U^G_i using (10)
Step 9: Selection: to decide whether it should become a member of generation G + 1 (the next generation), the trial vector U^G_i is compared to the target vector X^G_i using (11)
Step 10: Memorize the best solution found so far
Step 11: Cycle = Cycle + 1
Step 12: End while
Step 13: Return the best solution
End
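Assembling the helpers above gives a compact sketch of Algorithm 2. This is our own illustration of how the adaptive F, adaptive CR and strategy selection fit into the DE loop, not the authors' implementation; in particular, starting every particle at CR_med is an assumption.

```python
import numpy as np

def isade(fitness, lb, ub, NP=40, max_cycles=400, seed=0):
    # Algorithm 2 assembled from the helpers sketched above: per-particle F
    # from rank and generation, per-particle CR, and a randomly chosen
    # mutation strategy among (14)-(16).
    rng = np.random.default_rng(seed)
    D = len(lb)
    pop = lb + rng.random((NP, D)) * (ub - lb)        # eq. (4)
    fit = np.array([fitness(x) for x in pop])
    CR = np.full(NP, 0.50)                            # start at CR_med (assumption)
    for cycle in range(max_cycles):
        order = np.argsort(fit)                       # rank the population
        rank_of = np.empty(NP, dtype=int)
        rank_of[order] = np.arange(NP)
        best = pop[order[0]]
        for i in range(NP):
            F_i = isade_scale_factors(rank_of[i], NP, cycle, max_cycles)
            CR[i] = isade_update_cr(CR[i], rng)
            v = isade_mutate(pop, best, F_i, i, rng)          # eqs. (14)-(16)
            u = binomial_crossover(pop[i], v, CR[i], rng)     # eq. (10)
            fu = fitness(u)
            if fu <= fit[i]:                                  # eq. (11)
                pop[i], fit[i] = u, fu
    return pop[np.argmin(fit)], fit.min()
```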
V. EXPERIMENTS

We apply our ISADE to training some neural networks, as in [4], on the XOR, 3-Bit Parity and 4-Bit Encoder-Decoder problems. These experiments involved 30 trials for each problem. The initial seed number was varied randomly during each trial.

Three-layer feed-forward neural networks are used for each problem, i.e. one hidden layer plus the input and output layers. In the network structures, bias nodes are also applied, and the sigmoid function is used as the activation function of the hidden nodes.
A. The Exclusive-OR Problem

The first test problem is the exclusive OR (XOR) Boolean function, a difficult classification problem that maps two binary inputs to a single binary output, as shown in Table I. In the simulations we used a 2-2-1 feed-forward neural network with six connection weights and no biases (six parameters, XOR6); a 2-2-1 feed-forward neural network with six connection weights and three biases (nine parameters, XOR9); and a 2-3-1 feed-forward neural network with nine connection weights and four biases, thirteen parameters in total (XOR13). For the XOR6, XOR9 and XOR13 problems, the parameter ranges [-100, 100], [-10, 10] and [-10, 10] are used, respectively. The maximum number of iterations was 200. (A fitness-function sketch for XOR6 follows Table I below.)

TABLE I: BINARY XOR PROBLEM
Input 1  Input 2  Output
0        0        0
0        1        1
1        0        1
1        1        0
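Training a network with ISADE only requires expressing the network error (3) as a fitness function of the flat parameter vector. The sketch below does this for the XOR6 configuration (2-2-1, six weights, no biases); the names are our own, and the commented call shows how it could plug into the isade sketch above.

```python
import numpy as np

# XOR patterns of Table I.
XOR_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
XOR_D = np.array([0, 1, 1, 0], dtype=float)

def xor6_fitness(params):
    # XOR6: a 2-2-1 network with six connection weights and no biases.
    # The DE/ISADE design vector is simply the flat weight vector.
    w_hidden = params[:4].reshape(2, 2)    # input -> hidden weights
    w_out = params[4:6]                    # hidden -> output weights
    h = 1.0 / (1.0 + np.exp(-(XOR_X @ w_hidden.T)))   # eqs. (1)-(2)
    o = 1.0 / (1.0 + np.exp(-(h @ w_out)))
    return np.mean((XOR_D - o) ** 2)                   # eq. (3)

# Training XOR6 with the ISADE sketch, parameter range [-100, 100]:
# best_w, best_mse = isade(xor6_fitness, lb=-100 * np.ones(6),
#                          ub=100 * np.ones(6), NP=40, max_cycles=200)
```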
B. The 3-Bit Parity Problem

The second test problem is the three-bit parity problem: taking the sum of three binary inputs modulo 2. In other words, if the number of inputs equal to 1 is odd the output is 1, otherwise it is 0, as shown in Table II. We use a 3-3-1 feed-forward neural network structure for the 3-Bit Parity problem. The parameter range was [-10, 10] for this problem. The maximum number of iterations was 400.

TABLE II: 3-BIT PARITY PROBLEM
Input 1  Input 2  Input 3  Output
0        0        0        0
0        0        1        1
0        1        0        1
0        1        1        0
1        0        0        1
1        0        1        0
1        1        0        0
1        1        1        1


C. The 4-Bit Encoder-Decoder Problem

The third problem is the 4-bit encoder/decoder problem. The network is presented with 4 distinct input patterns, each having only one bit turned on. The output is a duplication of the inputs, as shown in Table III. A 4-2-4 feed-forward neural network structure is used for this problem. The parameter range is [-50, 50]. The maximum number of iterations was 400.
TABLE III: 4-BIT ENCODER-DECODER PROBLEM
Input 1 Input 2 Input 3 Input 4 Output 1 Output 2 Output 3 Output 4
0 0 0 1 0 0 0 1
0 0 1 0 0 0 1 0
0 1 0 0 0 1 0 0
1 0 0 0 1 0 0 0

D. Results of the Experiments

The results, given in Table IV, show that the new algorithm ISADE has faster convergence and obtains a smaller mean square error; it is superior to the reference algorithms. The convergence to the optimal solution is improved significantly with ISADE compared with the references.

TABLE IV: MEAN AND STANDARD DEVIATION OF MSE FOR EACH ALGORITHM
Problem     Mean/std  ABC        ABC-LM      LM        ISADE
XOR6        mean      0.007051   0.000752    0.110700  1.1954E-21
XOR6        std       0.00223    0.000980    0.063700  5.3828E-21
XOR9        mean      0.006956   2.1246E-09  0.049100  2.9189E-17
XOR9        std       0.002402   1.9579E-10  0.064600  1.4827E-16
XOR13       mean      0.006079   2.6111E-09  0.007800  6.5278E-10
XOR13       std       0.003182   1.2586E-09  0.022300  3.5487E-09
3-Bit Par.  mean      0.006679   6.3156E-07  0.020900  5.3143E-15
3-Bit Par.  std       0.002820   3.3189E-06  0.043000  1.7173E-14
Enc.Dec.    mean      0.008191   1.3007E-06  0.024300  9.8123E-17
Enc.Dec.    std       0.001864   8.8443E-07  0.042400  4.5439E-16

VI. CONCLUSIONS

Locating global minima is a very challenging task for any minimization method. In this research, a new improvement of self-adaptive differential evolution is proposed. The main ideas are that the three mutation scheme operators are applied to individuals in the current population with the same probability, that the scale factor F is adaptively calculated by a sigmoid function after ranking the population by fitness value, and that the control parameter CR is adjusted to balance the exploitation and exploration abilities of DE.

The new algorithm ISADE is used to train feed-forward artificial neural networks on the XOR, 3-Bit Parity and 4-Bit Encoder-Decoder benchmark problems. The results of the experiments show that ISADE performs better than the reference algorithms. In future work we plan to apply the new ISADE algorithm to training neural networks on high-dimensional classification and approximation benchmark problems.
REFERENCES
[1] J. Dayhoff, Neural Network Architectures: An Introduction, New York: Van Nostrand Reinhold, 1990.
[2] K. Mehrotra, C. K. Mohan, and S. Ranka, Elements of Artificial Neural Networks, Cambridge, MA: MIT Press, 1997.
[3] J. Zhang, J. Zhang, T. Lok, and M. Lyu, "A hybrid particle swarm optimization back propagation algorithm for feed forward neural network training," Applied Mathematics and Computation, Elsevier, 2007.
[4] C. Ozturk and D. Karaboga, "Hybrid artificial bee colony algorithm for neural network training," presented at the IEEE Congress on Evolutionary Computation (CEC), 2011.
[5] T. Bui, H. Pham, and H. Hasegawa, "Improve self-adaptive control parameters in differential evolution for solving constrained engineering optimization problems," Journal of Computational Science and Technology, vol. 7, no. 1, pp. 59-74, July 2013.
[6] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, no. 9, pp. 1423-1447, 1999.
[7] R. Storn and K. Price, "Differential evolution – A simple and efficient adaptive scheme for global optimization over continuous spaces," Technical Report TR-95-012, 1995.
[8] R. Storn and K. Price, "Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, pp. 341-359, 1997.
[9] O. S. Soliman and L. T. Bui, "A self-adaptive strategy for controlling parameters in differential evolution," in Proc. IEEE World Congress on Computational Intelligence, 2008, pp. 2837-2842.
[10] A. K. Qin and P. N. Suganthan, "Self-adaptive differential evolution algorithm for numerical optimization," in Proc. IEEE Congress on Evolutionary Computation, vol. 2, 2005, pp. 1785-1791.
[11] J. Liu and J. Lampinen, "A fuzzy adaptive differential evolution algorithm," Soft Computing - A Fusion of Foundations, Methodologies and Applications, vol. 9, no. 6, pp. 448-462, 2005.
[12] J. Teo, "Exploring dynamic self-adaptive populations in differential evolution," Soft Computing, vol. 10, no. 8, pp. 673-686, 2006.
[13] S. Greiner, B. Bokovic, M. Mernik, J. Brest, and V. Zumer, "Performance comparison of self-adaptive and adaptive differential evolution algorithms," Soft Computing, vol. 11, no. 7, pp. 617-629, 2007.
[14] H. Hasegawa and S. Tooyama, "Adaptive plan system with genetic algorithm using the variable neighborhood range control," in Proc. IEEE Congress on Evolutionary Computation, 2007, pp. 846-853.
[15] R. Meza, G. Sanchis, J. Blasco, and X. Herrero, "Hybrid DE algorithm with adaptive crossover operator for solving real-world numerical optimization problems," in Proc. IEEE Congress on Evolutionary Computation, 2011, pp. 1551-1556.

Bui Ngoc Tam received the B.E. and master's degrees in 2008 and 2012, respectively, from Shibaura Institute of Technology, Japan. Currently, he is a third-year PhD student at the Graduate School of Engineering, Shibaura Institute of Technology, Japan. His research interests include optimization system design, biomimetics, swarm intelligence and evolutionary algorithms.

Hiroshi Hasegawa received his B.E. and M.E. degrees in 1992 and 1994, respectively, from Shibaura Institute of Technology, Japan. He received his Dr. Eng. in mechanical engineering from Tokyo Institute of Technology, Japan, in 1998. He has been working at Shibaura Institute of Technology and is currently a professor. He is a member of JSME, ASME, JSCES and JSST. His research interests include computer-aided exploration and creativity of design.
