Tuning of the Structure and Parameters of a Neural Network Using an Improved Genetic Algorithm

F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam

IEEE Transactions on Neural Networks, vol. 14, no. 1, January 2003
Abstract—This paper presents the tuning of the structure and parameters of a neural network using an improved genetic algorithm (GA). It will also be shown that the improved GA performs better than the standard GA based on some benchmark test functions. A neural network with switches introduced to its links is proposed. By doing this, the proposed neural network can learn both the input–output relationships of an application and the network structure using the improved GA. The number of hidden nodes is chosen manually by increasing it from a small number until the learning performance in terms of fitness value is good enough. Application examples on sunspot forecasting and associative memory are given to show the merits of the improved GA and the proposed neural network.

Index Terms—Genetic algorithm (GA), neural networks, parameter learning, structure learning.

I. INTRODUCTION

GENETIC algorithm (GA) is a directed random search technique [1] that is widely applied in optimization problems [1], [2], [5]. It is especially useful for complex optimization problems where the number of parameters is large and analytical solutions are difficult to obtain. GA can help to find the optimal solution globally over a domain [1], [2], [5]. It has been applied in different areas such as fuzzy control [9]–[11], [15], path planning [12], greenhouse climate control [13], and modeling and classification [14].

A lot of research effort has been spent on improving the performance of GA. Different selection schemes and genetic operators have been proposed. Selection schemes such as rank-based selection, elitist strategies, steady-state selection, and tournament selection have been reported [32]. There are two kinds of genetic operators, namely crossover and mutation. Apart from random mutation and crossover, other crossover and mutation mechanisms have been proposed. For crossover mechanisms, two-point crossover, multipoint crossover, arithmetic crossover, and heuristic crossover have been reported [1], [31]–[33]. For mutation mechanisms, boundary mutation, uniform mutation, and nonuniform mutation can be found [1], [31]–[33].

A neural network was proved to be a universal approximator [16]: a three-layer feedforward neural network can approximate any nonlinear continuous function to an arbitrary accuracy. Neural networks are widely applied in areas such as prediction [7], system modeling, and control [16]. Owing to its particular structure, a neural network is very good at learning [2] using learning algorithms such as GA [1] and backpropagation [2]. In general, the learning steps of a neural network are as follows. First, a network structure is defined with a fixed number of inputs, hidden nodes, and outputs. Second, an algorithm is chosen to realize the learning process. However, a fixed structure may not provide the optimal performance within a given training period. A small network may not provide good performance owing to its limited information-processing power. A large network, on the other hand, may have some of its connections redundant [18], [19]. Moreover, the implementation cost for a large network is high. To obtain the network structure automatically, constructive and destructive algorithms can be used [18]. A constructive algorithm starts with a small network; hidden layers, nodes, and connections are added to expand the network dynamically [19]–[24]. A destructive algorithm starts with a large network; hidden layers, nodes, and connections are then deleted to contract the network dynamically [25], [26]. The design of a network structure can also be formulated as a search problem, and GAs [27], [28] have been employed to obtain the solution. Pattern-classification approaches [29] can also be found for designing the network structure. Some other methods were proposed to learn both the network structure and the connection weights. The ANNA ELEONORA algorithm was proposed in [36]; a new genetic operator and encoding procedures that allow an opportune length of the coding string were introduced. Each gene consists of two parts: the connectivity bits and the connection weight bits. The former indicates the absence or presence of a link, and the latter indicates the value of the weight of the link. The GNARL algorithm was proposed in [37]. The number of hidden nodes and connection links for each network is first randomly chosen within some defined ranges. Three steps are then used to generate an offspring: copying the parents, determining the mutations to be performed, and mutating the copy. The mutation of a copy is separated into two classes: parametric mutations that alter the connection weights, and structural mutations that alter the number of hidden nodes and the presence of network links. An evolutionary system named EPNet can also be found for evolving neural networks [19]; rank-based selection and five mutations were employed to modify the network structure and connection weights.
In this paper, a three-layer neural network with switches introduced in some links is proposed to facilitate the tuning of the network structure in a simple manner. A given fully connected feedforward neural network may become a partially connected network after learning. This implies that the cost of implementing the proposed neural network, in terms of hardware and processing time, can be reduced. The network structure and parameters are tuned simultaneously using the improved GA.
The cumulative probability $\hat{q}_i$ for the chromosome $\mathbf{p}_i$ is defined as

$\hat{q}_i = \sum_{j=1}^{i} q_j$, with $q_j = f(\mathbf{p}_j) \Big/ \sum_{k} f(\mathbf{p}_k)$   (6)

where $f(\cdot)$ denotes the fitness function and the sum in the denominator runs over all chromosomes in the population. The selection process starts by randomly generating a nonzero floating-point number $d \in (0, 1]$. Then, the chromosome $\mathbf{p}_i$ is chosen if $\hat{q}_{i-1} < d \le \hat{q}_i$ (with $\hat{q}_0 = 0$). It can be observed from this selection process that a chromosome having a larger fitness value will have a higher chance of being selected. Consequently, the best chromosomes will get more offspring, the average will stay, and the worst will die off. In the selection process, only two chromosomes will be selected to undergo the genetic operations.
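As an illustration of this selection scheme, the following Python sketch draws a parent by roulette-wheel selection using the cumulative probabilities of (6). It is only a sketch of the idea described above; the function names, the NumPy usage, and the toy population are assumptions, not code from the paper.

```python
import numpy as np

def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness,
    using the cumulative probabilities q_hat of (6)."""
    q_hat = np.cumsum(fitness / fitness.sum())       # cumulative probabilities
    d = rng.uniform(1e-12, 1.0)                       # nonzero number in (0, 1]
    i = min(np.searchsorted(q_hat, d), len(q_hat) - 1)
    return population[i]

# Toy usage: select the two parents that undergo the genetic operations.
rng = np.random.default_rng(0)
pop = rng.uniform(-1.0, 1.0, size=(10, 5))            # 10 chromosomes, 5 genes each
fit = rng.uniform(0.1, 1.0, size=10)                  # assumed positive fitness values
p1, p2 = roulette_select(pop, fit, rng), roulette_select(pop, fit, rng)
```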
D. Genetic Operations

The genetic operations are to generate some new chromosomes (offspring) from their parents after the selection process. They include the crossover and the mutation operations.

1) Crossover: The crossover operation is mainly for exchanging information between the two parents, chromosomes $\mathbf{p}_1$ and $\mathbf{p}_2$, obtained in the selection process. The two parents will produce one offspring. First, four chromosomes will be generated according to the following mechanisms:

$\mathbf{os}_c^1 = \dfrac{\mathbf{p}_1 + \mathbf{p}_2}{2}$   (7)

$\mathbf{os}_c^2 = \mathbf{p}_{\max}\, w + \max(\mathbf{p}_1, \mathbf{p}_2)\,(1 - w)$   (8)

$\mathbf{os}_c^3 = \mathbf{p}_{\min}\, w + \min(\mathbf{p}_1, \mathbf{p}_2)\,(1 - w)$   (9)

$\mathbf{os}_c^4 = \dfrac{(\mathbf{p}_{\max} + \mathbf{p}_{\min})(1 - w) + (\mathbf{p}_1 + \mathbf{p}_2)\, w}{2}$   (10)

$\mathbf{p}_{\max} = \big[\, para_1^{\max}\ \ para_2^{\max}\ \ \cdots\ \ para_{no\_vars}^{\max} \,\big]$   (11)

$\mathbf{p}_{\min} = \big[\, para_1^{\min}\ \ para_2^{\min}\ \ \cdots\ \ para_{no\_vars}^{\min} \,\big]$   (12)

where $w \in [0, 1]$ denotes the weight to be determined by users, $para_j^{\max}$ and $para_j^{\min}$ denote the upper and lower bounds of the $j$th parameter, $no\_vars$ denotes the number of variables in a chromosome, and $\max(\mathbf{p}_1, \mathbf{p}_2)$ denotes the vector with each element obtained by taking the maximum among the corresponding elements of $\mathbf{p}_1$ and $\mathbf{p}_2$. For instance, $\max([1\ \ {-}2\ \ 3], [2\ \ {-}3\ \ 1]) = [2\ \ {-}2\ \ 3]$. Similarly, $\min(\mathbf{p}_1, \mathbf{p}_2)$ gives a vector by taking the minimum value in each position. For instance, $\min([1\ \ {-}2\ \ 3], [2\ \ {-}3\ \ 1]) = [1\ \ {-}3\ \ 1]$. Among $\mathbf{os}_c^1$ to $\mathbf{os}_c^4$, the one with the largest fitness value is used
as the offspring of the crossover operation. The offspring is defined as

$\mathbf{os} = \mathbf{os}_c^{\,i_{os}}$   (13)

where $i_{os}$ denotes the index $i$ that gives the maximum value of $f(\mathbf{os}_c^i)$, $i = 1, 2, 3, 4$.

If the crossover operation can provide a good offspring, a higher fitness value can be reached in fewer iterations. In general, two-point crossover, multipoint crossover, arithmetic crossover, or heuristic crossover can be employed to realize the crossover operation [1], [31]–[33]. The offspring generated by these methods, however, may not be better than that from our approach. As seen from (7)–(10), the potential offspring spreads over the domain. While (7) and (10) result in searching around the center region of the domain [a value of $w$ near one in (10) can move $\mathbf{os}_c^4$ to be near $\mathbf{os}_c^1$], (8) and (9) move the potential offspring to be near the domain boundary [a large value of $w$ in (8) and (9) can move $\mathbf{os}_c^2$ and $\mathbf{os}_c^3$ to be near $\mathbf{p}_{\max}$ and $\mathbf{p}_{\min}$, respectively].
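A minimal Python sketch of this crossover step, following the forms of (7)–(13) as reconstructed above, is given below: the four candidate offspring are formed from the two parents and the parameter bounds, and the fittest candidate is kept. The placeholder fitness function and all names are assumptions for illustration only.

```python
import numpy as np

def improved_crossover(p1, p2, p_min, p_max, w, fitness):
    """Generate the four candidate offspring of (7)-(10) and keep the fittest, cf. (13).
    w in [0, 1] is the user-chosen weight; p_min and p_max hold the parameter bounds."""
    candidates = [
        (p1 + p2) / 2.0,                                      # (7): centre of the two parents
        p_max * w + np.maximum(p1, p2) * (1.0 - w),           # (8): pushed toward the upper bound
        p_min * w + np.minimum(p1, p2) * (1.0 - w),           # (9): pushed toward the lower bound
        ((p_max + p_min) * (1.0 - w) + (p1 + p2) * w) / 2.0,  # (10): blend of domain centre and (7)
    ]
    return max(candidates, key=fitness)                       # (13): candidate with the largest fitness

# Toy usage with a placeholder fitness (maximising the negative squared norm).
p_min, p_max = np.full(4, -1.0), np.full(4, 1.0)
p1 = np.array([0.2, -0.5, 0.9, 0.1])
p2 = np.array([-0.3, 0.4, 0.7, -0.8])
offspring = improved_crossover(p1, p2, p_min, p_max, w=0.5,
                               fitness=lambda c: -np.sum(c ** 2))
```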
2) Mutation: The offspring (13) will then undergo the mutation operation, which changes the genes of the chromosomes. Consequently, the features of the chromosomes inherited from their parents can be changed. Three new offspring will be generated by the mutation operation. In general, boundary mutation, uniform mutation, and nonuniform mutation can be employed to realize the mutation [1], [31]–[33]. Boundary mutation is to change the value of a randomly selected gene to its upper or lower bound. Uniform mutation is to change the value of a randomly selected gene to a value between its upper and lower bounds. Nonuniform mutation is capable of fine-tuning the parameters by increasing or decreasing the value of a randomly selected gene by a weighted random number; the weight is usually a monotonically decreasing function of the number of iterations. In our approach, three offspring are generated in the mutation process. From (14), the first mutation is in fact the uniform mutation. The second mutation allows some randomly selected genes to change simultaneously, and the third mutation changes all genes simultaneously. Because the second and the third mutations allow multiple genes to be changed, the searching domain is larger than that formed by changing a single gene; the genes therefore have a larger space for improvement when the fitness values are small. On the contrary, when the fitness values are nearly the same, changing the value of a single gene (the first mutation) gives a higher probability of improving the fitness value, as the searching domain is smaller and some genes may already have reached their optimal values.

After the operations of selection, crossover, and mutation, a new population is generated, and this new population repeats the same process. Such an iterative process can be terminated when the result reaches a defined condition, e.g., the change of the fitness value between the current and the previous iteration is less than 0.001, or a defined number of iterations has been reached.
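The sketch below illustrates how the three mutations described above could be realized. Because (14) is not reproduced in this excerpt, the perturbation applied to the selected genes is a simple placeholder, and only the qualitative behaviour (one gene, some genes, or all genes changed within their bounds) follows the text.

```python
import numpy as np

def mutate_three(offspring, lo, hi, rng):
    """Return three mutated copies of `offspring`:
    (1) one randomly selected gene redrawn uniformly within its bounds,
    (2) a random subset of genes perturbed, (3) every gene perturbed.
    The perturbation size is an assumed placeholder, not (14) itself."""
    n = offspring.size
    noise = rng.uniform(-0.1, 0.1, n) * (hi - lo)      # assumed perturbation scale

    m1 = offspring.copy()                              # first mutation: a single gene
    j = rng.integers(n)
    m1[j] = rng.uniform(lo[j], hi[j])

    m2 = offspring.copy()                              # second mutation: some genes
    mask = rng.random(n) < 0.5
    m2[mask] = np.clip(m2[mask] + noise[mask], lo[mask], hi[mask])

    m3 = np.clip(offspring + noise, lo, hi)            # third mutation: all genes
    return [m1, m2, m3]

rng = np.random.default_rng(1)
lo, hi = np.full(5, -1.0), np.full(5, 1.0)
mutants = mutate_three(rng.uniform(lo, hi), lo, hi, rng)
```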
where rand denotes a function that generates a floating-point number uniformly between zero and one inclusively. The averaged results of the repeated simulations based on the proposed and the standard GAs are shown in Fig. 3 and tabulated in Table I. Generally, it can be seen that the performance of the proposed GA is better than that of the standard GA.
Fig. 3. Simulation results of the improved and standard GAs: the averaged fitness values of the benchmark test functions obtained by the improved GA (solid line) and the standard GA (dotted line). Panels (a)–(f) correspond to the six test functions $f_1(\mathbf{x})$ to $f_6(\mathbf{x})$.
is the $k$th output of the proposed neural network. By introducing the switches, the link weights and the switch states can be tuned. It can be seen that the weights of the links govern the input–output relationship of the neural network, while the switches of the links govern the structure of the neural network.
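To make the role of the switches concrete, the following sketch gives a minimal forward pass of a three-layer network in which every link weight is paired with a binary switch; a zero switch removes its link, so the weights set the input–output mapping while the switches set the structure. The layer sizes, the tanh hidden activation, and the linear output are assumptions for illustration and are not taken from the paper's (24).

```python
import numpy as np

def forward(x, V, W, sV, sW):
    """Three-layer feedforward network with switches on the links.
    V, W   : input-to-hidden and hidden-to-output weight matrices.
    sV, sW : 0/1 switch matrices of the same shapes (1 = link present).
    Only the element-wise products sV*V and sW*W enter the computation,
    so setting a switch to zero deletes the corresponding link."""
    h = np.tanh((sV * V) @ x)          # hidden layer (tanh is an assumed activation)
    return (sW * W) @ h                # output layer (assumed linear)

rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 3, 6, 1                              # sizes chosen for illustration
V = rng.normal(size=(n_hidden, n_in))
W = rng.normal(size=(n_out, n_hidden))
sV = (rng.random((n_hidden, n_in)) > 0.3).astype(float)      # some links switched off
sW = (rng.random((n_out, n_hidden)) > 0.3).astype(float)
y = forward(rng.normal(size=n_in), V, W, sV, sW)
```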
TABLE I: SIMULATION RESULTS OF THE PROPOSED GA AND THE STANDARD GA BASED ON THE BENCHMARK TEST FUNCTIONS
where $\mathbf{z}^i$ and $y^d(i)$ are the given inputs and the desired outputs of an unknown nonlinear function, respectively, and $n_d$ denotes the number of input–output data pairs. The fitness function is defined as

$\text{fitness} = \dfrac{1}{1 + \text{err}}$   (27)

where the error err, defined in (28), measures the difference between the desired outputs and the outputs of the proposed neural network over the $n_d$ data pairs. The objective is to maximize the fitness value of (27) using the improved GA by setting the chromosome to contain the link weights and switch states for all $j$, $k$, and $i$. It can be seen from (27) and (28) that a larger fitness value implies a smaller error value.
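A minimal sketch of how such a fitness could be evaluated for one candidate chromosome is shown below; the mean-absolute-error form of err is an assumption standing in for (28), and `decode` and `forward` are hypothetical placeholders for the chromosome decoding and the network forward pass.

```python
import numpy as np

def fitness_from_error(err):
    """Map an error value to a fitness value, cf. (27): larger fitness means smaller error."""
    return 1.0 / (1.0 + err)

def evaluate(chromosome, inputs, targets, decode, forward):
    """Decode a chromosome into weights/switches, run the network over the
    n_d input-output pairs, and return the fitness of the chromosome.
    `decode` and `forward` are application-specific placeholders (assumptions)."""
    params = decode(chromosome)
    outputs = np.array([forward(params, z) for z in inputs])
    err = np.mean(np.abs(outputs - targets))      # assumed error measure standing in for (28)
    return fitness_from_error(err)
```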
V. APPLICATION EXAMPLES

Two application examples will be given in this section to illustrate the merits of the proposed neural networks tuned by the improved GA.

A. Forecasting of the Sunspot Number

The proposed neural network for this example is given by (29). The number of hidden nodes is changed from three to seven to test the learning performance. The fitness function is defined as follows:

$\text{fitness} = \dfrac{1}{1 + \text{err}}$   (30)

where the error err, defined in (31), is measured between the desired sunspot numbers and the network outputs over the training data. The improved GA is employed to tune the parameters and structure of the neural network of (29). The objective is to maximize the fitness function of (30). The best fitness value is one and the worst one is zero. The population size used for the improved GA is ten.
The lower and the upper bounds of the link weights are defined following [16]. The chromosomes used for the improved GA contain the link weights and the switch states. The initial values of all the link weights between the input and hidden layers are one, and those between the hidden and output layers are −1. The initial value of the parameter in (23) is 0.5.
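The chromosome described above (link weights plus switch states, each gene bounded) can be laid out as in the sketch below. The gene ordering, the treatment of switch genes as values in [0, 1] rounded to 0/1, and the omission of the bias links are assumptions made for illustration.

```python
import numpy as np

def make_bounds(n_in, n_hidden, n_out, w_lo=-1.0, w_hi=1.0):
    """Lower/upper bounds for a chromosome holding all link weights followed by
    one switch gene per link (bias links omitted for brevity).
    The weight bounds w_lo/w_hi are placeholders for the values used in the paper."""
    n_links = n_hidden * n_in + n_out * n_hidden
    lo = np.concatenate([np.full(n_links, w_lo), np.zeros(n_links)])
    hi = np.concatenate([np.full(n_links, w_hi), np.ones(n_links)])
    return lo, hi

def decode(chromosome, n_in, n_hidden, n_out):
    """Split a chromosome into weight matrices (V, W) and 0/1 switch matrices (sV, sW)."""
    n1, n2 = n_hidden * n_in, n_out * n_hidden
    V = chromosome[:n1].reshape(n_hidden, n_in)
    W = chromosome[n1:n1 + n2].reshape(n_out, n_hidden)
    s = np.round(chromosome[n1 + n2:])                         # switch genes -> 0 or 1
    sV = s[:n1].reshape(n_hidden, n_in)
    sW = s[n1:].reshape(n_out, n_hidden)
    return V, W, sV, sW

lo, hi = make_bounds(n_in=3, n_hidden=6, n_out=1)
chromosome = np.random.default_rng(4).uniform(lo, hi)
V, W, sV, sW = decode(chromosome, 3, 6, 1)
```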
For comparison purposes, a fully connected three-layer feedforward neural network (three-input–one-output) [2] is also trained by the standard GA with arithmetic crossover and nonuniform mutation [1], [2], [5], and by backpropagation with momentum and adaptive learning rate [30]. Also, the proposed neural network trained by the standard GA is considered. For the standard GA, the population size is ten, the probability of crossover is 0.8, and the probability of mutation is 0.1. The shape parameter of the standard GA with arithmetic crossover and nonuniform mutation, selected by trial and error through experiments for good performance, is set to one. For the backpropagation with momentum and adaptive learning rate, the learning rate is 0.2, the ratio to increase the learning rate is 1.05, the ratio to decrease the learning rate is 0.7, the maximum number of validation failures is five, the maximum performance increase is 1.04, and the momentum constant is 0.9. The initial values of the link weights are the same as those of the proposed neural network. For all approaches, the learning processes are carried out on a personal computer with a P4 1.4-GHz CPU. The number of iterations for all approaches is 1000.

The tuned neural networks are used to forecast the sunspot number during the years 1885–1980. The number of hidden nodes is changed from four to eight. The simulation results for the comparisons are tabulated in Tables II and III. From Table II, it is observed that the proposed neural network trained with the improved GA provides better results in terms of accuracy (fitness values) and number of links. The training error [governed by (31)] and the forecasting error are tabulated in Table III. Referring to Table III, the best result is obtained when the number of hidden nodes is six. Fig. 6 shows the simulation results of the forecasting using the proposed neural network trained with the improved GA (dashed lines) and the actual sunspot numbers (solid lines) for this case. The number of connected links is 18 after learning (the number of links of a fully connected network is 31, which includes the bias links). This is about a 42% reduction in the number of links. The training error and the forecasting error are 11.5730 and 14.0933, respectively.

Fig. 6. Simulation results of a 96-year prediction using the proposed neural network (n = 6) with the proposed GA (dashed line), and the actual sunspot numbers (solid line) for the years 1885–1980.

B. Associative Memory

Another application example on tuning an associative memory will be given in this section. In this example, the associative memory, which maps its input vector into itself, has ten inputs and ten outputs. Thus, the desired output vector is its input vector. Referring to (24), the proposed neural network is given by (32).
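For the associative memory, the training targets are simply the input patterns themselves; the short sketch below builds such a data set and also counts how many links a learned switch vector keeps connected, which is how figures such as the 18-of-31 connected links quoted above for the sunspot example can be read off a trained chromosome. The pattern dimensions and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Autoassociative training data: the desired output vector is the input vector itself.
patterns = rng.uniform(-1.0, 1.0, size=(20, 10))   # 20 assumed ten-dimensional patterns
inputs, targets = patterns, patterns.copy()

def count_connected_links(switch_genes):
    """Number of links kept by the learned switch states (1 = connected)."""
    return int(np.round(switch_genes).sum())

# Example: a learned switch vector for a network with 31 potential links (incl. bias links).
switches = (rng.random(31) > 0.4).astype(float)
print(count_connected_links(switches), "of", switches.size, "links connected")
```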
REFERENCES

[19] X. Yao and Y. Liu, "A new evolutionary system for evolving artificial neural networks," IEEE Trans. Neural Networks, vol. 8, pp. 694–713, May 1997.
[20] F. J. Lin, C. H. Lin, and P. H. Shen, "Self-constructing fuzzy neural network speed controller for permanent-magnet synchronous motor drive," IEEE Trans. Fuzzy Syst., vol. 9, pp. 751–759, Oct. 2001.
[21] Y. Hirose, K. Yamashita, and S. Hijiya, "Back-propagation algorithm which varies the number of hidden units," Neural Networks, vol. 4, no. 1, pp. 61–66, 1991.
[22] A. Roy, L. S. Kim, and S. Mukhopadhyay, "A polynomial time algorithm for the construction and training of a class of multilayer perceptrons," Neural Networks, vol. 6, no. 4, pp. 535–545, 1993.
[23] N. K. Treadgold and T. D. Gedeon, "Exploring constructive cascade networks," IEEE Trans. Neural Networks, vol. 10, pp. 1335–1350, Nov. 1999.
[24] C. C. Teng and B. W. Wah, "Automated learning for reducing the configuration of a feedforward neural network," IEEE Trans. Neural Networks, vol. 7, pp. 1072–1085, Sept. 1996.
[25] Y. Q. Chen, D. W. Thomas, and M. S. Nixon, "Generating-shrinking algorithm for learning arbitrary classification," Neural Networks, vol. 7, no. 9, pp. 1477–1489, 1994.
[26] M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically," Connect. Sci., vol. 1, no. 1, pp. 3–16, 1989.
[27] H. K. Lam, S. H. Ling, F. H. F. Leung, and P. K. S. Tam, "Tuning of the structure and parameters of neural network using an improved genetic algorithm," in Proc. 27th Annu. Conf. IEEE Ind. Electron. Soc., Denver, CO, Nov. 2001, pp. 25–30.
[28] G. P. Miller, P. M. Todd, and S. U. Hegde, "Designing neural networks using genetic algorithms," in Proc. 3rd Int. Conf. Genetic Algorithms Applications, 1989, pp. 379–384.
[29] N. Weymaere and J. Martens, "On the initialization and optimization of multilayer perceptrons," IEEE Trans. Neural Networks, vol. 5, pp. 738–751, Sept. 1994.
[30] S. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
[31] X. Wang and M. Elbuluk, "Neural network control of induction machines using genetic algorithm training," in Conf. Record 31st IAS Annual Meeting, vol. 3, 1996, pp. 1733–1740.
[32] L. Davis, Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold, 1991.
[33] M. Srinivas and L. M. Patnaik, "Genetic algorithms: A survey," IEEE Computer, vol. 27, pp. 17–27, June 1994.
[34] J. D. Schaffer, D. Whitley, and L. J. Eshelman, "Combinations of genetic algorithms and neural networks: A survey of the state of the art," in Proc. Int. Workshop Combinations Genetic Algorithms Neural Networks, 1992, pp. 1–37.
[35] S. Bornholdt and D. Graudenz, "General asymmetric neural networks and structure design by genetic algorithms: A learning rule for temporal patterns," in Proc. Int. Conf. Syst., Man, Cybern., vol. 2, 1993, pp. 595–600.
[36] V. Maniezzo, "Genetic evolution of the topology and weight distribution of neural networks," IEEE Trans. Neural Networks, vol. 5, pp. 39–53, Jan. 1994.
[37] P. J. Angeline, G. M. Saunders, and J. B. Pollack, "An evolutionary algorithm that constructs recurrent neural networks," IEEE Trans. Neural Networks, vol. 5, pp. 54–65, Jan. 1994.

Frank H. F. Leung (M'92) was born in Hong Kong in 1964. He received the B.Eng. and Ph.D. degrees in electronic engineering from the Hong Kong Polytechnic University in 1988 and 1992, respectively. He joined the Hong Kong Polytechnic University in 1992 and is now an Associate Professor in the Department of Electronic and Information Engineering. He has published more than 100 research papers on computational intelligence, control, and power electronics. At present, he is actively involved in research on the Intelligent Multimedia Home and electronic Book. Dr. Leung is a Reviewer for many international journals and has helped with the organization of many international conferences. He is a Chartered Engineer and a Member of the Institution of Electrical Engineers (IEE).

H. K. Lam received the B.Eng. (Hons.) and Ph.D. degrees from the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, in 1995 and 2000, respectively. He is currently a Research Fellow in the Department of Electronic and Information Engineering at The Hong Kong Polytechnic University. His current research interests include intelligent control and systems, computational intelligence, and robust control.

S. H. Ling received the B.Eng. (Hons.) degree from the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hong Kong, in 1999. He is currently a Research Student in the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. His research interests include evolutionary computation, fuzzy logic, neural networks, and intelligent homes.

Peter K. S. Tam received the B.E., M.E., and Ph.D. degrees from the University of Newcastle, Newcastle, Australia, in 1971, 1973, and 1976, respectively, all in electrical engineering. From 1967 to 1980, he held a number of industrial and academic positions in Australia. In 1980, he joined The Hong Kong Polytechnic University, Hong Kong, as a Senior Lecturer. He is currently an Associate Professor in the Department of Electronic and Information Engineering. He has participated in the organization of a number of symposiums and conferences. His research interests include signal processing, automatic control, fuzzy systems, and neural networks.