Computational Intelligence
Collaboration, Fusion and Emergence
Dr. Christine L. Mumford
School of Computer Science
Cardiff University
5 The Parade, Roath
Cardiff, CF24 3AA
UK
E-mail: [email protected]
DOI 10.1007/978-3-642-01799-5
Intelligent Systems Reference Library ISSN 1868-4394
Library of Congress Control Number: Applied for
© 2009 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilm or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Dedicated to the chapter authors.
Editors
neural networks and multi-agent systems are also collaborative in their nature, and all such systems require effective communication. Emergence refers to the phenomenon that complex behaviour can emerge from collaboration between simple processing elements; indeed, many would say that this is the key to success. The twenty-two chapters have been grouped into nine parts (see Table 1):
I. Introduction
II. Fusing evolutionary algorithms and fuzzy logic
III. Adaptive solution schemes
IV. Multi-agent systems
V. Computer vision
VI. Communication for CI systems
VII. Artificial immune systems
VIII. Parallel evolutionary algorithms
IX. CI for clustering and classification
Acknowledgments
We are grateful to the authors for their wonderful contribution, and to the
reviewers for their excellent comments which helped to improve the quality of
chapters. Thanks are also due to Springer-Verlag for their excellent support
during the preparation of the manuscript.
Table 1. Themes and chapters

Introduction
  1: Synergy in Computational Intelligence
  2: Computational Intelligence: The Legacy of Alan Turing and John von Neumann

Evolutionary Algorithms and Fuzzy Logic
  3: Multiobjective Evolutionary Algorithms for the Electric Power Dispatch Problem
  4: Fuzzy Evolutionary Algorithms and Genetic Fuzzy Systems: A Positive Collaboration Between Evolutionary Algorithms and Fuzzy Systems
  5: Multiobjective Genetic Fuzzy Systems

Adaptive Solution Schemes
  6: Exploring Hyper-Heuristic Methodologies with Genetic Programming
  7: Adaptive Constraint Satisfaction: The Quickest First Principle

Multi-Agent Systems
  8: Collaborative Computational Intelligence in Economics
  9: IMMUNE: A Collaborating Environment for Complex System Design
  10: Bayesian Learning for Cooperation in Multi-Agent Systems
  11: Collaborative Agents for Complex Problem Solving

Computer vision
  12: Predicting Trait Impressions of Faces Using Classifier Ensembles
  13: The Analysis of Crowd Dynamics: From Observations to Modelling

Communication for CI
  14: Computational Intelligence for the Collaborative Identification of Distributed Systems
  15: Collaboration at the Basis of Sharing Focused Information: The Opportunistic Networks

Artificial Immune Systems
  16: Exploiting Collaborations in the Immune System: The Future of Artificial Immune Systems

Parallel EAs
  17: Evolutionary Computation: Centralized, Parallel or Collaborative

Clustering and Classification
  18: Fuzzy Clustering of Likelihood Curves for Finding Interesting Patterns in Expression Profiles
  19: A Hybrid Rule Induction/Likelihood Ratio-Based Approach for Predicting Protein-Protein Interactions
  20: Improvements in Flock-based Collaborative Clustering Algorithms
  21: Combining Statistics and Case-Based Reasoning for Medical Research
  22: Collaborative and Experience-Consistent Schemes of System Modelling in Computational Intelligence
Part I
Introduction
Synergy in Computational Intelligence
Christine L. Mumford
Abstract. This chapter introduces the book. It begins with a historical perspective on Computational Intelligence (CI), and discusses its relationship with the longer established term “Artificial Intelligence” (AI). The chapter then gives a brief overview of the main CI techniques, and concludes with short summaries of all the chapters in the book.
1 Introduction
In the early days of information technology, computers were large, expensive and the property of the few government organizations, academic institutions and big businesses who could afford them. Centralized operating systems were developed and two classes of computer systems evolved: one for scientific computing and engineering, specializing in “number crunching”, and the other for business computing, focussing on data processing activities such as stock control and computerized customer accounts. Today computing devices are small and cheap, and pervade our everyday lives. It is therefore not surprising that the style of software required for the twenty-first century is very different from that needed to run operations on the large mainframe computers of the past. It is in this climate that the field of “Artificial Intelligence” (AI) has given way to the newer study of “Computational Intelligence” (CI)1. AI grew out of attempts to emulate the human brain on mainframe computers, while CI is more pragmatic and relies on distributed computation, communication and emergence. CI is well suited to today’s ubiquitous computing devices.
This book is about practical computational intelligence. It covers many techniques and applications, and focuses on novel ways of combining different CI
Christine L. Mumford
Cardiff University, School of Computer Science, 5 The Parade, Cardiff, CF24 3AA,
United Kingdom
e-mail: [email protected]
1 Terms with very similar meanings have also emerged in the recent literature, such as “soft
computing” and “natural computing”.
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 3–21.
springerlink.com © Springer-Verlag Berlin Heidelberg 2009
The document goes on to discuss the “various aspects of the artificial intelligence
problem” in more detail, including computers and computer programming, natural
language processing, neural networks, the theory of computation, the need for automatic self-improvement, and aspects of abstraction and creativity. Most of these
topics remain active research issues to this day. However, the assumption that human intelligence can be simulated by machine was perhaps a little overoptimistic.
Indeed, it is one of the “big questions” remaining in computer science.
The two decades following the 1956 conference saw many high profile AI research projects, for example, the development of the LISP and PROLOG programming languages, the SHRDLU “microworlds” project, and the first expert systems (see standard texts on AI, such as [20, 21], for more information). Although few could deny that these projects had produced some highly successful results and useful applications, there was, nevertheless, a general feeling of disappointment at the time, that the AI community had in some sense “failed to deliver”. This perception was effectively articulated in a report to the British Science Research Council by the British academic James Lighthill in 1973 [14]:
In no part of the field have discoveries made so far produced the major impact that was
then promised.
In essence, the so-called “Lighthill Report” stated that AI researchers had failed to address the issue of the combinatorial explosion, i.e., AI techniques may work on small problem domains, but they do not scale up well to solve more realistic problems. Following this very pessimistic view, the Science Research Council slashed funding for AI projects in the UK. A rather more optimistic view prevailed in much of the rest of the world, however, and major new investments continued throughout the 1980s (e.g., CYC in the USA [13], and the Fifth Generation Computer Systems project in Japan [6]). Nevertheless, AI was becoming an increasingly fragmented study, consisting of many disciplines, such as reasoning, knowledge engineering, planning, learning, communication, perception, and so on. Despite the many successes that had been achieved using expert systems, logic programming, neural networks, etc., it was blatantly obvious that the dream of properly emulating human intelligence had never come close to being realized. It was perhaps time to “move on” and capitalize on the substantial achievements provided by some of the “offshoots” of AI, and to leave behind the very negative image that had become so closely associated with the term “AI” itself: not so much because AI had failed per se, but rather because of the over-inflated expectations that had become intrinsically tied up with the notion of it.
Bezdek’s view of CI was as a system that exhibited some form of “intelligence”,
yet dealt with numerical (low level) data, as opposed to “knowledge”, and in this
sense differed from traditional Artificial Intelligence. Nevertheless, the view of
Bezdek was very much focussed towards his personal research interests of pattern
recognition and neural networks. In the following years the term “CI” became firmly established when it was adopted by the IEEE (the Institute of Electrical and Electronics Engineers), and in 2004 the Computational Intelligence Society (CIS) was established (as a name change from the IEEE Neural Networks Society). The slogan of the IEEE CIS is “mimicking nature for problem solving”, and its scope is stated as:
The Field of Interest of the Society shall be the theory, design, application, and development of biologically and linguistically motivated computational paradigms emphasizing neural networks, connectionist systems, genetic algorithms, evolutionary programming, fuzzy systems, and hybrid intelligent systems in which these paradigms are contained.
Some interesting further discussions on the birth of AI and CI, and on some of
the important philosophical issues on the essence of intelligence can be found in
Chapter 2 of this book.
In this section we will look briefly at the following key CI paradigms: Evolutionary
Algorithms, Neural Networks, Fuzzy Systems and Multi-Agent Systems. This will
be followed by a short summary covering some other important techniques included
by various authors in this collection.
artificial life [12], evolvable hardware [8], ant systems [4] and particle swarms [10]
(Chapter 20), to name but a few. Artificial Immune Systems (Chapter 16) have also
become a popular topic for research in recent years, drawing analogies with some
of the ingenious problem-solving mechanisms observed in natural immune systems
and applying them to a broad range of real-world problems. In addition, there are
many examples of hybrid (or memetic) approaches where problem-specific heuristics, or other techniques such as neural networks, fuzzy systems, or simulated annealing, have been incorporated into a GA framework. Thus, due to the growth in popularity of search and optimization techniques inspired by natural evolution during the last few decades, it is now common practice to refer to the field as evolutionary computing and to the various techniques as evolutionary algorithms. In
addition, evolutionary techniques for simultaneously optimizing several objectives have recently become popular. These approaches, collectively known as multiobjective evolutionary algorithms [3], are very effective at balancing the frequently conflicting objectives to produce excellent trade-off solutions, from which a human decision maker can make an informed choice. Chapters 3 and 5 deal with multiobjective optimization problems.
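The trade-off solutions mentioned above are the non-dominated members of the population. As a brief sketch (not drawn from the chapters themselves), the Pareto-dominance test that underlies these algorithms can be written as follows; the (cost, pollution) pairs are hypothetical values, loosely echoing the power dispatch problem:

```python
def dominates(a, b):
    """True if a dominates b: a is no worse in every objective and strictly
    better in at least one (all objectives assumed to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only non-dominated points: the trade-off set offered to a decision maker."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (cost, pollution) pairs for four candidate dispatch schedules.
solutions = [(10, 5), (8, 7), (12, 4), (9, 9)]
front = pareto_front(solutions)  # (9, 9) is dominated by (8, 7) and drops out
```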
Parallel evolutionary algorithms are discussed in Chapter 17. The analogy with natural population structures and their geographical distributions makes parallel implementations highly desirable, to speed up processing and to facilitate complex emergent behaviour from simple components within the distributed populations.
Given the range of EAs mentioned above, it is not perhaps surprising that there is
no rigorous definition of the term “evolutionary algorithm” that everyone working in
the field would agree on. There are, however, certain elements that the more generic
types of EA tend to have in common:
1. a population of chromosomes encoding candidate solutions to the problem in
hand,
2. a mechanism for reproduction,
3. selection according to a fitness, and
4. genetic operators.
Figure 1 gives an outline of a generic EA. The process is initialized with a starting
population of candidate solutions. The initial population is frequently generated by
some random process, but may be produced by constructive heuristic algorithms, or
by other methods. Once generated, the candidate solutions are evaluated to establish the quality of each solution, and based on this quality, a fitness value will be computed, in such a way that better quality solutions are assigned higher fitness values. Individuals will next be selected from the population to form the parents of the next generation, and these will be duplicated and placed in a mating pool.
The selection process is frequently biased, so that fitter individuals are more likely
to be chosen than their less fit counterparts. Genetic operators are then applied to the
individuals in the mating pool. The idea is to introduce new variation, without which
no improvement is possible. Recombination (also known as crossover) is achieved
by combining elements of two parents to form new offspring. Mutation, on the other
hand, involves very small random changes made to solutions. The final stage in the cycle requires that the population be updated with new individuals. Depending on the style of the EA, this may involve replacing the parent population in its entirety, although some researchers favour partial replacement, perhaps replacing the poorest 10% of the population with the best offspring, for example. A good general text on evolutionary algorithms is Eiben and Smith [5].
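The cycle just described can be summarized in a short sketch. The truncation selection, full generational replacement, and the “OneMax” example problem (maximize the number of 1-bits) below are illustrative choices of my own, not prescriptions from the text:

```python
import random

random.seed(0)  # for a reproducible run

def evolve(fitness, random_solution, crossover, mutate,
           pop_size=20, generations=50):
    """Generic EA skeleton: evaluate, select, recombine, mutate, replace."""
    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Biased selection: only the fitter half enters the mating pool.
        pool = ranked[:pop_size // 2]
        offspring = []
        while len(offspring) < pop_size:
            a, b = random.sample(pool, 2)
            offspring.append(mutate(crossover(a, b)))
        # Full generational replacement (one of several update schemes).
        population = offspring
    return max(population, key=fitness)

# Example: OneMax on 20-bit strings.
def one_point_crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one_bit(x):
    y = list(x)
    y[random.randrange(len(y))] ^= 1
    return y

best = evolve(fitness=sum,
              random_solution=lambda: [random.randint(0, 1) for _ in range(20)],
              crossover=one_point_crossover,
              mutate=flip_one_bit)
```

Swapping in a different fitness function, encoding, or replacement scheme yields the various EA styles discussed in the text.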
stage, and these are frequently referred to as “self-organizing networks”. Kohonen nets are the best known example of this type. In reinforcement learning, training data is not usually available. Instead, the aim is to discover a policy for selecting actions that minimize some measure of long-term cost. A schematic neural network is illustrated in Figure 2. For more details on ANNs, see Mehrotra, Mohan, and Ranka [16]. Chapters 12, 13 and 22 all utilize neural networks, in one form or another.
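As a minimal illustration of the processing elements in such a network, the sketch below computes one sigmoid neuron and a small two-layer pass; the weight and bias values are arbitrary, chosen only for demonstration:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of inputs passed through
    a sigmoid activation function."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 / (1 + math.exp(-s))

def layer(inputs, weight_rows, biases):
    """A layer is just several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two inputs feed a hidden layer of two neurons, then one output neuron.
hidden = layer([0.5, -1.0], [[0.8, 0.2], [-0.4, 0.9]], [0.0, 0.1])
output = neuron(hidden, [1.5, -1.1], -0.3)
```

Training (e.g. by backpropagation) then consists of adjusting the weights and biases to reduce the error on known input/output pairs.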
• a set of inputs
• a fuzzification system, for transforming the raw inputs into grades of membership of fuzzy sets
• a set of fuzzy rules
• an inference system - to activate the rules and produce their outputs
• a defuzzification system - to produce one or more final crisp outputs
We will now look at a simplistic fuzzy system: a fuzzy controller for room temperature.
The fuzzy set membership diagram in Figure 3 characterizes three functions,
identifiable as subranges of temperature: cold, warm and hot. Suppose we wish to
keep a room at a comfortable temperature (warm) by building a control system to
adjust a room heater. We can see in Figure 3 how each function maps the same temperature value to a truth value in the 0 to 1 range, so that any point on that scale has three “truth values”, one for each of the three functions. It is these truth values that are used to determine how the room temperature should be controlled. The vertical line in the diagram represents a particular temperature, t. At this temperature it is easy to observe that the degree of membership of “hot” (red) is zero, so this temperature may be interpreted as “not hot”. Membership of “warm” is about 0.7, and this may be described as “fairly warm”. Similarly, examining membership of the “cold” function gives a value of about 0.15, which may be described as “slightly cold”. Adjectives
such as “fairly” and “slightly”, used to modify functions are referred to as “hedges”,
and can be a useful way to specify subregions of the functions to which they are
applied.
To operate our fuzzy temperature control system, we require a number of fuzzy
IF-THEN rules, in the form of “IF variable IS property THEN action”. For example,
an extremely simple temperature regulator that uses a heater might look like this:
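A minimal sketch of such a regulator is given below. The triangular membership breakpoints and the heater power assigned to each rule are illustrative assumptions of mine, not values taken from the text:

```python
def cold(t):   # full membership below 10 °C, fading out by 20 °C
    return max(0.0, min(1.0, (20 - t) / 10))

def warm(t):   # triangular: peaks at 20 °C, zero at 10 °C and 30 °C
    return max(0.0, 1 - abs(t - 20) / 10)

def hot(t):    # zero below 20 °C, full membership from 30 °C upward
    return max(0.0, min(1.0, (t - 20) / 10))

# IF temperature IS cold THEN heater power IS high  (1.0)
# IF temperature IS warm THEN heater power IS low   (0.3)
# IF temperature IS hot  THEN heater power IS off   (0.0)
def heater_power(t):
    rules = [(cold(t), 1.0), (warm(t), 0.3), (hot(t), 0.0)]
    total = sum(mu for mu, _ in rules)
    # Defuzzify as the membership-weighted average of the rule outputs.
    return sum(mu * out for mu, out in rules) / total if total else 0.0
```

Each rule fires to the degree its antecedent is true, and the defuzzification step blends the rule outputs into a single crisp heater setting.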
Clearly, the simple temperature controller described above is for illustration only,
and practical fuzzy systems will typically be made up of many more rules, perhaps hundreds or even thousands. In these more sophisticated systems, it is likely that the fuzzy rule set will be less “flat” and form more of a hierarchy, so that the outputs of some rules provide inputs to others. Systems with large rule sets will probably require more sophisticated inference systems to ensure the efficient processing of the rules, in a reasonable order.
To complete this section, it is worth mentioning a variation of fuzzy sets called
rough sets. Rough Set Theory was introduced in the early 1980s by Zdzislaw Pawlak
[18]. The basic idea is to take concepts and decision values, and create rules for
upper and lower boundary approximations of the set. With these rules, a new object
can easily be classified into one of the regions. Rough sets are especially helpful
in dealing with vagueness and uncertainty in decision situations, and for estimating
missing data. Uses include data mining, stock market prediction and financial data
analysis, machine learning and pattern recognition.
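The upper and lower boundary approximations can be sketched in a few lines; the patient/attribute encoding below is a made-up example, used only to show the idea:

```python
def rough_approximations(universe, key, target):
    """Rough-set lower/upper approximations of `target` under the partition
    induced by `key`: objects with equal key(x) are indiscernible."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    # Lower approximation: classes certainly inside the target set.
    lower = {x for c in classes.values() if c <= target for x in c}
    # Upper approximation: classes that overlap the target at all.
    upper = {x for c in classes.values() if c & target for x in c}
    return lower, upper  # boundary region = upper - lower

# Hypothetical patients: p2 and p3 share identical attribute values,
# so they fall into the same indiscernibility class.
attrs = {'p1': 'A', 'p2': 'B', 'p3': 'B', 'p4': 'C'}
lower, upper = rough_approximations({'p1', 'p2', 'p3', 'p4'},
                                    attrs.get, target={'p1', 'p2'})
```

Here p2 sits in the boundary region: it belongs to the target, but is indiscernible from p3, which does not, so it appears only in the upper approximation.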
For further reading on fuzzy systems, [17] is a good introductory text. Also, Chapter 4 in the present book provides a good background to many of the important concepts, and Chapters 3, 5, 18, and 22 cover further aspects of fuzzy systems.
The present chapter, by Christine Mumford, introduces the book and begins Part I.
It begins with a brief history of Artificial Intelligence and discusses the origins of
the term “Computational Intelligence”. Then follows an introduction to the main
Computational Intelligence paradigms used by the various authors in the book; and
finally, the chapter concludes with short summaries of all the individual chapters.
The main objective of the electric power dispatch problem is to schedule the available generating units to meet the load demand at minimum cost, while satisfying all constraints. However, thermal plants are a major source of atmospheric pollution. Recently the pollution minimization problem has attracted a lot of attention as the public demands clean air. Mohammad Abido explores the use of evolutionary multiobjective optimization to minimize cost and pollution simultaneously. Furthermore, he uses fuzzy set theory to select the “best” compromise solution from the trade-off solution set.
Two alternative ways of integrating fuzzy logic and evolutionary algorithms are discussed in detail by F. Herrera and M. Lozano in this chapter. The first one, called a genetic fuzzy system (GFS), consists of a fuzzy rule based system (FRBS) augmented by a learning process based on evolutionary algorithms. In the second approach, fuzzy tools and fuzzy logic-based techniques are used for modeling different evolutionary algorithm components and also for adapting evolutionary algorithm control parameters, with the goal of improving performance. The evolutionary algorithms resulting from the second type of integration are called fuzzy evolutionary algorithms. This chapter includes some excellent introductory material on fuzzy logic, as well as a summary of the state of the art with respect to genetic fuzzy systems and fuzzy evolutionary algorithms. The potential benefits derived from the synergy between evolutionary algorithms and fuzzy logic are made clear.
Hisao Ishibuchi and Yusuke Nojima describe the two conflicting goals in the design of fuzzy rule-based systems: one is accuracy maximization, and the other is complexity minimization. Generally, complex rules and large rule sets promote accuracy, while smaller rule sets with simple rules reduce complexity. The authors discuss the trade-off relation between these two goals, i.e., that improving the accuracy of a rule set will simultaneously increase its complexity. This chapter explains how various studies in multiobjective genetic fuzzy systems have experimented with the provision of non-dominated trade-off solutions, each solution being a complete candidate rule set for the decision maker’s consideration. These rule sets will range from the simplest and least accurate to the most complex and most accurate.
and John Woodward look at the use of Genetic Programming to automatically generate heuristics for a given problem domain.
James Borrett and Edward Tsang demonstrate the potential of adaptive constraint
satisfaction in this chapter, using a technique known as algorithmic chaining. It is
recognised that some constraint satisfaction instances are much easier to solve than
others, and thus it makes sense to apply a simple and fast algorithm, whenever such
an approach is adequate for solving the instance in question. However, when faced
with exceptionally hard problem instances, a more complex (and slower) approach
may be required. Algorithmic chaining presents a sequence of algorithms, which
are applied to a problem instance in turn, if and when required. Thus, if the first
algorithm is unsuccessful, the second in the sequence will be tried, and then the
third, if required, and so on. The chapter describes the “Reduced Exceptional Behaviour Algorithm” (REBA), which is a technique based on algorithmic chaining. The REBA algorithm makes use of a mechanism for predicting when thrashing-type behaviour is likely to occur, and results presented within the chapter clearly demonstrate the effectiveness of the approach in reducing susceptibility to exceptionally hard problem instances.
design agents from different disciplines are required. The particular characteristics
of such decision support systems must include immunity to catastrophic failures and
sudden collapse that are usually observed in complex systems. This chapter, written
by Mahmoud Efatmaneshnik and Carl Reidsema, lays the conceptual framework
for IMMUNE as a robust collaborating design environment. Agents in IMMUNE
are adaptive and can change their negotiation strategy, and in this way can contribute to the overall capability of the design system to maintain its problem solving complexity.
Mair Allen-Williams and Nicholas R. Jennings consider the problem of agent coordination in uncertain and partially observable systems. They present an approach to this problem using a Bayesian learning mechanism, and demonstrate its effectiveness on a cooperative scenario from the disaster response domain.
In a multi-agent system (MAS), agents that possess different expertise and resources collaborate to handle problems which are too complex for individual agents. Generally, agent collaborations in a MAS can be classified into two groups, namely agent cooperation and agent competition. In this chapter Minjie Zhang, Quan Bai, Fenghui Ren and John Fulcher introduce two main approaches for complex problem solving via agent cooperation and/or competition, these being (i) a partner selection strategy among competitive agents, and (ii) dynamic team forming strategies among cooperative agents.
Recent studies in social psychology indicate that people are predisposed to form
impressions of a person’s social status, abilities, dispositions, and character traits
based on nothing more than that person’s facial appearance. In this chapter Sheryl
Brahnam and Loris Nanni present their work on building machine models of human perception, aimed at recognizing traits (such as dominance, intelligence, maturity, sociality, trustworthiness, and warmth) simply by observing human faces. They demonstrate that ensembles of classifiers work better than single classifiers, and also that ensembles composed of 100 Levenberg-Marquardt neural networks (LMNNs) seem to be as capable as most individual human beings in their ability to predict the social impressions certain faces make on the average human observer.
In this chapter Giorgio Biagetti, Paolo Crippa, Francesco Gianfelici and Claudio
Turchetti suggest a new algorithm for the identification of distributed systems by
large scale collaborative sensor networks. They describe how recent advances in
hardware technologies have made it possible to realize low-power low-cost wireless
devices and sensing units that are able to detect information from the distributed
environment. Even though individual sensors can only perform simple local computation and communicate over a short range at a low data rate, when deployed in large numbers they can form an intelligent collaborative network interacting with the surrounding environment over a large spatial domain. Sensor networks (SNs) characterized by low computational complexity, great learning capability, and efficient collaborative technology are highly desirable to discriminate, regulate and decide actions on real phenomena in many applications such as environmental monitoring, surveillance, factory instrumentation, defence and so on.
protected communities, the authors point out that modern people may escape the
information avalanche by forming virtual communities without relinquishing most
of the benefits of the latest information and computer technology. A communication
middleware to obtain this result is represented by opportunistic networks.
This chapter, written by Emma Hart, Chris McEwan and Despina Davoudani, suggests some novel ways in which the natural immune system metaphor could be
exploited to build new types of computational systems capable of meeting some of
the challenges of the 21st Century, including self-configuration, self-maintenance,
self-optimization and self-protection in an ever-changing environment. The authors
focus particularly on aspects of the natural immune system which appear to have
been largely overlooked by the artificial immune systems (AIS) research community
in the past, and place significant emphasis on the design of systems rather than algorithms. The article puts forward some possible reasons why the potential promised
by AIS has not yet been delivered, and suggests how this might be addressed in
the future. The arguments are particularly relevant in light of recent advances in
technology which present a new and challenging range of problems to be solved.
A number of examples of systems in which steps are currently being taken to implement some of the mechanisms are then described. The chapter concludes with a discussion of an emerging field, immuno-engineering, which promises a methodology that will facilitate maximum exploitation of immune mechanisms in the future.
In this second chapter by Heinz Mühlenbein, the author focusses on the nature and importance of spatial interactions in evolutionary computation.
The four chapters in this section cover various aspects of pattern recognition, clustering and data mining.
In this chapter Claudia Hundertmark, Lothar Jänsch and Frank Klawonn present a prototype-based fuzzy clustering approach that allows the automatic detection of regulatory regions within individual proteins. Cellular processes are mediated by proteins acting, e.g., as enzymes (catalysts) in different metabolic pathways. Modifications are regularly made to specific regions of proteins within a living cell after that protein has been manufactured. The purpose of these post-translational modifications is to provide regulatory effects that control the binding and activity properties of the modified proteins. In other words, the same protein will behave differently depending on the specific modifications made to it after its creation. Following the digestion of proteins into fragments (peptides), which is a necessary first stage of the work, the approach described in this chapter utilises likelihood curves to summarise the regulatory information of the peptides, based on a noise model obtained by an analytical process. Since the algorithm for the detection of peptide clusters is based on fuzzy clustering, their collaborative approach combines probabilistic concepts with principles from soft computing. However, fuzzy clustering is usually based on data points, and its application to likelihood curves provided a considerable challenge for the authors. An interesting feature of this work is its potential transferability to noisy data from other applications, provided the noise can be specified by a noise model.
Mudassar Iqbal, Alex A. Freitas and Colin G. Johnson propose a new hybrid data mining method for predicting protein-protein interactions in this chapter. The purpose is to predict unknown protein interactions using relevant genomic information currently available. The new technique combines likelihood ratios with rule induction algorithms and uses rule induction to discover the rules to partition the data. The discovered rules are subsequently interpreted as “bins” and used to compute likelihood ratios. In this way a rule induction algorithm learns classification rules, and these learned rules are used to improve the effectiveness of a likelihood ratio-based classifier, which is used to predict unknown protein interactions.
Esin Saka and Olfa Nasraoui begin their chapter with a brief survey of swarm intelligence clustering algorithms, and point out that since the early 90s, swarm intelligence (SI) has been a source of inspiration for clustering problems, and has been used in many applications ranging from image clustering to social clustering, and from document clustering to Web session clustering. The chapter then focuses mainly on a recent development: simultaneous data visualization and clustering using flocks of agents. The chapter presents some improvements to previous algorithms of this type and proposes a hybrid approach. Experiments on both artificial and real data confirm the validity of the approach and the advantages of the variants proposed in this chapter.
References
1. Bezdek, J.C.: On the relationship between neural networks, pattern recognition and intelligence. International Journal of Approximate Reasoning 6, 85–107 (1992)
2. Bezdek, J.C.: What is computational intelligence? In: Zurada, J.M., Marks II, R.J.,
Robinson, C.J. (eds.) Computational Intelligence Imitating Life, pp. 1–12. IEEE Press,
Los Alamitos (1994)
3. Deb, K.: Multi-objective optimization using evolutionary algorithms. John Wiley and
Sons, Chichester (2001)
4. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: Optimization by a colony of
cooperating agents. IEEE Trans. System Man Cybernetics Part B 26, 29–41 (1996)
5. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg
(2003)
6. Feigenbaum, E.A., McCorduck, P.: The Fifth Generation: Artificial Intelligence and
Japan’s Computer Challenge to the World. Addison-Wesley, Reading (1983)
7. Fogel, L., Owens, A., Walsh, M.: Artificial intelligence through simulated evolution.
John Wiley, Chichester (1966)
8. Greenwood, G.W., Tyrrell, A.M.: Introduction to Evolvable Hardware: A Practical Guide
for Designing Self-Adaptive Systems. Wiley-IEEE Press, Chichester (2006)
9. Holland, J.H.: Adaptation in natural and artificial systems. The University of Michigan
Press, Ann Arbor (1975)
10. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE Interna-
tional Conference on Neural Networks, Piscataway, NJ, pp. 1942–1948 (1995)
11. Koza, J.: Genetic programming. MIT Press, Cambridge (1992)
12. Langton, C. (ed.): Artificial life: An overview. MIT Press, Cambridge (1995)
13. Lenat, D.B.: Cyc: A Large-Scale Investment in Knowledge Infrastructure. Communica-
tions of the ACM 38(11) (November 1995)
14. Lighthill, J.: Artificial Intelligence: A General Survey. In: Artificial Intelligence: a paper
symposium, Science Research Council, UK (1973)
15. McCarthy, J., Minsky, M.L., Rochester, N., Shannon, C.E.: A proposal for the Dart-
mouth summer research project on artificial intelligence, Stanford University, August 31
(1955), http://www-formal.stanford.edu/jmc/history/dartmouth/
dartmouth.html
16. Mehrotra, K., Mohan, C.K., Ranka, S.: Elements of Artificial Neural Networks. MIT
Press, Cambridge (1996)
17. Nguyen, T.H., Walker, E.A.: A First Course in Fuzzy Logic, 3rd edn. Chapman and Hall,
Boca Raton (2006)
18. Pawlak, Z.: Rough sets. Int. J. Computer and Information Sci. 11, 341–356 (1982)
19. Rechenberg, I.: Cybernetic solution path of an experimental problem. Technical Report
Translation number 1122, Ministry of Aviation, Royal aircraft Establishment, Farnbor-
ough, Hants, UK (1965)
20. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice
Hall, NJ (2003)
21. Winston, P.H.: Artificial Intelligence, 3rd edn. Addison Wesley, MS (1992)
22. Wooldridge, M.: An Introduction to MultiAgent Systems. John Wiley & Sons Ltd, NY
(2002)
23. Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338–353 (1965)
Computational Intelligence: The Legacy of Alan
Turing and John von Neumann
Heinz Mühlenbein
1 Introduction
Human intelligence can be divided into individual, collaborative, and collective intelligence. Individual intelligence is always multi-modal, drawing on many sources of information; it developed from the interaction of humans with their environment. Collaborative intelligence builds on individual intelligence: humans work together with all available allies to solve problems. At the next level appears collective intelligence, which describes the phenomenon that families, groups, organizations and even entire societies seem to act as a single living organism.
Heinz Mühlenbein
Fraunhofer Institute for Autonomous Intelligent Systems, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 23–43.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
Natural organisms are, as a rule, much more complicated and subtle, and therefore much less well understood in detail, than are artificial automata. Nevertheless, some regularities which we observe in the organization of the former may be quite instructive in our thinking and planning of the latter; and conversely, a good deal of our experiences and difficulties with our artificial automata can be to some extent projected on our interpretations of natural organisms.
In this chapter I will first review the work of Alan Turing, described in his seminal paper “Computing Machinery and Intelligence” [23] and in the not so well known paper “Intelligent Machinery” [24]. Turing’s thoughts about learning, evolution, and the structure of the brain are described.
Then I will discuss the most important paper of John von Neumann concerning our subject, “The General and Logical Theory of Automata” [25]. Von Neumann’s research centers on artificial automata, computability, complexity, and self-reproduction.
All three papers were written before the first electronic computers became avail-
able. Turing even wrote programs for paper machines. As a third example I will
describe the proposal of John Holland [10]. The simplification of this proposal later led to the famous genetic algorithm [11]. The historical part ends with a discussion of the early research of Newell, Shaw and Simon.
I will first discuss this early research in detail, without reference to today’s knowl-
edge. Then I will try to evaluate the proposals by answering the following questions:
• What are the major ideas for creating machine intelligence?
• Did the original proposals lack important components we see as necessary today?
• What are the major research problems of the proposals and do solutions exist
today?
Then two recent large projects are briefly summarized. The goal of the Cyc project is to specify common sense knowledge in a well-designed language. The Cog project tried to build a humanoid robot that acts like a human. In addition, the architecture of our hand-eye robot JANUS is described; it has a modular structure similar to that of the human brain.
This chapter is a tour de force in computational intelligence. It requires that the
reader is willing to contemplate fundamental problems arising in building intel-
ligent machines. Solutions are not given. I hope that the reader finds interesting
research problems worthy of being investigated. This paper extends my research
started in [15].
I propose to consider the question “Can machines think?” This should begin with
definitions of the meaning of the terms “machine” and “think”....But this is absurd.
Instead of attempting such a definition I shall replace the question by another, which
is closely related to it and is expressed in relatively unambiguous words. The new
form of the question can be described in terms of a game which we call the imitation
game.
Turing’s definition of the imitation game is more complicated than the one normally used today, so I will describe it briefly. The game is played by three actors: a man (A), a woman (B) and an interrogator (C). The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. It is A’s objective to cause C to make the wrong identification. Turing then continues: “We now ask the question ‘What will happen when a machine takes the part of A in the game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original ‘Can machines think?’”
26 H. Mühlenbein
Why did Turing not simply define a game between a human and a machine trying to imitate a human, as the Turing test is described today? Is there an additional trick in introducing gender into the game? There has been quite a lot of discussion as to whether this game characterizes human intelligence at all. Its purely behavioristic definition leaves out any attempt to identify the important components which together produce human intelligence. I will not enter this discussion here, but just state Turing’s opinion about the outcome of the imitation game.
I believe that in about fifty years’ time it will be possible to programme computers with a storage capacity of about 10^9 bits to make them play the imitation game so well that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning.
The very detailed prediction is funny: Why a 70% chance, why a duration of five
minutes? In the next section I will discuss what arguments Turing used to support
this prediction.
Turing did not see any problem in creating machine intelligence purely by programming; he just found it too time consuming. So he investigated whether more expeditious methods exist. He observed:
“In the process of trying to imitate an adult human mind we are bound to think
a good deal about the process which has brought it to the state that it is in. We may
notice three components.
1. The initial state of the brain, say at birth.
2. The education to which it has been subjected.
3. Other experience, not to be described as education, to which it has been subjected.
Instead of trying to produce a programme to simulate an adult mind, why not
rather try to produce one which simulates the child’s...Presumably the child brain is
something like a notebook. Rather little mechanism, and lots of blank sheets. Our
hope is that there is so little mechanism in the child brain that something like it can
easily be programmed. The amount of work in the education we can assume, as a
first approximation, to be much the same as for the human child.”
In order to speed up learning Turing demanded that the child machine should un-
derstand some language. In the final pages of the paper Turing discusses the problem
of the complexity the child machine should have. He proposes to try two alterna-
tives: either to make it as simple as possible to allow learning or to include a com-
plete system of logical inference. He ends his paper with the remarks: “Again I do
not know the answer, but I think both approaches should be tried. We can see only
a short distance ahead, but we can see plenty there that needs to be done.”
The states of the units from which the input comes are taken from the previous moment, multiplied together, and the result is subtracted from 1. Thus a neuron is nothing other than a NAND gate. The state of the network is defined by the states of the units. Note that the network might have lots of loops; it continually goes through a number of states until a period begins. The period cannot exceed 2^N cycles. In order to allow learning, the machine is connected with some input device which can alter its behavior. This might be a dramatic change of the structure, or a change of the state of the network.
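Turing’s A-type unit is easy to simulate: each unit computes 1 − ab over the previous states of its two inputs, which on binary values is exactly NAND. The following sketch (the wiring is an invented example, not taken from Turing’s paper) also illustrates why the period cannot exceed 2^N cycles:

```python
def step(state, wiring):
    """One synchronous update: unit i reads its two inputs from the
    previous state and computes 1 - a*b (NAND for binary values)."""
    return tuple(1 - state[a] * state[b] for a, b in wiring)

# A toy 3-unit network; the wiring (which two units feed each unit)
# is an illustrative choice.
wiring = [(1, 2), (0, 2), (0, 1)]
state = (0, 0, 0)

seen = {}
t = 0
while state not in seen:       # at most 2**N distinct states, so this terminates
    seen[state] = t
    state = step(state, wiring)
    t += 1
period = t - seen[state]
print(period)                  # length of the cycle the network falls into
```

Since only 2^N network states exist, the synchronous update must eventually revisit one of them, after which the network cycles forever.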
Maybe Turing had the intuitive feeling that the basic transition of the type A machine is not enough, and therefore introduced the more complex B-type machine. I will not describe this machine here, because for neither the A-type nor the B-type machine did Turing define precisely how learning is to be done.
A learning mechanism is introduced with the third machine, called a P-type machine. The machine is an automaton with N configurations. There is a table specifying, for each configuration, the action the machine has to take. The action may be either
1. to perform some externally visible act A1, . . ., Ak, or
2. to set a memory unit Mi.
The reader will have noticed that the next configuration is not yet specified. Turing surprisingly defines: if the current configuration is s, then the next configuration is the remainder of 2s or of 2s + 1 on division by N. These two configurations are
called the alternatives 0 and 1. The reason for this definition is the learning mechanism Turing defines. At the start the description of the machine is largely incomplete. The entry for each configuration can be in one of five states: U (uncertain), T0 (try alternative 0), T1 (try alternative 1), D0 (definite 0) or D1 (definite 1). Learning changes the entries as follows: if the entry is U, the alternative is chosen at random, and the entry is changed to T0 or T1 according to whether 0 or 1 was chosen. For the other four states, the corresponding alternatives are chosen. When a pleasure stimulus occurs, state T is changed to state D; when a pain stimulus occurs, T is changed back to U. Note that state D cannot be changed. The proposed learning method sounds very simple, but Turing surprisingly remarked:
Today the universal machine is called the Turing machine. Turing even gave some details of this particular P-type machine: each instruction consisted of 128 digits, forming four sets of 32 digits, each of which describes one place in the main memory.
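The P-type learning rules can be sketched in a few lines of code. The number of configurations, the trainer, and its goal (it rewards alternative 1) are all invented for illustration:

```python
import random

N = 8                                   # number of configurations (illustrative)
table = {s: "U" for s in range(N)}      # the description starts largely unspecified

def move(s):
    """Choose the alternative for configuration s, updating its entry."""
    if table[s] == "U":                          # uncertain: choose at random
        table[s] = random.choice(["T0", "T1"])
    alt = 0 if table[s] in ("T0", "D0") else 1
    return (2 * s + alt) % N                     # remainder of 2s or 2s+1 mod N

def pleasure(s):
    if table[s] == "T0": table[s] = "D0"         # tentative becomes definite
    if table[s] == "T1": table[s] = "D1"

def pain(s):
    if table[s] in ("T0", "T1"): table[s] = "U"  # tentative is forgotten

# A hypothetical trainer that rewards alternative 1 and punishes alternative 0:
random.seed(0)
s = 1
for _ in range(500):
    prev, s = s, move(s)
    (pleasure if s == (2 * prev + 1) % N else pain)(prev)

print(table)   # after training, each entry is either still U or definite D1
```

Because a pain stimulus erases only tentative entries, the definite entries act as a ratchet: rewarded behavior can never be unlearned, which is exactly the rigidity Turing built into the D states.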
If the untrained infant’s mind is to become an intelligent one, it must acquire both
discipline and initiative.
Discipline means strictly obeying punishment and reward. But what is initiative? Turing’s definition of initiative is typical of his behavioristic attitude. “Discipline is certainly not enough in itself to produce intelligence. That which is required
in addition we call initiative. This statement will have to serve as a definition. Our
task is to discover the nature of this residue as it occurs in man, and to try and copy
it in machines.”
With only a paper computer available, Turing was not able to investigate the subject of initiative further. Nevertheless he made the bold statement [24]: “A great positive reason for believing in the possibility of making thinking machinery is the fact
that it is possible to make machinery to imitate any small part of a man. One way
of setting about our task of building a thinking machine would be to take a man
as a whole and to try to replace all parts of him by machinery...Thus although this
method is probably the ‘sure’ way of producing a thinking machine it seems to be
altogether too slow and impracticable. Instead we propose to try and see what can be
done with a ‘brain’ which is more or less without a body, providing, at most, organs
of sight, speech, and hearing. We are then faced with the problem of finding suitable
branches of thought for the machine to exercise its powers in.”
In 1938 Alan Turing was assistant to John von Neumann. But later they worked completely independently of each other, neither knowing the other’s thoughts concerning the possible applications of the newly designed electronic computers. A condensed summary of John von Neumann’s research concerning machine intelligence is contained in his paper “The General and Logical Theory of Automata” [25]. The paper was presented in 1948 at the Hixon Symposium on cerebral mechanisms of behavior. Von Neumann was the only computer scientist at this symposium; the reason was that he closely followed the theoretical research aimed at understanding the brain, in order to use the results for artificial automata.
Von Neumann noted three major limitations of the artificial automata of his time:
• the size of their componentry
• their limited reliability
• the lack of a logical theory of automata
There have been tremendous achievements in the first two areas. Therefore I will
concentrate on the theory problem. Here von Neumann predicted:
The logic of automata will differ from the present system of formal logic in two
relevant respects.
1. The actual length of “chains of reasoning”, that is, of the chains of operations,
will have to be considered.
2. The operations of logic will all have to be treated by procedures which allow
exceptions with low but non-zero probabilities.
...This new system of formal logic will move closer to another discipline which has
been little linked in the past with logic. This is thermodynamics, primarily in the
form it was received from Boltzmann, and is that part of theoretical physics which
comes nearest in some of its aspects to manipulating and measuring information.
Von Neumann later tried to formalize probabilistic logic; his results appeared in [26]. But this research was more or less a dead end, because von Neumann did not abstract from the hardware components, which are unreliable and have a certain probability of failure. In addition, von Neumann included time in his model, making a mathematical analysis of a given system difficult. Probabilistic reasoning is now heavily used in artificial intelligence [17], and the chains of operations are investigated in a branch of theoretical computer science called computational complexity [8].
Now it is perfectly possible that the simplest and only practical way to say what
constitutes a visual analogy consists in giving a description of the connections of
the visual brain....It is not at all certain that in this domain a real object might not
constitute the simplest description of itself.
Von Neumann ends the section with the sentence: “The foregoing analysis shows
that one of the relevant things we can do at this moment is to point out the directions
in which the real problem does not lie.” In order to understand and investigate the
fundamental problem, von Neumann identified an important subproblem. In nature it is obvious that more complex beings have developed from less complex ones. Is this also possible with automata? How much complexity do automata need in order to create more complex ones?
Von Neumann starts the discussion of complexity with the observation that if an
automaton has the ability to construct another one, there must be a decrease in com-
plication. In contrast, natural organisms reproduce themselves, that is, they produce
new organisms with no decrease in complexity. So von Neumann tried to construct a general artificial automaton that could reproduce itself. The famous construction consists of the following automata:
1. A general constructive machine, A, which can read a description Φ(X) of another machine, X, and build a copy of X from this description:

A + Φ(X) → X

2. A general copying machine, B, which can copy the description Φ(X):

B + Φ(X) → Φ(X)

3. A control machine, C, which, when combined with A and B, will first activate B, then A, link X to Φ(X), and cut them loose from A + B + C:

A + B + C + Φ(X) → X + Φ(X)

Fed with a description of itself, the combination reproduces itself:

A + B + C + Φ(A + B + C) → A + B + C + Φ(A + B + C)

If the description is extended by a further automaton D, the extension is reproduced as well:

A + B + C + Φ(A + B + C + D) → A + B + C + D + Φ(A + B + C + D)

and the same holds for a mutated automaton D′:

A + B + C + Φ(A + B + C + D′) → A + B + C + D′ + Φ(A + B + C + D′)
This fact, that complication, as well as organization, below a critical level is degen-
erative, and beyond that level can become self-supporting and even increasing, will
clearly play an important role in any future theory of the subject.
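The logic of the construction can be caricatured in a few lines of code. Machines here are mere lists of part names and Φ is a tuple of names, so this is a deliberately crude model of the schema, not of von Neumann’s actual cellular automaton:

```python
def A(description):
    """General constructor: build the machine X encoded by a description."""
    return list(description)     # "building" = realizing the listed parts

def B(description):
    """General copier: duplicate the description tape."""
    return tuple(description)

def C(description):
    """Control: activate B, then A, and couple X to its own description."""
    tape = B(description)
    x = A(description)
    return x, tape               # X together with its description

# Feeding C a description of A+B+C yields A+B+C plus that description:
offspring, tape = C(("A", "B", "C"))
print(offspring, tape)

# Extending the description by a further automaton D reproduces D as well:
offspring_d, tape_d = C(("A", "B", "C", "D"))
print(offspring_d)
```

The toy makes the key point visible: the description is used twice, once interpreted (by A) and once merely copied (by B), which is how the construction escapes the apparent decrease in complication.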
Von Neumann was well aware of the other two important evolutionary processes
besides replication - namely variation and selection. He decided that knowledge about these two processes was not yet sufficient to incorporate them in his theory of automata. “Conflicts between independent organisms lead to consequences
which, according to the theory of natural selection, are believed to furnish an im-
portant mechanism of evolution. Our models lead to such conflict situations. The
conditions under which this motive for evolution can be effective here may be quite
complicated ones, but they deserve study.”
Cellular automata have led to a great deal of theoretical research. They can easily be extended to have the power of Turing machines. Nevertheless, the central problem of this approach remains unsolved: how can the automata evolve complex problem-solving programs starting from fairly simple initial programs? This happened in biological evolution: starting with small self-reproducing units, complex problem-solving capabilities evolved, culminating in the human brain.
In the paper “Outline for a Logical Theory of Adaptive Systems” [10] John Holland
tried to continue the scientific endeavor initiated by von Neumann. He wrote:
The theory should enable us to formulate key hypotheses and problems, particularly from molecular control and neurophysiology. The work in theoretical genetics should find a natural place in the theory. At the same time, rigorous methods of automata theory, particularly those parts concerned with growing automata, should be used.
systems, he does not claim to solve grand challenge applications with the proposed
methods. This can be tried after the theories have been formulated and verified.
“Unrestricted adaptability (assuming nothing is known of the environment) re-
quires that the adaptive system be able initially to generate any of the programs of
some universal computer . . . With each generation procedure we associate the pop-
ulation of programs it generates;. . . In the same vein we can treat the environment as
a population of problems.”
Now let us have a closer look at Holland’s model. First, there is a finite set of generators (programs) (g1, . . ., gk). The generation procedure is defined in terms of this set and a graph called a generation tree. Each permissible combination of generators is represented by a vertex in the generation tree. Holland distinguishes between auxiliary vertices and main vertices. Each auxiliary vertex is labeled with two numbers, called the connection and disconnection probabilities; this technique makes it possible to create new connections or to delete existing ones. Each main vertex is labeled with a variable referred to as its density. The interested reader is urged to read the original paper [10].
Holland claims that from the generation tree and the transition equations of any
particular generation procedure, one can calculate the expected values of the den-
sities of the main vertices as a function of time. Holland writes: “From the general
form of the transition equations one can determine such things as conditions under
which the resulting generation procedures are stationary processes.” Thus Holland
already tried to formulate a stochastic theory of program generation! This is an idea
still waiting to be explored.
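Holland’s transition equations are developed in full in [10]. As a minimal caricature (all numbers invented), the expected density x of a single connection governed by a connection probability c and a disconnection probability d evolves as x ← x(1 − d) + (1 − x)c, and is stationary exactly at x = c/(c + d):

```python
# One auxiliary vertex with connection probability c and disconnection
# probability d; x is the expected density of the connected state.
c, d = 0.30, 0.10                   # invented values for illustration
x = 0.0
for _ in range(200):
    x = x * (1 - d) + (1 - x) * c   # one transition-equation step
print(round(x, 4))                  # converges to c / (c + d) = 0.75
```

The fixed point c/(c + d) is the kind of stationary-process condition Holland alludes to when he speaks of determining when a generation procedure is stationary.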
Holland’s next extension of the system is similar in spirit to von Neumann’s self-
reproducing automata. Holland introduces supervisory programs which can con-
struct templates which alter the probabilities of connections. Templates play the
role of catalysts or enzymes. Thus program construction is also influenced by some
kind of “chemical reactions.”
The above process is not yet adaptive. Adaptation needs an environment posing
problems. Therefore Holland proposes that the environment be treated as a population of problems. These problems are presented by means of a finite set of initial
statements and an algorithm for checking whether a purported solution of the prob-
lem is in fact a solution. Holland then observes the problems of partial solutions and
subgoals. “When we consider the interaction of an adaptive system with its environ-
ment we come very soon to questions of partial solutions, subgoals etc. The simplest
cases occur when there is an a priori estimate of the nature of the partial solution
and a measure of the closeness of its approach to the final solution.”
Holland then observes that a rich environment is crucial for adaptation.
“Mathematical characterization of classes of rich environments relative to a given
class of adaptive systems constitutes one of the major questions in the study of adap-
tive systems. . . . An adaptive system could enhance its rate of adaptation by some-
how enriching the environment. Such enrichment occurs if the adaptive system can
generate subproblems or subgoals whose solution will contribute to the solution of
the given problems of the environment.”
(p or p) → p
p → (p or q)
(p or q) → (q or p)
p and q are binary variables. Given any variable p we can form (not p). Given any two variables p and q we can form the expressions (p or q) and p → q. From these axioms theorems can be derived.
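These propositional axioms of Principia Mathematica can be verified to be tautologies by brute force over the four truth assignments:

```python
from itertools import product

def implies(a, b):
    return (not a) or b

axioms = [
    lambda p, q: implies(p or p, p),        # (p or p) -> p
    lambda p, q: implies(p, p or q),        # p -> (p or q)
    lambda p, q: implies(p or q, q or p),   # (p or q) -> (q or p)
]

# Every axiom holds under all four truth assignments to p and q:
ok = all(ax(p, q) for ax in axioms
         for p, q in product([False, True], repeat=2))
print(ok)   # → True
```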
When the Logic Theorist (LT) found a simpler proof of proposition 2.85 of Principia Mathematica,
Simon wrote to Russell: “We have accumulated some interesting experience about
I have reviewed only four of the early proposals which simulate natural systems to create machine intelligence. One observation is immediately striking: all the researchers investigated the problem of machine intelligence on a very broad scale. Turing’s main emphasis was the design of efficient learning schemes. For him it was obvious that an intelligent machine could be developed only by efficient learning of something like a child machine. Turing’s attitude was purely that of a computer scientist: he firmly believed that machine intelligence equal to or surpassing human intelligence could eventually be created.
Von Neumann’s approach was more interdisciplinary, drawing also on results from the analysis of the brain. He had a similar goal, but he was much more cautious concerning the possibility of creating an automaton with intelligence. He identified important problems which blocked the road to machine intelligence.
Both von Neumann and Turing investigated formal neural networks as a basic component of an artificial brain. This component was not necessary for the design; it was used only to show that artificial automata could have an organization similar to that of the human brain. Both researchers ruled out the possibility that a universal theory of intelligence could be found which would make it possible to program a computer according to that theory. So Turing proposed learning as the basic mechanism, and von Neumann self-reproducing automata.
Von Neumann was sceptical about the creation of machine intelligence. He was convinced that learning leads to the curse of infinite enumeration: while every single behavior can be unambiguously described, there is obviously an infinite number of different behaviors. Turing also saw the limitations of teacher-based learning by reward and punishment, and therefore required that the machine have initiative in addition. Turing had no idea how learning techniques for initiative could be implemented; he only observed, correctly, that initiative is necessary for creating machine intelligence by learning. Higher-level learning methods are still an open research problem.
The designs of Turing and von Neumann contain all components considered nec-
essary today for creating machine intelligence. Turing ended his investigation with the problem of learning by initiative; von Neumann invented, as a first step, self-reproducing cellular automata. There is no major flaw in their designs. Von Neumann’s question - can visual analogy be described in finite time and limited space? - is still unsolved.
In order to make the above problem clear, let me formulate a conjecture: The
computational universe can be divided into three sectors: computable problems;
non-computable problems (that can be given a finite, exact description but have no
effective procedure to deliver a definite result); and, finally, problems whose indi-
vidual behaviors are, in principle, computable, but that, in practice, we are unable to
formulate in an unambiguous language understandable for a Turing machine. Many
non-computable problems are successfully approached by heuristics, but it seems
very likely that the problem of visual analogy belongs to the third class.
Holland proposed a general scheme for breeding intelligent programs using the mechanisms of evolution. This was the most ambitious proposal: using program generation by evolutionary principles to create intelligent machines. The proposal tried to circumvent Turing’s problem of coding all the necessary knowledge.
Let us try to contrast the approach of Turing with those of von Neumann and Holland. Turing proposed to programme the knowledge humans have. In order to speed up the implementation he suggested programming an automaton with only child-like intelligence; the automaton child is then taught to become more intelligent.
Von Neumann was skeptical whether all the components necessary for human-like intelligence could be programmed in finite time and finite space. Therefore he started with the idea of automatically evolving automata. Holland extended this idea by proposing an environment of problems to evolve the automata. At first sight this seems to solve the programming problem: instead of copying human-like intelligence, an environment of problems is used. But Holland overlooked the complexity of programming the problems, which would seem to be no easier than programming the knowledge humans have about the environment.
Holland’s proposal to use stochastic systems, their steady-state equilibria and
homeostasis is in my opinion still a very promising approach for a constructive evo-
lution theory of automata. Holland himself never implemented his general model. It
is still a theoretical design.
Later, von Neumann’s proposal was extended so that both the problem-solving programs and the problems evolve together [14], as obviously happened in natural evolution. In a new research discipline called artificial life, several attempts have been made to evolve automata and the environment together, but the evolution has always stopped very early.
Newell, Shaw and Simon concentrated on the higher-level problem-solving capabilities of humans. Evolutionary principles and lower-level structures like the human brain are not considered relevant; instead a theory of problem solving by humans is used. Their research led to cognitive science and to artificial intelligence research based on theories of intelligence. Despite their great optimism, no convincing artificial intelligence system has been created so far using this approach.
on predicate calculus and has a syntax similar to that of the Lisp programming
language.
Much of the current work on the Cyc project continues to be knowledge en-
gineering, representing facts about the world by hand, and implementing efficient
inference mechanisms on that knowledge. Increasingly, however, work at Cycorp
involves giving the Cyc system the ability to communicate with end users in natural
language, and to assist with the knowledge formation process via machine learning.
Currently (2007) the knowledge base consists of
• 3.2 million assertions (facts and rules)
• 280,000 concepts
• 12,000 concept-interrelating predicates
Cyc has now been running for 23 years, making it the longest-running project in the history of AI. But despite the huge effort, its success is still uncertain. Up to now Cyc has not been successfully used for any broad AI application, and the system is far from being usable for a Turing test.
We remind the reader that the coding of knowledge was considered by Turing to be too inefficient, and that von Neumann even doubted whether the necessary knowledge for visual analogy could be specified in finite time. Today Cyc seems to be more a confirmation of von Neumann’s doubt than a refutation of it.
The Cog project was started in 1993 with great publicity. The goal was to understand human cognitive abilities well enough to build a humanoid robot that develops and acts like a person [3, 4]. One of the key ideas of the project was to build a robot with capabilities similar to those of a human infant. We have already encountered this idea in Turing’s proposal.
“By exploiting a gradual increase in complexity both internal and external, while
reusing structures and information gained from previously learned behaviors, we
hope to be able to learn increasingly sophisticated behavior [4].” Cog was designed
bottom-up [3]. This led to reasonable success in the beginning. The big problems appeared later.
Brooks et al. wrote prophetically: To date (1999), the major missing piece of our
endeavor is demonstrating coherent global behavior from existing subsystems and
sub-behaviors. If all of these systems were active at once, competition for actua-
tors and unintended couplings through the world would result in incoherence and
interference among the subsystems [4].
During the course of the project a lot of interesting research has been done. But
the problem of coherent or even intelligent behavior could not be solved. Therefore
the project was stopped in 2002 without even entering the learning or development
phase.
Thus for a letter-classification problem we might have 26 agents, each specialized in recognizing a particular letter in all its distortions. Each agent uses a number of filters. The learning method used by Selfridge was gradient descent, adapting the weight of each filter.
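Selfridge’s scheme can be sketched as a single agent whose filter weights are fitted by gradient descent on the squared error; the filters and training data below are entirely made up:

```python
def train(samples, lr=0.1, epochs=200):
    """Fit one agent's filter weights by gradient descent on squared error."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for filters, target in samples:
            out = sum(wi * fi for wi, fi in zip(w, filters))
            err = out - target
            w = [wi - lr * err * fi for wi, fi in zip(w, filters)]
    return w

# Invented filter responses for an "is it the letter A?" agent:
samples = [([1.0, 0.0, 1.0], 1.0),    # an 'A'-like pattern
           ([0.0, 1.0, 0.0], 0.0)]    # a non-'A' pattern
w = train(samples)
score = sum(wi * fi for wi, fi in zip(w, [1.0, 0.0, 1.0]))
print(round(score, 2))                # → 1.0: the agent accepts the pattern
```

In the full Pandemonium scheme, one such agent per letter shouts its score and a decision demon picks the loudest.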
We have taken this general idea and extended it to a modular system of neural net-
works. The central new idea is self-assessment by reflection. Each module observes
its own behavior and produces information relating to the quality of its classifica-
tion. The architecture was very successful in a number of classification tasks, but in
the course of developing it more and more refinements had to be implemented. The
interested reader is referred to [21, 1, 2].
9 Conclusion
Today computational intelligence is divided into many fields, e.g. evolutionary computation, neural networks, and fuzzy logic. These are further separated into a myriad of specialized techniques. In this paper I have recalled the fundamental research issues
of machine intelligence by discussing the research of Alan Turing and John von
Neumann. They represent two positions that remain popular today. For Turing the creation
of machines with human-like intelligence was just a question of programming time.
He estimated that sixty programmers would have to work for fifty years. John von Neumann
was more cautious. Using the example of visual analogy, he doubted that human-like
intelligent machines could be programmed in finite time and space. This led him
to the question of whether intelligent programs could evolve automatically by simulating
evolution. While von Neumann solved the problem of self-reproducing automata,
automata solving complex problems have not yet been obtained. I have identified the
major problem of this approach: programming the environment seems to be
as difficult as programming the human problem-solving capabilities.
In my opinion it is not yet clear whether Turing will ultimately be proved right that automata
with human-like intelligence can be programmed. Up to now computational intelligence
has been successful only in specialized applications; automata passing the Turing
test or understanding language are not yet in sight.
References
1. Beyer, U., Smieja, F.J.: Data exploration with reflective adaptive models. Computat.
Statistics and Data Analysis 22, 193–211 (1996)
2. Beyer, U., Smieja, F.J.: Learning from examples, agent teams and the concept of reflec-
tion. International Journal of Pattern Recognition and Artificial Intelligence 10, 251–272
(1996)
3. Brooks, R.: From earwigs to humans. Robotics and Autonomous Systems 20, 291–304
(1997)
4. Brooks, R., Breazeal, C., Marjanovic, M., Scassellati, B., Williamson, M.: The Cog
project: Building a humanoid robot. In: Nehaniv, C.L. (ed.) CMAA 1998. LNCS (LNAI),
vol. 1562, pp. 52–87. Springer, Heidelberg (1999)
5. Burks, A.W.: Essays on Cellular Automata. University of Illinois Press, Urbana (1970)
6. Panton, K., Matuszek, C., Lenat, D., Schneider, D., Witbrock, M., Siegel, N., Shepard,
B.: Common sense reasoning – from cyc to intelligent assistant. In: Cai, Y., Abascal,
J. (eds.) Ambient Intelligence in Everyday Life. LNCS, vol. 3864, pp. 1–31. Springer,
Heidelberg (2006)
7. Eccles, J.C.: The Human Mystery. Springer, New York (1979)
8. Garey, M.R., Johnson, D.S.: Computers and intractability: a guide to the theory of NP-
completeness. Freeman, San Francisco (1979)
9. Holland, J.H.: Iterative circuit computers. In: Essays on Cellular Automata [5], pp. 277–
296
10. Holland, J.H.: Outline for a logical theory of adaptive systems. In: Essays on Cellular
Automata [5], pp. 296–319
11. Holland, J.H.: Adaptation in Natural and Artificial Systems. Univ. of Michigan Press,
Ann Arbor (1975/1992)
12. Lenat, D.B.: Cyc: A large-scale investment in knowledge infrastructure. Comm.
ACM 38, 33–38 (1995)
13. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity.
Bull. of Mathematical Biophysics 5, 115–137 (1943)
Computational Intelligence: The Legacy of Alan Turing and John von Neumann 43
14. McMullin, B.: John von Neumann and the evolutionary growth of complexity: Looking
backward, looking forward... Artificial Life 6, 347–361 (2001)
15. Mühlenbein, H.: Towards a theory of organisms and evolving automata. In: Menon, A.
(ed.) Frontiers of Evolutionary Computation, pp. 1–36. Kluwer Academic Publishers,
Boston (2004)
16. Newell, A., Shaw, J.C., Simon, H.: Empirical explorations with the logic theory machine.
In: Proc. Western Joint Computer Conference, vol. 11, pp. 218–239 (1957)
17. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Infer-
ence. Morgan Kaufmann, San Mateo (1988)
18. Selfridge, O.G.: Pandemonium: a paradigm for learning. In: Mechanisation of Thought
Processes, pp. 511–529. Her Majesty’s Stationery Office, London (1959)
19. Simon, H., Newell, A.: Heuristic problem solving: The next advance in operations re-
search. Operations Research 6, 1–10 (1958)
20. Simon, H.A.: Models of my Life. MIT Press, Boston (1991)
21. Smieja, F.J.: The pandemonium system of reflective agents. IEEE Transact. on Neural
Networks 7, 193–211 (1996)
22. Springer, S., Deutsch, G.: Left brain, right brain. W.H. Freeman, New York (1985)
23. Turing, A.M.: Computing machinery and intelligence. Mind 59, 433–460 (1950)
24. Turing, A.M.: Intelligent machinery. In: Meltzer, B., Michie, D. (eds.) Machine Intelli-
gence 5, pp. 3–23. Oxford University Press, Oxford (1969)
25. von Neumann, J.: The general and logical theory of automata. In: The world of mathe-
matics, pp. 2070–2101. Simon and Schuster, New York (1954)
26. von Neumann, J.: Probabilistic logics and the synthesis of reliable organs from unreliable
components. In: Annals of Mathematics Studies, vol. 34, pp. 43–99. Princeton University
Press, Princeton (1956)
27. von Neumann, J.: Theory of Self-Reproducing Automata. University of Illinois Press,
Urbana (1966)
Part II
Fusing Evolutionary Algorithms
and Fuzzy Logic
Multiobjective Evolutionary Algorithms for
Electric Power Dispatch Problem
Mohammad A. Abido
1 Introduction
Generally, the basic objective of the traditional economic dispatch (ED) of electric
power generation is to schedule the committed generating unit outputs so as to meet
the load demand at minimum operating cost while satisfying all generator and sys-
tem equality and inequality constraints. This makes the ED problem a large-scale
highly constrained nonlinear optimization problem.
Mohammad A. Abido
Electrical Engineering Department, King Fahd University of Petroleum & Minerals Dhahran
31261, Saudi Arabia
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 47–82.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
However, thermal power plants are major causes of atmospheric pollution because of the high concentrations of pollutants they emit, such as sulphur oxides (SOx) and nitrogen oxides (NOx). Nowadays, the pollution minimization problem has attracted a lot of attention due to the public demand for clean air. In addition, the increasing public awareness of environmental protection and the passage of the
U.S. Clean Air Act Amendments of 1990 have forced the power utilities to modify
their design or operational strategies to reduce pollution and atmospheric emissions
of the thermal power plants [17, 24, 43].
Several strategies to reduce the atmospheric emissions have been proposed and
discussed in the literature [43]. These include
• Installation of pollutant cleaning equipment such as gas scrubbers and electro-
static precipitators;
• Switching to low emission fuels;
• Replacement of the aged fuel-burners and generator units with cleaner and more
efficient ones;
• Emission dispatching.
The first three options require installation of new equipment and/or modification
of the existing ones that involve considerable capital outlay and, hence, they can
be considered as long-term options. The emission dispatching option is an attrac-
tive short-term alternative in which the emission, in addition to the fuel cost objec-
tive, is to be minimized. In recent years, this option has received much attention
[8, 10, 16, 18, 23] since it requires only a small modification of the basic economic
dispatch to include emissions. Thus, the power dispatch problem can be handled as
a multiobjective optimization problem with non-commensurable and contradictory
objectives, since the optimum solution of the economic power dispatch problem is
not environmentally the best solution.
Generally speaking, there are three approaches to solve the environmen-
tal/economic dispatch (EED) problem. The first approach treats the emission as a
constraint with a permissible limit. The second approach treats the emission as an-
other objective in addition to the usual cost objective, and the problem is converted
to a single objective problem either by linear combination of both objectives or by
considering one objective at a time for optimization. The third and the most re-
cent approach handles both fuel cost and emission simultaneously as competing
objectives.
In [8, 23] the problem has been reduced to a single objective problem by treating
the emission as a constraint with a permissible limit. This formulation, however, has
severe difficulty in getting the trade-off relations between cost and emission.
Alternatively, minimizing the emission has been handled as another objective in
addition to the usual cost objective. A linear programming based optimization procedure in which the objectives are considered one at a time was presented in [18].
Unfortunately, this approach does not give any information regarding the trade-offs
involved. In another research direction, the multiobjective EED problem was con-
verted to a single objective problem by linear combination of the different objectives
as a weighted sum [9, 10, 16]. The important aspect of this weighted sum method
is that a set of non-inferior solutions can be obtained by varying the weights. Un-
fortunately, this requires multiple runs. Furthermore, this method cannot be used
to find Pareto-optimal solutions in problems having a non-convex Pareto-optimal
front. To avoid this difficulty, the ε-constraint method for multiobjective optimization was presented in [7, 45]. This method is based on optimizing the most preferred objective while considering the other objectives as constraints bounded by some allowable levels ε. The obvious weaknesses of this approach are that it is time-consuming and tends to find weakly non-dominated solutions.
The recent direction is to handle both objectives simultaneously as competing ob-
jectives. A fuzzy multiobjective optimization technique for the EED problem was
proposed [41]. However, the solutions produced are sub-optimal and the algorithm
does not provide a systematic framework for directing the search towards the Pareto-
optimal front. A fuzzy satisfaction-maximizing decision approach was successfully
applied to solve the bi-objective EED problem [27, 42]. However, extension of the
approach to include more objectives such as security and reliability is a very in-
volved question. A multiobjective stochastic search technique for the multiobjective
EED problem was proposed in [14]. However, the technique is computationally in-
volved and time-consuming. In addition, the genetic drift and search bias are severe
problems that result in premature convergence. Therefore, additional efforts should
be made to preserve the diversity of the non-dominated solutions.
In dealing with multiobjective optimization problems, classical search and opti-
mization methods are not efficient for the following reasons.
• Most of them cannot find multiple solutions in a single run, thereby requiring
them to be applied as many times as the number of desired Pareto-optimal solu-
tions.
• Multiple applications of these methods do not guarantee finding widely different
Pareto-optimal solutions.
• Most of them cannot efficiently handle problems with discrete variables and
problems having multiple optimal solutions.
• Some algorithms are sensitive to the shape of the trade-off curve and cannot be
used in problems having a non-convex Pareto-optimal front.
In contrast, studies on evolutionary algorithms have shown that these methods can be efficiently used to solve multiobjective optimization problems and eliminate most of the above difficulties of classical methods [11, 12, 13, 15, 20, 22, 26,
29, 31, 33, 34, 37, 39, 40, 44, 47, 49]. Since they use a population of solutions in
their search, multiple Pareto trade-off solutions can be found in a single run.
Recently, different multiobjective evolutionary algorithms (MOEA) have been
implemented and applied to the EED problem with impressive success [1, 2, 3, 4, 5].
In this chapter, implementations of different MOEA techniques to solve the
real-world multiobjective EED problem have been carried out to assess their poten-
tial and effectiveness. Specifically speaking, Non-dominated Sorting Genetic Algo-
rithm (NSGA) [40], Niched Pareto Genetic Algorithm (NPGA) [26], and Strength
Pareto Evolutionary Algorithm (SPEA) [49] have been developed and implemented.
It is worth mentioning that this work presents an exploratory study, aiming to
demonstrate the potential of MOEA for solving the problem under consideration.
The EED problem is formulated as a nonlinear constrained multiobjective optimiza-
tion problem where fuel cost and environmental impact are treated as competing
objectives. The potential of MOEA to handle this problem is investigated and dis-
cussed. A hierarchical clustering technique is implemented to provide the system
operator with a representative and manageable Pareto trade-off set. In addition, a
fuzzy-based mechanism is employed to extract the best compromise solution. Dif-
ferent cases with different complexities have been considered in this study. The
MOEA techniques have been applied to the standard IEEE 30-bus 6-generator test
system. These techniques were compared to each other and to classical multiob-
jective optimization techniques as well. The effectiveness of MOEA to solve the
EED problem is demonstrated. The quality and diversity of the non-dominated so-
lutions obtained by different MOEA techniques have been measured and assessed
quantitatively.
where N is the number of generators, ai, bi, and ci are the cost coefficients of the ith generator, and PGi is the real power output of the ith generator. PG is the vector of real power outputs of generators, defined as PG = [PG1, PG2, ..., PGN]^T.
Generation capacity constraint: the real power output of each generator is restricted by its lower and upper limits as follows:

PGi^min ≤ PGi ≤ PGi^max, i = 1, ..., N. (4)
Power balance constraint: the total electric power generation must cover the total
electric power demand PD and the real power loss in transmission lines Ploss . Hence,
∑_{i=1}^{N} PGi − PD − Ploss = 0. (5)
Calculation of Ploss implies solving the load flow problem which has equality con-
straints on real and reactive power at each bus as follows
PGi − PDi − Vi ∑_{j=1}^{NB} Vj [Gij cos(δi − δj) + Bij sin(δi − δj)] = 0, (6)

QGi − QDi − Vi ∑_{j=1}^{NB} Vj [Gij sin(δi − δj) − Bij cos(δi − δj)] = 0, (7)
where i = 1, 2, ..., NB; NB is the number of buses; QGi is the reactive power generated at the ith bus; PDi and QDi are the ith bus load real and reactive power respectively; Gij and Bij are the transfer conductance and susceptance between bus i and bus j respectively; Vi and Vj are the voltage magnitudes at bus i and bus j respectively; δi and δj are the voltage angles at bus i and bus j respectively. The equality constraints in Equations (6) and (7) are nonlinear equations that can be solved using the
Newton-Raphson method to generate a solution of the load flow problem. During
the course of solution, the real power output of one generator, called the slack gen-
erator, is left to cover the real power loss and satisfy the equality constraint in (5).
The load flow solution gives all bus voltage magnitudes and angles. Then, the real
power loss in transmission lines can be calculated as
Ploss = ∑_{k=1}^{NL} gk [Vi² + Vj² − 2 Vi Vj cos(δi − δj)], (8)
where NL is the number of transmission lines; gk is the conductance of the kth line
that connects bus i to bus j.
Security constraints: for secure operation, the apparent power flow through the
transmission line Sl is restricted by its upper limit as follows:
Slk ≤ Slk^max, k = 1, ..., NL. (9)
52 M.A. Abido
It is worth mentioning that the kth transmission line flow connecting bus i to bus j
can be calculated as
Slk = (Vi∠δi) I*ij, (10)
where Iij is the current flow from bus i to bus j and it can be calculated as
Iij = (Vi∠δi − Vj∠δj) yij + (Vi∠δi) (y/2), (11)
where yij is the line admittance while y is the shunt susceptance of the line.
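Equations (10) and (11) map naturally onto complex arithmetic. A sketch using Python's `cmath` follows; the half-shunt convention (y/2 at each line end) and all names are assumptions based on the text above, not the chapter's own code:

```python
import cmath

def line_flow(Vi, di, Vj, dj, y_series, y_shunt):
    """Apparent power flow S_lk through a line, per Eqs. (10)-(11).
    Vi, Vj: voltage magnitudes (p.u.); di, dj: angles (rad);
    y_series: line admittance y_ij; y_shunt: total shunt susceptance y of the line."""
    vi = cmath.rect(Vi, di)                            # V_i ∠ δ_i as a complex phasor
    vj = cmath.rect(Vj, dj)                            # V_j ∠ δ_j
    i_ij = (vi - vj) * y_series + vi * (y_shunt / 2)   # current leaving bus i, Eq. (11)
    return vi * i_ij.conjugate()                       # S = V I*, Eq. (10)
```

The real part of the returned complex number is the active power flow, the imaginary part the reactive flow.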
Aggregating the objectives and constraints, the problem can be mathematically for-
mulated as a multiobjective optimization problem as follows.
Subject to:
g(PG ) = 0, (13)
h(PG ) ≤ 0, (14)
where g is the equality constraint representing the power balance while h are the in-
equality constraints representing the generation capacity and power system security.
3 Multiobjective Optimization
where fi is the ith objective function, x is a candidate solution, and Nobj is the number of objectives.
For a multiobjective optimization problem, any two solutions x1 and x2 can have one of two relationships: one dominates the other, or neither dominates the other. In a minimization problem, without loss of generality, a solution x1 dominates x2 iff the following two conditions are satisfied:
1. ∀i ∈ {1, 2, ..., Nobj}: fi(x1) ≤ fi(x2);
2. ∃j ∈ {1, 2, ..., Nobj}: fj(x1) < fj(x2).
If any of the above conditions is violated, the solution x1 does not dominate
the solution x2 . If x1 dominates the solution x2 , x1 is called the non-dominated
solution within the set {x1 , x2 }. The solutions that are non-dominated within the
entire search space are denoted as Pareto-optimal and constitute the Pareto-optimal
set. The objective function values associated with the non-dominated solutions in
Pareto-optimal set comprise the Pareto-optimal front.
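In code, the dominance relation defined above is a short predicate. The sketch below (Python, minimization; purely illustrative, the chapter's implementation was in FORTRAN) tests whether one objective vector dominates another:

```python
def dominates(f1, f2):
    """True iff objective vector f1 dominates f2 (minimization):
    f1 is no worse in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(f1, f2))
    strictly_better = any(a < b for a, b in zip(f1, f2))
    return no_worse and strictly_better
```

Note that two identical vectors do not dominate each other (the strict condition fails), matching the definition.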
The basic idea of the Pareto-based fitness assignment is to find a set of solutions
in the population that are non-dominated by the rest of the population. These so-
lutions are then assigned the highest rank and eliminated from further contention.
Generally, all approaches of this class explicitly use Pareto dominance in order to
determine the reproduction probability of each individual. Some Pareto-based ap-
proaches are NSGA, NPGA, and SPEA.
Fitness sharing is the most frequently used niching technique. The basic idea be-
hind this technique is: the more individuals are located in the neighborhood of a
certain individual, the more its fitness value is degraded. The neighborhood is de-
fined in terms of a distance measure di j and specified by the niche radius σshare .
Restricted mating is the most frequently used non-niching technique. In this tech-
nique, two individuals are allowed to mate only if they are within a certain distance.
This mechanism may avoid the formation of lethal individuals and therefore im-
prove the online performance. However, it does not appear to be widely used in the
field of multiobjective evolutionary algorithms [22].
Srinivas and Deb [40] developed NSGA in which a ranking selection method is
used to emphasize current non-dominated solutions and a niching method is used
to maintain diversity in the population. Before the selection is performed, the population is ranked on the basis of nondomination.
Fitness assignment: the basic idea of this approach is to find a set of solutions in
the population that are non-dominated by the rest of the population. Given a set of N population members, each having Nobj objective function values, the following procedure is used to find the non-dominated set of solutions:
These solutions represent the first front and are eliminated from further contention.
This process continues until the population is properly ranked.
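The ranking process just described can be sketched as follows; this is an illustrative O(N²)-per-front implementation in Python, not the chapter's code:

```python
def nondominated_sort(objs):
    """Rank a population into Pareto fronts (minimization).
    objs: list of objective tuples; returns a list of fronts (lists of indices)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        # current front: solutions not dominated by any other remaining solution
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)  # eliminate the ranked front from further contention
    return fronts
```

Each pass extracts one front and removes it, so the loop terminates when the whole population is properly ranked.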
Fitness sharing: the basic idea behind sharing is: the more individuals are located
in the neighborhood of a certain individual, the more its fitness value is degraded.
The neighborhood is defined in terms of a distance measure d and specified by the niche radius σshare. Given a set of nk solutions in the kth front, each having a dummy fitness value fk, the sharing procedure is performed in the following way [26] for each solution i = 1, ..., nk:
56 M.A. Abido
where P is the number of variables in the problem, and xk^u and xk^l are the upper and lower bounds of variable xk.
Step 2: This distance dij is compared with a prespecified parameter σshare and the following sharing function value is computed:

Sh(dij) = 1 − (dij/σshare)², if dij ≤ σshare; 0, otherwise. (20)
Step 4: Degrade the dummy fitness fk of the ith solution in the kth nondomination front to calculate the shared fitness, fi*, as follows:

fi* = fk / mi. (22)
This procedure is continued for all i = 1, ..., nk and a corresponding fi* is found. Thereafter, the smallest value fk^min of all fi* in the kth nondominated front is found for further processing. The dummy fitness of the next non-dominated front is assigned as fk+1 = fk^min − εk, where εk is a small positive number.
Step 4: Set j = j + 1. If j ≤ N, go to Step 2, else calculate the niche count for the candidate i as follows:

mi = ∑_{j=1}^{N} Sh(dij). (25)
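Putting Equations (20), (25), and (22) together, the sharing computation for one front might be sketched as below (Python; the normalized Euclidean distance over P variables follows the NSGA convention referenced above, and all names are illustrative):

```python
import math

def shared_fitness(front_solutions, dummy_fitness, sigma_share, bounds):
    """Degrade a front's dummy fitness f_k by niche count (Eqs. 20, 25, 22).
    front_solutions: list of parameter vectors in the same front;
    bounds: per-variable (lower, upper) pairs used to normalize distances."""
    def norm_dist(x, y):
        # d_ij: normalized Euclidean distance over the P problem variables
        return math.sqrt(sum(((xi - yi) / (u - l)) ** 2
                             for xi, yi, (l, u) in zip(x, y, bounds)) / len(bounds))
    def sh(d):
        # sharing function, Eq. (20)
        return 1.0 - (d / sigma_share) ** 2 if d <= sigma_share else 0.0
    out = []
    for x in front_solutions:
        m = sum(sh(norm_dist(x, y)) for y in front_solutions)  # niche count m_i, Eq. (25)
        out.append(dummy_fitness / m)                          # f_i* = f_k / m_i, Eq. (22)
    return out
```

Crowded solutions accumulate a niche count above one and see their fitness degraded, while an isolated solution keeps the full dummy fitness.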
Step 4 (Selection): Combine the population and the external set individuals. Select two individuals at random and compare their fitness. Select the better one and copy it to the mating pool. Repeat the selection process N times to fill the mating pool.
Step 5 (Crossover and Mutation): Perform the crossover and mutation oper-
ations according to their probabilities to generate the new population.
Step 7 (Termination): Check for stopping criteria. If any one is satisfied then
stop else copy new population to old population and go to Step 2. In
this study, the search will be stopped if the generation counter exceeds
its maximum number.
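The selection step above (binary tournament over the combined population and external set) can be sketched as follows (Python; it assumes a scalar fitness value per individual where lower is better, as in SPEA, and all names are illustrative):

```python
import random

def fill_mating_pool(population, archive, fitness, pool_size):
    """Binary tournament selection over population + external set.
    fitness[i] is the fitness of the ith individual of the combined list;
    lower values are better. Returns a mating pool of pool_size copies."""
    combined = population + archive
    pool = []
    for _ in range(pool_size):
        a, b = random.sample(range(len(combined)), 2)  # draw two at random
        winner = a if fitness[a] <= fitness[b] else b  # keep the fitter one
        pool.append(combined[winner])
    return pool
```

Sampling with `random.sample` guarantees the two tournament entrants are distinct individuals.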
It is worth mentioning that new and revised versions of MOEA have been presented
such as NSGA-II [15, 29], SPEA2 [47], and multiobjective particle swarm optimization (MOPSO) [13]. Recently, different studies in analysis, test cases, and applications
of MOEA have also been discussed [20, 31, 39].
5 MOEA Implementation
Step 3: Calculate the distance of all possible pairs of clusters. The distance dc of two clusters c1 and c2 ∈ C is given as the average distance between pairs of individuals across the two clusters:

dc = (1 / (n1 · n2)) ∑_{i1∈c1, i2∈c2} d(i1, i2), (26)
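A minimal sketch of the average-linkage distance in Equation (26) follows (Python; using Euclidean distance in objective space for d is an assumption, since the metric is not fixed in the excerpt above):

```python
import math

def cluster_distance(c1, c2):
    """Average distance between all cross-cluster pairs, per Eq. (26).
    c1, c2: lists of objective vectors; d is Euclidean distance here."""
    total = sum(math.dist(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))  # divide by n1 * n2
```

In the hierarchical clustering step this distance is computed for every pair of clusters and the two closest clusters are merged, until the desired number of representative solutions remains.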
where Fi^max and Fi^min are the maximum and minimum values of the ith objective function respectively. For each non-dominated solution k, the normalized membership function μ^k is calculated as

μ^k = ∑_{i=1}^{Nobj} μi^k / ∑_{j=1}^{M} ∑_{i=1}^{Nobj} μi^j, (28)
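Equation (27) is not reproduced in the excerpt above; a common linear membership form consistent with the surrounding text (equal to 1 at the per-objective minimum Fi^min, 0 at Fi^max, and linear in between), combined with Equation (28), can be sketched as:

```python
def best_compromise(front):
    """Index of the best compromise solution on a Pareto front via fuzzy
    membership (Eqs. 27-28). front: list of objective tuples (minimization)."""
    n_obj = len(front[0])
    f_min = [min(s[i] for s in front) for i in range(n_obj)]
    f_max = [max(s[i] for s in front) for i in range(n_obj)]
    def mu(s):  # per-objective membership: clipped linear between the extremes
        return [1.0 if s[i] <= f_min[i] else
                0.0 if s[i] >= f_max[i] else
                (f_max[i] - s[i]) / (f_max[i] - f_min[i]) for i in range(n_obj)]
    totals = [sum(mu(s)) for s in front]
    norm = sum(totals)                 # normalization over all M solutions, Eq. (28)
    mu_k = [t / norm for t in totals]
    return max(range(len(front)), key=mu_k.__getitem__)  # maximum membership wins
```

The solution with the maximum normalized membership balances the competing cost and emission objectives.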
Crossover: A blend crossover operator (BLX-α) has been employed in this study. This operator starts by choosing randomly a number from the interval [xi − α(yi − xi), yi + α(yi − xi)], where xi and yi are the ith parameter values of the parent solutions and xi < yi. In order to ensure the balance between exploitation and exploration of the search space, α = 0.5 is selected. This operator can be depicted as shown in Figure 1.
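BLX-α as described can be sketched as follows (Python; an illustrative sketch, not the chapter's FORTRAN implementation):

```python
import random

def blx_alpha(p1, p2, alpha=0.5):
    """Blend crossover (BLX-α): each child gene is drawn uniformly from
    [x - α(y - x), y + α(y - x)], where x, y are the smaller/larger parent genes."""
    child = []
    for a, b in zip(p1, p2):
        x, y = min(a, b), max(a, b)
        span = y - x
        child.append(random.uniform(x - alpha * span, y + alpha * span))
    return child
```

With α = 0.5 the sampling interval is twice as wide as the gap between the parents, which is what provides the exploration beyond the parents' values.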
Mutation: The non-uniform mutation has been employed in this study. In this operator, the new value xi′ of the parameter xi after mutation at generation t is given as

xi′ = xi + Δ(t, bi − xi), if τ = 0; xi − Δ(t, xi − ai), if τ = 1, (29)

where

Δ(t, y) = y (1 − r^((1 − t/gmax)^β)), (30)

r is a random number in [0, 1], gmax is the maximum number of generations, and β is a parameter determining the degree of non-uniformity; in this study, β = 5 was selected. This operator gives a value xi′ ∈ [ai, bi] such that the probability of returning a value close to xi increases as the algorithm advances. This encourages uniform search in the initial stages when t is small, and local search in the later stages.
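Equations (29)-(30) can be sketched as below (Python; drawing τ as a fair coin is an assumption, as is the function naming):

```python
import random

def nonuniform_mutate(x, lo, hi, t, g_max, beta=5.0):
    """Non-uniform mutation per Eqs. (29)-(30): the perturbation Δ shrinks
    as generation t approaches g_max, moving from uniform to local search."""
    def delta(y):
        r = random.random()
        # Δ(t, y) = y * (1 - r^((1 - t/g_max)^β)); lies in [0, y]
        return y * (1.0 - r ** ((1.0 - t / g_max) ** beta))
    if random.random() < 0.5:   # τ = 0: perturb toward the upper bound b_i
        return x + delta(hi - x)
    else:                       # τ = 1: perturb toward the lower bound a_i
        return x - delta(x - lo)
```

Because Δ(t, y) never exceeds y, the mutated value always stays inside [ai, bi], and at t = g_max the perturbation vanishes entirely.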
(a) The constraint-handling approach adopted in this work is to restrict the search
within the feasible region. Therefore, a procedure is imposed to check the feasi-
bility of the initial population individuals and the children generated through GA
operations. This ensures the feasibility of the non-dominated solutions.
(b) A procedure for updating the non-dominated archive set is developed. In every
generation, the non-dominated solutions in the first front are combined with the
existing archive set. The augmented set is processed to extract the non-dominated
solutions that represent the updated non-dominated archive.
(c) A fuzzy-based mechanism is employed to extract the best compromise solution
over the trade-off curve and assist the power system operator to adjust the gener-
ation levels efficiently.
The solution procedure starts with generating the initial population at random. A
feasibility check procedure has been developed and superimposed on the MOEA to
restrict the search to the feasible region. The objective functions are evaluated for
each individual. The GA operations are applied and a new population is generated.
This process is repeated until the maximum number of generations is reached. All
techniques used in this study were implemented along with the above modifications
using the FORTRAN language. The computational flow charts of the developed
NSGA, NPGA, and SPEA are shown in Figures 2, 3, and 4 respectively.
Table 1 Fuel cost and emission coefficients of the six generators

                  G1       G2       G3       G4       G5       G6
  Cost      a     10       10       20       10       20       10
            b     200      150      180      100      180      150
            c     100      120      40       60       40       100
  Emission  α     4.091    2.543    4.258    5.326    4.258    6.131
            β     -5.554   -6.047   -5.094   -3.550   -5.094   -5.555
            γ     6.490    5.638    4.586    3.380    4.586    5.151
            ζ     2.0E-4   5.0E-4   1.0E-6   2.0E-3   1.0E-6   1.0E-5
            λ     2.857    3.333    8.000    2.000    8.000    6.667
transmission lines. The system also has 6 generation plants to supply 23 electrical
loads. The single-line diagram of this system is shown in Figure 5. The line data and
bus data are given in the Appendix. The values of fuel cost and emission coefficients
are given in Table 1.
To demonstrate the effectiveness of the MOEA, three different cases have been
considered as follows:
Case 1: For the purpose of comparison with the reported results, the system is considered lossless and the security constraint is relaxed. Therefore, the problem constraints are the power balance constraint without Ploss and the generation
capacity constraint.
Case 2: Ploss is considered in the power balance constraint and the generation
capacity constraint is also considered.
Case 3: All constraints are considered.
For fair comparison among the developed techniques, 10 different optimization runs
have been carried out in all cases considered. Table 2 shows the problem complexity
with all cases in terms of the number of equality and inequality constraints.
At first, the fuel cost objective and emission objective are optimized individu-
ally to explore the extreme points of the trade-off surface in all cases. In this case,
the standard GA has been implemented as the problem becomes a single objective
optimization problem. The best results of cost and emission when optimized indi-
vidually for all cases are given in Table 3.
Case 1: NSGA, NPGA, and SPEA have been applied to the problem and both ob-
jectives were treated simultaneously as competing objectives. For NPGA, the niche
radius was chosen based on the guidelines in [26] and the size of the comparison set
tdom was determined experimentally. The algorithm was tested several times with
different values for tdom starting from 5% to 50% of the population size with a step
of 5%. Only a part of the results is shown in Figure 6 for the purpose of clarity.
Experimental results have shown a favorable value of tdom at 10% for our prob-
lem instance, whereas the performance degrades for values tdom greater than 20%.
Therefore, tdom is set at 10% of the population size.
The non-dominated fronts of all techniques for the best optimization runs are
shown in Figure 7. It is clear that the non-dominated solutions have good diversity
characteristics. It is quite clear that the problem is efficiently solved by these tech-
niques. The results also show that SPEA has better diversity characteristics. The best
cost and best emission solutions obtained out of 10 runs by different techniques are
given in Table 4. It is clear that SPEA gives best cost and best emission compared
to others.
The best results of the MOEAs were compared to those reported using linear
programming (LP) [18] and a multiobjective stochastic search technique (MOSST)
[14]. The comparison is shown in Table 5. It is quite evident that the MOEAs give
better fuel cost results than the traditional methods, as a reduction of more than 5 $/h is observed with a lower emission level in the case of SPEA. The results also confirm
the potential of multiobjective evolutionary algorithms to solve real-world highly
nonlinear constrained multiobjective optimization problems.
Table 4 The best solutions out of 10 runs for cost and emission of MOEA, Case 1
Table 5 The best fuel cost and emission out of 10 runs of MOEA compared to traditional
algorithms
Table 6 The best solutions out of 10 runs for cost and emission of MOEA, Case 2
Case 2: With the problem complexity shown in Table 2, MOEA techniques have
been implemented and compared. Figure 8 shows the trade-off fronts of differ-
ent techniques for the best optimization runs. It is evident that the non-dominated
solutions obtained have good diversity characteristics. The closeness of the non-
dominated solutions of different techniques demonstrates good performance char-
acteristics of MOEAs. The best solutions obtained out of 10 runs by different tech-
niques are given in Table 6.
Table 7 The best solutions out of 10 runs for cost and emission of MOEA, Case 3
Case 3: MOEA techniques have been implemented and the trade-off fronts of
different techniques for the best optimization runs are shown in Figure 9. In this
case, the performance of NSGA degrades as the problem complexity increases. The best cost and best emission solutions obtained out of 10 runs are given in
Table 7.
Best compromise solution: The membership functions given in Equation (27) and
Equation (28) are used to evaluate each member of the non-dominated set for each
technique. Then, the best compromise solution that has the maximum value of mem-
bership function was extracted. This procedure is applied in all cases and the best
compromise solutions are given in Tables 8, 9, and 10 for NSGA, NPGA, and SPEA
respectively. The best compromise solutions are also shown in Figures 8, 9, and 10.
It is clear that there is good agreement between SPEA and NPGA.
7 A Comparative Study
Generally, the definition of quality in the case of multiobjective optimization is sub-
stantially more complex than for single objective optimization problems. This is
because the optimization goal itself consists of the following multiple objectives
[46, 48, 50]:
1. The distance of the resulting non-dominated set to the Pareto-optimal front
should be minimized.
2. A good distribution of the solutions found is desirable.
3. The spread of the obtained non-dominated solutions should be maximized.
In this section, the above results for the different techniques have been compiled and
compared in view of the above objectives. In order to assess the diversity character-
istics of the proposed techniques, the best fuel cost and the best emission solutions
among the obtained non-dominated solutions for each technique given in Tables 4,
6, and 7 are compared to those of individual optimization of each objective given in
Table 3. This indicates how far the extreme solutions are from the single objective case. The agreement and closeness of the results given in these tables are quite
evident as the best solutions of different techniques are almost identical. It can be
concluded that the developed techniques have satisfactory diversity characteristics
for the problem under consideration as the best solutions for individual optimization
are obtained along with other non-dominated solutions in a single run.
A performance measure of the spread of the non-dominated solutions is presented in [46]. The measure estimates the extent to which the front spreads out; in other words, it measures the normalized distance between the two outer solutions, i.e. the best-cost solution and the best-emission solution. The average values of the normalized distance measure over 10 different optimization runs are given in Table 11. The results show that NPGA has the largest spread of the non-dominated solutions in Case 1, SPEA has the largest spread in Case 2, and NSGA has the largest spread in Case 3.
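Assuming the measure is the normalized Euclidean distance between the two extreme solutions (the exact formulation is given in [46]), a bi-objective sketch might look like:

```python
def spread(front, f_min, f_max):
    """Normalized distance between the two extreme solutions of a
    bi-objective non-dominated front (best value in each objective).
    f_min, f_max: per-objective bounds used for normalization, assumed
    known, e.g. from the individual single-objective optima."""
    best_f1 = min(front, key=lambda s: s[0])   # e.g. best-cost solution
    best_f2 = min(front, key=lambda s: s[1])   # e.g. best-emission solution
    d = 0.0
    for i in range(2):
        rng = f_max[i] - f_min[i]              # normalization range
        d += ((best_f1[i] - best_f2[i]) / rng) ** 2
    return d ** 0.5
```

A larger value indicates that the front covers more of the trade-off curve between the two objectives.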
On the other hand, the set coverage metric [50] for comparing the performance of different MOEAs has also been examined in this study. The average values of this measure over 10 different optimization runs are given in Table 12. It can be seen that the non-dominated solutions of NSGA do not cover any SPEA solutions in Case 3, while those of NSGA are almost entirely covered by SPEA. In addition, NPGA non-dominated solutions barely cover SPEA solutions, with a maximum coverage of 14.4%, while SPEA solutions cover relatively higher percentages of NPGA solutions.
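A minimal sketch of the set coverage metric C(A, B) of [50], assuming minimization and the weak-dominance convention (a covers b when a is no worse in every objective):

```python
def weakly_dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b))

def coverage(A, B):
    """Set coverage metric C(A, B): the fraction of solutions in B that
    are weakly dominated by at least one solution in A."""
    covered = sum(1 for b in B if any(weakly_dominates(a, b) for a in A))
    return covered / len(B)
```

Note that C is not symmetric: both C(A, B) and C(B, A) must be reported, which is why Table 12 lists every ordered pair of techniques.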
The quality measure [6] of the non-dominated solutions obtained by the different MOEAs is also applied. This measure starts by combining the individual non-dominated sets of all techniques into a pool; an index is added to each solution to identify the associated technique. The dominance conditions are then applied to all solutions in the pool, and the non-dominated solutions are extracted from the pool to form an elite set of the solutions obtained by all techniques. From their indices, the non-dominated solutions in the elite set can be classified according to their associated
technique. The quality measure has been applied to the non-dominated solutions obtained in each case. For 10 different optimization runs with 25 non-dominated solutions obtained by each technique per run, the created pool contains 750 solutions. For each case, the non-dominated solutions are extracted from the pool to form the elite set, which consists of 181, 165, and 117 solutions for Cases 1, 2, and 3, respectively. The results of the proposed quality measure are given in Table 13. It can be observed that SPEA contributes the majority of the elite set members in all cases: approximately 71%, 78%, and 69% of the elite set in Cases 1, 2, and 3, respectively. It can therefore be concluded that the non-dominated solutions obtained by SPEA are the best. It can also be seen that only one non-dominated solution obtained by NSGA in Case 3 is a member of the elite set. The trade-offs represented by the non-dominated solutions in the elite set for Cases 1, 2, and 3 are shown in Figures 10, 11, and 12, respectively.
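The pooling procedure described above can be sketched as follows. This is a simplified illustration; solution indices are represented here by technique names:

```python
def dominates(a, b):
    """True if a dominates b (minimization): no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def elite_set(runs):
    """runs: dict mapping technique name -> list of objective vectors.
    Pools all solutions tagged with their technique, extracts the
    non-dominated ones, and returns per-technique counts in the elite set."""
    pool = [(tuple(s), name) for name, sols in runs.items() for s in sols]
    elite = [(s, name) for (s, name) in pool
             if not any(dominates(t, s) for (t, _) in pool)]
    counts = {name: 0 for name in runs}
    for _, name in elite:
        counts[name] += 1
    return counts
```

The per-technique counts directly give the contribution percentages reported in Table 13, and several techniques are compared in a single pass rather than pairwise.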
The average values of the normalized distance results of the proposed measure over 10 different optimization runs are given in Table 14. It is worth mentioning that the distance obtained with the proposed measure is that between the outer non-dominated solutions of each technique represented in the elite set. It can be seen that the non-dominated solutions obtained by SPEA span the entire Pareto front in all cases. In general, it can be concluded that SPEA has the best distribution of non-dominated solutions for the problem under consideration.
With the proposed approach of extracting an elite set from the combined non-dominated solutions of all techniques, the proposed measure and the normalized distance measure are consistent, and their results agree satisfactorily with the simulation results. The proposed measure also properly reflects the quality of the non-dominated solutions produced by each algorithm. In addition, several techniques can be compared in a single run rather than on a one-to-one basis.
A comparison of the average run time per generation per Pareto-optimal solution over 10 different optimization runs of the MOEA techniques for Case 1 is given in Table 15. It is quite evident that the run time of SPEA is less than that of the other techniques.
The robustness of the MOEA techniques with respect to different initial populations has been examined in all cases considered. Due to space limitations, only results for Case 1 are given in Table 16, which shows the minimum, maximum, and average values of the best cost and the best emission. It is clear that all techniques are robust with respect to the choice of initial population.
8 Future Work
Since this work represents an exploratory study aimed at demonstrating the potential of MOEAs for solving the EED problem, the fuel cost function given in Equation (1) is a smooth, simple quadratic. More complicated formulations with non-smooth and non-convex fuel cost functions [30, 38, 51] can be considered in future work. Additionally, different objective functions, such as heat dispatch, can be considered in addition to the fuel cost and emission objectives [32] and incorporated into the problem formulation in future studies.
On the other hand, new and revised versions of MOEAs have been presented, such as NSGA-II, NPGA 2, SPEA2, and multiobjective particle swarm optimization (MOPSO). These techniques can be examined in future studies, which will enhance the potential of MOEAs to solve more complex multiobjective power system optimization problems.
9 Conclusions
In this chapter, three multiobjective evolutionary algorithms have been compared and successfully applied to the environmental/economic power dispatch problem. The problem has been formulated as a multiobjective optimization problem with competing economic and environmental-impact objectives. The MOEAs have been compared to each other and to results reported in the literature. In addition, a new and efficient procedure for quality measurement is proposed and compared to some measures reported in the literature. The optimization runs indicate that MOEAs outperform the traditional techniques. Moreover, SPEA has better diversity characteristics and is more efficient than the other MOEAs. The results show that evolutionary algorithms are effective tools for handling multiobjective optimization, where multiple trade-off solutions can be found in one simulation run.
In addition, the diversity of the non-dominated solutions is preserved. It is also demonstrated that SPEA requires the least computational time. It can be concluded that MOEAs have the potential to solve a variety of multiobjective power system optimization problems.
Appendix
The line and bus data of the IEEE 30-bus 6-generator system are given in Table 17
and Table 18 respectively.
References
1. Abido, M.A.: A New Multiobjective Evolutionary Algorithm for Environmen-
tal/Economic Power Dispatch. In: IEEE Power Engineering Society Summer Meeting,
Vancouver, Canada, pp. 1263–1268 (2001)
2. Abido, M.A.: A Novel Multiobjective Evolutionary Algorithm for Solving Environmen-
tal/Economic Power Dispatch Problem. In: 14th Power Systems Computation Confer-
ence PSCC 2002, Session 41, Paper 2, Seville, Spain (2002)
3. Abido, M.A.: A Niched Pareto Genetic Algorithm for Multiobjective Environmen-
tal/Economic Dispatch. International Journal of Electrical Power and Energy Sys-
tems 25, 79–105 (2003)
4. Abido, M.A.: A novel multiobjective evolutionary algorithm for environmen-
tal/economic power dispatch. Electric Power Systems Research 65, 71–81 (2003)
5. Abido, M.A.: Environmental/Economic Power Dispatch Using Multiobjective Evolu-
tionary Algorithms. IEEE Trans. on Power Systems 18, 1529–1537 (2003)
6. Abido, M.A.: Multiobjective Evolutionary Algorithms for Electric Power Dispatch Prob-
lem. IEEE Trans. on Evolutionary Computation 10, 315–329 (2006)
7. Abou El-Ela, A.A., Abido, M.A.: Optimal Operation Strategy for Reactive Power Con-
trol. Modelling, Simulation & Control, Part A 41, 19–40 (1992)
8. Brodesky, S.F., Hahn, R.W.: Assessing the Influence of Power Pools on Emission Con-
strained Economic Dispatch. IEEE Trans. on Power Systems 1, 57–62 (1986)
9. Xu, J., Chang, C., Wang, X.: Constrained Multiobjective Global Optimization of Longi-
tudinal Interconnected Power System by Genetic Algorithm. IEE Proc.-Gener. Transm.
Distrib. 143, 435–446 (1996)
10. Chang, C., Wong, K., Fan, B.: Security-Constrained Multiobjective Generation Dispatch
Using Bicriterion Global Optimization. IEE Proc.-Gener. Transm. Distrib. 142, 406–414
(1995)
11. Coello, C.A.C., Christiansen, A.D.: MOSES: A Multiobjective Optimization Tool for
Engineering Design. Engineering Optimization 31, 337–368 (1999)
12. Coello, C.A.C., Hernandez, F.S., Farrera, F.A.: Optimal Design of Reinforced Concrete Beams Using Genetic Algorithms. Int. J. of Expert Systems with Applications 12, 101–108 (1997)
13. Coello, C.A.C., Pulido, G.T., Lechuga, M.S.: Handling Multiple Objectives with Particle
Swarm Optimization. IEEE Trans. on Evolutionary Computation 8, 256–279 (2004)
14. Das, D.B., Patvardhan, C.: New Multi-objective Stochastic Search Technique for Eco-
nomic Load Dispatch. IEE Proc.-Gener. Transm. Distrib. 145, 747–752 (1998)
15. Deb, K., Pratab, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic
Algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation 6, 182–197 (2002)
16. Dhillon, J.S., Parti, S.C., Kothari, D.P.: Stochastic Economic Emission Load Dispatch.
Electric Power Systems Research 26, 179–186 (1993)
17. El-Keib, A.A., Ma, H., Hart, J.L.: Economic Dispatch in View of the Clean Air Act of
1990. IEEE Trans. On Power Systems 9, 972–978 (1994)
18. Farag, A., Al-Baiyat, S., Cheng, T.C.: Economic Load Dispatch Multiobjective Optimization Procedures Using Linear Programming Techniques. IEEE Trans. on Power Systems 10, 731–738 (1995)
19. Farina, M., Amato, P.: A Fuzzy Definition of Optimality for Many Criteria Optimiza-
tion Problems. IEEE Trans. On Systems, Man, and Cybernetics-Part A: Systems and
Humans 34, 315–326 (2004)
20. Farina, M., Deb, K., Amato, P.: Dynamic Multiobjective Optimization Problems: Test
Cases, Approximations, and Applications. IEEE Trans. on Evolutionary Computation 8,
425–442 (2004)
21. Fuller, R., Carlsson, C.: Fuzzy Multiple Criteria Decision Making: Recent Develop-
ments. Fuzzy Set and Systems 78, 139–153 (1996)
22. Fonseca, C.M., Fleming, P.J.: An Overview of Evolutionary Algorithms in Multiobjec-
tive Optimization. Evolutionary Computation 3, 1–16 (1995)
23. Granelli, G.P., Montagna, M., Pasini, G.L., Marannino, P.: Emission Constrained Dy-
namic Dispatch. Electric Power Systems Research 24, 56–64 (1992)
24. Helsin, J.S., Hobbs, B.F.: A Multiobjective Production Costing Model for Analyzing
Emission Dispatching and Fuel Switching. IEEE Trans. on Power Systems 4, 836–842
(1989)
25. Herrera, F., Lozano, M., Verdegay, J.L.: Tackling Real-Coded Genetic Algorithms: Operators and Tools for Behavioral Analysis. Artificial Intelligence Review 12, 265–319 (1998)
26. Horn, J., Nafpliotis, N., Goldberg, D.E.: A Niched Pareto Genetic Algorithm for Multi-
objective Optimization. In: Proceedings of the First IEEE Conference on Evolutionary
Computation. IEEE World Congress on Computational Intelligence, vol. 1, pp. 67–72.
IEEE Service Center, Piscataway (1994)
27. Huang, C.M., Yang, H.T., Huang, C.L.: Bi-Objective Power Dispatch Using Fuzzy
Satisfaction-Maximizing Decision Approach. IEEE Trans. on Power Systems 12, 1715–
1721 (1997)
28. Kwang, Y.L., El-Sharkawi, M.A.: Modern Heuristic Optimization Techniques Theory
and Applications to Power Systems. IEEE Press/ Wiley-Interscience, USA (2008)
29. Jensen, M.T.: Reducing the Run-Time Complexity of Multiobjective EAs: The NSGA-II
and Other Algorithms. IEEE Trans. on Evolutionary Computation 7, 503–515 (2003)
30. Park, J.-B., Lee, K.-S., Shin, J.-R., Kwang, Y.L.: A Particle Swarm Optimization for
Economic Dispatch With Nonsmooth Cost Functions. IEEE Trans. on Power Systems 20,
34–42 (2005)
31. Laumanns, M., Thiele, L., Zitzler, E.: Running Time Analysis of Multiobjective Evolu-
tionary Algorithms on Pseudo-Boolean Functions. IEEE Trans. on Evolutionary Com-
putation 8, 170–182 (2004)
32. Lingfeng, W., Chanan, S.: Stochastic Combined Heat and Power Dispatch Based on
Multi-Objective Particle Swarm Optimization. In: Proceedings of 2006 IEEE Power En-
gineering Society General Meeting (2006)
33. Lis, J., Eiben, A.E.: A Multi-Sexual Genetic Algorithm for Multiobjective Optimiza-
tion. In: Proceedings of the 1996 International Conference on Evolutionary Computation
IEEE ICEC 1996, Nagoya, Japan, pp. 59–64 (1996)
34. Mahfoud, S.: Niching Methods for Genetic Algorithms, Ph. D. Thesis, Univ. of Illinois
at Urbana-Champaign (1995)
35. Morse, J.N.: Reducing the Size of Nondominated Set: Pruning by Clustering. Computers
and Operations Research 7, 55–66 (1980)
36. Sakawa, M., Yano, H., Yumine, T.: An Interactive Fuzzy Satisficing Method for Multi-
objective Linear Programming Problems and Its Application. IEEE Trans. On Systems,
Man, and Cybernetics 17, 654–661 (1987)
37. Schaffer, J.D.: Multiple Objective Optimization with Vector Evaluated Genetic Algo-
rithms. In: Proceedings of the International Conference on Genetic Algorithms and Their
Applications, Pittsburgh, July 24-26, 1985, pp. 93–100 (1985)
38. Selvakumar, A.I., Thanushkodi, K.: A New Particle Swarm Optimization Solution to Nonconvex Economic Dispatch Problems. IEEE Trans. on Power Systems 22, 42–51 (2007)
39. Shin, S.Y., Lee, I.H., Kim, D., Zhang, B.T.: Multiobjective Evolutionary Optimization of
DNA Sequences for Reliable DNA Computing. IEEE Trans. on Evolutionary Computa-
tion 9, 143–158 (2005)
40. Srinivas, N., Deb, K.: Multiobjective Function Optimization Using Nondominated Sort-
ing Genetic Algorithms. Evolutionary Computation 2, 221–248 (1994)
41. Srinivasan, D., Chang, C.S., Liew, A.C.: Multiobjective Generation Schedule using
Fuzzy Optimal Search Technique. IEE Proc.-Gener. Transm. Distrib. 141, 231–241
(1994)
42. Srinivasan, D., Tettamanzi, A.: An Evolutionary Algorithm for Evaluation of Emission
Compliance Options in View of the Clean Air Act Amendments. IEEE Trans. on Power
Systems 12, 152–158 (1997)
43. Talaq, J.H., El-Hawary, F., El-Hawary, M.E.: A Summary of Environmental/Economic
Dispatch Algorithms. IEEE Trans. on Power Systems 9, 1508–1516 (1994)
44. Veldhuizen, D.A.V., Lamont, G.B.: Multiobjective Evolutionary algorithms: Analyzing
the State-of-the-Art. Evolutionary Computation 8, 125–147 (2000)
45. Yokoyama, R., Bae, S.H., Morita, T., Sasaki, H.: Multiobjective Generation Dispatch Based on Probability Security Criteria. IEEE Trans. on Power Systems 3, 317–324 (1988)
46. Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms:
Empirical Results. Evolutionary Computation 8, 173–195 (2000)
47. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolution-
ary Algorithm. TIK-Report, No. 103 (2001)
48. Zitzler, E., Thiele, L.: Multiobjective Optimization Using Evolutionary Algorithms – A
Comparative Case Study. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P.
(eds.) PPSN 1998, vol. 1498, pp. 292–301. Springer, Heidelberg (1998)
49. Zitzler, E., Thiele, L.: An Evolutionary Algorithm for Multiobjective Optimization: The Strength Pareto Approach. TIK-Report, No. 43 (1998)
50. Zitzler, E., Thiele, L.: Multiobjective Evolutionary Algorithms: A Comparative Case
Study and the Strength Pareto Approach. IEEE Trans. on Evolutionary Computation 3,
257–271 (1999)
51. Gaing, Z.-L.: Particle Swarm Optimization to Solving the Economic Dispatch Consider-
ing the Generator Constraints. IEEE Trans. on Power Systems 18, 1187–1195 (2003)
Fuzzy Evolutionary Algorithms and Genetic
Fuzzy Systems: A Positive Collaboration
between Evolutionary Algorithms and Fuzzy
Systems
Abstract. There are two possible ways of integrating fuzzy logic and evolutionary algorithms. The first involves the application of evolutionary algorithms to solve optimization and search problems related to fuzzy systems, obtaining genetic fuzzy systems. The second concerns the use of fuzzy tools and fuzzy logic-based techniques to model different evolutionary algorithm components and to adapt evolutionary algorithm control parameters, with the goal of improving performance. The evolutionary algorithms resulting from this integration are called fuzzy evolutionary algorithms. In this chapter, we briefly introduce genetic fuzzy systems and fuzzy evolutionary algorithms, give a short state of the art, and sketch our vision of some current hot trends and prospects. In essence, we paint a complete picture of these two lines of research with the aim of showing the benefits derived from the synergy between evolutionary algorithms and fuzzy logic.
1 Introduction
Computational intelligence techniques such as artificial neural networks [157],
fuzzy logic [204], and genetic algorithms (GAs) [87, 63] are popular research sub-
jects, since they can deal with complex engineering problems which are difficult to
solve by classical methods [109].
Hybrid approaches have attracted considerable attention in the computational in-
telligence community. One of the most popular approaches is the hybridization be-
tween fuzzy logic and GAs leading to genetic fuzzy systems (GFSs) [38] and fuzzy
evolutionary algorithms [79, 149, 183]. Both are well known examples of a positive
collaboration between soft computing techniques.
A GFS is basically a fuzzy rule based system (FRBS) augmented by a learning
process based on evolutionary computation, which includes GAs, genetic program-
ming, and evolution strategies, among other evolutionary algorithms (EAs) [56].
F. Herrera and M. Lozano
Department of Computer Science and Artificial Intelligence University of Granada,
18071 - Granada, Spain
e-mail: [email protected],[email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 83–130.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
IF-THEN rules. A linguistic variable, as its name suggests, is a variable whose val-
ues are words rather than numbers, e.g., small, young, very hot and quite slow.
Fuzzy IF-THEN rules are of the general form: if antecedent(s) then consequent(s),
where antecedent and consequent are fuzzy propositions that contain linguistic vari-
ables. A fuzzy IF-THEN rule is exemplified by “if the temperature is high then the
fan-speed should be high”. With the objective of modelling complex and dynamic
systems, FRBSs handle fuzzy rules by mimicking human reasoning (much of which
is approximate rather than exact), reaching a high level of robustness with respect
to variations in the system’s parameters, disturbances, etc. The set of fuzzy rules of
an FRBS can be derived from subject matter experts or extracted from data through
a rule induction process.
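As an illustration, the degree to which the antecedent of such a rule holds can be computed with a membership function. The triangular shape and the 25–45 °C support used below are purely hypothetical:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy set for the rule
# "if the temperature is high then the fan-speed should be high".
def temp_is_high(t):
    # Assumed: membership in 'high' rises from 25 °C, peaks at 35 °C.
    return tri(t, 25.0, 35.0, 45.0)

firing = temp_is_high(30.0)   # degree to which the antecedent holds
```

A crisp reading of 30 °C thus satisfies "temperature is high" only to a degree, and the consequent is applied with that same strength.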
In this section, we present a brief overview of the foundations of FRBSs, with the
aim of illustrating the way they behave. In particular, in Section 2.1, we introduce
the important concepts of fuzzy sets and linguistic variables. In Section 2.2, we deal
with the basic elements of FRBSs. Finally, in Section 2.3, we describe a simple
instance of FRBS, a fuzzy logic controller for the inverted pendulum.
[Figure: a linguistic value (e.g. High) of the variable t (temperature), whose semantic rule maps it to a membership function A(t) giving a grade of membership between 0 and 1 over the temperature domain]
A sensor measures θ and ω (the state variables), and a fuzzy logic controller adjusts F (the output, or control, variable) via a real-time feedback loop, with the objective of bringing the pendulum to the vertical position. While the classical equations of motion of this system are extremely complicated and depend upon the specific characteristics of the pendulum (mass distribution, length), Yamakawa [205] found a set of linguistic fuzzy rules providing stable fuzzy control of the pendulum independently of its characteristics. They are the following:
Rule 1. IF θ is PM AND ω is ZR THEN F is PM.
Rule 2. IF θ is PS AND ω is PS THEN F is PS.
Rule 3. IF θ is PS AND ω is NS THEN F is ZR.
Rule 4. IF θ is NM AND ω is ZR THEN F is NM.
Rule 5. IF θ is NS AND ω is NS THEN F is NS.
Rule 6. IF θ is NS AND ω is PS THEN F is ZR.
Rule 7. IF θ is ZR AND ω is ZR THEN F is ZR.
The linguistic term set for θ, ω, and F is {Negative Large (NL), Negative Medium (NM), Negative Small (NS), Zero (ZR), Positive Small (PS), Positive Medium (PM), Positive Large (PL)}, whose associated fuzzy partitions of the corresponding domains are shown in Figure 5.
Given a sensor-measured state (θ, ω), the inference obtained from the fuzzy controller is the result of interpolating among the responses of these linguistic fuzzy rules. The inference's outcome is a membership function defined on the output space, which is then aggregated and defuzzified to produce a crisp output.
The fuzzy logic controller described above is an example of a linguistic FRBS. However, the problem of controlling the inverted pendulum may also be tackled by means of a fuzzy logic controller based on the TS-type fuzzy system model. In this case, possible TS-type rules include:
If θ is ZR and ω is ZR then F = 0.
If θ is PS and ω is ZR then F = 0.5 × θ .
If θ is PS and ω is NS then F = 0.4 × θ + 0.6 × ω .
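The weighted-average inference used by such TS-type rules can be sketched as follows. The triangular partitions below are assumed, with θ and ω normalized to [-1, 1]; they are not the partitions of Figure 5:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed fuzzy partition on a normalized [-1, 1] domain.
ZR = lambda x: tri(x, -0.5, 0.0, 0.5)   # Zero
PS = lambda x: tri(x, 0.0, 0.5, 1.0)    # Positive Small
NS = lambda x: tri(x, -1.0, -0.5, 0.0)  # Negative Small

def ts_inference(theta, omega):
    """Weighted average of the rule consequents by firing strength
    (min of the antecedent memberships) over the three TS-type rules."""
    rules = [
        (min(ZR(theta), ZR(omega)), 0.0),                        # F = 0
        (min(PS(theta), ZR(omega)), 0.5 * theta),                # F = 0.5*theta
        (min(PS(theta), NS(omega)), 0.4 * theta + 0.6 * omega),  # linear mix
    ]
    num = sum(w * f for w, f in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0
```

Unlike the linguistic controller, no defuzzification of an output fuzzy set is needed: each consequent is already a crisp function of the inputs.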
Fig. 5 Membership functions A(y) of the linguistic variables NM, NS, ZR, PS, and PM (where y stands for θ, ω, and F)
[Figure: the design process of a fuzzy rule-based system, with the knowledge base (data base + rule base) mediating between an input interface and an output interface]
neural networks and genetic fuzzy clustering algorithms. We will not analyze them in this chapter; readers can find an extended introduction to them in [38] (Chapter 10).
In this section, we propose a taxonomy of GFSs focused on the FRBS components and sketch our vision of some current hot trends in GFSs [73].
Genetic tuning
With the aim of making the FRBS perform better, some approaches try to improve the preliminary DB definition or the inference engine parameters once the RB has been derived. A graphical representation of this kind of tuning is shown in Figure 8. The following three tuning possibilities can be considered (see the sub-tree under "genetic tuning").
1. Genetic tuning of KB parameters. A tuning process considering the whole KB obtained (the preliminary DB and the derived RB) is used a posteriori to adjust the membership function parameters. Nevertheless, the tuning process only adjusts the shapes of the membership functions, not the number of linguistic terms in each fuzzy partition, which remains fixed from the beginning of the design process. A first and classic proposal on tuning can be found in [100]. There are also recent proposals that introduce linguistic modifiers for tuning the membership functions, see [24]; this latter approach is close to inference engine adaptation.
2. Genetic adaptive inference systems. The main aim of this approach is the use of parameterized expressions in the inference system, sometimes called an adaptive inference system, to obtain higher cooperation among the fuzzy rules and therefore more accurate fuzzy models without losing the linguistic rule interpretability. Proposals in this area, focused on regression and classification, can be found in [8, 42, 43].
3. Genetic adaptive defuzzification methods. The most popular technique in practice, due to its good performance, efficiency and ease of implementation, is to apply the defuzzification function to every inferred rule fuzzy set (obtaining a characteristic value) and to combine these values with a weighted-average operator. This method introduces the possibility of using parameter-based averaging functions, and GAs can be used to adapt the defuzzification method. A proposal in this area can be found in [105].
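As an illustration of the first possibility, genetic tuning of KB parameters, a real-coded chromosome can simply concatenate the (a, b, c) parameters of every triangular membership function in the DB, leaving the RB fixed. This toy encoding and mutation operator are illustrative only, not a specific proposal from the literature:

```python
import random

def encode(partitions):
    """partitions: list of per-variable lists of (a, b, c) triples, one
    triple per triangular membership function. Flattens them into one
    real-coded chromosome."""
    return [p for var in partitions for mf in var for p in mf]

def decode(chrom, shape):
    """shape: number of membership functions per variable. Rebuilds the
    per-variable lists of (a, b, c) triples from a chromosome."""
    it = iter(chrom)
    return [[(next(it), next(it), next(it)) for _ in range(n)] for n in shape]

def mutate(chrom, sigma=0.05):
    """Gaussian perturbation of one randomly chosen gene: the GA search
    nudges membership function shapes, guided by an accuracy-based fitness."""
    child = list(chrom)
    i = random.randrange(len(child))
    child[i] += random.gauss(0.0, sigma)
    return child
```

Fitness evaluation (not shown) would decode each chromosome, run the fixed RB with the tuned DB over the training data, and return the resulting error.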
Genetic KB learning
The second major area is the learning of KB components. We now describe the four approaches that can be found within the genetic learning of a KB (see the second tree under "genetic KB learning").
learning process more difficult and slow. In [85], we can find a contribution that
uses the simultaneous genetic KB learning process.
This is the last area of the GFS taxonomy, a hybrid model between an adaptive inference engine and KB component learning. Novel approaches try to achieve high cooperation between the inference engine, via parameter adaptation, and the learning of KB components, including both in a simultaneous learning process. A recent proposal to learn a linguistic RB and the parametric aggregation connectors of the inference and defuzzification in a single step can be found in [135]. Figure 13 presents the coding scheme of the model proposed in that paper.
Although GAs were not specifically designed for learning, but rather as global search algorithms, they offer a set of advantages for machine learning. Many machine learning methodologies are based on the search for a good model inside the space of possible models. In this sense, genetic approaches are very flexible, because the same GA can be used with different representations. Genetic learning processes cover
Fig. 13 Example of the coding scheme for learning an RB and the inference connective
parameters
The collection of papers in these special issues gives us a historical tour through the different stages in the evolution of GFS research:
• The first two special issues (1997, 1998) contain contributions devoted to learning KB components using the different learning approaches (Michigan, IRL, Pittsburgh), together with some applications. Representative approaches from different areas of the taxonomy can be found there.
• In the next two special issues (2001, 2004), we find contributions that exploit the aforementioned genetic learning approaches, together with contributions that open new branches such as genetic rule selection, multiobjective genetic algorithms for rule selection, the use of genetic programming for learning fuzzy systems, hierarchical genetic fuzzy systems, coevolutionary genetic fuzzy systems, the combination of boosting and evolutionary fuzzy systems learning, embedded genetic DB learning, and first studies on dealing with high-dimensional problems, among others. We would like to point out the review paper published in the latter issue [36], which was the first review of the topic, briefly introducing GFS models and applications, trends and open questions. Another short review was presented in [72]. The present chapter can be considered a continuation of those, with the novelty of the taxonomy, the GFS outlook based on the pioneer papers, the ISI Web of Science based visibility analysis, and the milestones along the GFS history, together with new trends and prospects.
• The next three special issues, published in 2007, emphasize three different directions. Carse and Pipe's special issue collects papers focused on the mentioned areas (multiobjective evolutionary learning, boosting and evolutionary learning, etc.) and stresses some new ones, such as evolutionary adaptive inference systems. Casillas et al.'s special issue is focused on the trade-off between interpretability and accuracy, collecting four papers that propose different GFSs for tackling this problem. Cordón et al.'s special issue focuses on novel GFS proposals under the title "What's Next?", collecting highly innovative proposals that may mark new research trends. The four collected papers are focused on: a new Michigan approach for learning RBs based on XCS [22], GFSs for imprecisely observed data (low-quality data) [162], incremental evolutionary learning of TS-fuzzy systems [86], and evolutionary fuzzy rule induction for subgroup discovery [48].
• The last special issue, co-edited by J. Casillas and B. Carse, is devoted to new
developments, paying attention to multiobjective genetic extraction of linguistic
fuzzy rule based systems from imprecise data [163], multiobjective genetic rule
selection and tuning [60], parallel distributed genetic fuzzy rule selection [144],
context adaptation of fuzzy systems [17], compact fuzzy systems [28], neuro-
coevolutionary GFSs [153], evolutionary learning of TSK rules with variable
structure [140] and genetic fuzzy association rules extraction [29].
Multiobjective evolutionary algorithms (MOEAs) are one of the most active research areas in the field of evolutionary computation, because population-based algorithms are capable of capturing a set of non-dominated solutions in a single run. A large number of algorithms have been proposed in the literature [45, 34]. Among them, NSGA-II [46] and SPEA2 [209] are well-known and frequently used MOEAs.
Obtaining high degrees of interpretability and accuracy simultaneously is a contradictory aim, and, in practice, one of the two properties prevails over the other. Nevertheless, a new tendency in the fuzzy modelling community that looks for a good balance between interpretability and accuracy is growing in importance. Improving the interpretability of rule-based systems is a central issue in recent research, where not only accuracy receives attention but also the compactness and interpretability of the obtained rules [114, 138].
In multiobjective GFSs, it is desirable to design genetic learning algorithms in which the learning mechanism itself finds an appropriate balance between interpretability and accuracy. We consider objectives based on accuracy together with objectives that capture different complexity/interpretability measures. Figure 14, from [91], illustrates this idea, where each ellipsoid denotes a fuzzy system; a large number of non-dominated fuzzy systems exist along the accuracy-complexity trade-off curve.
[Figure 14: the accuracy-complexity trade-off curve, ranging from simple but inaccurate fuzzy systems (low complexity, large error) to complicated but accurate ones (high complexity, small error)]
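At its core, the selection pressure in such a multiobjective GFS reduces to non-dominated sorting over accuracy and complexity objectives. A minimal sketch, assuming each candidate fuzzy system is summarized by an (error, number-of-rules) pair, both to be minimized:

```python
def pareto_front(models):
    """models: list of (error, n_rules) pairs, both minimized.
    Returns the non-dominated models along the trade-off curve."""
    def dominates(a, b):
        # a dominates b: no worse in both objectives, better in at least one
        return all(x <= y for x, y in zip(a, b)) and a != b
    return [m for m in models
            if not any(dominates(o, m) for o in models)]
```

A full multiobjective GFS (e.g. one built on NSGA-II or SPEA2) would add crowding or strength assignment on top of this dominance test, but the returned front is exactly the set of ellipsoids on the curve of Figure 14.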
GA-based techniques for mining fuzzy association rules and novel data
mining approaches
Fayyad et al. defined knowledge discovery (KD) as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [57]. KD should not be viewed as synonymous with DM, but the two are intimately related. KD is a wide-ranging process which covers distinct stages: comprehension of the problem, comprehension of the data, pre-processing (or preparation) of the data, DM, and post-processing (assessment and interpretation of the models). The DM stage is responsible for automatic KD at a high level, from information obtained from real data. Some of the important problems that DM and KD deal with are: rule extraction, identification of associations, feature analysis, linguistic summarization, clustering, classifier design, and novelty/anomaly detection.
The interpretability of knowledge is crucial in the field of DM/KD, where knowledge should be extracted from data bases and represented in a comprehensible form, and for decision support systems, where the reasoning process should be transparent to the user. In fact, the use of linguistic variables and linguistic terms in the discovery process has been explored by different authors.
Frequent pattern mining has been a focal theme in DM research for over a decade. Association analysis is a methodology useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of association rules or sets of frequent items. An abundant literature documents tremendous progress on the topic [179, 71].
As claimed in [54], the use of fuzzy sets to describe associations between data
extends the types of relationships that may be represented, facilitates the interpreta-
tion of rules in linguistic terms, and avoids unnatural boundaries in the partitioning
of the attribute domains.
Linguistic variables with linguistic terms can contribute in a substantial way to
the advance in the design of association rules and the analysis of data to establish
relationships and identify patterns, in general [90]. On the other hand, GAs in particular, and EAs in general, are widely used for evolving rule extraction and pattern association in DM/KD [59]. Their conjunction in the GFS field provides novel and
useful tools for pattern analysis and for extracting new kinds of useful information
with a distinct advantage over other techniques: its interpretability in terms of fuzzy
IF-THEN rules. We find interesting recent contributions focused on the genetic ex-
traction of fuzzy association rules in [102, 89, 101, 184].
We would like to draw attention to a subdivision of descriptive induction algorithms that has recently received attention from researchers, called subgroup discovery. It is a form of supervised inductive learning of subgroup descriptions which, given a set of data and a property of interest to the user, attempts to locate the subgroups that are statistically "most interesting" for the user.
Fuzzy Evolutionary Algorithms and Genetic Fuzzy Systems 103
Learning genetic models based on low quality data (noisy data and vague data)
There are many practical problems requiring learning models from uncertain data.
The experimental designs of GFSs learning from data observed in an imprecise way
are not being actively studied by researchers. However, according to the point of
view of fuzzy statistics, the primary use of fuzzy sets in classification and mod-
elling problems is for the treatment of vague data. Using vague data to train and test
GFSs we could analyze the performance of these classifiers on the type of problems
for which fuzzy systems are expected to be superior. Preliminary results in this area
involve the proposals of different formalizations for the definition of fuzzy classi-
fiers, based on the relationships between random sets and fuzzy sets [161] and the
study of fitness functions (with fuzzy values) defined in the context of GFSs [162].
This is a novel area worth exploring in the near future, and one that may provide interesting results.
The DB learning comprises the specification of the universes of discourse, the number of labels for each linguistic variable, and the definition of the fuzzy membership functions associated with each label. In [39] the influence of fuzzy partition granularity on FRBS performance was studied, showing that, with an appropriate number of terms for each linguistic variable, the FRBS accuracy can be significantly improved without the need for a complex RB learning method.
On the other hand, the idea of introducing the notion of context into fuzzy sys-
tems comes from the observation that, in real life, the same basic concept can be
perceived differently in different situations. In some cases, this information is re-
lated to the physical properties or dimensions of the system or process, including
restrictions imposed due to the measurement acquisition or actuators. In the litera-
ture, context adaptation in fuzzy systems has been mainly approached as the scaling
of fuzzy sets from one universe of discourse to another by means of non-linear scal-
ing functions whose parameters are identified from data.
104 F. Herrera and M. Lozano
Different approaches have been proposed to deal with the learning of membership
functions, granularity, non-linear contexts using GAs, etc. [133, 69, 40, 41, 15, 16, 6].
Although there is a large number of contributions in the area of DB Learning,
we think that this remains a promising research area, due to the importance of using
adequate membership functions and an appropriate context. The use of GFSs has
much potential due to its flexibility for encoding DB components together with other
fuzzy system components.
The first description of a Michigan-style GFS was given in [186]. All the initial approaches in this area were based on the concept of "rule strength", in the sense that a rule (classifier) gains "strength" during interactions with the environment (through rewards and/or penalties). This strength can then be used for two purposes: resolving conflicts between simultaneously matched rules during learning episodes; and as the basis of fitness for the GA.
A completely different approach can be considered in which a rule’s fitness, from
the point of view of the GA, is based on its “accuracy”, i.e., how well a rule predicts
payoff whenever it fires. Notice that the concept of accuracy used here is differ-
ent from that traditionally used in fuzzy modelling (i.e., the capability of the fuzzy
model to faithfully represent the modelled system). This accuracy-based approach
offers a number of advantages, such as avoiding overgeneral rules, obtaining opti-
mally general rules, and learning a complete covering map. The first accuracy-based
evolutionary algorithm, called XCS, was proposed in [199] and it is currently of ma-
jor interest to the research community in this field.
Casillas et al. proposed in [22] a new approach to achieve accuracy-based
Michigan-style GFSs. The proposal, Fuzzy-XCS, is based on XCS but properly
adapted to fuzzy systems, with promising results for function approximation prob-
lems and for robot simulation online learning. In [145], an extension of the UCS
algorithm is proposed, a recent Michigan-style genetic learning algorithm for clas-
sification [14].
These approaches build a bridge between the Michigan-style genetic learning
studies and the fuzzy systems models. This is a promising research line that can
provide interesting results in the near future.
In this section, we briefly describe the issues that should be tackled in order to
build the FLC used by an FAGA. They include the choice of inputs and outputs,
the definition of the data base associated with them, and the specification of the
rule-base:
• Inputs. They should be robust measures that describe GA behaviour and the
effects of the genetic setting parameters and genetic operators. Some possible
inputs may be: diversity measures, maximum, average, and minimum fitness,
etc. The current control parameters may also be considered as inputs.
• Outputs. They indicate values of control parameters or changes in these pa-
rameters. In [182], the following outputs were reported: mutation probability,
crossover probability, population size, selective pressure, the time the controller
must spend in a target state in order to be considered successful, the degree to
which a satisfactory solution has been obtained, etc.
• Data Base. Each input and output should have an associated set of linguistic
labels. The meaning of these labels is specified through membership functions of
fuzzy sets, the fuzzy partition, contained in the Data Base. Thus, it is necessary
that every input and output have a bounded range of values in order to define
these membership functions over it.
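As a minimal sketch of how such inputs might be computed from a GA population (the concrete diversity measures below are illustrative choices, not ones prescribed by the cited works):

```python
import statistics

def flc_inputs(population, fitnesses):
    """Candidate FLC inputs computed from a GA population.

    `population` is a list of real-coded chromosomes (lists of floats) and
    `fitnesses` the corresponding fitness values (maximization, positive
    values assumed).  The measure names are illustrative; the text only
    requires robust descriptors of GA behaviour.
    """
    n = len(population)
    # Phenotypic statistics: maximum, average and minimum fitness.
    f_max, f_min = max(fitnesses), min(fitnesses)
    f_avg = sum(fitnesses) / n
    # A simple phenotypic diversity measure: ratio of average to best
    # fitness (approaches 1.0 as the population converges).
    pheno_div = f_avg / f_max if f_max else 0.0
    # A simple genotypic diversity measure: mean per-gene standard deviation.
    genes = list(zip(*population))
    geno_div = sum(statistics.pstdev(g) for g in genes) / len(genes)
    return {"f_max": f_max, "f_avg": f_avg, "f_min": f_min,
            "phenotypic_diversity": pheno_div,
            "genotypic_diversity": geno_div}
```

Measures such as these have bounded, interpretable ranges, which makes it straightforward to define fuzzy partitions over them in the Data Base.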
Rule-Base
After selecting the inputs and outputs and defining the Data Base, the fuzzy rules
describing the relations between them should be defined. They facilitate the capture
and representation of a broad range of adaptive strategies for GAs.
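To illustrate, a rule of the kind used in FAGAs (e.g., "IF diversity is LOW THEN pm is HIGH") can be sketched as a minimal zero-order Sugeno controller; the fuzzy partition and the consequent pm values below are illustrative assumptions, not taken from any of the cited rule-bases:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def pm_controller(diversity):
    """Zero-order Sugeno FLC mapping a diversity measure in [0, 1] to a
    mutation probability pm.  Rules: low diversity -> high pm,
    medium -> medium pm, high -> low pm (illustrative values).
    """
    low  = tri(diversity, -0.5, 0.0, 0.5)
    med  = tri(diversity,  0.0, 0.5, 1.0)
    high = tri(diversity,  0.5, 1.0, 1.5)
    rules = [(low, 0.10), (med, 0.05), (high, 0.01)]
    num = sum(w * pm for w, pm in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.05
```

Intermediate diversity values fire two rules at once and yield a smoothly interpolated pm, which is precisely what makes FLCs attractive for encoding gradual adaptive strategies.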
Although the experience and knowledge of GA experts may be used to derive rule-bases, many authors have found this difficult to do. In this sense, the following three reflections by different authors are worth quoting:
“Although much literature on the subject of GA control has appeared, our initial at-
tempts at using this information to manually construct a fuzzy system for genetic con-
trol were unfruitful.” [120].
“Statistics and parameters are in part universal to any evolutionary algorithm and
in part specific to a particular application. Therefore it is hard to state general fuzzy
rules to control the evolutionary process.” [182].
“The behaviour of GAs and the interrelations between the genetic operators are very
complex. Although there are many possible inputs and outputs for the FLCs, frequently
fuzzy rule-bases are not easily available: finding good fuzzy rule bases is not an easy
task.” [74].
Automatic learning mechanisms to obtain rule-bases have been introduced to
avoid this problem. By using these mechanisms, relevant relations and membership
functions may be automatically determined and may offer insight to understand the
complex interaction between GA control parameters and GA performance [120].
Two types of rule-base learning techniques have been presented: the offline learning
technique [120, 121] and the online learning technique [77]:
• The offline learning mechanism is an evolutionary algorithm that is executed once, before the operation of the FAGA; however, it carries a high computational cost. It works by considering a fixed set of test functions, following the same idea as the meta-GA of Grefenstette [68]. Unfortunately, the
test functions may have nothing to do with the particular problem to be solved,
which may limit the robustness of the rule-bases returned.
• In the online learning technique, the rule-bases used by the FLCs come from an
evolutionary process that interacts concurrently with the GA to be adapted. The
learning technique underlying this approach only takes into account the prob-
lem to be solved (in contrast to the previous one, which never considers it). In
this way, the rule-bases obtained will specify adaptation strategies particularly
appropriate for this problem.
Table 1 outlines the main features of several FAGA instances presented in the liter-
ature. It includes the inputs and outputs of the FLCs, the adaptation level, and the
method considered to derive the rule-base. A visual inspection of Table 1 allows one
to conclude that:
1. The study of FAGAs has been an active line of research in the evolutionary com-
putation community that has produced a significant amount of work during the
last fifteen years.
2. Most FAGAs presented in the literature involve population-level adaptation.
However, adaptive mechanisms at the individual level based on FLCs may be in-
teresting to adjust control parameters associated with genetic operators [210, 77].
In this case, the control parameters will be defined on individuals instead of on
the whole population. Inputs to the FLCs may be central measures and/or mea-
sures associated with particular chromosomes or sets of them, and outputs may
be control parameters associated with genetic operators that are applied to those
chromosomes. A justification for this approach is that it allows for the applica-
tion of different search strategies in different parts of the search space. This is
based on the reasonable assumption that, in general, the search space will not be
homogeneous, and that different strategies will be better suited to different kinds
of sublandscapes.
3. Most instances use rule-bases derived from GA experts. The use of an online learning mechanism has been less explored, though it is now becoming one of the most promising alternatives (see Section 4.4.1). An example of this approach, called coevolution with fuzzy behaviours, was proposed in [77]. Since the learning technique underlying this approach only takes into account the problem to be solved (in contrast to the approaches based on offline learning mechanisms), the rule-bases obtained will specify adaptation strategies particularly appropriate for this problem.
Table 1 Instances of FAGAs in the literature. Each row lists: reference | FLC inputs | FLC outputs | adaptation level | rule-base derivation method.
Xu and Vukovich (1993, 1994) [202, 203] | Generation and population size | pc and pm | Population-level | GA expert knowledge
Lee and Takagi (1993, 1994) [120, 121] | Two phenotypical diversity measures and change in the best fitness since the last control action | Changes to pc and pm, and population size | Population-level | Offline learning
Bergmann, Burgard, and Hemker (1994) [13] | Entropy evolution | Inversion rate, pc, and pm | Population-level | GA expert knowledge
Herrera and Lozano (1996) [74] | Genotypical diversity measure and phenotypical diversity measure | Frequency of application of two crossover operators and selection pressure | Population-level | GA expert knowledge
Wang et al. (1996) [198] | Change in average fitness of the population at two consecutive generations | Changes to pc and pm | Population-level | GA expert knowledge
Zeng and Rabenasolo (1997) [210] | Variance of fitness values, distance between the fitness of the best parent and the best fitness, distance between parents, and normalized fitness values of the parents | pc for every pair of parents | Individual-level | GA expert knowledge
Song et al. (1996, 1997) [172, 173] | Change in average fitness of the population at two consecutive generations | Changes to pc and pm | Population-level | GA expert knowledge
Clintock, Lunney, and Hashim (1997) [31, 32] | Population statistics and diversity statistics | pc, pm, and a parameter that determines the application of different crossover operators | Population-level | GA expert knowledge
Subbu, Sanderson, and Bonissone (1998) [175] | Genotypic and phenotypic diversity measures of the population | Population size, pc, and pm | Population-level | Offline learning
Shi, Eberhart, and Chen (1999) [168] | Best fitness, number of generations for unchanged best fitness, and variance of fitness | pc and pm | Population-level | GA expert knowledge
Herrera and Lozano (2000) [76] | Current pm and convergence measure | pm | Population-level | GA expert knowledge
Matousek, Osmera, and Roupec (2000) [136] | Variability of population, coefficient of partial convergence, and H-characteristics | pm and selection pressure | Population-level | GA expert knowledge
Wang (2001) [196] | Genetic drift degree, phenotypical diversity measure, and number of generations without improving the best individual | pc and pm | Population-level | GA expert knowledge
Herrera and Lozano (2001) [77] | Ranks associated with the parents with regard to their fitness values in the population | Control parameter associated with fuzzy recombination | Individual-level | Online learning
Zhu, Zhang, and Jing (2003) [211] | Population size, generation number, and two phenotypic measures for both diversity and convergence | pc, pm, and selection pressure | Population-level | Online learning
Yun and Gen (2003) [207] | Changes of average fitness in the population at two consecutive generations | Changes to pc and pm | Population-level | GA expert knowledge
Subbu and Bonissone (2003) [176] | Genotypic diversity and percentage of completed trials | Changes to the population size and pm | Population-level | GA expert knowledge
Ah King, Radha, and Rughooputh [1] | Change in average fitness of the population at two consecutive generations | Changes to pc and pm | Population-level | GA expert knowledge
King, Radha, and Rughooputh (2004) [106] | Changes in average fitness at two consecutive steps | Changes to pc and pm | Population-level | GA expert knowledge
Last and Eyal (2005, 2006) [115, 116] | Age and lifetime of the chromosomes to be crossed over (parents) and the population average lifetime | pc | Individual-level | GA expert knowledge
Liu, Xu, and Abraham (2005) [126] | Changes of the best fitness and average fitness in the GA population at two consecutive generations | Changes to pc and pm | Population-level | GA expert knowledge
Li et al. (2006) [122] | Average fitness value of the individuals and standard deviation between two consecutive generations | pc and pm | Population-level | GA expert knowledge
Hamzeh, Rahmani, and Parsa (2006) [70] | Measures associated with an XCS learning classifier system | Exploration probability rate | Population-level | GA expert knowledge
Lau, Chan, and Tsui (2007) [117] | Average fitness values in the population and a measure of population diversity | Changes to pc and pm | Population-level | GA expert knowledge
Sahoo et al. (2006, 2007) [159, 160] | Standard deviation of the fitness distribution of the population and incremental change in average fitness of the population from generation to generation | pm | Population-level | GA expert knowledge
Fuzzy connectives and triangular probability distributions have been considered for
designing powerful real-parameter crossover operators that establish adequate pop-
ulation diversity levels and thus help to avoid premature convergence:
• FCB-crossovers [82]. These are crossover operators for real-coded GAs based
on the use of fuzzy connectives: t-norms, t-conorms and average functions. They
were designed to offer different exploration and exploitation degrees.
• Heuristic FCB-crossovers [75]. Each of these produces a child whose components are closer to the corresponding components of the fitter parent.
• Dynamic FCB-crossovers [81]. These are crossover operators based on the use
of parameterized fuzzy connectives. These operators keep a suitable sequence
between the exploration and the exploitation along the GA run: “to protect the
exploration in the initial stages and the exploitation later”.
• Dynamic Heuristic FCB-crossovers [81]. These operators put together the heuris-
tic properties and the features of the Dynamic FCB-crossover operators. They
showed very good results as compared with other crossover operators proposed
for RCGAs, even better than the FCB-crossover operators and the dynamic ones.
• Soft Genetic Operators. In [192, 193, 195], crossover and mutation operators
were presented, which are based on the use of triangular probability distributions.
These operators, called soft modal crossover and mutation, are a generalization
of the discrete crossover operator and the BGA mutation, respectively, proposed
for the Breeder GA [141]. The term soft is gleaned from fuzzy set theory only to
help grasp the main idea, since probability distributions are considered instead of
membership functions.
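As a hedged sketch of the idea behind the FCB-crossovers, the "logical" family of connectives (t-norm min, t-conorm max, and the arithmetic mean as an averaging function) can be applied to genes normalized to [0, 1] to produce children with different exploration/exploitation characters; the exact connective families studied in [82] differ in detail:

```python
def fcb_crossover(x, y, lo, hi):
    """Sketch of fuzzy-connective-based crossover for real-coded GAs.

    `x` and `y` are parent chromosomes; `lo[i]`..`hi[i]` is the domain of
    gene i.  A t-norm (min) yields an exploitative child below both parents,
    a t-conorm (max) one above them, and an averaging function one between
    them.  Returns [t-norm child, t-conorm child, averaging child].
    """
    def norm(v, i):   return (v - lo[i]) / (hi[i] - lo[i])
    def denorm(v, i): return lo[i] + v * (hi[i] - lo[i])
    n = len(x)
    children = []
    for connective in (min, max, lambda a, b: (a + b) / 2.0):
        child = [denorm(connective(norm(x[i], i), norm(y[i], i)), i)
                 for i in range(n)]
        children.append(child)
    return children
```

Selecting among such connectives over the course of a run (as in the dynamic variants above) lets the operator shift gradually from exploration to exploitation.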
Fuzzy Representations
Classical EAs, such as GAs and evolution strategies, do not take into account the
development of an individual or organism from the gene level to the mature pheno-
type. There are no one-gene, one-trait relationships in natural evolved systems. The
phenotype varies as a complex, non-linear function of the interaction between un-
derlying genetic structures and current environmental conditions. Nature follows the
universal effects of pleiotropy and polygeny. Pleiotropy is the fact that a single gene
may simultaneously affect several phenotype traits. Polygeny is the effect when a
single phenotypic characteristic may be determined by the simultaneous interaction
of many genes [58]. An attempt to deal with more complex genotype/phenotype re-
lations in EAs was presented in [191, 194]. A fuzzy representation is proposed for
the case of tackling optimization problems of parameters with variables on continuous domains. Each problem parameter has an associated number (m) of fuzzy decision
variables belonging to the interval [0, 1]. The chromosomes are formed by linking
together the values of the decision variables for each parameter. For each parameter,
the decoding process is carried out using a function g : [0, 1]m → [0, 1], and a linear
transformation from the interval [0,1] to the corresponding parameter domain. As
an example of such a function the authors presented the following:
∀d = (d1, ..., dm) ∈ [0, 1]^m,   g(d) = (1 / (2^m − 1)) · Σ_{j=1}^{m} dj · 2^{j−1}.
When m > 1, this coding type breaks the one-to-one correspondence between
genotype and phenotype (assumed by classical EAs), since two different genotypes
may induce the same phenotype. So, it is impossible to draw inferences from phenotype to genotype, i.e., the mapping from genotype to phenotype is not one-to-one.
Different experiments carried out in [194] with m = 1 and m = 2 showed that the
use of a fuzzy representation allows robust behavior to be obtained. In some cases, a
better performance than the Breeder GA was achieved. Furthermore, another impor-
tant conclusion was stated: for a small population size the performance for m = 2 is
slightly better than for m = 1, whereas the opposite is true for large population sizes.
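Under the decoding function above, the genotype-to-phenotype decoding of one parameter can be sketched as follows; the interval [lo, hi] stands in for the parameter domain:

```python
def decode_parameter(d, lo, hi):
    """Decode m fuzzy decision variables d_j in [0, 1] into one real
    parameter, following the example decoding function in the text:
        g(d) = (1 / (2**m - 1)) * sum_{j=1..m} d_j * 2**(j-1),
    followed by a linear transformation from [0, 1] to [lo, hi].
    """
    m = len(d)
    # enumerate starts at j=0, so 2**j corresponds to 2**(j-1) for j=1..m.
    g = sum(dj * 2 ** j for j, dj in enumerate(d)) / (2 ** m - 1)
    return lo + g * (hi - lo)
```

Note that for m > 1 the mapping is many-to-one: for instance, with m = 2, the genotypes (1.0, 0.25) and (0.5, 0.5) decode to the same phenotype, illustrating the broken genotype/phenotype correspondence discussed above.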
Sharma and Irwin [167] addressed the use of appropriate fuzzy sets to represent a
parameter depending upon its contribution within a problem domain. They proposed
a chromosome encoding method, named fuzzy coding, for representing real number
parameters in a GA. Fuzzy coding is an indirect method for representing a chro-
mosome, where each parameter is represented by two sections. In the first section,
the fuzzy sets associated with each parameter are encoded in binary bits with a “1”
representing the corresponding set selected. In the second section, each parameter
contains degrees of membership corresponding to each fuzzy set. These are encoded
as real numbers and represent the degrees of firing. The actual parameter value of
interest is obtained through the information contained in the chromosome by means
of a defuzzification method. This coding method represents the knowledge asso-
ciated with each parameter and is an indirect method of encoding compared with
the alternatives in which the parameters are directly represented in the encoding.
Two test examples, along with neural identification of a nonlinear pH (measure of
acidity or alkalinity of water) process from experimental data, were studied. It was
shown that fuzzy coding is better than the conventional methods (binary, gray, and
floating-point coding) and is effective for parameter optimization in problems where
the search space is complicated. In addition, the authors claim that this new tech-
nique also has the flexibility to embed prior knowledge from the problem domain
which is not possible in the regular coding methods. We should point out that an additional investigation was carried out by Pedrycz [149] into the exploitation of fuzzy sets as a basis for encoding an original search space.
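A plausible reading of this two-section encoding can be sketched as follows; the weighted-average (height) defuzzifier over the selected fuzzy set centres is an assumption here, since the exact defuzzification method of the fuzzy coding scheme is not reproduced in the text:

```python
def fuzzy_decode(selected, degrees, centres):
    """Sketch of the fuzzy coding idea of Sharma and Irwin.

    For one parameter, `selected` holds the binary selection bits (a "1"
    meaning the corresponding fuzzy set is active), `degrees` the membership
    degrees (degrees of firing) for each set, and `centres` the centre of
    each fuzzy set on the parameter's domain.  The parameter value is
    recovered by a weighted-average defuzzification over the selected sets.
    """
    num = sum(b * mu * c for b, mu, c in zip(selected, degrees, centres))
    den = sum(b * mu for b, mu in zip(selected, degrees))
    return num / den if den else None
```

Because the genes are fuzzy sets and firing degrees rather than raw parameter values, prior knowledge about plausible parameter regions can be embedded directly in the choice of sets and centres.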
Finally, in [174], an algorithm for adaptively controlling GA parameter coding
using fuzzy rules is presented, which was called fuzzy GAP. This uses an inter-
mediate mapping between the genetic strings and the search space parameters. In
particular, each search parameter is specified by the following equation:
ps = (pg / (2^l − 1)) · R + O,
where ps is the search parameter, pg is the genetic parameter, l is the number of bits
in the genetic parameter, R is a specified parameter range, and O is a specified offset.
By controlling the offset and range, more accurate solutions are obtained using the
same number of binary bits.
Fuzzy GAP performs a standard genetic search until the population of strings has
converged. Convergence was measured by evaluating the average number of bits
which differ between all the genetic strings. Each string is compared to every other
string and the number of different bits is counted. If the average number of differing
bits per string pair is less than a threshold, the GA has converged. After the genetic
strings have converged, a new range and offset for the search parameters are deter-
mined by means of an FLC with an input that measures the distance between the
centre of the current range and the best solution found in the search. After applying
the FLC, the GA is executed again with the new values for the range and offset.
The performance of fuzzy GAP on a hydraulic brake emulator parameter identifica-
tion problem was investigated. It was shown to be more reliable than other dynamic
coding algorithms (such as the dynamic parameter encoding algorithm), providing
more accurate solutions in fewer generations.
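Both mechanisms, the intermediate parameter mapping and the bitwise convergence test, are straightforward to express; the function names below are illustrative:

```python
from itertools import combinations

def gap_decode(pg_bits, R, O):
    """Fuzzy GAP intermediate mapping: ps = pg / (2**l - 1) * R + O, where
    pg is the integer encoded by the l genetic bits, R is the specified
    parameter range and O the specified offset."""
    l = len(pg_bits)
    pg = sum(b << (l - 1 - i) for i, b in enumerate(pg_bits))
    return pg / (2 ** l - 1) * R + O

def has_converged(strings, threshold):
    """Convergence test described in the text: the average number of
    differing bits over all pairs of genetic strings falls below a
    threshold."""
    pairs = list(combinations(strings, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs) < threshold
```

After convergence is detected, the FLC narrows R and re-centres O around the best solution, so the same number of bits resolves a finer region of the search space.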
Due to the possibility of premature convergence, GAs do not guarantee that the op-
timal solution shall be found. Therefore, if the optimal solution is not known, GA
performance is difficult to measure accurately. In [137], a fuzzy stopping criterion
mechanism (FSCM) is developed to provide a useful evaluation of the GA’s real
time performance. FSCM is based on achieving a user-defined level of performance
for the given problem. In order to do so, it includes a predicting process based on
statistics for estimating the value of the GA optimal solution, then it compares the
current solution to this optimal one by checking whether an acceptable percentage (specified by the user) of the latter has been reached. If so, the GA stops and returns belief and uncertainty measures that provide a reliability measure for the solution chosen by the GA. The acceptable percentage of the optimal solution defined by the user represents a fuzzy stopping criterion for halting the GA when an appropriate solution is reached. The predicting process is invoked every 40 iterations and uses performance values such as the
minimum solution value, average solution value and belief and plausibility values,
all obtained during these iterations. The underlying idea for the FSCM is that the
user does not need to find the global solution, but rather an approximate solution that
is close to the optimal one, i.e., the GA is used for solving a fuzzy goal instead of a
crisp one because of the vagueness of the term approximate. This term is quantitatively measured by the user through the acceptable percentage of the optimal solution that he requires in the final solution. Results obtained on a 25-city TSP indicate that this approach is preferable to a simple GA, in terms of cost/performance and in decreasing the amount of time the GA spends searching for acceptable solutions.
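Once the optimum has been estimated, the stopping test itself reduces to a simple comparison; the statistical estimation and the belief/plausibility machinery of [137] are not reproduced in this sketch:

```python
def fuzzy_stop(current_best, estimated_optimum, acceptable_pct):
    """Stopping test of the fuzzy stopping criterion mechanism, written for
    a maximization problem: stop once the current best solution reaches the
    user-specified percentage of the (statistically estimated) optimum.
    """
    return current_best >= acceptable_pct * estimated_optimum
```

For a minimization problem such as the TSP, the inequality would be inverted against an estimated lower bound on the tour cost.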
The availability, over the last few years, of fast and cheap parallel hardware has
favoured research into possible ways for implementing parallel versions of EAs
[20]. EAs are good candidates for effective parallelization, since they are inspired by the principle of the parallel evolution of a population of individuals. Among the
many types of parallel EAs, distributed and cellular algorithms are two popular op-
timization tools. The basic idea of the distributed EAs lies in the partition of the
population into several subpopulations, each one of them being processed by an
EA, independently of the others. Furthermore, a migration process produces a periodic exchange of individuals among the subpopulations.
Cultural algorithms (CAs) [154] are dual inheritance systems that consist of a
social population and a belief space. The problem solving experience of individ-
uals selected from the population space by an acceptance function is used to gen-
erate problem solving knowledge that resides in the belief space. This knowledge
can be viewed as a set of beacons that can control the evolution of the popula-
tion component by means of an influence function. The influence function can use
the knowledge in the belief space to modify any aspect of the population compo-
nent. Various evolutionary models have been used for the population component of
CAs, including GAs, genetic programming, evolution strategies, and evolutionary
programming. In [155], a fuzzy approach to CAs is presented in which an FLC reg-
ulates the amount of information to be transferred to the belief space used by the CA
over time. In particular, the FLC determines the number of individuals which shall
impact the current beliefs. Its inputs are the individual success ratio (ratio of the
number of successes to the total number of mutations) and the current generation.
A comparison was made between the fuzzy version of a CA (that used evolution-
ary programming as the population component) and its non fuzzy version on 34
optimization functions. The conclusions were: 1) the fuzzy interface between the
population and belief space outperformed the non fuzzy version in general, and 2)
the use of a fuzzy acceptance function significantly improved the success ratio and
reduced CPU time.
of the particles oscillate in different sinusoidal waves and converge quickly, some-
times prematurely. Liu and Abraham [124] proposed an adaptive mechanism based
on FLCs to control the velocity of particles in order to avoid premature convergence
in PSO. Empirical results demonstrated that the performance of standard PSO degrades remarkably as the dimension of the problem increases, while the fuzzy PSO approach is affected very little. Another instance of a PSO model
tuned by FLCs may be found in [99]. Finally, we should point out that a fuzzy ver-
sion of PSO specifically designed to tackle the quadratic assignment problem was
presented in [125].
The differential evolution algorithm (DE) is one of the most recent EAs for solving
real-parameter optimization problems [151]. Like other EAs, DE is a population-
based, stochastic global optimizer capable of working reliably in nonlinear and
multimodal environments. DE has few control parameters. However, choosing the
best parameter setting for a particular problem is not easy [129]. Liu and Lampinen
[127, 128, 129] present the fuzzy adaptive differential evolution algorithm, which
uses FLCs to adapt the search parameters for the DE mutation operation and crossover operation. These two parameters were adapted individually for
each generation. Parameter vector change and function value change over the whole
population members between the last two generations were nonlinearly depressed
and then used as the inputs for both FLCs. Experimental results, provided by the
proposed algorithm for a set of standard test functions, outperformed those of the
standard differential evolution algorithm for optimization problems with higher di-
mensionality.
Despite the recent activity and the associated progress in fuzzy EAs research, there
remain many directions in which the work may be improved or extended. Next, we
report on some of these.
Future research may take into account the following issues in order to produce ef-
fective FAGAs.
Research on determining relevant input variables for the FLCs controlling GA be-
haviour should be studied in greater depth. These variables should describe either
states of the population or features of the chromosomes, so that control parameters
may be adapted on the basis thereof to introduce real performance improvements.
In this vein, Boulif and Karim [18] recently claimed that previous research on
Fuzzy EAs may be defined to tackle particular problems such as multimodal opti-
misation problems. In addition, fuzzy logic may help modern hybrid metaheuristics
to improve their behaviour, obtaining fuzzy hybrid metaheuristics.
Given a problem with multiple solutions, a simple EA will tend to converge to a sin-
gle solution. As a result, various mechanisms have been proposed to stably maintain
a diverse population throughout the search, thereby allowing EAs to identify mul-
tiple optima reliably. Many of these methods work by encouraging artificial niche
formation through sharing and crowding [169], but these methods introduce one or
more parameters that affect algorithm performance, parameters such as the shar-
ing radius in fitness sharing or the crowding factor in crowding. In many problems,
the uniform specification of niche size is inadequate to capture solutions of varying
location and extent without also increasing the population size beyond reasonable
bounds. Therefore, there remains a need to develop niching methods that stably and
economically find the best niches, regardless of their spacing and extent. FLCs may
be useful for the adaptation of parameters associated with sharing and crowding
methods. Possible inputs may be: diversity measures, the number of niches that are
currently in the population, etc.
In recent years, a large number of search algorithms have been reported that do not purely follow the concepts of a single classical metaheuristic, but instead attempt to obtain the best from a set of metaheuristics (and even other kinds of optimization methods) that work together and complement each other to produce a profitable synergy. These approaches are commonly referred to
as hybrid metaheuristics [178]. Memetic algorithms (MAs) [112] are well-known
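This hybrid pattern can be sketched as a GA whose offspring undergo local search before entering the population; the toy objective, operators and parameter values below are illustrative assumptions rather than any published MA:

```python
# Compact memetic-algorithm sketch: a steady-state GA whose offspring
# are refined by hill climbing, the classic EA + local search hybrid.
import random

def f(x):
    return (x - 3.0) ** 2  # toy objective to minimise; optimum at x = 3

def local_search(x, step=0.1, iters=20):
    """Simple hill climbing: accept a random neighbour when it improves f."""
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        if f(cand) < f(x):
            x = cand
    return x

def memetic_algorithm(pop_size=20, generations=40, seed=1):
    random.seed(seed)
    pop = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Binary-tournament selection of two parents.
        parents = [min(random.sample(pop, 2), key=f) for _ in range(2)]
        # Arithmetic crossover plus Gaussian mutation.
        child = 0.5 * (parents[0] + parents[1]) + random.gauss(0, 0.5)
        # Memetic step: refine the offspring with local search.
        child = local_search(child)
        # Steady-state replacement of the current worst individual.
        worst = max(range(pop_size), key=lambda i: f(pop[i]))
        if f(child) < f(pop[worst]):
            pop[worst] = child
    return min(pop, key=f)

best = memetic_algorithm()
assert abs(best - 3.0) < 1.0  # converges near the optimum
```

Fuzzy logic could then be layered on top, for instance with an FLC deciding how much local search each offspring receives.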
5 Concluding Remarks
In this chapter, we have painted a complete picture of GFSs and fuzzy EAs. In particular, we have reviewed important design principles for these algorithms, cited the existing literature where relevant, provided a taxonomy for each of them, and discussed future directions and open challenges for these two lines of research. Above all, this work reveals that GFSs and fuzzy EAs rest on consolidated bodies of knowledge, and are therefore two outstanding examples of positive collaboration between soft computing technologies. In addition, it shows that many exciting research issues remain open in connection with these two topics.
References
1. Ah King, R.T.F., Radha, B., Rughooputh, H.C.S.: A fuzzy logic controlled genetic
algorithm for optimal electrical distribution network reconfiguration. In: Proc of the
2004 IEEE International Conference on Networking, Sensing and Control, pp. 577–
582 (2004)
2. Alba, E., Tomassini, M.: Parallelism and evolutionary algorithms. IEEE Transactions
on Evolutionary Computation 6, 443–462 (2002)
3. Alba, E., Dorronsoro, B.: The exploration/exploitation tradeoff in dynamic cellular
genetic algorithms. IEEE Transactions on Evolutionary Computation 9(2), 126–142
(2005)
4. Alcalá, R., Casillas, J., Cordón, O., Herrera, F.: Building fuzzy graphs: features and
taxonomy of learning non-grid-oriented fuzzy rule-based systems. Journal of Intelligent & Fuzzy Systems 11, 99–119 (2001)
120 F. Herrera and M. Lozano
5. Alcalá, R., Alcalá-Fdez, J., Herrera, F.: A proposal for the genetic lateral tuning of
linguistic fuzzy systems and its interaction with rule selection. IEEE Transactions on
Fuzzy Systems 15(4), 616–635 (2007)
6. Alcalá, R., Alcalá-Fdez, J., Herrera, F., Otero, J.: Genetic learning of accurate and
compact fuzzy rule based systems based on the 2-Tuples linguistic representation. In-
ternational Journal of Approximate Reasoning 44, 45–64 (2007)
7. Alcalá, R., Gacto, M.J., Herrera, F., Alcalá-Fdez, J.: A multi-objective genetic algo-
rithm for tuning and rule selection to obtain accurate and compact linguistic fuzzy rule-
based systems. International Journal of Uncertainty. Fuzziness and Knowledge-Based
Systems 15(5), 521–537 (2007)
8. Alcalá-Fdez, J., Herrera, F., Marquez, F., Peregrin, A.: Increasing fuzzy rules coopera-
tion based on evolutionary adaptive inference systems. International Journal of Intelli-
gent Systems 22(9), 1035–1064 (2007)
9. Alcalá-Fdez, J., Sánchez, L., García, S., del Jesús, M.J., Ventura, S., Garrell, J.M., Otero, J., Romero, C., Bacardit, J., Rivas, V.M., Fernández, J.C., Herrera, F.: KEEL:
A software tool to assess evolutionary algorithms for data mining problems. Soft Computing (in press)
10. Arnone, S., Dell’Orto, M., Tettamanzi, A.: Toward a fuzzy government of genetic pop-
ulations. In: Proc. of the 6th IEEE Conference on Tools with Artificial Intelligence, pp.
585–591. IEEE Computer Society Press, Los Alamitos (1994)
11. Au, W.-H., Chan, K.C.C., Wong, A.K.C.: A fuzzy approach to partitioning continu-
ous attributes for classification. IEEE Transactions on Knowledge and Data Engineer-
ing 18(5), 715–719 (2006)
12. Berlanga, F.J., del Jesus, M.J., González, P., Herrera, F., Mesonero, M.: Multiobjective
evolutionary induction of subgroup discovery fuzzy rules: A case study in marketing.
In: Perner, P. (ed.) ICDM 2006. LNCS, vol. 4065, pp. 337–349. Springer, Heidelberg
(2006)
13. Bergmann, A., Burgard, W., Hemker, A.: Adjusting parameters of genetic algorithms
by fuzzy control rules. In: Becks, K.-H., Perret-Gallix, D. (eds.) New Computing Tech-
niques in Physics Research III, pp. 235–240. World Scientific Press, Singapore (1994)
14. Bernadó-Mansilla, E., Garrell-Guiu, J.M.: Accuracy-based learning classifier sys-
tems: models, analysis and applications to classification tasks. Evolutionary Compu-
tation 11(3), 209–238 (2003)
15. Botta, A., Lazzerini, B., Marcelloni, F.: Context adaptation of Mamdani fuzzy systems
through new operators tuned by a genetic algorithm. In: Proceedings of the 2006 IEEE
International Conference on Fuzzy Systems (FUZZ-IEEE 2006), Vancouver, Canada,
pp. 7832–7839 (2006)
16. Botta, A., Lazzerini, B., Marcelloni, F., Stefanescu, D.C.: Exploiting fuzzy ordering
relations to preserve interpretability in context adaptation of fuzzy systems. In: Pro-
ceedings of the 2007 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE
2007), London, UK, pp. 1137–1142 (2007)
17. Botta, A., Lazzerini, B., Marcelloni, F., Stefanescu, D.C.: Context Adaptation of Fuzzy
Systems Through a Multi-objective Evolutionary Approach Based on a Novel Inter-
pretability Index. Soft Computing 13(3), 437–449 (2009)
18. Boulif, M., Atif, K.: A new fuzzy genetic algorithm for the dynamic bi-objective cell
formation problem considering passive and active strategies. International Journal of
Approximate Reasoning 47, 141–165 (2008)
19. Cano, J.R., Herrera, F., Lozano, M.: Evolutionary stratified training set selection for
extracting classification rules with trade-off precision-interpretability. Data and Knowl-
edge Engineering 60, 90–108 (2007)
20. Cantú-Paz, E.: Efficient and accurate parallel genetic algorithms. Book Series on Ge-
netic Algorithms and Evolutionary Computation. Kluwer, Norwell (2000)
21. Carse, B., Fogarty, T.C., Munro, A.: Evolving fuzzy rule based controllers using genetic
algorithms. Fuzzy Sets and Systems 80(3), 273–293 (1996)
22. Casillas, J., Carse, B., Bull, L.: Fuzzy-XCS: A Michigan genetic fuzzy system. IEEE
Transactions on Fuzzy Systems 15(4), 536–550 (2007)
23. Casillas, J., Cordón, O., Herrera, F., del Jesus, M.J.: Genetic feature selection in a fuzzy
rule-based classification system learning process for high-dimensional problems. Infor-
mation Sciences 136(1-4), 135–157 (2001)
24. Casillas, J., Cordón, O., del Jesus, M.J., Herrera, F.: Genetic tuning of fuzzy rule deep
structures preserving interpretability for linguistic modeling. IEEE Trans. on Fuzzy
Systems 13(1), 13–29 (2005)
25. Casillas, J., Cordón, O., Herrera, F., Magdalena, L. (eds.): Accuracy improvements in
linguistic fuzzy modeling. Springer, Berlin (2003)
26. Casillas, J., Cordón, O., Herrera, F., Magdalena, L. (eds.): Interpretability issues in
fuzzy modeling. Springer, Berlin (2003)
27. Casillas, J., Martínez, P.: Consistent, complete and compact generation of DNF-type
fuzzy rules by a Pittsburgh-style genetic algorithm. In: Proceedings of the 2007 IEEE
International Conference on Fuzzy Systems (FUZZ-IEEE 2007), London, UK, pp.
1745–1750 (2007)
28. Casillas, J., Martínez, P., Benítez, A.D.: Learning consistent, complete and compact
fuzzy rules sets in conjunctive normal form for system identification. Soft Comput-
ing 13(3), 451–465 (2009)
29. Chen, C.-H., Hong, T.-P., Tseng, V.S., Lee, C.-S.: A genetic-fuzzy mining approach for
items with multiple minimum supports. Soft Computing 13(3), 521–533 (2009)
30. Cherkassky, V., Mulier, F.: Learning from data: concepts, theory and methods. John
Wiley & Sons, New York (1998)
31. Mc Clintock, S., Lunney, T., Hashim, A.: Using fuzzy logic to optimize genetic algo-
rithm performance. In: Proceedings of 1997 IEEE International Conference on Intelli-
gent Engineering Systems, Budapest, Hungary, pp. 271–275 (1997)
32. Mc Clintock, S., Lunney, T., Hashim, A.: A fuzzy logic controlled genetic algorithm
environment. In: Proceedings of 1997 IEEE International Conference on Systems, Man,
and Cybernetics, Orlando, Florida, USA, pp. 2181–2186 (1997)
33. Cococcioni, M., Ducange, P., Lazzerini, B., Marcelloni, F.: A Pareto-based multi-
objective evolutionary approach to the identification of Mamdani fuzzy systems. Soft
Computing 11(11), 1013–1031 (2007)
34. Coello, C.A., Van Veldhuizen, D.A., Lamont, G.B.: Evolutionary algorithms for solving
multi-objective problems. Kluwer Academic Publishers, Dordrecht (2002)
35. Cordón, O., del Jesús, M.J., Herrera, F., Lozano, M.: MOGUL: A Methodology to Ob-
tain Genetic fuzzy rule-based systems Under the iterative rule Learning approach. In-
ternational Journal of Intelligent Systems 14, 1123–1153 (1999)
36. Cordón, O., Gomide, F., Herrera, F., Hoffmann, F., Magdalena, L.: Ten years of genetic
fuzzy systems: current framework and new trends. Fuzzy Sets and Systems 141, 5–31
(2004)
37. Cordón, O., Herrera, F.: A three-stage evolutionary process for learning descriptive
and approximate fuzzy-logic-controller knowledge bases from examples. International
Journal of Approximate Reasoning 17(4), 369–407 (1997)
38. Cordón, O., Herrera, F., Hoffmann, F., Magdalena, L.: Genetic fuzzy systems. In: Evo-
lutionary tuning and learning of fuzzy knowledge bases. World Scientific, Singapore
(2001)
39. Cordón, O., Herrera, F., Villar, P.: Analysis and guidelines to obtain a good fuzzy par-
tition granularity for fuzzy rule-based systems using simulated annealing. International
Journal of Approximate Reasoning 25(3), 187–215 (2000)
40. Cordón, O., Herrera, F., Magdalena, L., Villar, P.: A genetic learning process for the
scaling factors, granularity and contexts of the fuzzy rule-based system data base. In-
formation Sciences 136, 85–107 (2001)
41. Cordón, O., Herrera, F., Villar, P.: Generating the knowledge base of a fuzzy rule-based
system by the genetic learning of data base. IEEE Transactions on Fuzzy Systems 9(4),
667–674 (2001)
42. Crockett, K.A., Bandar, Z., Fowdar, J., O’Shea, J.: Genetic tuning of fuzzy inference
within fuzzy classifier systems. Expert Systems with Applications 23, 63–82 (2006)
43. Crockett, K., Bandar, Z., Mclean, D.: On the optimization of T-norm parameters within
fuzzy decision trees. In: IEEE International Conference on Fuzzy Systems (FUZZ-
IEEE 2007), London, UK, pp. 103–108 (2007)
44. Das, D.: Optimal placement of capacitors in radial distribution system using a Fuzzy-
GA method. International Journal of Electrical Power & Energy Systems (in press)
45. Deb, K.: Multi-objective optimization using evolutionary algorithms. John Wiley &
Sons, Chichester (2001)
46. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic
algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation 6(2), 182–197 (2002)
47. De Jong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learn-
ing. Machine Learning 13, 161–188 (1993)
48. del Jesus, M.J., González, P., Herrera, F., Mesonero, M.: Evolutionary fuzzy rule induc-
tion process for subgroup discovery: A case study in marketing. IEEE Transactions on
Fuzzy Systems 15(4), 578–592 (2007)
49. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006)
50. Dietterich, T.G.: Approximate statistical tests for comparing supervised classification
learning algorithms. Neural Computation 10, 1895–1924 (1998)
51. Dorigo, M., Stützle, T.: Ant Colony Optimization. The MIT Press, Cambridge (2004)
52. Dozier, G.V., McCullough, S., Homaifar, A., Moore, L.: Multiobjective evolutionary
path planning via fuzzy tournament selection. In: IEEE International Conference on
Evolutionary Computation (ICEC 1998), pp. 684–689. IEEE Press, Piscataway (1998)
53. Driankow, D., Hellendoorn, H., Reinfrank, M.: An introduction to fuzzy control.
Springer, Berlin (1993)
54. Dubois, D., Prade, H., Sudkamp, T.: On the representation, measurement, and discovery
of fuzzy associations. IEEE Trans. on Fuzzy Systems 13, 250–262 (2005)
55. Eiben, A.E., Hinterding, R., Michalewicz, Z.: Parameter control in evolutionary algo-
rithms. IEEE Trans. Evolutionary Computation 3(2), 124–141 (1999)
56. Eiben, A.E., Smith, J.E.: Introduction to evolutionary computation. Springer, Berlin
(2003)
57. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.
(eds.) Advances in Knowledge Discovery & Data Mining, pp. 1–34. AAAI/MIT (1996)
58. Fogel, D.B.: An introduction to simulated evolutionary optimization. IEEE Transac-
tions on Neural Networks 5(1), 3–14 (1994)
59. Freitas, A.A.: Data mining and knowledge discovery with evolutionary algorithms.
Springer, Berlin (2002)
60. Gacto, M.J., Alcalá, R., Herrera, F.: Adaptation and application of multi-objective evo-
lutionary algorithms for rule reduction and parameter tuning of fuzzy rule-based sys-
tems. Soft Computing 13(3), 419–436 (2009)
61. Geyer-Schulz, A.: Fuzzy rule-based expert systems and genetic machine learning.
Physica-Verlag, Berlin (1995)
62. Giordana, A., Neri, F.: Search-intensive concept induction. Evolutionary Computa-
tion 3, 375–416 (1995)
63. Goldberg, D.E.: Genetic algorithms in search, optimization, and machine learning.
Addison-Wesley, Reading (1989)
64. González, A., Pérez, R.: SLAVE: A genetic learning system based on an iterative approach. IEEE Transactions on Fuzzy Systems 7(2), 176–191 (1999)
65. González, A., Pérez, R.: Selection of relevant features in a fuzzy genetic learning algo-
rithm. IEEE Transactions on Systems, Man and Cybernetics. Part B: Cybernetics 31(3),
417–425 (2001)
66. González, A., Pérez, R.: An analysis of the scalability of an embedded feature selection
model for classification problems. In: Proc. Eleventh Int. Conf. on Information Process-
ing and Management of Uncertainty in Knowledge-based Systems (IPMU 2006), Paris,
pp. 1949–1956 (2006)
67. Greene, D.P., Smith, S.F.: Competition-based induction of decision models from examples. Machine Learning 13, 229–257 (1993)
68. Grefenstette, J.J.: Optimization of control parameters for genetic algorithms. IEEE
Trans Systems, Man, and Cybernetics 16, 122–128 (1986)
69. Gudwin, R.R., Gomide, F.A.C., Pedrycz, W.: Context adaptation in fuzzy processing
and genetic algorithms. International Journal of Intelligent Systems 13(10-11), 929–
948 (1998)
70. Hamzeh, A., Rahmani, A., Parsa, N.: Intelligent exploration method to adapt explo-
ration rate in XCS, based on adaptive fuzzy genetic algorithm. In: Proc. of the 2006
IEEE Conference on Cybernetics and Intelligent Systems, pp. 1–6 (2006)
71. Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future
directions. Data Mining & Knowledge Discovery 15(1), 55–86 (2007)
72. Herrera, F.: Genetic fuzzy systems: Status, critical considerations and future directions.
International Journal of Computational Intelligence Research 1(1), 59–67 (2005)
73. Herrera, F.: Genetic fuzzy systems: taxonomy, current research trends and prospects.
Evolutionary Intelligence 1, 27–46 (2008)
74. Herrera, F., Lozano, M.: Adaptation of genetic algorithm parameters based on fuzzy
logic controllers. In: Herrera, F., Verdegay, J.L. (eds.) Genetic Algorithms and Soft
Computing, pp. 95–125. Physica-Verlag (1996)
75. Herrera, F., Lozano, M.: Heuristic crossover for real-coded genetic algorithms based on
fuzzy connectives. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.)
PPSN 1996. LNCS, vol. 1141, pp. 336–345. Springer, Heidelberg (1996)
76. Herrera, F., Lozano, M.: Adaptive control of the mutation probability by fuzzy logic
controllers. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwe-
fel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 335–344. Springer, Heidel-
berg (2000)
77. Herrera, F., Lozano, M.: Adaptive genetic operators based on coevolution with fuzzy
behaviours. IEEE Trans. on Evolut. Comput. 5(2), 1–18 (2001)
78. Herrera, F., Lozano, M.: Fuzzy adaptive genetic algorithms: design, taxonomy, and fu-
ture directions. Soft Computing 7, 545–562 (2003)
79. Herrera, F., Lozano, M., Verdegay, J.L.: Tackling fuzzy genetic algorithms. In: Genetic
Algorithms in Engineering and Computer Science, pp. 167–189. John Wiley, New York
(1995)
80. Herrera, F., Lozano, M., Verdegay, J.L.: Tuning fuzzy-logic controllers by genetic al-
gorithms. International Journal of Approximate Reasoning 12(3-4), 299–315 (1995)
81. Herrera, F., Lozano, M., Verdegay, J.L.: Dynamic and heuristic fuzzy connectives-based
crossover operators for controlling the diversity and convergence of real-coded genetic
algorithms. Int. Journal of Intelligent Systems 11, 1013–1041 (1996)
82. Herrera, F., Lozano, M., Verdegay, J.L.: Fuzzy connectives based crossover operators
to model genetic algorithms population diversity. Fuzzy Sets and Systems 92(1), 21–30
(1997)
83. Herrera, F., Lozano, M., Verdegay, J.L.: A learning process for fuzzy control rules using
genetic algorithms. Fuzzy Sets and Systems 100, 143–151 (1998)
84. Herrera, F., Lozano, M., Sánchez, A.M.: A taxonomy for the crossover operator for real-
coded genetic algorithms: an experimental study. International Journal of Intelligent
Systems 18, 309–338 (2003)
85. Homaifar, A., McCormick, E.: Simultaneous design of membership functions and rule
sets for fuzzy controllers using genetic algorithms. IEEE Transactions on Fuzzy Sys-
tems 3(2), 129–139 (1995)
86. Hoffmann, F., Schauten, D., Hölemann, S.: Incremental evolutionary design of TSK
fuzzy controllers. IEEE Transactions on Fuzzy Systems 15(4), 563–577 (2007)
87. Holland, J.H.: Adaptation in natural and artificial systems. University of Michigan
Press, Ann Arbor (1975)
88. Holland, J.H., Reitman, J.S.: Cognitive systems based on adaptive algorithms. In:
Waterman, D.A., Hayes-Roth, F. (eds.) Pattern-Directed Inference Systems. Academic
Press, London (1978)
89. Hong, T.P., Chen, C.H., Wu, Y.L., et al.: A GA-based fuzzy mining approach to achieve
a trade-off between number of rules and suitability of membership functions. Soft Com-
puting 10(11), 1091–1101 (2006)
90. Hüllermeier, E.: Fuzzy methods in machine learning and data mining: Status and
prospects. Fuzzy Sets and Systems 156(3), 387–406 (2005)
91. Ishibuchi, H.: Multiobjective genetic fuzzy systems: review and future research direc-
tions. In: Proceedings of the 2007 IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE 2007), London, UK, pp. 913–918 (2007)
92. Ishibuchi, H., Murata, T., Turksen, I.B.: Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems. Fuzzy Sets and Systems 89(2), 135–150 (1997)
93. Ishibuchi, H., Nakashima, T., Murata, T.: Performance evaluation of fuzzy classifier
systems for multidimensional pattern classification problems. IEEE Transactions on
Systems, Man and Cybernetics. Part B-Cybernetics 29(5), 601–618 (1999)
94. Ishibuchi, H., Nakashima, T., Nii, M.: Classification and modeling with linguistic in-
formation granules: Advanced approaches to linguistic data mining. Springer, Berlin
(2004)
95. Ishibuchi, H., Nozaki, K., Yamamoto, N., Tanaka, H.: Selection fuzzy IF-THEN rules
for classification problems using genetic algorithms. IEEE Transactions on Fuzzy Sys-
tems 3(3), 260–270 (1995)
96. Ishibuchi, H., Yamamoto, T.: Fuzzy rule selection by multi-objective genetic local
search algorithms and rule evaluation measures in data mining. Fuzzy Sets and Sys-
tems 141(1), 59–88 (2004)
97. Juang, C.F., Lin, J.Y., Lin, C.T.: Genetic reinforcement learning through symbiotic evo-
lution for fuzzy controller design. IEEE Transactions on Systems, Man and Cybernet-
ics. Part B-Cybernetics 30(2), 290–302 (2000)
98. Kacem, I., Hammadi, S., Borne, P.: Pareto-optimality approach based on uniform design
and fuzzy evolutionary algorithms for flexible job-shop scheduling problems (FJSPs).
In: 2002 IEEE International Conference on Systems, Man and Cybernetics, p. 7 (2002)
99. Kang, Q., Wang, L., Wu, Q.: Research on fuzzy adaptive optimization strategy of par-
ticle swarm algorithm. International Journal of Information Technology 12(3), 65–77
(2006)
100. Karr, C.: Genetic algorithms for fuzzy controllers. AI Expert 6(2), 26–33 (1991)
101. Kaya, M.: Multi-objective genetic algorithm based approaches for mining optimized
fuzzy association rules. Soft Computing 10(7), 578–586 (2006)
102. Kaya, M., Alhajj, R.: Genetic algorithm based framework for mining fuzzy association
rules. Fuzzy Sets and Systems 152(3), 587–601 (2005)
103. Kennedy, J., Eberhart, R.C.: Swarm intelligence. Morgan Kauffmann, San Francisco
(2001)
104. Kiliç, S., Kahraman, C.: Metaheuristic techniques for job shop scheduling problem
and a fuzzy ant colony optimization algorithm. Studies in Fuzziness and Soft Comput-
ing 201, 401–425 (2006)
105. Kim, D., Choi, Y., Lee, S.: An accurate COG defuzzifier design using Lamarckian co-
adaptation of learning and evolution. Fuzzy Sets Syst. 130(2), 207–225 (2002)
106. King, R.T.F.A., Radha, B., Rughooputh, H.C.S.: A fuzzy logic controlled genetic al-
gorithm for optimal electrical distribution network reconfiguration. In: Proc. of 2004
IEEE International Conference on Networking, Sensing and Control, Taipei, Taiwan,
pp. 577–582 (2004)
107. Klir, G., Yuan, B.: Fuzzy sets and fuzzy logic; theory and applications. Prentice-Hall,
Englewood Cliffs (1995)
108. Klösgen, W.: EXPLORA: a multipattern and multistrategy discovery assistant. In:
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in
Knowledge Discovery and Data Mining, pp. 249–271. MIT Press, Cambridge (1996)
109. Konar, A.: Computational intelligence: principles, techniques and applications.
Springer, Berlin (2005)
110. Kovacs, T.: Strength or accuracy: credit assignment in learning classifier systems.
Springer, Berlin (2004)
111. Koza, J.R.: Genetic programming: on the programming of computers by means of natural
selection. The MIT Press, Cambridge (1992)
112. Krasnogor, N., Smith, J.E.: A tutorial for competent memetic algorithms: model, taxonomy, and design issues. IEEE Trans. Evol. Comput. 9(5), 474–488 (2005)
113. Kuncheva, L.: Fuzzy classifier design. Springer, Berlin (2000)
114. Osei-Bryson, K.-M.: Evaluation of decision trees: a multicriteria approach.
Computers and Operations Research 31, 1933–1945 (2004)
115. Last, M., Eyal, S.: A fuzzy-based lifetime extension of genetic algorithms. Fuzzy Sets
and Systems 149, 131–147 (2005)
116. Last, M., Eyal, S., Kandel, A.: Effective black-box testing with genetic algorithms. In:
Ur, S., Bin, E., Wolfsthal, Y. (eds.) HVC 2005. LNCS, vol. 3875, pp. 134–148. Springer,
Heidelberg (2006)
117. Lau, H.C.W., Chan, T.M., Tsui, W.T.: Fuzzy logic guided genetic algorithms for the
location assignment of items. In: 2007 IEEE Congress on Evolutionary Computation
(CEC 2007), pp. 4281–4288 (2007)
118. Lavrač, N., Cestnik, B., Gamberger, D., Flach, P.: Decision support through subgroup
discovery: three case studies and the lessons learned. Machine Learning 57, 115–143
(2004)
119. Lee, M.A., Esbensen, H.: Fuzzy/multiobjective genetic systems for intelligent systems
design tools and components. In: Fuzzy Evolutionary Computation, pp. 57–80. Kluwer
Academic Publishers, Dordrecht (1997)
120. Lee, M.A., Takagi, H.: Dynamic control of genetic algorithms using fuzzy logic tech-
niques. In: Proc of the Fifth Int Conf on Genetic Algorithms, pp. 76–83. Morgan Kauf-
mann, San Francisco (1993)
121. Lee, M.A., Takagi, H.: A framework for studying the effects of dynamic crossover,
mutation, and population sizing in genetic algorithms. In: Advances in Fuzzy Logic,
Neural Networks and Genetic Algorithms. LNCS, vol. 1011, pp. 111–126. Springer,
Heidelberg (1994)
122. Li, Q., Tong, X., Xie, S., Liu, G.: An improved adaptive algorithm for controlling the
probabilities of crossover and mutation based on a fuzzy control strategy. In: Proc. of
the 6th International Conference on Hybrid Intelligent Systems and 4th Conference
on Neuro-Computing and Evolving Intelligence, p. 50. IEEE Computer Society, Los
Alamitos (2006)
123. Li, Q., Yin, Y., Wang, Z., Liu, G.: Comparative studies of fuzzy genetic algorithms. In:
Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds.) ISNN 2007. LNCS, vol. 4492, pp.
251–256. Springer, Heidelberg (2007)
124. Liu, H., Abraham, A.: A fuzzy adaptive turbulent particle swarm optimization.
In: Proc. Fifth International Conference on Hybrid Intelligent Systems, pp. 445–450
(2005)
125. Liu, H., Abraham, A.: A hybrid fuzzy variable neighborhood particle swarm optimization algorithm for solving quadratic assignment problems. Journal of Universal Computer Science 13(9), 1309–1331 (2007)
126. Liu, H., Xu, Z., Abraham, A.: Hybrid fuzzy-genetic algorithm approach for crew group-
ing. In: Proceedings of the 2005, 5th International Conference on Intelligent Systems
Design and Applications (ISDA 2005), pp. 332–337 (2005)
127. Liu, J., Lampinen, J.: Adaptive parameter control of differential evolution. In: Proceed-
ings of the 8th International Mendel Conference on soft computing, pp. 19–26 (2002)
128. Liu, J., Lampinen, J.: A fuzzy adaptive differential evolution algorithm. In: Proceedings
of the 17th IEEE region 10th International Conference on computer, communications,
control and power engineering, pp. 606–611 (2002)
129. Liu, J., Lampinen, J.: A fuzzy adaptive differential evolution algorithm. Soft Comput. 9,
448–462 (2005)
130. Maeda, Y.: Fuzzy adaptive search method for genetic programming. International Jour-
nal of Advanced Computational Intelligence 3(2), 131–135 (1999)
131. Maeda, Y., Ishita, M., Li, Q.: Fuzzy adaptive search method for parallel genetic algo-
rithm with island combination process. International Journal of Approximate Reason-
ing 41, 59–73 (2006)
132. Maeda, Y., Li, Q.: Fuzzy adaptive search method for parallel genetic algorithm tuned by
evolution degree based on diversity measure. In: Melin, P., Castillo, O., Aguilar, L.T.,
Kacprzyk, J., Pedrycz, W. (eds.) IFSA 2007. LNCS (LNAI), vol. 4529, pp. 677–687.
Springer, Heidelberg (2007)
133. Magdalena, L.: Adapting the gain of an FLC with genetic algorithms. International
Journal of Approximate Reasoning 17(4), 327–349 (1997)
134. Mamdani, E.H.: Application of fuzzy algorithms for control of simple dynamic plant. Proceedings of the IEE 121(12), 1585–1588 (1974)
135. Márquez, F.A., Peregrín, A., Herrera, F.: Cooperative evolutionary learning of linguistic
fuzzy rules and parametric aggregation connectors for Mamdani fuzzy systems. IEEE
Trans. on Fuzzy Systems 15(6), 1162–1178 (2008)
136. Matousek, R., Osmera, P., Roupec, J.: GA with fuzzy inference system. In: Proceedings
of the 2000 Congress on Evolutionary Computation, pp. 646–651 (2000)
137. Meyer, L., Feng, X.: A fuzzy stop criterion for genetic algorithms using performance
estimation. In: Proc. Third IEEE Int. Conf. on Fuzzy Systems, pp. 1990–1995 (1994)
138. Mikut, R., Jäkel, J., Gröll, L.: Interpretability issues in data-based learning of fuzzy
systems. Fuzzy Sets and Systems 150, 179–197 (2005)
139. Moriarty, D.E., Miikkulainen, R.: Efficient reinforcement learning through symbiotic
evolution. Machine Learning 22, 11–32 (1996)
140. Mucientes, M., Vidal, J.C., Bugarín, A., Lama, M.: Processing time estimations by variable structure TSK rules learned through genetic programming. Soft Computing 13(3),
497–509 (2009)
141. Mühlenbein, H., Schlierkamp-Voosen, D.: Predictive models for the breeder genetic
algorithm I. continuous parameter optimization. Evolutionary Computation 1, 25–49
(1993)
142. Mirabedini, S.J., Teshnehlab, M.: Performance evaluation of fuzzy ant based routing
method for connectionless networks. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot,
P.M.A. (eds.) ICCS 2007. LNCS, vol. 4488, pp. 960–965. Springer, Heidelberg (2007)
143. Nojima, Y., Kuwajima, I., Ishibuchi, H.: Data set subdivision for parallel distribution
implementation of genetic fuzzy rule selection. In: IEEE International Conference on
Fuzzy Systems (FUZZ-IEEE 2007), London, UK, pp. 2006–2011 (2007)
144. Nojima, Y., Ishibuchi, H., Kuwajima, I.: Parallel distributed genetic fuzzy rule selection.
Soft Computing 13(3), 511–519 (2009)
145. Orriols-Puig, A., Casillas, J., Bernadó-Mansilla, E.: Fuzzy-UCS: preliminary results.
In: 10th International Workshop on Learning Classifier Systems (IWLCS 2007), Lon-
don, UK, pp. 2871–2874 (2007)
146. Palm, R., Driankov, D.: Model based fuzzy control. Springer, Berlin (1997)
147. Park, D., Kandel, A., Langholz, G.: Genetic-based new fuzzy-reasoning models with
applications to fuzzy control. IEEE Transactions on Systems, Man and Cybernet-
ics 24(1), 39–47 (1994)
148. Pedrycz, W. (ed.): Fuzzy modelling: paradigms and practice. Kluwer Academic Press,
Dordrecht (1996)
149. Pedrycz, W.: Fuzzy evolutionary computing. Soft Computing 2, 61–72 (1998)
150. Pham, D.T., Karaboga, D.: Optimum design of fuzzy logic controllers using genetic
algorithms. Journal of Systems Engineering 1, 114–118 (1991)
151. Price, K.V., Storn, R.M., Lampinen, J.A.: Differential evolution: a practical approach
to global optimization. Springer, Heidelberg (2005)
152. Rachmawati, L., Srinivasan, D.: A hybrid fuzzy evolutionary algorithm for a multi-
objective resource allocation problem. In: Proc. of the Fifth International Conference
on Hybrid Intelligent Systems (HIS 2005), pp. 55–60 (2005)
153. Regattieri-Delgado, M., Yassue-Nagai, E., Ramos de Arruda, L.V.: A neuro-
coevolutionary GFS to build soft sensors. Soft Computing 13(3), 481–495 (2009)
154. Reynolds, R.G.: An introduction to cultural algorithms. In: Proc. of the 3rd Annual
Conference on Evolutionary Programming, pp. 131–139. World Scientific, Singapore
(1994)
155. Reynolds, R.G., Chung, C.J.: Regulating the amount of information used for self-
adaptation in cultural algorithms. In: Proc. of the Seventh Int. Conf. on Genetic Al-
gorithms, pp. 401–408. Morgan Kaufmann Publishers, San Francisco (1997)
156. Richter, J.N.: Fuzzy evolutionary cellular automata. M.Sc. thesis, Utah State University, Logan, Utah (2003)
157. Rojas, R.: Neural networks: a systematic introduction. Springer, Berlin (1996)
158. Ronald, E.: When selection meets seduction. In: Proc of the Fifth Int Conf on Genetic
Algorithms, pp. 167–173. Morgan Kaufmann, San Francisco (1993)
159. Sahoo, N.C., Ranjan, R., Prasad, K., Chaturvedi, A.: A fuzzy-tuned genetic algorithm
for optimal reconfigurations of radial distribution network. European Trans. Electr.
Power 17, 97–111 (2006)
160. Sahoo, N.C., Prasad, K.: A fuzzy genetic approach for network reconfiguration to en-
hance voltage stability in radial distribution systems. Energy Conversion and Manage-
ment 47, 3288–3306 (2006)
161. Sánchez, L., Casillas, J., Cordón, O., del Jesus, M.J.: Some relationships between fuzzy
and random classifiers and models. International Journal of Approximate Reasoning 29,
175–213 (2001)
162. Sánchez, L., Couso, I.: Advocating the use of imprecisely observed data in genetic
fuzzy systems. IEEE Transactions on Fuzzy Systems 15(4), 551–562 (2007)
163. Sánchez, L., Otero, J., Couso, I.: Obtaining Linguistic Fuzzy Rule-based Regression
Models from Imprecise Data with Multiobjective Genetic Algorithms. Soft Comput-
ing 13(3), 467–479 (2009)
164. Sebban, M., Nock, R., Cahuchat, J.H., Rakotomalala, R.: Impact of learning set quality
and size on decision tree performance. Int. J. of Computers, Syst. and Signals 1, 85–105
(2000)
165. Setnes, M., Roubos, H.: GA-fuzzy modeling and classification: complexity and perfor-
mance. IEEE Transactions on Fuzzy Systems 8(5), 509–522 (2000)
166. Setzkorn, C., Paton, R.C.: On the use of multi-objective evolutionary algorithms for the
induction of fuzzy classification rule systems. BioSystems 81, 101–112 (2005)
167. Sharma, S.K., Irwin, G.W.: Fuzzy coding of genetic algorithms. IEEE Trans. on Evolu-
tionary Computation 7(4), 344–355 (2003)
168. Shi, Y., Eberhart, R., Chen, Y.: Implementation of evolutionary fuzzy systems. IEEE
Trans Fuzzy Systems 7(2), 109–119 (1999)
169. Gulshan, S., Kalyanmoy, D.: Comparison of multi-modal optimization algorithms
based on evolutionary algorithms. In: Proc. of the 8th annual conference on Genetic
and evolutionary computation, pp. 1305–1312 (2006)
170. Smith, S.: A learning system based on genetic algorithms. Ph.D. thesis. Unversity of
Pittsburgh (1980)
171. Smith, J.E.: Coevolving memetic algorithms: a review and progress report. IEEE Trans-
action on Systems, Man, and Cybernetics Part B: Cybernetics 37(1), 6–17 (2007)
172. Song, Y.H., Wang, G.S., Johns, A.T., Wang, P.Y.: Improved genetic algorithms with
fuzzy logic controlled crossover and mutation. In: UKACC International Conference
on CONTROL 1996, pp. 140–144 (1996)
173. Song, Y.H., Wang, G.S., Wang, P.T., Johns, A.T.: Environmental/economic dispatch us-
ing fuzzy logic controlled genetic algorithms. IEEE Proc. on Generation, Transmission
and Distribution 144(4), 377–382 (1997)
174. Streifel, R.J., Marks II, R.J., Reed, R., Choi, J.J., Healy, M.: Dynamic fuzzy control of
genetic algorithm parameter coding. IEEE Trans Systems, Man, and Cybernetics - Part
B: Cybernetics 29(3), 426–433 (1999)
175. Subbu, R., Sanderson, A.C., Bonissone, P.: Fuzzy logic controlled genetic algo-
rithms versus tuned genetic algorithms: An agile manufacturing application. In: Proc.
ISIC/CIRA/ISAS Conf., pp. 434–440 (1998)
176. Subbu, R., Bonissone, P.: A retrospective view of fuzzy control of evolutionary algo-
rithm resources. In: Proc. IEEE Int. Conf. Fuzzy Syst., pp. 143–148 (2003)
177. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its application to mod-
elling and control. IEEE Transactions on Systems, Man and Cybernetics 15(1), 116–
132 (1985)
178. Talbi, E.-G.: A taxonomy of hybrid metaheuristics. J. Heuristics 8(5), 541–565 (2002)
Fuzzy Evolutionary Algorithms and Genetic Fuzzy Systems 129
179. Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pearson, Boston
(2006)
180. Thrift, P.: Fuzzy logic synthesis with genetic algorithms. In: Proc. of 4th International
Conference on Genetic Algorithms (ICGA 1991), pp. 509–513 (1991)
181. Teodorovic, D., Lucic, P.: Schedule synchronization in public transit using the fuzzy
ant system. Transportation Planning and Technology 28(1), 47–76 (2005)
182. Tettamanzi, A.G.: Evolutionary algorithms and fuzzy logic: a two-way integration. In:
2nd Joint Conference on Information Sciences, pp. 464–467 (1995)
183. Tettamanzi, A., Tomassini, M.: Fuzzy evolutionary algorithms. In: Soft Computing: In-
tegrating Evolutionary, Neural, and Fuzzy Systems, pp. 233–248. Springer, Heidelberg
(2001)
184. Tsang, C.-H., Tsai, J.H., Wang, H.: Genetic-fuzzy rule mining approach and evalua-
tion of feature selection techniques for anomaly intrusion detection. Pattern Recogni-
tion 40(9), 2373–2391 (2007)
185. Tuson, A.L., Ross, P.: Adapting operator settings in genetic algorithms. Evolut. Com-
put. 6(2), 161–184 (1998)
186. Valenzuela-Rendon, M.: The fuzzy classifier system: A classifier system for contin-
uously varying variables. In: Proc. of 4th International Conference on Genetic Algo-
rithms (ICGA 1991), pp. 346–353 (1991)
187. Valenzuela-Rendon, M.: Reinforcement learning in the fuzzy classifier system. Expert
Systems with Applications 14, 237–247 (1998)
188. Venturini, G.: SIA: a supervised inductive algorithm with genetic search for learning
attribute based concepts. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 280–
296. Springer, Heidelberg (1993)
189. Voget, S.: Multiobjective optimization with genetic algorithms and fuzzy-control. In:
Proc of the Fourth European Congress on Intelligent Techniques and Soft Computing,
pp. 391–394 (1996)
190. Voget, S., Kolonko, M.: Multidimensional optimization with a fuzzy genetic algorithm.
Journal of Heuristic 4(3), 221–244 (1998)
191. Voigt, H.M.: Fuzzy evolutionary algorithms. Technical Report tr-92-038, International
Computer Science Institute (ICSI), Berkeley (1992)
192. Voigt, H.M.: Soft genetic operators in evolutionary algorithms. In: Banzhaf, W., Eck-
man, F.H. (eds.) Evolution as a Computational Process 1992. LNCS, vol. 899, pp. 123–
141. Springer, Heidelberg (1995)
193. Voigt, H.M., Anheyer, T.: Modal mutations in evolutionary algorithms. In: Proceeding
of the First IEEE International Conference on Evolutionary Computation, pp. 88–92.
IEEE Press, Los Alamitos (1994)
194. Voigt, H.M., Born, J., Santibáñez-Koref, I.: A multivalued evolutionary algorithms.
Technical Report tr-93-022, International Computer Science Institute (ICSI), Berkeley
(1993)
195. Voigt, H.M., Mühlenbein, H., Cvetković, D.: Fuzzy recombination for the breeder ge-
netic algorithm. In: Proc. of the Sixth Int. Conf. on Genetic Algorithms, pp. 104–111.
Morgan Kaufmann Publishers, San Francisco (1995)
196. Wang, K.: A new fuzzy genetic algorithms based on population diversity. In: Proc. of
2001 IEEE International Symposium on Computational Intelligence in Robotics and
Automation, pp. 108–112 (2001)
197. Wang, H., Kwong, S., Jin, Y., Wei, W., Man, K.F.: Multiobjective hierarchical genetic
algorithm for interpretable fuzzy rule-based knowledge extraction. Fuzzy Sets and Sys-
tems 149, 49–186 (2005)
130 F. Herrera and M. Lozano
198. Wang, P.Y., Wang, G.S., Song, Y.H., Johns, A.T.: Fuzzy logic controlled genetic algo-
rithms. In: Proceedings of the Fifth IEEE International Conference on Fuzzy Systems,
pp. 972–979 (1996)
199. Wilson, S.: Classifier fitness based on accuracy. Evol. Comput. 3(2), 149–175 (1995)
200. Wong, M.L., Leung, K.S.: Data mining using grammar based genetic programming and
applications. Kluwer Academic Publishers, Dordrecht (2000)
201. Wrobel, S.: An algorithm for multi-relational discovery of subgroups. In: Komorowski,
J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 78–87. Springer, Heidelberg
(1997)
202. Xu, H.Y., Vukovich, G.: A fuzzy genetic algorithm with effective search and optimiza-
tion. In: Proc. of 1993 International Joint Conference on Neural Networks, pp. 2967–
2970 (1993)
203. Xu, H.Y., Vukovich, G., Ichikawa, Y., Ishii, Y.: Fuzzy evolutionary algorithms and au-
tomatic robot trajectory generation. In: Proceeding of the First IEEE International Con-
ference on Evolutionary Computation, pp. 595–600. IEEE Press, Los Alamitos (1994)
204. Yager, R.R., Filev, D.P.: Essentials of fuzzy modeling and control. John Wiley & Sons,
Chichester (1994)
205. Yamakawa, T.: Stabilization of an inverted pendulum by a high-speed fuzzy logic con-
troller hardware system. Fuzzy Sets and Systems 32, 161–180 (1989)
206. Yang, Q., Wu, X.: 10 challenging problems in data mining research. International Jour-
nal of Information Technology & Decision Making 5(4), 597–604 (2006)
207. Yun, Y., Gen, M.: Performance analysis of adaptive genetic algorithm with fuzzy logic
and heuristics. Fuzzy Optimization and Decision Making 2, 161–175 (2003)
208. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
209. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength pareto evolution-
ary algorithm for multiobjective optimization. In: Proc. Evolutionary Methods for De-
sign, Optimization and Control with Applications to Industrial Problems (EUROGEN
2001), Barcelona, Spain, pp. 95–100 (2001)
210. Zeng, X., Rabenasolo, B.: A fuzzy logic based design for adaptive genetic algorithms.
In: Proc. of the European Congress on Intelligent Techniques and Soft Computing, pp.
660–664 (1997)
211. Zhu, L., Zhang, H., Jing, Y.: A new neuro-fuzzy adaptive genetic algorithm. Journal of
Electronic Science and Technology of China 1(1) (2003)
Multiobjective Genetic Fuzzy Systems
Abstract. In the design of fuzzy rule-based systems, we have two conflicting goals:
One is accuracy maximization, and the other is complexity minimization (i.e., in-
terpretability maximization). There exists a tradeoff relation between these two
goals. That is, we cannot simultaneously achieve accuracy maximization and com-
plexity minimization. Various approaches have been proposed to find accurate and
interpretable fuzzy rule-based systems. In some approaches, these two goals are
integrated into a single objective function which can be optimized by standard
single-objective optimization techniques. In other approaches, accuracy maximiza-
tion and complexity minimization are handled as different objectives in the frame-
work of multiobjective optimization. Recently, multiobjective genetic algorithms
have been used to search for a large number of non-dominated fuzzy rule-based sys-
tems along the accuracy-complexity tradeoff surface in some studies. These studies
are often referred to as multiobjective genetic fuzzy systems. In this chapter, we
first briefly explain the concept of accuracy-complexity tradeoff in the design of
fuzzy rule-based systems. Next we explain various studies in multiobjective genetic
fuzzy systems. Two basic ideas are explained in detail through computational ex-
periments. Then we review a wide range of studies related to multiobjective genetic
fuzzy systems. Finally we point out future research directions.
1 Introduction
One advantage of fuzzy rule-based systems over other nonlinear systems such as
neural networks is their linguistic interpretability. That is, each fuzzy rule is lin-
guistically interpretable when fuzzy rule-based systems are designed from linguis-
tic knowledge of human experts. Linguistic knowledge, however, is not always
Hisao Ishibuchi and Yusuke Nojima
Department of Computer Science and Intelligent Systems, Graduate School of Engineering,
Osaka Prefecture University
1-1 Gakuen-cho, Naka-ku, Sakai, Osaka 599-8531, Japan
e-mail: [email protected],[email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 131–173.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
available. Thus various approaches have been proposed for extracting fuzzy rules
from numerical data in the literature (e.g., Takagi and Sugeno [136], and Wang and
Mendel [142]). Learning techniques of neural networks (e.g., the back-propagation
algorithm [131]) have been applied to fuzzy rule-based systems for their param-
eter tuning [3, 66, 92, 115]. Fuzzy rule-based systems with learning capability
are often called neuro-fuzzy systems [115]. ANFIS of Jang [92] is the most well-
known and frequently-used neuro-fuzzy system. Whereas only continuous parame-
ters are adjusted in neuro-fuzzy systems, evolutionary optimization techniques can
be used not only for parameter tuning but also for discrete optimization such as
input selection, rule generation and rule selection. Genetic algorithms [53] have
been frequently used for the design of fuzzy rule-based systems from numerical
data under the name of genetic fuzzy systems [36, 37, 63] since the early 1990s
[46, 101, 102, 123, 137, 139].
Fuzzy rule-based systems are universal approximators of nonlinear functions
[106, 111, 143] in a similar way to neural networks [50, 67, 68]. Theoretically we
can improve their approximation accuracy on training data to an arbitrarily spec-
ified level by increasing their complexity. Neuro-fuzzy systems and genetic fuzzy
systems have been successfully used for such an accuracy improvement task. Accuracy improvement can be viewed as the following optimization problem:

Maximize Accuracy(S).

[Figure: the tradeoff relation between accuracy maximization (i.e., error minimization) and complexity minimization in the design of fuzzy rule-based systems. Ellipsoids show non-dominated fuzzy rule-based systems along the accuracy-complexity tradeoff curve in the error-complexity plane, from a simple, interpretable fuzzy system to a complicated, accurate one; the ideal fuzzy system has small error and low complexity.]
[Figure: training data accuracy plotted against complexity, with the optimal complexity S*.]
Since the late 1990s, the importance of interpretability maintenance in the design
of fuzzy rule-based systems has been pointed out by many studies [4, 9, 22, 25,
69, 95, 100, 116, 130, 132, 133, 145]. In other words, complexity minimization as
well as accuracy maximization was taken into account in order to design accurate
and interpretable fuzzy rule-based systems. Whereas accuracy maximization and
complexity minimization were simultaneously considered, the design of fuzzy rule-
based systems was handled in the framework of single-objective optimization in
those studies. That is, the two goals were integrated into a single objective function.
On the other hand, a large number of non-dominated fuzzy rule-based systems are obtained
in multiobjective approaches by solving the following multiobjective problem:

Maximize Accuracy(S) and minimize Complexity(S). (6)
For example, a two-objective fuzzy rule selection method was proposed in [76] to
search for non-dominated fuzzy classifiers with respect to the maximization of the
number of correctly classified training patterns and the minimization of the number
of selected fuzzy rules.
In [79], not only the number of fuzzy rules but also the total number of antecedent
conditions (i.e., the total rule length) was minimized. In this case, the multiobjective
problem in (6) can be rewritten as follows:
Maximize Accuracy(S), and minimize Complexity1(S) and Complexity2(S), (7)

where Complexity1(S) and Complexity2(S) are different complexity measures.
The basic idea of multiobjective approaches is to search for a large number of
non-dominated fuzzy rule-based systems with different tradeoffs between accuracy
maximization and complexity minimization. This idea is illustrated in Figure 4
where multiple arrows show search directions for finding various non-dominated
fuzzy rule-based systems with respect to error minimization and complexity mini-
mization. Simple and inaccurate fuzzy rule-based systems are located in the upper
left part of this figure while complicated and accurate ones are in the lower right
part. There exist a large number of non-dominated fuzzy rule-based systems along
the accuracy-complexity tradeoff curve. Multiobjective approaches try to find as many of those non-dominated fuzzy rule-based systems as possible, as shown in
Figure 5. Evolutionary multiobjective optimization (EMO) algorithms [32, 34, 41]
are used for this task. In multiobjective approaches, it is assumed that a single non-
dominated fuzzy rule-based system is chosen from a large number of obtained ones
by human users based on their preference with respect to accuracy and complexity.
Some users may choose a fuzzy rule-based system with the highest test data accu-
racy (i.e., with the optimal complexity S∗ in terms of the generalization ability in
Figure 5). Other users may choose simpler fuzzy rule-based systems with higher
interpretability than S∗ .
[Fig. 4: Illustration of search directions for finding a variety of non-dominated fuzzy rule-based systems along the accuracy-complexity tradeoff curve in the error-complexity plane.]
[Fig. 5: Training data accuracy and test data accuracy plotted against complexity; the test data accuracy is highest at the optimal complexity S*.]
A k-objective maximization problem can be written as

Maximize f(y) = (f1(y), f2(y), ..., fk(y)), (8)
subject to y ∈ Y, (9)

where f(y) is the objective vector, fi(y) is the ith objective to be maximized, y is
the decision vector, and Y is the feasible region in the decision space. Whereas the
decision vector is usually denoted by x in various fields related to optimization, we
use y in this section. This is because x is used to denote a pattern vector in the next
section.
Let y and z be two feasible solutions of the k-objective maximization problem in
(8)-(9). If the following conditions hold, z can be viewed as being better than y:

∀i, fi(y) ≤ fi(z) and ∃j, fj(y) < fj(z). (10)
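Condition (10) maps directly onto code. A minimal sketch in Python (the function name `dominates` is my own; objective vectors are plain tuples to be maximized):

```python
def dominates(z, y):
    """Return True if z dominates y in a maximization problem:
    z is no worse than y in every objective and strictly better
    in at least one, i.e. condition (10)."""
    no_worse = all(fy <= fz for fy, fz in zip(y, z))
    strictly_better = any(fy < fz for fy, fz in zip(y, z))
    return no_worse and strictly_better

print(dominates((3, 4), (2, 4)))  # True
print(dominates((1, 5), (5, 1)))  # False: the two are non-dominated
```

Two solutions for which `dominates` is False in both directions are non-dominated with respect to each other; this relation is the basis of Pareto dominance-based fitness assignment.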
[Fig. 6: Non-dominated sorting in a two-objective maximization space; the population is partitioned into fronts (Rank 1, Rank 2, Rank 3, ...).]
Solutions with the same rank are compared with each other using a secondary
criterion called a crowding distance. In the calculation of the crowding distance for
a solution, all solutions with the same rank are projected to each objective. Then
the distance between its two adjacent solutions in the projected single-dimensional
space is calculated for each objective. The crowding distance is the sum of the calcu-
lated distances over all objectives. When a solution has the maximum or minimum
value of at least one objective among solutions with the same rank, an infinitely large
value is assigned to that solution as its crowding distance because it has only a sin-
gle adjacent solution for at least one projected single-dimensional objective space.
Figure 7 illustrates the calculation of the crowding distance for the case of two ob-
jectives. In Figure 7, an infinitely large value is assigned to two extreme solutions.
The crowding distance of solution C is calculated as a + b which is the Manhattan
distance between its two adjacent solutions A and B.
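The calculation described above can be sketched as follows (a minimal sketch; following the text, the per-objective distances are summed without the range normalization used in some NSGA-II implementations):

```python
import math

def crowding_distances(front):
    """Crowding distance of each solution in one non-dominated front
    (a list of objective vectors with the same rank). For each objective
    the solutions are projected and sorted; the two extreme solutions
    get an infinitely large value, interior ones accumulate the distance
    between their two adjacent solutions."""
    n = len(front)
    dist = [0.0] * n
    for obj in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][obj])
        dist[order[0]] = dist[order[-1]] = math.inf  # extreme solutions
        for pos in range(1, n - 1):
            prev_v = front[order[pos - 1]][obj]
            next_v = front[order[pos + 1]][obj]
            dist[order[pos]] += next_v - prev_v
    return dist

d = crowding_distances([(1, 5), (2, 3), (4, 1)])
print(d)  # [inf, 7.0, inf]
```

For the three points above, the middle solution's distance is the Manhattan distance between its two neighbours, just as for solution C in Figure 7.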
In Step 5, each solution in the merged population (i.e., the current and offspring
populations) is evaluated for generation update in the same manner as parent selection in Step 3 (i.e., by the non-dominated sorting and the crowding distance in the
merged population). The best Npop solutions are chosen from the merged population
with 2 · Npop solutions to form the next population.
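Assuming the rank and crowding distance of every solution in the merged population have already been computed, Step 5 reduces to a sort (a sketch; the function and variable names are my own):

```python
import math

def environmental_selection(merged, ranks, crowding, n_pop):
    """Elitist generation update of NSGA-II: from the merged current and
    offspring populations (2 * Npop solutions), keep the best Npop.
    A lower rank is better; within the same rank, a larger crowding
    distance is better."""
    order = sorted(range(len(merged)),
                   key=lambda i: (ranks[i], -crowding[i]))
    return [merged[i] for i in order[:n_pop]]

merged = ["a", "b", "c", "d"]
ranks = [1, 2, 1, 2]
crowding = [math.inf, 0.5, 0.2, math.inf]
print(environmental_selection(merged, ranks, crowding, 2))  # ['a', 'c']
```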
In NSGA-II, Pareto dominance-based fitness assignment is realized through the
non-dominated sorting in Figure 6. Better ranks are assigned to better solutions with
respect to Pareto dominance relation. This fitness assignment scheme generates the
selection pressure toward the Pareto front. On the other hand, the crowding dis-
tance in Figure 7 is used as the secondary criterion to differentiate solutions with
the same rank. The secondary criterion works as a diversity maintenance mecha-
nism in NSGA-II in order to evenly distribute solutions over the entire Pareto front.
Roughly speaking, the crowding distance-based secondary criterion widens the pop-
ulation along the Pareto front while the non-dominated sorting-based primary crite-
rion pushes it toward the Pareto front.
On the other hand, elitism is implemented in the framework of the (μ+λ)-ES
generation update scheme in NSGA-II, where μ = λ. That is, the best μ solutions
are chosen from the (μ+λ) solutions in the current and offspring populations in order
to form the next population. It is widely recognized that elitist EMO algorithms
outperform non-elitist EMO algorithms [147]. Elitism usually has positive effects
on both the convergence of solutions toward the Pareto front and the diversity along
the Pareto front in EMO algorithms.
In Figure 8, we illustrate multiobjective evolution by NSGA-II on a two-objective
100-item knapsack problem. Figure 8 shows an initial population, an intermediate
population at the 20th generation, and the final population at the 2000th generation.
We can see from Figure 8 that the population moves toward the Pareto front while
increasing the diversity of solutions. It should be noted that a large number of non-
dominated solutions can be obtained by a single run of NSGA-II. This is the main
advantage of EMO algorithms over other techniques for multiobjective optimization.
Recently multiobjective approaches have been used for the design of fuzzy rule-based systems in many studies. Like single-objective genetic algorithms (SOGAs) in single-objective genetic fuzzy systems [36, 37, 63], EMO algorithms can optimize various aspects of fuzzy rule-based
systems (e.g., input selection, rule generation, rule selection, determination of the
number of fuzzy sets for each variable, optimization of the shape of each fuzzy set,
etc.) in multiobjective genetic fuzzy systems. Various criteria for evaluating fuzzy
rule-based systems can be also used as objectives in multiobjective genetic fuzzy
systems.
For example, the number of fuzzy rules and the total number of antecedent con-
ditions were used together with an accuracy measure for multiobjective design of
fuzzy rule-based classifiers in [124, 134, 138] as in [79, 81, 85]. That is, EMO algo-
rithms were applied to three-objective problems where f2 (S) and f3 (S) were used as
complexity measures. These two complexity measures were often used to avoid the
overfitting to training data in multiobjective design of fuzzy rule-based classifiers.
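For illustration, these two complexity measures can be computed directly from a rule set (a sketch; the tuple representation of a rule, with None marking a don't care antecedent, is my own assumption):

```python
def complexity_measures(rule_set):
    """f2(S): the number of fuzzy rules in S.
    f3(S): the total number of antecedent conditions (total rule
    length), counting only antecedents that are not "don't care"."""
    f2 = len(rule_set)
    f3 = sum(sum(1 for a in antecedents if a is not None)
             for antecedents, _cls, _weight in rule_set)
    return f2, f3

# Two rules: one with two conditions, one with a single condition.
rules = [((3, None, 7), 1, 0.8), ((None, None, 2), 2, 0.6)]
print(complexity_measures(rules))  # (2, 3)
```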
Multiobjective approaches have been used not only for classification problems
but also for function approximation problems. The above-mentioned two complex-
ity measures were used for classification problems (together with the average mis-
classification rate) and function approximation problems (together with the mean
squared error) in Wang et al. [140]. The total number of fuzzy sets instead of the
total number of antecedent conditions was used together with an accuracy measure
and the number of fuzzy rules in a three-objective formulation for function approximation problems in Xing et al. [144]. On the other hand, Alcalá et al. [10] and
González et al. [57] used a two-objective formulation with the mean squared error
and the number of fuzzy rules for function approximation problems. Jiménez et al.
[94] and Gómez-Skarmeta [54] used a different two-objective formulation for function approximation problems. In their two-objective formulation, one objective is
the mean squared error and the other objective is an interpretability measure defined
by the similarity between adjacent fuzzy sets.
In some studies, more than three objectives were used for multiobjective design
of fuzzy rule-based systems. For example, the following five objectives were used in
Wang et al. [141]: accuracy, completeness and distinguishability, non-redundancy,
the number of fuzzy rules, and the total number of fuzzy sets. It should be noted that
“completeness and distinguishability” was used as a single objective (for details, see
Wang et al. [141]).
In all the above-mentioned studies [10, 54, 57, 75, 76, 79, 81, 85, 89, 94, 124, 134,
138, 140, 141, 144] on multiobjective genetic fuzzy systems for classification and
function approximation problems, multiobjective problems were formulated using
both accuracy and complexity measures. That is, EMO algorithms in these studies
were used to search for a number of non-dominated fuzzy rule-based systems with
different accuracy-complexity tradeoffs. On the other hand, in multiobjective ge-
netic fuzzy systems for control problems, multiple performance measures together
with no complexity measures were often used. For example, Stewart et al. [135]
handled the design of fuzzy logic controllers as a three-objective problem where
three objectives were the current tracking error, the velocity tracking error, and the
power consumption. Chen and Chiang [29] formulated a three-objective problem
using the number of collisions, the distance between the target and lead points of
the new path, and the number of explored actions. Kim and Roschke [105] used
the following two performance measures: the structural acceleration and the base
displacement level.
Whereas both complexity and accuracy have been taken into account in many
studies on multiobjective genetic fuzzy systems for classification and function ap-
proximation problems, only accuracy measures were used for fuzzy control in
[29, 105, 135]. This is partially because multiple performance measures are usually
involved in controllers while the performance of classifiers and function approximators can often be evaluated by a single accuracy measure. Another possible reason is
that the overfitting to training data is not so critical in control problems compared
with classification and function approximation problems. Moreover, the number of
input variables is usually much larger in classification and function approximation
problems than in control problems. This is also a possible reason why complexity
minimization (including input selection) has been more widely used in classifica-
tion and function approximation problems than in control problems. Of course, it
is possible to use multiple accuracy measures for classification problems with no
complexity measures (e.g., the false negative rate and the false positive rate). It is
also possible to use complexity measures for control problems together with ac-
curacy measures. In general, multiple accuracy measures and multiple complexity
measures are involved in classification, function approximation and control prob-
lems. The choice of an appropriate set of objective functions is an important future
research issue.
The overview of multiobjective genetic fuzzy systems in this subsection is far
from complete. Our overview is limited to journal papers. For a more complete list
of references including conference papers, see the Evolutionary Multiobjective
Optimization of Fuzzy Rule-Based Systems Bibliography Page
(http://www2.ing.unipi.it/~o613499/emofrbss.html).
We assume that all attribute values have been normalized into real numbers in the
unit interval [0, 1]. Thus the pattern space of our classification problem is the
n-dimensional unit hypercube [0, 1]^n.
For our n-dimensional pattern classification problem, we use fuzzy rules of the
following type:
Rule Rq: If x1 is Aq1 and ... and xn is Aqn then Class Cq with CFq, (11)

where Rq is the label of the qth fuzzy rule, x = (x1, ..., xn) is an n-dimensional
pattern vector, Aqi is an antecedent fuzzy set (i = 1, 2, ..., n), Cq is a class label, and
CFq is a rule weight. We denote the antecedent fuzzy sets of Rq as a fuzzy vector
Aq = (Aq1, Aq2, ..., Aqn).
We use 14 fuzzy sets in four fuzzy partitions with different granularities in Figure
9. In addition to those 14 fuzzy sets, we also use the domain interval [0, 1] itself as
an antecedent fuzzy set in order to represent a don’t care condition. Whereas we use
the prespecified membership functions with no further adjustment in this chapter,
we can include a learning mechanism of the membership functions in multiobjective
genetic fuzzy rule selection. The number of antecedent fuzzy sets for each attribute
(i.e., granularity of each attribute) can be also handled as a decision variable in
multiobjective genetic fuzzy rule selection.
[Fig. 9: Four uniform fuzzy partitions of the unit interval [0, 1] with two, three, four, and five triangular membership functions (14 antecedent fuzzy sets in total, labeled 1-9 and a-e, with linguistic labels such as S2, L2, S3, M3, L3).]
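A sketch of such uniform triangular partitions (my own construction of standard symmetric triangular membership functions; the chapter does not give explicit formulas here):

```python
def triangular_partition(k):
    """k symmetric triangular membership functions uniformly partitioning
    [0, 1]; Figure 9 uses k = 2, 3, 4, 5 (14 fuzzy sets in total)."""
    def make_mf(center, width):
        def mf(x):
            return max(0.0, 1.0 - abs(x - center) / width)
        return mf
    width = 1.0 / (k - 1)
    return [make_mf(i * width, width) for i in range(k)]

def dont_care(x):
    """The whole domain [0, 1] as an antecedent: a don't care condition."""
    return 1.0

mfs = triangular_partition(3)   # e.g. the S3, M3, L3 partition
print(mfs[0](0.25), mfs[1](0.25))  # 0.5 0.5
```

Note that at every point of [0, 1] the memberships of a partition sum to one, which is what makes the partitions in Figure 9 linguistically interpretable.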
Let S be a set of fuzzy rules of the form in (11). When an input pattern x p is
to be classified by S, first we calculate the compatibility grade of x p with the an-
tecedent part Aq = (Aq1 , Aq2 , ..., Aqn ) of each fuzzy rule Rq in S using the product
operation as
μAq(xp) = μAq1(xp1) · ... · μAqn(xpn), (12)
where μAqi (·) is the membership function of the antecedent fuzzy set Aqi . Then a
single winner rule Rw is identified using the compatibility grade and the rule weight
of each fuzzy rule as

μAw(xp) · CFw = max{ μAq(xp) · CFq | Rq ∈ S }. (13)
The input pattern x p is classified as the consequent class Cw of the winner rule
Rw . When multiple fuzzy rules with different consequent classes have the same
maximum value in (13), the classification of x p is rejected. If there is no compatible
fuzzy rule with x p , its classification is also rejected.
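The classification scheme in (12) and (13), including both rejection cases, can be sketched as follows (the rule representation as a tuple of membership functions, class label and rule weight is my own assumption):

```python
def classify(rules, x):
    """Single-winner fuzzy classification, eqs. (12)-(13).
    Each rule is (membership_functions, class_label, rule_weight).
    Returns the winner's class, or None when classification is rejected
    (a tie between different classes, or no compatible rule)."""
    best, winner_classes = 0.0, []
    for mfs, cls, cf in rules:
        grade = 1.0                        # product compatibility, eq. (12)
        for mf, xi in zip(mfs, x):
            grade *= mf(xi)
        score = grade * cf                 # mu_Aq(xp) * CFq, eq. (13)
        if score > best:
            best, winner_classes = score, [cls]
        elif score == best and score > 0.0:
            winner_classes.append(cls)
    if best == 0.0 or len(set(winner_classes)) > 1:
        return None
    return winner_classes[0]

r1 = ([lambda x: x], 1, 0.9)       # hypothetical one-input rules
r2 = ([lambda x: 1 - x], 2, 0.9)
print(classify([r1, r2], (0.8,)))  # 1
print(classify([r1, r2], (0.5,)))  # None (tie between classes 1 and 2)
```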
For a fuzzy rule with the antecedent part Aq, the fuzzy confidence of each class is
calculated from the given m training patterns as

c(Aq ⇒ Class h) = ( Σ_{xp ∈ Class h} μAq(xp) ) / ( Σ_{p=1}^{m} μAq(xp) ), h = 1, 2, ..., M. (14)
It should be noted that “Aq ⇒ Class h” means the fuzzy rule with the antecedent
part Aq and the consequent class h. Then the consequent class Cq is specified by
identifying the class with the maximum confidence:

c(Aq ⇒ Class Cq) = max{ c(Aq ⇒ Class h) | h = 1, 2, ..., M }. (15)
In this manner, we generate the fuzzy rule Rq (i.e., Aq ⇒ Class Cq ) with the an-
tecedent part Aq and the consequent class Cq . We do not generate any fuzzy rules
with the antecedent part Aq if there is no compatible training pattern with Aq .
The rule weight CF q of each fuzzy rule Rq has a large effect on the performance
of fuzzy rule-based classifiers. We use the following specification of CF q because
good results were reported in the literature [90]:
CFq = c(Aq ⇒ Class Cq) − Σ_{h=1, h≠Cq}^{M} c(Aq ⇒ Class h). (16)
We do not use the fuzzy rule Rq as a candidate rule if the rule weight CFq is not positive (i.e., if its confidence is not larger than 0.5). Whereas we heuristically specify
the rule weight of each fuzzy rule by (16) in this chapter, we can include a learning
mechanism of the rule weight in multiobjective genetic fuzzy rule selection.
In the above-mentioned heuristic manner, we can generate a large number of
short fuzzy rules as candidate rules in multiobjective fuzzy rule selection. An EMO
algorithm is used to search for non-dominated subsets of the generated candidate
rules. When the number of candidate rules is too large (e.g., tens of thousands), it
is not easy for EMO algorithms to efficiently perform multiobjective fuzzy rule se-
lection. Thus we use a prescreening procedure to decrease the number of candidate
rules. Our prescreening procedure is based on well-known rule evaluation measures
in the field of data mining [7]: support and confidence.
The confidence of a rule evaluates the accuracy of the association from the an-
tecedent part to the consequent part. We have already shown a fuzzy version of
the confidence in (14). On the other hand, the support indicates the percentage of
covered patterns. Its fuzzy version can be written as follows [80]:
s(Rq) = s(Aq ⇒ Class Cq) = ( Σ_{xp ∈ Class Cq} μAq(xp) ) / m. (17)
For prescreening candidate rules, we use two threshold values: the minimum support
and the minimum confidence. We exclude fuzzy rules that do not satisfy these two
threshold values. Among short fuzzy rules satisfying these two threshold values, we
choose a prespecified number of candidate rules for each class. As a rule evaluation
criterion, we use the product of the support s(Rq ) and the confidence c(Rq ). That
is, we choose a prespecified number of the best candidate rules for each class with
respect to s(Rq ) · c(Rq ).
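The rule evaluation measures can be sketched directly from (14), (16) and (17) (a minimal sketch; the flat data layout and function names are my own assumptions):

```python
def confidence_and_support(grades, labels, target, m):
    """Fuzzy confidence (14) and fuzzy support (17) of Aq => Class target.
    grades[p] is the compatibility grade of training pattern xp with the
    antecedent part Aq, labels[p] its class, m the number of patterns."""
    total = sum(grades)
    covered = sum(g for g, c in zip(grades, labels) if c == target)
    confidence = covered / total if total > 0 else 0.0
    return confidence, covered / m

def rule_weight(confidences, cq):
    """Rule weight (16): confidence of the consequent class minus the sum
    of the other classes' confidences (equal to 2c - 1, since the class
    confidences in (14) sum to one)."""
    return confidences[cq] - sum(c for h, c in confidences.items() if h != cq)

grades = [0.8, 0.2, 0.5, 0.5]        # compatibility grades with some Aq
labels = [1, 1, 2, 2]
conf, supp = confidence_and_support(grades, labels, 1, m=4)
print(conf, supp, conf * supp)       # 0.5 0.25 0.125 (prescreening score s*c)
print(rule_weight({1: 0.5, 2: 0.5}, 1))  # 0.0 -> not positive, rule discarded
```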
We use NSGA-II of Deb et al. [42] to search for non-dominated fuzzy rule-based
classifiers with respect to these three objectives. Of course, we can use other EMO
algorithms. Since each individual is represented by a binary string, we can use stan-
dard genetic operations for binary strings in NSGA-II for multiobjective fuzzy rule
selection. In our computational experiments, uniform crossover and bit-flip mutation
were used in NSGA-II. The execution of NSGA-II was terminated at the prespeci-
fied number of generations.
In order to efficiently decrease the number of fuzzy rules in each rule set S, we
can use two heuristic techniques. One is biased mutation where a larger mutation
probability is assigned to the mutation from 1 to 0 than that from 0 to 1. The other
is the removal of unnecessary fuzzy rules. Since we use the single winner-based
scheme in (13) for classifying each training pattern by a fuzzy rule-based classifier
S, some fuzzy rules in S may classify no training patterns. We can remove those
unnecessary fuzzy rules from S without changing any classification results by S
(i.e., without changing the first objective f1 (S)). At the same time, the removal of
those unnecessary fuzzy rules improves the second objective f2 (S) and the third
objective f3 (S). In our computational experiments, the necessity of each fuzzy rule
in S was checked when f1 (S) was calculated. All the unnecessary fuzzy rules were
removed from S before f2 (S) and f3 (S) were calculated. This heuristic procedure
can be viewed as a kind of local search since f2 (S) and f3 (S) are improved without
deteriorating f1 (S).
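Both heuristics can be sketched as follows (interfaces are my own assumptions; winner counts per rule are presumed to have been recorded while computing f1(S)):

```python
import random

def biased_mutation(bits, p_one_to_zero, p_zero_to_one, rng=random):
    """Biased bit-flip mutation: a selected rule (bit 1) is dropped with a
    larger probability than an unselected rule (bit 0) is added, pushing
    rule sets toward fewer fuzzy rules."""
    return [1 - b if rng.random() < (p_one_to_zero if b else p_zero_to_one)
            else b for b in bits]

def remove_unnecessary_rules(rule_set, winner_counts):
    """Drop rules that are the winner rule for no training pattern:
    f1(S) is unchanged while f2(S) and f3(S) improve."""
    return [r for r, n in zip(rule_set, winner_counts) if n > 0]

print(remove_unnecessary_rules(["R1", "R2", "R3"], [5, 0, 2]))  # ['R1', 'R3']
```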
Table 1 Five data sets used in our computational experiments. Rule extraction criteria (i.e.,
specified values of the minimum support and confidence) are also shown for each data set
Among the generated fuzzy rules satisfying the minimum confidence, the minimum
support and the maximum number of antecedent conditions, we chose the best 300
rules for each class using the product of
support and confidence. Those 300 fuzzy rules for each class were used as candidate
rules in multiobjective fuzzy rule selection.
Then we applied NSGA-II to the candidate rules to search for non-dominated
fuzzy rule-based classifiers with respect to the three objectives in (18). We used the
following parameter specifications in NSGA-II:
Population size: 200 strings,
Crossover probability: 0.9 (uniform crossover),
Mutation probability: 0.05 (1 → 0) and 1/N (0 → 1) where N is the string length,
Termination condition: 5000 generations.
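The biased mutation listed above can be sketched as a single operator on the binary rule-selection string (a hedged illustration under the stated probabilities, not the authors' code): a 1 flips to 0 with probability 0.05 while a 0 flips to 1 only with probability 1/N, biasing the search toward smaller rule subsets.

```python
import random

def biased_mutation(bits, p_remove=0.05, rng=None):
    """Mutate each bit of a rule-selection string with an asymmetric flip
    probability: p_remove for 1 -> 0 and 1/N for 0 -> 1."""
    rng = rng or random.Random()
    p_add = 1.0 / len(bits)  # 1/N for 0 -> 1
    return [1 - b if rng.random() < (p_remove if b == 1 else p_add) else b
            for b in bits]
```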
In the following, we report experimental results of multiobjective fuzzy rule
selection on each data set.
Wisconsin Breast Cancer Data Set: First we randomly divided the data set into
342 and 341 patterns for training and testing, respectively. Next we generated 76270
fuzzy rules of length three or less from the 342 training patterns, which satisfied the
minimum support 0.01 and the minimum confidence 0.6. Among those fuzzy rules,
we chose 300 candidate rules for each class using the product of support and con-
fidence as the rule evaluation criterion. Then we applied NSGA-II to the candidate
rules for multiobjective fuzzy rule selection. From its single run, 11 non-dominated
fuzzy classifiers (i.e., 11 non-dominated subsets of the candidate rules) were obtained. Finally, each of the obtained fuzzy rule-based classifiers was evaluated on the training patterns and the test patterns.
Training data accuracy and test data accuracy of each fuzzy rule-based classifier
are shown in Figure 10 (a) and Figure 10 (b), respectively. Each circle in Figure 10
shows a fuzzy rule-based classifier. Some of the obtained fuzzy rule-based classi-
fiers (e.g., a fuzzy rule-based classifier with only a single fuzzy rule) are not shown
because their classification rates are out of the range of the vertical axis of each plot
in Figure 10. Since we used not only the number of fuzzy rules but also the total
number of antecedent conditions as complexity measures in multiobjective fuzzy
rule selection, different fuzzy rule-based classifiers with the same number of fuzzy
rules were obtained in Figure 10. For example, fuzzy rule-based classifiers with two
and three fuzzy rules are shown in Figure 11. It should be noted that the horizontal
axis is the average rule length (i.e., the average number of antecedent conditions in
each fuzzy rule) in Figure 11 whereas it is the number of fuzzy rules in Figure 10.
We can observe a clear accuracy-complexity tradeoff relation in Figure 10 (a)
for the training data. The classification rate on the training data increases with the
increase in the number of fuzzy rules in Figure 10 (a). This means that we cannot
simultaneously achieve the accuracy maximization and the complexity minimiza-
tion. A similar tradeoff relation is observed in Figure 10 (b) for the test data. No
clear deterioration in the generalization ability due to the increase in the complexity
of fuzzy rule-based classifiers is observed in Figure 10 (b). That is, we observe no
clear indication of the overfitting of fuzzy rule-based systems to the training data in
148 H. Ishibuchi and Y. Nojima
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-6); classifiers A and B are marked.]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 10 Obtained non-dominated fuzzy rule-based classifiers (Breast W)
[Figure: classification rate (%) against the average rule length (0-2) for training and test data.]
(a) Classifiers with two rules. (b) Classifiers with three rules.
Fig. 11 Relation between the accuracy and the average rule length (Breast W)
Figure 10 (b). We can also observe a clear tradeoff relation between the accuracy
of fuzzy rule-based classifiers and the average rule length in Figure 11 for both the
training data and the test data. That is, there exists the accuracy-complexity tradeoff
relation when the average rule length is used as a complexity measure in Figure 11.
Two fuzzy rule-based classifiers A and B in Figure 10 are shown in Figure 12
and Figure 13, respectively. The classifier A in Figure 12 with two fuzzy rules is
the simplest one with the highest interpretability in Figure 10 whereas the classifier
B with six fuzzy rules is the most complicated one with the highest training data
accuracy. There exist many fuzzy rule-based classifiers between these two extremes.
The classifier B may be chosen by many human users since it also has the highest
generalization ability in Figure 10 (b). Some human users, however, may choose
[Figure: rule table over attributes x2 and x3 with two fuzzy rules, R1 (Class 1, confidence 0.95) and R2 (Class 2, confidence 0.79); DC denotes don't care.]
Fig. 12 The simplest fuzzy rule-based classifier A in Figure 10 (Breast W)
Multiobjective Genetic Fuzzy Systems 149
[Figure: rule table over attributes x1-x6, x8, x9 with six fuzzy rules: R1 and R2 (Class 1, confidences 0.96 and 0.97) and R3-R6 (Class 2, confidences 0.94, 0.79, 0.77, 0.71); DC denotes don't care.]
Fig. 13 The most complicated fuzzy rule-based classifier B in Figure 10 (Breast W)
[Figure: support against confidence for candidate rules and selected rules.]
(a) Class 1. (b) Class 2.
Fig. 14 Fuzzy rules in Figure 12 included in the classifier A in Figure 10 (Breast W)
simpler ones than the classifier B in the situation where not only the accuracy but
also the interpretability is an important criterion.
In Figure 14, the two fuzzy rules in Figure 12 are shown in the confidence-support
space together with the other candidate rules. We can see from Figure 14 that these
two fuzzy rules have large support values. This means that they cover many training
patterns. Usually simple fuzzy rule-based classifiers consist of a small number of
general fuzzy rules that cover many training patterns. On the other hand, the six
fuzzy rules in Figure 13 are shown in Figure 15. The most complicated fuzzy rule-
based classifier B includes not only general fuzzy rules with large support values
but also specific fuzzy rules with small support values and high confidence values.
Glass Data Set: Experimental results are shown in Figure 16. Multiobjective fuzzy
rule selection found 34 non-dominated fuzzy rule-based classifiers (some of them
are not shown in Figure 16 due to their low accuracy). We can observe a clear
accuracy-complexity tradeoff relation in Figure 16 (a) for training data. We can also
see that the fuzzy rule-based classifier with the best training data accuracy does not
always have the best test data accuracy (Figure 16 (b)). We examined the two fuzzy
rule-based classifiers A and B in Figure 16 in detail. One interesting observation is
[Figure: support against confidence for candidate rules and selected rules.]
(a) Class 1. (b) Class 2.
Fig. 15 Fuzzy rules in Figure 13 included in the classifier B in Figure 10 (Breast W)
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-12); classifiers A and B are marked.]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 16 Obtained non-dominated fuzzy rule-based classifiers (Glass)
that the classifier A with the highest test data accuracy in Figure 16 (b) has no fuzzy
rule with Class 3 consequent as shown in Figure 17. On the other hand, the classifier
B with the highest training data accuracy in Figure 16 (a) has at least one fuzzy rule
for each class. The test data accuracy of the classifier B is, however, inferior to that of the classifier A due to overfitting to the training data.
Cleveland Heart Disease Data Set: Experimental results are shown in Figure 19.
Multiobjective fuzzy rule selection found 48 non-dominated fuzzy rule-based clas-
sifiers. We can observe a clear accuracy-complexity tradeoff relation in Figure 19
(a) for training data. On the other hand, we can observe a clear indication of the
overfitting of fuzzy rule-based classifiers to training data due to the increase in their
complexity in Figure 19 (b). That is, the test data accuracy decreases with the in-
crease in the number of fuzzy rules in the range with more than five fuzzy rules.
In Figure 20, we show the relation between the accuracy and the average rule
length of the obtained fuzzy rule-based classifiers with three rules (a) and four rules
(b). We can observe a clear deterioration of the test data accuracy due to the increase
in the average rule length in Figure 20. One may notice that Figure 20 (b) includes
[Figure: rule table over attributes x1-x9 with eight fuzzy rules: R1 and R2 (Class 1, confidences 0.23 and 0.47), R3-R5 (Class 2, confidences 0.94, 0.60, 0.63), R6 (Class 4, 0.48), R7 (Class 5, 0.26), and R8 (Class 6, 0.93); no rule has a Class 3 consequent. DC denotes don't care.]
Fig. 17 Fuzzy rule-based classifier A with the highest test data accuracy in Figure 16 (Glass)
[Figure: rule table over attributes x1-x4, x6-x9 with eleven fuzzy rules: R1 and R2 (Class 1, confidences 0.48 and 0.29), R3-R5 (Class 2, confidences 0.94, 0.22, 0.26), R6 and R7 (Class 3, confidences 1.00 and 0.90), R8 (Class 4, 0.48), R9 (Class 5, 0.26), and R10 and R11 (Class 6, confidences 0.93 and 0.70); DC denotes don't care.]
Fig. 18 Fuzzy rule-based classifier B with the highest training data accuracy in Figure 16 (Glass)
more circles (i.e., test data accuracy) than triangles (i.e., training data accuracy).
This is because our multiobjective fuzzy rule selection algorithm found multiple
fuzzy rule-based classifiers with the same training data accuracy and the same com-
plexity (i.e., the same number of fuzzy rules and the same average rule length).
Those fuzzy rule-based classifiers do not always have the same test data accuracy.
Thus the number of circles (i.e., results for test data) is larger than the number of
triangles (i.e., results for the training data) in Figure 20 (b).
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-18).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 19 Obtained non-dominated fuzzy rule-based classifiers (Heart C)
[Figure: classification rate (%) against the average rule length (1-3) for training and test data.]
(a) Classifiers with three rules. (b) Classifiers with four rules.
Fig. 20 Relation between the accuracy and the average rule length (Heart C)
Iris Data Set: Experimental results are shown in Figure 21. All training patterns
were correctly classified by three fuzzy rules in Figure 21 (a). Since a small fuzzy
rule-based classifier had a 100% training data accuracy, many non-dominated fuzzy
rule-based classifiers were not obtained. There exists, however, a clear accuracy-
complexity tradeoff relation for training data in Figure 21 (a).
Wine Data Set: Experimental results are shown in Figure 22. All the training pat-
terns were correctly classified by four fuzzy rules in Figure 22 (a). As in the case of
the iris data set, a small fuzzy rule-based classifier had a 100% training data accu-
racy in Figure 22 (a). As a result, many non-dominated fuzzy rule-based classifiers
were not obtained for the wine data set. Nevertheless we can observe an accuracy-
complexity tradeoff relation for training data in Figure 22 (a). The highest test data
accuracy was obtained by a fuzzy rule-based classifier with three fuzzy rules in
Figure 22 (b). That fuzzy rule-based classifier also has the highest training data ac-
curacy among the obtained fuzzy rule-based classifiers with three fuzzy rules (see
Figure 23).
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-3).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 21 Obtained non-dominated fuzzy rule-based classifiers (Iris)
[Figure: two plots of classification rate (%) against the number of fuzzy rules (3-4).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 22 Obtained non-dominated fuzzy rule-based classifiers (Wine)
[Fig. 23: classification rate (%) against the average rule length (0-3) for training and test data (Wine).]
classifiers to training data due to the increase in the number of fuzzy rules. It should
be noted that we can obtain these observations from a single run of NSGA-II for
each data set. This is because NSGA-II (and EMO algorithms in general) can search for a large number of non-dominated fuzzy rule-based classifiers in a single run.
P(B_k) = \frac{\mu_{B_k}(x_{pi})}{\sum_{j=1}^{e} \mu_{B_j}(x_{pi})}, \quad k = 1, 2, \ldots, 9, a, b, c, d, e. (19)
That is, each antecedent fuzzy set Bk has a selection probability which is proportional to its compatibility grade with the attribute value x_pi. Then each antecedent fuzzy set of the generated fuzzy rule is replaced with don't care using a prespecified probability Pdon't care. In this manner, Nrule initial fuzzy rules are generated. An
initial rule set consists of these fuzzy rules. By iterating this procedure, we generate
Npop initial rule sets (i.e., an initial population).
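The heuristic rule generation above can be sketched as follows. This is a minimal Python illustration: the membership-function dictionary standing in for the 14 antecedent fuzzy sets is an assumed representation, and the don't-care replacement is folded in before the draw of Eq. (19), which yields the same distribution.

```python
import random

def generate_initial_rule(pattern, fuzzy_sets, p_dontcare=0.8, rng=None):
    """Return one antecedent vector for a training pattern: each antecedent
    fuzzy set is drawn with probability proportional to its compatibility
    grade with x_pi (Eq. (19)), then replaced with don't care ("DC") with
    probability p_dontcare."""
    rng = rng or random.Random()
    antecedent = []
    for x in pattern:
        labels = list(fuzzy_sets)
        weights = [fuzzy_sets[k](x) for k in labels]
        if sum(weights) == 0.0 or rng.random() < p_dontcare:
            antecedent.append("DC")  # don't care
        else:
            antecedent.append(rng.choices(labels, weights=weights)[0])
    return antecedent
```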
Two individuals (i.e., two rule sets) are selected from the current population by
binary tournament selection with replacement in the same manner as in NSGA-II.
Let the selected rule sets be S1 and S2 . Some fuzzy rules are randomly selected from
each parent to construct a new rule set by crossover. The number of fuzzy rules
to be inherited from each parent to the new rule set is randomly specified. Let N1
and N2 be the number of fuzzy rules to be inherited from S1 and S2 , respectively. We
randomly specify N1 and N2 in the intervals [1, |S1 |] and [1, |S2 |], respectively, where
|Si| is the number of fuzzy rules in the rule set Si. In order to generate a new rule set, N1 and N2 fuzzy rules are randomly chosen from S1 and S2, respectively. The offspring rule set has (N1 + N2) fuzzy rules. We use an upper bound on the number
of fuzzy rules in each rule set (e.g., 40 in our computational experiments). When the
number of fuzzy rules is larger than the upper bound, we randomly remove fuzzy
rules from the offspring rule set until the upper bound condition is satisfied. The
above-mentioned crossover operation is applied to the selected pair of parent rule
sets with a prespecified crossover probability PC . When the crossover operation is
not applied, one of the two parent rule sets is randomly chosen as their offspring rule
set. Each antecedent fuzzy set of fuzzy rules in the offspring rule set is randomly
replaced with a different antecedent fuzzy set by mutation. The mutation operation
is applied to each antecedent fuzzy set with a prespecified mutation probability PM .
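The rule-set crossover above can be sketched as follows (a hedged illustration under the stated scheme, not the authors' code): inherit N1 and N2 randomly chosen rules from the two parents, with N1 and N2 drawn uniformly from [1, |S1|] and [1, |S2|], then trim randomly down to the upper bound on rule-set size (40 in the experiments above).

```python
import random

def crossover_rule_sets(s1, s2, max_rules=40, rng=None):
    """Build one offspring rule set from two parent rule sets."""
    rng = rng or random.Random()
    n1 = rng.randint(1, len(s1))
    n2 = rng.randint(1, len(s2))
    child = rng.sample(s1, n1) + rng.sample(s2, n2)
    while len(child) > max_rules:  # enforce the upper bound
        child.pop(rng.randrange(len(child)))
    return child
```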
After the crossover and mutation operations, a single iteration of the following
Michigan-style algorithm is applied to the newly generated offspring rule set S:
Step 1: Classify each training pattern by the rule set S. The fitness value of each
fuzzy rule in S is the number of correctly classified training patterns by
that rule.
Step 2: Generate Nreplace fuzzy rules from the existing rules in S by genetic op-
erations and from misclassified and/or rejected training patterns by the
above-mentioned heuristic manner.
Step 3: Replace the worst Nreplace fuzzy rules in S with the newly generated
Nreplace fuzzy rules.
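Steps 1-3 above can be sketched as a single function. The `classify` and `generate_rules` callables abstract the single-winner classification and the heuristic/genetic rule generation; they are assumed interfaces for this illustration rather than the authors' implementation.

```python
def michigan_iteration(rule_set, patterns, classify, generate_rules,
                       n_replace):
    """Score each rule by the number of training patterns it correctly
    classifies as the single winner, then replace the n_replace worst
    rules with newly generated ones."""
    fitness = [0] * len(rule_set)
    for p in patterns:
        winner, correct = classify(rule_set, p)  # winner rule index or None
        if winner is not None and correct:
            fitness[winner] += 1
    worst = sorted(range(len(rule_set)), key=lambda i: fitness[i])[:n_replace]
    offspring = list(rule_set)
    for i, new_rule in zip(worst, generate_rules(n_replace)):
        offspring[i] = new_rule
    return offspring
```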
generated from the NMR training patterns. The other fuzzy rules (i.e., (Nreplace −
NMR ) fuzzy rules) are generated by genetic operations. On the other hand, when
NMR is larger than Nreplace /2, Nreplace /2 patterns are randomly chosen from the NMR
misclassified or rejected training patterns. Then Nreplace /2 fuzzy rules are directly
generated from the chosen patterns. The other Nreplace /2 fuzzy rules are generated
by genetic operations.
When we generate a new fuzzy rule from existing rules in S by genetic operations,
first a pair of parent fuzzy rules is selected from S using binary tournament selection
with replacement. Then the standard uniform crossover operation is applied to the
selected pair to generate a new fuzzy rule. Finally each antecedent fuzzy set is ran-
domly replaced with a different one using a prespecified mutation probability. This
procedure is iterated to generate a required number of new fuzzy rules (i.e., Nreplace
fuzzy rules including directly generated fuzzy rules from training patterns).
The number of replaced fuzzy rules (i.e., Nreplace) is specified as ⌈0.2 × |S|⌉ for each rule set S, where ⌈0.2 × |S|⌉ is the minimum integer not smaller than 0.2 × |S|.
For example, one fuzzy rule is replaced when the number of fuzzy rules in S is
less than or equal to five. In this case, the heuristic rule generation procedure from
training patterns and the genetic operation-based procedure from existing rules are
randomly invoked with the same probability when at least one training pattern is
misclassified or rejected by the rule set S.
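Restating the replacement count as a one-line sketch, Nreplace is the smallest integer not below 20% of the rule-set size, so a single rule is replaced whenever |S| ≤ 5, matching the example above.

```python
import math

def n_replace(rule_set_size):
    """N_replace = ceil(0.2 * |S|)."""
    return math.ceil(0.2 * rule_set_size)
```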
As we have already explained, a new rule set S is generated in our multiobjective
fuzzy GBML algorithm by selection, crossover, mutation, and a single iteration of
the Michigan-style algorithm. Whereas unnecessary fuzzy rules were removed from
each rule set in the multiobjective genetic fuzzy rule selection algorithm in the pre-
vious section, they are not removed in our multiobjective fuzzy GBML algorithm in
this section. This is because unnecessary fuzzy rules may include useful antecedent
fuzzy sets. Effects of the removal of unnecessary fuzzy rules on the performance
of our multiobjective fuzzy GBML algorithm, however, should be examined in de-
tail in future studies. The above-mentioned rule set generation procedure is iterated
Npop times to generate an offspring population of Npop rule sets. The next popula-
tion is constructed by choosing the best Npop rule sets from the current and offspring
populations in the same manner as in NSGA-II. Generation update is iterated until
a prespecified stopping condition is satisfied. The total number of generations was
used as the stopping condition in our computational experiments in this section.
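The generation update can be sketched as a simplified environmental selection: merge parents and offspring and peel off non-dominated fronts until Npop rule sets are chosen. Here each rule set is summarized by its objective vector (f1 = training accuracy to maximize, f2 = number of rules and f3 = total rule length to minimize); NSGA-II additionally orders the last front by crowding distance, which this sketch omits.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b, for objectives
    (maximize f1, minimize f2, minimize f3)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better

def next_population(merged, n_pop):
    """Choose n_pop objective vectors front by front from the merged
    parent + offspring population."""
    remaining = list(merged)
    chosen = []
    while len(chosen) < n_pop and remaining:
        front = [s for s in remaining
                 if not any(dominates(t, s) for t in remaining)]
        chosen.extend(front[:n_pop - len(chosen)])
        remaining = [s for s in remaining if s not in front]
    return chosen
```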
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-7).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 24 Obtained non-dominated fuzzy rule-based classifiers (Breast W)
Wisconsin Breast Cancer Data Set: Experimental results are summarized in Fig-
ure 24. Our multiobjective fuzzy GBML algorithm found 10 non-dominated rule
sets. From the comparison between Figure 10 (a) and Figure 24 (a), we can see
that very similar results were obtained for training data. We can also see that the
most complicated fuzzy rule-based classifier with the highest training data accu-
racy in Figure 24 (a) does not have the highest test data accuracy in Figure 24 (b).
This classifier, which is shown in Figure 25, includes long fuzzy rules with many
antecedent conditions as well as short ones. Computation time for multiobjective fuzzy GBML was 180 minutes on a PC with an Intel Xeon 3.6 GHz CPU and 4 GB RAM in Figure 24, while it was 17 minutes for multiobjective fuzzy rule selection including
candidate rule generation in Figure 10. This difference is due to a tailored efficient
implementation of multiobjective fuzzy rule selection where the compatibility grade
of each candidate rule to each training pattern was calculated just once and stored
during the execution of NSGA-II. Such an efficient implementation of multiobjec-
tive fuzzy GBML is a future research issue.
Glass Data Set: Experimental results are shown in Figure 26. Our multiobjective
fuzzy GBML algorithm found 28 non-dominated fuzzy rule-based classifiers. Simi-
lar results were obtained in Figure 16 and Figure 26 except for complicated classifiers
with more than six rules (i.e., those classifiers were not obtained in Figure 26).
Cleveland Heart Disease Data Set: Experimental results are shown in Figure 27.
Our multiobjective fuzzy GBML algorithm found 21 non-dominated fuzzy rule-
based classifiers. As in Figure 26, complicated fuzzy rule-based classifiers were not
[Figure: rule table over attributes x1, x2, x4, x5, x6, x7, x9 with seven fuzzy rules: R1-R3 (Class 1, confidences 1.00, 0.98, 0.97) and R4-R7 (Class 2, confidences 0.97, 0.81, 0.81, 0.20); DC denotes don't care.]
Fig. 25 The most complicated fuzzy rule-based classifier with the highest training data accuracy in Figure 24 (Breast W). The fifth and sixth fuzzy rules are exactly the same. We can remove one of them without changing any classification results. Such a duplicated fuzzy rule is removed as an unnecessary rule if we include the unnecessary rule removal mechanism into our fuzzy GBML algorithm
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-7).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 26 Obtained non-dominated fuzzy rule-based classifiers (Glass)
obtained in Figure 27 (compare Figure 27 with Figure 19). Similar results, however,
were achieved in Figure 27 (b) and Figure 19 (b) with respect to classification rates
on test data.
Iris Data Set: Experimental results are shown in Figure 28. As in Figure 21 (a), all
training patterns were correctly classified by three fuzzy rules in Figure 28 (a).
Wine Data Set: Experimental results are shown in Figure 29. All training patterns
were correctly classified by five fuzzy rules in Figure 29 (a) while they were correctly
classified by four fuzzy rules in Figure 22 (a). That is, better results were obtained
by rule selection in Figure 22 (a) than GBML in Figure 29 (a) for training data.
On the other hand, similar performance was obtained by these two approaches with
respect to classification rates on test patterns in Figure 22 (b) and Figure 29 (b).
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-6).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 27 Obtained non-dominated fuzzy rule-based classifiers (Heart C)
[Figure: two plots of classification rate (%) against the number of fuzzy rules (2-3).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 28 Obtained non-dominated fuzzy rule-based classifiers (Iris)
We did not observe any clear differences between experimental results by the two
approaches: Multiobjective fuzzy rule selection in the previous section and multiob-
jective fuzzy GBML in this section. In Figure 30, we compare these two approaches
in terms of the average rule length. We can see from Figure 30 that much longer fuzzy rules were included in the fuzzy rule-based classifiers obtained by GBML than in those obtained by rule selection. This is because only short fuzzy rules were used as candidate rules in
rule selection.
Since multiobjective fuzzy GBML can generate any fuzzy rule with an arbi-
trary number of antecedent conditions, fuzzy rule-based classifiers may include long
fuzzy rules as well as short ones (e.g., see Figure 25). This means that GBML has
a much larger search space than rule selection [85]. Thus GBML may need a much larger computation load (i.e., a larger population size and/or a larger number of generations), whereas we used the same specifications for these two approaches.
Since the performance and the computation time of rule selection strongly de-
pend on the choice of candidate rules, the above-mentioned observations with re-
spect to the comparison between the two algorithms are not always valid. Different
observations may be obtained when we use different specifications of candidate
[Figure: two plots of classification rate (%) against the number of fuzzy rules (3-5).]
(a) Training data accuracy. (b) Test data accuracy.
Fig. 29 Obtained non-dominated fuzzy rule-based classifiers (Wine)
[Figure: classification rate (%) against the average rule length (1-4) for rule selection and GBML.]
(a) Classifiers with four rules. (b) Classifiers with five rules.
Fig. 30 Relation between the accuracy and the average rule length (Heart C)
rules in multiobjective fuzzy rule selection. Multiobjective fuzzy GBML can be im-
plemented in a more efficient manner by including some heuristics (e.g., an upper
bound on rule length, unnecessary rule removal, and rule removal mutation) as in
multiobjective fuzzy rule selection.
6 Related Studies
In this section, we briefly explain various studies related to multiobjective genetic
fuzzy systems. More detailed explanations can be found in [97] for multiobjective
machine learning and [51] for multiobjective data mining.
tasks: One is to search for Pareto-optimal rules and the other is to search for Pareto-
optimal rule sets.
In data mining techniques such as Apriori [7], support and confidence have fre-
quently been used as rule evaluation criteria. Other rule evaluation criteria, however,
were also proposed. Among them are gain, variance, chi-squared value, entropy
gain, gini, laplace, lift, and conviction [15]. It was shown for non-fuzzy rules that
the best rule according to any of these measures is a Pareto-optimal rule of the fol-
lowing two-objective rule discovery problem [15]:
interpretability (e.g., the number of input variables, the number of fuzzy sets for
each variable, the separability of adjacent fuzzy sets, the number of fuzzy rules, the
number of antecedent conditions of each fuzzy rule, etc.). See [23, 24, 58, 83, 113]
for further discussions on interpretability of fuzzy rule-based systems. If we use
those aspects as separate objectives, fuzzy system design is formulated as a many-
objective problem. Pareto ranking-based EMO algorithms such as NSGA-II [42]
and SPEA [147], however, do not work well on such a problem with many objec-
tives [70, 87, 93, 104, 125]. Thus it is necessary to choose only a few interpretability
measures or to combine various interpretability measures into a few objective func-
tions. It would be interesting to examine how the search ability of EMO algorithms
for multiobjective fuzzy system design depends on the choice of interpretability
measures.
Performance evaluation of multiobjective genetic fuzzy systems is also very im-
portant. This issue is two-fold. One is related to the performance of a finally se-
lected fuzzy rule-based system. After a single fuzzy rule-based system is chosen
from a large number of obtained non-dominated ones, its generalization ability
together with its interpretability can be compared with results by single-objective
approaches. Such a comparative study may clearly demonstrate advantages and dis-
advantages of multiobjective approaches over single-objective ones. Another per-
formance evaluation issue is related to the search ability of multiobjective genetic
fuzzy systems as multiobjective optimizers. A number of performance indices for
evaluating EMO algorithms [32, 34, 41, 47, 148] can be used for this task. It should
be noted that the search ability of EMO algorithms in multiobjective genetic fuzzy
systems is evaluated by training data accuracy (i.e., accuracy measures in multi-
objective problems) while the performance of obtained fuzzy rule-based systems
is evaluated by test data accuracy (i.e., actual performance of fuzzy rule-based
systems).
Another future research issue is theoretical analysis for maximizing the general-
ization ability of fuzzy rule-based systems. As shown in this chapter, multiobjective
genetic fuzzy systems can be used for empirical analysis of accuracy-complexity
tradeoff of fuzzy rule-based systems [85]. Almost all studies on multiobjective ge-
netic fuzzy systems are based on computational experiments with no theoretical
analysis. Theoretical analysis such as statistical learning theory [30] seems to be an
important research issue for finding fuzzy rule-based systems with high generaliza-
tion ability. In this context, regularization methods can be discussed as multiobjec-
tive problems [99].
Incorporation of user’s preference is a hot issue in the EMO community [5, 31,
40, 43, 84, 96]. User’s preference can be incorporated into multiobjective genetic
fuzzy systems in order to efficiently search for preferred fuzzy rule-based systems.
Some users may prefer accurate fuzzy rule-based systems even if their interpretability is not high. Other users may prefer simple fuzzy rule-based systems even if their accuracy is not very high. Interpretability is important in some application areas
while accuracy is the primary objective in many studies on the design of fuzzy rule-
based systems. Information on user’s preference can be used to guide the multiob-
jective search for preferred fuzzy rule-based systems. The choice of a single fuzzy
8 Conclusions
Linguistic interpretability is the main advantage of fuzzy rule-based systems over
other nonlinear models such as neural networks. Accuracy maximization, however,
often leads to the deterioration in the linguistic interpretability (i.e., increase in the
complexity) of fuzzy rule-based systems. As a promising approach to the handling
of the tradeoff between accuracy maximization and complexity minimization, we
explained multiobjective design of fuzzy rule-based systems in this chapter. A large
number of non-dominated fuzzy rule-based systems with respect to accuracy maxi-
mization and complexity minimization can be obtained by a single run of an EMO
algorithm. Through computational experiments, we demonstrated that an accuracy-
complexity tradeoff relation can be visually shown by the obtained non-dominated
fuzzy rule-based systems. Human users are supposed to choose a single fuzzy rule-
based system from the obtained non-dominated ones by their preference with re-
spect to accuracy and interpretability. In this chapter, we also briefly explained a wide range of studies related to evolutionary multiobjective design of fuzzy rule-based systems. Moreover, we pointed out some future research directions in the
field of multiobjective genetic fuzzy systems.
References
1. Abbass, H.A.: Pareto neuro-evolution: Constructing ensemble of neural networks using
multi-objective optimization. In: Proc. of CEC 2003, pp. 2074–2080 (2003)
2. Abbass, H.A.: Speeding up back-propagation using multiobjective evolutionary algo-
rithms. Neural Comput. 15, 2705–2726 (2003)
3. Abe, S.: Pattern classification: Neuro-fuzzy methods and their comparison. Springer,
London (2001)
4. Abonyi, J., Roubos, J.A., Szeifert, F.: Data-driven generation of compact, accurate, and
linguistically sound fuzzy classifiers based on a decision-tree initialization. Int. J. of
Approx. Reason 32, 1–21 (2003)
5. Adra, S.F., Griffin, I., Fleming, P.J.: A comparative study of progressive preference ar-
ticulation techniques for multiobjective optimisation. In: Obayashi, S., Deb, K., Poloni,
C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 908–921. Springer,
Heidelberg (2007)
6. Agrawal, S., Dashora, Y., Tiwari, M.K., Son, Y.J.: Interactive particle swarm: A Pareto-
adaptive metaheuristic to multiobjective optimization. IEEE Trans. on Syst. Man and
Cybern. - Part A 38, 258–277 (2008)
7. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of
association rules. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.
(eds.) Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI, Menlo
Park (1996)
8. Alba, E., Tomassini, M.: Parallelism and evolutionary algorithms. IEEE Trans. on Evol.
Comput. 6, 443–462 (2002)
9. Alcala, R., Cano, J.R., Cordon, O., Herrera, F., Villar, P., Zwir, I.: Linguistic modeling
with hierarchical systems of weighted linguistic rules. Int. J. of Approx. Reason 32,
187–215 (2003)
10. Alcala, R., Alcala-Fdez, J., Gacto, M.J., Herrera, F.: A multi-objective genetic algo-
rithm for tuning and rule selection to obtain accurate and compact linguistic fuzzy
rule-based systems. Int. J. of Uncertain Fuzziness and Knowl-Based Syst. 15, 539–557
(2007)
11. Araujo, L.: Multiobjective genetic programming for natural language parsing and tag-
ging. In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós, J.J., Whitley,
L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 433–442. Springer, Heidelberg
(2006)
12. Babu, B.V., Chakole, P.G., Mubeen, J.H.S.: Multiobjective differential evolution
(MODE) for optimization of adiabatic styrene reactor. Chem. Eng. Sci. 60, 4822–4837
(2005)
13. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms:
Bagging, boosting, and variants. Mach. Learn. 36, 105–139 (1999)
14. Baumgartner, U., Magele, C., Renhart, W.: Pareto optimality and particle swarm opti-
mization. IEEE Trans. on Magn. 40, 1172–1175 (2004)
15. Bayardo Jr., R.J., Agrawal, R.: Mining the most interesting rules. In: Proc. of KDD, pp.
145–153 (1999)
16. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum
Press, New York (1981)
17. Bleuler, S., Brack, M., Thiele, L., Zitzler, E.: Multiobjective genetic programming: Re-
ducing bloat using SPEA2. In: Proc. of CEC 2001, pp. 536–543 (2001)
18. Booker, L.B., Goldberg, D.E., Holland, J.H.: Classifier systems and genetic algorithms.
Artif. Intell. 40, 235–282 (1989)
19. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
20. Cano, J.R., Herrera, F., Lozano, M.: Stratification for scaling up evolutionary prototype
selection. Pattern Recognit. Lett. 26, 953–963 (2005)
21. Cantu-Paz, E.: Efficient and accurate parallel genetic algorithms. Springer, Berlin
(2000)
22. Casillas, J., Cordon, O., del Jesus, M.J., Herrera, F.: Genetic tuning of fuzzy rule deep
structures preserving interpretability and its interaction with fuzzy rule set reduction.
IEEE Trans. on Fuzzy Syst. 13, 13–29 (2005)
23. Casillas, J., Cordon, O., Herrera, F., Magdalena, L. (eds.): Accuracy improvements in
linguistic fuzzy modeling. Springer, Berlin (2003)
Multiobjective Genetic Fuzzy Systems 167
24. Casillas, J., Cordon, O., Herrera, F., Magdalena, L. (eds.): Interpretability issues in
fuzzy modeling. Springer, Berlin (2003)
25. Castillo, L., Gonzalez, A., Perez, R.: Including a simplicity criterion in the selection of
the best rule in a genetic fuzzy learning algorithm. Fuzzy Sets and Syst. 120, 309–321
(2001)
26. Chandra, A., Yao, X.: DIVACE: Diverse and accurate ensemble learning algorithm.
In: Yang, Z.R., Yin, H., Everson, R.M. (eds.) IDEAL 2004, vol. 3177, pp. 619–625.
Springer, Heidelberg (2004)
27. Chandra, A., Yao, X.: Evolutionary framework for the construction of diverse hybrid
ensemble. In: Proc. of ESANN 2005, pp. 253–258 (2005)
28. Chang, Y.P., Wu, C.J.: Optimal multiobjective planning of large-scale passive harmonic
filters using hybrid differential evolution method considering parameter and loading
uncertainty. IEEE Trans. on Power Deliv. 20, 408–416 (2005)
29. Chen, L.H., Chiang, C.H.: An intelligent control system with a multi-objective self-
exploration process. Fuzzy Sets and Syst. 143, 275–294 (2004)
30. Cherkassky, V., Mulier, F.: Learning from data: Concepts, theory, and methods. John
Wiley & Sons, New York (1998)
31. Coello, C.A.C.: Handling preferences in evolutionary multiobjective optimization: A
survey. In: Proc. of CEC 2000, pp. 30–37 (2000)
32. Coello, C.A.C., Lamont, G.B.: Applications of multi-objective evolutionary algorithms.
World Scientific, Singapore (2004)
33. Coello, C.A.C., Pulido, G.T., Lechuga, M.S.: Handling multiple objectives with particle
swarm optimization. IEEE Trans. on Evol. Comput. 8, 256–279 (2004)
34. Coello, C.A.C., van Veldhuizen, D.A., Lamont, G.B.: Evolutionary algorithms for solv-
ing multi-objective problems. Kluwer Academic Publishers, Boston (2002)
35. Cordon, O., del Jesus, M.J., Herrera, F., Lozano, M.: MOGUL: A methodology to ob-
tain genetic fuzzy rule-based systems under the iterative rule learning approach. Int. J.
of Intell. Syst. 14, 1123–1153 (1999)
36. Cordon, O., Gomide, F., Herrera, F., Hoffmann, F., Magdalena, L.: Ten years of genetic
fuzzy systems: Current framework and new trends. Fuzzy Sets and Syst. 141, 5–31
(2004)
37. Cordon, O., Herrera, F., Hoffmann, F., Magdalena, L.: Genetic Fuzzy Systems. World
Scientific, Singapore (2001)
38. Cordon, O., Herrera-Viedma, E., Luque, M.: Evolutionary learning of Boolean queries
by multiobjective genetic programming. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-
G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439,
pp. 710–719. Springer, Heidelberg (2002)
39. Cordon, O., Jesus, M.J.D., Herrera, F., Magdalena, L., Villar, P.: A multiobjective ge-
netic learning process for joint feature selection and granularity and contexts learning
in fuzzy rule-based classification systems. In: Casillas, J., Cordon, O., Herrera, F., Mag-
dalena, L. (eds.) Interpretability issues in fuzzy modeling, pp. 79–99. Springer, Berlin
(2003)
40. Cvetkovic, D., Parmee, I.C.: Preferences and their application in evolutionary multiob-
jective optimization. IEEE Trans. on Evol. Comput. 6, 42–57 (2002)
41. Deb, K.: Multi-objective optimization using evolutionary algorithms. John Wiley &
Sons, Chichester (2001)
42. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic
algorithm: NSGA-II. IEEE Trans. on Evol. Comput. 6, 182–197 (2002)
43. Deb, K., Sundar, J.: Reference point based multi-objective optimization using evolu-
tionary algorithms. In: Proc. of GECCO 2006, pp. 635–642 (2006)
44. Dietterich, T.G.: An experimental comparison of three methods for constructing en-
sembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40,
139–157 (2000)
45. Emmanouilidis, C., Hunter, A., MacIntyre, J.: A multiobjective evolutionary setting for
feature selection and a commonality-based crossover operator. In: Proc. of CEC 2000,
pp. 309–316 (2000)
46. Feldman, D.S.: Fuzzy network synthesis with genetic algorithms. In: Proc. of 5th ICGA,
pp. 312–317 (1995)
47. da Fonseca, V.G., Fonseca, C.M., Hall, A.O.: Inferential performance assessment of
stochastic optimizers and the attainment function. In: Zitzler, E., Deb, K., Thiele, L.,
Coello Coello, C.A., Corne, D.W. (eds.) EMO 2001. LNCS, vol. 1993, pp. 213–225.
Springer, Heidelberg (2001)
48. Freitas, A.: Data mining and knowledge discovery with evolutionary algorithms.
Springer, Berlin (2002)
49. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and
an application to boosting. J. of Comput. and Syst. Sci. 55, 119–139 (1997)
50. Funahashi, K.: On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192 (1989)
51. Ghosh, A., Dehuri, S., Ghosh, S. (eds.): Multi-objective evolutionary algorithms for
knowledge discovery from databases. Springer, Berlin (2008)
52. Ghosh, A., Nath, B.T.: Multi-objective rule mining using genetic algorithms. Inf.
Sci. 163, 123–133 (2004)
53. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, Reading (1989)
54. Gomez-Skarmeta, A.F., Jimenez, F., Sanchez, G.: Improving interpretability in ap-
proximative fuzzy models via multiobjective evolutionary algorithms. Int. J. of Intell.
Syst. 22, 943–969 (2007)
55. Gonzalez, A., Perez, R.: SLAVE: A genetic learning system based on an iterative ap-
proach. IEEE Trans. on Fuzzy Syst. 7, 176–191 (1999)
56. Gonzalez, J., Rojas, I., Ortega, J., Pomares, H., Fernandez, F.J., Diaz, A.F.: Multiob-
jective evolutionary optimization of the size, shape, and position parameters of radial
basis function networks for function approximation. IEEE Trans. on Neural Netw. 14,
1478–1495 (2003)
57. Gonzalez, J., Rojas, I., Pomares, H., Herrera, L.J., Guillen, A., Palomares, J.M., Ro-
jas, F.: Improving the accuracy while preserving the interpretability of fuzzy function
approximators by means of multi-objective evolutionary algorithms. Int. J. of Approx.
Reason 44, 32–44 (2007)
58. Guillaume, S.: Designing fuzzy inference systems from data: An interpretability-
oriented review. IEEE Trans. on Fuzzy Syst. 9, 426–443 (2001)
59. Handl, J., Knowles, J.: Evolutionary multiobjective clustering. In: Yao, X., Burke,
E.K., Lozano, J.A., Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E., Tiňo,
P., Kabán, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 1081–1091.
Springer, Heidelberg (2004)
60. Handl, J., Knowles, J.: Multiobjective clustering around medoids. In: Proc. of CEC
2005, pp. 632–639 (2005)
61. Handl, J., Knowles, J.: Improving the scalability of multiobjective clustering. In: Proc.
of CEC 2005, pp. 2372–2379 (2005)
62. Handl, J., Knowles, J.: Exploiting the trade-off - The benefits of multiple objectives in
data clustering. In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO
2005. LNCS, vol. 3410, pp. 547–560. Springer, Heidelberg (2005)
63. Herrera, F.: Genetic fuzzy systems: Status, critical considerations and future directions.
Int. J. of Comput. Intell. Res. 1, 59–67 (2005)
64. Ho, S.L., Yang, S.Y., Ni, G.Z., Lo, E.W.C., Wong, H.C.: A particle swarm optimization-
based method for multiobjective design optimizations. IEEE Trans. on Magn. 41, 1756–
1759 (2005)
65. Hong, T.P., Kuo, C.S., Chi, S.C.: Trade-off between computation time and number of
rules for fuzzy mining from quantitative data. Int. J. of Uncertain Fuzziness and Knowl-Based Syst. 9, 587–604 (2001)
66. Horikawa, S., Furuhashi, T., Uchikawa, Y.: On fuzzy modeling using fuzzy neural net-
works with the back-propagation algorithm. IEEE Trans. on Neural Netw. 3, 801–806
(1993)
67. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal
approximators. Neural Netw. 2, 359–366 (1989)
68. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural
Netw. 4, 251–257 (1991)
69. Hu, Y.C., Chen, R.S., Tzeng, G.H.: Finding fuzzy classification rules using data mining
techniques. Pattern Recognit. Lett. 24, 509–519 (2003)
70. Hughes, E.J.: Evolutionary many-objective optimization: many once or one many? In:
Proc. of CEC 2005, pp. 222–227 (2005)
71. de la Iglesia, B., Philpott, M.S., Bagnall, A.J., Rayward-Smith, V.J.: Data mining rules
using multi-objective evolutionary algorithms. In: Proc. of CEC 2003, pp. 1552–1559
(2003)
72. de la Iglesia, B., Reynolds, A., Rayward-Smith, V.J.: Developments on a multi-objective
metaheuristic (MOMH) algorithm for finding interesting sets of classification rules.
In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS,
vol. 3410, pp. 826–840. Springer, Heidelberg (2005)
73. de la Iglesia, B., Richards, G., Philpott, M.S., Rayward-Smith, V.J.: The application
and effectiveness of a multi-objective metaheuristic algorithm for partial classification.
Europ. J. of Oper. Res. 169, 898–917 (2006)
74. Ishibuchi, H., Kuwajima, I., Nojima, Y.: Relation between Pareto-optimal fuzzy rules
and Pareto-optimal fuzzy rule sets. In: Proc of IEEE MCDM 2007, pp. 42–49 (2007)
75. Ishibuchi, H., Murata, T., Turksen, I.B.: Selecting linguistic classification rules by two-
objective genetic algorithms. In: Proc. of SMC 1995, pp. 1410–1415 (1995)
76. Ishibuchi, H., Murata, T., Turksen, I.B.: Single-objective and two-objective genetic al-
gorithms for selecting linguistic rules for pattern classification problems. Fuzzy Sets
and Syst. 89, 135–150 (1997)
77. Ishibuchi, H., Nakashima, T.: Improving the performance of fuzzy classifier systems
for pattern classification problems with continuous attributes. IEEE Trans. on Ind. Elec-
tron 46, 157–168 (1999)
78. Ishibuchi, H., Nakashima, T., Murata, T.: Performance evaluation of fuzzy classifier
systems for multi-dimensional pattern classification problems. IEEE Trans. on Syst.
Man and Cybern. - Part B 29, 601–618 (1999)
79. Ishibuchi, H., Nakashima, T., Murata, T.: Three-objective genetics-based machine
learning for linguistic rule extraction. Inf. Sci. 136, 109–133 (2001)
80. Ishibuchi, H., Nakashima, T., Nii, M.: Classification and modeling with linguistic in-
formation granules: Advanced approaches to linguistic data mining. Springer, Berlin
(2004)
81. Ishibuchi, H., Namba, S.: Evolutionary multiobjective knowledge extraction for high-
dimensional pattern classification problems. In: Yao, X., Burke, E.K., Lozano, J.A.,
Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E., Tiňo, P., Kabán, A.,
Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 1123–1132. Springer, Hei-
delberg (2004)
82. Ishibuchi, H., Nojima, Y.: Evolutionary multiobjective optimization for the design of
fuzzy rule-based ensemble classifiers. Int. J. of Hybrid Intell. Syst. 3, 129–145 (2006)
83. Ishibuchi, H., Nojima, Y., Kuwajima, I.: Finding simple fuzzy classification systems
with high interpretability through multiobjective rule selection. In: Gabrys, B., Howlett,
R.J., Jain, L.C. (eds.) KES 2006. LNCS, vol. 4252, pp. 86–93. Springer, Heidelberg
(2006)
84. Ishibuchi, H., Nojima, Y.: Optimization of scalarizing functions through evolutionary
multiobjective optimization. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata,
T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 51–65. Springer, Heidelberg (2007)
85. Ishibuchi, H., Nojima, Y.: Analysis of interpretability-accuracy tradeoff of fuzzy sys-
tems by multiobjective fuzzy genetics-based machine learning. Int. J. of Approx. Rea-
son 44, 4–31 (2007)
86. Ishibuchi, H., Nozaki, K., Yamamoto, N., Tanaka, H.: Selecting fuzzy if-then rules for
classification problems using genetic algorithms. IEEE Trans. on Fuzzy Syst. 3, 260–
270 (1995)
87. Ishibuchi, H., Tsukamoto, N., Nojima, Y.: Evolutionary many-objective optimization: A short review. In: Proc. of CEC 2008, pp. 2424–2431 (2008)
88. Ishibuchi, H., Yamamoto, T.: Evolutionary multiobjective optimization for generating
an ensemble of fuzzy rule-based classifiers. In: Cantú-Paz, E., Foster, J.A., Deb, K.,
Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman,
M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska,
N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2723, pp. 1077–1088.
Springer, Heidelberg (2003)
89. Ishibuchi, H., Yamamoto, T.: Fuzzy rule selection by multi-objective genetic lo-
cal search algorithms and rule evaluation measures in data mining. Fuzzy Sets and
Syst. 141, 59–88 (2004)
90. Ishibuchi, H., Yamamoto, T.: Rule weight specification in fuzzy rule-based classifica-
tion systems. IEEE Trans. on Fuzzy Syst. 13, 428–435 (2005)
91. Ishibuchi, H., Yamamoto, T., Nakashima, T.: Hybridization of fuzzy GBML approaches
for pattern classification problems. IEEE Trans. on Syst. Man and Cybern. - Part B 35,
359–365 (2005)
92. Jang, J.S.R.: ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. on
Syst. Man and Cybern. 23, 665–685 (1993)
93. Jaszkiewicz, A.: On the computational efficiency of multiple objective metaheuristics:
The knapsack problem case study. Europ. J. of Oper. Res. 158, 418–433 (2004)
94. Jiménez, F., Gómez-Skarmeta, A.F., Roubos, H., Babuska, R.: Accurate, transparent,
and compact fuzzy models for function approximation and dynamic modeling through
multi-objective evolutionary optimization. In: Zitzler, E., Deb, K., Thiele, L., Coello
Coello, C.A., Corne, D.W. (eds.) EMO 2001. LNCS, vol. 1993, pp. 653–667. Springer,
Heidelberg (2001)
95. Jin, Y.: Fuzzy modeling of high-dimensional systems: Complexity reduction and inter-
pretability improvement. IEEE Trans. on Fuzzy Syst. 8, 212–221 (2000)
96. Jin, Y. (ed.): Knowledge incorporation in evolutionary computation. Springer, Berlin
(2005)
97. Jin, Y. (ed.): Multi-objective machine learning. Springer, Berlin (2006)
98. Jin, Y., Okabe, T., Sendhoff, B.: Neural network regularization and ensembling using
multi-objective evolutionary algorithms. In: Proc. of CEC 2004, pp. 1–8 (2004)
99. Jin, Y., Sendhoff, B., Koerner, E.: Evolutionary multi-objective optimization for simul-
taneous generation of signal-type and symbol-type representations. In: Coello Coello,
C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 752–
766. Springer, Heidelberg (2005)
100. Jin, Y., von Seelen, W., Sendhoff, B.: On generating FC3 fuzzy rule systems from data
using evolution strategies. IEEE Trans. on Syst. Man and Cybern. - Part B 29, 829–845
(1999)
101. Karr, C.L.: Design of an adaptive fuzzy logic controller using a genetic algorithm. In:
Proc. of 4th ICGA, pp. 450–457 (1991)
102. Karr, C.L., Gentry, E.J.: Fuzzy control of pH using genetic algorithms. IEEE Trans. on
Fuzzy Syst. 1, 46–53 (1993)
103. Kaya, M.: Multi-objective genetic algorithm based approaches for mining optimized
fuzzy association rules. Soft Comput. 10, 578–586 (2006)
104. Khare, V., Yao, X., Deb, K.: Performance scaling of multi-objective evolutionary algorithms. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.) EMO
2003, vol. 2632, pp. 376–390. Springer, Heidelberg (2003)
105. Kim, H.S., Roschke, P.N.: Fuzzy control of base-isolation system using multi-objective
genetic algorithm. Comput-Aided Civil and Infrast. Eng. 21, 436–449 (2006)
106. Kosko, B.: Fuzzy systems as universal approximators. In: Proc of FUZZ-IEEE 1992,
pp. 1153–1162 (1992)
107. Kupinski, M.A., Anastasio, M.A.: Multiobjective genetic optimization of diagnostic
classifiers with implications for generating receiver operating characteristic curve. IEEE
Trans. on Med. Imaging 18, 675–685 (1999)
108. Li, H., Zhang, Q.F.: A multiobjective differential evolution based on decomposition for
multiobjective optimization with variable linkages. In: Runarsson, T.P., Beyer, H.-G.,
Burke, E.K., Merelo-Guervós, J.J., Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS,
vol. 4193, pp. 583–592. Springer, Heidelberg (2006)
109. Liu, H., Motoda, H.: Feature selection for knowledge discovery and data mining.
Kluwer Academic Publishers, Norwell (1998)
110. Llora, X., Goldberg, D.E.: Bounding the effect of noise in multiobjective learning clas-
sifier systems. Evol. Comput. 11, 278–297 (2003)
111. Mendel, J.M.: Fuzzy-logic systems for engineering - A tutorial. In: Proceedings of
IEEE, vol. 83, pp. 345–377 (1995)
112. Miettinen, K.: Nonlinear multiobjective optimization. Kluwer, Boston (1998)
113. Mikut, R., Jakel, J., Groll, L.: Interpretability issues in data-based learning of fuzzy
systems. Fuzzy Sets and Syst. 150, 179–197 (2005)
114. Muhlenbein, H., Schomisch, M., Born, J.: The parallel genetic algorithm as function
optimizer. Parallel. Comput. 17, 619–632 (1991)
115. Nauck, D., Klawonn, F., Kruse, R.: Foundations of neuro-fuzzy systems. John Wiley &
Sons, New York (1997)
116. Nauck, D., Kruse, R.: Obtaining interpretable fuzzy classification rules from medical
data. Artif. Intell. in Med. 16, 149–169 (1999)
117. Nojima, Y., Ishibuchi, H.: Genetic rule selection with a multi-classifier coding scheme
for ensemble classifier design. Int. J. of Hybrid Intell. Syst. 4, 157–169 (2007)
118. Nojima, Y., Ishibuchi, H., Kuwajima, I.: Parallel distributed genetic fuzzy rule selection.
Soft Comput. (2008) (in press)
119. Oliveira, L.S., Morita, M., Sabourin, R.: Multi-objective genetic algorithms to create
ensemble of classifiers. In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E.
(eds.) EMO 2005. LNCS, vol. 3410, pp. 592–606. Springer, Heidelberg (2005)
120. Oliveira, L.S., Sabourin, R., Bortolozzi, F., Suen, C.Y.: Feature selection using multi-
objective genetic algorithms for handwritten digit recognition. In: Proc. of ICPR 2002,
pp. 568–571 (2002)
121. Oliveira, L.S., Sabourin, R., Bortolozzi, F., Suen, C.Y.: A methodology for feature se-
lection using multi-objective genetic algorithms for handwritten digit string recogni-
tion. Int. J. of Pattern Recognit. and Artif. Intell. 17, 903–930 (2003)
122. Oliveira, L.S., Sabourin, R., Bortolozzi, F., Suen, C.Y.: Feature selection for ensembles:
A hierarchical multi-objective genetic algorithm approach. In: Proc. of ICDAR 2003,
pp. 676–680 (2003)
123. Parodi, A., Bonelli, P.: A new approach to fuzzy classifier systems. In: Proc. of 5th
ICGA, pp. 223–230 (1993)
124. Pulkkinen, P., Koivisto, H.: Fuzzy classifier identification using decision tree and mul-
tiobjective evolutionary algorithms. Int. J. of Approx. Reason 48, 526–543 (2008)
125. Purshouse, R.C., Fleming, P.J.: Evolutionary many-objective optimization: An ex-
ploratory analysis. In: Proc. of CEC 2003, pp. 2066–2073 (2003)
126. Reynolds, A., de la Iglesia, B.: Rule induction using multi-objective metaheuristics:
Encouraging rule diversity. In: Proc of IJCNN 2006, pp. 6375–6382 (2006)
127. Reynolds, A.P., de la Iglesia, B.: Rule induction for classification using multi-objective
genetic programming. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T.
(eds.) EMO 2007. LNCS, vol. 4403, pp. 516–530. Springer, Heidelberg (2007)
128. Robic, T., Filipic, B.: DEMO: Differential evolution for multiobjective optimization.
In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS,
vol. 3410, pp. 520–533. Springer, Heidelberg (2005)
129. Rodriguez-Vazquez, K., Fonseca, C.M., Fleming, P.J.: Multiobjective genetic program-
ming: A nonlinear system identification application. In: Proc. of GP-97LB, pp. 207–212
(1997)
130. Roubos, H., Setnes, M.: Compact and transparent fuzzy models and classifiers through
iterative complexity reduction. IEEE Trans. on Fuzzy Syst. 9, 516–524 (2001)
131. Rumelhart, D.E., McClelland, J.L., PDP Research Group: Parallel distributed processing. MIT Press, Cambridge (1986)
132. Setnes, M., Babuska, R., Verbruggen, B.: Rule-based modeling: Precision and trans-
parency. IEEE Trans. on Syst. Man and Cybern. - Part C 28, 165–169 (1998)
133. Setnes, M., Roubos, H.: GA-based modeling and classification: Complexity and perfor-
mance. IEEE Trans. on Fuzzy Syst. 8, 509–522 (2000)
134. Setzkorn, C., Paton, R.C.: On the use of multi-objective evolutionary algorithms for the
induction of fuzzy classification rule systems. BioSyst. 81, 101–112 (2005)
135. Stewart, P., Stone, D.A., Fleming, P.J.: Design of robust fuzzy-logic control systems by
multi-objective evolutionary methods with hardware in the loop. Eng. Appl. of Artif.
Intell. 17, 275–284 (2004)
136. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling
and control. IEEE Trans. on Syst. Man and Cybern. 15, 116–132 (1985)
137. Thrift, P.: Fuzzy logic synthesis with genetic algorithms. In: Proc. of 4th ICGA, pp.
509–513 (1991)
138. Tsang, C.H., Kwong, S., Wang, H.L.: Genetic-fuzzy rule mining approach and evalu-
ation of feature selection techniques for anomaly intrusion detection. Pattern Recog-
nit. 40, 2373–2391 (2007)
139. Valenzuela-Rendon, M.: The fuzzy classifier system: A classifier system for continu-
ously varying variables. In: Proc. of 4th ICGA, pp. 346–353 (1991)
140. Wang, H., Kwong, S., Jin, Y., Wei, W., Man, K.F.: Agent-based evolutionary approach
for interpretable rule-based knowledge extraction. IEEE Trans. on Syst. Man and Cy-
bern. - Part C 35, 143–155 (2005)
141. Wang, H., Kwong, S., Jin, Y., Wei, W., Man, K.F.: Multi-objective hierarchical ge-
netic algorithm for interpretable fuzzy rule-based knowledge extraction. Fuzzy Sets
and Syst. 149, 149–186 (2005)
142. Wang, L.X., Mendel, J.M.: Generating fuzzy rules by learning from examples. IEEE
Trans. on Syst. Man and Cybern. 22, 1414–1427 (1992)
143. Wang, L.X., Mendel, J.M.: Fuzzy basis functions, universal approximation, and orthog-
onal least-squares learning. IEEE Trans. on Neural Netw. 3, 807–814 (1992)
144. Xing, Z.Y., Zhang, Y., Hou, Y.L., Jia, L.M.: On generating fuzzy systems based on
Pareto multi-objective cooperative coevolutionary algorithm. Int. J. of Control Autom.
and Syst. 5, 444–455 (2007)
145. Yen, J., Wang, L., Gillespie, G.W.: Improving the interpretability of TSK fuzzy models
by combining global learning and local learning. IEEE Trans. on Fuzzy Syst. 6, 530–
537 (1998)
146. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evolutionary algorithm. TIK-Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich (2001)
147. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: A comparative case
study and the strength Pareto approach. IEEE Trans. on Evol. Comput. 3, 257–271
(1999)
148. Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., da Fonseca, V.G.: Performance
assessment of multiobjective optimizers: an analysis and review. IEEE Trans. on Evol.
Comput. 7, 117–132 (2003)
Part III
Adaptive Solution Schemes
Exploring Hyper-heuristic Methodologies with
Genetic Programming
Edmund K. Burke, Mathew R. Hyde, Graham Kendall, Gabriela Ochoa, and Ender Ozcan
University of Nottingham, School of Computer Science, Jubilee Campus, Wollaton Road,
Nottingham, NG8 1BB, UK
e-mail: {ekb,mrh,gxk,gxo,exo}@cs.nott.ac.uk
John R. Woodward
University of Nottingham, 199 Taikang East Road, Ningbo 315100, China
e-mail: [email protected]
Corresponding author.
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 177–201.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
1 Introduction
Heuristics for search problems can be thought of as "rules of thumb" for algorithmic
problem solving [53]. They are not guaranteed to produce optimal solutions; rather,
the goal is to quickly generate good quality solutions. They are often used when
exact methods are unable to be employed in a feasible amount of time. Genetic
Programming is a method of searching a space of computer programs, and therefore
is an automatic way of producing programs. This chapter looks at the use of Genetic
Programming to automatically generate heuristics for a given problem domain. A
knowledge of Genetic Programming is assumed, and while a brief introduction is
given, readers unfamiliar with the methodology are referred to suitable tutorials and
textbooks.
1.2 Hyper-heuristics
The main feature of hyper-heuristics is that they search a space of heuristics rather
than a space of solutions directly. In this sense, they differ from most applications
of meta-heuristics, although, of course, meta-heuristics can be (and have been) used
as hyper-heuristics. The motivation behind hyper-heuristics is to raise the level of
generality at which search methodologies operate. Introductions to hyper-heuristics
can be found in [9, 53].
An important and very well-known observation, which guides much hyper-heuristic
research, is that different heuristics have different strengths and weaknesses. A key
idea is to use members of a set of known and reasonably understood heuristics to
either: (i) transform the state of a problem (in a constructive strategy), or (ii) perform
an improvement step (in a perturbative strategy). Such hyper-heuristics have been
successfully applied to bin-packing [54], personnel scheduling [13, 17], timetabling
[1, 13, 14, 15, 59], production scheduling [61], vehicle routing problems [52], and
cutting stock [58]. Most of the hyper-heuristic approaches incorporate a learning
mechanism to assist the selection of low-level heuristics during the solution pro-
cess. Several learning strategies have been studied such as reinforcement learning
[17, 46], Bayesian learning [45], learning classifier systems [54], and case based
reasoning [15]. Several meta-heuristics have been applied as search methodologies
upon the heuristic search space. Examples are tabu search [13, 14], genetic algo-
rithms [23, 29, 58, 59, 61], and simulated annealing [4, 18, 52]. This chapter fo-
cusses on Genetic Programming as a hyper-heuristic for generating heuristics, given
a set of heuristic components.
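The selection hyper-heuristics surveyed above share a common control loop: repeatedly choose a low-level heuristic, apply it, and update a learning mechanism. The following is a minimal sketch of that loop; the roulette-wheel scoring rule is an illustrative stand-in for the reinforcement learning schemes cited, not any specific published method, and the interface is an assumption.

```python
import random

def selection_hyper_heuristic(solution, evaluate, low_level_heuristics, iterations=1000):
    """Repeatedly pick a low-level heuristic, apply it, and reward it when it
    improves on the best cost seen so far (illustrative scoring rule)."""
    scores = {name: 1.0 for name in low_level_heuristics}
    best, best_cost = solution, evaluate(solution)
    for _ in range(iterations):
        # Roulette-wheel choice biased towards heuristics that have helped before.
        total = sum(scores.values())
        pick, acc = random.uniform(0, total), 0.0
        for name, heuristic in low_level_heuristics.items():
            acc += scores[name]
            if pick <= acc:
                break
        solution = heuristic(solution)
        cost = evaluate(solution)
        if cost < best_cost:
            best, best_cost = solution, cost
            scores[name] += 1.0                       # reinforce the successful heuristic
        else:
            scores[name] = max(0.1, scores[name] - 0.1)  # mild penalty, floored
    return best, best_cost
```

The heuristic search space here is the space of sequences of low-level heuristic applications; the hyper-heuristic never inspects the problem representation itself.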
There are numerous tutorials, introductory articles and text books on Genetic Pro-
gramming. See the series of books by Koza [38, 39, 40, 41] and the book by Banzhaf
et al. [5]. Also [42] and [50] are more recent introductory texts. Introductory articles
can also be found in most current textbooks on machine learning.
Genetic Programming can be employed as a hyper-heuristic. It can operate on
a set of terminals and functions at the meta-level. Figure 1(a) shows a standard
hyper-heuristic framework presented in [9, 17]. Figure 1(b) shows how Genetic
Programming might be employed in this capacity. The base-level of a Genetic
Programming hyper-heuristic includes the concrete functions and terminals asso-
ciated with the problem. Across the domain barrier, abstract functions and termi-
nals in the meta-level can be mapped to concrete functions and terminals in the
base-level.
Fig. 1 (a) A standard hyper-heuristic framework: the hyper-heuristic chooses among low-level heuristics H1, ..., Hn across the domain barrier, while the problem domain supplies the representation, evaluation function, initial solution(s), etc. (b) A Genetic Programming hyper-heuristic: at the meta-level, expressions are evolved over a function set {f1, ..., fk} and a terminal set {T1, ..., Tl}; across the domain barrier, the problem domain supplies the fitness function, initial solutions, etc.
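The mapping sketched in Figure 1(b) can be made concrete by interpreting an evolved expression tree against bindings of terminal names to base-level values. This is a minimal sketch; the particular function set and the terminal names net_gain and age are illustrative assumptions, not taken from the chapter.

```python
import operator

# Abstract function set at the meta-level: each name used in evolved trees
# maps to a concrete implementation and its arity (illustrative choices).
FUNCTIONS = {
    "add": (operator.add, 2),
    "mul": (operator.mul, 2),
    "max": (max, 2),
}

def evaluate_tree(node, terminals):
    """Interpret an expression tree: a node is either a terminal name (a string
    bound to a concrete base-level value) or a tuple (function_name, child, ...)."""
    if isinstance(node, str):
        return terminals[node]
    fn, arity = FUNCTIONS[node[0]]
    children = node[1:]
    assert len(children) == arity, "arity mismatch in evolved tree"
    return fn(*(evaluate_tree(c, terminals) for c in children))

# A candidate variable-scoring heuristic evolved at the meta-level:
score_tree = ("add", "net_gain", ("mul", "age", "net_gain"))
```

Crossover and mutation operate on such trees at the meta-level; only the interpreter crosses the domain barrier.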
In this section, we examine a number of issues concerning the use and suitability
of Genetic Programming to generate heuristics. A fundamental point concerning
the scalability of this method is made. As this methodology borrows ideas from
human-designed heuristics, which are then used as primitives to construct the search
space of Genetic Programming, we are in the enviable position of being able to
guarantee heuristics which perform at least as well as human-designed heuristics.
Finally, we outline the basic approach to using Genetic Programming to generate
heuristics.
3 Case Studies
We examine two examples in detail in order to illustrate the basic methodology (gen-
erating heuristics for Boolean satisfiability and online bin packing). In both cases,
we describe the problem, a number of currently used human created heuristics, and
some design questions about using Genetic Programming to generate heuristics.
In the first example, evolving a local search heuristic for the Boolean satisfiability
problem, a number of the design decisions (e.g. what variables are needed to ex-
press the problem, and a framework in which to express possible heuristics) seem
reasonably straightforward, as similar choices were made by two independent au-
thors [3, 25]. In the second example, these choices appear to be a little more difficult.
The aim of this section, therefore, is to take the reader step by step through the pro-
cess and raise a number of issues that will arise during the steps needed to apply
Genetic Programming to generate heuristics. These domains have been chosen as
they are well known problems, which both have published results concerning the
automatic generation of heuristics.
Fukunaga [24] lists a number of well known local search heuristics which have been
proposed in the SAT literature.
• GSAT selects a variable from the formula with the highest net gain. Ties are
decided randomly.
• HSAT is the same as GSAT, but it decides ties in favour of maximum age, where
the age of a variable is the total number of bit-flips performed since that variable
was last inverted.
• GWSAT(p) (also known as “GSAT with random walk”) randomly selects a vari-
able with probability p in a randomly selected broken clause; otherwise, it is the
same as GSAT.
• Walksat(p) picks a broken clause, and if any variable in the clause has a negative
gain of 0, then it selects one of these to be flipped. Otherwise, it selects a random
variable in the clause to flip with probability p, and with probability (1 − p) it
selects the variable in the clause with minimal negative gain (breaking ties randomly).
Other heuristics, such as, Novelty, Novelty+ and R-Novelty are also discussed
in [24].
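As an illustration, the GSAT selection rule above can be sketched in Python. The clause representation (lists of signed integers, where +i denotes variable i and −i its negation) and the function names are ours for illustration, not taken from [24]:

```python
import random

def net_gain(clauses, assignment, var):
    """Net gain of flipping var: change in the number of satisfied clauses.
    A clause is a list of signed ints: +i means variable i, -i its negation."""
    def n_sat(asgn):
        return sum(any(asgn[abs(lit)] == (lit > 0) for lit in c) for c in clauses)
    flipped = dict(assignment)
    flipped[var] = not flipped[var]
    return n_sat(flipped) - n_sat(assignment)

def gsat_select(clauses, assignment, rng=random):
    """GSAT: pick a variable with the highest net gain, breaking ties randomly."""
    gains = {v: net_gain(clauses, assignment, v) for v in assignment}
    best = max(gains.values())
    return rng.choice([v for v, g in gains.items() if g == best])
```

For example, on the formula (x1 ∨ x2) ∧ (¬x1 ∨ x2) with x1 = True and x2 = False, flipping x2 satisfies both clauses, so GSAT selects x2.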
Fukunaga [24] first examines the original local search heuristic GSAT, and also its
many variants (including GSAT with random walk, and Walksat). Then, a template
is identified which succinctly describes the most widely used SAT local search al-
gorithms. This framework is also adopted by Bader-El-Den and Poli [3]. In this
template, the set of Boolean variables are initially given random truth assignments.
Repeatedly, a variable is chosen according to a variable selection heuristic and its
value is inverted. This new assignment of values is then tested to see if it satisfies the
Boolean expression. This is repeated until some cut off counter is reached. Notice
that in this framework, only a single Boolean variable is selected to be inverted. An
interesting alternative would be for the variable selection heuristic to return, not a
single variable, but a subset of variables.
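This template can be sketched as follows; `sat_local_search` and its arguments are illustrative names of our own, with the variable selection heuristic passed in as a pluggable parameter:

```python
import random

def sat_local_search(clauses, n_vars, select_var, max_flips=5000, rng=random):
    """Generic SAT local-search template: give the variables a random truth
    assignment, then repeatedly invert the variable chosen by the selection
    heuristic until the formula is satisfied or a cut-off is reached."""
    assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    def satisfied():
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in c)
                   for c in clauses)
    for _ in range(max_flips):
        if satisfied():
            return assignment          # success within the cut-off
        var = select_var(clauses, assignment, rng)
        assignment[var] = not assignment[var]
    return None                        # cut-off counter reached
```

Any of the heuristics listed above (GSAT, HSAT, Walksat, or an evolved one) can be plugged in as `select_var`.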
Some heuristics are hybrid, in the sense that they are a combination of two existing
heuristics. The “composition” (or blending) of two heuristics is achieved by first
testing whether a condition holds: if the test is passed, heuristic1 is applied,
otherwise heuristic2. This composition operator therefore gives us a way to com-
bine already existing heuristics. An example of the testing condition may simply
be “(random number ≤ 0.2)”. Having identified a template for local search and a
method of identifying the utility of inverting a given variable, Fukunaga then de-
fined a language in which most of the previously human designed heuristics can be
described, but more importantly, it can also be used to describe new novel heuristics.
For a complete list of functions see [25].
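A minimal sketch of the composition operator described above, with illustrative stand-in heuristics of our own (not Fukunaga's actual primitives):

```python
import random

def compose(test, heuristic1, heuristic2):
    """Blend two variable-selection heuristics: apply heuristic1 when the
    test predicate holds for the current state, otherwise heuristic2."""
    def blended(clauses, assignment, rng):
        if test(clauses, assignment, rng):
            return heuristic1(clauses, assignment, rng)
        return heuristic2(clauses, assignment, rng)
    return blended

# Example: with probability 0.2 take a random-walk step, otherwise defer to
# some greedy heuristic (here a stand-in that picks the lowest-index variable).
random_walk = lambda c, a, r: r.choice(sorted(a))
greedy = lambda c, a, r: min(a)
h = compose(lambda c, a, r: r.random() <= 0.2, random_walk, greedy)
```

The resulting `h` has exactly the same call signature as its components, so compositions can themselves be composed.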
The fitness function works as follows. First, the heuristic is tested on 200 problem
instances consisting of 50 variables and 215 clauses. The heuristic is allowed 5000
inversions of the Boolean variables. If more than 130 of these local searches were
successful, then the heuristic is run on 400 problem instances consisting of 100
variables with 430 clauses. The heuristic is allowed 20000 inversions of the Boolean
variables. The idea of using smaller and larger problems, is that poor candidate
heuristics can be culled early on (very much like brood selection, reported in [5]).
It should be noted that the fitness function takes a large number of parameters,
and reasonable values for these should be arrived at with a little
experimentation.
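The two-stage evaluation can be sketched as below; `solve` is a stand-in for running one local search on one instance, and the instance counts, flip budgets and threshold of 130 are taken from the description above:

```python
def staged_fitness(heuristic, solve, small, large, threshold=130):
    """Two-stage fitness: evaluate on cheap small instances first, and only
    promote heuristics passing the threshold to the expensive large instances,
    so that poor candidates are culled early."""
    small_wins = sum(solve(heuristic, inst, max_flips=5000) for inst in small)
    if small_wins <= threshold:
        return small_wins                 # culled after the first stage
    large_wins = sum(solve(heuristic, inst, max_flips=20000) for inst in large)
    return small_wins + large_wins
```

In the setting described above, `small` would hold 200 instances of 50 variables and 215 clauses, and `large` 400 instances of 100 variables and 430 clauses.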
A number of examples of heuristics used in the online bin packing problem are
described below: In each case, if the item under consideration does not fit into an
existing bin, then the item is placed in a new bin.
• Best-Fit [51]. Puts the item in the fullest bin which can accommodate it.
• Worst-Fit [16]. Puts the item in the emptiest bin which can accommodate it.
• Almost-Worst-Fit [16]. Puts the item in the second emptiest bin.
• Next-Fit [36]. Puts the item in the last available bin.
• First-Fit [36]. Puts the item in the left-most bin which can accommodate it.
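Three of these placement rules can be sketched as follows, representing each open bin by its current fullness; the helper names are ours, and each function returns `None` when the item fits in no open bin (the case where a new bin is opened):

```python
def best_fit(item, bins, capacity):
    """Index of the fullest open bin that can still accommodate the item."""
    feasible = [i for i, fill in enumerate(bins) if capacity - fill >= item]
    return max(feasible, key=lambda i: bins[i]) if feasible else None

def worst_fit(item, bins, capacity):
    """Index of the emptiest open bin that can accommodate the item."""
    feasible = [i for i, fill in enumerate(bins) if capacity - fill >= item]
    return min(feasible, key=lambda i: bins[i]) if feasible else None

def first_fit(item, bins, capacity):
    """Index of the left-most open bin that can accommodate the item."""
    return next((i for i, fill in enumerate(bins)
                 if capacity - fill >= item), None)
```

On the scenario of Fig. 3 below (capacity 15, an item of size 2, and open bins of fullness 12, 13, 8, 9, 11, setting aside the fresh empty bin), these return bins 1, 2 and 0 respectively, matching the figure.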
It should, of course, be noted that this list of heuristics is not exhaustive. The
selection is simply intended to illustrate some of the currently available heuristics,
and provide a background against which we can build a framework. The reader is
referred to the following article if they are particularly interested in the domain of
online bin packing. Here the HARMONIC algorithms are discussed (which include
HARMONIC, REFINED HARMONIC, MODIFIED HARMONIC, MODIFIED
HARMONIC 2, and HARMONIC+1). All of these algorithms are shown to
be instances of a general class of algorithm, which they call SUPER HARMONIC.
In [10, 11, 12], heuristics are evolved for the online bin packing problem. In the first
paper [10], a number of existing heuristics are listed. Interestingly these heuristics
do not fit neatly into a single framework. In this paper, the decision was made to
apply the evolved heuristic to the bins and place the current item under considera-
tion, into the first bin which receives a positive score. Using this method of applying
heuristics to problem instances, a heuristic equivalent to the “first-fit” heuristic was
evolved. The “first-fit” heuristic places an item in the first bin into which it fits (the
order of the bins being the order in which they were opened). In this framework, the
[Fig. 3 illustration: Best-Fit → bin 1, Worst-Fit → bin 2, Almost-Worst-Fit → bin 3,
Next-Fit → bin 4, First-Fit → bin 0]
Fig. 3 The figure shows the chosen bin for a number of heuristics. The bin capacity is 15,
and the space remaining in the open bins (in order of index 0, 1, 2, 3, 4, 5) is 3, 2, 7, 6, 4, 15.
The current item to be packed has size 2. “Best-fit”, for example, would place the item in bin
1, leaving no space remaining. “First-fit” would place the item in bin 0, leaving
1 unit of space.
heuristic may not be evaluated on all of the bins when an item is being placed (i.e.
only the bins up until the bin that receives a positive score will be considered).
In [11], it was decided that the heuristic would be evaluated on all the open bins,
and the item placed in the bin that receives the maximum score. This has the ad-
vantage that the heuristic is allowed to examine all of the bins (and, therefore, has
more information available to it to make a potentially better decision). It also has the
disadvantage that it will take longer on average to apply, as it will, in general, ex-
amine more bins (though this aspect of the evolved heuristic’s performance was not
studied). In this framework, heuristics were evolved which outperformed the human
designed heuristic “best-fit”. The “best-fit” heuristic places an item in the bin which
has the least space remaining when the item is placed in the bin (i.e. it fits best in
that bin).
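The two application frameworks can be sketched side by side; `score` stands for an evolved heuristic function, and the helper names are illustrative, not those of [10, 11]:

```python
def apply_first_positive(score, item, bins, capacity):
    """Framework of [10]: scan the bins in order and place the item in the
    first feasible bin whose score is positive; later bins are never evaluated."""
    for i, fill in enumerate(bins):
        if capacity - fill >= item and score(item, fill, capacity) > 0:
            return i
    return None     # no bin chosen: open a new bin

def apply_max_score(score, item, bins, capacity):
    """Framework of [11]: score every feasible open bin and place the item
    in the bin receiving the maximum score."""
    feasible = [i for i, fill in enumerate(bins) if capacity - fill >= item]
    if not feasible:
        return None
    return max(feasible, key=lambda i: score(item, bins[i], capacity))
```

Under the first framework, any constantly positive score behaves like “first-fit”; under the second, a score that grows with bin fullness behaves like “best-fit”, illustrating why each framework can express one heuristic but not the other.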
The two search spaces created by these frameworks are very different. In the first
case, the “first-fit” heuristic can be expressed, but “best-fit” cannot. In the second
case, the “best-fit” heuristics can be expressed but “first-fit” cannot. The first frame-
work cannot express “best-fit”, as not all of the bins may be examined. That is, the
evaluation of the heuristic is terminated as soon as a positive score is obtained. The
second framework cannot express “first-fit” as a bin which receives a larger score
may exist after one which receives a positive score. That is, an earlier bin may re-
ceive a smaller positive score, but this is overridden when a larger score is obtained.
Further effort could be put into constructing a more general framework in which
both of these heuristics could be expressed.
So far, just two frameworks have been considered which could be used to ap-
ply heuristics to the online bin packing problem. There are many different ways a
heuristic could be applied.
• They can differ in the order in which the bins are examined. For example, left to
right, right to left, or some other order.
• They can differ in the order we start to examine the bins. For example, start at a
random bin and cycle through the bins until each bin has been examined, or start
at some distance from the last bin that received an item.
• They can differ in the score used to decide which bin is employed. For example,
place the item in the bin which got the second highest score; or alternatively,
place one item in the bin which gets the maximum score and the next item in the
bin which gets the minimum score, in effect switching between two placement
strategies.
There is also the question of where to place an item when there is a draw between
two bins (e.g. the item could be placed in a fresh bin, or it could be placed in a bin
using an existing human designed heuristic). The point is that there are plenty of
opportunities to design different ways of applying heuristic selection functions to
a problem domain. Therefore, instead of presenting Genetic Programming with a
single framework, it is possible to widen this and allow a number (or combination)
of different frameworks for Genetic Programming to explore. One interesting way
to tackle this would be to cooperatively co-evolve a population of heuristics and the
frameworks in which they are applied.
It is also worthwhile pointing out that a heuristic evolved under one framework
is unlikely to perform well under another framework, so a heuristic really consists
of two parts: the heuristic function and the framework describing how the heuristic
is applied to a problem instance. In Genetic Programming, we are usually just in-
terested in the function represented by a program, and the program does not need a
context (e.g. in the case of evolving electrical circuits, the program is the solution).
However, if we are evolving heuristics, we need to provide a context or method
of applying the Genetic Programming-program. This additional stage introduces a
considerable difference.
The question of which variables to use to describe the state of a problem instance is
also important, as these will form some of the “terminals” used in Genetic Programming.
In the first stages of this work, the authors used the following variables: S, the
size of the current item; C, the capacity of a bin (this is a constant for the problem);
and F, the fullness of a bin (i.e. the total cost of all of the items occupying
that bin).
It was later decided that the three variables could be replaced by two: S, the size of
the current item, and E (= C − F), the emptiness of a bin (i.e. how much space is
remaining in the bin, or how much more cost can be allocated to it before it
exceeds its capacity). These two variables are not as expressive as the first set, but are
expressive enough to produce human competitive heuristics. The argument is that
it is not the capacity or fullness of a bin which is the deciding factor of whether
to put an item in a bin, but the remaining capacity, or emptiness E, of a bin. In
fact, the capacity of a bin was fixed for the entire set of experiments, so could
be considered as a constant. In other words, the output of the heuristic based on
this pair of variables, could be semantically interpreted as how suitable is the cur-
rent bin, given its emptiness, and the size of the item we would like to place in
the bin.
This pair of variables can be replaced by a single variable, R (= E − S) the space
remaining in a bin if the current item were placed in the bin. The output of a heuristic
based solely on this single variable could not be interpreted as in the previous case,
but rather as the following question: If the current item were placed in the current
bin, what is the utility of the remaining space?
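The three terminal sets can be related by a small check. For a best-fit style score (one plausible choice of heuristic, not necessarily the evolved one), the substitutions E = C − F and R = E − S leave the score unchanged, which is the sense in which the smaller sets remain expressive enough:

```python
def score_full(S, C, F):
    """An illustrative best-fit style score in the original terminal set
    {S, C, F}: prefer bins with little space left after packing."""
    return -(C - F - S)

def score_se(S, E):
    """The same score written with the reduced set {S, E}, where E = C - F."""
    return -(E - S)

def score_r(R):
    """The same score written with the single variable R = E - S."""
    return -R
```

For every feasible combination of fullness and item size, the three formulations agree, so nothing is lost for this particular score by the reductions.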
So far, we have only considered variables describing the current item and current
bin. However, there are other variables which could be introduced. Other examples
of variables which could be stored are
• the item number (i.e. keep a counter of how many items have been packed so far)
• the minimum and maximum item size seen so far (as more items are packed,
these bounds will diverge)
• the average and standard deviation of item size seen so far (these could provide
a valuable source of statistical information on which to base future decisions).
All of this information can be made available to the evolving heuristic.
In [10], the function set {+, −, ×, %, abs, ≤} was used, where abs is the absolute
value operator and % is protected divide. There are a few points worth considering with
this chosen function set. Firstly, ≤ returns −1 or +1, rather than the 0 or 1 which is
normally associated with this relational operator. This was to satisfy the property of
closure, namely that the output of any function in the function set can be used as the input
of any function in the function set. Secondly, this function set is sufficient to express
some of the human designed heuristics described (namely “first-fit” and “best-fit”).
Protected divide (%) is often used in Genetic Programming because, if the denominator
is zero, ordinary division is undefined (i.e. its value tends to infinity). Typically,
protected divide returns 0 or 1 in this case. However, this choice does not reflect the idea
that the quotient could be a very large number. Thus, in [11], a much larger value was
returned.
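A sketch of the two non-standard primitives; the large value returned by protected divide is an arbitrary stand-in for the “much larger value” of [11], and the sign convention for a negative numerator is our own assumption:

```python
def protected_div(a, b, big=1e6):
    """Protected divide: instead of failing when b == 0, return a large signed
    value, reflecting that the quotient tends to infinity, rather than the
    conventional 0 or 1."""
    if b == 0:
        return big if a >= 0 else -big
    return a / b

def leq(a, b):
    """Relational operator closed over the numeric domain: returns -1/+1
    instead of 0/1, so its output is a usable input to every other function."""
    return 1 if a <= b else -1
```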
In [12], ≤ was removed from the function set as it was effectively redundant. This
is because, as the evolved heuristic function is enveloped in a loop which returns the
index of the maximum scoring bin, any test for ‘less than’ can be done by the loop.
The aim of this discussion is to outline the difficulty in choosing a function set for
the given problem domain.
The fitness function of equation 2 returns a value between 0 and 1, with 0 being the
best result, where all bins are filled completely, and 1 representing completely empty bins.
In the bin packing problem, there are many different solutions which use the same
number of bins. If the fitness function were simply the number of bins used, then there
would be a plateau in the search space that is easily reached, but difficult to escape
from [22]. Using equation 2 as a fitness function helps the evolutionary process by
differentiating between two solutions that use the same number of bins. The fitness
function proposed by Falkenauer rewards solutions more if some bins are nearly full
and others nearly empty, as opposed to all the bins being similarly filled.
The constant k in equation 2 determines how much of a premium is placed on
nearly full bins. The higher the value of k, the more attention will be given to the
almost filled bins at the expense of the more empty ones. A value of k = 2 was
deemed to be the best in [22] so this is the value we use here.
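Equation 2 itself is not reproduced in this excerpt. Based on the description above (0 for all bins full, 1 for all bins empty, and a premium k on nearly full bins), Falkenauer's measure would take the following form, which we present as an assumption rather than a quotation of the original:

```python
def bin_packing_fitness(fullness, capacity, k=2):
    """Presumed form of equation 2 (to be minimised): 0 when every bin is
    completely full, 1 when every bin is empty. The exponent k rewards
    solutions with some nearly full and some nearly empty bins."""
    n = len(fullness)
    return 1.0 - sum((f / capacity) ** k for f in fullness) / n

# Two solutions packing the same total into the same number of bins are now
# distinguished: the uneven packing scores better (lower) than the even one.
uneven = bin_packing_fitness([10, 2], 10)   # one full bin, one nearly empty
even = bin_packing_fitness([6, 6], 10)
```

With k = 2 the uneven packing above scores 0.48 against 0.64 for the even one, illustrating how the measure escapes the plateau created by simply counting bins.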
4 Literature Review
In this section, we briefly discuss the area, in order to give the reader a flavour of
what has been attempted to date. We include some work specifically using Genetic
Programming as a hyper-heuristic. We also include some work on other areas which
are similar in the sense that they use a meta-level in the learning system, and can
tackle multiple problems. We now briefly review two areas of the machine learning
literature which could also be considered in the context of hyper-heuristics. The first
is learning to learn, and the second is self-adaptation.
meta-heuristics, low level heuristics, or component parts of them. There are a num-
ber of heuristics used in this system, including heuristics which swap two or three
edges in the solution, and an iterative heuristic which executes another heuristic a
maximum of 1000 times unless an improvement is seen. The execution of the meta-
heuristic is a sequential execution of a list of heuristics and so generates a candidate
solution to the given problem from a random initial route. Tours whose lengths are
highly competitive with the best real-valued lengths from the literature are found
using this grammar based Genetic Programming.
In a series of papers, Burke et al. [10, 11, 12] examine the viability of using
Genetic Programming to evolve heuristics for the online bin packing problem. Given
a sequence of items, each must be placed into a bin in the order it arrived. At each
decision point, we are only given the size of the current item to be packed. In [10],
an item is placed into the first bin which receives a positive score according to
the evolved heuristic. Thus, the heuristic may not be evaluated for all bins, as it is
terminated as soon as a positive score is obtained. This approach produces a heuristic
which performs better than the human designed “first-fit” heuristic.
In [11], a similar approach is used. However, this time, the heuristic is allowed
to examine all bins, and the item is placed in the bin which receives the maxi-
mum score. This produces a heuristic which is competitive with the human designed
heuristic “best-fit”. The difference between these two approaches, illustrates that the
framework to evaluate the heuristics is a critical component of the overall system.
In [11], the performance of heuristics on general and specialised problem classes
is examined. They show that when one problem class is more general than another,
the heuristic evolved on the more general class is more robust, but performs
less well than the specialised heuristic on the specialised class of problem. This is,
intuitively, what one would expect.
In [12], evolved heuristics are applied to much larger problem instances than they
were trained on, but as the larger instances come from the same class as the smaller
training instances, performance does not deteriorate and indeed, the approach con-
sistently outperforms the human designed best-fit heuristic. The paper makes the
important distinction between the nature of search spaces associated with direct and
indirect methods. With direct methods, the size of the solution necessarily grows
with the size of the problem instance, resulting in combinatorial explosion, for ex-
ample, when the search space is a permutation. However, when the search space
consists of programs or heuristics, the size of a program to solve a given class of
problem is fixed as it is a generalisation of the solution to a class of problem (i.e. the
solution to a class of problem is independent of the size of an instance).
Drechsler et al. [19], instead of directly evolving a solution, use Genetic Pro-
gramming to develop a heuristic that is applied to the problem instance. Thus the
typically large run-times associated with evolutionary runs have to be invested only
once in the learning phase. The technique is applied to a problem of minimising
Binary Decision Diagrams. They state that standard evolutionary techniques cannot
be applied due to their large runtime. The best known algorithms used for variable
ordering are exponential in time, thus heuristics are used. The heuristics which are
developed by the designer often fail for specific classes of circuits. Thus it would
194 E.K. Burke et al.
be beneficial if the heuristics could learn from previous examples. An earlier pa-
per is referred to where heuristics are learnt using a genetic algorithm [20], but it
is pointed out that there are problems using a fixed length encoding to represent
heuristics. Experiments show that high quality results are obtained that outperform
previous methods, while keeping low run-times.
Fukunaga [24, 25] examines the problem domain of Boolean satisfiability (SAT).
He shows that a number of well-known local search algorithms are combinations of
variable selection primitives, and he introduces CLASS (Composite heuristic Learn-
ing Algorithm for SAT Search), an automated heuristic discovery system which
generates variable selection heuristic functions. The learning task, therefore, is de-
signing a variable selection heuristic as a meta-level optimisation problem.
Most of the standard SAT local search procedures can be described using the
same template, which repeatedly chooses a variable to invert, and calculates the
utility in doing so. Fukunaga identifies a number of common primitives used in
human designed heuristics, e.g. the gain of flipping a variable (i.e. the increase in the
number of satisfied clauses in the formula) or the age of a variable (i.e. how long since it
was last flipped). He states that “it appears human researchers can readily identify
interesting primitives that are relevant to variable selection, the task of combining
these primitives into composite variable selection heuristics may benefit from au-
tomation”. This, of course, is particularly relevant for Genetic Programming.
In the CLASS language, which was shown to be able to express human designed
heuristics, a composition operator is used which takes two heuristics and combines
them using a conditional if statement. The intuition behind this operator is that the
resulting heuristic blends the behaviour of the two component heuristics. The impor-
tance of this composition operator is that it maintains the convergence properties of
the individual heuristics, which is not true if Genetic Programming operators were
used. CLASS successfully generates a new variable selection heuristic, which is
competitive with the best-known GSAT/Walksat-based algorithms. All three learnt
heuristics were shown to scale and generalise well on larger random instances; gen-
eralisation to other problem classes varied.
Geiger et al. [28] present an innovative approach called SCRUPLES (schedul-
ing rule discovery and parallel learning system) which is capable of automatically
discovering effective dispatching rules. The claim is made that this is a significant
step beyond current applications of artificial intelligence to production scheduling,
which are mainly based on learning to select a given rule from among a number
of candidates rather than identifying new and potentially more effective rules. The
rules discovered are competitive with those in the literature. They state that a re-
view of the literature shows no existing work where priority dispatching rules are
discovered through search. They employ Genetic Programming, as each dispatching
rule is viewed as a program. They point out that Genetic Programming has a
key advantage over more conventional techniques such as genetic algorithms and
neural networks, which deal with fixed sized data structures: Genetic Programming
can discover rules of varying length, and for many problems of interest, such as
scheduling problems, the complexity of an algorithm which will produce the
correct solution is not known a priori. The learning system has the ability to learn the
best dispatching rule to solve the single unit capacity machine-scheduling problem.
For the cases where no dispatching rules produced optimal solutions, the learning
system discovers rules that perform no worse than the known rules.
Stephenson et al. [57] apply Genetic Programming to optimise priority or cost
functions associated with two compiler heuristics: predicated hyperblock formation
(i.e. branch removal via predication) and register allocation. Put simply, priority
functions prioritise the options available to a compiler algorithm. Stephenson et al. [57]
state “Genetic Programming is eminently suited to optimising priority functions
because they are best represented as executable expressions”. A caching strategy is
a priority function that determines which program memory locations to assign to
cache, in order to minimise the number of times the main memory must be
accessed. The human designed “least recently used” priority function is outperformed
by results obtained by Genetic Programming. They make the point that by evolv-
ing compiler heuristics, and not the applications themselves, we need only apply
our process once, which is in contrast to an approach using genetic algorithms. In
addition they emphasise that compiler writers have to tediously fine tune priority
functions to achieve suitable performance, whereas with this method, this is effec-
tively automated.
but also all of the weaknesses. In contrast, machine generated heuristics will have
their own strengths and weaknesses. Thus, as one of the motives of hyper-heuristics
is to combine heuristics, this would offer a method where manually and automati-
cally designed heuristics can be used side by side. It may also be possible to evolve
heuristics specifically to complement human designed heuristics in a hyper-heuristic
context, where an individual heuristic does not need to be good on its own, but is a
good team player in the environment of the other heuristics. Again this is another
example of cooperation between humans and computers.
References
1. Asmuni, H., Burke, E.K., Garibaldi, J.M., Mccollum, B.: A novel fuzzy approach to
evaluate the quality of examination timetabling. In: Proceedings of the 6th International
Conference on the Practice and Theory of Automated Timetabling, pp. 82–102 (2006)
2. Bäck, T.: An overview of parameter control methods by self-adaption in evolutionary
algorithms. Fundam. Inf. 35(1-4), 51–66 (1998)
3. Bader-El-Den, M.B., Poli, R.: Generating SAT local-search heuristics using a GP hyper-
heuristic framework. In: Monmarché, N., Talbi, E.-G., Collet, P., Schoenauer, M., Lutton,
E. (eds.) EA 2007. LNCS, vol. 4926, pp. 37–49. Springer, Heidelberg (2008)
4. Bai, R., Kendall, G.: An investigation of automated planograms using a simulated an-
nealing based hyper-heuristic. In: Ibaraki, T., Nonobe, K., Yagiura, M. (eds.) Metaheuris-
tics: Progress as Real Problem Solver. Operations Research/Computer Science Interfaces
Series, vol. 32, pp. 87–108. Springer, Heidelberg (2005)
5. Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic Programming – An In-
troduction; On the Automatic Evolution of Computer Programs and its Applications.
Morgan Kaufmann, San Francisco (1998)
6. Battiti, R.: Reactive search: Toward self–tuning heuristics. In: Rayward-Smith, V.J., Os-
man, I.H., Reeves, C.R., Smith, G.D. (eds.) Modern Heuristic Search Methods, pp. 61–
83. John Wiley & Sons Ltd, Chichester (1996)
7. Battiti, R., Brunato, M.: Reactive search: Machine learning for memory-based heuristics.
In: Gonzalez, T.F. (ed.) Approximation Algorithms and Metaheuristics, ch. 21, pp. 1–17.
Taylor and Francis Books/CRC Press, Washington (2007)
8. Birattari, M.: The problem of tuning metaheuristics as seen from a machine learning
perspective. Ph.D. thesis, Universite Libre de Bruxelles (2004)
9. Burke, E.K., Hart, E., Kendall, G., Newall, J., Ross, P., Schulenburg, S.: Hyper-
heuristics: An emerging direction in modern search technology. In: Glover, F., Kochen-
berger, G. (eds.) Handbook of Metaheuristics, pp. 457–474. Kluwer, Dordrecht (2003)
10. Burke, E.K., Hyde, M.R., Kendall, G.: Evolving bin packing heuristics with genetic pro-
gramming. In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós, J.J., Whit-
ley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 860–869. Springer, Heidel-
berg (2006)
11. Burke, E.K., Hyde, M.R., Kendall, G., Woodward, J.: Automatic heuristic generation
with genetic programming: evolving a jack-of-all-trades or a master of one. In: Thierens,
D., et al. (eds.) Proceedings of the 9th annual conference on Genetic and evolutionary
computation GECCO 2007, vol. 2, pp. 1559–1565. ACM Press, London (2007)
12. Burke, E.K., Hyde, M.R., Kendall, G., Woodward, J.R.: The scalability of evolved on line
bin packing heuristics. In: Srinivasan, D., Wang, L. (eds.) 2007 IEEE Congress on Evo-
lutionary Computation, pp. 2530–2537. IEEE Computational Intelligence Society/IEEE
Press, Singapore (2007)
Exploring Hyper-heuristic Methodologies with Genetic Programming 199
13. Burke, E.K., Kendall, G., Soubeiga, E.: A tabu-search hyperheuristic for timetabling and
rostering. J. of Heuristics 9(6), 451–470 (2003)
14. Burke, E.K., McCollum, B., Meisels, A., Petrovic, S., Qu, R.: A graph-based hyper-
heuristic for educational timetabling problems. Eur. J. of Oper. Res. 176, 177–192 (2007)
15. Burke, E.K., Petrovic, S., Qu, R.: Case based heuristic selection for timetabling prob-
lems. J. of Sched. 9(2), 115–132 (2006)
16. Coffman Jr., E.G., Galambos, G., Martello, S., Vigo, D.: Bin packing approximation
algorithms: Combinatorial analysis. In: Du, D.Z., Pardalos, P.M. (eds.) Handbook of
Combinatorial Optimization, pp. 151–207. Kluwer, Dordrecht (1998)
17. Cowling, P., Kendall, G., Soubeiga, E.: A hyperheuristic approach to scheduling a sales
summit. In: Burke, E., Erben, W. (eds.) PATAT 2000. LNCS, vol. 2079, pp. 176–190.
Springer, Heidelberg (2001) (selected papers)
18. Dowsland, K.A., Soubeiga, E., Burke, E.K.: A simulated annealing hyper-heuristic for
determining shipper sizes. Eur. J. of Oper. Res. 179(3), 759–774 (2007)
19. Drechsler, N., Schmiedle, F., Grosse, D., Drechsler, R.: Heuristic learning based on
genetic programming. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tetamanzi,
A.G.B., Langdon, W.B. (eds.) EuroGP 2001. LNCS, vol. 2038, pp. 1–10. Springer, Hei-
delberg (2001)
20. Drechsler, R., Becker, B.: Learning heuristics by genetic algorithms. In: ASP-DAC 1995:
Proceedings of the 1995 conference on Asia Pacific design automation (CD-ROM), p.
53. ACM, New York (1995)
21. Eiben, A.E., Hinterding, R., Michalewicz, Z.: Parameter control in Evolutionary Algo-
rithms. IEEE Trans. on Evol. Comput. 3(2), 124–141 (1999)
22. Falkenauer, E., Delchambre, A.: A genetic algorithm for bin packing and line balancing.
In: Proc. of the IEEE 1992 International Conference on Robotics and Automation, pp.
1186–1192 (1992)
23. Fang, H.L., Ross, P., Corne, D.: A promising hybrid GA/heuristic approach for open-
shop scheduling problems. In: European Conference on Artificial Intelligence (ECAI 1994),
pp. 590–594 (1994)
24. Fukunaga, A.S.: Automated discovery of composite SAT variable-selection heuristics. In:
AAAI/IAAI, pp. 641–648 (2002)
25. Fukunaga, A.S.: Automated discovery of local search heuristics for satisfiability testing.
Evol. Comput. 16(1), 31–61 (2008)
26. Gagliolo, M., Schmidhuber, J.: Learning dynamic algorithm portfolios. Ann. of Math.
and Artif. Intell. 47(3-4), 295–328 (2006)
27. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-
Completeness. Series of Books in the Mathematical Sciences. W. H. Freeman, New York
(1979)
28. Geiger, C.D., Uzsoy, R., Aytug, H.: Rapid modeling and discovery of priority dispatching
rules: An autonomous learning approach. J. of Sched. 9(1), 7–34 (2006)
29. Hart, E., Ross, P., Nelson, J.: Solving a real-world problem using an evolving heuristi-
cally driven schedule builder. Evol. Comput. 6(1), 61–80 (1998)
30. Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations and Applications. Morgan
Kaufmann / Elsevier (2005)
31. Huberman, B.A., Lukose, R.M., Hogg, T.: An economics approach to hard computational
problems. Sci. 275, 51–54 (1997)
32. Hutter, F., Hamadi, Y., Hoos, H.H., Leyton-Brown, K.: Performance prediction and auto-
mated tuning of randomized and parametric algorithms. In: Benhamou, F. (ed.) CP 2006.
LNCS, vol. 4204, pp. 213–228. Springer, Heidelberg (2006)
200 E.K. Burke et al.
33. Hutter, F., Hoos, H.H., Stützle, T.: Automatic algorithm configuration based on local
search. In: AAAI, pp. 1152–1157. AAAI Press, Menlo Park (2007)
34. Jakob, W.: HyGLEAM – an approach to generally applicable hybridization of evolution-
ary algorithms. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas,
J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 527–536. Springer, Hei-
delberg (2002)
35. Jakob, W.: Towards an adaptive multimeme algorithm for parameter optimisation suiting
the engineers’ needs. In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós,
J.J., Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 132–141. Springer,
Heidelberg (2006)
36. Johnson, D., Demers, A., Ullman, J., Garey, M., Graham, R.: Worst-case performance
bounds for simple one-dimensional packing algorithms. SIAM J. on Comput. 3(4),
299–325 (1974)
37. Keller, R.E., Poli, R.: Linear genetic programming of parsimonious metaheuristics. In:
Srinivasan, D., Wang, L. (eds.) 2007 IEEE Congress on Evolutionary Computation, pp.
4508–4515. IEEE Computational Intelligence Society/IEEE Press, Singapore (2007)
38. Koza, J.R.: Genetic Programming: on the Programming of Computers by Means of Nat-
ural Selection. The MIT Press, Boston (1992)
39. Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. The
MIT Press, Cambridge (1994)
40. Koza, J.R., Bennett III, F.H., Andre, D., Keane, M.A.: Genetic Programming III: Dar-
winian Invention and Problem solving. Morgan Kaufmann, San Francisco (1999)
41. Koza, J.R., Keane, M.A., Streeter, M.J., Mydlowec, W., Yu, J., Lanza, G.: Genetic Pro-
gramming IV: Routine Human-Competitive Machine Intelligence (Genetic Program-
ming). Springer, Heidelberg (2003)
42. Koza, J.R., Poli, R.: Genetic programming. In: Burke, E.K., Kendall, G. (eds.) Search
Methodologies: Introductory Tutorials in Optimization and Decision Support Tech-
niques, pp. 127–164. Springer, Boston (2005)
43. Krasnogor, N., Gustafson, S.: A study on the use of ’self-generation’ in memetic algo-
rithms. Nat. Comput. 3(1), 53–76 (2004)
44. Krasnogor, N., Smith, J.E.: Emergence of profitable search strategies based on a simple
inheritance mechanism. In: Proceedings of the 2001 Genetic and Evolutionary Compu-
tation Conference, pp. 432–439. Morgan Kaufmann, San Francisco (2001)
45. Mockus, J.: Application of bayesian approach to numerical methods of global and
stochastic optimization. J. of Glob. Optim. 4(4), 347–366 (1994)
46. Nareyek, A.: Choosing search heuristics by non-stationary reinforcement learning. In:
Resende, M.G.C., de Sousa, J.P. (eds.) Metaheuristics: Computer Decision-Making, ch.
9, pp. 523–544. Kluwer, Dordrecht (2003)
47. Ong, Y.S., Keane, A.J.: Meta-lamarckian learning in memetic algorithms. IEEE Trans.
on Evol. Comput. 8, 99–110 (2004)
48. Ong, Y.S., Lim, M.H., Zhu, N., Wong, K.W.: Classification of adaptive memetic algo-
rithms: a comparative study. IEEE Trans. on Syst. Man and Cybern. Part B 36(1), 141–
152 (2006)
49. Ozcan, E., Bilgin, B., Korkmaz, E.E.: A comprehensive survey of hyperheuristics. Intell.
Data Anal. 12(1), 3–23 (2008)
50. Poli, W.B.R., Langdon, N.F.M.: A Field Guide to Genetic Programming. Lulu Enter-
prises, UK (2008)
51. Rhee, W.T., Talagrand, M.: On line bin packing with items of random size. Math. Oper.
Res. 18, 438–445 (1993)
Exploring Hyper-heuristic Methodologies with Genetic Programming 201
52. Ropke, S., Pisinger, D.: A unified heuristic for a large class of vehicle routing problems
with backhauls. Eur. J. of Oper. Res. 171(3), 750–775 (2006)
53. Ross, P.: Hyper-heuristics. In: Burke, E.K., Kendall, G. (eds.) Search Methodologies:
Introductory Tutorials in Optimization and Decision Support Techniques, ch. 17, pp.
529–556. Springer, Heidelberg (2005)
54. Ross, P., Marin-Blazquez, J.G., Schulenburg, S., Hart, E.: Learning a procedure that
can solve hard bin-packing problems: A new ga-based approach to hyper-heuristics. In:
Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2003,
pp. 1295–1306. Springer, Heidelberg (2003)
55. Seiden, S.S.: On the online bin packing problem. J. ACM 49(5), 640–671 (2002)
56. Smith, J.E.: Co-evolving memetic algorithms: A review and progress report. IEEE Trans.
in Syst. Man and Cybern. Part B 37(1), 6–17 (2007)
57. Stephenson, M., O’Reilly, U., Martin, M., Amarasinghe, S.: Genetic programming ap-
plied to compiler heuristic optimisation. In: Proceedings of the Eur. Conference on Ge-
netic Programming, pp. 245–257. Springer, Heidelberg (2003)
58. Terashima-Marin, H., Flores-Alvarez, E.J., Ross, P.: Hyper-heuristics and classifier sys-
tems for solving 2D-regular cutting stock problems. In: Beyer, H.G., O’Reilly, U.M.
(eds.) Proceedings of Genetic and Evolutionary Computation Conference, GECCO 2005,
Washington DC, USA, June 25-29, pp. 637–643. ACM, New York (2005)
59. Terashima-Marin, H., Ross, P., Valenzuela-Rendon, M.: Evolution of constraint satis-
faction strategies in examination timetabling. In: Proc. of the Genetic and Evolution-
ary Computation Conf. GECCO 1999, pp. 635–642. Morgan Kaufmann, San Francisco
(1999)
60. Thrun, S., Pratt, L.: Learning to learn: Introduction and overview. In: Thrun, S., Pratt, L.
(eds.) Learning To Learn. Kluwer Academic Publishers, Dordrecht (1998)
61. Vazquez-Rodriguez, J.A., Petrovic, S., Salhi, A.: A combined meta-heuristic with hyper-
heuristic approach to the scheduling of the hybrid flow shop with sequence dependent
setup times and uniform machines. In: Proceedings of the 3rd Multidisciplinary Inter-
national Scheduling Conference: Theory and Applications (MISTA 2007), pp. 506–513
(2007)
62. Wah, B.W., Ieumwananonthachai, A.: Teacher: A genetics-based system for learning
and for generalizing heuristics. In: Yao, X. (ed.) Evol. Comput., pp. 124–170. World
Scientific Publishing Co. Pte. Ltd, Singapore (1999)
63. Wah, B.W., Ieumwananonthachai, A., Chu, L.C., Aizawa, A.: Genetics-based learning
of new heuristics: Rational scheduling of experiments and generalization. IEEE Trans.
on Knowl. and Data Eng. 7(5), 763–785 (1995)
Adaptive Constraint Satisfaction: The Quickest
First Principle
Abstract. The choice of a particular algorithm for solving a given class of constraint
satisfaction problems is often confounded by the exceptional behaviour of algorithms. One
method of reducing the impact of this exceptional behaviour is to adopt an adaptive
philosophy to constraint satisfaction problem solving. In this paper we describe one
such adaptive algorithm, based on the principle of chaining. It is designed to avoid
the phenomenon of exceptionally hard problem instances. Our algorithm shows how
the speed of more naïve algorithms can be utilised, safe in the knowledge that the
exceptional behaviour can be bounded. Our work clearly demonstrates the potential
benefits of the adaptive approach and opens a new front of research for the constraint
satisfaction community.
1 Introduction
Constraint Satisfaction Problems occur in many areas of everyday life, ranging
from timetabling and transport planning to configuration problems and document
layout design. In all cases, the notion of a Constraint Satisfaction Problem (CSP) is
characterised by the need to assign values to elements of the problem instance,
these values coming from a finite set of possibilities and subject to a set of rules or
constraints [31, 23].
Once a CSP has been identified, there is a whole host of problem solving tech-
niques which have been developed for solving it [23]. The most basic of these is
the simple backtracking algorithm, but more sophisticated algorithms such as look-
ahead approaches have been shown to be highly effective [13] and are commonly
used in commercial software libraries such as ILOG Solver [22]. Heuristic search
has been applied to CSPs with success, e.g. see [14, 18, 32], and has also been
embedded in industrial packages such as iOpt [33].
James E. Borrett and Edward P.K. Tsang
University of Essex, Dept. of Computer Science, Wivenhoe Park, Colchester CO4 3SQ,
United Kingdom
e-mail: {jborrett,edward}@essex.ac.uk
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 203–230.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
204 J.E. Borrett and E.P.K. Tsang
More formally, the constraint satisfaction problem (CSP) can be defined in terms
of the triple <Z, D, C>, where Z is a set of variables, D is a mapping of the variables
in Z to domains, and C is a set of constraints [31]. Given this definition of a CSP,
there are many ways in which different types of problem can be classified in terms
of the elements of Z, D and C.¹ This classification may then be used as a basis for
the selection of a particular algorithm to solve that class of problems.
There is, however, a significant complication with the definition of CSP classes.
Sometimes particular instances of problems in a class may exhibit exceptional qual-
ities, in terms of the solving abilities of the chosen algorithm. One clear example
of this is the phenomenon of exceptionally hard problem instances [28], or EHPs as
they shall be referred to in this paper.
The example of EHPs is illustrative of the dilemma posed to the problem solver.
There is a clear choice between using a naïve algorithm, which is likely to solve
most instances very quickly but risks a catastrophic encounter with an EHP, and
choosing a more complex algorithm, which has a far lower probability of encountering
EHPs.² However, as is often the case, the use of more complex algorithms entails
an overhead.³
One way to overcome this dilemma is to adopt a more flexible approach, which
we describe as adaptive constraint satisfaction. The notion of adaptive
constraint satisfaction can be encapsulated in the following description:
Adaptive Constraint Satisfaction is a general philosophy for solving constraint satis-
faction problems. It aims to make use of the many algorithms and techniques available
by relaxing the commitment to a single algorithm when solving a particular CSP, al-
lowing for the active modification or switching of algorithms and models during the
search process.
A set of research projects⁴ was built upon this adaptive constraint satisfaction
framework. Adaptive constraint satisfaction is based on the belief that there is no “best
algorithm” in constraint satisfaction – different algorithms work for different prob-
lem instances – an idea that was later articulated as the “No Free Lunch Theorem”
[35, 36].⁵ Based on this belief, Kwan et al. [17, 30] developed a machine learning
framework for learning mappings from CSPs to algorithms and heuristics. Given
1 In [3] the issue of classifying different formulations of the same problem is considered.
2 In the context of complete algorithms, [25], [26] suggest it is likely that investing in more
complex algorithms, such as forward checking with conflict-directed backjumping [20],
will decrease the frequency of encounters with EHPs.
3 Given that EHPs are algorithm dependent, as explained above, another approach is to
restart the search with, say, a random algorithm. The difficulty in this approach is deciding
when to restart. Abandoning the search prematurely means a waste of search effort; if one
is not careful, one could end up restarting indefinitely. That motivated us to develop a
mechanism to recognize thrashing.
4 See Adaptive Constraint Satisfaction Project (1992-98) http://csp.bracil.net/acs.html
5 According to this theorem, “there is no free lunch when the probability distribution on
problem instances is such that all problem solvers have identically distributed results”. See
Wikipedia http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization (ac-
cessed 18 August 2008)
Adaptive Constraint Satisfaction: The Quickest First Principle 205
a CSP, the algorithm picked may not work efficiently. This is because such map-
pings were generated statistically and may not apply to every problem instance.
The problem instance at hand may be “exceptionally hard” for the chosen algo-
rithm and heuristic. Therefore, part of the Adaptive Constraint Satisfaction project
was to develop measures for monitoring algorithms while they search. Every algo-
rithm is designed to exploit certain characteristics of the problem instance. If an
algorithm/heuristic does not do what it is supposed to do, it should be stopped, and
a different algorithm/heuristic should be used. For example, lookahead algorithms
[13] are designed to propagate constraints in order to prune the search space and
detect dead-ends. If, during the search, it is found that not much of the search space
is pruned, and a large amount of constraint propagation effort has resulted in few
dead-ends being detected, the lookahead algorithm currently in use should be
replaced.
In this paper, we outline a particular instance of the adaptive approach that makes
use of algorithmic chaining. The result is REBA (Reduced Exceptional
Behaviour Algorithm), which is designed to avoid the phenomenon of exceptionally
hard problems in the so-called easy region for solvable CSPs. REBA operates on
complete search methods – methods that explore the search space systematically
and, if necessary, entirely.
Table 1 Example showing how the ranking of algorithms can differ when based on median
cost of solving CSPs, and sensitivity to EHPs
encountering an EHP. The chain can simply be set to an ordering based on the
“Quickest First Principle” (QFP), where quickest indicates the algorithm with the
best median performance.
We wanted to design an algorithm for solving easy soluble problems without
failing on EHPs. Using the QFP means that we always have the potential to solve
CSPs quickly. However, if we can detect that the current algorithm is not working
well, we can switch to the next quickest algorithm, and so on. As a result we can
still benefit from the speed of the naïve algorithms while retaining the
capability to resort to more complex algorithms in the event that a switch scenario
is detected.
While there is some overhead involved in this approach, the benefits can be con-
siderable. For example, the ability to use a simple algorithm can result in an order of
magnitude gain in performance over its more complex counterparts. Another advan-
tage is that in the event of a bad initial choice of algorithm, we are not stuck with it.
Mistakes of this nature will be rectified when we switch away from the bad choice.
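The Quickest First Principle can be sketched as a loop over a chain of solvers ordered by median performance. The `solve_with_chain` interface and the toy `quick`/`robust` stand-ins below are hypothetical; the actual chain in the text runs from BM+MWO down to MAC+MDO, with a thrashing predictor supplying the switch decision.

```python
# Sketch of the Quickest First Principle: try algorithms in order of best
# median performance, switching when a monitor predicts thrashing.

def solve_with_chain(csp, chain):
    """chain: algorithms ordered quickest (best median) first.

    Each algorithm is a callable returning (status, solution), where
    status is "solved", "insoluble", or "switch" (thrashing predicted).
    The last algorithm in the chain is run without a switch option.
    """
    for i, algorithm in enumerate(chain):
        last = (i == len(chain) - 1)
        status, solution = algorithm(csp, allow_switch=not last)
        if status != "switch":
            return status, solution
    raise RuntimeError("last algorithm in the chain must not switch")

# Toy stand-ins: the quick algorithm gives up on "hard" instances,
# the robust one always finishes.
def quick(csp, allow_switch):
    if csp["hard"] and allow_switch:
        return "switch", None
    return "solved", "quick-answer"

def robust(csp, allow_switch):
    return "solved", "robust-answer"

easy = solve_with_chain({"hard": False}, [quick, robust])
hard = solve_with_chain({"hard": True}, [quick, robust])
```

On the easy instance the chain never leaves the quick algorithm; on the hard one the switch is predicted and the robust algorithm completes the search, which is exactly the safety net the text describes.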
threshold supplied to it, the predictor will suggest that a switch is necessary if the
threshold is reached.
Fig. 2 The types of progress during search (see text for explanations)
BM+MWO fails, we try adding intelligent backjumping to it. If this fails, we try
changing the ordering, since a bad ordering is often a contributing factor to EHPs
[28]. If these simpler algorithms fall victim to an EHP, we attempt to use a form
of forward checking with conflict-directed backjumping and a dynamic variable or-
dering. Finally, if this fails, we resort to another algorithm which has relatively low
susceptibility to EHPs, MAC+MDO.
As a basis for the design of MSL we defined the following functional specification:
Given a CSP, an algorithm, and a variable ordering, the predictor should monitor the
progress of the search and be able to predict if thrashing is likely to occur during the
search.
One indication of thrashing is when the search from a particular level i never pro-
ceeds beyond a certain depth, d, and a large proportion of the search space
between level i and level i + d is explored (i.e. little pruning takes place between
these two levels; see Figure 1). Such a situation can occur when the culprits of the
failure at level i + d precede level i. MSL is a simple method which uses this
observation to predict thrashing-type behaviour.
Before discussing MSL in more detail, we must identify three distinct types of
progress which occur during search. These are presented in Figure 2. The types of
progress are defined as:
1. A value is found for the current variable which is compatible with all previous
assignments, or future variables in the case of lookahead algorithms. For example
the second arrow in Figure 2, where a value is found for the variable at level 2
which is compatible with the value assigned to the variable at level 1.
2. Backtracking occurs after finding no values for the current variable which are
compatible with previous assignments, or future variables in the case of looka-
head algorithms. For example the third arrow in Figure 2, where no value can
be found for the variable at level 3 which is compatible with the current assign-
ments of the variables at levels 1 and 2. This will be known as a No Assigned
Value (NAV) backtrack. The NAV backtrack occurs at the tail of the arrow, level
3. At the head of the arrow, level 2 learns of an Unsuccessful Subspace Search
(USS).
3. Backtracking occurs, but only after at least one value has been found for the
current variable which is compatible with the assignments of previous variables,
or future variables in the case of lookahead algorithms (Meaning the search must
have progressed at least one level further down than the current one). For example
the seventh arrow in Figure 2, where a value for the variable at level 3 has been
found which is compatible with the assignments of the variables at levels 1 and
2, but is later rejected because no value can be found for the variable at level
4. This will be known as a Successfully Assigned Values (SAV) backtrack. The
SAV backtrack occurs at the tail of the arrow, level 3. At the head of the arrow,
level 2 learns of a USS.
During the search MSL keeps track of the last level at which a NAV backtrack
occurred. This is considered to be the deepest level of the current search sub-space.
We will refer to this level as DEEPEST.
In addition, for each level in the search, MSL keeps track of two values: firstly,
a count indicating the number of USSs which returned to the level with the same
value for DEEPEST; secondly, a record of the value of DEEPEST when this count
is started. We will refer to these values as count_i and DL_i respectively, where i is the
level they refer to.
In considering how the count is maintained, we must examine the seven possible
cases. These depend on whether a USS, a NAV backtrack or a SAV backtrack is
occurring, and on how the value of DEEPEST compares to the value of DL_i for the
level. Table 2 illustrates the different actions taken at a given level, i, depending on
these circumstances.
Some points should be noted here:
• DEEPEST and count_i are initialised to 0, and DL_i is initialised to i.
• DEEPEST can only be changed by a NAV backtrack occurring, and always
changes when such a backtrack occurs.
Figure 3 gives an example illustrating the possible situations encountered by
MSL. Each column in Figure 3 represents either an assignment, a NAV backtrack,
or a SAV backtrack, together with a USS if applicable (with the exception of the
first column). The numbers below the arrow indicate the values of DL_1,...,DL_4,
count_1,...,count_4 and DEEPEST after the actions for that column have been car-
ried out. The values of the actions indicate which entries in Table 2 apply to the
arrow above.⁷ This includes actions at both the tail and the head of the arrow. The
first column simply shows the initial values before the search begins.
As an example, consider columns 14 to 16. Column 14 shows a simple assignment
to the variable at level 3, action A. No further actions take place. Column 15 then
shows a NAV backtrack from the variable at level 4. When the backtrack occurs,
DL_4 = 4 and DEEPEST = 3, so DL_4 > DEEPEST and entry b1 in Table 2 applies
to level 4. As a result DEEPEST is set to the value of i, i.e. DEEPEST = 4. At the
7 The entry A indicates a successful assignment; no action is taken.
head of the arrow USS entry a3 applies (because DEEPEST = 4 and DL_3 = 3) and
count_3 is set to 1, with DL_3 being set to DEEPEST.
Column 16 shows a SAV backtrack from the variable at level 3. When the back-
track occurs, DL_3 = 4 and DEEPEST = 4. Since DL_3 = DEEPEST, entry c2 in Table
2 applies and no action is taken at level 3. At the head of the arrow USS entry a3
applies and count_2 is set to 1, with DL_2 being set to DEEPEST.
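The bookkeeping traced in columns 14 to 16 can be sketched as follows. Only the Table 2 entries exercised by the example (a3, b1, c2) are reconstructed here; the remaining cases of the seven-entry table are not shown in this excerpt, so treat this as a partial, illustrative model rather than the full MSL predictor.

```python
# Partial sketch of MSL's bookkeeping, reconstructed from the worked
# example in the text. Unhandled Table 2 cases are omitted; this is an
# illustrative model, not the complete predictor.

class MSL:
    def __init__(self, n_levels):
        self.deepest = 0                       # level of the last NAV backtrack
        self.dl = {i: i for i in range(1, n_levels + 1)}
        self.count = {i: 0 for i in range(1, n_levels + 1)}

    def nav_backtrack(self, i):
        """No value could be assigned at level i (entry b1 in the example)."""
        self.deepest = i

    def sav_backtrack(self, i):
        """Backtrack after a successful assignment at level i.

        In the example (entry c2, dl[i] == deepest) no action is required."""
        pass

    def uss(self, i):
        """Level i learns of an unsuccessful subspace search below it."""
        if self.dl[i] != self.deepest:         # new subspace: restart the count (entry a3)
            self.count[i] = 1
            self.dl[i] = self.deepest
        else:                                  # same subspace searched again
            self.count[i] += 1

    def switch_suggested(self, i, threshold):
        """Predict thrashing when the count at level i reaches the threshold."""
        return self.count[i] >= threshold

# Columns 14-16 of the text's example: assignment at level 3, then a NAV
# backtrack from level 4 (USS reported to level 3), then a SAV backtrack
# from level 3 (USS reported to level 2).
msl = MSL(n_levels=4)
msl.deepest = 3              # state after column 13, per the text
msl.nav_backtrack(4)         # column 15: DEEPEST becomes 4
msl.uss(3)                   # ... count_3 = 1, DL_3 = 4
msl.sav_backtrack(3)         # column 16: no action at level 3
msl.uss(2)                   # ... count_2 = 1, DL_2 = 4
```

The counts reproduce the values given in the text for these columns, and `switch_suggested` stands in for the threshold test described in Section 3.2.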
Having defined the function of our prediction mechanism, we also define a set of
criteria for evaluating its effectiveness. These criteria are based on three main
requirements:
i It should predict as exceptionally hard those problem instances with high search
cost for the current algorithm.
ii The computational cost of predicting a CSP to be exceptionally hard should be
low and preferably not exceed the median cost. It should also be cheap in terms
of space.
iii It should not be so sensitive that too many problem instances are predicted to
be exceptionally hard. A high proportion of the problem instances with search
costs of median or lower should not be predicted to be exceptionally hard for the
current algorithm.
where:
- base is the base threshold, which is a linear function of the domain size,
- n is the number of variables,
- separation is the number of separating levels (DL_i - i).
The threshold is adjusted according to separation to improve the sensitivity of de-
tection when the subspace is only searched sparsely, as might be the case with intel-
ligent backjumping algorithms.
Note that in subsequent experiments a suffix is given to the name of REBA. This
suffix indicates the multiple of the domain size used for the base threshold.
4 Experiments
In order to evaluate the overall performance of REBA and the effectiveness of its
switching mechanism, we carried out an experiment on different classes of easy
soluble CSPs (which is what REBA is designed to tackle). This section describes
the details of our experiment and presents our results.
The main aim of our experiment was to compare the performance of REBA with two
types of algorithms: those exhibiting good median performance in the easy soluble
region, and those with good worst-case performance in that region.
Randomly generated CSPs are used to evaluate REBA. They allow us to control the
tightness of problem classes, and therefore to select appropriate problem classes for
experimentation.
The actual CSPs we used were based on randomly generated binary CSPs clas-
sified by the tuple <n, m, p1, p2>, where the elements of the tuple are defined
as:
n: number of variables
m: uniform domain size
p1: density of constraints in the constraint graph
p2: tightness of individual constraints,⁸ i.e. the percentage of incompatible
assignments between the two variables involved in the constraint
Specifically, we wanted to conduct our experiments on problems in the so-called
easy soluble region, where exceptionally hard problem instances were likely to oc-
cur. As a result, we chose the class <50, 10, 0.1, 0.35–0.5>. This range of p2
gives us a spread of problem instances in the region of interest, and it also includes
some of the sets of problems used in [25] and [26], where EHPs were investigated.
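A generator for this <n, m, p1, p2> class might look as follows. The chapter does not specify the exact sampling procedure, so this sketch follows a standard construction as an assumption: each variable pair is constrained with probability p1, and each constrained pair forbids a p2 fraction of the m x m value combinations.

```python
# Sketch of a random binary CSP generator for the <n, m, p1, p2> model.
# The sampling details are an assumption; the chapter only defines the
# meaning of the four parameters.
import itertools
import random

def random_binary_csp(n, m, p1, p2, seed=None):
    rng = random.Random(seed)
    conflicts = {}                 # (x, y) -> set of forbidden value pairs
    for x, y in itertools.combinations(range(n), 2):
        if rng.random() < p1:      # constrain this pair with probability p1
            pairs = list(itertools.product(range(m), repeat=2))
            k = round(p2 * len(pairs))
            # forbid a p2 fraction of the m*m value combinations
            conflicts[(x, y)] = set(rng.sample(pairs, k))
    return conflicts

# One instance from the class used in the experiments: <50, 10, 0.1, 0.35>.
csp = random_binary_csp(n=50, m=10, p1=0.1, p2=0.35, seed=0)
```

Varying p2 over 0.35 to 0.5 with the other parameters fixed reproduces the sweep across the easy soluble region described above.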
The algorithms we chose for comparison, based on initial tests of problem
instances in the class described above, were as follows:
Fig. 6 Median performance on 50 variable problems in terms of cpu time. (Note that where
the plot for REBA and BMCBJ+MWO does not exist, this means the median time was less
than one clock cycle and hence does not show on the logarithmic scale)
performance of REBA is much better than that of the more complex algorithms,
in most cases. This is particularly apparent when the CPU time is considered as in
Figures 6 and 7.
It should be noted that we have tested REBA on problems in the easy region.
This is because we advocate that different types of problem should be tackled by
different algorithms, as noted in [30]. REBA, by design, appears to be useful in
tackling problems in the easy region on the soluble side of the phase transition. It is
the subject of further work to investigate the applicability of the strategies used in
REBA to tackling other problem types such as those in the phase transition.
Fig. 8 Ultimate search cost for BM+MWO had a switch not been predicted (total of 589
instances)
11 For the purposes of this experiment we used a base threshold equal to the domain size of
the variables.
12 The results are presented as multiples of the median search cost when considering the cost
to completion for all CSPs in the sample of 1000.
Fig. 9 Ultimate search cost for BMCBJ+MWO had a switch not been predicted (total of 693
instances)
The second criterion was that the cost to detection should be low. Figures 10 and
11 show the actual search cost up to detection for the instances where a switch was
suggested.
As can be seen from these figures, the performance is good, since the median
cost for predicting a switch in BM+MWO was always less than the median search
cost when all CSPs are considered. For BMCBJ+MWO a similar result can be seen,
with the exception of a few cases. However, even with these exceptions, there are
no cases where the cost exceeds five times the overall median.
Finally, the third criterion was that the prediction mechanism should not be so
sensitive that it prevents completion of search for the many problem instances that
would have required only around the median cost to solve to completion. Figures 12
and 13 show the cost of search for all the problem instances where no switch was
predicted (of which there were 411 for BM+MWO and 307 for BMCBJ+MWO).
Fig. 12 Search cost for problems where no switch was predicted for BM+MWO (total of 411
instances)
This clearly shows that no high-cost problem instances were let through, while
many low-cost problems were allowed to run to completion. For BM+MWO, the
maximum search cost for a CSP in this set was less than the median for all problems.
In the case of BMCBJ+MWO, the maximum never exceeds five times the median.
From Figures 9–14 it is clear that the MSL predictor used for this version of
REBA, with a base threshold of 1.0, has performed very effectively, and that the
criteria laid out in Section 3.2 are largely fulfilled.
There is obviously a trade-off when choosing the value for the threshold, such that
no exceptionally hard problems are encountered whilst, at the same time, the
majority of the easier problems are still allowed to be solved. The base threshold we
used was equal to the domain size of the variables and was the same for all algorithms.
However, it may be possible to improve the effectiveness of algorithms such as
REBA by using a different threshold, or perhaps by using different thresholds for
the different algorithms in the chain.
Fig. 13 Search cost for problems where no switch was predicted for BMCBJ+MWO (total
of 307 instances)
We have experimented with different thresholds and found that they also produce
good results when compared to the algorithms used in the above tests. We have also
looked at how REBA performs on larger problem sizes. Again, REBA performs
well. These results are given in Section A2.
5 Discussion
Acknowledgements. This work was supported by EPSRC research grant ref. GR/J/42878.
The authors would like to thank Alvin Kwan for his useful comments on the contents of
this paper. Natasha Walsh participated in the early part of this research. Christine Mumford
(editor) and the anonymous referee provided us with insightful comments, which helped to
improve the quality of this paper.
Please note: this paper is based on an extended form of Borrett, J., Tsang, E.P.K. & Walsh,
N.R., Adaptive constraint satisfaction: the quickest first principle, Proceedings, 12th Euro-
pean Conference on AI, Budapest, Hungary, 1996, pp. 160–164.
Appendix
Table 3 Data for Figure 4, median performance on 50 variable problems in terms of compat-
ibility checks
Table 4 Data for Figure 5, worst case performance on 50 variable problems in terms of
compatibility checks
Table 5 Data for Figure 6, median performance on 50 variable problems in terms of cpu time
Table 6 Data for Figure 7, worst case performance on 50 variable problems in terms of cpu
time
Fig. 15 Worst case performance on 100 variable problems in terms of compatibility checks
Fig. 17 Worst case performance on 100 variable problems in terms of cpu time
Table 7 Data for Figure 14, median performance on 100 variable problems in terms of com-
patibility checks
Table 8 Data for Figure 15, worst case performance on 100 variable problems in terms of
compatibility checks
Table 9 Data for Figure 16, median performance on 100 variable problems in terms of cpu
time
p2 bmcbj+mwo fccbj+bz mac+mdo REBA1.0
15 16 166 1232 0
16 16 166 1216 0
17 16 166 1200 16
18 16 166 1199 16
19 32 166 1183 16
20 32 166 1166 16
21 32 150 1166 16
22 32 166 1150 32
23 33 150 1133 33
Table 10 Data for Figure 17, worst case performance on 100 variable problems in terms of
cpu time
Table 11 REBA results for base thresholds of 1.5 and 2.0 for CSPs used in Figures 3–6
p2 median checks median cpu time worst case checks worst case cpu time
REBA2.0 REBA1.5 REBA2.0 REBA1.5 REBA2.0 REBA1.5 REBA2.0 REBA1.5
35 299 299 0 0 2068 2057 33 33
36 314 314 0 0 4016 3111 98 49
37 339 339.5 0 0 3421 3143 48 49
38 375.5 380.5 0 0 6000 4709 66 49
39 489.5 520.5 0 0 6356 5837 99 50
40 608.5 602 0 0 6387 6387 82 98
41 716 698 0 0 6567 6028 83 98
42 782.5 762 0 0 8938 8437 98 82
43 895.5 865 0 0 14045 13285 198 116
44 969.5 944 16 0 24296 17351 298 183
45 1181.5 1158 16 16 27050 16279 316 199
46 1461 1408.5 16 16 49321 34802 832 750
p2 median checks median cpu time worst case checks worst case cpu time
REBA2.0 REBA1.5 REBA2.0 REBA1.5 REBA2.0 REBA1.5 REBA2.0 REBA1.5
15 866 866 0 0 2841 2829 49 49
16 926.5 927 16 0 4197 4191 66 66
17 994.5 997 16 16 4082 4077 50 49
18 1420 1437.5 16 16 13668 10756 115 99
19 1861 1834.5 16 16 7783 7770 66 83
20 2103 2072 16 16 19634 14960 148 216
21 2345 2302.5 16 16 29137 24651 332 299
22 2703.5 2693 32 32 106636 39053 949 366
23 3457 3447 33 33 92259 80427 915 732
References
1. Allen, J.A., Minton, S.: Selecting the right heuristic algorithm: runtime performance
predictors. In: Proceedings of 11th Biennial Conference of the Canadian Society for
Computational Studies of Intelligence, pp. 41–53 (1996)
2. Beck, J.C.: Solution-guided multi-point constructive search for job shop scheduling.
Journal of Artificial Intelligence Research 29, 49–77 (2007)
3. Borrett, J.E., Tsang, E.P.K.: A Context for Constraint Satisfaction Problem Formulation
Selection. Constraints 6(4), 299–327 (2001)
4. Brélaz, D.: New methods to color the vertices of graphs. Communications of the
ACM 22(4), 251–256 (1979)
5. Epstein, S.L., Freuder, E.C., Wallace, R., Morozov, A., Samuels, B.: The adaptive con-
straint engine. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, pp. 525–540.
Springer, Heidelberg (2002)
6. Freuder, E.C.: A Sufficient Condition for Backtrack-Free Search. Journal of the ACM 29,
24–32 (1982)
7. Gagliolo, M., Schmidhuber, J.: Learning restart strategies. In: Veloso, M. (ed.) Proceed-
ings, Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007),
Hyderabad, India, January 6-12, pp. 792–797 (2007)
8. Gagliolo, M., Schmidhuber, J.: Impact of censored sampling on the performance of
restart strategies. In: Benhamou, F. (ed.) CP 2006. LNCS, vol. 4204, pp. 167–181.
Springer, Heidelberg (2006)
9. Gaschnig, J.: A General Backtrack Algorithm That Eliminates Most Redundant Tests. In:
Proceedings 5th International Joint Conference on Artificial Intelligence, p. 457 (1977)
10. Gebruers, C., Hnich, B., Bridge, D., Freuder, E.: Using CBR to select solution strategies
in constraint programming. In: Muñoz-Ávila, H., Ricci, F. (eds.) ICCBR 2005. LNCS,
vol. 3620, pp. 222–236. Springer, Heidelberg (2005)
11. Gomes, C.P., Selman, B.: Algorithm portfolios. Artificial Intelligence 126(1-2), 43–62
(2001)
12. Gomes, C., Fernandez, C., Selman, B., Bessiere, C.: Statistical regimes across con-
strainedness regions. Constraints 10(4), 317–337 (2005)
13. Haralick, R.M., Elliott, G.L.: Increasing Tree Search Efficiency for Constraint Satisfac-
tion Problems. Artificial Intelligence 14, 263–313 (1980)
14. Hoos, H., Tsang, E.P.K.: Local search for constraint satisfaction. In: Rossi, F., van Beek,
P., Walsh, T. (eds.) Handbook of Constraint Programming, ch. 5, pp. 245–277. Elsevier,
Amsterdam (2006)
15. Huberman, B., Lukose, R., Hogg, T.: An economics approach to hard computational
problems. Science 275, 51–54 (1997)
16. Kern, M.: Parameter Adaptation in heuristic search - a population-based approach, PhD
Thesis, Department of Computer Science, University of Essex, Colchester, UK (2005)
17. Kwan, A.: A framework for mapping constraint satisfaction problems to solution meth-
ods, PhD Thesis, Department of Computer Science, University of Essex, Colchester, UK
(1997)
18. Mills, P., Tsang, E.P.K., Ford, J.: Applying an Extended Guided Local Search on the
Quadratic Assignment Problem. In: Annals of Operations Research, vol. 118, pp. 121–
135. Kluwer Academic Publishers, Dordrecht (2003)
19. Nudel, B.: Consistent-Labeling Problems and their Algorithms: Expected-Complexities
and Theory-Based Heuristics. Artificial Intelligence 21, 135–178 (1983)
20. Prosser, P.: Hybrid Algorithms for the Constraint Satisfaction Problem. Computational
Intelligence 9, 268–299 (1993)
230 J.E. Borrett and E.P.K. Tsang
21. Prosser, P.: Binary Constraint Satisfaction Problems: Some are Harder than Others. In:
Proceedings 11th European Conference on Artificial Intelligence, pp. 95–99 (1994)
22. Puget, J.-F.: Applications of constraint programming. In: Montanari, U., Rossi, F. (eds.)
Proceedings, Principles and Practice of Constraint Programming (CP 1995). LNCS, pp.
647–650. Springer, Heidelberg (1995)
23. Rossi, F., van Beek, P., Walsh, T. (eds.): Handbook of Constraint Programming. Elsevier,
Amsterdam (2006)
24. Sabin, D., Freuder, E.C.: Contradicting Conventional Wisdom in Constraint Satisfaction.
In: Proceedings 11th European Conference on Artificial Intelligence, pp. 125–129 (1994)
25. Smith, B., Grant, A.: Sparse Constraint Graphs and Exceptionally Hard Problems. In:
Proceedings 14th International Joint Conference on Artificial Intelligence, pp. 646–651
(1995a)
26. Smith, B., Grant, A.: Where the Exceptionally Hard Problems are. In: Workshop on
Studying and Solving Really Hard Problems. CP 1995, pp. 172–182 (1995b)
27. Smith, B.: Phase Transition and the Mushy Region in Constraint Satisfaction Prob-
lems. In: Proceedings 11th European Conference on Artificial Intelligence, pp. 100–104
(1994a)
28. Smith, B.: In search of Exceptionally Difficult Constraint Satisfaction Problems. In: Pro-
ceedings of the Workshop on Constraint Processing, 11th European Conference on Arti-
ficial Intelligence, pp. 79–86 (1994b)
29. Turner, J.S.: Almost all k-Colorable Graphs are Easy to Color. Journal of Algorithms 9,
63–82 (1988)
30. Tsang, E.P.K., Borrett, J.E., Kwan, A.C.M.: An Attempt to Map a Range of Constraint
Satisfaction Algorithms and Heuristics. In: Proceedings AISB 1995, pp. 203–216 (1995)
31. Tsang, E.P.K.: Foundations of Constraint Satisfaction. Academic Press, London (1993)
32. Voudouris, C., Tsang, E.P.K.: Guided local search. In: Glover, F. (ed.) Handbook of meta-
heuristics, pp. 185–218. Kluwer, Dordrecht (2003)
33. Voudouris, C., Dorne, R., Lesaint, D., Liret, A.: iOpt: A Software Toolkit for Heuristic
Search Methods. In: Walsh, T. (ed.) CP 2001. LNCS, vol. 2239, pp. 716–729. Springer,
Heidelberg (2001)
34. Voudouris, C., Owusu, G., Dorne, R., Lesaint, D. (eds.): Service Chain Management:
Technology Innovation for the Service Business. Springer, Heidelberg (2008)
35. Wolpert, D.H., Macready, W.G.: No Free Lunch Theorems for search, Technical Report
SFI-TR-95-02-010, Santa Fe Institute (1995)
36. Wolpert, D.H., Macready, W.G.: No Free Lunch Theorems for Optimization. IEEE
Transactions on Evolutionary Computation 1(1), 67–82 (1997)
37. Minton, S.: Automatically configuring constraint satisfaction programs, a case study.
Constraints 1(1-2), 7–43 (1996)
38. Kautz, H., Horvitz, E., Ruan, Y., Gomes, C., Selman, B.: Dynamic restart policies. In:
Proceedings, Eighteenth National Conference on Artificial Intelligence (AAAI 2002),
Edmonton, Alberta, Canada, pp. 674–682 (2002)
39. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla: portfolio-based algorithm
selection for SAT. Journal of Artificial Intelligence Research 32, 565–606 (2008)
Part IV
Multi-Agent Systems
Collaborative Computational Intelligence in
Economics
Shu-Heng Chen
Abstract. In this chapter, we review the use of the idea of collaborative compu-
tational intelligence in economics. We examine two kinds of collaboration: first,
the collaboration within the realm of computational intelligence, and, second, the
collaboration beyond the realm of it. These two forms of collaboration have had a
significant impact upon the current state of economics. First, they enhance and en-
rich the heterogeneous-agent research paradigm in economics, alternatively known
as agent-based economics. Second, they help integrate the use of human agents and
software agents in various forms, which in turn has tied together agent-based economics
and experimental economics. The marriage of the two points toward the future
of economic research. Third, various hybridizations of the CI tools facilitate the
development of more comprehensive treatments of the economic and financial un-
certainties in terms of both their quantitative and qualitative aspects.
1 Introduction
Computational intelligence has been applied to economics for more than a decade.
These applications can be roughly divided into two categories, namely, agent-based
computational economics and financial data mining. Although in many such stud-
ies only one computational intelligence (CI) tool is involved, studies which apply
more than one CI tool also exist and have become popular. In these studies, a few
CI tools work together or collaborate with each other to perform a certain function.
These studies are, therefore, examples of the use of collaborative computational
intelligence. In this chapter, we shall provide a general review of collaborative com-
putational intelligence in economics based on these studies.
There are three major sources that motivate the application of collaborative
computational intelligence (CCI) to economics. The first source of stimulation
Shu-Heng Chen
Department of Economics, National Chengchi University
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 233–273.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
234 S.-H. Chen
2 Heterogeneous Agents
Why is collaborative computational intelligence relevant to the study of economics?
There is a straightforward answer: economic agents are heterogeneous, and their
differences and interactions match the idea of CCI well. In this section, we shall see
how computational intelligence has worked with the conventional approach in mod-
eling a population of heterogeneous agents and their interactions. Basically, these
types of collaboration can be differentiated into three levels, from the macroscopic,
to the microscopic, to the molecule level. We shall first briefly state these three levels
of collaboration (Section 2.1), and then elaborate on the significance of the collabo-
ration at each level by highlighting existing research (Sections 2.2-2.4).
in the former case, all agents are represented by genetic programming, while us-
ing different parameters of population size [36], or, for the latter case, some agents
are represented by the K-nearest-neighbors (KNN) algorithm or general instance-
based learning (IBL) algorithms, while some others are represented by Bayesian
learning [21].
By moving further down to an even finer level of detail, referred to as the molecule
level, we can regard each individual agent as being represented by more than one
CI tool [67, 98]. In other words, the idea of hybrid systems is applied to model
agents, and hence their heterogeneity can also be manifested in terms of different
hybridization styles.
Let us now focus on the core of the agent-based financial markets, namely, financial
agents and their design. In reality, financial agents can differ in many dimensions,
ranging from expectations formation (beliefs), trading strategies, information exposure,
risk attitudes, and wealth (investment scale), to the need for liquidity, etc.
Given this high-dimensional heterogeneity, the essential question for financial agent
engineering is to decide how much heterogeneity is to be reflected in the artificial
markets. How coarsely or finely do we want to differentiate these financial agents?
Before we examine the design of artificial financial agents, it is useful to recall
what we have done for other artifacts. To name a few, the design of artificial ants
(ant algorithms) was motivated by observing the behavior of real ants in a labo-
ratory; the design of artificial bacteria (bacterial algorithms) was inspired by the
microbial evolution phenomenon; the design of the artificial brain (neural networks,
self-organizing maps) was motivated by the study of the real human brain; and the
design of the evolutionary process (evolutionary algorithms) was inspired by real
biological evolution. Generally speaking, the design of an artifact is
motivated and guided by the behavior of its counterpart in nature.
The design of artificial financial agents is no exception. It is highly motivated by
observing how real financial agents behave. Empirical evidence accumulated since
the late 1980s and early 1990s has shed new light on the forecasting behavior of
financial agents. This empirical evidence was obtained through different kinds of
surveys, such as questionnaires and telephone interviews, with financial specialists,
bankers, currency traders, and dealers, etc. [50, 2]. The general findings from these
abundant empirical data are two-fold. First, the data indicate that, by
and large, there are two kinds of expectations existing in the market. The one which
is characterized as a stabilizing force of the market is associated with a type of
financial agent, called the fundamentalist. The one which is characterized as a
destabilizing force is associated with another type of financial agent, called the chartist.
Collaborative Computational Intelligence in Economics 237
2-Type Design
To make what we say more precise, we generally denote the forecasting rule of a
type-h agent as follows:

E_{h,t}(p_{t+1}) = f_{h,t}(p_t, p_{t-1}, \ldots, p_1), \qquad (1)

where E_{h,t} refers to the expectations of the type-h agent at time t. Equation (1)
indicates the one-step-ahead forecast. At the beginning, we start with a very general
forecasting function f_{h,t}, which uses all the historical data on price up to the present.
In addition, by considering that agents are adaptive, we allow the function to change
over time and hence denote it by the subscript t.
For the fundamentalists (h = f ) and chartists (h = c), their forecast rules, in a
very simple setting, can be written as
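The simple-setting forecast rules themselves are not reproduced in this extract. In the standard fundamentalist-chartist model they are commonly written in the following form, where the reaction coefficients \alpha_f, \alpha_c and the fundamental price p^f_t are notational assumptions here, chosen for consistency with the memory rules in Equations (5) and (6):

```latex
E_{f,t}(p_{t+1}) = p_t + \alpha_f \,(p^{f}_{t} - p_t), \qquad \alpha_f \ge 0,
\qquad\qquad
E_{c,t}(p_{t+1}) = p_t + \alpha_c \,(p_t - p_{t-1}), \qquad \alpha_c \ge 0.
```

Fundamentalists expect the price to revert toward the fundamental price p^f_t, while chartists extrapolate the most recent price change; the chartist rule is exactly the special case of Equation (5) with \beta_c = 0.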
3-Type Design
There is little doubt that the behavior of financial agents can be more complex than
the two-type design. One obvious way to scale-up this design is to add more types
of agents to the model so as to take into account a finer degree of heterogeneity of
financial agents. This type of expansion is called the N-type design. For example,
in a three-type design, one can further distinguish two kinds of chartists, namely,
momentum traders and contrarians.
Contrarians consider that the price trend will finish soon, and will start to reverse.
However, unlike fundamentalists, contrarians do not base their forecasts on the fun-
damental price, which they either do not know, or they do not care about.
The recent availability of more proprietary data has enhanced the transparency of
the trading behavior of financial agents, including both individual and institutional
investors. Empirical studies using such data have shown that individuals and institu-
tions differ systematically in their reaction to past price performance and the degree
to which they follow momentum and contrarian strategies. On average, individual
investors are contrarian investors: they tend to buy stocks that have recently under-
performed the market and sell stocks that have performed well in recent weeks [15].
With this empirical basis, financial agent engineering has already added the contrar-
ians to the fundamentalist-chartist model, and popularized this three-type design.
Financial agent engineering can also be advanced by enriching the behavioral rules
associated with each type of financial agent. This alteration may make financial
agents more interdisciplinary. Considerations from different fields, including neural
sciences, cognitive psychology, and statistics, can be incorporated into designs. For
example, in behavioral finance, there is a psychological bias known as the “law of
small numbers”, which basically says that people underweight long-term averages,
and tend to put too much weight on recent experiences (the recency effect). When
equity returns have been high for many years, financial agents with this bias may
believe that high equity returns are “normal”. By design, we can take such bias into
account. One way to do so is to add a memory parameter to the behavioral rules of
our financial agents. These more general rules, for momentum traders and contrarians
respectively, are specified as follows:
E_{c,t}(p_{t+1}) = p_t + \alpha_c (1 - \beta_c) \sum_{i=0}^{T} (\beta_c)^i (p_{t-i} - p_{t-i-1}), \qquad \alpha_c \ge 0,\; 0 \le \beta_c \le 1. \qquad (5)

E_{co,t}(p_{t+1}) = p_t + \alpha_{co} (1 - \beta_{co}) \sum_{i=0}^{T} (\beta_{co})^i (p_{t-i} - p_{t-i-1}), \qquad \alpha_{co} \le 0,\; 0 \le \beta_{co} \le 1. \qquad (6)
The momentum traders and contrarians now compute a moving average of the
past changes in the stock price and they extrapolate these changes into the future
of the stock price. However, we assume that there is an exponential decay in the
weights given to the past changes in the stock price. The parameters βc and βco can
be interpreted as reflecting the memory of momentum traders and contrarians. If
βc = βco = 0, momentum traders and contrarians remember only the last period’s
price change and they extrapolate this into the future. When βc and βco increase, the
weight given to the price changes farther away in the past increases. In other words,
the chartists’ memory becomes longer.
The psychological bias mentioned earlier, therefore, corresponds to a small
value of this memory parameter, and this “hypothesis” can actually be tested. In
fact, by using the data for the S&P 500 index, one of the three major US stock
market indices, from January 1980 to December 2000, [4] actually estimated a
three-type agent-based financial market model, and found that contrarians have a
longer memory than momentum traders when they form their forecast of the fu-
ture price. Of course, this is just the beginning in terms of seeing how agent-based
financial market models can be quantified so as to communicate with behavioral
finance.
Adaptive Behavior
In the original fundamentalist-chartist model, learning does not exist. Agents who
initially happen to be fundamentalists will continue to be fundamentalists and will
never change this role, and likewise for chartists. As a result, the proportion (market
fraction) of fundamentalists and chartists remains fixed. Nonetheless, this simplifi-
cation underestimates the uncertainty faced by each trader. In general, traders, be
they fundamentalists or chartists, can never be certain about the duration of the
biased trend, since the trend can finish in weeks, months, or years. This uncertainty
causes alert traders to review and revise their beliefs constantly. In other
words, traders are adaptive.
Therefore, a further development of financial agent engineering is to consider
an evolving micro-structure of market participants. In this extension, the idea of
adaptive agents or learning agents is introduced into the model. Hence, an agent who
was a fundamentalist (chartist) may now switch to being a chartist (fundamentalist)
if he considers this switching to be more promising. Since, in the two-type model,
agents can only choose to be either a fundamentalist or a chartist, modeling their
learning behavior becomes quite simple, and is typically done using a binary-choice
model, specifically, the logit model or the Gibbs-Boltzmann distribution.
The logit model, also known as the Luce model, is the main model used in the
psychological theory of choice, and was proposed by Duncan Luce in 1959 in his
seminal book, “Individual Choice Behavior: A Theoretical Analysis.” Consider two
alternatives f (fundamentalist) and c (chartist). Each will produce some gains to the
agent. However, since the gain is random, the choice made by the agent is random
as well. The logit model assumes that the probability of the agent choosing f is the
probability that the profits or utilities gained from choosing f are greater than those
gained from choosing c. Under a certain assumption for the random component of
the utility, one can derive the following binary logit model:2
\mathrm{Prob}(X = f, t) = \frac{\exp(\lambda V_{f,t-1})}{\exp(\lambda V_{f,t-1}) + \exp(\lambda V_{c,t-1})}, \qquad (7)
2 The extension into the multinomial logit model is straightforward.
where V f ,t and Vc,t are the deterministic components of the gains from the alterna-
tives f and c at time t. The parameter λ is a parameter carried over from the assumed
random component. The logit model says that the probability of choosing the alter-
native f depends on its absolute deterministic advantages, as we can see from the
following reformulation:
\mathrm{Prob}(X = f, t) = \frac{1}{1 + \exp(-\lambda (V_{f,t-1} - V_{c,t-1}))}. \qquad (8)
When applied to the agent-based financial models, these deterministic components
are usually related to the temporal realized profits associated with different forecasting
rules. So, in the two-type model, V_f can be the temporal realized profits from being a
fundamentalist, and V_c the temporal realized profits from being a chartist.
In addition, there is a new interpretation for the parameter λ, namely,
the intensity of choice, because it basically measures the extent to which agents are
sensitive to the additional profits gained from choosing f instead of c.
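A minimal sketch of the binary logit choice, using the algebraically equivalent form of Equation (8); the profit values and the intensity of choice below are illustrative numbers, not values from any cited study:

```python
import math

def logit_prob_f(V_f, V_c, lam):
    """Eq. (8): probability of choosing the fundamentalist rule f over the
    chartist rule c, given the realized gains V_f and V_c and the
    intensity of choice lam."""
    return 1.0 / (1.0 + math.exp(-lam * (V_f - V_c)))

print(logit_prob_f(1.0, 1.0, 2.0))    # equal gains -> indifference: 0.5
print(logit_prob_f(1.2, 1.0, 50.0))   # high intensity of choice -> almost everyone picks f
```

As lam grows, the choice concentrates on the rule with the higher realized gain; at lam = 0 agents randomize uniformly regardless of performance.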
The choice probabilities above then determine the market fraction of each type of agent
in the market. For example, if Prob(X = f, t) = 0.8, it means that 80% of the market
participants are fundamentalists and the remaining 20% are chartists. The asset price
will be determined by this market fraction via the market maker equation.
p_t = p_{t-1} + \mu_0 + \mu_1 D_t, \qquad (9)
where
D_t = \sum_h w_{h,t}\, d_{h,t} = \sum_h \mathrm{Prob}(X = h, t)\, d_{h,t}. \qquad (10)
Equation (9) is the market maker equation, which assumes that the price is adjusted
by the market maker, whose decision is in turn determined by the excess demand
normalized by the number of market participants, Dt . Dt , in Equation (10), is a
weighted average of the individual demand of each type of trader, weighted by the
market fractions (7).
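One price-update step of Equations (9) and (10) can be sketched as follows; the fractions, demands, and adjustment coefficients are illustrative:

```python
def market_maker_step(p_prev, fractions, demands, mu0=0.0, mu1=0.1):
    """Eqs. (9)-(10): the market maker moves the price by mu0 + mu1 * D_t,
    where D_t is the fraction-weighted average demand across agent types."""
    D = sum(w * d for w, d in zip(fractions, demands))
    return p_prev + mu0 + mu1 * D

# 80% fundamentalists selling one unit each, 20% chartists buying two units each:
print(market_maker_step(100.0, [0.8, 0.2], [-1.0, 2.0]))  # D_t = -0.4 -> 99.96
```

Net selling pressure (negative D_t) pushes the price down, net buying pressure pushes it up, with mu1 controlling how aggressively the market maker responds.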
The demand for assets of each type of trader is derived in a standard expected-
utility maximization manner, which depends on the risk preference of the type-h
agent. Risk preference is important because it is the main determinant of agents’
portfolios, i.e., how agents’ wealth is distributed among different assets. The classi-
cal Markowitz mean-variance portfolio selection model offered the first systematic
treatment of asset allocation. Harry Markowitz, who later received the 1990 Nobel
Prize in Economics for this contribution, assumes that investors are concerned only
with the mean and variance of returns. This mean-variance preference has been
extensively applied to modeling agents’ risk preference since the variance of returns
is normally accepted as a measure of risk.
In addition to the mean-variance preference, there are two other classes of risk
preferences that are widely accepted in the standard theory of finance. These two
correspond to two different attitudes toward risk aversion. One is called constant
absolute risk aversion (CARA), and the other is called constant relative risk aver-
sion (CRRA). When an agent’s preference exhibits CARA, his demand for the risky
asset (or stock) is independent of his changes in wealth. When an agent’s preference
exhibits CRRA, his demand for risky assets will increase with wealth in a linear way.
Using a Taylor expansion, one can connect the mean-variance preference to CARA
preferences and CRRA preferences. In fact, when the returns on the risky assets
follow a normal distribution, the demand for risky assets under the mean-variance
preference is the same as that under the CARA preference, and is determined by the
subjective-risk-adjusted expected return.
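In the CARA/normality case just described, the demand function takes the textbook form of the subjective-risk-adjusted expected excess return divided by the product of the risk-aversion coefficient and the conditional variance. The sketch below uses that standard form, which is an assumption of the illustration rather than a formula quoted from this chapter; all numbers are illustrative:

```python
def cara_demand(expected_payoff, price, R, risk_aversion, variance):
    """Demand for the risky asset under CARA utility with normal returns:
    the subjective-risk-adjusted expected excess return over (a * sigma^2)."""
    return (expected_payoff - R * price) / (risk_aversion * variance)

# Expected next-period price-plus-dividend 105, current price 100,
# gross risk-free rate 1.02, risk aversion 0.5, conditional variance 4:
print(cara_demand(105.0, 100.0, 1.02, 0.5, 4.0))  # (105 - 102) / 2 = 1.5
```

Note the CARA property mentioned in the text: wealth does not appear in the formula, so the demanded quantity is independent of changes in the agent's wealth.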
While putting this N-type design into practical financial forecasting is still in its
infancy, we have already seen some successful initial attempts in foreign exchange
markets, which can be found in [43], a three-type design, and [75], a two-type design.
So far, all the types and rules of financial agents are given at the beginning of the
design, and what financial agents can do is to choose among these different types
and rules based on their past experiences. The N-type design has characterized a
major class of agent-based financial markets. However, this way of doing things
also severely restricts the degree of autonomy available for financial agents. First,
they can only choose how to behave based on what has been offered; secondly, as
a consequence, there will be no new rules available unless they are added outside
by the designers. If we want our artificial financial agents to behave more like real
financial agents, then we will certainly expect that they learn and discover on their
own. Therefore, as time goes by, new rules which have never been used before and
have not been supplied by the designer may be discovered by these artificial agents
inside the artificial world.
Genetic Algorithms
Designing artificial agents who are able to design on their own is an idea similar to
John von Neumann’s self reproducing automata, i.e., a machine which can repro-
duce itself. This theory had a deep impact on John Holland, the father of the genetic
algorithm. Under von Neumann's influence, Holland had
devoted himself to the study of a general-purpose computational device that could
serve as the basis for a general theory of automata. In the 1970s, he introduced the
genetic algorithm, which was intended to replace those ad hoc learning modules
in contemporary mainstream AI. Using genetic algorithms, Holland could make an
adaptive agent that not only learned from experience but could also be spontaneous
and creative. The latter property is crucial for the design of artificial financial agents.
In 1991, Holland and John Miller, an economist, published a sketch of the artificial
adaptive agent in the highly influential American Economic Review. This blueprint
was actually carried out in an artificial stock market project begun in 1988 at the Santa Fe Institute
[82, 10].
Armed with GAs, the Santa Fe Artificial Stock Market (SFI-ASM) considers a novel
design for financial agents. First, like many N-type designs, it mainly focuses on
the forecasting behavior of financial agents. Their trading behavior, as depicted in
Equation (11), will depend on their forecasts of the price in the next period. Second,
however, unlike the N-type designs, these agents are not divided into a fixed number
of different types. Instead, the forecasting behavior of each agent is “customized”
via a GA. We shall be more specific regarding its design because it provides us
with a good opportunity to see how economists take advantage of the increasing
computational power to endow artificial decision makers with a larger and larger
degree of autonomy.
In the SFI-ASM, each financial agent h uses a linear forecasting rule as follows:

E_{h,t}(p_{t+1}) = \alpha_{h,t} + \beta_{h,t}\, p_t. \qquad (12)
However, the coefficients αh,t and βh,t not only change over time (time-dependent),
but also are state-dependent. That is, the value of these two coefficients at time t will
depend on the state of the economy (market) at time t. For example, the recent price
dynamics can be an indicator, so, say, if the price has risen in the last 3 periods,
the financial agent may consider lower values of both α and β than otherwise. The
price dividend ratio can be another indicator. If the price dividend ratio is lower
than 50%, then the financial agent may want to take a higher value of β than if it is
not. This state-dependent idea is very similar to what is known as classification and
regression trees (CART) or decision trees, a very dominant approach in machine
learning.
Therefore, one simple way to think of the artificial agents in the SFI-ASM is that
they each behave as machine-learning people who use regression trees to forecast
the stock price. At each point in time, the agent has a set of indicators which help him
to decompose the state of the economy into m distinct classes, (A^1_{h,t}, A^2_{h,t}, \ldots, A^m_{h,t}),
and corresponding to each of the classes there is an associated linear forecasting
model. Which model will be activated depends on the state of the market at time t,
denoted by St . Altogether, the behavior of the financial agent can be summarized as
follows:
E_{h,t}(p_{t+1}) =
\begin{cases}
\alpha^{1}_{h,t} + \beta^{1}_{h,t}\, p_t, & \text{if } S_t \in A^{1}_{h,t}, \\
\alpha^{2}_{h,t} + \beta^{2}_{h,t}\, p_t, & \text{if } S_t \in A^{2}_{h,t}, \\
\quad\vdots & \quad\vdots \\
\alpha^{m}_{h,t} + \beta^{m}_{h,t}\, p_t, & \text{if } S_t \in A^{m}_{h,t}.
\end{cases} \qquad (13)
A few remarks are added here. First, the forecasting rule summarized above is
updated as time goes by, as we keep the subscript t there. So, agents, in this sys-
tem, are learning over time with a regression tree, or they are using a time-variant
regression tree, in which all the regression coefficients and classes may change ac-
cordingly with the agents’ learning. Second, agents are completely heterogeneous
as we also keep the subscript h above. Therefore, if there are N financial agents in
the markets at each point in time, we may observe N regression trees, each of which
is owned and maintained by one individual agent. Third, however, the forecasting
rules introduced in the SFI-ASM are not exactly regression trees. They are, in fact,
classifier systems.
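A stylized sketch of the state-dependent forecasting rule in Equation (13). Real SFI-ASM agents match binary condition strings against market descriptors and select among matched rules by fitness; that is simplified here to a first-match lookup over hypothetical conditions:

```python
def sfi_forecast(state, rules, price):
    """Eq. (13): apply the linear forecast alpha + beta * p_t of the first
    rule whose market-state condition matches (a simplification of the
    SFI-ASM's fitness-based selection among matched classifiers)."""
    for condition, alpha, beta in rules:
        if condition(state):
            return alpha + beta * price
    raise ValueError("no rule matches the current state")

# Two hypothetical condition bits on the recent price trend:
rules = [
    (lambda s: s["rising"], 0.0, 0.95),      # trend up: expect mild reversion
    (lambda s: not s["rising"], 1.0, 1.00),  # otherwise: extrapolate a small drift
]
print(sfi_forecast({"rising": True}, rules, 100.0))   # 0.0 + 0.95*100 = 95.0
print(sfi_forecast({"rising": False}, rules, 100.0))  # 1.0 + 1.00*100 = 101.0
```

Learning then amounts to letting a GA revise both the conditions and the (alpha, beta) pairs of each agent's rule set over time.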
Classifier System
A classifier system is another of John Holland’s inventions in the late 1970s. This
system is similar to the Newell-Simon type of expert system, which is a population
of if-then or condition-action rules. The conventional expert systems are not able to
learn by themselves. To introduce adaptation into the system, Holland applied the
idea of market competition to a society of if-then rules. A formal algorithm, known
as the bucket-brigade algorithm, credits rules generating good outcomes and debits
rules generating bad outcomes. This accounting system is further used to resolve
conflicts among rules. The shortcoming of the classifier system is that it cannot
automatically generate or delete rules. Therefore, a GA is applied to evolve them
and to discover new rules.
This autonomous-agent design has been further adopted in many later studies.
While most studies continuously carried out this task using genetic algorithms3, a
few studies also used other population-based learning models, such as evolutionary
programming and genetic programming.
3 A lengthy review of this literature can be found in [23].
The development from the few-type designs to the many-type designs and further to
the autonomous-agent designs can be considered to be part of a continuous effort to
increase the collective search space of the forecasting function Eh,t , from finite to in-
finite space, and from parametric to semi-parametric functions. The contribution of
genetic programming (GP) to this development is to further extend the search space
to an infinite space of non-parametric functions, whose size (e.g., the dimensionality,
the cardinality or the number of variables used) and shapes (for example, linearity
or non-linearity, continuity or discontinuity) have to be determined, via search, si-
multaneously. This way of increasing the degree of autonomy may not contribute
much to the price dynamics, but can enrich other aggregate dynamics as well as the
behavior at the individual level. As we shall see below, the endogenous determina-
tion of the size and shape of Eh,t provides us with great opportunities to see some
aspects of market dynamics which are not easily available in the N-type designs or
other autonomous-agent designs.
The first example concerns the sophistication of agents in market dynamics. The
definition and operation of GP rely on a specific language environment, known as
LISt Processing (LISP). For each LISP program, there is a tree representation.
The number of nodes (leaves) or the depth of the LISP trees provides
one measure of complexity in the vein of program length. This additional
observation enables us to study not just the heterogeneity in Eh,t, but also the associated
complexity of Eh,t . In other words, genetic programming can not only distinguish
agents by their forecasts, as the N-type designs did, but further delineate the
differentiation according to the agents' sophistication (complexity). Must the surviving
agents be sophisticated, or can simple agents prosper as well?
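Node and depth complexity are easy to compute once a forecasting rule is represented as a nested expression; a minimal sketch, with the tuple encoding of the rule chosen purely for illustration:

```python
def node_count(tree):
    """Number of nodes in a LISP-style expression tree (nested tuples:
    (operator, child, child, ...); anything else is a leaf)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(node_count(child) for child in tree[1:])

def depth(tree):
    """Depth of the expression tree; a bare leaf counts as depth 1."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + max(depth(child) for child in tree[1:])

# The rule (+ p_t (* 0.5 (- p_t p_{t-1}))) as a nested tuple:
rule = ("+", "p_t", ("*", 0.5, ("-", "p_t", "p_t_lag")))
print(node_count(rule), depth(rule))  # 7 4
```

Tracking these two statistics over an evolving population is precisely what allows hypotheses such as the monotone hypothesis below to be tested.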
One interesting hypothesis related to the above inquiry is the monotone hypoth-
esis: the degree of traders’ sophistication is an increasing function of time. In other
words, traders will evolve to be more and more sophisticated as time goes on.
However, this hypothesis is rejected in [33]. They found, based on the statistics
on the node complexity or the depth complexity, that traders can evolve toward
a higher degree of sophistication, but at some points in time they can become
simple again.
The second example concerns the capability to distinguish the information from
noise. As we mentioned earlier, the variables recruited in the agents’ forecasting
function are also endogenously determined. This variable-selection function allows
us to examine whether the smart picking of these variables is crucial for survival.
In particular, the hypothesis of the extinction of noisy traders says that traders who
are unable to distinguish information from noise will become extinct. [34] test this
hypothesis. In an agent-based artificial market, they supplied traders with both in-
formative and noisy variables. The former include prices, dividends and trading vol-
umes, whereas the latter are just series of pseudo random numbers. Their simulation
shows, as time goes on, that traders who are unable to distinguish information from
noise do have a tendency to decline and even become extinct.
This type of momentum trader is naive in the sense that they continuously believe
that what they experience today regarding the price change will remain unchanged
tomorrow. Given this naive momentum trader, they also introduced two sophisticated
types of agents, namely, empirical Bayesian traders and K-nearest-neighbor
traders. Both empirical Bayesian traders and K nearest neighbors (KNN) are active
members of the CI toolkit.
Bayesian Learning
Before the advent of computational intelligence in the early 1990s, Bayesian learn-
ing was the dominant learning model used by economists. Economists have a strong
preference for Bayesian learning partially because in spirit it is consistent with opti-
mization. The optimality of Bayesian learning has been well established in statisti-
cal decision theory. It has a lot of variants and applications in regard to the economic
modeling of learning. The two most popular ones are Kalman filtering and recursive
least squares.12
As a Bayesian, the trader forecasts pt+1 using his posterior distribution (belief)
of pt+1 , denoted by ft+1 (p | x). ft+1 (p | x) is the trader’s updated subjective belief
of the distribution of the price pt+1 after receiving the state information x at time t.
The updating formula is the famous Bayes rule:
f_{t+1}(p \mid x) = \frac{f_t(p)\, h_t(x \mid p)}{\int f_t(\bar{p})\, h_t(x \mid \bar{p})\, d\bar{p}}. \qquad (15)
The Bayesian trader will then forecast using the posterior mean:
E_{b,t}(p_{t+1}) = \int p\, f_{t+1}(p \mid x)\, dp, \qquad (16)
where Eb,t refers to the prediction made by the Bayesian trader at time t. Intu-
itively speaking, the Bayesian trader has a set of possible predictions (hypotheses)
S_t = \{p^e_{t+1}\}, and not just a single degenerate prediction (hypothesis) p^e_{t+1}. The
probability of each of the possible predictions in the set S_t is governed by the
posterior distribution (15). It is now clear that Bayesian traders need to have a greater
mental capacity to first keep a set of hypotheses and then to deal with possibly very
demanding computations involved in (15) and (16).13
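The updating in (15) and the forecast in (16) can be sketched numerically by discretizing the price support. The uniform prior and the Gaussian signal model h_t below are illustrative assumptions, not part of [21]'s design:

```python
import math

def bayes_update(prior, likelihood):
    """Eq. (15) on a discretized price grid: posterior is proportional to
    prior * likelihood, with the normalizing sum playing the role of the
    integral in the denominator."""
    unnorm = [f * h for f, h in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def posterior_mean(grid, posterior):
    """Eq. (16): E_{b,t}(p_{t+1}) = sum_i p_i * f_{t+1}(p_i | x)."""
    return sum(p * f for p, f in zip(grid, posterior))

grid = [90.0 + 0.1 * i for i in range(201)]    # candidate prices 90..110
prior = [1.0 / len(grid)] * len(grid)          # flat initial belief f_t
# hypothetical Gaussian signal model h_t centred on an observed signal of 103
likelihood = [math.exp(-0.5 * ((p - 103.0) / 2.0) ** 2) for p in grid]

posterior = bayes_update(prior, likelihood)
forecast = posterior_mean(grid, posterior)     # close to 103 for this signal
```

The grid makes explicit why a Bayesian trader carries a whole set of hypotheses: a full distribution over prices, not a single point forecast, must be stored and renormalized each period.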
There is, however, a way to reduce this very demanding work. Under the assumption
of a multivariate normal distribution, the entire updating of the posterior
distribution reduces to the updating of only two parameters, namely, the mean
and the variance. In this case, we have the familiar Kalman filtering, with the
corresponding expectations denoted by E_{k,t}.
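The normal-case reduction just described can be sketched as a scalar Kalman-style update of the belief's mean and variance; the prior belief and the observation-noise variance below are hypothetical:

```python
def gaussian_update(m, v, obs, r):
    """Scalar Kalman-style update: with normal beliefs, Bayes' rule (15)
    collapses to updating just the mean m and the variance v.
    obs = newly observed price, r = observation-noise variance."""
    K = v / (v + r)              # Kalman gain
    m_new = m + K * (obs - m)    # posterior mean
    v_new = (1.0 - K) * v        # posterior variance (always shrinks)
    return m_new, v_new

m, v = 100.0, 4.0                # prior belief about the price
m, v = gaussian_update(m, v, obs=104.0, r=4.0)
# with equal prior and noise variance the posterior mean lands halfway
```

Two numbers now do the work of the entire grid in the previous sketch, which is exactly the computational saving the text refers to.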
12 See [89] and [47] for details.
13 It has been argued that the inability of humans to produce consistent and reliable probabil-
ity and preference judgments may explain why Bayesian decision theory fails in view of
this lack of necessary inputs.
K Nearest Neighbors
KNN differs from the conventional time-series modeling techniques. Conventional
time-series modeling, known as the Box-Jenkins approach, is a global model,
which is concerned with the estimation of a function, be it linear or non-linear, of
the following form:

p_{t+1} = f(P^m_t) = f(p_t, p_{t-1}, ..., p_{t-m+1}),

by using all of the information up to t, i.e., P^m_s, ∀ s ≤ t, where the estimated function
f̂ is assumed to hold for every single point in time. As a result, what will affect
p_{t+1} most is its immediate past p_t, p_{t-1}, ..., under the law of motion estimated from all
available samples.
For KNN, while what affects p_{t+1} most is also its immediate past, the law of
motion is estimated only with similar samples, and not all samples. The estimated
function f̂_t is hence assumed to hold only for that specific point in time. To facilitate
the discussion, we introduce the following notation:

P^m_1, P^m_2, ..., P^m_T,  P^m_t ∈ R^m, ∀ t = 1, 2, ..., T,   (19)

N(P^m_t) = {s | Rank(d(P^m_t, P^m_s)) ≤ k, ∀ s < t},   (21)
14 See [20] for a fine overview of empirical Bayes, and also [19] for an in-depth treat-
ment. The BUGS software provides an implementation of empirical Bayes methods us-
ing Markov Chain Monte Carlo [51]. The software is available from
http://www.mrc-bsu.cam.ac.uk/bugs.
In other words, P^m_t itself serves as the centroid of a cluster, called the neighborhood
of P^m_t, N(P^m_t). It then invites its k nearest neighbors to be the members of N(P^m_t)
by ranking the distance d(P^m_t, P^m_s) over the entire community

{P^m_s | s < t}.   (22)
In practice, the function f used in (23) can be very simple, taking either the
unconditional mean or the conditional mean. In the case of the latter, the mean is
usually assumed to be linear. In the case of the unconditional mean, one can simply
use the simple average as the forecast.
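A minimal sketch of the KNN forecast built from (19)-(22), using the unconditional (simple-average) version of the forecast; the Euclidean distance and the toy periodic series are illustrative assumptions:

```python
import math

def knn_forecast(prices, m=3, k=2):
    """KNN forecast in the spirit of (19)-(22): embed the series into
    m-histories P^m_t, rank all past histories (s < t) by distance to the
    current one, and average the successors p_{s+1} of the k nearest."""
    T = len(prices) - 1                              # current period t
    hist = lambda t: prices[t - m + 1 : t + 1]       # P^m_t = (p_{t-m+1}, ..., p_t)
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    target = hist(T)
    # rank every complete past history by its distance to the current one
    ranked = sorted((dist(target, hist(s)), s) for s in range(m - 1, T))
    neighbours = [s for _, s in ranked[:k]]          # N(P^m_T)
    return sum(prices[s + 1] for s in neighbours) / k  # simple-average forecast

# hypothetical periodic series: every history ending in (..., 1, 2) is followed by 3
series = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
forecast = knn_forecast(series, m=3, k=2)
```

Because only the k most similar histories enter the average, the fitted "law of motion" is local to the current state, exactly the contrast with the global Box-Jenkins estimate drawn in the text.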
The efficient market hypothesis implies that there are no profitable strategies, and
hence learning, regardless of its formalism, does not matter. As a result, the three
types of traders, momentum traders, empirical Bayesian traders and K-nearest-neighbor
traders, should perform equally well, at least in the long run. However, when the market
is not efficient, learning may matter, and it is expected that smarter agents can
take advantage of dumber agents. In their experiments, [21] found that momentum
traders, who never learn, performed worst during the transition period when the market
is not efficient. Furthermore, the empirical Bayesian traders were also outperformed
by the KNN traders. While the two types of traders started learning at the same time
and competed with each other to discover the true price, evidently the KNN traders
were able to exploit predictability more quickly than the empirical Bayesian traders.
[21] points to a new style of application of CCI to economics, namely, using an
agent-based environment to allow for a more vivid competition of different CI tools,
each of which is to represent an opportunity-seeking trader with different degrees
15 Even though the functional form is the same, the coefficients can vary depending on P^m_t
and its resultant N(P^m_t). So, we add a subscript t, as in f̂_t, to make this time-variant property
clear.
of smartness. This style of application is not the same as a general forecasting tour-
nament, in which a number of CI tools also compete with each other in forecasting
a given time series. The key difference between the two styles of application lies in
the complex interacting effects among these competing CI tools, which obviously
exist in the former style, but not the latter. Put alternatively, [21] demonstrates an
embodied game-theoretic environment for a set of CI tools, which may be coined as
game-theoretic CCI.
10, 20,..., to 50. A smaller population size assumes a lower degree of smartness,
whereas a larger population size implies a higher degree of smartness.
It is found that, other things being equal, increasing the intelligence of individual
traders can contribute positively to the realized social welfare, a measure of market
efficiency. Nevertheless, it is also found that, other things being equal, increasing the
number of intelligent traders can exert a negative influence on the realized social
welfare. These findings, therefore, suggest an interesting implication: if the increase
in the number of intelligent traders is inevitable, then an increase in social welfare
can be made possible only if all intelligent traders become smarter.
The significance of the degree of smartness is pursued in [25]. In the context of
an agent-based artificial stock market, [25] address whether agents with different
degrees of smartness may result in different wealth. This brings us closer to the
original concerns of psychometricians mentioned in Section 2.3.1. In this study,
artificial traders are all modeled by genetic algorithms (GA). They use a GA to do
the forecasting, and then use it again to engage in portfolio optimization. By varying
the control parameter of the GA, [25] are able to design traders with different levels
of intelligence. In this case, the chosen control parameter is the size of the validation
window, and this choice can be justified as follows.
In the machine learning literature, it is very common to divide the data into three
parts, namely, the training set, the validation set, and the testing set. The purpose of the
validation sample is to prevent the trained model from being subjected to over-
learning or over-fitting. In the environment of [25], it can be shown that the size of
the validation window can affect the forecasting accuracy of the model constructed,
which in turn will affect the quality of the portfolio decision. Through agent-based
simulation, they, therefore, show that the agents’ degree of smartness can positively
affect their wealth share. Not surprisingly, the smarter they are, the wealthier they
become.
Sections 2.3.2 and 2.3.3 are both concerned with an economy composed of agents
with different degrees of smartness. These kinds of application can then examine
how these different degrees of smartness can contribute to the resultant heterogene-
ity in economic performance. Therefore, they provide us with replications or pre-
dictions of the correlation between IQ and performance in a social context. One
can further ask, to what extent, the institutional design can eliminate or minimize
the impact of the heterogeneity in intelligence on the heterogeneity in economic
performance, for example, income inequality [79].
However, one may also be concerned with the comparisons between different
economies or different groups of agents. For example, [72, 71] provide rich re-
sources on the comparative studies of IQ among different countries and races. On
the other hand, differences in individuals’ behavior among different societies can
also be attributed to the culture factor. The recent path-breaking studies in this area
can be found in [55, 56]. Using experimental results from the ultimatum bargain-
ing games, [55] is able to show that “economic decisions and economic reasoning
Minority Games
[90] addressed the traffic-flow problem in the context of games. The issue concerns
the most efficient distribution of road space among drivers, characterized by
identical travel times along the different paths connecting the same origin and
destination. The intriguing part of this issue is: can we achieve this
goal in a bottom-up manner, without top-down supervision? [90] explored this
possibility by assuming that each driver learns how to choose paths by means
of reinforcement learning. Several different versions of reinforcement learning were
attempted. They differ in one key parameter, the learning speed or the degree of
forgetting. It has been found that the allocative efficiency of roads is not independent
of this parameter. In other words, unless the learning speed is tuned correctly, there
is no guarantee that drivers will coordinate their use of roads in the most
efficient way, and congestion can happen all the time.
The congestion problem, also known as the minority game, originates from the famous
El Farol problem, which was first studied by [9]. The problem concerns the
attendance at the bar El Farol in Santa Fe. An agent's willingness to visit the bar on
a specific night depends on her expectation of the attendance on that night. An
agent will go to the bar if her expected attendance is less than the capacity of the
bar; otherwise, she will not go. [9] showed that the time series of attendance levels
seems to always fluctuate around the capacity of the bar. However, agents in [9]
reason with a fixed set of models, deterministically iterated over
time. Discovering new models is out of the question in this set-up.
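The fixed-predictor setup can be sketched as follows; the particular predictor families, capacity, number of agents, and scoring rule are illustrative assumptions rather than [9]'s exact specification:

```python
import random

random.seed(0)
CAPACITY, N_AGENTS, N_PREDICTORS = 60, 100, 4

def make_predictor():
    """Draw one fixed attendance predictor (a function of the history)."""
    kind = random.choice(["last", "avg", "mirror"])
    w = random.randint(1, 5)
    if kind == "last":
        return lambda h: h[-1]                      # same as last week
    if kind == "avg":
        return lambda h: sum(h[-w:]) / len(h[-w:])  # average of last w weeks
    return lambda h: 2 * CAPACITY - h[-1]           # mirror around capacity

# every agent keeps a fixed bag of predictors; no new ones are ever created
agents = [[make_predictor() for _ in range(N_PREDICTORS)] for _ in range(N_AGENTS)]
scores = [[0.0] * N_PREDICTORS for _ in range(N_AGENTS)]
history = [50, 70, 45]                              # seed attendance figures

for week in range(200):
    attendance = 0
    for a in range(N_AGENTS):
        best = max(range(N_PREDICTORS), key=lambda j: scores[a][j])
        if agents[a][best](history) < CAPACITY:     # go only if expected uncrowded
            attendance += 1
    for a in range(N_AGENTS):                       # score predictors by forecast error
        for j in range(N_PREDICTORS):
            scores[a][j] -= abs(agents[a][j](history) - attendance)
    history.append(attendance)
```

Since the predictor bags never change, each agent can only re-rank a fixed menu of models, which is the restriction [49] later relax by evolving the models themselves.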
[49] replace this fixed set of rules with a class of auto-regressive (AR) models.
Furthermore, the number of lag terms and the respective coefficients are revised and
renewed via evolutionary programming (EP). The introduction of EP to the system
of AR agents has a marked impact on the observed behavior: the overall result is
one of large oscillations rather than mild fluctuations around the capacity.
[90] and [49] together show that there is no guarantee that agents with arbitrary
learning algorithms, characterizing different cultures, habits, routines, or IQ, can
coordinate well to avoid congestion and maximize social efficiency, and the coordi-
nation limit is affected by the IQ or cultures of society, which are characterized by
various computational intelligence tools.
The idea of hybrid systems is also employed to build individual agents. In this case,
each individual is represented by more than one CI tool, and is an incarnation of a
specific style of CCI.
[67] provides the first application of this kind, and, in this case, the specific style is
the evolutionary artificial neural net. In the context of the SFI artificial stock market,
the financial agents are required to solve the portfolio optimization problem, which
involves the distribution of the savings into risky and riskless assets, something
which is similar to Equation (11). Equation (11) is a typical two-stage decision, i.e.,
the forecast decision is made before the investment decision, but [67] considered
a reduced one-stage decision. The mapping is, therefore, directly constructed from
the information available at time t − 1 to the optimal portfolio at time t, yh,t . More
precisely, the financial agents are first represented by an artificial neural network, or
a feedforward neural network with one hidden layer, to be exact.
y_{h,t} = h_2( w_0 + Σ_{j=1}^{l} w_j h_1( w_{0j} + Σ_{i=1}^{p} w_{ij} x_{i,t-1} ) )   (25)
The information set {x_i}_{i=1}^{p} includes past dividends, returns, the price/dividend
ratio, and trend-following technical trading indicators. This population of investment
decision rules (over all agents) is then evolved with genetic algorithms to symbolize
the evolutionary learning of financial agents.
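A sketch of the one-hidden-layer mapping (25). The choices h1 = tanh and h2 = logistic (so the output can be read as a portfolio share), and all weight values, are illustrative assumptions; in [67] the weights are evolved by a genetic algorithm rather than set by hand:

```python
import math

def forward(x, W_in, b_in, w_out, b_out):
    """One-hidden-layer net of Eq. (25):
    y = h2(w0 + sum_j w_j * h1(w0j + sum_i w_ij * x_i)).
    h1 = tanh on the hidden units; h2 = logistic, squashing the output into
    (0, 1), read here as the share of savings placed in the risky asset."""
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(W_in, b_in)]
    z = b_out + sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-z))          # h2: logistic

# hypothetical information set x_{t-1}: past return, price/dividend ratio, trend signal
x = [0.02, 25.0, 1.0]
W_in = [[0.5, -0.01, 0.3], [-0.2, 0.02, 0.1]]  # l = 2 hidden units, p = 3 inputs
b_in = [0.0, 0.1]
w_out = [1.0, -0.5]
b_out = 0.2
share = forward(x, W_in, b_in, w_out, b_out)   # portfolio share in (0, 1)
```

The one-stage character of [67]'s design is visible here: the network maps information at t-1 directly to the portfolio at t, with no intermediate price forecast.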
Another related development has occurred in the use of natural language. People
frequently and routinely use natural language or linguistic values, such as high,
low, and so on, to describe their perceptions, demands, expectations, and decisions.
Some psychologists have argued that our ability to process information efficiently is
the outcome of applying fuzzy logic as part of our thought processes. The evidence
on human reasoning and human thought processes supports the hypothesis that at
least some categories of human thought are definitely fuzzy. Yet, early agent-based
economic models have assumed that agents’ adaptive behavior is crisp. [98] made
progress in this direction by using the genetic-fuzzy classifier system (GFCS) to
model traders’ adaptive behavior in the SFI-like artificial stock market.
[98] considers a fuzzy extension of the forecasting function (13). In Equation
(13), each forecasting rule has two coefficients, the constant term (α ) and the
slope (β ). Without any augmentation, these forecasting rules are simply linear, and
cannot be expected to work well. The original SFI-ASM made them non-linear by
making these two coefficients state dependent via the classifier system. However,
the two coefficients are crisp. [98] applies the Mamdani style of fuzzy rules to make
them fuzzy. As an illustration, the Mamdani style of a fuzzy if-then rule is:

If x is “A”, then y is “B”,

where the input set “A” and the output set “B” are both fuzzy. In [98], this
application becomes something like:

If p_t/MA(5) is “low”, then α is “moderately high”, and β is “moderately high”.
Obviously, the terms “low”, “high”, “moderately low”, and “moderately high” are
all linguistic variables, and they are represented by the respective membership
functions. The state variable is pt /MA(5), where MA(5) is the moving average of
the price over the last five periods. So, this rule compares the current price with the
5-day moving average, and if the ratio is low enough, then both α and β will be
moderately high. Of course, the above fuzzy forecasting rule can easily be extended
to include more variables. For example,
“If p_t r_t/d_t is high and p_t/MA(5) is moderately low and p_t/MA(10) is moderately
high and p_t/MA(100) is low and p_t/MA(500) is high, then α is “moderately
low”, and β is “high”.
p_t r_t/d_t reflects the current price in relation to the current dividend, and it
indicates whether the stock is above or below the fundamental value at the current
price. The inclusion of this information makes agents behave like fundamentalists.
The remaining four state variables indicate whether the price history exhibits a trend
or similar characteristic. The inclusion of this information makes agents behave
more like chartists. Therefore, by combining these state variables, the financial
agents may choose to behave more like fundamentalists or more like chartists.17
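The flavor of such Mamdani rules can be sketched as follows; the triangular membership functions, the output-peak defuzzification shortcut, and all numeric values are illustrative assumptions, not [98]'s actual design:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical membership functions for the state variable p_t / MA(5)
low  = lambda x: tri(x, 0.80, 0.90, 1.00)
high = lambda x: tri(x, 1.00, 1.10, 1.20)

# peak ("most typical") values for the output terms of the coefficient alpha
ALPHA = {"moderately high": 0.6, "moderately low": 0.4}

def infer_alpha(ratio):
    """Two Mamdani-style rules, defuzzified as a firing-strength-weighted
    average of the output peaks (a common simplification):
      if p_t/MA(5) is low  -> alpha is moderately high
      if p_t/MA(5) is high -> alpha is moderately low"""
    w1, w2 = low(ratio), high(ratio)
    if w1 + w2 == 0.0:
        return 0.5                         # no rule fires: fall back to neutral
    return (w1 * ALPHA["moderately high"] + w2 * ALPHA["moderately low"]) / (w1 + w2)

a1 = infer_alpha(0.90)   # price well below its 5-period MA
a2 = infer_alpha(1.10)   # price well above its 5-period MA
```

Between the two peaks the inferred α moves smoothly with the state variable, which is exactly the graded, non-crisp dependence the text contrasts with the original classifier system.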
We have now reviewed how the idea of CCI enhances the heterogeneous-agent re-
search paradigm at the macro, micro and molecule levels. In addition to that, the
idea of CCI also plays an important role in the recent efforts made by economists
to overarch agent-based computational economics (ACE) and experimental eco-
nomics. It has been argued in many instances that agent-based simulation should be
integrated with experiments using human subjects, for example, [45], [60] and [74].
The relationship between agent-based computational economics and experimental
17 This design is not the 2-type design as we see in Section 2.2.2.
economics is, in essence, a relationship between human agents and software agents.
The literature has already demonstrated three possible ways of closely relating ACE
to experimental economics, namely, mirroring, competition and collaboration. They
appear in the literature in chronological order.
3.1 Mirroring
The early ACE studies are clearly motivated by using software agents to mimic
the behavior of human agents observed in the laboratory. The famous Turing test
serves as the best illustration. [8] point out that the development of social science
theories can be likened to the task of building a computer to mimic human behavior,
or equivalently, to building a computer that will pass the Turing test in the range of
behavior covered by the theory. Thus, a social science theory can be deemed to be
successful when it is no longer possible for a computer judge to tell the difference
between behavior generated by humans and that generated by the theory (i.e., by a
machine).
In this regard, the two CI tools, namely, genetic algorithms and genetic program-
ming are frequently used to build software agents such that their collective behavior
can mirror the laboratories with human subjects. [6] pioneered this research direc-
tion. [6] applied two versions of GAs to study market dynamics in a cobweb model.
The basic GA involves three genetic operators: reproduction, crossover, and muta-
tion. Arifovic found that in each simulation of the basic GA, individual quantities
and prices exhibited fluctuations for its entire duration and did not result in con-
vergence to the rational expectations equilibrium values, which is quite inconsistent
with experimental results involving human subjects.
Arifovic’s second GA version, the augmented GA, includes the election opera-
tor in addition to reproduction, crossover, and mutation. The election operator in-
volves two steps. First, crossover is performed. Second, the potential fitness of the
newly-generated offspring is compared with the actual fitness values of its parents.
Among the two offspring and two parents, the two highest fitness individuals are
then chosen. The purpose of this operator is to overcome difficulties related to the
way mutation influences the convergence process, because the election operator can
bring the variance of the population rules to zero as the algorithm converges to the
equilibrium values.
The results of the simulations show that the augmented GA converges to the ratio-
nal expectations equilibrium values for all sets of cobweb model parameter values,
including both stable and unstable cases, and can capture several features of the ex-
perimental behavior of human subjects better than other simple learning algorithms.
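The election operator can be sketched as follows; the bit-string encoding and the cobweb-style fitness function (deviation of the decoded quantity from a rational-expectations quantity q* = 10) are illustrative assumptions:

```python
import random

random.seed(1)

def fitness(chrom):
    """Hypothetical fitness: negative deviation of the decoded quantity
    from the rational-expectations quantity q* = 10."""
    q = sum(chrom) / len(chrom) * 20       # decode bits into a quantity in [0, 20]
    return -abs(q - 10.0)

def crossover(p1, p2):
    """One-point crossover producing two offspring."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def election(p1, p2):
    """Election operator: perform crossover, then compare the potential
    fitness of the offspring with the actual fitness of the parents and
    keep the two fittest individuals among the four."""
    o1, o2 = crossover(p1, p2)
    pool = [p1, p2, o1, o2]
    pool.sort(key=fitness, reverse=True)
    return pool[0], pool[1]

parents = ([1, 0, 1, 1, 0, 1], [0, 1, 0, 0, 1, 0])
kids = election(*parents)
```

Because the surviving pair is never worse than the parents, the operator damps the disruptive effect of mutation and lets the population variance shrink toward zero near equilibrium, as described above.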
To avoid the arbitrariness of the choice of an adaptive scheme, [70] suggested that
comparing the behavior of adaptive schemes with the behavior observed in laboratory
experiments with human subjects can facilitate the choice of a particular adaptive
scheme.
The application of genetic programming to the cobweb model started with [30]. [30]
compared the learning performance of GP-based learning agents with that of GA-
based learning agents. They found that, like GA-based learning agents, GP-based
learning agents can also learn the homogeneous rational expectations equilibrium
price under both the stable and unstable cobweb case. However, the phenomenon of
price euphoria, which did not happen in [6], does show up quite often in the early
stages of the GP experiments. This is mainly because agents in their setup were
initially endowed with very limited information as compared to [6]. Nevertheless,
GP-based learning can quickly coordinate agents’ beliefs so that the emergence of
price euphoria is only temporary. Furthermore, unlike [6], [30] did not use the elec-
tion operator. Without the election operator, the rational expectations equilibrium
is exposed to potentially persistent perturbations due to the agents’ adoption of the
new, but untested, rules. However, what shows up in [30] is that the market can still
bring any price deviation back to equilibrium. Therefore, the self-stabilizing feature
of the market, known as the invisible hand, is more powerfully replicated in their
GP-based artificial market.
The self-stabilizing feature of the market demonstrated in [30] was further tested
with two complications. In the first case, [31] introduced a population of speculators
to the market and examined the effect of speculations on market stability. In the
second case, the market was perturbed with a structural change characterized by
a shift in the demand curve, and [32] then tested whether the market could restore
the rational expectations equilibrium. The answer to the first experiment is generally
negative, i.e., speculators do not enhance the stability of the market. On the contrary,
they do destabilize the market. Only in special cases when trading regulations, such
as the transaction cost and position limit, were tightly imposed could speculators
enhance the market stability. The answer for the second experiment is, however,
positive. [32] showed that GP-based adaptive agents could detect the shift in the
demand curve and adapt to it. Nonetheless, the transition phase was non-linear and
non-smooth; one can observe slumps, crashes, and bursts in the transition phase. In
addition, the transition speed is uncertain. It could be fast, but could be slow as well.
In addition to genetic algorithms, genetic programming is also extensively ap-
plied to build systems of software agents which are able to replicate the labora-
tory results with human subjects. [26] studied bargaining behavior observed in the
double-auction laboratory markets with human subjects. All buyers and sellers in
[26] are artificial adaptive agents. Each artificial adaptive agent is built upon ge-
netic programming. The architecture of genetic programming used is what is known
as multi-population genetic programming (MGP). Briefly, they viewed or modeled
[Figure: transaction-price time series for Market 7, Market 10, and Market 20, each shown side-by-side for Experiment 1 and Experiment 2.]
Fig. 2 Agent-Based Double Auction Market Simulation with GP Agents. The three markets
presented here are selected and adapted from [26], Fig. 4
3.2 Competition
In addition to mirroring the collective behavior of human agents, software agents
are also used directly to interact with human agents. This advancement is partially
18 The number of bargaining strategies assigned to each bargaining agent is called the popu-
lation size. AIE-DA Version 2, developed by the AI-ECON Research Center, allows each
agent to have at most 1000 bargaining strategies.
3.3 Collaboration
At the third stage, neither do we mirror, nor do we match, the two kinds of agents.
Human agents now work with software agents as a team, and they are no longer
treated as two entities. The modern definition of artificial intelligence has already
given up the dream of passing the Turing test. Instead, a more realistic and also in-
teresting definition is based on the team work cooperatively performed by software
agents and human agents [68]. If the studies of the first two stages can be considered
to be the works under the influence of classical AI, then the third development is a
natural consequence of the modern AI.
One of the reasons why human agents did not use software agents as their repre-
sentatives was that they did not feel comfortable with them, or else they did not
quite trust them. In other words, these software agents were not customized. In a
subsequent experiment, they considered a different setup which may blur the rela-
tions between human and software agents. That is, they asked each human agent
to write his or her own trading program (software agents), and used them as their
incarnations in the later agent-based market simulation. This idea is very similar to
the game-like tournaments pioneered by Robert Axelrod in the mid-1980s [12, 63],
and the market-like tournament initiated by the Santa Fe Institute in the early 1990s
[86]. However, moving one step further, they considered a comparison between the
simulation based on these humanly-supplied software agents and the one based on
purely computer-generated software agents. For the latter case, genetic program-
ming was applied to generate the autonomous agents, and the platform AIE-DA
was used to implement the simulation (see also Section 3.1.2).
Out of the 20 simulations which they carried out, they found that the market
composed of purely computer-generated software agents that are autonomous and
adaptive performs consistently better, in terms of market efficiency, than the mar-
ket composed of humanly-supplied software agents, even though humanly-supplied
software agents are more sophisticated or thoughtful.
The two experiments above together have two implications. The first experiment,
from a sociological viewpoint, provides evidence that human agents may have dif-
ficulty embracing (containing) software agents when making decisions. The second
experiment further shows that if we allow human agents to choose or even design
their own software agents, their collective behavior may not be the same as that ob-
served from the computer-generated software agents. This second finding is similar
to that of [8].21
4 Hybrid Systems
Section 2 reviews the applications of CCI to agent-based computational economics
(the heterogeneous-agent research paradigm), and Section 3 reviews the applications
of CCI to experimental economics. In both of these two cases, CCI is mainly used to
build software agents in economic and financial models. In other words, CCI con-
tributes to economic and financial agent engineering. Another major area to which
CCI is also vastly applied, with an even longer history, is financial engineering. This
application is mainly motivated by the rapid development of various hybridizations
of CI tools. There are a number of hybrid systems frequently observed in financial
engineering. We shall only sample some in this section.
decade.22 The room for this collaboration is available mainly because of the het-
erogeneity of CI tools (see also Section 2). Different CI tools are designed with
different mechanisms inspired from various natural and artificial processes; there-
fore, they may each handle one or a few different aspects of an intelligent task [23].
To name a few, self-organizing maps are operated for pattern discovery and concept
formation, auto-association neural networks are good for the removal of redundan-
cies and data compression, feedforward and recurrent neural networks are regarded
as universal function approximators, and the approximation process can usually be
facilitated by genetic algorithms. With this diversity in specialization, it is surprising
that the literature has in the past developed so much competition but so little
collaboration among these tools. The recent research trend seems to
recognize this biased development and to move back to the collaboration theme
accordingly.
Financial hybrid systems are mainly the financial applications of the hybrid sys-
tems or multi-agent systems. Among the many designs of financial hybrid systems,
it is important to distinguish models from processes, particularly, evolutionary pro-
cesses. Many financial hybrid systems are designed based on the idea of putting
a model or a population of models under an evolutionary process, which includes
evolutionary artificial neural networks, evolutionary fuzzy inference systems, evo-
lutionary support vector machines, etc. We shall start with a review of this main idea
(Section 4.2). The second major element we experienced in hybrid financial systems
is the desire to make semantic sense of the hybrid systems, which includes the use
of natural language and qualitative (non-numeric) reasoning. We shall then provide
a brief review of this (Section 4.3). We conclude this section by mentioning the col-
laboration work done with the data or database, such as feature selection, dimension
reduction, etc. (Section 4.4).
Among all hybridizations of CI in finance, the most popular one is probably the
combination of genetic algorithms and artificial neural nets, one kind of
evolutionary artificial neural net (EANN), referred to as GANNs (genetic
algorithm neural networks).
22 A more comprehensive treatment of the hybrid system can be found in [18] and [41].
In the 1990s, based on results from statistical learning theory, an alternative to the
artificial neural network was developed, i.e. the support vector machine (SVM), also
called the kernel machine. It has been found that when compared with the standard
feedforward neural nets trained with the backpropagation algorithm, the support vector
machine can have superior performance [37]. This superiority may be attributable
to different optimization principles running behind the two: for the SVM, it is the
structural risk minimization principle, whereas for backpropagation it is the empiri-
cal risk minimization principle. The objective of the former is to minimize the upper
bound of the generalization error, whereas the objective of the latter is to minimize
the error based on training data. Hence, the former may lead to better generalization
than the latter. Partially due to this advantage, the financial applications of SVM
have kept on expanding.23
However, like the ANN, the SVM can also be treated as a semi-parametric model.
To use it, there are a number of parameters or specifications that need to be deter-
mined. Basically, there are three major parameters in the SVM. First, there is the
penalty parameter associated with the empirical risk appearing in the structural risk
function. In the literature, it is normally denoted by C. Second, when the SVM is
applied to the regression problem, in addition to C, there is a parameter ε appearing in
the ε-insensitive loss function. Third, there is the parameter of the chosen kernel
function. Support vector machines non-linearly map a lower-dimensional input space
into a higher-dimensional, possibly infinite-dimensional, feature space. However,
a central concept of the SVM is that one does not need to consider the feature space
in explicit form; instead, based on the Hilbert-Schmidt theory, one can use the ker-
nel function. There are two kernel functions frequently used, namely, the Gaussian
kernel and the polynomial kernel. The former has a parameter associated with the
second moment of a Gaussian called width (normally denoted by σ ), and the latter
has a parameter associated with the polynomial degree (normally denoted by p).
At the beginning, these parameters were arbitrarily given by trial-and-error. Later
on, genetic algorithms were extensively employed to optimize the SVM, and the ap-
plications of ESVM have been seen in various fields, such as reliability forecasting
[81], traffic flow forecasting [97], bankruptcy forecasting [1, 103, 106], and stock
market prediction [107, 40].
[103] uses a GA to genetically determine the parameters C and σ of the SVM.
The proposed GA-SVM was then tested on the prediction of financial crisis in Tai-
wan. The experimental results show that the GA-SVM model performs better than
the manually-driven SVM. [1] use a GA to simultaneously optimize the feature se-
lection and the instance selection as well as the parameters of a kernel function. It is
also found in this study that the prediction accuracy of the conventional SVM may
be improved significantly by using the ESVM. In [107] a GA is used for variable
selection in order to reduce the modeling complexity of the SVM and improve its
speed, and the SVM is then used to identify the direction of stock market movement
based on historical data.

23 The interested reader can find some useful references directly from the website of the
SVM: http://www.svms.org/

264 S.-H. Chen
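The GA-driven tuning of (C, σ) described in these studies can be sketched as follows. This is an illustrative sketch only: `fitness` is a hypothetical surrogate for the cross-validated accuracy a real ESVM would obtain by training an SVM with each candidate (C, σ) pair, and the population size, operators and search ranges are all assumptions.

```python
import random

def fitness(C, sigma):
    # Hypothetical surrogate for cross-validated SVM accuracy; a real
    # ESVM would train an SVM with (C, sigma) and score it here.
    return 1.0 / (1.0 + (C - 10.0) ** 2 + (sigma - 0.5) ** 2)

def evolve_svm_params(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    # Real-valued chromosomes encoding (C, sigma)
    pop = [(rng.uniform(0.1, 100.0), rng.uniform(0.01, 5.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(*p), reverse=True)
        elite = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()                       # blend crossover
            c = w * a[0] + (1 - w) * b[0] + rng.gauss(0, 1.0)
            s = w * a[1] + (1 - w) * b[1] + rng.gauss(0, 0.1)
            # Gaussian mutation above; clip back into the search box
            children.append((min(100.0, max(0.1, c)),
                             min(5.0, max(0.01, s))))
        pop = elite + children
    return max(pop, key=lambda p: fitness(*p))

best_C, best_sigma = evolve_svm_params()
```

In [103] the fitness is the model's prediction accuracy on validation data; the surrounding loop of selection, crossover and mutation is the same.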
4.3.1 ANFIS
“A” and “B” above are fuzzy sets. However, the function f(x, y) in the ANFIS
is linear:

z = f(x, y) = α + β_x x + β_y y. (26)
The structure can, therefore, be comparable to the autonomous-agent design in the
SFI artificial stock market (see Section 2.2.3, Equation 13) and is even closer to
the modified version of the SFI-ASM proposed by [98] (see Section 2.4). However,
unlike [98], the rule base used in the ANFIS must be known in advance. The ANFIS
integrates the backpropagation algorithm with the recursive least squares algorithm
to adjust parameters.
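As a concrete illustration of Eq. (26), the first-order Sugeno inference that the ANFIS implements can be evaluated in a few lines. The Gaussian membership functions, rule parameters and inputs below are hypothetical; in the ANFIS the premise parameters are adjusted by backpropagation and the consequent parameters (α, β_x, β_y) by recursive least squares.

```python
import math

def gauss_mf(x, c, sigma):
    # Gaussian membership function (premise parameters c, sigma)
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_output(x, y, rules):
    """First-order Sugeno inference. Each rule is
    ((cx, sx), (cy, sy), (alpha, bx, by)), with the linear consequent
    z = alpha + bx*x + by*y as in Eq. (26)."""
    weights, consequents = [], []
    for (cx, sx), (cy, sy), (alpha, bx, by) in rules:
        w = gauss_mf(x, cx, sx) * gauss_mf(y, cy, sy)  # firing strength
        weights.append(w)
        consequents.append(alpha + bx * x + by * y)
    # Normalized weighted average of the rule consequents
    return sum(w * z for w, z in zip(weights, consequents)) / sum(weights)

# Two hypothetical rules covering a "low" and a "high" input region
rules = [((0.0, 1.0), (0.0, 1.0), (0.0, 1.0, 1.0)),
         ((5.0, 1.0), (5.0, 1.0), (1.0, 2.0, 0.5))]
z = anfis_output(1.0, 2.0, rules)
```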
Collaborative Computational Intelligence in Economics 265
The ANFIS has been applied to water consumption forecasting [11], stock prices
forecasting [13, 39, 108], credit scoring [73], market timing decisions [38], credit
risk evaluation [104], and option pricing [62].
[62] applies the ANFIS to option market pricing based on the transaction data
of the Indian Stock Option. The pricing capability of the ANFIS is compared with
the performance of the ANN model and Black-Scholes (BS) model. The empirical
results show that the out-of-sample pricing performance of the ANFIS is superior to
that of the BS, and is also better than the ANN. In addition, compared to the ANN,
the ANFIS is explicit about its decision rules.
Instead of backpropagation, [39] uses extended Kalman filtering to estimate the
ANFIS, and demonstrates its performance on stock index forecasting by comparing
it with the backpropagation-trained ANFIS. It is found that the proposed extended
Kalman filtering can perform better than backpropagation.
The hybrid system which we shall review in this section comprises two CI tools,
namely, genetic programming and rough sets. The hybridization of GP and rough
sets provides an excellent illustration of how the usual competitive relationship
between two CI tools can be more productively transformed into a collaborative
relationship ([77, 88]).
Rough sets define a mathematical model of vague concepts that is used to represent
the relationship of indiscernibility between objects. Two objects are considered
equivalent with respect to a certain subset of attributes if they share the same value
for each attribute of the subset. By collecting all equivalent objects, one can decompose
the entire universe (the set of objects) into equivalence classes. The decomposition,
of course, is not unique and depends on the subset of attributes which we use to
define the equivalence relation.
Rough sets arise when one tries to use the equivalence classes with respect to
some attributes to give a description of a concept defined by the associated decision
attributes. For example, if the decision attribute concerns financial distress and is
binary (bankruptcy or not), then what one wants to characterize is the concept of
bankruptcy by using some attributes of the firms, e.g., their financial ratios, size, etc.
The characterization is only approximate when a complete specification of the concept
is infeasible. In this case, the concept itself is rough, and the objects associated
with the concept are referred to as a rough set.
Two partial specifications are considered to be the most important, namely, the
lower approximation and the upper approximation. The lower approximation
consists of the objects (equivalence classes) which belong to the concept with certainty,
i.e., the entire equivalence class is a subset of the rough set. The upper approximation
consists of those equivalence classes which possibly belong, i.e., they have a
non-empty intersection with the rough set. A subset of the attributes is called a reduct
if all attributes belonging to it are indispensable. An attribute is dispensable if its
absence does not change the set approximation. In other words, a reduct is a set of
attributes that is sufficient to describe a concept.
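The lower and upper approximations, and the dispensability test behind reducts, can be sketched directly. The firm table and attribute names below are invented for illustration:

```python
def equivalence_classes(table, attrs):
    """Partition objects by their values on the chosen attributes."""
    classes = {}
    for name, values in table.items():
        key = tuple(values[a] for a in attrs)
        classes.setdefault(key, set()).add(name)
    return list(classes.values())

def lower_upper(table, attrs, concept):
    """Lower/upper approximations of a concept (a set of object names)."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attrs):
        if cls <= concept:      # whole class certainly in the concept
            lower |= cls
        if cls & concept:       # class possibly in the concept
            upper |= cls
    return lower, upper

# Hypothetical firms described by two discretized condition attributes
firms = {"f1": {"ratio": "low",  "size": "small"},
         "f2": {"ratio": "low",  "size": "small"},
         "f3": {"ratio": "high", "size": "large"},
         "f4": {"ratio": "high", "size": "small"}}
bankrupt = {"f1", "f4"}         # the concept, from the decision attribute
lo, up = lower_upper(firms, ["ratio", "size"], bankrupt)
```

Dropping "size" changes the lower approximation from {f4} to the empty set, so "size" is indispensable in this toy table; the same check shows "ratio" is indispensable too, making {ratio, size} a reduct here.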
A financial hybrid system using rough sets and GP was first proposed by [77]. In
this hybridization, rough sets are first used to select the discriminative features
by identifying reducts. Only the features in these reducts are then taken as the inputs
for the GP learning process. [77] uses genetic programming to construct a bankruptcy
prediction model with variables from a rough sets model. The genetic programming
model reveals relationships between variables that are not apparent when using the
rough sets model alone.24
5 Concluding Remarks
According to the current trend in the literature, this paper addresses what collabo-
rative computational intelligence can mean for economists. While the recent series
of publications on the economic and financial applications of computational intel-
ligence has already demonstrated the relevance of various CI tools to economists
[29, 35], they are mostly taken as techniques for economists. In this chapter, we go
one step further to show that they can be more productive so as to be part of the
future of economics. Specifically, we demonstrate this potential by singling out two
new research paradigms in economics, namely, agent-based economics and experimental
economics.

24 There are many other ways to hybridize GP or GA with rough sets, but so far their financial
applications have rarely been seen.
The essence of agent-based economics is a society of heterogeneous agents, a
subject whose motivation is highly interdisciplinary. Collaborative computational
intelligence enables or inspires economists to make some initial explorations of
the richness of this society. In this regard, computational intelligence
may contribute by providing not just models of learning or adaptation, but models
of learning or adaptation processes which may be influenced by behavioral genetic
and cultural factors.
After a decade of rapid development, a challenging issue facing experimental
economics is how to strengthen the reliability of laboratory results with human
subjects by properly introducing software agents into the labs. In fact, the state-of-the-art
economic laboratory is no longer a lab with only human subjects, but a lab
comprising both human agents and software agents [27]. Collaborative computational
intelligence can contribute significantly to the building of this modern lab.
The last part of the paper reviews some recent economic and financial applications
of hybrid systems. No attempt is made to give an exhaustive list, which may
itself deserve a separate treatment. Instead, we single out the two most significant
elements in frequently used economic and financial hybrid systems, namely,
evolution and semantics. The former mainly contributes to a hybrid system as a
process that facilitates universal approximation, whereas the latter contributes by
enhancing the system's semantic transparency.
To sum up, this chapter has shown how collaborative computational intelligence
has enriched the design of economic and financial agents, while, in the meantime,
providing quantitative economists with a longer list of ideas to cope with the inher-
ent complexity and uncertainty in the data.
Acknowledgements. The author is grateful to one anonymous referee for his helpful sug-
gestion. The author is also grateful to the editor, Dr. Christine L. Mumford, for her devoted
efforts made in editing the chapter. NSC research grant No. 95-2415-H-004-002-MY3 is
gratefully acknowledged.
References
1. Ahn, H., Lee, K., Kim, K.-J.: Global optimization of support vector machines using
genetic algorithms for bankruptcy prediction. In: King, I., Wang, J., Chan, L.-W., Wang,
D. (eds.) ICONIP 2006. LNCS, vol. 4234, pp. 420–429. Springer, Heidelberg (2006)
2. Allen, H., Taylor, M.: Charts, noise and fundamentals in the London foreign exchange
market. Economic Journal 100, 49–59 (1990)
3. Alvarez-Diaz, M., Alvarez, A.: Forecasting exchange rates using an evolutionary neural
network. Applied Financial Economics Letters 3(1), 5–9 (2007)
4. Amilon, H.: Estimation of an adaptive stock market model with heterogeneous agents.
Journal of Empirical Finance (forthcoming) (2008)
26. Chen, S.-H., Tai, C.-C.: Trading restrictions, price dynamics, and allocative efficiency
in double auction markets: Analysis based on agent-based modeling and simulations.
Advances in Complex Systems 6(3), 283–302 (2003)
27. Chen, S.-H., Tai, C.-C.: On the selection of adaptive algorithms in ABM: A
computational-equivalence approach. Computational Economics 28(1), 51–69 (2006)
28. Chen, S.-H., Tai, C.-C.: Would human agents like software agents? Results from
prediction market experiments. Working paper, AI-ECON Research Center, National
Chengchi University (2007)
29. Chen, S.-H., Wang, P.: Computational intelligence in economics and finance. Springer,
Heidelberg (2003)
30. Chen, S.-H., Yeh, C.-H.: Genetic programming learning and the cobweb model. In:
Angeline, P. (ed.) Advances in Genetic Programming, vol. 2, ch. 22, pp. 443–466. MIT
Press, Cambridge (1996)
31. Chen, S.-H., Yeh, C.-H.: Modeling speculators with genetic programming. In: An-
geline, P.J., McDonnell, J.R., Reynolds, R.G., Eberhart, R. (eds.) EP 1997. LNCS,
vol. 1213, pp. 137–147. Springer, Heidelberg (1997)
32. Chen, S.-H., Yeh, C.-H.: Simulating economic transition processes by genetic program-
ming. Annals of Operation Research 97, 265–286 (2000)
33. Chen, S.-H., Yeh, C.-H.: Evolving traders and the business school with genetic programming:
A new architecture of the agent-based artificial stock market. Journal of
Economic Dynamics and Control 25, 363–394 (2001)
Chen, S.-H., Kuo, T.-W., Shieh, Y.-P.: Genetic programming: A tutorial with the software
Simple GP. In: Chen, S.-H. (ed.) Genetic algorithms and genetic programming in
computational finance, pp. 55–77. Kluwer, Dordrecht (2002)
34. Chen, S.-H., Liao, C.-C., Chou, P.-J.: On the plausibility of sunspot equilibria: Simu-
lations based on agent-based artificial stock markets. Journal of Economic Interaction
and Coordination 3(1), 25–41 (2008)
35. Chen, S.-H., Wang, P., Kuo, T.-W.: Computational intelligence in economics and fi-
nance, vol. 2. Springer, Heidelberg (2007)
36. Chen, S.-H., Zeng, R.-J., Yu, T.: Co-evolving trading strategies to analyze bounded
rationality in double auction markets. In: Riolo, R. (ed.) Genetic programming theory
and practice VI. Springer, Heidelberg (forthcoming) (2008)
37. Chen, W.-H., Shih, J.-Y., Wu, S.: Comparison of support-vector machines and back
propagation neural networks in forecasting the six major Asian stock markets. Interna-
tional Journal of Electronic Finance 1(1), 49–67 (2006)
38. Chen, P., Quek, C., Mah, M.: Predicting the impact of anticipatory action on U.S. stock
market – An event study using ANFIS (a neural fuzzy model). Computational
Intelligence 23(2), 117–141 (2007)
39. Chokri, S.: Neuro-fuzzy network based on extended Kalman filtering for financial time
series. In: Proceedings of World Academy of Science, Engineering and Technology,
vol. 15, pp. 290–295 (2006)
40. Choudhry, R., Garg, K.: A hybrid machine learning system for stock market forecast-
ing. In: Proceedings of the World Academy of Science, Engineering and Technology,
vol. 29, pp. 315–318 (2008)
41. Cordon, O., Herrera, F., Hoffmann, F., Magdalena, L.: Genetic fuzzy systems: Evo-
lutionary tuning and learning of fuzzy knowledge bases. World Scientific, Singapore
(2002)
42. De Grauwe, P., Grimaldi, M.: The exchange rate in a behavior finance framework.
Princeton University Press, Princeton (2006)
43. de Jong, E., Verschoor, W., Zwinkels, R.: Heterogeneity of agents and exchange rate
dynamics: Evidence from the EMS, http://ssrn.com/abstract=890500
44. Delli Gatti, D., Gaffeo, E., Gallegati, M., Giulioni, G.: Emergent macroeconomics: An
agent-based approach to business fluctuations. Springer, Heidelberg (2008)
45. Duffy, J.: Agent-based models and human subject experiments. In: Tesfatsion, L., Judd,
K. (eds.) Handbook of computational economics: Agent-based computational eco-
nomics, vol. 2, pp. 949–1011. Elsevier, Oxford (2006)
46. Duval, Y., Kastens, T., Featherstone, A.: Financial classification of farm businesses
using fuzzy systems. In: 2002 AAEA Meetings Long Beach, California (2002)
47. Evans, G., Honkapohja, S.: Learning and expectations in macroeconomics. Princeton
University Press, Princeton (2001)
48. Feinberg, M.: Why smart people do dumb things: Lessons from the new science of
behavioral economics, Fireside (1995)
49. Fogel, D., Chellapilla, K., Angeline, P.: Evolutionary computation and economic mod-
els: sensitivity and unintended consequences. In: Chen, S.-H. (ed.) Evolutionary com-
putation in economics and finance, pp. 245–269. Springer, Heidelberg (2002)
50. Frankel, J., Froot, K.: Chartists, fundamentalists, and trading in the foreign exchange
market. American Economic Review 80, 181–186 (1990)
51. Gilks, W., Richardson, S., Spiegelhalter, D.: Markov chain Monte Carlo in practice.
CRC, Boca Raton (1995)
52. Gode, D., Sunder, S.: Allocative efficiency of markets with zero intelligence traders:
Market as a partial substitute for individual rationality. Journal of Political Econ-
omy 101, 119–137 (1993)
53. Grossklags, J., Schmidt, C.: Software agents and market (in)efficiency: a human trader
experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part C 36(1), 56–67
(2006)
54. Hartley, J.: The representative agent in macroeconomics. Routledge, London (1997)
55. Henrich, J.: Does culture Matter in Economic Behavior? Ultimatum game bargaining
among the Machiguenga of the Peruvian Amazon. American Economic Review 90(4),
973–979 (2000)
56. Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H. (eds.): Founda-
tions of human sociality: Economic experiments and ethnographic evidence from fif-
teen small-scale societies. Oxford University Press, Oxford (2004)
57. Herrnstein, R., Murray, C.: Bell curve: Intelligence and class structure in American life.
Free Press (1996)
58. Hommes, C.: Heterogeneous agent models in economics and finance. In: Tesfatsion,
L., Kenneth, J. (eds.) Handbook of Computational Economics, vol. 2, ch. 23, pp. 1109–
1186. Elsevier, Amsterdam (2006)
59. Jang, R.: ANFIS: Adaptive network-based fuzzy inference system. IEEE Transactions
on Systems, Man and Cybernetics 23(3), 665–685 (1993)
60. Janssen, M., Ostrom, E.: Empirically based, agent-based models. Ecology and Soci-
ety 11(2), 37 (2006)
61. Jensen, A.: The g Factor: The science of mental ability, Praeger (1998)
62. Kakati, M.: Option pricing using the adaptive neuro-fuzzy system (ANFIS). ICFAI
Journal of Derivatives Markets 5(2), 53–62 (2008)
63. Kendall, G., Yao, X., Chong, S.-Y.: The iterated prisoners’ dilemma: 20 years on. World
Scientific, Singapore (2007)
64. Kim, K.-J.: Artificial neural networks with evolutionary instance selection for financial
forecasting. Expert Systems with Applications 30(3), 519–526 (2006)
65. Koyama, Y., Sato, H., Matsui, H., Nakajima, Y.: Report on UMIE 2004 and summary
of U-Mart experiments based on the classification of submitted machine agents. In:
Terano, T., Kita, H., Kaneda, T., Arai, K., Deguchi, H. (eds.) Agent-based simulation:
From modeling methodologies to real-world applications. Springer Series on Agent-
Based Social Systems, vol. 1, pp. 158–166 (2005)
66. Kosko, B.: Neural networks and fuzzy systems. Prentice-Hall, Englewood Cliffs (1992)
67. LeBaron, B.: Evolution and time horizons in an agent based stock market. Macroeco-
nomic Dynamics 5, 225–254 (2001)
68. Lieberman, H.: Software agents: The MIT approach. In: Invited speech delivered at the
7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World
(MAAMAW 1996), Eindhoven, The Netherlands, January 22-25 (1996)
69. Lubinski, D., Humphreys, L.: Incorporating general intelligence into epidemiology and
the social sciences. Intelligence 24(1), 159–201 (1997)
70. Lucas Jr., R.: Adaptive behavior and economic theory. Journal of Business 59, 401–426
(1986)
71. Lynn, R.: Race differences in intelligence: An evolutionary analysis. Washington Sum-
mit Publishers (2006)
72. Lynn, R., Vanhanen, T.: IQ and the wealth of nations. Praeger (2002)
73. Malhotra, R., Malhotra, D.: Differentiating between good credits and bad credits us-
ing neuro-fuzzy systems. European Journal of Operational Research 136(1), 190–211
(2002)
74. Markose, S.: Developments in experimental and agent-based computational economics
(ACE): overview. Journal of Economic Interaction and Coordination 1(2), 119–127
(2006)
75. Manzan, S., Westerhoff, F.: Heterogeneous expectations, exchange rate dynamics and
predictability. Journal of Economic Behavior and Organization 64, 111–128 (2007)
76. McClearn, G., Johansson, B., Berg, S., Pedersen, N., Ahern, F., Petrill, S., Plomin,
R.: Substantial genetic influence on cognitive abilities in twins 80 or more years old.
Science 276, 1560–1563 (1997)
77. McKee, T., Lensberg, T.: Genetic programming and rough sets: A hybrid approach to
bankruptcy classification. European Journal of Operational Research 138(2), 436–451
(2002)
78. Mora, A., Alfaro-Cid, E., Castillo, P., Merelo, J., Esparcia-Alcazar, A., Sharman, K.:
Discovering causes of financial distress by combining evolutionary algorithms and ar-
tificial neural networks. In: Proceedings of the Genetic and Evolutionary Computation
Conference (GECCO 2008), Atlanta, USA (July 2008)
79. Murray, C.: Income Inequality and IQ. AEI Press (1998)
80. Nauck, D., Klawonn, F., Kruse, R.: Foundations of neuro-fuzzy systems. John Wiley
and Sons, Chichester (1997)
81. Pai, P.-F.: System reliability forecasting by support vector machines with genetic algo-
rithms. Mathematical and Computer Modelling 43(3-4), 262–274 (2006)
82. Palmer, R., Arthur, B., Holland, J., LeBaron, B., Tayler, P.: Artificial economic life: A
simple model of a stock market. Physica D 75, 264–274 (1994)
83. Phua, P., Ming, D., Li, W.: Neural network with genetically evolved algorithms for
stocks prediction. Asia-Pacific Journal of Operational Research 18(1), 103–107 (2001)
84. Plomin, R., Petrill, S.: Genetics and intelligence: What’s new? Intelligence 24(1), 53–77
(1997)
85. Richiardi, M., Leombruni, R., Contini, B.: Exploring a new ExpAce: The complemen-
tarities between experimental economics and agent-based computational economics.
Journal of Social Complexity 3(1) (2006)
86. Rust, J., Miller, J., Palmer, R.: Behavior of trading automata in a computerized double
auction market. In: Friedman, D., Rust, J. (eds.) The double auction market: Institutions,
theories, and evidence, ch. 6, pp. 155–198. Addison Wesley, Reading (1993)
87. Rust, J., Miller, J., Palmer, R.: Characterizing effective trading strategies: Insights from
a computerized double auction market. Journal of Economic Dynamics and Control 18,
61–96 (1994)
88. Salcedo-Sanz, S., Fernandez-Villacanas, J.-L., Segovia-Vargas, M., Bousono-Calzon,
C.: Genetic programming for the prediction of insolvency in non-life insurance compa-
nies. Computers & Operations Research 32(4), 749–765 (2005)
89. Sargent, T.: Bounded rationality in macroeconomics. Oxford University Press, Oxford
(1993)
90. Sasaki, Y., Flann, N., Box, P.: Multi-agent evolutionary game dynamics and reinforcement
learning applied to online optimization for the traffic policy. In: Chen, S.-H., Jain,
L., Tai, C.-C. (eds.) Computational economics: A perspective from computational in-
telligence. IDEA Group Publishing, USA (2005)
91. Sato, H., Matsui, H., Ono, I., Kita, H., Terano, T., Deguchi, H., Shiozawa, Y.: U-Mart
project: Learning economic principles from the bottom by both human and software
agents. In: Terano, T., Nishida, T., Namatame, A., Tsumoto, S., Ohsawa, Y., Washio, T.
(eds.) JSAI-WS 2001. LNCS, vol. 2253, pp. 121–131. Springer, Heidelberg (2001)
92. Sato, H., Kawachi, S., Namatame, A.: The statistical properties of price fluctuations by
computer agents in a U-Mart virtual future market simulator. In: Terano, T., Deguchi,
H., Takadama, K. (eds.) Meeting the challenge of social problems via agent-based sim-
ulation, pp. 67–76. Springer, Heidelberg (2003)
93. Selvaratnam, S., Kirley, M.: Predicting stock market time series using evolutionary ar-
tificial neural networks with Hurst exponent windows. In: Sattar, A., Kang, B.-h. (eds.)
AI 2006. LNCS, vol. 4304, pp. 617–626. Springer, Heidelberg (2006)
94. Shahwan, T., Odening, M.: Forecasting agricultural commodity prices using hybrid
neural networks. In: Chen, S.-H., Wang, P., Kuo, T.-W. (eds.) Computational intelli-
gence in economics and finance, vol. 2, pp. 63–74. Springer, Heidelberg (2007)
95. Shiozawa, Y., Nakajima, Y., Matsui, H., Koyama, Y., Taniguchi, K., Hashimoto, F.:
Artificial Market Experiments with the U-Mart. Springer, Tokyo (2006)
96. Smith, V.: Bidding and auctioning institutions: Experimental results. In: Smith, V. (ed.)
Papers in experimental economics, pp. 106–127. Cambridge University Press, Cambridge
(1991)
97. Su, H., Yu, S.: Hybrid GA based online support vector machine model for short-term
traffic flow forecasting. In: Xu, M., Zhan, Y.-W., Cao, J., Liu, Y. (eds.) APPT 2007.
LNCS, vol. 4847, pp. 743–752. Springer, Heidelberg (2007)
98. Tay, N., Linn, S.: Fuzzy inductive reasoning, expectation formation and the behavior of
security prices. Journal of Economic Dynamics & Control 25, 321–361 (2001)
99. Terano, T., Shiozawa, Y., Deguchi, H., Kita, H., Matsui, H., Sato, H., Ono, I., Kakajima,
Y.: U-Mart: An artificial market testbed for economics and multiagent systems. In: Ter-
ano, T., Deguchi, H., Takadama, K. (eds.) Meeting the challenge of social problems via
agent-based simulation, pp. 53–66. Springer, Heidelberg (2003)
100. Tung, W.-L., Quek, C., Cheng, P.: GenSo-EWS: A novel neural-fuzzy based early warn-
ing system for predicting bank failures. Neural Networks 17(4), 567–587 (2003)
101. Ueda, T., Taniguchi, K., Nakajima, Y.: An analysis of U-Mart experiments by machine
and human agents. In: Proceedings of 2003 IEEE international symposium on compu-
tational intelligence in robotics and automation, vol. 3, pp. 1340–1347 (2003)
102. Wang, S.-C., Li, S.-P., Tai, C.-C., Chen, S.-H.: Statistical properties of an experimental
political futures markets. Quantitative Finance (2007) (forthcoming)
103. Wu, C.H., Tseng, G.H., Goo, Y.J., Fang, W.C.: A real-valued genetic algorithm to optimize
the parameters of support vector machine for predicting bankruptcy. Expert Systems
with Applications 32(2), 397–408
104. Xiong, Z.-B., Li, R.-J.: Credit risk evaluation with fuzzy neural networks on listed cor-
porations of China. In: Proceedings of 2005 IEEE International Workshop on VLSI
Design and Video Technology, pp. 397–402 (2005)
105. Yu, L., Zhang, Y.-Q.: Evolutionary fuzzy neural networks for hybrid financial predic-
tion. IEEE Transactions on Systems, Man and Cybernetics, Part C 35(2), 244–249
(2005)
106. Yu, L., Lai, K.-K., Wang, S.: An evolutionary programming based SVM ensemble
model for corporate failure prediction. In: Beliczynski, B., Dzielinski, A., Iwanowski,
M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS, vol. 4432, pp. 262–270. Springer, Hei-
delberg (2007)
107. Yu, L., Wang, S.-Y., Lai, K.K.: Mining stock market tendency using GA-based support
vector machines. In: Deng, X., Ye, Y. (eds.) WINE 2005. LNCS, vol. 3828, pp. 336–
345. Springer, Heidelberg (2005)
108. Yunos, Z., Shamsuddin, S., Sallehuddin, R.: Data modeling for Kuala Lumpur Compos-
ite Index with ANFIS. In: Proceedings of 2008 Second Asia International Conference
on Modelling & Simulation, pp. 609–614 (2008)
109. Zopounidis, C., Doumpos, M., Pardalos, P. (eds.): Handbook of financial engineering.
Springer, Heidelberg (2008)
IMMUNE: A Collaborating Environment for
Complex System Design
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 275–320.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
276 M. Efatmaneshnik and C. Reidsema
1 Introduction
A decision is a choice between alternatives based on estimates of the values of those
alternatives. There is a substantial amount of empirical evidence that human intu-
itive judgment and decision making can be far from optimal, and it deteriorates even
further with the complexity of the problem and stress [21]. Supporting a decision
means helping people working alone or in a group to gather intelligence, generate
alternatives and make choices. A Decision Support System (DSS) is a computer-
ized system for helping make decisions. Supporting the decision process involves
supporting the estimation, the evaluation and/or the comparison of alternatives (see
Turban [68]). Turban defines a DSS more specifically as “an interactive, flexible, and
adaptable computer-based information system, especially developed for supporting
the solution of a non-structured management problem for improved decision mak-
ing”. Druzdzel and Flynn [21] define a DSS as an integrated computing environment
for complex decision making. A DSS can be defined as a knowledge-based system,
which formalizes the domain knowledge so that it is amenable to mechanized rea-
soning. Knowledge-based problem solving is the domain of Artificial Intelligence
(AI) and the selection of an appropriate AI development tool that may provide a
framework to incorporate knowledge will come from this area (see Reidsema and
Szczerbicki [54]). Reidsema and Szczerbicki identified three different architectures
for decision support systems for product design planning and manufacturing in a
concurrent engineering environment: Expert Systems, Agent Based Systems, and
Blackboard Database Systems. These have been defined as follows:
An expert system is one of a class of AI techniques able to capture the
knowledge and reasoning of an experienced expert for re-use in assisting the less
experienced in making decisions.
The Blackboard Database Architecture is a problem solving system based on
the metaphor of human experts who cooperate by entering partial solutions to the
current problem onto a physical blackboard. The type of problems best suited to
this approach are those that are able to be reduced to a set of simpler problems that
are reasonably independent. The integration of the partial solutions into the overall
solution takes place through the intervention of a centralized controller known as the
control source; the approach to problem solving is therefore top-down.
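A minimal sketch of the blackboard metaphor, with invented subproblems: knowledge sources post partial solutions to a shared blackboard, and a central control source keeps activating them until nothing new can be added, integrating the results top-down.

```python
# Hypothetical subproblems: knowledge sources that each post a partial
# solution to the blackboard once their inputs are available.
def solve_area(bb):
    if "width" in bb and "height" in bb:
        bb["area"] = bb["width"] * bb["height"]

def solve_cost(bb):
    if "area" in bb and "unit_cost" in bb:
        bb["cost"] = bb["area"] * bb["unit_cost"]

def control(blackboard, knowledge_sources):
    # Centralized control source: keep activating sources until no one
    # can add anything new; the blackboard then holds the solution.
    changed = True
    while changed:
        before = dict(blackboard)
        for ks in knowledge_sources:
            ks(blackboard)
        changed = blackboard != before
    return blackboard

bb = control({"width": 2.0, "height": 3.0, "unit_cost": 10.0},
             [solve_cost, solve_area])   # activation order does not matter
```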
Multi-agent systems are distributed systems that use a bottom-up approach to
problem solving, in which centralized coordination between agents is minimal
or eliminated entirely. Each agent in a multi-agent system behaves as an abstraction
tool with the characteristics of a self-contained problem-solving system capable of
autonomous, reactive, proactive as well as interactive behaviour. The solution in this
case emerges as a whole and is the result of a synergetic effect. Synergy denotes a
level of group performance above and beyond what could be achieved by the members
of the group working independently [40]. Synergy in a multi-agent system enables
the integration of partial solutions to nonlinear and coupled problems. Multi-agent
systems are the natural candidates for complex systems which show heavy
interdependency between partial problems.
IMMUNE: A Collaborating Environment 277
and maximum complexity of the process. It then monitors the complexity of the
process as a function of the exchanged information between the agents (cognitive
complexity). An effective and efficient design process must have a cognitive com-
plexity in between the minimum and maximum complexity of the problem; this is
the result of a simple notion which is: the best a single person (or a single system)
can do is limited by his/her (cognitive) complexity [2].
A design system whose cognitive complexity surpasses the maximum complexity
of the problem loses effectiveness, since the design process may become chaotic.
If the cognitive complexity of the design system is lower than the minimum
complexity of the problem, then the system cannot efficiently solve the complex
problem and manage the interdependencies between its subproblems. In both cases,
the agents are expected to undertake corrective measures to stabilize the cognitive
complexity of the system and immunize it against fragility and failure. The CEO
monitors the complexity at two levels: inside the coalitions
(at the local levels) and the entire system (at the federal level). The next section dis-
cusses the fundamentals of design planning for complex products and a complexity
based method for monitoring the design process.
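The monitoring rule above reduces to a band check. The sketch below is illustrative only; how the cognitive, minimum and maximum complexities are actually computed from the exchanged information is not shown, and the function and label names are assumptions:

```python
def assess_design_process(cognitive, c_min, c_max):
    """Classify the design process from its cognitive complexity
    relative to the problem's complexity band [c_min, c_max].
    The labels are illustrative, not taken from the chapter."""
    if cognitive > c_max:
        return "chaotic"        # effectiveness is lost
    if cognitive < c_min:
        return "insufficient"   # interdependencies cannot be managed
    return "effective"

state = assess_design_process(cognitive=7.2, c_min=3.0, c_max=9.0)
```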
Real world concurrent engineering design projects require the cooperation of mul-
tidisciplinary design teams using sophisticated and powerful engineering tools such
as commercial CAD tools, engineering database systems and knowledge based sys-
tems [51]. To coordinate the design activities of various groups and support effective
cooperation among the different engineering tools, a distributed intelligent environ-
ment is required. This environment should not only automate individual tasks, in
the manner of traditional computer aided engineering tools, but also help individual
members share information and coordinate their actions as they explore alternatives
in search of a globally optimal or near optimal solution. Designing in a concurrent
environment requires the precise planning of the resources, tasks, collaborations, in-
formation exchanges and cooperation [53]. Planning to achieve the development of
a new product is usually accomplished by distributing the tasks required to achieve
the plan to the individuals or groups of team members best suited to accomplishing
these tasks [53]. Planning increases design efficiency and reduces the risk of failing
to achieve a design consensus and, consequently, the agreed-upon design objectives.
There are various approaches and perspectives to design planning which Reid-
sema and Szczerbicki [53] summarized as:
• Task model
• Design Process
• Resource Structure
• Organizational Model
• Cooperative Planning Model (GDDI)
Fig. 1 Design planning can be performed by using knowledge corresponding to various levels
of the hierarchy of organization, process and product [53]
Fig. 2 Product, process and organization structures are tightly related [7]
the multi-agent planning actions of task decomposition, task distribution and result
integration [53].
For complex systems, due to the coupling between the distributed tasks, integration
may not be performed linearly, simply by adding the partial solutions together.
Since coupled problems tend to be nonlinear (just as coupled differential
equations are), the solution may not be obtained by the usual
concurrent planning (which adds the partial solutions to obtain the overall solution).
This nonlinearity limits the kind of knowledge that can be used for planning. In general,
in a CE environment, planning may draw on quite diverse
knowledge from the Product, Process and Organizational (PPO) knowledge domains
(Figure 1).
reach the lowest levels of design activity, where individual design parameters are
determined based on other parameters. Determining these parameters constitutes
the lowest level design activities, and a bottom-up, integrative analysis of these low-
level activities can provide process structure insights. Browning [7] emphasized that
clearly, parameter-based DSMs have integrative applications. This characteristic of
the parameter based DSM which represents the low level product knowledge makes
it suitable to be utilized in the planning of complex engineering systems. We have
previously referred to this matrix as the self of the system [23], with the values of
the parameters representing the non-self.
Chen and Li [9] referred to Concurrent Product Design taking place in the para-
metric design stage as Concurrent Parametric Design. An engineered product is
developed through the concurrent consideration of various design issues, and mul-
tiple teams may be needed to tackle different design issues. In parametric design,
the focus is on the determination of a parametric configuration that achieves an op-
timization of individual design attributes. In multi-team design, a team refers to a
collaboration of design participants that, in a broad sense, can consist of design-
ers, computers or even algorithms, whatsoever is able to cope with distributed tasks
as part of the whole design problem [9]. In this design situation, teams may face
uncertainties during the design process, especially when their design decisions are
interrelated [9].
• Different disciplines.
• A wide range of scientific and engineering time and space domains.
• Multiple scientific and engineering models and representations (science-
engineering integration).
• Multiple methods (analytical theories and experimentation & testing) and related
knowledge bases.
As creating high-fidelity simulation models is a complex activity that can be
quite time-consuming [60], Monte Carlo Simulation is suggested to establish the
fitness landscape of the design problem. A fitness landscape is a multi-dimensional
data set, in which the number of dimensions is determined by the number of system
variables [46]. Marczyk [45] has stressed that by means of Monte Carlo Simulation
of design parameters (at both micro and macro levels), the fitness landscape of the
design space is created, enabling the verification of the global dependencies between
Table 1 A simulated DSM is a weighted adjacency matrix. This DSM has 10 variables

      V1    V2    V3    V4    V5    V6    V7    V8    V9    V10
V1    0     0.53  0.32  0     0     0.1   0     0     0     0
V2    0.76  0     0.12  0.12  0.3   0.2   0.2   0     0.1   0
V3    0.45  0.11  0     0     0     0.2   0.3   0     0.52  0.72
V4    0.16  0.65  0.64  0     0.34  0.43  0     0     0     0
V5    0.22  0.44  0.11  0.45  0     0.53  0.02  0     0.02  0
V6    0.77  0.78  0.31  0.34  0     0     0     0     0     0
V7    0.12  0     0.02  0     0     0     0     0.45  0.1   0.3
V8    0.01  0     0     0     0.01  0     0.2   0     0.4   0.1
V9    0     0     0.15  0     0     0     0.7   0.2   0     0.5
V10   0     0.18  0     0     0.01  0     0.1   0.8   0.9   0
low level design variables (product characteristics) and high level design process
variables (cost, time). We suggest the actualization of the multi scale and multidis-
ciplinary design structure matrices through Monte Carlo and Statistical Simulation
in the early stage of the design process, after the plan generation phase in each GDDI
cycle and before the plan decomposition and distribution. In order to establish the
correlation coefficients between different variables, global entropy-based correlation
coefficients have a significant advantage over linear covariance analyses: they
capture both linear and nonlinear dependencies. This measure is embedded in the
OntoSpace™ software. The outcome of the design space (or fitness landscape) simulation
may be fed into this software to produce the correlation matrix (simulated
parameter-based DSM). Table 1 shows an example of a typical simulated parameter-based
DSM of a product (such as an aircraft, car, computer, etc.) with normalized
weights (all the weights are between zero and one). The data presented and the design
variables are, however, fictitious. Throughout this paper, the self map will be
referred to as the corresponding weighted graph of the matrix in Table 1.
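The proprietary entropy-based correlation coefficient embedded in OntoSpace™ is not specified in the chapter. As a minimal stand-in, the sketch below estimates a normalized mutual-information dependency between two Monte Carlo-sampled design variables, and shows how it (unlike covariance) detects a purely nonlinear coupling. The histogram estimator, sample sizes, and variable names are all illustrative assumptions.

```python
import math
import random

def _bin(v, lo, hi, bins):
    return min(int((v - lo) / (hi - lo + 1e-12) * bins), bins - 1)

def entropy(vs, bins=10):
    """Histogram estimate of the Shannon entropy of a sample, in nats."""
    lo, hi = min(vs), max(vs)
    counts = [0] * bins
    for v in vs:
        counts[_bin(v, lo, hi, bins)] += 1
    n = len(vs)
    return -sum(c / n * math.log(c / n) for c in counts if c)

def dependency(xs, ys, bins=10):
    """Normalized entropy-based dependency in [0, 1]: mutual information
    divided by the smaller marginal entropy. Nonzero even for purely
    nonlinear relationships, where covariance vanishes."""
    n = len(xs)
    lox, hix, loy, hiy = min(xs), max(xs), min(ys), max(ys)
    joint, px, py = {}, [0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        i, j = _bin(x, lox, hix, bins), _bin(y, loy, hiy, bins)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    mi = sum(c / n * math.log(c * n / (px[i] * py[j]))
             for (i, j), c in joint.items())
    h = min(entropy(xs, bins), entropy(ys, bins))
    return max(mi, 0.0) / h if h > 0 else 0.0

# Monte Carlo sampling of two fictitious design variables
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
y_nl = [xi * xi + random.gauss(0.0, 0.1) for xi in x]   # nonlinear coupling
y_ind = [random.gauss(0.0, 1.0) for _ in range(5000)]   # independent variable

print(dependency(x, y_nl) > dependency(x, y_ind))  # True
```

The nonlinear pair has near-zero covariance but a high entropy-based dependency, which is exactly the property the chapter attributes to the global entropy-based coefficient.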
Complexity is frequently confused with emergence; emergence of new structures
and forms is the result of re-combination and spontaneous self-organization of simpler
systems to form higher-order hierarchies, i.e. a result of complexity [46]. We
define complexity as the intensity of emergence in a system. If the complexity is too
high, the system becomes chaotic and uncontrollable and is likely to lose its structure;
in other words, downward causation degrades the subsystems' performance. If
complexity is too low, the system loses the intrinsic characteristics of the entity it was
intended to describe, and fails to emerge as a spontaneous organization. Complexity
materializes the system’s self by the emergence of the self structure when the sub-
systems have sufficient interaction. Complexity is a “holistic” measure of the system
that enables us to study the system as a “whole”. Marczyk and Deshpande [46] pro-
posed a comprehensive complexity metric and established a conceptual platform for
practical and effective complexity management. The metric takes into account all
significant aspects necessary for a sound and comprehensive complexity measure,
284 M. Efatmaneshnik and C. Reidsema
namely structure, entropy and data granularity, or coarse-graining [46]. The metric
allows one to relate complexity to fragility and to show how critical threshold com-
plexity levels may be established for a given system. The metric is incorporated into
OntoSpace™, an innovative complexity management software package. This software
calculates three complexity measures for every self map (Figure 3):
1. The complexity of the map, which is a specific measure reflecting the
coupledness and size of the system. This complexity measure is called Ontix.
We will refer to the complexity of this map as self complexity.
2. The upper complexity bound to which the complexity of the system may be in-
creased without exhibiting chaos.
3. The lower complexity bound where the system with a lower complexity loses its
intrinsic characteristics and fails to emerge as a coherent self.
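Ontix itself is proprietary, so the three measures cannot be reproduced exactly. Purely as an illustration of the idea, the sketch below scores the Table 1 self map with a crude stand-in metric (total coupling weighted by the entropy of the interaction strengths) and derives illustrative bounds: the lower bound keeps only each variable's single strongest dependency (a caricature of an irreducible core), while the upper bound saturates every existing link to full strength. None of these formulas come from the chapter.

```python
import math

# Table 1: the fictitious simulated parameter-based DSM (the self map)
DSM = [
    [0, .53, .32, 0, 0, .1, 0, 0, 0, 0],
    [.76, 0, .12, .12, .3, .2, .2, 0, .1, 0],
    [.45, .11, 0, 0, 0, .2, .3, 0, .52, .72],
    [.16, .65, .64, 0, .34, .43, 0, 0, 0, 0],
    [.22, .44, .11, .45, 0, .53, .02, 0, .02, 0],
    [.77, .78, .31, .34, 0, 0, 0, 0, 0, 0],
    [.12, 0, .02, 0, 0, 0, 0, .45, .1, .3],
    [.01, 0, 0, 0, .01, 0, .2, 0, .4, .1],
    [0, 0, .15, 0, 0, 0, .7, .2, 0, .5],
    [0, .18, 0, 0, .01, 0, .1, .8, .9, 0],
]

def self_complexity(dsm):
    """Stand-in complexity: total coupling strength weighted by the entropy
    (diversity) of the interaction weights. NOT the proprietary Ontix metric."""
    w = [v for row in dsm for v in row if v > 0]
    total = sum(w)
    return total * -sum(v / total * math.log(v / total) for v in w)

def complexity_bounds(dsm):
    """Crude illustrative bounds: the lower bound keeps only each variable's
    strongest dependency (an 'irreducible core'); the upper bound saturates
    every existing link to full strength (maximal coupling before chaos)."""
    n = len(dsm)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[1.0 if v > 0 else 0.0 for v in row] for row in dsm]
    for i, row in enumerate(dsm):
        j = max(range(n), key=lambda k: row[k])
        lower[i][j] = row[j]
    return self_complexity(lower), self_complexity(upper)

c = self_complexity(DSM)
c_min, c_max = complexity_bounds(DSM)
print(c_min < c < c_max)  # the self sits inside its viable complexity band
```

The point of the sketch is structural, not numerical: any reasonable coupling-plus-entropy measure places the self map strictly between its stripped-down and fully saturated variants, mirroring the lower and upper complexity bounds.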
“A system performing a given basic function is irreducibly complex if it includes
a set of well-matched, mutually interacting, nonarbitrarily individuated parts such
that each part in the set is indispensable to maintaining the system’s basic, and
therefore original, function. The set of these indispensable parts is known as the
irreducible core of the system” [17]. The lower complexity bound represents the ir-
reducible complexity of the system that contains the intrinsic characteristics of the
system.
There is a sufficient body of knowledge to sustain the belief that whenever dy-
namical systems undergo a catastrophe, the event is accompanied by a sudden jump
in complexity [46]; this is also intuitive: a catastrophe implies loss of functionality,
or organization. The increase of entropy increases complexity – entropy is not nec-
essarily adverse as it can help to increase fitness – but at a certain point, complexity
reaches a peak beyond which even small increases of entropy inexorably cause the
breakdown of structure [46]. After structural breakdown commences, an increase in
entropy nearly always leads to loss of complexity (fitness) [46]. However, beyond
the critical point, loss is inevitable, regardless of the dimensionality and/or density
of the system. Therefore every closed system can only evolve/grow to a specific
maximum value of complexity. This is known as the system’s critical maximum
complexity. Close to criticality, systems become vulnerable, fragile and difficult to
manage. The difference between the current and critical complexity is a measure
of the overall “health” of the system. The closer to criticality a system is, the less
healthy and therefore generally more risky it becomes.
system (or real complexity, CR ) which is always more than or equal to the complex-
ity of the system before the decomposition [24]. In Figure 5, 120 random decompo-
sitions of the system example in Table 1 are illustrated. In Table 2 the system after
decomposition is shown.
The approach to organizational integration in IMMUNE is to directly derive the
team based DSM from the simulated parameter based DSM. IMMUNE does not
have any pre-specified organizational architecture – rather the developed organiza-
tion integration scheme is determined by deriving the team based DSM from the
simulated parameter based DSM. A direct mapping is used that forces the organi-
zation structure to mirror the product architecture. Therefore the above matrix is
directly taken as the predicted team based DSM. The predicted team based DSM is
used for two purposes: 1) to form the multidisciplinary and multi functional teams,
and 2), to calculate the minimum and maximum process complexity.
Three teams responsible for the parametric problem solving of these variables
must be chosen (as many teams as subsystems). These would normally be
multidisciplinary and cross functional teams of people with different functional ex-
pertise and disciplinary knowledge responsible for working toward a common goal.
Such a team may include people from finance, marketing, operations, and human resources
departments, and from different scientific disciplines. Typically, it includes employees from
all levels of an organization.
Note that the team members or design agents at this stage are not yet explicitly
defined but the likely interactions inside and across the teams are determined. From
the above DSM a team based DSM is derived by summing up the amount of information
exchange (dependency) of the design variables each team is responsible for. In
Table 3 the C_Ti's denote the complexity of the interaction inside the teams, or they
may be interpreted as the complexity that each team must internally deal with. The
complexity based method is derived from the assumption that the complexity of an
interconnected system may be represented by a single measure of that system which
is termed a complexity measure. Figure 6 shows this simplified team based DSM.
This predicted team based matrix reflects the amount of information exchange as well
as the internal complexity of the problem that the design teams are dealing with.
From the above matrix a single measure of complexity of the entire problem that
the network of design teams must deal with can be estimated using Ontix.
This measure is equal to the real complexity of the decomposed system (C_R). Similar
to the maximum and minimum complexity of the system before decomposition,
a real minimum and maximum complexity may be calculated for the above matrix
(C_Rmin, C_Rmax).
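The summation that turns a parameter-based DSM into a team-based DSM can be sketched directly. The 4-variable matrix and the two-team partition below are fictitious examples of ours, not taken from the chapter; only the aggregation rule (summing dependencies, with diagonal entries playing the role of the intra-team complexities C_Ti) follows the text.

```python
def team_dsm(param_dsm, teams):
    """Aggregate a parameter-based DSM into a team-based DSM by summing the
    dependencies between the variables each team is responsible for.
    Diagonal entries play the role of C_Ti: the coupling each team must
    internally deal with."""
    m = len(teams)
    out = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            out[a][b] = sum(param_dsm[i][j]
                            for i in teams[a] for j in teams[b] if i != j)
    return out

# illustrative 4-variable problem split between two teams
pdsm = [[0.0, 0.5, 0.2, 0.0],
        [0.3, 0.0, 0.0, 0.1],
        [0.0, 0.4, 0.0, 0.6],
        [0.2, 0.0, 0.7, 0.0]]
t = team_dsm(pdsm, teams=[[0, 1], [2, 3]])
print([[round(v, 2) for v in row] for row in t])  # [[0.8, 0.3], [0.6, 1.3]]
```

A single complexity measure of the resulting team matrix (C_R in the chapter) can then be computed by whatever metric plays the role of Ontix.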
Table 3 The predicted team based DSM for the entire system
3 Radical Innovation
Henderson and Clark [32] demonstrated that there are different kinds of innovation
as depicted in Figure 7 where innovation is classified along two dimensions. The
horizontal dimension captures an innovation’s impact on components (subsystems),
while the vertical dimension captures its impact on the linkages between core con-
cepts and components. Radical innovation establishes a new dominant design and,
hence, a new set of core design concepts embodied in subsystems that are linked
together in a new architecture. Incremental innovation refines and extends an es-
tablished design. Improvement occurs in individual components, but the underlying
core design concepts, and the links between them, remain the same. Modular inno-
vation on the other hand, changes only the core design concepts without changing
the product’s architecture. Finally, architectural innovation changes only the rela-
tionships between modules but leaves the components, and the core design concepts
that they embody, unchanged. We can say that radical innovation embodies both
modular and architectural innovation.
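Henderson and Clark's two-dimensional classification reduces to a small decision table. The Boolean parameter names below are ours, not theirs; the mapping itself follows the taxonomy just described.

```python
def innovation_type(core_concepts_changed, architecture_changed):
    """Henderson and Clark's [32] two-dimensional classification: one axis is
    the impact on core design concepts (components/subsystems), the other is
    the impact on the linkages between core concepts and components."""
    if core_concepts_changed and architecture_changed:
        return "radical"        # new concepts embodied in a new architecture
    if core_concepts_changed:
        return "modular"        # new concepts, same product architecture
    if architecture_changed:
        return "architectural"  # same components, new relationships
    return "incremental"        # refinement of an established design

print(innovation_type(True, True))  # radical
```

This makes explicit the chapter's closing observation: radical innovation is the case where both dimensions change, i.e. it subsumes modular and architectural innovation.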
An organization’s communication channels, both formal and informal are critical
to achieving radical and architectural innovation [32]. The communication channels
that are created between these groups will reflect the organization’s knowledge
of the critical interactions between product modules. An organization's communication
channels will embody its architectural knowledge of the linkages between compo-
nents that are critical to effective design [32]. They are the relationships around
which the organization builds architectural knowledge.
Innovation processes in complex products and systems differ from those com-
monly found in mass produced goods [34]. The creation of complex products and
systems often involves radical innovation, not only because they embody a wide
variety of distinctive components and subsystems (modular innovation), skills and
knowledge inputs but also because large numbers of different organizational units
have to work together in a collaborative manner (architectural innovation). Here, the
key capabilities are systems design, project management, systems engineering and
integration [34]. Integration in complex system and product design means making the
solutions to subproblems compatible with each other, and this is possible through
innovation. The innovation that integrates the complex system must be radical innovation
and creativity [62], and this is an emergent property of the entire system rather
than the property of the sub-solutions to the individual subproblems [61]. A property
that is only implicit, i.e. not represented explicitly, is said to be an emergent property
if it can be made explicit and it is considered to play an important role in the intro-
duction of new schemas [28]. The radical innovation and coherency in an engineered
large scale system is emergent and obtained in a self-organizing fashion in a multi-
agent environment. When designing self-organizing emergent multi-agent systems
with emergent properties, a fundamental engineering issue is to achieve a macro-
scopic behavior that meets the requirements and emerges only from the behavior of
locally interacting agents. Agent-oriented methodologies today are mainly focused
on engineering the microscopic issues, i.e. the agents, their rules, how they interact,
etc, without explicit support for engineering the required macroscopic behavior. As
a consequence, the macroscopic behavior is achieved in an ad-hoc manner [69].
Creativity requires ad hoc communication in which the need to communicate of-
ten arises in an unplanned fashion, and is affected by the autonomy of the agents to
develop their own communication patterns (see Leenders et al. [43]). It is thus obvious
that a fixed organizational structure with established patterns of communication
is not capable of delivering new complex structures (products). Also Leenders et al.
[43] showed that team creative performance will be negatively related to the pres-
ence of central team members (including brokers, mediators and facilitators) in the
intra-team communication network. ‘Emerging’ properties and innovative organiza-
tional structures are required to coordinate between different design teams that lead
to integration of the entire complex system with coherency.
C_C ≥ C_Rmin (1)
C_C ≤ C_Rmax (2)
Fig. 9 Measuring the cognitive complexity of the design process of any instance
problem, the design system might be on the edge of chaos but certainly not chaotic.
Moreover, for collaborative multi-agent systems with cognitive complexity less than
the minimum complexity, the design process is certainly far from the edge of
chaos; thus the design system does not have enough functionality to deliver radical
innovation in an optimal and efficient manner. Extravagant and excessive practice
of collaboration and cooperation has a negative effect on the design system by
reducing the collective cognitive complexity; as such, this condition is effectively
chaotic. Chaos makes the design process fragile and susceptible, and degrades
the design system's effectiveness. Figure 10 shows that the design system's
overall cognitive complexity increases only up to a certain threshold as the
design agents' proclivity for exchanging design information grows. In order to ensure the health
of the design process it is necessary to ensure that the overall cognitive complexity
stays away from the maximum complexity and above the minimum. This way the
real minimum and maximum complexity that are obtained using the initial Monte
Carlo Simulation of the complex product (low level product knowledge) are used
to monitor the efficiency and effectiveness (health) of the complex product design
process. This may be achieved through monitoring the design process. The process
monitoring here serves the purposes of meeting the design objectives (quality, cost,
and lead time) by immunizing the design system against chaos and lack of effective-
ness. This immunization enables the design system to integrate the complex system
and product through the utilization of radical innovation.
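Inequalities (1) and (2) reduce to a simple band check on the measured cognitive complexity. The sketch below adds a normalized "health" margin against the chaotic upper bound; the chapter defines health qualitatively as the distance from critical complexity, so this exact formula is an illustrative assumption of ours.

```python
def immune_status(c_cognitive, c_r_min, c_r_max):
    """Monitor rule distilled from inequalities (1) and (2): the design
    process is 'immune' (healthy) only while C_Rmin <= C_C <= C_Rmax.
    The returned margin in [0, 1] is the distance from the chaotic upper
    bound, normalized to the viable band (an illustrative addition)."""
    if c_cognitive > c_r_max:
        return "chaotic", 0.0        # structure is likely to break down
    if c_cognitive < c_r_min:
        return "ineffective", 0.0    # too little interaction to integrate
    margin = (c_r_max - c_cognitive) / (c_r_max - c_r_min)
    return "immune", margin

# illustrative bounds from a simulated DSM; three monitoring snapshots
for c in (10.0, 60.0, 250.0):
    print(c, immune_status(c, c_r_min=15.0, c_r_max=200.0))
```

In IMMUNE this check would be evaluated continuously by the control source against the bounds obtained from the initial Monte Carlo simulation, triggering interventions when the process drifts toward either bound.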
Fig. 11 Summary of the proposed immune algorithm for the design of complex systems
In the field of AIS an immune algorithm is a plan that determines how the com-
ponents of the systems are going to interact to determine the system dynamics
[66]. For example Dasgupta [16] examined various response and recognition mech-
anisms of immune systems and suggested their usefulness in the development of
massively parallel adaptive decision support systems. Lau and Wong [41] pre-
sented a multi-agent system that could imitate the properties and mechanisms of
the human immune system. The agents in this artificial immune system could ma-
nipulate their capabilities to determine the appropriate response to various prob-
lems. Through this response manipulation, a non-deterministic and fully distributed
system with agents that were able to adapt and accommodate to dynamic environments
by independent decision-making and inter-agent communication was
achieved [41]. Ghanea-Hercock [29] maintained a multi-agent simulation model
that could demonstrate self-organizing group formation capability and a collective
immune response. He showed that the network of agents could survive in the face
of continuous perturbations. Fyfe and Jain [27] presented a multi-agent environ-
ment in which the agents could manipulate their intentions by using concepts suggested
by artificial immune systems to dynamically respond to challenges posed
by the environment. Goel and Gangolly [30] presented a decision support mecha-
nism for robust distributed systems security based on biological and immunological
mechanisms.
We define a system to be immune to chaos and capable of preserving its holis-
tic self characteristics if its complexity is in between the minimum and maximum
complexity bounds. The proposed immune algorithm provides collective immune
responses for the engineering design of complex systems and is illustrated in Figure
11. This algorithm ensures the successful emergence of the complex product in a
multi-agent design environment. This method is in accordance with the recent re-
sults that argue for flatter, organic organizational structures that enable workers to
deal more effectively with dynamic and uncertain environments [33].
The next section describes the structure of a decision support system for complex
system design that emulates this immune algorithm. It should be noted that
only the conceptual framework, without validation, is presented here. The system can,
however, be tested using simulations in the context of game theory.
Shen et al. [59] have identified three different architectures for agent based
systems: hierarchical architectures, federated architectures and autonomous agent
architectures. Each architecture has particular strengths for specific applications,
and choosing the right one involves matching requirements to capabilities. Hierar-
chical architectures consist of semi autonomous agents with a global control agent
dictating goals/plans or actions to the other agents. Systems with a global blackboard
database are hierarchical architectures.
Because hierarchical architectures suffer from serious problems associated with
their centralized character, federated multi-agent architectures are increasingly be-
ing considered as a compromise solution for industrial agent based applications,
especially for large scale engineering applications. A fully federated agent based
system has no explicit shared facility for storing active data; rather, the system stores
all data in local databases and handles updates and changes through message pass-
ing. They mostly use local control regimes referred to as facilitators, brokers and
mediators.
In general, collaborating environments use two different techniques to manage
complexity: abstraction and decomposition [3]. Abstraction simplifies the descrip-
tion or specification of the system, by representing the same problem from different
viewpoints and at various levels of detail. Decomposition, however, breaks a sin-
gle problem into many smaller problems. According to Bar-Yam [3] each of these
two techniques, when dealing with complexity, suffers from inefficiency. Abstraction
assumes that the details to be provided to one part of the system (module) can
be designed independently of the details in other parts [3]. Decomposition incorrectly
assumes that a complex system behavior can be reduced to the sum of its parts [3].
The blackboard architecture, however, utilizes abstraction and solves problems
through iteration. Because they use a global memory, blackboard architectures
are able to maintain the focus of attention of different knowledge sources
asynchronously on different abstraction levels within this memory; the main point is that
the knowledge sources do not communicate with each other directly and commu-
nication is solely done through the blackboard. This allows for asynchronous com-
munication and thus blackboard systems suit loosely coupled problems [14]. On the
other hand, multi-agent systems, due to their ability to interact autonomously, can
reach a high overall cognitive complexity to solve densely interconnected problems
without the need for a global memory (integrator). Collaboration in multi-agent
environments can be asynchronous and is not restricted to one abstraction level;
therefore agents must be allowed to focus on different aspects of the problem.
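The blackboard discipline just described, where knowledge sources never message each other directly but read and post at abstraction levels in a shared memory, can be sketched minimally. The knowledge-source names and entries below are illustrative, not from the chapter.

```python
class Blackboard:
    """Minimal global blackboard: knowledge sources never message each
    other directly; they read and post entries at named abstraction levels."""
    def __init__(self):
        self.levels = {}  # abstraction level -> ordered list of (source, entry)

    def post(self, level, source, entry):
        self.levels.setdefault(level, []).append((source, entry))

    def read(self, level):
        return list(self.levels.get(level, []))

# two knowledge sources cooperate asynchronously, solely through the board
bb = Blackboard()
bb.post("parameters", "ks_geometry", {"wing_span": 32.5})
bb.post("parameters", "ks_loads", {"max_load": 1.9})

# an integrating knowledge source reacts to board contents, not to messages
merged = {k: v for _, entry in bb.read("parameters") for k, v in entry.items()}
bb.post("assembly", "ks_integration", merged)
print(bb.read("assembly"))
```

Because each knowledge source only reacts to what is on the board, contributions arrive asynchronously and at whatever abstraction level is currently in focus, which is precisely what makes the scheme suit loosely coupled problems.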
According to Corkill [14] a quarter-century of blackboard-system experience and
more than a decade of multi-agent system development have produced a strong
baseline of collaborating-software technologies. The next generation of complex,
collaborating software applications must span the entire design space of Figure
12 to enable development of high performance, generic collaborating-software ca-
pabilities. This is the motivation behind the design of the presented architecture,
IMMUNE, which combines agent based and blackboard technologies. This rare
approach was first introduced by Lander et al. [39], in which they proposed the
use of agent based blackboards to manage agent interactions [59]. Their model
contained multiple blackboards, used as data repositories for each group of agents.
Along with design data, tactical control knowledge could be represented in the
shared repository, enabling reasoning about the design itself [59]. SINE [5] was
another agent based blackboard platform that used a single global blackboard to
record the current state of the design. Even though agents could exchange messages
directly, design data could flow through the blackboard, and it was accessible to all
agents [59].
Our proposed architecture (IMMUNE) is an agent based blackboard system that
uses a flat federated architecture. All the agents are grouped into virtual teams or
coalitions. There is no local controller for coordination in between the coalitions.
IMMUNE uses a global blackboard to save the current state of the design and to
facilitate asynchronous communication between agents through the blackboard by
saving the complete solution and partial solutions in different abstraction hierarchies.
The primary purpose of designing this architecture was to incorporate the com-
plexity science into the collaborating software paradigm. Lissack [44] demonstrated
that since both organization science and complexity science deal with uncertainty, it
is important to combine the two. This marriage of the two sciences allows for
an autopoietic view of organization. Autopoietic systems theory analyses systems as
self-productive, self-organizing, and self-maintaining in nature [20]. The main
characteristic of IMMUNE is to emulate complexity measures of the product and
process which enables the manifestation of autopoietic characteristics.
The control source of the proposed blackboard does not dictate the pattern of
cooperation between agents, thus allowing autonomy in the interaction. It does, however,
monitor the complexity of the system at two levels: inside the coalitions and
in between the coalitions at the same abstraction level (we refer to this as a layer).
The agents are designed to react to the information that they receive from the con-
trol source about the complexity of the coalitions and layers. Adding or eliminat-
ing agent(s) from the design system is possible in IMMUNE, making it an open
architecture.
• Composition Agent: groups the agents based on their bids for the problems in
coalitions using the contract net protocol. A composition agent contains the map
of all the agents, their characteristics and types of expertise.
• IT manager: Sets up the LAN and communications channels of the dispersed
agents for each abstraction level.
All the agents in the same coalition must be visible to each other, meaning that
the messages that one agent receives are made visible to all team members. This
may be thought of as a shared mailbox for each coalition. For example, Figure
13 shows that when a message is sent from agent 2 in coalition A to agent 4 in
coalition B, it would be visible to all the members of these two coalitions.
• Simulation Agent: Performs Monte Carlo Simulation to generate the design space
fitness landscape. It comprises a Monte Carlo Simulation software package and
a human operator. It gathers information about the conditional probability distribution
of the design variables from the agents that generate them. As the criterion
for terminating the generation stage of a given abstraction level can be
based on the self complexity of that level, the composition agent must
be able to dynamically simulate the fitness landscape as new entries (design
variables) appear on the blackboard for a given abstraction level.
• CEO (complexity evaluator and observer): This agent announces the termination
of the generation stage for a given abstraction level as soon as the complexity
of the level reaches a certain threshold. This threshold is a control feature of the
entire system. This agent also monitors the design process. It has an embedded
blackboard on which all the communications between the agents and in between
the agents and blackboard are recorded (Figure 14). The design agents can only
write on this blackboard but there is no necessity for them to be able to read it.
The communication arrows on this blackboard must have a tag that represents
both the qualitative and quantitative weight of transferred information. Based on
these maps of the system which vary regularly over time, the CEO measures the
instantaneous cognitive process complexity of each coalition (a team of agents)
and layer (in between the coalitions of one abstraction level). The CEO measures
the lower and upper complexity bounds for the coalitions, and also the upper and
lower complexity bounds for layers. The control source must contain knowledge of
the number of active design agents (design resources) and the real complexity. The
control source then clusters the design agents into virtual teams and distributes the
subproblems to them. The design agents within the virtual teams solve the prob-
lems cooperatively. They send the results back to the blackboard, and negotiate the
conflicts with the other groups until they reach a resolution.
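The coalition visibility rule described earlier (Figure 13), where a message between two agents is visible to every member of both the sender's and the receiver's coalitions, amounts to a shared mailbox per coalition. The class and agent names below are illustrative.

```python
class Mailroom:
    """Sketch of the coalition visibility rule (Figure 13): a message sent
    between two agents is appended to the shared mailbox of both the
    sender's and the receiver's coalitions, so every member of either
    coalition can see it."""
    def __init__(self, coalitions):
        self.coalitions = coalitions                     # name -> set of agents
        self.mailbox = {name: [] for name in coalitions}

    def coalition_of(self, agent):
        return next(name for name, members in self.coalitions.items()
                    if agent in members)

    def send(self, sender, receiver, message):
        # a set, so an intra-coalition message is recorded only once
        for name in {self.coalition_of(sender), self.coalition_of(receiver)}:
            self.mailbox[name].append((sender, receiver, message))

room = Mailroom({"A": {"a1", "a2"}, "B": {"a3", "a4"}})
room.send("a2", "a4", "proposed parameter value")  # agent 2 (A) -> agent 4 (B)
print(len(room.mailbox["A"]), len(room.mailbox["B"]))  # 1 1
```

Shared visibility of this kind is what lets the CEO agent tap every intra- and inter-coalition exchange when it measures the instantaneous cognitive complexity of coalitions and layers.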
Using the common practices of sequential engineering would lead to starting a
new cycle (GDDI cycle) after all activities of the previous cycle are performed and
the results are finalized. However, the concurrent engineering principle of overlapping
activities to shorten the design lead time may be applied here. Therefore,
agents must be allowed to introduce their proposed design variables on the blackboard
(problem generation), but the decomposition and distribution stages start
only when the CEO judges that an abstraction level has reached a certain complexity
threshold. Since agents can be cloned to perform different tasks, it is possible
that two or more abstraction levels could simultaneously be at different stages
within the GDDI cycle. In order to reduce the complexity of the entire design sys-
tem, we propose that the design agents of different abstraction levels be allowed to
communicate only through the blackboard.
behavior) as well as the ability of the system to degrade gracefully in the presence
of local failures [65]. Coherency is about the ability of a MAS to “cope” with
problem integration. Several methods for increasing coherence have been studied,
all of which relate to the individual agent’s ability to reason about the following
questions: who should I interact with, when should I do it, and why? Sophisticated
individual agent reasoning can increase MAS coherence because each individual
agent can reason about non-local effects of local actions, form expectations of the
behaviour of others, or explain and possibly repair conflicts and harmful interac-
tions [65]. On this basis, four different agent architectures have been discussed in
the literature: reactive agents (also known as behaviour based or situated agent ar-
chitectures), deliberative agents (also called cognitive agents, intentional agents, or
goal directed agents), collaborative agents (also called social agents or interacting
agents), and hybrid agents [59].
Reactive agents are passive in their interactions with other agents. They do not
have an internal model of the world and respond solely to external stimuli. They
respond to the present state of the environment in which they are situated. They do
not take history into account or plan for the future [65]. Through simple interactions
with other agents, complex global behavior can emerge. In reactive systems, the
relationship between individual behaviors, environment, and overall behaviour is
not understandable [59]. However, the advantage of reactive agent architecture is
simplicity [59].
Deliberative agents use internal symbolic knowledge of the world and environment
to infer actions in the real world. They proactively interact with other
agents based on their sets of Beliefs, Desires and Intentions (BDI system). These
agents perform sophisticated reasoning to understand the global effects of their lo-
cal actions [65]. Consequently, they have difficulties when applied to large complex
systems due to the potentially large symbolic knowledge representations required
[65]. Shen et al. [59] identified collaborative agents as a distinct class of agents that
work together to solve problems; communication between them leads to synergetic
cooperation, and emergent solutions.
Hybrid architectures are neither purely deliberative nor purely reactive [65], and
the agents in IMMUNE have a hybrid architecture (Figure 15). According to Sycara
[65] hybrid agents usually have three layers: at the lowest level in the hierarchy, there
is typically a reactive layer, which makes decisions about what to do on the basis
of raw sensor input. This layer contains the self knowledge, that is, the agent's
knowledge about itself, including its physical state, location, skills, etc. [59]. The
agent's self knowledge is used to participate in tasks and to respond to requests
from other agents and the control source about its competence.
304 M. Efatmaneshnik and C. Reidsema
The middle layer typically abstracts away from raw sensor input and deals with
a knowledge-level view of the agent's environment, often making use of symbolic
representations [65]. This layer contains two types of knowledge: domain knowledge
and common sense knowledge. The domain knowledge is the description of the
working projects (problems to be solved), the partial states of the current problem,
the hypotheses developed and the intermediate results [59]. The common sense
knowledge enables the agent to emulate and make sense of the cognitive complexity
measure of the environment that is reported by the CEO. The middle layer has two
modules that are in contact with the blackboard and the CEO, corresponding to two
major responsibilities:
1. Deciding on the focus of attention and reporting it to the lower layer: here the
middle layer acts as an agenda manager, as used in many blackboard systems
such as HEARSAY-II [8], with the difference that in those systems a central
agenda manager was in charge of maintaining the focus of attention for the
entire set of knowledge sources (agents). Agenda managers are generally data
driven (reacting to what is present on the blackboard), and their operation leads
to opportunistic problem solving [8]. The agenda manager chooses the agent's
focus of attention among different problems at different abstraction levels. It
may shift the agent's focus of attention from one abstraction level to another
depending on the status of the problems and partial solutions on the blackboard.
The main reasons for using agendas for control are to speed up problem solving
and to reduce agent idle time [8]. The agenda manager in the middle layer
emulates the domain knowledge and regularly monitors the blackboard to keep
that knowledge up to date.
2. Maintaining the cognitive complexity of the coalitions at the focused level within
the appropriate range: this module contains the common sense knowledge (or
symbolic knowledge of the world), which in IMMUNE is the collective cognitive
complexity of the environment broadcast by the CEO. We refer to this module
as COPE (Complexity Oriented Problem Evaluator); it makes sense of the
environment's cognitive complexity by comparing it to the maximum and
minimum complexity determined by the CEO. COPE is a goal-driven module
and communicates with the agent's upper layer. To increase the complexity of
the environment, COPE instructs the upper layer of the agent to socialize and
collaborate more actively.
IMMUNE: A Collaborating Environment 305
To decrease the complexity of the environment, the upper layer of the agent
becomes more passive, only reacting to incoming information from the control
source and other agents and avoiding proactive engagement in the design
process. In this way COPE may immunize the agent against overreacting or
underacting in the environment. COPE can also direct the upper layer to choose
more passive conflict resolution schemes, such as constraint relaxation, to
reduce the complexity. Conversely, active negotiation techniques can be used
to increase the cognitive complexity of the environment when there is conflict
with another agent's solutions.
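The complexity-regulation role of COPE described in item 2 can be sketched as a simple threshold rule. This is a minimal sketch assuming scalar complexity values and invented names (cope_mode, the bounds), not the IMMUNE implementation:

```python
def cope_mode(complexity, c_min, c_max):
    """Choose an interaction mode from the broadcast cognitive complexity.

    c_min and c_max are the CEO-announced minimum and maximum bounds.
    """
    if complexity < c_min:
        # Too little interaction: socialize and collaborate more actively,
        # e.g. via active negotiation.
        return "proactive"
    if complexity > c_max:
        # Too much interaction: only react to incoming information,
        # e.g. prefer passive conflict resolution such as constraint relaxation.
        return "passive"
    return "steady"

assert cope_mode(0.2, 0.4, 0.9) == "proactive"
assert cope_mode(0.95, 0.4, 0.9) == "passive"
assert cope_mode(0.6, 0.4, 0.9) == "steady"
```

The point of the sketch is only that the agent's social behaviour is a function of where the environment's complexity sits relative to the CEO's bounds.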
The uppermost level of the architecture handles the social aspects of the environ-
ment [65]. This layer contains the social knowledge and is in charge of coordination
with other agents. It reports its information exchanges to the process blackboard of
the CEO.
The control source of IMMUNE is active throughout the entire design process. The
design process starts with the generation of an initial set of product variables upon
a notification from the control shell to single agents to introduce their entries on the
blackboard. This set of product variables acts as a seed representing the highest
level of the abstraction hierarchy of the problem space. The seed might be a rough
guess of what
needs to be done (Figure 16). The simulation agent of the control source simulates
the fitness landscape of the generated problem space and is in charge of gathering all
the required information (for simulation) from the design agents. The decomposi-
tion agent of the control source decomposes the set of generated variables and calls
for the design agents’ bids to participate in solving them. The agents announce their
interest back to the control shell by weighting their interest in solving each individ-
ual design variable or estimating the value of a design constraint. The composition
agent assigns each individual parameter to a single design agent. Up to this point
the process proceeds as in SINE, a support platform for single function agents [5].
However, in IMMUNE the single function agents are also grouped into virtual
teams (coalitions). The composition agent announces the coalitions’ formats; this
issue is discussed in detail in Section 5.2. The IT manager is in charge of setting
the shared mail boxes for each coalition. The problems are solved by the design
agents and the results are sent back to the blackboard. The CEO agent of the control
source estimates the minimum and maximum process cognitive complexities that
are exactly the minimum and maximum complexity of the problem. It monitors the
design process cognitive complexity arising from the collaboration of the design
agents during this last stage. Once all the solutions are prepared, the virtual groups
are dismantled and the collaboration process stops. The next set of design variables
is introduced and the cycle continues.
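The bidding and assignment step of this cycle can be sketched as a simple contract-net round: design agents announce interest weights and the composition agent assigns each variable to the highest bidder. All data structures and names here are illustrative assumptions, not IMMUNE's actual interfaces:

```python
def assign_variables(variables, bids):
    """bids: {agent: {variable: interest weight}} -> {variable: winning agent}."""
    assignment = {}
    for var in variables:
        # Pick the agent with the highest declared interest in this variable;
        # agents that did not bid on it default to zero interest.
        best = max(bids, key=lambda agent: bids[agent].get(var, 0.0))
        assignment[var] = best
    return assignment

bids = {"A1": {"x": 0.9, "y": 0.2}, "A2": {"x": 0.3, "y": 0.8}}
print(assign_variables(["x", "y"], bids))  # -> {'x': 'A1', 'y': 'A2'}
```

In IMMUNE the assignment is followed by coalition formation, which this sketch deliberately omits.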
One possible drawback of this approach is that the design agents might be idle
for a while until a task is assigned to them. To rectify this problem we introduced
Fig. 16 The design process at different abstraction levels may run simultaneously
an agenda manager for each individual design agent. The agents are allowed to in-
troduce new product variables at any abstraction level at any time during the design
process. They may also be cloned to solve two different design variables related
to different design groups or within the same design group. It is also possible for
the entire processes of two abstraction levels to run at the same time. To reduce the
complexity of the system we propose that agents in the same abstraction level be
allowed to communicate directly but agents that are working in different abstraction
levels be allowed to communicate only through the blackboard. In order to facilitate
this process, the simulation agent must respond dynamically. This notion is further
discussed in the next section.
is depicted in Figure 17a. In this case, the 'strength' of knowledge is just the sum of
each of the independent knowledge bases [58]. Integrated knowledge bases can be
represented as in Figure 17b; here the knowledge bases can be applied to various
situations and the 'strength' of knowledge is near its maximum [58]. In Figure 17c
independent knowledge bases can communicate and form an interoperable situation,
although the 'strength' of knowledge may be weaker than that in Figure 17b [58].
The entire knowledge base is a federated set of loosely coupled intelligent agents.
Zhang [72] classified problem solving among human experts into four
predominant categories according to their interdependencies:
(a) Horizontal cooperation is where each expert in the cooperative group can get
solutions to problems without depending on other experts, but if the experts
cooperate, possibly using different expertise and data, they can increase confi-
dence in their solutions. For example, cooperation between doctors when diag-
nosing problem patients is often a form of horizontal cooperation. Consultation
and comparison of opinions add significantly to the value of the confidence of
the final diagnosis.
(b) Tree cooperation is where a senior expert depends on lower-level experts in
order to get solutions to problems. For example, a chief engineer’s decisions
often depend on the work of junior engineers.
(c) Recursive cooperation is where different experts mutually depend on each other
in order to get solutions to problems. For example, in order to interpret geologi-
cal data, geological experts, geophysical experts, and geochemical experts often
depend on each other in a recursive way. That is, there is a mutual dependence
in that a geophysicist may ask a geologist to perform a subtask which in turn
requires performance by the geophysicist of the same sub-subtask. (NB: tree
cooperation is a special case of recursive cooperation.)
(d) Hybrid cooperation is where different experts use horizontal cooperation at
some level in an overall tree or recursive cooperation. On the other hand, they
could equally use tree or recursive cooperation at some point in an overall hor-
izontal cooperation.
Rosenman and Wang [55] introduced the Component Agent-based Design-
Oriented Model (CADOM) for collaborative design. This was a dynamic integrated
model, using an integrated schema to contain data for multiple perspectives, but
also with the flexibility to support dynamic evolution. They recognized five types
of modeling mechanisms for a collaborative design environment (Figure 18).
(a) Integrated mode (Figure 18a): This is an integrated CAD system which works
as a sharable server for all users using an integrated data model and a central
management mechanism. The distributed users register with the main host and
operate the system remotely. However, such an integrated system does not seem
able to meet the complex design requirements needed in a multi-disciplinary
environment. For example, each item on the system will be communicated to
all users.
(b) Distributed-integrated mode (Figure 18b): in this mode, distributed designers
usually have their own domain systems along with a central service module
called a sharable workspace.
(c) Discrete mode (Figure 18c): this is a fully distributed system, where there is
normally no central module but simply a set of distributed domain systems
with discrete models and management mechanisms. The most obvious feature
of this mode is its flexibility, without a central control unit, but many model
interpreters are required between different domain systems.
(d) Stage-based mode (Figure 18d): In this mode a base model is set up at the
first stage, and all subsequent models are derived from the base model. Some
internal mechanisms are provided to control this evolution process. This
evolutionary approach solves the flexibility problem, but it requires a great deal of
AI work to develop the system.
(e) Autonomy-based mode (Figure 18e): This is based on the concept of autonomy,
in which each model is implemented as a distributed set of knowledge sources
representing autonomous, interacting components.
For directed graphs, K is twice the amount presented in (3). If the self map of the
simulated parameter based DSM has k connections (edges), we define the problem
connectivity as:
p = k/K    (4)
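Under the assumption that K in equation (3) is the maximum possible number of edges of the self map, n(n-1)/2 for an undirected graph on n nodes (and twice that for directed graphs, as stated above), the connectivity p = k/K can be computed directly from a DSM. The 0/1 matrix encoding below is an illustrative assumption:

```python
def connectivity(dsm, directed=False):
    """dsm: square 0/1 matrix (list of lists) of the simulated self map.

    Returns p = k/K, the ratio of observed edges k to the maximum K.
    """
    n = len(dsm)
    k_max = n * (n - 1) // 2
    if directed:
        k_max *= 2
    # Count off-diagonal edges; undirected maps count each pair once (i < j).
    k = sum(
        dsm[i][j]
        for i in range(n)
        for j in range(n)
        if (i < j if not directed else i != j)
    )
    return k / k_max

dsm = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
print(connectivity(dsm))  # -> 0.6666666666666666 (2 of 3 possible edges)
```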
It must be noted that the connectivity values presented in Table 4 are suggestions
based on the experience the authors gained from experiments with random graphs;
more rigorous methods would require further theoretical investigation in graph
theory. We propose five modes of knowledge sharing and organizational structure
that correspond to these decomposition modes: independent, integrative, multi-
agent, collaborative and competitive:
Table 4 Modal decomposition of the problem at every abstraction layer based on the con-
nectivity of the problem
1. Independent mode: in this mode the decomposition agent has managed to fully
decompose the problem; generally, very low self map connectivity leads to such
situations. In this mode there is no collaboration between the coalitions as de-
picted in Figure 19 because when tasks are not interdependent, there is no need
or reason to collaborate [43]. Consequently the need for radical innovation to in-
tegrate the system at the considered abstraction level would be minimal, and the
process will be characterized by short lead times. However, collaboration exists
between the design agents inside the same coalition. The CEO monitors the cog-
nitive complexity inside the coalitions using the system knowledge provided by
the agents regarding the degree of interaction with other members of the coali-
tion; this is the only control relationship in the system.
Fig. 20 Integrated process model. All coalitions exchange information with only one central
coalition
2. Integrative mode: Integrative systems were reported and studied by Sosa et al.
[63]; in these systems all subsystems are connected to a single subsystem, namely
the integrative subsystem. The integrative decomposition is derived from the re-
sequencing [63] of the parameter based DSM usually using integer program-
ming [38]. Simple coordination of the design process makes this mode desirable,
since all the coalitions have to coordinate their communications with only one
integrative coalition. The corresponding organizational structure and integrative
process mode are illustrated in Figure 20. One drawback of this mode is that it
might be hard for the design agents of the integrative coalition to keep the cognitive
complexity of the layer above the CEO-prescribed minimum; that is, one coalition
must be able to reach a high cognitive complexity. In other words, coordinating
several coalitions through a single coalition might not be feasible. The integrative
mode is therefore advised for problems with low self connectivity.
3. Modular decomposition and autonomy based process model: Modular decom-
position results in subsystems having significant connectivity to each other. In
Efatmaneshnik and Reidsema [24] we showed that the immunity of a system
decomposed in the modular mode is less than that of an integrative decomposition.
Modular decomposition is therefore desirable for problems with intermediate self
connectivity. The major criteria for clustering algorithms of modular decomposi-
tion are 1) to minimize the connectivity between the subsystems, 2) to maximize
the connectivity of each subproblem. Both these criteria are met by minimization
of the real complexity. Two significant issues that appear as decomposition con-
straints are the maximum number of subsystems and their minimum degree. The
corresponding process model is depicted in Figure 21 and is referred to as an au-
tonomy based process model. In this mode the agents explicitly act autonomously
in their social behavior. The main characteristic of the autonomy process model
is cooperation amongst the SiFAs inside and across the boundaries of coalitions.
4. Overlap decomposition and collaborative process model: In this mode subsys-
tems are overlapped and they share some of the design variables with each other.
As a result some of the coalitions explicitly share agents, and some agents hold
membership of two or more coalitions (Figure 22). The real complexity can be
measured for overlap decompositions [24]. The main
characteristic of this process model is the intense collaboration between coali-
tions that makes this mode an information and knowledge intensive process [36].
5. No decomposition and competitive process model: when the problem is very
connected, resulting in a dense self map, any kind of decomposition leads to large
departures of the real complexity from the self complexity (CR >> CS ). In this
condition, decomposition may not be a route to problem tractability. Bar-Yam [2]
proposed the enlightened process model of problem solving for very connected
problems. It is based on competition and cooperation between several design
teams focused on solving the same problem. The main characteristic of this
process model is competition between the design coalitions (Figure 23). In the
competitive mode the problem is not decomposed, and each coalition tackles the
entire problem by itself. Informal cooperation may exist between the coalitions,
but there is no formal, explicit cooperation. The final solution is chosen from the
solutions the coalitions submit for the entire problem at a given abstraction level.
The quality of the solution is determined by the control source, based on the
accuracy weights that the coalitions suggest for their solutions. In this mode the
complexity is controlled only at the agent level (inside the coalitions) and the
cognitive complexity arising from the informal cooperation of the coalitions is
ignored.
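The choice among the five modes described in items 1-5 can be summarized as a threshold rule on the connectivity p of equation (4). The cutoff values below are invented purely for illustration; the chapter itself derives the choice from minimum real complexity rather than fixed thresholds:

```python
def decomposition_mode(p):
    """p: problem connectivity in [0, 1] from equation (4).

    Thresholds are illustrative assumptions, not values from the chapter.
    """
    if p < 0.1:
        return "independent"   # fully decomposable, concurrent solving
    if p < 0.3:
        return "integrative"   # one coordinating (integrative) subsystem
    if p < 0.6:
        return "modular"       # autonomy-based cooperation across coalitions
    if p < 0.8:
        return "overlap"       # collaborative, shared agents between coalitions
    return "competitive"       # no decomposition; coalitions compete

assert decomposition_mode(0.05) == "independent"
assert decomposition_mode(0.7) == "overlap"
assert decomposition_mode(0.9) == "competitive"
```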
6 Conclusion
Monte Carlo Simulation and Global Entropy Based Correlation Coefficient [23]:
these are used to establish, early in the design phase, the self map of the system,
which shows the sensitivity of the design and objective variables. This self map is
represented as a weighted graph or parameter based design structure matrix [7]. A
complexity
measure is then applied to the graph to measure the complexity of the problem.
Immune Decomposition: the complexity measure obtained after problem
decomposition has been shown to indicate what we have defined as the system's
real complexity, which exceeds the self complexity: decomposition increases the
overall complexity. This additional real complexity is the price paid for improving
the tractability and manageability of a complex problem through decomposition
(a No Free Lunch argument). The decomposition with the least real complexity
leads to a problem space that is more robust and immune to chaotic behaviour.
Modal decomposition of the problem space: a problem may be decomposed in
several modes depending on the connectivity (or coupledness) of the problem
variables at each level of abstraction. The decomposition modes, in order of
growing problem connectivity, are: 1) full decomposition, in which all subsystems
(subproblems) are independent, for the least connected systems; 2) integrative or
coordination based decomposition, in which one subsystem (the integrative
subsystem) is connected to all other, mutually independent subsystems; 3) modular
or multi-agent decomposition, in which all or some of the subsystems are
connected; 4) overlap decomposition [24], which is similar to multi-agent
decomposition except that some of the subsystems overlap, indicating shared
design and objective variables; 5) no decomposition [3], for densely connected
systems.
Adaptive Structuration [19]: the proposed architecture is capable of planning
decisions in a metamorphic environment [47] for each of the five decomposition
modes at every GDDI cycle. Design agents are clustered within each GDDI cycle
into virtual teams or coalitions of agents whose structure mimics the structure of
the problem. Correspondingly to the five modes of decomposition, the IMMUNE
architecture can employ five modes of design: 1) independent mode, i.e. fully
concurrent problem solving; 2) integrative mode, i.e. coordination based problem
solving; 3) autonomy based problem solving, which rests on the cooperation of
several coalitions of agents; 4) collaborative problem solving, where some of the
coalitions of agents are partly merged and overlapped [36]; 5) competitive problem
solving, based on Enlightened Engineering [2], in which several independent
coalitions of agents compete to solve the same problem.
Global blackboard database: Adaptive Structuration is accomplished using a
global blackboard containing the current state of the design at all abstraction levels
[5]. The control source decides on the decomposition mode based on the connectiv-
ity of the problem and then decomposes it on the basis of minimum real complexity.
Monitoring of the design process complexity using complexity bounds: since the
relationships between system parameters in a complex system are often nonlinear,
the development and integration of such systems is often obscure. Nonlinear
systems may exhibit chaotic properties that make them hard to integrate. Cognitive
complexity is the ability of a person or an organization to integrate a system [42].
In order to integrate and manage a complex system, the central problem solving
management unit requires a complexity greater than, or at least equal to, the
complexity of the problem [2]; this is the cognitive complexity of the organization.
In this chapter the minimum and maximum cognitive complexity bounds were
measured from the simulated parameter based design structure matrix by the CEO
module of the blackboard control source. The collective cognitive complexity of a
product development organization is tied to the extent to which its units are
connected [42], and can therefore be measured as a function of the amount of
information exchanged between the design agents inside one coalition and between
the coalitions. Correspondingly, the CEO monitors the cognitive complexity at two
hierarchical levels: 1) the low level, inside each coalition; 2) the high level, that is,
the entire federation of coalitions at any abstraction level. The CEO informs all the
design agents of the cognitive complexity of their coalition and of the federation.
The COPE modules of the design agents are then in charge of keeping the cognitive
complexity of the coalitions and the federation above the minimum announced by
the CEO and away from the maximum bound. The COPE module decides on the
high level interaction mode (passive or proactive-social) by using conflict
resolution strategies that are either passive, like constraint relaxation, or proactive,
such as active negotiation.
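The CEO's bound monitoring described above can be sketched as follows, taking cognitive complexity as a crude function of the amount of information exchange. The log-based complexity proxy and all names are illustrative assumptions, not the measure used in the chapter:

```python
import math

def cognitive_complexity(message_count):
    """Crude proxy: more information exchange -> higher complexity."""
    return math.log2(1 + message_count)

def out_of_bounds(exchanges, c_min, c_max):
    """exchanges: {coalition: message count}; returns violating coalitions.

    A coalition is flagged when its complexity leaves [c_min, c_max].
    """
    return {
        name: cognitive_complexity(count)
        for name, count in exchanges.items()
        if not (c_min <= cognitive_complexity(count) <= c_max)
    }

flags = out_of_bounds({"C1": 1, "C2": 200, "C3": 30}, c_min=2.0, c_max=6.0)
print(sorted(flags))  # -> ['C1', 'C2']
```

In IMMUNE the flagged information would go back to the agents' COPE modules, which then switch between passive and proactive behaviour as described.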
By utilizing the aforementioned tools the fragility of the development process
of a complex system may be dealt with. The presented architecture is IMMUNE
against sudden failure in meeting the top level organization objectives including
cost, lead time and the quality of the product. It is often argued that complex
systems are robust, yet in the presence of uncertainties they become fragile; this
strange behaviour is related to the chaotic and sensitive character of complex
systems. For the sustainability of organizations that design complex products, this
means that the top level goals may often be robustly met; however, sudden and
large departures from those goals can seem inevitable. To immunize against this
fragility the proposed system advocates coherency in collaboration. That is, the
locally aware design agents (aware of their local tasks) maintain global coherency,
harmony and order through their COPE modules, by making the agents' social
behaviour subservient to the information about the system's cognitive complexity
received from the CEO module.
References
1. Bar-Yam, Y.: When Systems Engineering Fails — Toward Complex Systems Engineer-
ing. In: Proceedings of International Conference on Systems, Man & Cybernetics, pp.
2021–2028. IEEE Press, Piscataway (2003)
2. Bar-Yam, Y.: Making Things Work: Solving Complex Problems in a Complex World.
NECSI Knowledge Press (2004)
3. Bar-Yam, Y.: Engineering Complex Systems: Multiscale Analysis and Evolutionary En-
gineering. In: Complex Engineered Systems, pp. 22–39. Springer, Heidelberg (2007)
4. Bayrak, C., Tanik, M.M.: A Process Oriented Monitoring Framework. Systems Integra-
tion 8, 53–82 (1998)
5. Brown, D.C., Dunskus, B., Grecu, D.L., et al.: SINE: Support for single function agents.
In: Applications of AI in Engineering, Udine, Italy (1995)
6. Browning, T.R.: Designing system development projects for organizational integration.
Systems Engineering 2, 217–225 (1999)
7. Browning, T.R.: Applying the Design Structure Matrix to System Decomposition and
Integration Problems: A Review and New Directions. IEEE Transactions on Engineering
Management 48, 292–306 (2001)
8. Carver, N., Lesser, V.: The Evolution of Blackboard Control Architectures. Expert Sys-
tems with Applications 7, 1–30 (1994)
9. Chen, L., Li, S.: Concurrent Parametric Design Using a Multifunctional Team Approach.
In: Design Engineering Technical Conferences DETC 2001, Pittsburgh, Pennsylvania
(2001)
10. Chiva-Gomez, R.: Repercussions of complex adaptive systems on product design man-
agement. Technovation 24, 707–711 (2004)
11. Cisse, A., Ndiaye, S., Link-Pezet, J.: Process Oriented Cooperative Work: an Emergent
Framework. In: IEEE Symposium and Workshop on Engineering of Computer Based
Systems, Friedrichshafen, Germany, pp. 342–347 (1996)
12. Cohen, I.: Real and Artificial Immune Systems: Computing the State of the Body. Nat.
Rev. Immunol. 7, 569–574 (2007)
13. Corkill, D.D.: Blackboard Systems. AI Expert 6, 40–47 (1991)
14. Corkill, D.D.: Collaborating Software: Blackboard and Multi-Agent Systems & the Fu-
ture. In: International Lisp Conference, New York (2003)
15. Craig, I.D.: Formal Techniques in the Development of Blackboard Systems. In: Research
Report Coventry, Department of Computer Science, University of Warwick, UK (1993)
16. Dasgupta, D.: An Artificial Immune System as a Multi-Agent Decision Support System.
In: IEEE International Conference on Systems, Man and Cybernetics (SMC), San Diego,
California, vol. 4, pp. 3816–3820 (1998)
17. Dembski, W.: No Free Lunch: Why Specified Complexity Cannot Be Purchased without
Intelligence. Rowman & Littlefield Publishers, Inc (2002)
18. DeSanctis, G., Monge, P.: Communication processes for virtual organizations. Organi-
zation Science 10, 693–703 (1999)
19. Desanctis, G., Poole, M.S.: Capturing the Complexity in Advanced Technology Use:
Adaptive Structuration Theory. Organization Science 5, 121–147 (1994)
20. Dissanayake, K., Takahashi, M.: The Construction of Organizational Structure: Connec-
tions with Autopoietic Systems Theory. Contemporary Management Research 2, 105–
116 (2006)
21. Druzdzel, M.J., Flynn, R.R.: Decision Support Systems. In: Kent, A. (ed.) Encyclopedia
of Library and Information Science, vol. 67(30), pp. 120–133. Marcel Dekker, Inc., New
York (2000)
22. Dunskus, B.V.: Single Function Agents and Their Negotiation Behavior in Expert Sys-
tems. Worcester Polytechnic Institute, Worcester (1994)
23. Efatmaneshnik, M., Reidsema, C.A.: Immunity as a Design Decision Making Paradigm
for Complex Systems: a Robustness Approach. Cybernetics and Systems 38(8), 759–780
(2007a)
24. Efatmaneshnik, M., Reidsema, C.A.: Immunity and Information Sensitivity of Complex
Product Design Process in Overlap Decomposition. In: Minai, A., Braha, D., Bar-Yam,
Y. (eds.) Proceedings of 7th ICCS, Boston, MA (2007b)
25. Formica, A.: Strategic Multiscale A New Frontier for R&D and Engineering. Ontonix,
Turin (2007), http://www.ontonix.com/index.php?page=download&CID=36
26. Fruchter, R., Clayton, M.J., Krawinkler, H., et al.: Interdisciplinary communication
medium for collaborative conceptual building design. Advances in Engineering Soft-
ware 25, 89–101 (1996)
27. Fyfe, C., Jain, L.: Teams of intelligent agents which learn using artificial immune sys-
tems. Journal of Network and Computer Applications 29, 147–159 (2006)
28. Gero, J.S.: Creativity, emergence and evolution in design. Knowledge-Based Systems 9,
435–448 (1996)
29. Ghanea-Hercock, R.: Survival in cyberspace. Information Security 12, 200–208 (2007)
30. Goel, S., Gangolly, J.: On decision support for distributed systems protection: A per-
spective based on the human immune response system and epidemiology. International
Journal of Information Management 27, 266–278 (2007)
31. Gulati, R.K., Eppinger, S.D.: The coupling of product architecture and organizational
structure decisions. MIT, Cambridge (1996)
32. Henderson, R.M., Clark, K.B.: Architectural Innovation: The Reconfiguration of Exist-
ing Product Technologies and the Failure of Established Firms. Administrative Science
Quarterly 35 (1990)
33. Hinds, P., McGrath, C.: Structures that work: social structure, work structure and co-
ordination ease in geographically distributed teams. In: Proceedings of the 20th
anniversary conference on Computer Supported Cooperative Work (CSCW 2006),
Banff, Alberta, Canada, November 4–8, pp. 343–352 (2006)
34. Hobday, M., Rush, H., Tidd, J.: Innovation in complex products and system. Research
Policy 29, 793–804 (2000)
35. Kan, J., Gero, J.: Can entropy represent design richness in team designing? In: Bhatt, A.
(ed.) CAADRIA 2005, New Delhi, pp. 451–457 (2005)
36. Klein, M., Sayama, H., Faratin, P., et al.: The Dynamics of Collaborative Design: Insights
from Complex Systems and Negotiation Research. Concurrent Engineering Research &
Applications (2003)
37. Kratzer, J., Leenders, R.T.A.J., Engelen, J.M.L.V.: A delicate managerial challenge: how
cooperation and integration affect the performance of NPD teams. Team Performance
Management 10, 20–25 (2004)
38. Kusiak, A.: Engineering Design: Products, Processes, and Systems. Academic Press,
London (1999)
39. Lander, S.E., Staley, S.M., Corkill, D.D.: Designing Integrated Engineering Environ-
ments: Blackboard-Based Integration of Design and Analysis Tools. Concurrent Engi-
neering: Research and Applications 4, 59–72 (1996)
40. Larson, J.R.: Deep Diversity and Strong Synergy: Modeling the Impact of Variability in
Members’ Problem-Solving Strategies on Group Problem-Solving Performance. Small
Group Research 38, 413–436 (2007)
41. Lau, H., Wong, V.: Immunologic Responses Manipulation of AIS Agents. In: Nicosia,
G., Cutello, V., Bentley, P.J., Timmis, J. (eds.) ICARIS 2004. LNCS, vol. 3239, pp. 65–
79. Springer, Heidelberg (2004)
42. Lee, J., Truex, D.P.: Cognitive Complexity and Methodical Training: Enhancing or Sup-
pressing Creativity. In: 33rd Hawaii International Conference on System Sciences (2000)
43. Leenders, R.T.A.J., Kratzer, J., Hollander, J., et al.: Virtuality, Communication, and New
Product team Creativity: a Social Network Perspective. Engineering and Technology
Management, 69–92 (2003)
44. Lissack, M.: Complexity: the Science, its Vocabulary, and its Relation to Organizations.
Emergence 1(1), 110–126 (1999)
45. Marczyk, J.: Principles of Simulation Based Computer Aided Engineering. FIM Pub-
lications, Barcelona (1999)
46. Marczyk, J., Deshpande, B.: Measuring and Tracking Complexity in Science. In: Minai,
A., Braha, D., Bar-Yam, Y. (eds.) 6th ICCS, Boston, MA (2006)
47. Maturana, F., Shen, W., Norrie, D.H.: MetaMorph: an Adaptive Agent-Based Architecture
for Intelligent Manufacturing. International Journal of Production Research 37, 2159–
2173 (1999)
48. Monceyron, E., Barthes, J.P.: Architecture for ICAD Systems: an Example from Harbor
Design. Sciences et Techniques de la Conception 1, 49–68 (1992)
49. Prasad, B.: Concurrent Engineering Fundamentals. Integrated Product Development,
vol. II. Prentice Hall, Englewood Cliffs (1996)
50. Quadrel, R.W., Woodbury, R.F., Fenves, S.J., et al.: Controlling asynchronous team de-
sign environments by simulated annealing. Research in Engineering Design 5, 88–104
(1993)
51. Reidsema, C., Szczerbicki, E.: Towards a System for Design Planning in a Concurrent
Engineering Environment. International Journal of Systems Analysis, Modeling, and
Simulation 29, 301–320 (1997)
52. Reidsema, C., Szczerbicki, E.: Blackboard Approach in Design Planning for Concurrent
Engineering Environment. Cybernetics and Systems 29, 729–750 (1998)
53. Reidsema, C., Szczerbicki, E.: A Blackboard Database Model of the Design Planning
Process in Concurrent Engineering. Cybernetics and Systems 32, 755–774 (2001)
54. Reidsema, C., Szczerbicki, E.: Review of Intelligent Software Architectures for the De-
velopment of An Intelligent Decision Support System for Design Process Planning in
Concurrent Engineering. Cybernetics and Systems 33, 629–658 (2002)
55. Rosenman, M., Wang, F.: CADOM: A Component Agent-based Design-Oriented Model
for Collaborative Design. Research in Engineering Design 11, 193–205 (1999)
56. Saad, M., Maher, M.L.: Shared understanding in computer-supported collaborative de-
sign. Computer-Aided Design 28, 183–192 (1996)
57. Shen, W., Barthès, J.P.: An Experimental Multi-Agent Environment for Engineering De-
sign. International Journal of Cooperative Information Systems 5, 131–151 (1996)
58. Shen, W., Norrie, D.H.: A Hybrid Agent-Oriented Infrastructure for Modeling Manufac-
turing Enterprises. In: KAW 1998, Banff, Canada, pp. 117–128 (1998)
59. Shen, W., Norrie, D.H., Barthès, J.-P.: Multi-Agent Systems for Concurrent Intelligent
Design and Manufacturing. CRC Press, Boca Raton (2001)
60. Sinha, R., Liang, V.C., Paredis, C.J.J., et al.: Modeling and Simulation Methods for De-
sign of Engineering Systems. Computing and Information Science in Engineering 1,
84–91 (2001)
61. Sosa, R., Gero, J.: Diffusion of Creative Design: Gate keeping Effects. International
Journal of Architectural Computing 2, 518–531 (2004)
62. Sosa, R., Gero, J.: A Computational Study of Creativity in Design. AIEDAM 19, 229–
244 (2005)
63. Sosa, M.E., Eppinger, S.D., Rowles, C.M.: Designing Modular and Integrative Systems.
In: DETC 2000 International Design Engineering Technical Conferences and Computers
and Information in Engineering Conference, Baltimore, Maryland (2000)
64. Stacey, R.D.: The Science of Complexity: An Alternative Perspective for Strategic
Change Processes. Strategic Management Journal 16, 477–495 (1995)
65. Sycara, K.: Multiagent Systems. AI Magazine, 79–93 (1998)
320 M. Efatmaneshnik and C. Reidsema
66. Timmis, J., Andrews, P., Owens, N., et al.: An Interdisciplinary Perspective on Artificial
Immune Systems. Evolutionary Intelligence 1, 5–26 (2008)
67. Tomiyama, T., Umeda, Y., Ishii, M., et al.: Knowledge systematization for a knowl-
edge intensive engineering framework. In: Tomiyama, T., Mantyla, M., Finger, S. (eds.)
Knowledge Intensive CAD-1, pp. 55–80. Chapman & Hall, Boca Raton (1995)
68. Turban, E.: Decision support and expert systems: management support systems. Prentice
Hall, Englewood Cliffs (1995)
69. Wolf, T.D., Holvoet, T.: Towards a Methodology for Engineering Self-Organising Emer-
gent Systems. In: Self-Organization and Autonomic Informatics, Glasgow, UK, pp. 18–
34 (2005)
70. Wong, A., Sriram, D.: SHARED: An information model for cooperative product devel-
opment. Research in Engineering Design 5, 21–39 (1993)
71. Zdrahal, Z., Motta, E.: Case-Based Problem Solving Methods for Parametric Design
Tasks. In: Proceedings of the Third European Workshop on Advances in Case-Based
Reasoning (EWCBR 1996), pp. 473–486. Springer, London (1996)
72. Zhang, C.: Cooperation under uncertainty in distributed expert systems. Artificial Intel-
ligence 56, 21–69 (1992)
Bayesian Learning for Cooperation in
Multi-Agent Systems
1 Introduction
As computing power and ubiquity increase, the use of multi-agent technology in
large distributed systems is becoming more widespread. For example, sensors are
now often included in new buildings and vehicles. When these sensors are able to
sense intelligently and communicate with one another, they form a multi-agent
system. Mobile sensors may be able to make inferences about a scenario such as a
terrorist attack or a flood, and provide human teams with uncertainty estimates
and suggest actions. In situations where a human would be at some risk, intelli-
gent communicating machines may be deployed—for example, thousands of UAVs
(unmanned aerial vehicles) can collaborate to search over a wide area [32], or robots
may also act as intelligent agents, aiding or replacing humans to perform a coordi-
nated search of a burning building [33]. Consequently, as such systems develop, the
scalability of complex interacting systems becomes increasingly important.
Mair Allen-Williams and Nicholas R. Jennings
School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
e-mail: {mhaw05r,nrj}@ecs.soton.ac.uk
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 321–360.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
2 Background
In this Section we introduce the ideas which we will use in our algorithm, explain-
ing the way in which the multi-agent approach to partially-observable systems is
developed from single-agent decision theory, and justifying the decisions we have
made at each step. First, however, we introduce the disaster response domain as a
motivation for this work and identify its key characteristics.
In more detail, we find that taking disaster response as our focus domain drives a
particular interest in collaborative multi-agent domains which include the following
properties:
Decentralisation: In these large and dynamic systems, providing a central con-
troller is likely to be infeasible. Firstly, there are unlikely to be sufficient re-
sources to allow communications between one central controller and every other
node. Secondly, one central controller is almost certainly not going to be able to
obtain a complete view of the system, and the potentially rapid changes as agents
enter and leave the system would be difficult to track.
Dynamism: Realistic systems are rarely static. For example, in disaster recovery
agents must adapt to changing weather conditions, any aftershocks, and unex-
pected events such as building collapse or fires. When taken together, this can
lead to a very dynamic environment.
Partial observability: Along with decentralisation, it is likely that no one agent is
able to see the complete system all the time. Although communication between
agents may extend a particular agent’s view of the system, the agent must con-
tinually make judgements based on an incomplete view.
Bandwidth-limited: Limited communication is a characteristic common to disas-
ter scenarios—for example, mobile phone networks often become jammed [24],
or time constraints can limit opportunities for communication. Thus, agents may
be able to exchange some information, but both time and bandwidth restrictions
will limit these exchanges.
Openness: The rescue agents are likely to be entering or leaving the disaster scene
throughout the rescue operation. Agents may be harmed at the scene and thence-
forth be out of action, while new agents may arrive late. A collaborative model
in a disaster response scenario must therefore be able to adapt to the continual
arrival and loss of agents.
In this most basic model, the agent perceives the state of the world through its sen-
sory inputs, and decides on its immediate action based on this state. Following the
agent’s action, the world transitions into a new state, and the agent may receive some
reward. This model forms the basis of Markov decision theory [36]. The fundamen-
tal feature of this theoretical model is the assumption that the immediate next state
is dependent only on the previous state and choice of action—this is the Markov
property. Although the Markov property may not fully hold, it is often a sufficiently
good approximation, and techniques which use this theory can get good results. This
is demonstrated by many practical examples [18] [35] [2]. With the Markov assump-
tion, if the models describing the transition and reward probabilities are completely
known to the agent then the system can be solved, using a pair of recursive equa-
tions [36] which determine the optimal action from each world state. These are the
Bellman equations. For large systems, there are efficient ways of approximating
these solutions—we do not go into these here, as we will not be dealing with known
MDPs, but refer the interested reader to [36], Chapter 9.
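As a concrete illustration of solving a known MDP with the Bellman equations, the following sketch runs value iteration on a hypothetical three-state, two-action world. The transition matrix, rewards and state names are invented for illustration; this is not code from the chapter.

```python
# Value iteration on a hypothetical toy MDP (3 states, 2 actions).
# T[a][s][s2] = P(s2 | s, a); R[s2] = reward received on entering s2.
T = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 1
]
R = [0.0, 0.0, 1.0]
gamma = 0.8          # discount factor of the kind discussed later in the chapter
n_states, n_actions = 3, 2

V = [0.0] * n_states
for _ in range(1000):
    # Bellman backup: Q(s, a) = sum over s2 of T[a][s][s2] * (R[s2] + gamma * V[s2])
    Q = [[sum(T[a][s][s2] * (R[s2] + gamma * V[s2]) for s2 in range(n_states))
          for a in range(n_actions)]
         for s in range(n_states)]
    V_new = [max(Q[s]) for s in range(n_states)]
    if max(abs(x - y) for x, y in zip(V_new, V)) < 1e-9:
        break
    V = V_new

# Optimal policy: the action maximising Q in each state.
policy = [max(range(n_actions), key=lambda a, s=s: Q[s][a]) for s in range(n_states)]
```

Here state 2 is absorbing and rewarding under action 1, so the fixed point gives V[2] = 1/(1 − γ) = 5 and the optimal policy chooses action 1 there.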
Now, reinforcement learning, combined with the Bellman equations, will allow a
single agent to solve any observable MDP which comes its way. However, although
MDP models will form the basis of our environment, in large or complex scenarios
it is common for an agent to make local observations which allow it to form infer-
ences about the current state (Example 4), without observing the complete state di-
rectly (although in multi-agent systems, local observations may be augmented with
communicated information). When the underlying process of moving from global
state to global state is still (assumed to be) Markov, the scenario is described as a
partially observable Markov decision process, or POMDP, and there are a host of
POMDP-solution techniques.
In particular, when the underlying environmental model is known, the POMDP can
be converted to a continuous Markov decision process by defining a belief state as a
probability distribution over states. The resulting continuous MDP, from belief state
to belief state, can be solved using exact algorithms [7] or using approximations to
make computation easier [3], [20]. If the underlying model is not known, learning
techniques must be used to refine a solution as the agent explores the system. Model-
free approaches, such as [1], have had some success in using learning techniques to
solve POMDPs. However, as discussed, we believe that model-based approaches
may again have benefits—for example, [34] demonstrates a model-based algorithm
which uses variable length suffix trees to address the fact that even if state transitions
are Markov, the observable process may not be. However, existing approaches rely
on a number of approximations and assumptions about the state space, hence are not
entirely satisfactory. A principled approach may be to extend the Bayesian model
described previously into partially observable domains [30].
Perhaps the simplest example: agents functioning in uncertain worlds among other
agents may include others’ behaviour in the Markov state transition model they
develop. However, by doing this they may form inaccurate assumptions about the
world, as agents adapt their behaviour to one another. Consequently, maintaining
models of the world and of other agents separately provides greater flexibility and
may enable the agent to reuse a world model as agents come and go, or reuse models
built for known agents in fresh scenarios. Below, we outline three common ways in
which agents may develop and use models of the world to coordinate.
as the bandwidth and timeliness constraints will typically preclude it. Finally, it is
possible to extend single agent learning into the multi-agent domain. The uncertain-
ties of our target domain make learning techniques a natural approach to problems
within this domain. Learning techniques enable agents to evolve coordinated policies
within uncertain state spaces, either with a group of learners exploring the space and
converging towards an equilibrium (as in [10] and [22]), or by one agent explicitly
learning about the behaviour of others in order to adapt its own appropriately [8].
• By convention, they will drive on the left hand side of the road; they will use sirens to
indicate their approach; they will rescue the elderly, the mothers and the children first.
• They will communicate with the other ambulance drivers, calling things like “road’s gone
up there”, “I think someone needs to check out the North-East of the town”, “We need
three more people to help lift here”. By convention, sirens also communicate with others.
• As they work with other ambulance teams, they will learn which teams need the most
help, which areas of the city have been searched, and how a particular team tends to
operate (for example, whether they use one-two-lift or one-two-three-lift).
Distinct from the three approaches to coordination identified above, another research
domain which investigates coordination from a theoretical angle is game theory
[21]. In game-theoretic formulations, agents model the scenario as a game and try
to derive, either through exact evaluation or through learning, a best response to
the strategies of the other players in the game (Example 7). If all the players itera-
tively keep playing best responses, and if strategies are mixed (stochastic), the play
will converge to a (mixed) equilibrium, in which every player's strategy is a best
response to every other player's. One of the challenges of game theory is to direct the
play so that convergence is not just to any equilibrium but to an optimal one [10].
Within the domain of game theory, the form of multi-agent learning in which
the agents maintain explicit models of the other agents is described as learning in
stochastic games. One effective approach to extending single-agent reinforcement
learning into this setting is the win-or-learn-fast (WoLF) approach: an agent’s learn-
ing rate is adjusted according to its current performance, without explicitly mod-
elling the other agents [5]. However, WoLF techniques can be improved upon by
using a Bayesian model in which agents maintain beliefs about the behaviour of the
other agents, as well as a probability distribution over world models [8]. The need
for heuristically determined learning rates is then eliminated, while prior informa-
tion about agents can be incorporated.
Game theory is an obvious model for scenarios with heterogeneous and compet-
itive agents, but searching for the optimal Nash equilibrium is also a useful formu-
lation for cooperative problems. WoLF and related approaches are often applied in
such problems, with each agent gradually adjusting to the others so that the whole
system is incrementally improved. Although there is no guarantee that the optimal
Given the aim of learning models of the environment, we have previously discussed
reinforcement learning. However, learning about other agents’ behaviour is typically
a different kind of task from learning about the environment. In a fully observable
domain with the Markov assumption, the optimal action will only ever depend on
the current state. Therefore, agents can learn simple models of the strategies of the
other agents, using multinomial distributions over actions (one for each state) and
updating these distributions either using a simple frequency count or using Bayes’
rule. In the situation in Example 7, this may mean the twins observing the state
(Home empty, Gang in alley) and deciding to loot, or observing the state (Home
empty, Gang at head of alley) and deciding not to risk it. This is known as fictitious
play [17]. Conversely, in scenarios where the full state is unknown to the agent,
simple fictitious play is not appropriate. Each agent may have knowledge of the
environment and a model of the current world state—but this is not sufficient to
respond optimally to the other agents. In a rescue scenario, some rescue tasks require
several agents, and so the agents must come to the same conclusions about when
these tasks are approached. If agents have differing views of the situation, they may
not make the same decisions about urgency, resulting in an ineffective dispersal of
agents. In Example 7, the twins may believe mistakenly that the gang will realise
how unstable the building is, and thus expect the gang to take more care than it does,
or they may not know how desperate one of the gang is for cash.
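The frequency-count form of fictitious play described above can be sketched as follows. The class, state and action names are hypothetical, and the Laplace pseudo-count is a common smoothing choice rather than something the chapter specifies.

```python
from collections import defaultdict

class FictitiousPlayModel:
    """Multinomial model of another agent's strategy: one action-count table
    per observed state, updated by simple frequency counting. The Laplace
    pseudo-count keeps some probability mass on unseen actions."""

    def __init__(self, actions, pseudo_count=1.0):
        self.actions = list(actions)
        self.pseudo = pseudo_count
        self.counts = defaultdict(lambda: {a: 0.0 for a in self.actions})

    def observe(self, state, action):
        # Record one observed (state, action) pair for the modelled agent.
        self.counts[state][action] += 1.0

    def predict(self, state):
        # Smoothed empirical distribution over the agent's actions in this state.
        c = self.counts[state]
        total = sum(c.values()) + self.pseudo * len(self.actions)
        return {a: (c[a] + self.pseudo) / total for a in self.actions}

# Hypothetical states echoing Example 7: the gang's position drives the prediction.
model = FictitiousPlayModel(actions=["loot", "wait"])
for _ in range(3):
    model.observe(("home_empty", "gang_in_alley"), "loot")
probs = model.predict(("home_empty", "gang_in_alley"))
```

After three observed "loot" choices the model predicts loot with probability (3+1)/(3+2) = 0.8, illustrating how the multinomial estimate sharpens with evidence while never collapsing to certainty.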
In principle, each agent can maintain and update a POMDP in which the unknown
POMDP “state” includes the world state, the other agents’ world models, and be-
havioural models for the other agents. In practice, it is not tractable either to up-
date such a model or to determine a best response within it without performing some
approximations—for example, projecting just a small number of steps into the future,
and using a domain-specific heuristic to estimate the values of those future states [14].
Even this heuristic approach relies on each agent being able to predict the compu-
tations of the other agents—each must be initialised with the same random seed. A
different approach to approximation is to restrict the possible opponent strategies to
those which can be described by regular automata, often called finite state machines
or finite automata.
An agent controlled by a finite state machine has a number of internal states, each
associated with an action (or a probability distribution over actions)—this tells the
agent how to act when it reaches this internal state. After taking an action, the agent’s
observations determine its movement to a new internal state. The finite state machine
captures the notion that an agent’s beliefs can be approximated, for the purposes of
decision making, by a variable but finite sequence of past observations, and exam-
ples such as [38] [6] demonstrate that it can be very effective. Furthermore, approx-
imate best responses to finite state machines can be computed efficiently [23].
To date, previous work using finite state machines focuses on offline solutions to
multi-agent problems, precomputing responses to every possible belief state. How-
ever, it is impossible for every belief state to be reached: each belief state which is
visited narrows the space of possible future beliefs (at least within a static environ-
ment). For offline solvers without tight time constraints, there may be no problem in
generating redundant information. Other approaches use the intuition that the belief
space need only be divided into sufficient chunks to determine the next action, for
example using principal components analysis on a discretized state space [31]. The
alternative to such techniques is to search for solutions online. This is the only way
of approaching very dynamic systems, or systems where the problem parameters
may not be known in time to perform a comprehensive offline search—as is likely
to be the case in our target domain. Online solutions will, of necessity, be approx-
imate, since any accurate solution projects infinitely far into the future and thus is
effectively an offline solution.
In the next section we expand some of these ideas and describe in detail an al-
gorithm for online cooperative action in partially observable multi-agent systems in
which agent communication is limited to information-sharing. Our algorithm uses
finite state machines to model the policies of the other agents and each agent com-
putes online a best response to its beliefs about these finite state machines.
be important. It balances the importance we place on future states with our need to
accumulate reward now. In practical terms it will be chosen to express the extent of
lookahead appropriate to the problem (consider chess as an analogy: for the most
part, say, 3 steps of lookahead are sufficient to play well). Typically, we will use
a γ value of around 0.8, making lookahead negligible after around ten steps into
the future—in a fragile disaster scenario we expect this to be sufficient for most
planning purposes. It is most common for reinforcement learning algorithms to set
γ between 0.7 and 1, although the choice will depend on the exact problem.
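The effective lookahead implied by a choice of γ can be checked with a small calculation; the helper below (illustrative, not part of the chapter's algorithm) finds the first step at which a unit reward's discounted contribution falls below a cut-off such as 10%.

```python
import math

def horizon(gamma, cutoff=0.1):
    """Smallest n with gamma**n < cutoff: beyond this many steps, a unit
    reward contributes less than `cutoff` to the discounted sum."""
    return math.ceil(math.log(cutoff) / math.log(gamma))

h8 = horizon(0.8)   # lookahead becomes negligible after around ten steps
h9 = horizon(0.9)   # a higher gamma roughly doubles the effective horizon
```

With γ = 0.8 the contribution drops below 10% at step 11 (0.8¹⁰ ≈ 0.107), and with γ = 0.9 at step 22, which is why a modest change in γ noticeably lengthens the planning horizon.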
Now, in a fully observable world, O = S and P(o_t | s_t) = 1 if o_t = s_t and 0
otherwise, i.e. the agent knows the complete state s_t at every timestep t. Given the
Markov property, its optimal policy therefore need depend only on the current state.
We can therefore define a policy in a fully observable MDP by π(s) = a, a function
from states to actions. Then, if the strategies of the other agents are known, the
agent can compute its own optimal policy by solving the set of simultaneous
equations known as the Bellman equations (2 and 3) via dynamic programming,
and then taking the policy π* described in equation 4.
In more detail, Q^π(s, a) is the (discounted expected) value of taking action a from
state s and then following policy π. Q*(s, a) is the (discounted expected) value of
taking action a from state s and then following the optimal policy π*. Throughout
this paper we will use "best response" to refer to the optimal single-agent action a,
maximising Q(s, a), as we replace s with more complex models.
There are various ways of efficiently approximating these solutions in large prob-
lems, and for solving in continuous systems. Briefly, the equations can be solved
iteratively, and efficiency is achieved by (a) updating the states most likely to have
changed first, and (b) updating “nearby” states when a state is updated [36]. We do
not go into details of these solution techniques as realistically we are unlikely to
know all the necessary parameters. In the next section we explain how this model is
extended into systems with unknowns.
It is often the case that the agent may not know (in the case of static parameters), or
be able to observe (in the case of state-related values) all the details of the MDP. If
the underlying state s cannot be observed, then the problem becomes a POMDP: a
“partially observable” Markov decision process (Figure 3). At each step, the MDP
proceeds behind the scenes, while the agent makes observations o derived from
the underlying state s, where o is insufficient for the agent to reliably determine s.
The observation function O_f(s, i) describes the probability density function P(o|s) for agent i.
To solve this POMDP, we can derive from it a secondary MDP—a belief MDP.
The multi-dimensional states of this secondary MDP have one continuous variable,
b(s), for every possible value s of the underlying state. The value of b(s) indicates
the probability that the underlying state is s, given the agent’s prior knowledge and
the history of observations and actions. The system proceeds from b to b′ at each
step, using Bayes' rule (equation 6) to update the state probabilities (Figure 4).
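This belief update (predict through the transition model, then reweight by the observation likelihood and renormalise) can be sketched as a discrete Bayes filter. The function signature and the two-state example below are illustrative assumptions, not the chapter's own code.

```python
def belief_update(b, action, obs, trans, obs_fn):
    """One step of the discrete Bayes filter behind a belief MDP.
    b:      dict state -> probability (the current belief state)
    trans:  (s, a) -> dict s2 -> P(s2 | s, a)
    obs_fn: (s2, o) -> P(o | s2)
    """
    # Prediction: push the belief through the transition model.
    predicted = {}
    for s, p in b.items():
        for s2, pt in trans(s, action).items():
            predicted[s2] = predicted.get(s2, 0.0) + p * pt
    # Correction: reweight by how well each state explains the observation.
    posterior = {s2: p * obs_fn(s2, obs) for s2, p in predicted.items()}
    z = sum(posterior.values())
    if z == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return {s2: p / z for s2, p in posterior.items()}

# Hypothetical two-state example: is the room on fire?
trans = lambda s, a: {s: 1.0}    # a static world, for illustration only
obs_fn = lambda s, o: {("fire", "hot"): 0.9, ("safe", "hot"): 0.2}.get((s, o), 0.0)
b1 = belief_update({"fire": 0.5, "safe": 0.5}, "look", "hot", trans, obs_fn)
```

Starting from a uniform belief, one "hot" observation shifts the belief to P(fire) = 0.45/0.55 ≈ 0.82, showing how each observation narrows the belief state.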
In principle, any general techniques for continuous MDP solutions can be used to
solve the belief MDP [36]. However, all belief-state MDPs fall into a particular class
of continuous MDPs, since each belief state restricts the possible future belief states.
More efficient solution techniques exploit the properties of these MDPs [28] [27].
Given this, we can extend the belief MDP idea further to consider cases where the
environmental dynamics, θ , are not known or are partially known. In these cases,
we can consider an underlying MDP which has the dynamics, θ , as one of its state
variables. This MDP has a known transition function: (s, θ) → (θ(s), θ). The obser-
vations for the POMDP associated with this MDP will include state transitions as
well as the immediate observations. In principle, this POMDP can be solved exactly
as described above. Finally, the same model extends into the multi-agent world by
including the actions of other agents in the underlying state, and the behaviour func-
tions of other agents in θ . In a partially observable system, the behaviour of another
agent will depend on its beliefs about the state, and so we also add the beliefs over
states of the other agents to our own MDP state.
4.1 Definitions
A deterministic finite state machine has:
• A set of n nodes N = {n_1, . . . , n_n}
• A set of m edges E = {e_1, . . . , e_m}
• For each node, an associated action a from the set of actions
• For each edge, an associated observation o from the set of observations
One of the nodes is designated as a start node, N0 . We write Act(n) to refer to the
action associated with a node n.
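This definition maps directly onto a small data structure. In the sketch below the observation and action names are hypothetical, and one modelling choice is mine rather than the chapter's: an observation with no matching edge leaves the machine in its current node.

```python
class AgentFSM:
    """Deterministic finite state machine model of another agent:
    each node carries an action (Act(n)); observations label the edges."""

    def __init__(self, actions, start):
        self.actions = actions      # node -> action, i.e. Act(n)
        self.edges = {}             # (node, observation) -> next node
        self.node = start           # current node, initially the start node

    def add_edge(self, node, observation, target):
        self.edges[(node, observation)] = target

    def act(self):
        return self.actions[self.node]

    def step(self, observation):
        # Follow the edge labelled with this observation; if none exists,
        # remain in the current node (an assumption, not from the chapter).
        self.node = self.edges.get((self.node, observation), self.node)

# Hypothetical model: an agent that searches until it observes a victim, then digs.
fsm = AgentFSM(actions={0: "search", 1: "dig"}, start=0)
fsm.add_edge(0, "victim", 1)
fsm.add_edge(1, "clear", 0)
```

Stepping the machine on an observation string replays the modelled agent's behaviour, which is exactly how the belief-state update later in the chapter uses these models.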
However, there are two problems: one is that observation strings can be of indefi-
nite length, i.e. we may find ourselves storing the entire observation history in order
to accurately build the FSM. The second is that although the FSM is a deterministic
model, the behaviour it is modelling may be neither deterministic nor static. (A third
issue is that we do not in fact know the observation strings, but rather have proba-
bilities over them which are based on our own observations). We therefore wish to
adjust our learning strategy to take these facts into account.
A point to note is that although we do not know the strategies of others or their
optimal strategies, because we do know the MDP and the observation function, we
can make some judgements about how much observation history is likely to be im-
portant in making decisions, providing us with a way of judging the optimal size of
the FSMs.
We propose to sample possible observation strings from our belief state, and
construct a candidate FSM for each sample, using the following tactics in learning
these candidate FSMs:
• Define a maximum number of nodes which can occur in the FSM
• Break the observation history into overlapping observation strings of length l
• Assign each observation string a likelihood based on the frequency of occurrence
and its sample probability, weighting more recent occurrences more highly. Dis-
card completely observation strings older than nt timesteps.
• Rather than resolve inconsistencies by always creating new nodes, resolve incon-
sistencies by appealing to the likelihood of each of the inconsistent strings, and
discarding the least likely
In the next sections we describe in more detail an algorithm for learning FSMs
from observation strings.
For any set of agent behaviours, there may be several possible FSMs. The least
compact FSM for a finite time period has a distinct node for every time step. The
minimal FSM for an agent’s behaviour has the smallest number of nodes neces-
sary to describe the behaviour exactly. Now, finding the minimal FSM is an NP-
complete problem and cannot be approximated by any polynomial-time algorithm
[6]. However, it is possible to learn compact FSMs in polynomial time, for many
practical problems. The US-L* algorithm [6] has polynomial running time and has
been shown to be effective at finding compact models of agent behaviour on small
agent coordination problems—we propose to test it on larger problems.
This algorithm models the FSM using a table, with rows corresponding to ob-
servation string prefixes s, columns corresponding to string suffixes e, and the table
entries corresponding to actions σ . The alphabet of possible observations is Σ . The
table is then partitioned into equivalence classes:
The table must be constructed in such a way that it describes a FSM: that is, it
must be
• consistent: ∀s1, s2 ∈ S, [C(s1) = C(s2) → ∀t ∈ Σ, C(s1t) = C(s2t)].
• closed: ∀s′ ∈ S·Σ, ∃s ∈ S such that s′ ∈ C(s).
From such a consistent and closed table a deterministic FSM can be described.
Specifically, US-L* marks entries in the table as either hole entries or permanent
entries. The former are those which can be reassigned as the algorithm tries to re-
adjust the table for consistency. Only when no hole entries can be reassigned is a
new test added to the table. Permanent entries correspond to a fixed action.
The algorithm proceeds by:
• Take a set of observation strings
• Initialise the table so that all the prefixes of the observation strings have an asso-
ciated row in the table, and there is just one column with the empty string.
• Fill in the table entries using the observations, marking entries as hole entries if
they are not supported by previous examples, or permanent entries if they are
supported by previous examples. In order to bound the size of the automaton,
we specify a maximum number of times a hole entry can be changed, basing
the maximum on domain knowledge if it is available: the maximum should de-
pend on the dynamism in the system (since an entry will change if the system is
changing) and on the uncertainty in the system. In our work, we may adjust the
maximum over time using learned domain knowledge.
• Adjust the table to make it consistent, adding new columns to the table where
necessary (adding a new column enables the separation of one equivalence class
into two—this adds at least one new state to the corresponding automaton).
• Adjust the table to close it, adding new rows where necessary.
• Take the next set of observation strings and loop.
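The consistency and closedness conditions on the table can be expressed as simple checks over a prefix/suffix table. The sketch below treats observation strings as tuples and omits the hole/permanent bookkeeping of the full US-L* algorithm; the example tables are invented for illustration.

```python
def row(table, s, suffixes):
    """The row for prefix s: the action recorded for each suffix (None if unknown)."""
    return tuple(table.get(s + e) for e in suffixes)

def is_consistent(table, prefixes, suffixes, alphabet):
    """Prefixes with equal rows must still agree after any one-observation extension."""
    for s1 in prefixes:
        for s2 in prefixes:
            if row(table, s1, suffixes) == row(table, s2, suffixes):
                for t in alphabet:
                    if row(table, s1 + (t,), suffixes) != row(table, s2 + (t,), suffixes):
                        return False
    return True

def is_closed(table, prefixes, suffixes, alphabet):
    """Every one-observation extension of a prefix must match some prefix's row."""
    rows = {row(table, s, suffixes) for s in prefixes}
    return all(row(table, s + (t,), suffixes) in rows
               for s in prefixes for t in alphabet)

# Tiny example: observation strings are tuples, actions are single letters.
alphabet = ("x",)
suffixes = [()]                                   # just the empty suffix
prefixes = [(), ("x",)]
good = {(): "a", ("x",): "a", ("x", "x"): "a"}    # one equivalence class
bad = {(): "a", ("x",): "a", ("x", "x"): "b"}     # the extension breaks the class
```

In the `bad` table the prefixes () and ("x",) share a row but disagree after the extension "x", so the table is neither consistent nor closed; US-L* would respond by reassigning a hole entry or adding a column to split the equivalence class.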
This algorithm is designed to be used as an online algorithm for an adaptive agent
to learn models of opponent behaviour, although Carmel and Markovitch only ap-
ply it to repeated two-player games. We will be investigating its application in our
domain, specifying in advance a maximum size for the automata. Now, in order to
make use of these finite state machine models of agent behaviour, our agent (main-
taining these models) must be able to find an optimal response to what it believes to
be the current situation. Referring back to our generic Bayesian model, this means
evaluating Q(b, a) for a belief state b which includes beliefs over finite state ma-
chines. The next section explains how this is done.
where P(b′(s′) | b, a) = Σ_{n_j, s′, s} P(b′(s′) | s, a ∘ Act(n_j)) · P(n_j, s | b)   (8)

and P(b′(n′_j) | b, a) = Σ_{n_j, o′, s} P(n′_j | o′) · P(o′ | s, a ∘ Act(n_j)) · P(n_j, s | b)   (9)

where P(o′ | s, a ∘ Act(n_j)) = Σ_{s′} P(s′ | s, a ∘ Act(n_j)) · P(o′ | s′)   (10)
In our partially observable setting, where the agent does not in fact have knowl-
edge of the policies of the other agents, but rather has beliefs over these policies,
we propose to estimate the best response to the belief state by sampling from the
possible policies to obtain a selection of sets of FSMs, F = {F_1, . . . , F_m}. For each
sample FSM set F_i (containing a FSM for each other agent), the agent computes a
best response action BR_i(F_i, b). The action decision is then given by:

a = argmax_a Σ_{i=1}^{m} P_i · δ(BR_i(F_i, b) = a)
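This action decision, a weighted vote over sampled opponent-model sets, can be sketched as follows. The names are illustrative, and the stub `best_response` stands in for the finite-horizon best-response computation described in the text.

```python
from collections import defaultdict

def decide(samples, belief, best_response, actions):
    """Weighted vote over sampled opponent-model sets.
    samples: (fsm_set, probability) pairs drawn from the belief state.
    best_response: (fsm_set, belief) -> single-agent best-response action.
    Returns the action accumulating the greatest total sample probability."""
    score = defaultdict(float)
    for fsm_set, prob in samples:
        score[best_response(fsm_set, belief)] += prob
    return max(actions, key=lambda a: score[a])

# Hypothetical usage: three sampled FSM sets with weights 0.6 / 0.25 / 0.15.
samples = [("F1", 0.6), ("F2", 0.25), ("F3", 0.15)]
br = lambda fsm_set, b: {"F1": "left", "F2": "right", "F3": "right"}[fsm_set]
choice = decide(samples, belief=None, best_response=br, actions=["left", "right"])
```

Although two of the three samples vote "right", the single most probable model carries 0.6 of the belief mass, so the weighted vote selects "left".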
by a single agent who is aiming to adaptively find a best response to the behaviour of
the other agents in the system. Our intent is that when all agents are implementing this
algorithm, adapting to each other, they should converge on a “good” collaborative
solution for the problem. This algorithm, as described below, maintains approximate
models of the other agents in the form of finite state machines. A set of possible
models is held in a belief state which is updated using Bayesian learning. At each
step, the agent uses its observations to update each of the possible models and to
update its beliefs about the world. It then computes the finite-horizon best response
to each of these possible models and weights the possible responses with its belief
in the corresponding model to decide on its action.
• An agent maintains a current belief state, b(X), with beliefs over the variables
X = (s, {o, F, n}) where s is the current state, and {o, F, n} describes a set of
triples: in each triple, o is an observation history and (F, n) are the induced FSM
and current node in the FSM. The belief state contains one such triple for each
other agent in the system. The agent also maintains historical information about
b(s) over a fixed number of steps.
• Several parameters are fixed initially: Fmax the maximum number of nodes in
any FSM, γ the myopia of the agent, nt the horizon length to use in computing
an approximate best response, ol the observation window length. nt may be de-
termined based on γ: roughly, for a state n steps into the future, s_n contributes
γ^n · r(s_n) towards the discounted future reward. Thus with γ = 0.8 (a common
myopia value), after 10 steps only around 10% of a unit reward will still be
contributing towards the estimates of the future reward. This may be a small
enough value to ignore. If γ is increased to 0.9, then it will take around 22 steps
before the fraction of the reward under consideration is reduced below 10%.
• initialise:
The belief state is initialised: b(s) is initialised either to uniform beliefs or biased
based on domain knowledge. The observation strings o are all empty, and the F
have a single node with uniform probabilities over all actions1
• at each step:
– The agent observes the actions of the others and makes observations about the
state: these observations are used to update b(s) using Bayes’ rule.
– The observation samples o are extended into the current time frame to obtain o′, reweighting as appropriate. This is achieved by sampling from the expected observations of the other agents, given the current observation samples and b(s). When the length of an observation string exceeds o_l, the earliest observations are dropped. If a sample's likelihood falls below a probability threshold p_s, the sample is discarded and a new string is sampled using b(s) and the stored history of b(s) over the o_l previous steps.
– For each observation sample o′, update the FSM F associated with the sample with the new information in o′ using US-L*. The weighting given to the FSM F is the probability of the associated observation sample.
– For each sampled FSM, compute an approximate best response, and thus decide the maximum-likelihood best-response action a from the FSM weightings as described in Section 4.3.
¹ It would be possible to initialise with a more sophisticated set of F corresponding to shared conventions relating to the domain, for example encapsulating the knowledge that agents will run from a burning building. We leave that possibility to future work.
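The per-step loop above can be sketched in miniature. The following Python sketch is illustrative only: it replaces the learned FSMs with two fixed hypothetical opponent models and the finite-horizon POMDP response with a one-shot payoff table (MODELS, PAYOFF, and all other names here are our own assumptions, not the chapter's). It shows the two core operations nonetheless: Bayesian reweighting of candidate models, and a best response weighted by belief in each model. The horizon helper mirrors the back-of-envelope calculation of n_t from γ.

```python
import math

def horizon(gamma, threshold=0.1):
    """Smallest n with gamma**n < threshold: steps beyond this contribute
    less than `threshold` of the reward and may be ignored."""
    return math.ceil(math.log(threshold) / math.log(gamma))

# horizon(0.8) -> 11, horizon(0.9) -> 22, in line with the rough figures above.

MODELS = {                      # candidate opponent models: P(opponent action)
    "digger":   {"dig": 0.9, "move": 0.1},
    "wanderer": {"dig": 0.1, "move": 0.9},
}
PAYOFF = {                      # PAYOFF[my_action][their_action]
    "dig":  {"dig": 2.0, "move": 0.5},
    "move": {"dig": 0.5, "move": 0.0},
}

def bayes_update(belief, observed_action):
    """Reweight the belief over models by each model's likelihood of the
    observed opponent action (Bayes' rule), then renormalise."""
    post = {m: belief[m] * MODELS[m][observed_action] for m in belief}
    z = sum(post.values())
    return {m: p / z for m, p in post.items()}

def best_response(belief):
    """Pick the action maximising expected payoff under the
    belief-weighted mixture of opponent models."""
    def expected(a):
        return sum(belief[m] * sum(MODELS[m][o] * PAYOFF[a][o]
                                   for o in PAYOFF[a])
                   for m in belief)
    return max(PAYOFF, key=expected)

belief = {"digger": 0.5, "wanderer": 0.5}
for seen in ["dig", "dig", "dig"]:       # the opponent keeps digging
    belief = bayes_update(belief, seen)
```

After observing three Dig actions, the belief concentrates on the "digger" model and digging becomes the belief-weighted best response.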
Bayesian Learning for Cooperation in Multi-Agent Systems 343
5 Model Instantiation
In order to test the algorithm on a challenging problem, we implemented a rescue
scenario involving coordinating ambulances. We compared our algorithm with a
current state of the art algorithm and a hand-written solution for this problem. In
this section, we specify the problem as a multi-agent POMDP and explain how we
simplify the observation space.
In more detail, in the rescue problem we have an n by m gridworld. k agents can
move left, right, up or down (constrained, of course, at the edges of the grid). In the
gridworld are buried victims, described by two parameters: D and R. D (‘deadness’)
is a measure of the proximity of the victim to death. When it reaches a maximum
level the victim is dead and subsequently ignored for the purposes of the rescue
problem. R (‘rescue needed’) is a measure of the depth at which the victim is be-
lieved to be buried. Agents digging can reduce R. If R reaches 0 before the victim
dies, then the victim is assumed to be safe. The urgency of the victim therefore
increases with increased D and with increased R, unless R is sufficiently large com-
pared with D that the victim can be considered a lost cause. Figure 5 shows one step
on the grid for a 4x4 grid with three agents.
Specifically, taking the model of Section 3, the various parameters are instanti-
ated in the following way:
States: A state of this world is described using a pair of variables for each of the grid squares, characterising the D and R values in the square (we make the simplifying assumption that there can be at most one victim per square), and a variable for each agent, identifying its current square. We use l_d and l_r discrete levels to describe D and R, so for each square there are l_d · l_r possible states, and for each agent there are m · n possible states, making a total of (l_d · l_r)^(m·n) · (m · n)^k possible states.
Agents: We assume that the number of agents, k, is fixed throughout each problem
and known to each agent.
Locations: The location variable for each agent is its current square.
Actions: Agents may take Move actions (left, right, up or down), or Dig actions
in their current square.
Observations: An agent observes some subset of the state variables, so there is one
observation variable for each state variable. The values taken on by observation
variables are those of the corresponding state variable, plus “null”.
344 M. Allen-Williams and N.R. Jennings
Fig. 5 One step of the rescue problem on a 4x4 grid with three agents
Transition function: Move actions move the agent one square in the requested direction, unless this is impossible, in which case the action has no effect. Each square transitions (D, R) independently of other squares, so it is sufficient to define the transition function for one square. We use two global probabilities, p_d and p_r, to specify the probability of the D level changing (a constant probability, independent of the action) and of the R level changing if there is a Dig action. If there is no Dig action, R remains unchanged. We assume that if there are k Dig actions in a square, they are applied in sequence. Finally, if a square is empty, we use a further parameter, p_a, to define the probability that a victim will appear in that square. If a victim does appear, its (D, R) levels are determined with uniform probability (over levels greater than 0).
Observation function: Agents are able to see the squares (deadness, rescue-level,
and any other agents in the square) to the left and right, and above and below
them, as well as their own square. Additionally, we define a problem-specific
parameter, v, for the visibility. For every other square, the agent will be able
to see the deadness D in that square with probability v and the rescue level R in the square with independent probability v. Since all agent actions are
fully observable, we assume that we can also observe all agent locations. This
‘visibility’ parameter could be justified as some level of communication with a
centralised observer, say a helicopter viewing the scene. We assume no error in
the observation: either a variable is completely and correctly observed or it is not
observed at all.
Reward function: The reward function is a function of both the previous state and
the current state. For each square, if a victim disappears because they have died,
then the reward is decremented by one point. If a victim disappears because they
have been saved, then there is no change to the reward. Consequently, for this
problem rewards will always be less than or equal to 0.
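As a quick sanity check on the model just instantiated, the following sketch computes the state-space size from the counts given under States, and samples the per-square (D, R) transition using p_d and p_r. Parameter values are illustrative (the chapter does not fix l_d, l_r or the probabilities at this point), and the sampler treats the square's digs as a single boolean for simplicity.

```python
import random

def state_count(m, n, ld, lr, k):
    """(ld*lr) joint (D, R) levels per square over m*n squares,
    times m*n possible locations for each of the k agents."""
    return (ld * lr) ** (m * n) * (m * n) ** k

# Even the small 4x4 grid of Figure 5 with three agents and, say, three
# levels each for D and R has 9**16 * 16**3 (about 7.6e18) states -- far
# too many to enumerate, which is why the algorithm works with sampled
# beliefs rather than exact ones.

def square_step(d, r, dig, pd, pr, ld, lr, rng=random):
    """One transition of a single square: D worsens with probability pd
    (independent of actions); R decreases with probability pr, but only
    if someone digs in the square."""
    if d < ld - 1 and rng.random() < pd:
        d += 1
    if dig and r > 0 and rng.random() < pr:
        r -= 1
    return d, r
```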
The above definitions allow us to define beliefs over the values (D, R) of a square
(and thus over the state, since locations are observable), and beliefs over the obser-
vations of other agents, given their locations:
Agent locations: We are certain, for every square, how many rescue agents it contains (equivalently, for every agent, where it is located).
The square is observed: We are certain of both its parameters.
The square is not observed and has not been observed for i timesteps: for each property sq, which may take values x, the belief is obtained by propagating the last known belief through the transition model for i steps.
6 Experimental Evaluation
In order to test our strategy, we compare it against two other online algorithms: the
state of the art for online partially observable stochastic games is the Bayesian game
approximation using the finite-horizon approximation technique [14], described in
Section 3 (“POSG”). However, for large dynamic problems, this algorithm, which is
exponential in the number of agents, proves to be very inefficient and we find that
for all but the smallest variants of the rescue problem, POSG is too slow to be useful.
Previous work on large dynamic rescue problems of a similar form [26] compares
with a handwritten strategy (“smart”) tailored to the problem, and we do the same
thing. Our handwritten strategy is the strategy that was used by the AladdinRescue
team for ambulance distribution in the Robocup Rescue competition, which inspired
this problem. The algorithm uses a greedy strategy to allocate ambulances to victims
and is optimal in scenarios where (1) no new victims are arriving and (2) visibility is
perfect [29]. It is therefore not an optimal strategy for the problem as we have stated
it, but is a good approximation, thus providing a good target for our algorithm to meet.
Comparing against these two algorithms, and using as a baseline the null policy in which agents move randomly but never dig, and so never effect any rescues ("null"), we investigate our algorithm, "best response", over different parameter settings on the rescue problem, and then focus on the scaling properties of the algorithm. Next, we identify the fixed parameters and then go on to our results.
[Figure 6 plots: scaled reward over time for the smart policy and for best response with sample rates 10 and 50; panel (b) is a close-up of the two sampling rates.]
Fig. 6 Comparison of two algorithms over time on a 7x7 grid with 3 agents. Note that we use
a log scale to show more clearly the differences between the algorithms, and the rewards are
scaled up to > 0 for the log scale.
over time. From Figure 6(a) it is not clear that there is a large improvement in
this advantage—that is, the lines are fairly straight. However, Figure 6(b) shows
a closeup comparison of the two different sampling rates, showing the way in which
the lower sampling rate is able to match the performance of the higher sampling
rate after around 800 steps. We therefore see that with better information, the best
response algorithm is able to perform well on this problem even without accurate
models of the other agents, but when the sampling rate is very low, the best response
algorithm is able to compensate for this by learning.
Consequently, it seems that the best response algorithm is performing well
primarily on the basis of the sampled best response, rather than accurate estimates of
the behaviour of the others being critical. In order to investigate further, we compare
the algorithms on some smaller problems which the POSG algorithm is able to run
on, first looking at the effects of changing sample rates in more detail, and then
varying two parameters relating to the character of the problem (visibility and victim
distributions). This allows us to gain insights into the performance of our algorithm
as the problem nature is changed. We also investigate parameters relating to the scale
of the problem (number of agents, and size of grid). For each of these experiments
we compare the total reward after 150 steps—from Figure 6 we can see that this is
sufficient to show the differences between the algorithms or settings.
[Figure 7 plots: total reward against sample rate. (a) 2 agents on a 3x3 grid, comparing smart, null, best response and POSG; (b) 3 agents on a 5x5 grid, comparing null, smart and best response.]
Fig. 7 Effects of changing the sampling rate with two and three agents
These results indicate that similar actions are selected even with a small number of samples, perhaps because the best response can be estimated well from few samples; since the best response performs well at small sampling rates, the algorithm can be very efficient. This compares favourably with the POSG algorithm, which approaches optimality at high sampling rates but performs very badly at low sampling rates, at least for this type of problem. We do not investigate the POSG algorithm in the larger version of the problem (Figure 7(b)), but we see that, as for the larger problems above, the best response algorithm slightly outperforms the smart policy, due to its better handling of imperfect visibility. The next section investigates the effects of visibility in more detail.
[Figure plots: total reward against visibility. (a) 3x3 grid, comparing smart, best response and POSG; (b) 7x7 grid, comparing smart and best response.]
[Figure plots: total reward against victim arrival rate. (a) 2 agents on a 3x3 grid, comparing smart, null, best response and POSG; a second panel compares smart, null and best response.]
[Figure 11 plots: total reward against number of agents (2 to 7) for smart, null and best response. (a) 7x7 grid; (b) 9x9 grid.]
Fig. 11 Effects of increasing the number of agents on the results for two large grids
[Figure 12 plots: total reward against grid size (3 to 12) for smart, null and best response. (a) 3 agents; (b) 5 agents.]
Fig. 12 Effects of changing the grid size on the results for 3 and for 5 agents
The best response algorithm improves over the smart policy more as the grid size increases, a consequence of the way in which the best response policy incorporates uncertainty, and of the need for search on larger grids. The results are very similar for three agents (Figure 12(a)) and five agents (Figure 12(b)), although, as expected, five agents are able to make more rescues than three (the lines are slightly flatter).
Thus, we have observed that the best response algorithm performs well in comparison with a handwritten strategy designed for the same problem, while requiring much less sampling than the POSG algorithm to achieve this performance. Although the best response algorithm typically has similar performance to the handwritten strategy, it consistently outperforms it. Furthermore, the best response algorithm scales well, solving problems with many states and increasing numbers of agents, and improving on the handwritten strategy for these large problems. Since we anticipate our algorithm to be most useful in scenarios where no good handwritten strategy is available, especially as the problem scales, the best response algorithm seems promising. Further improvements are discussed in the next section.
Firstly, we propose to improve upon the learning of the FSM, using automatic
state clustering. In the rescue problem, and in many other problems, groups of states
can be considered equivalent by the agents. As a simple demonstration, note that
there are several symmetries in our example problem: at every step the grid can be rotated until our agent is towards, say, the bottom right, dividing the state space into equivalence classes of four states each, one corresponding to each rotation (0°, 90°, 180°, 270°). More generally, we need only as many abstract states
as there are joint actions, associating every underlying state with its optimal joint
action. However, in practice, particularly if we plan to re-use parts of our model,
reducing it purely to joint actions will be too abstract. An appropriate abstraction
algorithm should be adaptable, allowing us to change our mind about which action
is associated with a particular state, should allow us to update clusters incrementally
and should not tie us to any predefined set of clusters. We propose to use a form of
statistical clustering based on that described in [18] for this purpose.
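The rotational symmetry described above can be illustrated with a small sketch. To keep it minimal, we reduce the state to a single agent position on an n-by-n grid; a real abstraction would rotate the (D, R) grids as well. The canonical function picks one representative from each four-element equivalence class.

```python
def rotate90(pos, n):
    """Rotate a grid coordinate a quarter turn: (x, y) -> (y, n-1-x)."""
    x, y = pos
    return (y, n - 1 - x)

def canonical(pos, n):
    """Representative of pos's equivalence class: the lexicographically
    smallest of its four rotations (0, 90, 180, 270 degrees)."""
    orbit = [pos]
    for _ in range(3):
        orbit.append(rotate90(orbit[-1], n))
    return min(orbit)

# All four corners of a 4x4 grid collapse to a single abstract state,
# so the abstraction divides this part of state space by four.
```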
A second area of improvement is to better exploit the information available to
agents. We are investigating a complex problem domain in which some domain
knowledge can be assumed. We may also be able to assume some level of rationality
in the other agents (akin to coordination conventions). As we develop our models of
the agents, we have discussed how we can use these models to improve our beliefs
about the agents’ observations, applying Bayes’ rule. However, it may be possible to
make more sophisticated belief updates by considering the observations which we
make and the observations which other agents will make to be correlated streams
of information. Techniques such as the Kalman Filter [40] are able to operate over
correlated streams of information to make more accurate estimates about the value
of any particular point and to estimate missing data [25]. These techniques could be
applied (with caution) to our estimates of the observations of the other agents and
of the current state.
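As a concrete reminder of the machinery involved, here is a textbook scalar Kalman filter update [40] of the kind proposed above for fusing correlated observation streams and handling missing data. The random-walk state model and the noise variances q and r are our own illustrative assumptions, not part of the chapter's model.

```python
def kalman_step(x, p, z, q, r):
    """One predict + update cycle for a scalar random-walk state.
    x, p: prior mean and variance; z: new measurement (None if missing);
    q, r: process and measurement noise variances."""
    p = p + q                      # predict: uncertainty grows over time
    if z is None:                  # missing data: keep the prediction
        return x, p
    k = p / (p + r)                # Kalman gain
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 1.0
for z in [1.0, None, 1.2, 0.8]:    # a stream with one missing observation
    x, p = kalman_step(x, p, z, q=0.01, r=0.1)
# x converges towards the underlying value and p (the uncertainty) shrinks,
# even though one observation was missing.
```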
Thirdly, we propose to move beyond the scope of the current work, considering
cases in which the environmental dynamics are unknown or are changing, and in
which agents are able to enter and leave the environment as the problem progresses.
As discussed in Section 3, the algorithm we have presented can in principle be used
to learn fixed parameters such as parts of the environmental dynamics, by treating
these parameters as a part of a “grand state” from which observations are made.
Indeed, related work [8] [30] has done this for some special cases.
In this context, given the uncertainties of our domain, it is clear that if the be-
haviour of the other agents is completely unknown, and the current state is un-
known, and the environmental parameters are completely unknown, an agent must
stumble around “in the dark” for some considerable time before it can begin to
get a handle on good or optimal behaviour. However, in the typical scenarios moti-
vated by our example domain of disaster response, an agent will have strong prior
information about some or all of the unknown parameters. For example, the other
agents may be assumed to be rational and cooperative, thus likely to behave in a
near-optimal way. In our example problem, the form of the transition function may
be known, but not the exact values of every parameter. By incorporating all the
information available to the agent into its model, and particularly by correlating in-
formation, we anticipate that our model will be able to handle problems in which
the environmental dynamics are not completely known using the theoretical form
laid out in Section 3.
Following on from this, our model will easily handle scenarios in which the num-
ber of agents changes (but is known to our agent) over time. Since the best response
is computed at each step, there will be no difficulty in computing a best response
over a subset of the other agents, or in adding a new agent model to the collection.
Our agent will adapt continually during the problem run. Similarly, if the agent is
learning the environmental dynamics, and those dynamics change, the agent should
adjust its model smoothly.
With these improvements, we anticipate that the model of Section 3 can be used
as the basis of an algorithm capable of solving medium-sized distributed collabo-
rative problems in the real world, such as traffic management, controlling search
robots in a building after a fire, or distributing ambulances during a disaster. Similar
algorithms could also be included in software which could be loaded onto handheld
devices to aid human decision-making during critical situations such as war or a
large-scale disaster.
References
1. Aberdeen, D., Baxter, J.: Scaling internal-state policy-gradient methods for POMDPs.
In: Proceedings of the 19th International Conference on Machine Learning, vol. 2, pp.
3–10. Morgan Kaufmann, San Francisco (2002)
2. Abul, O., Polat, F., Alhajj, R.: Multiagent reinforcement learning using function ap-
proximation. IEEE Transactions on Systems, Man, and Cybernetics, Part C 30, 485–497
(2000)
3. Amato, C., Bernstein, D.S., Zilberstein, S.: Solving POMDPs using quadratically con-
strained linear programs. In: Proceedings of the fifth international joint conference on
Autonomous agents and multiagent systems, pp. 341–343. ACM Press, New York (2006)
4. Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In:
Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge,
pp. 195–210. Morgan Kaufmann Publishers Inc., San Francisco (1996)
5. Bowling, M., Veloso, M.: Rational and convergent learning in stochastic games. In: In-
ternational Joint Conferences on Artificial Intelligence, pp. 1021–1026 (2001)
6. Carmel, D., Markovitch, S.: Learning models of intelligent agents. In: Proceedings of
the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, vol. 2,
pp. 62–67 (1996)
7. Cassandra, A., Littman, M., Zhang, N.: Incremental pruning: A simple, fast, exact
method for partially observable Markov decision processes. In: Proceedings of the 13th
Annual Conference on Uncertainty in Artificial Intelligence, pp. 54–61. Morgan Kauf-
mann, San Francisco (1997)
8. Chalkiadakis, G., Boutilier, C.: Coordination in multiagent reinforcement learning: a
Bayesian approach. In: Proceedings of the second international joint conference on Au-
tonomous agents and multiagent systems, pp. 709–716. ACM Press, New York (2003)
9. Clark, A., Thollard, F.: PAC-learnability of probabilistic deterministic finite state au-
tomata. Journal of Machine Learning Research 5, 473–497 (2004)
10. Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multi-
agent systems. In: Proceedings of the fifteenth national/tenth conference on Artificial
intelligence/Innovative applications of artificial intelligence, Menlo Park, CA, USA, pp.
746–752. American Association for Artificial Intelligence (1998)
11. Dearden, R., Friedman, N., Andre, D.: Model-based Bayesian exploration. In: Proceed-
ings of the 15th Annual Conference on Uncertainty in Artificial Intelligence, pp. 150–
159. Morgan Kaufmann, San Francisco (1999)
12. Durfee, E.H.: Practically coordinating. AI Magazine 20(1), 99–116 (1999)
13. Dutta, P.S., Dasmahapatra, S., Gunn, S.R., Jennings, N., Moreau, L.: Cooperative infor-
mation sharing to improve distributed learning. In: Proceedings of the AAMAS 2004
workshop on Learning and Evolution in Agent-Based Systems, pp. 18–23 (2004)
14. Emery-Montemerlo, R., Gordon, G., Schneider, J., Thrun, S.: Approximate solutions
for partially observable stochastic games with common payoffs. In: Proceedings of the
Third International Joint Conference on Autonomous Agents and Multiagent Systems,
Washington, DC, USA, pp. 136–143. IEEE Computer Society, Los Alamitos (2004)
15. Fischer, F., Rovatsos, M., Weiss, G.: Hierarchical reinforcement learning in
communication-mediated multiagent coordination. In: Proceedings of the Third Inter-
national Joint Conference on Autonomous Agents and Multiagent Systems, Washington,
DC, USA, pp. 1334–1335. IEEE Computer Society, Los Alamitos (2004)
16. Fitoussi, D., Tennenholtz, M.: Choosing social laws for multi-agent systems: Minimality
and simplicity. Artificial Intelligence 119(1-2), 61–101 (2000)
17. Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. MIT Press, Cambridge
(1998)
18. Hoar, J.: Reinforcement learning applied to a real robot task. DAI MSc Dissertation, University of Edinburgh (1996)
19. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observ-
able stochastic domains. Artificial Intelligence 101(1-2), 99–134 (1998)
20. Kim, Y., Nair, R., Varakantham, P., Tambe, M., Yokoo, M.: Exploiting locality of interaction in networked distributed POMDPs. In: Proceedings of the AAAI Spring Symposium on Distributed Plan and Schedule Management (2006)
21. Leslie, D.: Reinforcement learning in games. PhD thesis, University of Bristol (2004)
22. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In:
Proceedings of the 11th International Conference on Machine Learning, New Brunswick,
NJ, pp. 157–163. Morgan Kaufmann, San Francisco (1994)
23. Marecki, J., Gupta, T., Varakantham, P., Tambe, M.: Not all agents are equal: scaling up
distributed POMDPs for agent networks. In: Proceedings of the Seventh International
Joint Conference on Autonomous Agents and Multiagent Systems (2008)
24. National Research Council. Summary of a Workshop on Using Information Technology
to enhance Disaster Management. National Academies Press (2005)
25. Osborne, M.A., Rogers, A., Ramchurn, S., Roberts, S.J., Jennings, N.R.: Towards real-time information processing of sensor network data using computationally efficient multi-output Gaussian processes. In: International Conference on Information Processing in Sensor Networks, April 2008, pp. 109–120 (2008)
26. Paquet, S., Tobin, L., Chaib-draa, B.: An online POMDP algorithm for complex multi-
agent environments. In: Proceedings of the fourth international joint conference on Au-
tonomous agents and multiagent systems, pp. 970–977. ACM Press, New York (2005)
27. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for
POMDPs. In: International Joint Conference on Artificial Intelligence, August 2003, pp.
1025–1032 (2003)
28. Poupart, P., Vlassis, N., Hoey, J., Regan, K.: An analytic solution to discrete Bayesian reinforcement learning. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 697–704. ACM, New York (2006)
29. Ramamritham, K., Stankovic, J.A., Zhao, W.: Distributed scheduling of tasks with deadlines and resource requirements. IEEE Transactions on Computers 38(8), 1110–1123 (1989)
30. Ross, S., Chaib-draa, B., Pineau, J.: Bayes-adaptive POMDPs. In: Neural Information
Processing Systems (2008) (in press)
31. Roy, N., Gordon, G.: Exponential family PCA for belief compression in POMDPs. In:
Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing,
Vancouver, Canada, December 2002, pp. 1043–1049 (2002)
32. Scerri, P., Liao, E., Xu, Y., Lewis, M., Lai, G., Sycara, K.: Coordinating very large
groups of wide area search munitions. In: Theory and Algorithms for Cooperative Sys-
tems (2005)
33. Scerri, P., Sycara, K., Tambe, M.: Adjustable autonomy in the context of coordination.
In: AIAA 3rd Unmanned Unlimited Technical Conference, Workshop and Exhibit (2004)
(invited paper)
34. Shani, G., Brafman, R.I., Shimony, S.E.: Model-based online learning of POMDPs. In:
Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS,
vol. 3720, pp. 353–364. Springer, Heidelberg (2005)
35. Smith, A.J.: Dynamic generalisation of continuous action spaces in reinforcement learn-
ing: A neurally inspired approach, Ph.D. thesis, Division of Informatics, Edinburgh Uni-
versity, UK (2002)
36. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cam-
bridge (1998)
37. Tambe, M., Adibi, J., Alonaizon, Y., Erdem, A., Kaminka, G.A., Marsella, S., Muslea,
I.: Building agent teams using an explicit teamwork model and learning. Artificial Intel-
ligence 110(2), 215–239 (1999)
38. Vu, T., Powers, R., Shoham, Y.: Learning against multiple opponents. In: Proceedings of
the fifth international joint conference on Autonomous agents and multiagent systems,
pp. 752–759. ACM, New York (2006)
39. Wang, F.: Self-organising communities formed by middle agents. In: Proceedings of the
first international joint conference on Autonomous agents and multiagent systems, pp.
1333–1339. ACM Press, New York (2002)
40. Welch, G., Bishop, G.: An introduction to the Kalman filter. Technical report, University
of North Carolina at Chapel Hill, Chapel Hill, NC, USA (1995)
41. Wooldridge, M.: An Introduction to Multi-agent Systems. Wiley, Chichester (2002)
Collaborative Agents for Complex Problems Solving
1 Introduction
Complex problem solving typically requires diverse expertise and multiple tech-
niques. Over the last few years, Multi-Agent Systems (MASs) have come to be
perceived as a crucial technology, not only for effectively exploiting the increasing
availability of diverse, heterogeneous, and distributed on-line information resources,
but also as a framework for building large, complex, and robust distributed infor-
mation processing systems which exploit the efficiencies of organized behaviour.
MAS technology is particularly applicable to complex problem solving in many application domains, such as distributed information retrieval [22], traffic monitoring systems [32], and Grid computing [35].
Minjie Zhang, Quan Bai, Fenghui Ren, and John Fulcher
School of Computer Science and Software Engineering, University of Wollongong,
Wollongong NSW 2522, Australia
e-mail: {minjie,quan,fr510,john}@uow.edu.au
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 361–399.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
362 M. Zhang et al.
A MAS comprises a group of agents, which can collaborate when dealing with
complex problems, or alternatively perform tasks individually with high autonomy.
In a MAS, agents can be characterised as either ‘self-interested’ or ‘cooperative’
[21] [34]. When different types of agents work together, management of their inter-
actions is a very important and challenging issue for the success of MASs.
This chapter introduces two main approaches for complex problem solving via
agent cooperation and/or competition, these being (i) a partner selection strat-
egy among competitive agents, and (ii) dynamic team forming strategies among
cooperative agents.
This chapter is organised as follows. Section 2 provides some background knowl-
edge and definitions relevant to agents and MASs. In Section 3, a dynamic team-
forming approach for MASs in open environments is introduced, which can be
used among both cooperative and self-interested agents. In Section 4, a fuzzy logic approach for partner selection among self-interested agents via agent competition is discussed in detail. Section 5 concludes the chapter and outlines further research.
Definition 1. A MAS that contains agents with distinct or even competitive individ-
ual goals is defined as a self-interested MAS.
Definition 2. A MAS that contains agents with common goals is defined as a coop-
erative MAS.
Normally, agents of a cooperative MAS work together toward maximising the realisation of their common goal(s).
An example of a cooperative MAS application is RoboCup [4] [5] [18]. In a robot
soccer team, all robot players (agents) collaborate to achieve their common goal,
i.e., winning the game. A typical example of a self-interested MAS is an agent-
based e-Commerce system in an electronic marketplace [15] [23] [39] [40]. In an
electronic marketplace, different agents work in the same environment toward non-
cooperative individual goals. However, agents still need to collaborate with others in
order to maximise their individual utilities, i.e. purchase/sell items collaboratively
in order to obtain the best price(s).
Collaborative Agents for Complex Problems Solving 363
Scenario 1
In a general service composition system, a number of services need to be combined to execute a task. For instance, if we want to transport goods overseas, we have to combine several kinds of services, which might include a packing service, a road transport service, a customs-related service and a shipping service. An agent in a service composition system is normally used to represent a particular service, and the resource of the agent is the service that the agent can provide. In such a system, agents must work with each other like a team in order to achieve the desired goal, i.e. to execute tasks cooperatively, because each task must be accomplished by more than one service.
Scenario 2
A car buyer wants to purchase a car, but there are several prospective sellers. To avoid extensive negotiation with each seller, the buyer should filter out some 'impossible' car sellers. For example, if a car seller's bid is much higher than the buyer's expectation, or the seller's reputation cannot be trusted by the buyer, then the buyer will filter out such sellers by employing the partner selection approach before the negotiation starts. During the negotiation, in order to maximise his own profit, the car buyer can predict his negotiation partner's behaviours and respond accordingly. For example, a car buyer in a hurry who estimates that a car seller cannot make further concessions will not spend more time on the current bargaining, but will look for another possible seller. On the other hand, a patient car buyer who estimates that a car seller still has scope for future concessions will put more effort into the bargaining. Therefore, by employing the behaviour prediction approach, the agent can gain some advantage in bargaining.
In distributed and complex problem solving, many MAS applications face a similar situation to Scenario 1, such as Web-based grid computing, distributed information gathering, distributed monitoring systems, and automated design and production lines. Scenario 2 is a typical example of a self-interested MAS in the domain of e-commerce, and frequently arises in agent-based e-trading and e-marketplaces. Sections 3 and 4 introduce the detailed definitions and principles of the two proposed approaches for agent collaboration, and also present experimental results showing how to achieve agent collaboration through dynamic team formation in Scenario 1, and through a partner selection strategy in Scenario 2, respectively.
Published tasks are accessible to all individual agents and agent teams within the system. The number of agents in the system can be dynamic; agents can enter and leave the system at will. However, agents need to publish (or remove) their registration information on the system Agent Board before they enter (or leave). The registration information records the skills and status of an agent (see Definition 4).
Agent abilities are limited. To perform tasks beyond its individual ability, an agent needs to collaborate with other agents through joining or forming a team.
Each agent team is composed of one (and only one) team leader and several team
members. After an agent joins an agent team, it can receive payments from the agent
team. At the same time it needs to work for the agent team for a certain period. The
payment and serving term are described in the contract (see Definition 5) between
the team member and the team leader.
Before presenting the team-formation mechanism, some important definitions
and assumptions are given.
Definition 7. A Contributor Set CS_ij (CS_ij ⊂ MS_i) of agent team AT_i is the set of agents that participate in performing task t_j, where t_j is a task of AT_i. For a one-shot team, the Contributor Set is equal to MS_i of the team (also refer to Definition 6).

Definition 8. For agent team AT_i, a Member Contribution mc_ijk is the contribution of agent a_k, where a_k ∈ CS_ij, in performing task t_j (t_j = ⟨w, R_j⟩). mc_ijk equals w/N, where N is the size of the Contributor Set and w is the task reward.
10. AT_j starts to perform t_i; the team leader and the team members of AT_j modify their statuses to (1, 1, 0) and (1, 2, 0), respectively;
11. AT_j accomplishes t_i; agents of AT_j modify their statuses to (0, 0, 0) and are released from the team.
In the long-term team-formation mechanism, the agent team is not dissolved after performing tasks. Instead, the team leader gives the team members some payment to maintain the cooperative relationship, even if a team member does not contribute to accomplishing the task.
The long-term team strategy normally includes the following processes [30]:
1. Team leader a_i finds several free agents, whose status values are (0, 0, 0), from the Agent Board and sends them contracts in order to form a team with them. Agents modify their status to (0, 1, t_ij) if they accept the contracts. In this case, agent team AT_i is formed successfully;
2. Team leader a_i searches the Task Board for a suitable task and bids on task t_k (t_k = ⟨w_k, R_k⟩), where R_k ⊆ TR_i and w_k ≥ ∑_{j | a_j ∈ MS_i} (S_ij + g_i) (also refer to Definitions 3 through 6);
3. If t_k is successfully bid by team leader a_i, a_i assigns t_k to team members a_p, a_q, ..., a_n, where R_p ∪ R_q ∪ ... ∪ R_n is the minimum set that satisfies R_k ⊆ R_p ∪ R_q ∪ ... ∪ R_n. At the same time, a_p, a_q, ..., a_n modify their statuses to (1, 1, t_ip), (1, 1, t_iq), ..., (1, 1, t_in). For this task performance, the Contributor Set CS_ik (refer to Definition 7) is {a_p, a_q, ..., a_n};
4. a_p, a_q, ..., a_n modify their statuses to (0, 1, t_ip), (0, 1, t_iq), ..., (0, 1, t_in) after t_k is accomplished;
5. Team leader a_i awards team member a_m (a_m ∈ AT_i) with (sc_im + sd_im) if a_m ∈ CS_ik, or sd_im if a_m is not in CS_ik.
In addition, if the team leader a_i or a team member a_p wants to terminate the contract before the contract completion time t_ip, they may follow these two steps:
1. a_i/a_p terminates c_ip with a_p/a_i, and pays p_ip to a_p/a_i;
2. a_p is released from AT_i, and its status is modified to (0, 0, 0).
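The status bookkeeping in the two procedures above can be sketched in code. This is an illustrative sketch only: the class names, the greedy cover used to pick contributors, and the simple tuple encoding of statuses are assumptions, since Definitions 3 through 6 are not reproduced in this excerpt.

```python
# Illustrative sketch of the long-term team strategy's status bookkeeping.
# Class and field names are hypothetical, not from the chapter.

FREE = (0, 0, 0)  # a free agent carries status (0, 0, 0)

class Agent:
    def __init__(self, name, resources):
        self.name = name
        self.resources = set(resources)
        self.status = FREE

class LongTermTeam:
    def __init__(self, leader):
        self.leader = leader
        self.members = []      # member set MS_i
        self.contracts = {}    # member -> contract term t_ip

    def sign(self, agent, term):
        """Step 1: a free agent accepts a contract and joins the team."""
        assert agent.status == FREE
        agent.status = (0, 1, term)        # idle but under contract
        self.members.append(agent)
        self.contracts[agent] = term

    def assign(self, task_resources):
        """Step 3: greedily pick members covering the task requirements
        (a sketch of the minimum-cover selection) and mark them busy."""
        needed, contributors = set(task_resources), []
        for a in self.members:
            if needed & a.resources:
                needed -= a.resources
                contributors.append(a)
            if not needed:
                break
        if needed:
            return []                      # requirements cannot be covered
        for a in contributors:             # the Contributor Set CS_ik
            a.status = (1, 1, self.contracts[a])
        return contributors

    def finish(self, contributors):
        """Step 4: contributors return to idle-under-contract."""
        for a in contributors:
            a.status = (0, 1, self.contracts[a])

    def terminate(self, agent, penalty):
        """Early termination: pay the penalty p_ip and free the agent."""
        self.members.remove(agent)
        del self.contracts[agent]
        agent.status = FREE
        return penalty
```

Here `terminate` models the two-step early-termination procedure: the penalty is paid and the agent returns to the free status (0, 0, 0).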
One-shot teams are suitable for dynamic MAS application domains. They always
maintain loosely-coupled relationships among agents by default. However, agents in
dynamic applications may also need to keep stable organisations in some situations.
For example, the tasks may have some similarity, and their requirements might be
similar (which means they may just need similar agent teams). In this case, frequent
grouping and regrouping is not necessary, since each such grouping consumes some
Collaborative Agents for Complex Problems Solving 371
system resources. In contrast with one-shot teams, long-term teams can greatly re-
duce the system overhead caused by grouping and regrouping. However, most cur-
rent long-term team formation strategies cannot figure out when agents should form
long-term teams, which agents should be included, and how long the relationships
should be maintained. For self-interested MAS applications, maintaining unnecessary long-term cooperative relationships can be harmful to overall system performance. Features of one-shot and long-term teams are summarised and compared in Table 2.
In general, agents that always contribute to performing tasks and can bring more
benefits to the team are the most valuable members of an agent team. These agents
should be kept on the team for a long time. By contrast, an agent team should not include agents that contribute little to the team. In this mechanism, two factors, namely Utilisation Ratio (ur) and Contribution Ratio (cr), are used to evaluate the value of a team member.
Definition 9. Utilisation Ratio ur_Mk (ur_Mk ∈ [0, 1]) is the frequency with which a team member a_k has participated in the most recent M tasks of the agent team AT_i. It can be calculated using Equation 1. The value of the parameter M is chosen by team leaders or assigned by users. Team leaders can also adjust M values according to environmental situations and team performance.

    ur_Mk = (1/M) · ∑_{j=1}^{M} 1(a_k ∈ CS_ij)        (1)

where 1(·) equals 1 when a_k belongs to the Contributor Set CS_ij and 0 otherwise.
Definition 10. Contribution Ratio cr_Mk (cr_Mk ∈ [0, 1]) is the proportion that team member a_k has contributed to the agent team AT_i over the most recent M tasks. It can be calculated using Equation 2 (also refer to Definition 8).
The following example shows how to evaluate team members through Utilisation Ratio and Contribution Ratio. Suppose t_1 = ⟨40, R_1⟩, t_2 = ⟨50, R_2⟩ and t_3 = ⟨60, R_3⟩ are the three most recent tasks accomplished by agent team AT_i, and a_p, a_q, a_r and a_s are the team members of AT_i. The team members that participate in the three tasks are {a_p, a_q}, {a_p, a_r} and {a_p, a_q}, respectively. According to Equations 1 and 2, the Utilisation Ratio and Contribution Ratio values of a_p, a_q, a_r and a_s are:

    a_p: ur_3p = 1,    cr_3p = (40/2 + 50/2 + 60/2) / (40 + 50 + 60) = 0.5
    a_q: ur_3q = 0.67, cr_3q = (40/2 + 60/2) / (40 + 50 + 60) = 0.33
    a_r: ur_3r = 0.33, cr_3r = (50/2) / (40 + 50 + 60) = 0.17
    a_s: ur_3s = 0,    cr_3s = 0
Comparing Utilisation Ratio and Contribution Ratio values of the four team
members of ATi , it can be seen that a p is the most important member of ATi , since it
frequently participated in recent tasks and contributed the most benefit to the team.
On the other hand, a_s did not participate in recent tasks and contributed nothing to AT_i.
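The two ratios are easy to reproduce numerically. The sketch below assumes, per Definition 8, that each task reward w is split evenly over its Contributor Set of size N; the function names are illustrative.

```python
# Utilisation Ratio (Eq. 1) and Contribution Ratio over the M most
# recent tasks; each task is a (reward, contributor_set) pair.

def utilisation_ratio(agent, tasks):
    """Fraction of the M tasks the agent participated in (Eq. 1)."""
    M = len(tasks)
    return sum(1 for _, cs in tasks if agent in cs) / M

def contribution_ratio(agent, tasks):
    """Agent's share of the total reward, with each reward w split
    evenly over its Contributor Set of size N (Definition 8)."""
    total = sum(w for w, _ in tasks)
    earned = sum(w / len(cs) for w, cs in tasks if agent in cs)
    return earned / total

# The chapter's worked example: three tasks of team AT_i.
tasks = [(40, {"a_p", "a_q"}), (50, {"a_p", "a_r"}), (60, {"a_p", "a_q"})]
print(utilisation_ratio("a_p", tasks))               # 1.0
print(round(contribution_ratio("a_p", tasks), 2))    # 0.5
```

Running this reproduces the values in the worked example above (a_p participated in all three tasks and earned half of the total reward).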
With Utilisation Ratio and Contribution Ratio, a team leader can evaluate the contributions of team members. However, to make reasonable contracts with team members, a team leader also needs to evaluate how easy it is to find similar agents (which possess similar resources and skills) in the MAS. In this mechanism, Agent Resource Availability is used for this purpose.
Definition 11. Agent Resource Availability ara_k: ara_k is the ratio of available agents (which do not have a team/task) that possess the same or more resources than team member a_k. It can be calculated using Equation 3, where N_av is the number of available agents in the MAS:

    ara_k = ∑_{i : s_i = (0,0,0), R_k ⊆ R_i} (1 / N_av)        (3)
For example, suppose that a_k is a team member of AT_i, and that ten out of the twenty currently available agents in the MAS possess the same or more resources than a_k. Hence, the Agent Resource Availability value of team member a_k is ara_k = 10/20 = 0.5.
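Definition 11 can likewise be checked with a small sketch; representing agents as (status, resources) pairs is an assumption for illustration.

```python
# Agent Resource Availability (Definition 11, Eq. 3): the fraction of
# currently available agents (status (0, 0, 0)) whose resource set
# contains that of team member a_k.

def resource_availability(member_resources, agents):
    """agents: iterable of (status, resources) pairs."""
    available = [r for s, r in agents if s == (0, 0, 0)]
    if not available:
        return 0.0
    n_av = len(available)    # N_av: number of available agents
    matches = sum(1 for r in available if set(member_resources) <= set(r))
    return matches / n_av
```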
According to the values of Utilisation Ratio, Contribution Ratio and Agent Resource Availability, team leaders in this mechanism use a fuzzy method to determine cooperation durations and costs with their team members.
Input and Output Parameters:
In the fuzzy method, Utilisation Ratio, Contribution Ratio and Agent Resource
Availability are input parameters. The output parameters are Contract Term ct and
Commission Amount ca. These parameters are defined in Definitions 12 and 13.
Definition 12. Contract Term ct_k denotes the duration for which agent a_k should be kept in the agent team. It is an output parameter that is determined through the fuzzy method. The working range of Contract Term is [0, MAXTERM], where MAXTERM is a constant defined in the MAS that denotes the maximum time period for which an agent can be kept in an agent team.
Definition 13. Commission Amount ca_k denotes the maximum commission that the agent team should pay to agent a_k in order to keep it in the team. It is an output parameter that is determined through the fuzzy method. The working range of Commission Amount is [0, MAXPAY], where MAXPAY, decided by the team leader, denotes the maximum payment that an agent team can afford in order to keep a single agent as a team member.
For ur and cr, four linguistic states are selected for each, corresponding to four fuzzy sets: Never (N), Seldom (S), Medium (M) and Frequent (F) for ur, and None (N), Little (L), Medium (M) and Huge (H) for cr. The membership functions for these four fuzzy sets are defined in Equations 4 through 7, respectively. They are also depicted in Figure 2.
    F_Never(x) / F_None(x) =
        1 − 5x,    x ∈ [0, 0.2]
        0,         otherwise        (4)

    F_Seldom(x) / F_Little(x) =
        10x − 1,   x ∈ [0.1, 0.2]
        1,         x ∈ (0.2, 0.3)
        4 − 10x,   x ∈ [0.3, 0.4]
        0,         otherwise        (5)

    F_Medium(x) =
        10x − 3,   x ∈ [0.3, 0.4]
        1,         x ∈ (0.4, 0.6)
        7 − 10x,   x ∈ [0.6, 0.7]
        0,         otherwise        (6)

    F_Frequent(x) / F_Huge(x) =
        10x − 6,   x ∈ [0.6, 0.7]
        1,         x ∈ (0.7, 1]
        0,         otherwise        (7)
For ara, three linguistic states are selected, namely Rare (R), Some (S), and Many
(M). The membership functions for ara are defined in Equations 8 through 10, and
depicted in Figure 3.
    F_Rare(x) =
        1 − 4x,    x ∈ [0, 0.4]
        0,         otherwise        (8)

    F_Some(x) =
        5x − 1,    x ∈ [0.2, 0.4]
        3 − 5x,    x ∈ (0.4, 0.6]
        0,         otherwise        (9)

    F_Many(x) =
        5x − 2,    x ∈ [0.4, 0.6]
        1,         x ∈ (0.6, 1]
        0,         otherwise        (10)
With regard to output membership, the output values can be determined by tracing
the membership values for each rule back through the output membership functions.
Finally, the centroid defuzzification method [8] [17] is used to determine the output
value. In centroid defuzzification, the output value is calculated using Equation 16,
where membership of vi is represented as μ (vi ), and k is the number of fuzzy rules
which are activated.
    DF = ( ∑_{i=1}^{k} v_i · μ(v_i) ) / ( ∑_{i=1}^{k} μ(v_i) )        (16)
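Since the rule base itself is not reproduced in this excerpt, the following sketch covers only the pieces defined above: the trapezoidal membership functions of Equations 4 through 7 and the centroid defuzzification of Equation 16. Function names are illustrative.

```python
# Membership functions transcribed from Eqs. 4-7 and the centroid
# defuzzification of Eq. 16.

def f_never(x):                      # Eq. 4 (Never / None)
    return 1 - 5 * x if 0 <= x <= 0.2 else 0.0

def f_seldom(x):                     # Eq. 5 (Seldom / Little)
    if 0.1 <= x <= 0.2:
        return 10 * x - 1
    if 0.2 < x < 0.3:
        return 1.0
    if 0.3 <= x <= 0.4:
        return 4 - 10 * x
    return 0.0

def f_medium(x):                     # Eq. 6
    if 0.3 <= x <= 0.4:
        return 10 * x - 3
    if 0.4 < x < 0.6:
        return 1.0
    if 0.6 <= x <= 0.7:
        return 7 - 10 * x
    return 0.0

def f_frequent(x):                   # Eq. 7 (Frequent / Huge)
    if 0.6 <= x <= 0.7:
        return 10 * x - 6
    if 0.7 < x <= 1.0:
        return 1.0
    return 0.0

def centroid(pairs):
    """Eq. 16: DF = sum(v_i * mu(v_i)) / sum(mu(v_i)) over the k
    activated rules; `pairs` is a list of (v_i, mu_i)."""
    num = sum(v * mu for v, mu in pairs)
    den = sum(mu for _, mu in pairs)
    return num / den if den else 0.0
```

For instance, two activated rules with output values 2 and 6 at equal strength 0.5 defuzzify to their midpoint, 4.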
3.4 Experiments
Agent Searching Time represents the time that a team leader needs to search the Agent Board for the agents required to accomplish the tasks. In general, the higher the Agent Searching Time, the more communication cost the team leader incurs in searching for agents.
According to the experimental results, the Agent Searching Time of one-shot team formation is much higher than that of both long-term and flexible team formation (see Figure 5). This is because team leaders in one-shot teams need to keep searching for suitable team members for each task and disband them after the task is accomplished. With long-term and flexible team formation, the whole team (or part thereof) is retained after a task is completed, so these two strategies have less communication overhead. The experimental results also show that long-term teams have higher Agent Searching Time than flexible teams. This is because, in the experiment, a long-term team can keep at most a limited number of members for a long period. Hence, after a team accomplishes several tasks, the number of long-term members reaches the limit, and the team must recruit and then release additional members for subsequent tasks. The results show that the Agent Searching Time of flexible team formation is the lowest, meaning it has the lowest communication overhead among the three mechanisms.
Reward Distribution Situation is the second comparison factor. It represents the rationality of agent team organisation. Without considering communication overhead, a one-shot team has an ideal organisational structure, because all of its team members contribute to each task (for a one-shot team, the Contributor Set equals the member set; see Definition 7).
3.5 Summary
As a social entity, an intelligent agent needs to cooperate with others in most multi-
agent environments. At the same time, unreasonable team-formation mechanisms
could prevent agents from achieving local benefits, or lead to unnecessary system
overhead. Focusing on challenges inherent in dynamic application domains, many
researchers have suggested using long-term or one-shot team-formation mecha-
nisms in MASs. However, both of these mechanisms have some advantages and
disadvantages, as discussed earlier. A flexible team-formation mechanism can avoid
some of the limitations of the one-shot and long-term team-formation mechanisms.
It can enable agents to automatically evaluate the performance of other agents in
the system, and select team members with reasonable terms and costs according
to the evaluation result. In flexible team formation, factors related to agent performance and task requirements are used for evaluation. By evaluating these factors, team compositions become more reasonable and some potential benefit conflicts among team members can be avoided.
stability. For different negotiation protocols, the equilibrium strategy may differ.
However, it is required that each negotiation participant should select an equilib-
rium strategy in the negotiation.
In this subsection, we provide an example of negotiation between two agents: a 'buyer' agent and a 'seller' agent. Both agents bargain over the price, so this is a single-issue negotiation. In the following, we present the four components of our example negotiation and describe how the negotiation proceeds.
The negotiation protocol. We simply adopt the basic alternating offers protocol
[28]. Let b denote the buyer agent, and s the seller agent. The negotiation starts
when the first offer is made by an agent (b or s). The agent who makes the initial
offer is selected randomly at the beginning of the negotiation. When an agent receives an offer from its opponent, it evaluates the offer. According to this evaluation, the agent takes one of the following actions: (i) Accept: when the value
of the offer received from the opponent is equal to or greater than the value of
the counter-offer it is going to send in the next negotiation cycle. Once the agent
accepts this offer, the negotiation ends successfully in an agreement; (ii) Reject:
when the value of the offer received from the opponent is less than the value of
the counter-offer it is going to send in the next negotiation cycle. Once the agent
rejects this offer, providing the negotiation deadline has not been reached, the
agent sends out a counter-offer to its opponent and the negotiation proceeds to
the next cycle; (iii) Quit: when the negotiation deadline falls due and no agree-
ment has been reached, then the agent has to quit and the negotiation fails.
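The Accept/Reject/Quit loop can be sketched as follows. The `next_offer` and `value` callables stand in for the strategies and utility functions introduced later; all names are illustrative.

```python
# Sketch of the basic alternating-offers protocol described above.
# next_offer(agent, t) produces the agent's offer for cycle t, and
# value(agent, offer) scores an offer from that agent's viewpoint;
# both are supplied by the caller (an illustrative assumption).

def negotiate(first, second, next_offer, value, deadline):
    proposer, responder = first, second
    offer = next_offer(proposer, 0)          # initial offer
    for t in range(1, deadline + 1):
        counter = next_offer(responder, t)   # responder's would-be counter-offer
        # Accept: the received offer is worth at least the next counter-offer
        if value(responder, offer) >= value(responder, counter):
            return ("agreement", offer, t)
        # Reject: roles swap and the counter-offer is sent
        proposer, responder = responder, proposer
        offer = counter
    # Quit: deadline reached without agreement
    return ("failure", None, deadline)
```

With a buyer whose offers rise linearly and a seller whose offers fall linearly, the loop ends in an agreement near the crossing point of the two concession curves.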
The negotiation strategies. In our example, the two agents bargain over price, therefore each agent should have some idea about which prices are acceptable. Let [IP_a, RP_a] denote the range of price values acceptable to agent a, where a ∈ {b, s}; IP_a denotes the initial price and RP_a the reserve price of agent a. In general, when a = b, IP_b ≤ RP_b, and when a = s, IP_s ≥ RP_s. Let â denote agent a's opponent, where â ∈ {b, s}. Then the offer made by agent a to agent â at time t (0 ≤ t ≤ τ^a), where τ^a is the deadline for agent a, is modeled as a function Φ^a depending on time as follows:
    p^t_{a→â} =
        IP_a + Φ^a(t) · (RP_a − IP_a),         a = b
        RP_a + (1 − Φ^a(t)) · (IP_a − RP_a),   a = s        (17)
Three typical concession behaviours of agent b are depicted in Figure 7 [9]: (i) Conceder: when λ > 1, agent b gives more concession at the beginning of the negotiation and less concession closer to the deadline; (ii) Linear: when λ = 1, agent b gives constant concession throughout the negotiation; and (iii) Boulware: when 0 < λ < 1, agent b gives less concession initially and more concession when the deadline is looming.
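The chapter's exact Φ^a(t) is not reproduced in this excerpt; a common choice consistent with the conceder/linear/boulware behaviour just described is the polynomial tactic Φ^a(t) = (t/τ^a)^(1/λ), which the sketch below assumes.

```python
# Offer generation per Eq. 17, assuming the common polynomial tactic
# phi(t) = (t / tau) ** (1 / lam). The tactic itself is an assumption,
# since the chapter's Phi^a is not reproduced in this excerpt.

def phi(t, tau, lam):
    return (t / tau) ** (1 / lam)

def offer(agent, t, ip, rp, tau, lam):
    """Eq. 17: the buyer climbs from IP_b toward RP_b, while the
    seller descends from IP_s toward RP_s."""
    if agent == "b":
        return ip + phi(t, tau, lam) * (rp - ip)
    return rp + (1 - phi(t, tau, lam)) * (ip - rp)
```

At λ = 1 the offer moves linearly from IP_a to RP_a; λ > 1 front-loads the concession (conceder), while 0 < λ < 1 delays it (boulware).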
Finally, agent utility functions at time t are defined as per Equation 19.
    U^a(p^t_{a→â}) =
        RP_a − p^t_{a→â},   a = b
        p^t_{a→â} − RP_a,   a = s        (19)
U a (t) is the agent a’s evaluation result of its opponent’s offer at negotiation cycle
t; based on this evaluation result, agent a can make a decision about its action.
where t is the time of the next negotiation cycle. Therefore, the equilibrium
strategy employed in this negotiation indicates that the agent will only accept an offer that maximizes its own benefit given the time constraint.
[Figure: agent attitude as a function of Reliant Degree. The vertical axis runs from 0% (Selfishness) to 100% (Selflessness), and the horizontal axis shows Reliant Degree from 0% to 100%; the range is partitioned into selfish, equitable and selfless regions.]
where ID_i is the unique identification of the i-th potential partner, and GainRatio_xi, ContributionRatio_xi and ReliantDegree_xi are factors used to evaluate whether the potential partner ID_i should be selected in the negotiation. These three factors are defined in Definitions 14 through 16, respectively.
Definition 14. GainRatio_xi is the percentage of the global benefit that agent ID_x obtains upon completion of the task. GainRatio_xi can be calculated as GainRatio_xi = (S/L) × 100%, GainRatio_xi ∈ [0, 100%], where S denotes the benefit that agent ID_x gains by selecting agent ID_i as its partner, and L denotes the global benefit of completing the task.
Definition 15. ContributionRatio_xi is the percentage of the global benefit that agent ID_i obtains upon completion of the task. ContributionRatio_xi can be calculated as ContributionRatio_xi = (I/L) × 100%, ContributionRatio_xi ∈ [0, 100%], where I denotes the benefit that partner agent ID_i gains by cooperating with agent ID_x, and L denotes the global benefit of completing the task.
where CollaborateDegree_xi ∈ [0, 1]. This indicates the tendency for agent ID_i to be selected as a partner in subsequent negotiations by agent ID_x: the bigger the CollaborateDegree_xi, the higher the likelihood that agent ID_i will be selected. The function Ψ specifies how to evaluate a potential partner. The interested reader can refer to [31] for a (non-linear) fuzzy approach to Ψ; in this chapter, we only consider a linear approach to Ψ.
In order to cover all potential cases in partner selection, we need to consider not only GainRatio and ContributionRatio, but also the agent's preference between these two criteria. We propose that the agent's preference regarding GainRatio and ContributionRatio be represented by normalized weights. Let w_g stand for the weight on GainRatio and w_c for the weight on ContributionRatio, with w_c + w_g = 1. Then the CollaborateDegree between agent ID_x and its potential partner ID_i is defined as follows:
The CollaborateDegree (∈ [0, 1]) indicates the degree to which the potential partner should be selected by the agent: the bigger the CollaborateDegree, the greater the chance that the potential partner will be selected. In general, there are three extreme cases for the different combinations of w_c and w_g, namely:
• When w_g = 0 and w_c = 1, CollaborateDegree is calculated based only on ContributionRatio, i.e. agent ID_x's attitude to negotiation is fully selfless.
• When w_g = 1 and w_c = 0, CollaborateDegree is calculated based only on GainRatio, i.e. agent ID_x's attitude to negotiation is fully selfish.
• When w_g = w_c = 0.5, CollaborateDegree is calculated based equally on GainRatio and ContributionRatio, i.e. agent ID_x's attitude to negotiation is equitable.
• Besides the above three cases, the restriction w_g + w_c = 1 can also reflect agent ID_x's attitude to GainRatio and ContributionRatio in intermediate cases.
The weights w_g and w_c can be calculated from the value of ReliantDegree, as defined in Equations 24 and 25, respectively. Finally, by combining Equations 23 through 25, the potential partners are evaluated by considering the factors of GainRatio, ContributionRatio and ReliantDegree. The CollaborateDegree between agent ID_x and its potential partner ID_i is:
where CollaborateDegree_xi ∈ [0, 1]. Then the set of collaboration degrees (CollaborateDegree_x) between agent ID_x and all its potential partners is generated according to Equation 27. Finally, any sorting algorithm can be employed to select favorable partners, or to exclude unsuitable partners, from the collaboration degree set CollaborateDegree_x for agent ID_x.
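Equations 23 through 27 are not reproduced in this excerpt. The sketch below therefore assumes the linear Ψ discussed above, CollaborateDegree = w_g · GainRatio + w_c · ContributionRatio, with w_g = cos²θ and w_c = sin²θ as in Examples 1 and 2; mapping ReliantDegree linearly onto θ ∈ [0°, 90°] is an additional assumption.

```python
import math

# Linear partner evaluation (an assumed sketch of Eqs. 23-27):
# weights derive from ReliantDegree via w_g = cos^2(theta),
# w_c = sin^2(theta), with theta mapping ReliantDegree in [0, 1]
# onto [0, 90] degrees (fully selfish at 0, fully selfless at 1).

def collaborate_degree(gain_ratio, contribution_ratio, reliant_degree):
    theta = math.radians(90 * reliant_degree)
    wg, wc = math.cos(theta) ** 2, math.sin(theta) ** 2
    return wg * gain_ratio + wc * contribution_ratio

def select_partner(candidates):
    """candidates: {id: (GainRatio, ContributionRatio, ReliantDegree)};
    returns ids sorted by descending CollaborateDegree."""
    return sorted(candidates,
                  key=lambda i: collaborate_degree(*candidates[i]),
                  reverse=True)
```

With ReliantDegree = 0 the ranking depends only on GainRatio (the fully selfish case), and with ReliantDegree = 1 only on ContributionRatio (the fully selfless case), matching the extreme cases above.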
In this chapter, three examples are demonstrated. In each example, agent g is
going to select the most suitable partner from three potential partners (agents ga , gb
and gc ). These examples will illustrate how the proposed approach selects the most
suitable partner for the agent.
Table 4 Example 1
In Example 1 (Table 4), since agent ID_x behaves as a fully selfish agent (w_g = cos²(0°) = 1 and w_c = sin²(0°) = 0), the potential partner who can offer the biggest GainRatio will be selected as the most suitable partner. From Table 4, agent g_a should be selected because it offers agent ID_x the highest GainRatio among the three potential partners. Using our proposed Equation 26, agent g_a is likewise chosen as the most suitable partner, because its CollaborateDegree is the largest among the three potential partners.
In Example 2 (Table 5), since agent ID_x behaves as a fully selfless agent (w_g = cos²(90°) = 0 and w_c = sin²(90°) = 1), agent g_c should be selected as the most suitable partner because it has the largest ContributionRatio. According to Equation 26, agent g_c is likewise selected, its CollaborateDegree being the largest among the three potential partners.
Table 5 Example 2
Table 6 Example 3
In Example 3 (Table 6), agent ID_x has different attitudes towards its potential partners. Towards potential partner g_a, agent ID_x behaves as a selfish agent, so only the GainRatio (80%) is used in the selection. Towards potential partner g_b, agent ID_x behaves as an equitable agent, so both GainRatio (80%) and ContributionRatio (20%) are used to evaluate whether g_b could be chosen; the final evaluation for g_b therefore lies between 20% and 80%. Towards potential partner g_c, agent ID_x behaves as a selfless agent, so only the ContributionRatio (20%) is used. Comparing the three cases, agent g_a should be selected as the most suitable partner because agent ID_x would gain the largest benefit (80%) by collaborating with g_a. According to Equation 26, agent g_a is likewise selected, its CollaborateDegree being the largest among the three potential partners.
Therefore, from the examples, it can be seen that by considering the factors of GainRatio, ContributionRatio and ReliantDegree between the agent and its potential partners, a partner selection mechanism can be generated dynamically that adapts to agents' individual attitudes in negotiation. The selection results are also accurate and reasonable.
required to be well trained using the available training data. Therefore, in a way, the
performance of the estimation function is virtually determined by the training result.
In this step, as much data as possible is employed by designers to train a system.
The training data could be synthetic and/or collected from the real world. Usually,
synthetic data is helpful in training a function to enhance its problem solving ability
for some particular issues, while real world data can help the function to improve its
ability in complex problem solving. After the system has been trained, the second
step is to employ the estimation function to predict partner behavior in the future.
However, no matter what data, and how much of it, designers employ to train the function, the training data will never be comprehensive enough to cover all situations in reality.
cover all situations in reality. Therefore, even though an estimation function is well
trained, it is also quite possible that some estimation results do not make sense at
all for some kind of agents whose behavior records are not included in the train-
ing data. Currently, as negotiation environments become more open and dynamic,
agents with different kinds of purpose, preference and negotiation strategy can enter
and leave the negotiation dynamically. This Machine Learning-based agent behavior
estimation function may not work well in such flexible application domains, because of (i) the lack of sufficient data to train the system, and (ii) the heavy resource requirements of each training process.
In order to address the aforementioned issues, in this subsection we introduce
a quadratic regression approach for analysis and estimation of partner behaviors
during negotiation. The proposed quadratic regression function is:
    u = a × t² + b × t + c        (28)

    u = a × x + b × y + c        (30)

where x = t² and y = t, so that the quadratic model becomes linear in its coefficients.
where both a and b are independent of the variables x and y. Let the pairs (t_0, û_0), ..., (t_n, û_n) be instances from the negotiation cycles. The distance ε between the real utility value û_i and the expected value u_i is assumed to obey the Gaussian distribution ε ∼ N(0, σ²), where ε = û_i − a × x_i − b × y_i − c. Since each û_i = a × x_i + b × y_i + c + ε_i with ε_i ∼ N(0, σ²), the û_i are mutually independent, and their joint probability density function is:
    L = ∏_{i=1}^{n} (1 / (σ√(2π))) · exp[ −(1 / (2σ²)) · (û_i − a·x_i − b·y_i − c)² ]
      = (1 / (σ√(2π)))^n · exp[ −(1 / (2σ²)) · ∑_{i=1}^{n} (û_i − a·x_i − b·y_i − c)² ]        (31)
where L indicates the probability that the observed û_i occur jointly. Because each û_i comes from the historical record, the estimated parameters should maximize L. Obviously, L achieves its maximum when ∑_{i=1}^{n} (û_i − a·x_i − b·y_i − c)² achieves its minimum value. Let
    Q(a, b, c) = ∑_{i=1}^{n} (û_i − a·x_i − b·y_i − c)²        (32)
    d_i = û_i − u_i        (40)

It is known that all d_i (i ∈ [1, n]) obey the Gaussian distribution N(0, σ²). Then σ can be calculated as follows:

    σ = √( (1/n) · ∑_{i=1}^{n} (d_i − d̄)² )        (41)

where

    d̄ = (1/n) · ∑_{i=1}^{n} d_i        (42)
Now by employing the Chebyshev inequality, we can calculate (1) the interval of
partners’ behaviors according to any accuracy requirements; and (2) the probability
that any particular behavior may occur in potential partners in the future.
    P(|û_i − u_i| ≥ ε) ≤ σ² / ε²        (43)
where û_i is an instance, u_i is the mathematical expectation, σ is the deviation and ε is the accuracy requirement. This inequality indicates that, for a particular accuracy requirement ε, the probability that the real behavior û_i lies in the interval [u_i − ε, u_i + ε] is greater than 1 − σ²/ε².
    AE_i = (1/i) · ∑_{k=1}^{i} |û_k − u_k|        (44)
The AEi indicates the difference between the estimated results and the real value.
The smaller the value of AEi , the better the prediction result.
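The estimation pipeline of Equations 28 through 44 can be sketched end to end: fit u = a·t² + b·t + c by minimising Q(a, b, c), then compute σ and the average error AE. The intermediate normal-equation steps (Equations 33 through 39) are not reproduced in this excerpt, so the sketch solves the minimisation directly; all function names are illustrative.

```python
# Least-squares fit of u = a*t^2 + b*t + c (Eq. 28) by minimising
# Q(a, b, c) of Eq. 32, plus sigma (Eqs. 41-42) and AE (Eq. 44).
# Solved via the 3x3 normal equations; no external libraries needed.

def fit_quadratic(ts, us):
    """Return (a, b, c) minimising Q(a, b, c)."""
    n = len(ts)
    # Design-matrix columns: x = t^2, y = t, 1 (Eq. 30)
    cols = [[t * t for t in ts], list(ts), [1.0] * n]
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(3)]
         for i in range(3)]
    rhs = [sum(ci * u for ci, u in zip(cols[i], us)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        rhs[i], rhs[p] = rhs[p], rhs[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            A[r] = [arj - f * aij for arj, aij in zip(A[r], A[i])]
            rhs[r] -= f * rhs[i]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):    # back-substitution
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j]
                                for j in range(i + 1, 3))) / A[i][i]
    return coef

def sigma(ds):
    """Eq. 41, with the mean d-bar of Eq. 42."""
    dbar = sum(ds) / len(ds)
    return (sum((d - dbar) ** 2 for d in ds) / len(ds)) ** 0.5

def average_error(real, est):
    """Eq. 44: mean absolute difference over the first i cycles."""
    return sum(abs(r - e) for r, e in zip(real, est)) / len(real)
```

Fitting the function to the offers observed so far gives the predicted price for the next cycle, while σ and AE quantify the spread and accuracy of the prediction, as in the scenarios below.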
In the first scenario, a buyer wants to purchase a mouse pad from a seller. The
acceptable price for the buyer is in [$0, $1.4]. The deadline for the buyer to finish
this purchasing process is 11 cycles. In this experiment, the buyer adopts conceder
behavior in the negotiation, and the seller employs the proposed approach to esti-
mate the buyer’s possible price in the next negotiation cycle. The estimated results
are displayed in Figure 9(a) and the regression function is:
It can be seen that in the 8th negotiation cycle, the proposed approach estimates a price of $1.26 from the buyer in the next cycle; according to the historical record, the real price given by the buyer in that cycle is $1.26, exactly the same as the estimated price. Furthermore, in cycles 4, 6, 9 and 10, the estimated prices are also the same as the real values. The estimated prices for the 2nd, 3rd and 7th cycles are $1.05, $1.10 and $1.25, respectively, while the real prices given by the buyer in these cycles are $1.07, $1.13 and $1.26, so the estimated and real prices differ only slightly. According to Figure 9(a), all real prices are located in the interval [μ − 2σ, μ + 2σ], where μ is the estimated price and σ is the deviation. AE_10 = 0.015, which is only about 1% of the buyer's reserve price. Therefore, the prediction results obtained with the proposed approach are very reliable.
Fig. 9 Scenario 1
In Figure 9(b), we compare the proposed approach with two other estimation approaches (Tit-For-Tat and random). Even though the Tit-For-Tat approach can follow the trend of changes in the buyer's price, its AE_10 = 0.078 is about five times that of our proposed approach. The random approach cannot even capture the main trend; its AE_10 is 0.11, roughly seven times that of our proposed approach. These experimental results indicate that the proposed approach outperforms both the Tit-For-Tat and random approaches when a buyer adopts conceder negotiation behavior.
In the second scenario, a buyer wants to buy a keyboard from a seller. The desired
price for the buyer is in the interval of [$0, $14]. We let the buyer employ the linear
negotiation strategy, and still set the deadline to 11 cycles. The seller will employ
Fig. 10 Scenario 2
our proposed prediction function to estimate the buyer’s offer. The estimated results
are illustrated in Figure 10(a) and the estimated quadratic regression function is:
It can be seen that in the 3rd, 5th and 8th cycles, the estimated prices are exactly
the same as the real offers made by the buyer. The biggest difference between the
estimated price and the real value is just 0.4, which happens in the 9th cycle. The
average error in this experiment is only AE10 = 0.24, which is no more the 2% of
the buyer’s reserve price. The estimated quadratic regression function fits the real
prices very well.
Fig. 11 Scenario 3
Figure 10(b) compares results for the Tit-For-Tat approach, random approach and
our proposed approach. It can be seen that the proposed approach is much closer to
the real price than the other two approaches. The average error for the Tit-For-Tat
approach is AE10 = 2.52, which is 18% of the buyer’s reserve price. The average
error for the random approach is very high – AE10 = 4.82 (34% of the buyer’s re-
serve price). The second experiment thus shows that when partners exhibit linear concession behavior, the proposed approach again outperforms the other two approaches.
In the third scenario, a buyer wants to purchase a monitor from a seller. The
suitable price for the buyer is in [$0, $250]. In this experiment, the buyer employs a
boulware strategy in the negotiation. The deadline is still 11 cycles. The estimated
quadratic function is:
The estimated results are shown in Figure 11(a), it can be seen that the proposed
quadratic regression approach predicted buyer’s prices successfully and accurately.
Except for the 4th and 8th cycles, other estimated prices differ very little from the
buyer’s real offers. The average error in this experiment is AE10 = 4.07, which is
only 1.6% of the buyer's reserve price. Therefore, we can say with confidence, from these estimation results, that the seller can make an accurate judgement about the buyer's negotiation strategy and make reasonable responses in order to maximize its own benefit.
Finally, Figure 11(b) shows comparison results with two other estimation func-
tions for the same scenario. For the Tit-For-Tat approach, the average error is
AE10 = 57.74, which is 23% of the buyer’s reserve price. For the random approach,
the average error is AE10 = 83.12, which is 33% of the buyer's reserve price. Therefore, it can be seen that when the agent exhibits a Boulware behavior, the proposed
approach significantly outperforms the other two approaches.
From these experimental results, we can conclude that the quadratic regression estimation approach can successfully estimate partners' potential behaviors. Moreover, the estimation results are accurate and sufficiently reasonable to be
adopted by agents to modify their strategies in negotiation. The comparison results
among the three types of agent behavior estimation also demonstrate the outstanding
performance of our proposed approach.
In this section, we introduced agent negotiation for solving complex problems
between collaborative agents. Firstly, we pointed out that agent competition can
also be involved in collaborative problems. Then we introduced some basic knowledge about agent negotiation for conflict resolution. Furthermore, we introduced a partner selection approach and an agent behavior prediction approach for complex negotiation environments, and illustrated experimental results showing the improvements. In conclusion, agent negotiation is a significant mechanism for agents to resolve conflicts that may occur during complex problem-solving procedures.
5 Conclusion
Complex problem solving requires diverse expertise and multiple techniques. MAS
is a particularly applicable technology for complex problem solving applications.
In a MAS, agents that possess different expertise and resources collaborate to handle problems that are too complex for individual agents. Generally, agent
collaborations in a MAS can be classified into two groups, namely agent coop-
eration and agent competition. These two kinds of collaborations are unavoidable
for most MAS applications, but both present challenges. In addition, two main approaches for complex problem solving via agent cooperation and agent competition have been introduced: a dynamic team formation mechanism for cooperative agents, and a partner selection strategy for competitive agents. These two
approaches can be applied to coordinate utility conflicts among agents, and make a
MAS more suitable for open dynamic working environments.
Research into dynamic team formation can be extended in the following two
directions. Currently, team formation research is based on a simple agent organi-
sation. However, in many current MAS applications, more complex organisational
structures, such as congregation [2], are adopted. Building a mechanism to support
complex organisational formation is one research direction for the future. Furthermore, different organisational structures are suitable for different circumstances. In a complex dynamic working environment, agents may need to choose different organisational structures as the environment changes. To develop mechanisms that
enable agents to not only select cooperation partners but also dynamically choose
organisational structures is another avenue for future research.
Further work on agent negotiation can proceed in two directions. (i) Currently, most agent negotiation strategies and protocols can only handle negotiation over a single issue. However, as application domains expand, negotiating multiple issues will become a significant trend, so research on multi-issue negotiation is one future direction. (ii) Most current negotiation environments focus mainly on static situations, and fail to take into account negotiation environments that are open and dynamic. In an open and dynamic environment, agents can behave more flexibly to enhance their benefits, and such an environment is much better suited to handling real-world applications. Therefore, moving the negotiation environment from static to open and dynamic is another significant research direction on the topic of agent negotiation.
Another potential direction is to extend our current research to complex domains
in which agents can show semi-competitive behaviours or temporary collaborative
behaviours in different situations.
Acknowledgements. The authors would like to acknowledge the support of the Intelligent
Systems Research Centre at the University of Wollongong.
References
1. Artikis, A., Pitt, J.: A Formal Model of Open Agent Societies. In: Proceedings of the
5th International Conference on Autonomous Agents, Montreal, Canada, pp. 192–193
(2001)
2. Brooks, C., Durfee, E., Armstrong, A.: An Introduction to Congregating in Multia-
gent Systems. In: Proceedings of 4th International Conference on Multiagent Systems,
Boston, USA, pp. 79–86 (2000)
3. Brzostowski, J., Kowalczyk, R.: On Possibilistic Case-Based Reasoning for Selecting
Partners in Multi-agent Negotiation. In: Webb, G.I., Yu, X. (eds.) AI 2004. LNCS,
vol. 3339, pp. 694–705. Springer, Heidelberg (2004)
4. Candea, C., Hu, H., Iocchi, L., Nardi, D., Piaggio, M.: Coordination in Multi-agent
RoboCup Teams. Robotic and Autonomous Systems 36(2), 67–86 (2001)
5. Castelpietra, C., Iocchi, L., Nardi, D., Piaggio, M., Scalzo, A., Sgorbissa, A.: Coordina-
tion among Heterogeneous Robotic Soccer Players. In: Proceedings of IEEE/RSJ Inter-
national Conference on Intelligent Robots and Systems (IROS), Takamatsu, Japan, pp.
1385–1390 (2000)
6. Chajewska, U., Koller, D., Ormoneit, D.: Learning An Agent’s Utility Function by Ob-
serving Behavior. In: Proceedings of 18th International Conference on Machine Learn-
ing, pp. 35–42. Morgan Kaufmann, San Francisco (2001)
7. Coehoorn, R., Jennings, N.: Learning on Opponent’s Preferences to Make Effective
Multi-issue Negotiation Trade-offs. In: Proceedings of the 6th International Conference
on Electronic Commerce, ICEC 2004, Delft, Netherlands, October 2004, pp. 59–68.
ACM Press, New York (2004)
8. Eberhart, R., Simpson, P., Dobbin, R.: Computational Intelligence PC Tools. AP Profes-
sional Press, Orlando (1996)
9. Faratin, P., Sierra, C., Jennings, N.: Negotiation Decision Functions for Autonomous
Agents. Journal of Robotics and Autonomous Systems 24(3-4), 159–182 (1998)
10. Fatima, S., Wooldridge, M., Jennings, N.: Optimal Agendas for Multi-issue Negotiation.
In: Proceedings of the 2nd International Joint Conference on Autonomous Agents and
Multi-Agent Systems (AAMAS 2003), pp. 129–136. ACM Press, New York (2003)
11. Fatima, S., Wooldridge, M., Jennings, N.: An Agenda-Based Framework for Multi-Issue
Negotiation. Artificial Intelligence 152(1), 1–45 (2004)
12. Fatima, S., Wooldridge, M., Jennings, N.: Optimal Negotiation of Multiple Issues in In-
complete Information Settings. In: Proceedings of the 3rd International Joint Conference
on Autonomous Agents and Multiagent Systems (AAMAS 2004), New York, USA, pp.
1080–1087. IEEE Computer Society, Los Alamitos (2004)
13. Gerkey, B., Mataric, M.: Multi-robot Task Allocation: Analyzing the Complexity and
Optimality of Key Architectures. In: Proceedings of the IEEE International Conference
on Robotics and Automation, Taipei, Taiwan, pp. 3862–3868 (2003)
14. Horling, B., Lesser, V.: A Survey of Multi-Agent Organizational Paradigms. The Knowl-
edge Engineering Review 19, 281–316 (2005)
15. Ito, T., Parkes, D.: Instantiating the Contingent Bids Model of Truthful Interdepen-
dent Value Auctions. In: Proceedings of the 5th International Joint Conference on Au-
tonomous Agents and Multiagent Systems (AAMAS 2006), Hakodate, Japan, pp. 1151–
1159 (2006)
16. Jung, H., Tambe, M., Kulkarni, S.: Argumentation as Distributed Constraint Satisfac-
tion: Applications and Results. In: Proceedings of the 5th International Conference on
Autonomous Agents, Montreal, Canada, pp. 324–331. ACM Press, New York (2001)
17. Klir, G., Yuan, B. (eds.): Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Upper Saddle River (1995)
18. Koes, M., Nourbakhsh, I., Sycara, K.: Heterogeneous Multirobot Coordination with Spa-
tial and Temporal Constraints. In: Proceedings of the 20th National Conference on Arti-
ficial Intelligence (AAAI), June 2005, pp. 1292–1297. AAAI Press, Menlo Park (2005)
19. Kraus, S.: Strategic Negotiation in Multiagent Environments. The MIT Press, Cambridge
(2001)
20. Lesser, V.: Reflections on the Nature of Multi-agent Coordination and Its Implications for
an Agent Architecture. Journal of Autonomous Agents and Multi-Agent Systems 1(1),
89–111 (1998)
21. Lesser, V.: Cooperative Multiagent Systems: A Personal View of the State of the Art.
IEEE Transactions on Knowledge and Data Engineering 11(1), 133–142 (1999)
22. Lesser, V., Horling, B., Klassner, F., Raja, A., Zhang, S.: BIG: An Agent for Resource-
Bounded Information Gathering and Decision Making. Artificial Intelligence Journal,
Special Issue on Internet Information Agents 118(1-2), 197–244 (2000)
23. Li, C., Chawla, S., Rajan, U., Sycara, K.: Mechanisms for Coalition Formation and Cost
Sharing in an Electronic Marketplace. Technical Report CMU-RI-TR-03-10, Robotics
Institute, Carnegie Mellon University, Pittsburgh, PA (April 2003)
24. Milanovic, N., Malek, M.: Current Solutions for Web Service Composition. IEEE Inter-
net Computing Magazine 8(6), 51–59 (2004)
25. Munroe, S., Luck, M., d’Inverno, M.: Motivation-Based Selection of Negotiation Part-
ners. In: 3rd International Joint Conference on Autonomous Agents and Multiagent Sys-
tems AAMAS 2004, pp. 1520–1521. IEEE Computer Society, Los Alamitos (2004)
26. Nash, J.: The Bargaining Problem. Econometrica 18, 155–162 (1950)
27. Oliveira, E., Rocha, A.: Agents Advanced Features for Negotiation in Electronic Com-
merce and Virtual Organisations Formation Process. In: Sierra, C., Dignum, F.P.M. (eds.)
AgentLink 2000. LNCS, vol. 1991, pp. 78–96. Springer, Heidelberg (2001)
28. Osborne, M., Rubinstein, A.: A Course in Game Theory. The MIT Press, Cambridge (1994)
29. Parsons, S., Sierra, C., Jennings, N.: Agents that Reason and Negotiate by Arguing. Jour-
nal of Logic and Computation 8(3), 261–292 (1998)
30. Rathod, P., desJardins, M.: Stable Team Formation among Self-interested Agents. In:
Proceedings of the AAAI Workshop on Forming and Maintaining Coalitions in Adaptive Multiagent Systems, San Jose, USA, pp. 29–36 (2004)
31. Ren, F., Zhang, M., Bai, Q.: A Fuzzy-Based Approach for Partner Selection in Multi-
Agent Systems. In: Proceedings of the 6th IEEE/ACIS International Conference on Com-
puter and Information Science (ICIS 2007), Melbourne, Australia, pp. 457–462. IEEE
Computer Society, Los Alamitos (2007)
32. Rigolli, M., Brady, M.: Towards a Behavioral Traffic Monitoring System. In: Proceed-
ings of the 4th International Joint Conference on Autonomous Agents and Multiagent
Systems (AAMAS 2005), Utrecht, Netherlands, pp. 449–454 (2005)
33. Rosenschein, J., Zlotkin, G.: Rules of Encounter: Designing Conventions for Automated
Negotiation Among Computers. MIT Press, Cambridge (1994)
34. Shen, J., Zhang, X., Lesser, V.: Degree of Local Cooperation and its Implication on
Global Utility. In: Proceedings of 3rd International Joint Conference on Autonomous
Agents and MultiAgent Systems, New York, USA, July 2004, vol. 2, pp. 546–553 (2004)
35. Sim, K.: A Survey of Bargaining Models for Grid Resource Allocation. ACM SIGE-
COM: E-commerce Exchange 5(5), 22–32 (2005)
36. Tambe, M.: Agent Architectures for Flexible, Practical Teamwork. In: Proceedings of
the 14th National Conference on Artificial Intelligence, Rhode Island, USA, pp. 22–28
(1997)
37. Tambe, M.: Towards Flexible Teamwork. Journal of Artificial Intelligence Research 7,
83–124 (1997)
38. Tambe, M.: Implementing Agent Teams in Dynamic Multi-agent Environments. Applied
Artificial Intelligence 12(2-3), 189–210 (1998)
39. Tsvetovat, M., Sycara, K.: Customer Coalitions in the Electronic Marketplace. In: Pro-
ceedings of the 4th International Conference on Autonomous Agents, Barcelona, Spain,
pp. 263–264. ACM Press, New York (2000)
40. Yamamoto, J., Sycara, K.: A Stable and Efficient Buyer Coalition Formation Scheme
for E-marketplaces. In: Proceedings of the 5th International Conference on Autonomous
Agents, Montreal, Canada, pp. 576–583. ACM Press, New York (2001)
41. Zeng, D., Sycara, K.: Bayesian Learning in Negotiation. International Journal of Human-
Computer Studies 48(1), 125–141 (1998)
42. Zhang, X., Lesser, V., Wagner, T.: Integrative Negotiation In Complex Organizational
Agent Systems. In: Proceedings of the 1st International Joint Conference on Autonomous
Agents & MultiAgent Systems (AAMAS 2002), July 2002, pp. 503–504 (2002)
43. Zhang, X., Lesser, V., Wagner, T.: Integrative Negotiation among Agents Situated in
Organizations. IEEE Transactions on Systems, Man, and Cybernetics, Part C 36(1), 19–
30 (2006)
Part V
Computer Vision
Predicting Trait Impressions of Faces Using
Classifier Ensembles
Abstract. In the experiments presented in this chapter, single classifier systems and
ensembles are trained to detect the social meanings people perceive in facial mor-
phology. Exploring machine models of people’s impressions of faces has value in
the fields of social psychology and human-computer interaction. Our first concern in
designing this study was developing a sound ground truth for this problem domain.
We accomplished this by collecting a large number of faces that exhibited strong
human consensus in a comprehensive set of trait categories. Several single classifier
systems and ensemble systems composed of Levenberg-Marquardt neural networks
using different methods of collaboration were then trained to match the human per-
ception of the faces in the six trait dimensions of intelligence, maturity, warmth,
sociality, dominance, and trustworthiness. Our results show that machine learning
methods employing ensembles are as capable as most individual human beings are
in their ability to predict the social impressions certain faces make on the average
human observer. Single classifier systems did not match human performance as well
as the ensembles did. Included in this chapter is a tutorial, suitable for the novice,
on the single classifier systems and collaborative methods used in the experiments
reported in the study.
1 Introduction
Whenever we meet new people, we immediately form impressions of them. These
impressions come from many sources: the social roles these people play, the
Sheryl Brahnam
Computer Information Systems, Missouri State University, 901 S. National,
Springfield, MO 65804, USA
e-mail: [email protected]
Loris Nanni
DEIS, IEIIT—CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 403–439.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
404 S. Brahnam and L. Nanni
quality of their clothes and grooming, their height and weight, and the way they
move, stand, and speak. One of the most important sources informing our initial as-
sessments of people are the impressions we gather from their faces. As Cicero once
remarked, “Everything is in the face.” The shape and texture of a face convey information about a person's identity, gender, and age. Short-term surface behaviors,
such as eye blinking, direction of gaze, and facial gestures provide clues regard-
ing a person’s emotional and mental state, and the texture and color of the face are
indicators of various health conditions.
Most modern people, however, would deny agreeing with Cicero’s further claim
that the face is inscribed with signs that reveal a person’s inner essence and destiny.
We are taught not to judge a book by its cover. Nonetheless, there is considerable
evidence in recent studies in social psychology that people today are predisposed
to form impressions of a person’s social status, abilities, dispositions, and character
traits based on nothing more than that person’s facial appearance. When shown a
face, not only are most people prepared to judge a person’s gender, age, and emo-
tional state, but also, as the social psychologist Bruce [12] has noted, that person’s
“personality traits, probable employment and possible fate” (p. 31). Furthermore,
the evidence indicates that people are remarkably consistent and similar, across cul-
tures and age groups, in their evaluations and reactions to faces [87].
Several theories have been advanced to explain why certain facial characteris-
tics consistently elicit specific personality impressions. One major theory is that
facial appearance is important because it guides people’s behavior towards others
that ensures the greatest chance of survival [56, 89]. Recognizing an angry face, for
example, triggers lifesaving fight/flight responses. It is theorized that faces that are
similar in structure to angry faces elicit similar responses. As Zebrowitz [87] ex-
plains, “We could not function well in this world if we were unable to differentiate
men from women, friends from strangers, the angered from the happy, the healthy
from the unfit, or children from adults. For this reason, the tendency to respond to
the facial qualities that reveal these attributes may be so strong that it overgeneral-
izes to people whose faces merely resemble those who actually have the attribute”
(pp. 14–15).
The most significant overgeneralization effects involve facial attractiveness, ma-
turity, gender, and emotion. The morphological characteristics of these overgener-
alization effects are associated with specific clusters of personality traits. Attractive
people, for example, are associated with positive character traits. They are con-
sidered more socially competent, potent, healthy, intellectually capable, and moral
than those less attractive. Attractive people are also perceived as being psychologi-
cally more adapted [47]. Facial abnormalities and unattractiveness, in contrast, elicit
negative responses and are associated with negative traits [47]. Unattractive people
are considered less socially competent and willing to cooperate [62]. They are also
considered more dishonest, unintelligent, and psychologically unstable and antiso-
cial. Unattractive people are often ignored and, if facially disfigured, avoided [16].
The unattractive are also more likely to be objects of aggression [3] and to suffer
abuse [47].
Adults whose faces resemble those of babies (large eyes, high eyebrows, red lips
that are proportionally larger, small nose and chin) are often treated positively as
well [87], but they are also attributed childlike characteristics. Babyfaced people are
perceived to be more submissive, naïve, honest, kindhearted, weaker, and warmer
than others. They are also perceived as being more helpful, caring, and in need of
protection [8]. Mature-faced individuals, in contrast, are more likely to command
respect and to be perceived as experts [87].
The overgeneralization effect of gender is strongly correlated with facial maturity
[87]. Female faces, more than male faces, tend to retain into adulthood the morpho-
logical characteristics of youth [28] and are more likely to be ascribed characteristics
associated with babyfacedness: female faces are thought to be more submissive, car-
ing, and in need of protection. Similarly, male faces, tending to be morphologically
more mature, are perceived as having the psychological characteristics typically as-
sociated with mature-faced individuals: male faces are thought to be more dominant,
intelligent, and capable.
While many social psychologists believe that facial impressions of character are
related in part to the morphological configurations that characterize emotional dis-
plays, the overgeneralization effect of emotion has not received as much attention
as some of the other overgeneralization effects. Nonetheless, there is evidence sup-
porting the idea that morphological configurations suggestive of emotional expres-
sions play a role in the formation of trait impressions. Take smiling for instance.
People react positively to smiling faces and find them disarming and thus not very
dominant [39]. A person whose lips naturally curl upward would thus be perceived
more positively than a person whose lips tend to turn downward, as illustrated in
Figure 1.
In the study reported in this chapter, we experiment with face recognition systems
made of single classifiers and ensembles to see whether these techniques are capable
Fig. 1 Illustration of the overgeneralization effect of emotion. A face with lips that natu-
rally turn upwards (left) is perceived similarly to smiling faces, that is, as low in dominance,
whereas a face with lips that turn downwards (right) is perceived as more threatening. In the
two images, only the lips differ. These faces were generated using FACES by InterQuest and
Micro-Intel
of predicting the social meanings different facial morphologies produce. Face recog-
nition involves the classification of facial morphology mostly in terms of identity [1].
Application areas where face recognition has been successfully employed include
biometrics, information security, law enforcement, surveillance, and access control.
Since gender, age, and race are closely tied to identification, these facial character-
istics have also been the focus of many investigations [13, 49, 61, 64, 65, 85].
Our application differs significantly from most face classification tasks in that
the ground truth is not based on factual information about the subjects but rather on
the impressions faces make on the average observer. Few classification experiments
have focused on the average observer’s impressions of faces. One of the first was
[9]. In that study, holistic face classifiers, based on principal component analysis
(PCA), were trained to match the human classification of faces along the bipolar
rating extremes of the following trait dimensions: adjustment, dominance, warmth,
sociality, and trustworthiness. Although results were marginally better than chance
in matching the perception of dominance (64%), classification rates were signif-
icantly better than chance for adjustment (71%), sociality (70%), trustworthiness
(81%) and warmth (89%). The dataset used in that study, however, may have posi-
tively influenced classification rates. The faces were randomly generated using the
full database of photographs of facial features (eyes, mouths, noses, and so forth)
found in the popular composite software program FACES [30], produced by In-
terQuest and Micro-Intel. Although it was possible to generate a fairly large number
of unique faces by randomly manipulating individual facial features in the FACES
database, subject ratings of the faces produced small classes (less than 15 faces each)
that were strongly associated with the bipolar extremes for each trait dimension. We
believe that the low number of faces in each trait class inflated recognition rates.
Related to our work are face recognition experiments that have recently been per-
formed to detect facial attractiveness. In [38], for instance, faces are compared to an
archetypical mask with good recognition rates. Two sets of photographs of 92 Cau-
casian females, approximately 18 years old, were rated by subjects using a 7-point
scale. In their experiments, they classified faces into two classes, attractive (high-
est 25% rated images) and unattractive (lowest 25% rated images). Performance,
based on percentage of correctly classified images, averaged 76.0% using K-nearest
neighbor and 70.5% using linear Support Vector Machines. The authors believed
that the low number of faces in their two classes of attractive and unattractive faces
(numbering approximately 24 images each) accounted for the poor performance.
As illustrated in both these studies, a major problem with modeling human attri-
butions of faces, whether along a number of trait dimensions or in terms of attrac-
tiveness, is developing a sufficiently large database of representative faces. One of
the contributions of the study reported in this chapter and discussed in Section 3 is
the production of six databases of faces that were found to be strongly associated
with the bipolar extremes of the following six trait dimensions: intelligence, matu-
rity, dominance, sociality, trustworthiness, and warmth. To create these databases,
480 stimulus faces were artificially constructed by four artists using FACES. The
artists were instructed to produce faces they felt would be perceived as either high
or low in the six trait dimensions. Using this method, we were able to produce trait
classes that averaged 111 faces.
A variety of approaches are available to deal with such complex pattern recogni-
tion problems as face recognition. Typically the goal is to find the best single classi-
fier for the task. In most application areas, holistic algorithms are preferred because
they are easy to implement and have been shown to outperform other methods [48].
In addition, they allow the classifier to determine the best set of features available in
the raw pixel data of the images. Holistic algorithms, however, suffer from what is
commonly referred to as the curse of dimensionality [6]. Only a small fraction of the pixel values in an image file is relevant to the classification problem. Processing
large numbers of insignificant values greatly increases computational complexity
and degrades performance. For this reason, most holistic classification systems ap-
ply various methods to reduce dimensionality. In Section 2, we describe several
holistic face recognition algorithms as well as the general principles behind feature
reduction and extraction.
A well established technique for improving single classifier performance is based
on building ensembles from classifiers that perform less optimally as a whole but
which nonetheless contribute some essential information. Ensembles have been
shown to achieve higher classification rates than single classifier systems [37, 40].
As described in some detail in Section 2, the diversity of classifier evaluations, aug-
mented by such collaborative methods as boosting and bagging [10] and the avail-
ability of combination rules [23] are major reasons for improved performance. The
superiority of collaborative methods is borne out by the results of our study reported
in Section 4. Ensembles composed of 100 Levenberg-Marquardt neural networks
(LMNNs) using different methods of collaboration proved to be as capable as most
individual human beings are in their ability to predict the social impressions cer-
tain faces make on the average human observer. Single classifier systems, including
systems using a single LMNN, did not match human performance as well as the
ensembles did.
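As a concrete illustration of the bagging idea mentioned above, the sketch below trains several weak learners on bootstrap resamples of a toy two-class dataset and combines them by majority vote. The data and the nearest-centroid base learner are invented for illustration; the chapter's ensembles use Levenberg-Marquardt neural networks as base classifiers instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data standing in for face feature vectors.
X0 = rng.normal(loc=-1.0, scale=1.0, size=(50, 5))
X1 = rng.normal(loc=+1.0, scale=1.0, size=(50, 5))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

def nearest_centroid_fit(X, y):
    """A weak base learner: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[d.argmin(axis=1)]

# Bagging: each ensemble member is trained on a bootstrap resample of the
# training set; the ensemble decides by majority vote over the members.
models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    models.append(nearest_centroid_fit(X[idx], y[idx]))

votes = np.stack([nearest_centroid_predict(m, X) for m in models])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = (ensemble_pred == y).mean()
```

Random subspace works analogously, except that each member sees a random subset of the feature dimensions rather than a resample of the training examples.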
We believe that exploring machine models of people’s impressions of faces has
value in several fields, most notably in social psychology and human-computer in-
teraction. In psychology, building models of human perception could expand our
understanding of the specific characteristics of faces that impact human social im-
pressions. Depending on the types of classifier systems used (ensembles of neural
networks for instance), it may be possible to build representative models of the
brain’s processing of faces for impression formation.
A major area of research in human-computer interaction (HCI) involves build-
ing socially intelligent interfaces. A human-like interface capable of matching the
average observer’s impressions of a user’s face could use this information to deter-
mine an initial interaction strategy that is socially adept. The interface could assume
that users have been treated in ways that reflect the trait impressions of their faces.
People that look dominant, for instance, probably feel comfortable with an initial re-
action that shows some deference, whereas commenting on a person’s intelligence
might best be avoided altogether if the user appears unintelligent. A system that
predicts how others view people could also help the human-like interface talk more
intelligently with users about other people, such as celebrities. Other application ar-
eas of a trait prediction system could include the development of smart mirrors that
inform people how others might view them. This could assist people in composing
their faces for various social settings, such as job interviews. It may also be possi-
ble to extend social perception systems of faces to include other aspects of human
appearance and behavior.
The remainder of this chapter is organized as follows. In Section 2, we pro-
vide a tutorial suitable for the novice on such basic classifiers in computer vision
as PCA and nearest neighbor (NN), support vector machines (SVMs) [84], en-
hanced subspace methods (SUB) [66], and Levenberg-Marquardt neural networks
(LMNN) [24]. We then define methods for creating classifier ensembles using mul-
tiple LMNNs for enhanced performance. In particular, we describe such collabora-
tive methods as bagging (BA), random subspace (RS), and class switching (CW).
In Section 3, we present the study design and our method of generating the stimulus
faces and evaluating subject ratings of the faces. In Section 4, we present our clas-
sifier systems and compare the performance of some simple classifiers (PCA+NN,
SUB, SVM, and LMNN) to classifier ensembles (100 LMNN selected using RS,
BA and CW). Ensembles of LMNN perform only slightly better than the best single
classifier (SVM), but the ensembles are more stable in performance across all six
trait dimensions and, as already noted, more closely approximate the performance
of individual human raters. We conclude this chapter in Section 5 by noting some
of the contributions and weaknesses of the study reported in this chapter. We also
mention our current work developing a larger database of photographs of people
that strongly produce specific trait impressions.
Although Section 2 provides a fairly comprehensive tutorial of the single classi-
fier systems and collaborative methods used in building our ensembles, we recom-
mend reading Kuncheva’s book [44] for additional details on developing ensembles
using a larger variety of collaborative methods. For general books on machine learn-
ing, pattern recognition and classification, Alpaydin [4], Duda, Hart and Stork [24],
and Russell and Norvig [73] provide particularly good complementary overviews.
In the classification stage, the extracted feature vectors are divided into training
and testing sets. Classifiers, such as the nearest neighbor (NN) [21], artificial neural
networks (ANNs), support vector machines (SVMs) [84] and Oja’s subspace (SUB)
[66] are then given the task of learning to map the training vectors to a set of labels,
or classes.
In practice, a simple 1-NN classifier is used to benchmark the performance of
other classifiers, since it performs well in most applications and involves few user-controlled parameters. NN is usually employed in techniques that adopt dimensionality reduction (e.g., PCA). Classification in this case involves projecting an
unknown face vector onto the transformed face space and measuring its distance
from representative classes within the same space. PCA has successfully been used
to classify faces according to identity [75, 78, 79], gender [35, 65, 80], age [82],
race [52, 64], and facial expression [20, 55, 69].
ANNs are modeled after biological neurons. They are massively parallel com-
puting systems composed of simple processors that are highly interconnected. At its
simplest, a neuron, or node, in the system is given an input, i, that is transformed by
a weight, w. Node output is then dependent on a transfer function. The central idea
of an ANN is to adjust architecture parameters in the training process in such a way
that the network learns to correctly map the training vectors to their assigned classes.
Learning algorithms (based mostly on a form of gradient descent) search through
the problem space to find a function that solves this problem with the lowest cost.
ANNs have proven capable of handling a variety of face recognition tasks: gen-
der classification [26, 31, 59], face identification, [19] face detection [72], and facial
expression recognition [69]. Kohonen [41] was one of the first to use a linear au-
toassociative neural network to store and recall face images. Autoassociative neural
networks associate input patterns with themselves [81]. It is interesting to note that
a linear autoassociative neural network is equivalent to PCA [67]. Early surveys of
neural network face classification techniques can be found in [18] and [82].
SVMs are powerful binary classifiers. They determine a decision boundary in the
feature space by constructing the optimal separating hyperplane that distinguishes
the classes. Using SVMs to classify faces is a recent development [33, 60, 70],
yet SVMs have already established a proven track record [18, 70, 92]. They typi-
cally outperform PCA [32, 33, 60] and provide performance comparable to neural
networks.
Once classifiers have been trained, the measure of classifier performance is the error rate produced when they are presented with the testing set of samples. In the post-processing
stage, various correction techniques are implemented and scores are normalized.
The percentage of misclassified test samples provides an estimate of the error rate.
For this estimate to be reliable, the training and testing sets should be sufficiently
large. In practice this is often not the case, and a small training set results in classi-
fiers that do not generalize well. If the test set is small, then the confidence levels are
also low. Cross-validation approaches, e.g., the leave-one-out and rotation schemes,
strengthen classifier generalizability and confidence levels. However, there are lim-
itations to these approaches. A bootstrap method proposed by [27] has been shown
Predicting Trait Impressions of Faces Using Classifier Ensembles 411
Q_{i,j} = (N^{11} N^{00} − N^{01} N^{10}) / (N^{11} N^{00} + N^{01} N^{10})          (1)

where N^{ab} is the number of instances in the test set classified correctly (a = 1) or incorrectly (a = 0) by the classifier D_i, and correctly (b = 1) or incorrectly (b = 0) by the classifier D_j. Q varies between −1 and 1, and it is 0 for statistically independent classifiers. Classifiers that tend to recognize the same patterns correctly will have Q > 0, and those which commit errors on different patterns will have Q < 0.
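The Q statistic of Equation 1 is straightforward to compute from the per-sample correctness of two classifiers. The sketch below (our own illustrative code, not part of the original study) counts the four agreement/disagreement cases with NumPy:

```python
import numpy as np

def q_statistic(correct_i, correct_j):
    """Yule's Q (Equation 1) for two classifiers D_i and D_j, given
    boolean vectors marking which test samples each classified
    correctly. Illustrative sketch; names are our own."""
    ci = np.asarray(correct_i, dtype=bool)
    cj = np.asarray(correct_j, dtype=bool)
    n11 = np.sum(ci & cj)      # both correct
    n00 = np.sum(~ci & ~cj)    # both incorrect
    n10 = np.sum(ci & ~cj)     # only D_i correct
    n01 = np.sum(~ci & cj)     # only D_j correct
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

# Classifiers that err on different samples are negatively related:
print(q_statistic([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0
```

Identical classifiers give Q = 1; complementary errors drive Q toward −1, which is what diversity-seeking ensemble construction aims for.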
Readers wanting a more comprehensive survey of face classification techniques
should refer to [92] for single classifier systems and [37] for more recent advances
using classifier ensembles. The remainder of this section provides a basic tutorial
on the classification algorithms used in the experiments presented in this chapter.
In Section 2.1, we describe PCA (as a transform and classification method using
NN), SVM, Oja’s subspace (SUB), and the Levenberg-Marquardt neural network
(LMNN) algorithm for computing gradient descent in feedforward neural networks.
This is followed by a description in Section 2.2 of the methods we used to build
classifier ensembles, viz., bagging (BA), class switching (CW), and random subspace (RS). We conclude the tutorial in Section 2.3 by listing software resources
that implement the algorithms discussed in this section and by listing some excellent
general books on machine learning and pattern recognition and classification.
The central idea behind PCA is to find an orthonormal set of axes pointing in the
direction of maximum covariance in the data. In terms of facial images, the idea is
to find the orthonormal basis vectors, or the eigenvectors, of the covariance matrix
of a set of images, with each image treated as a single point in a high dimensional
space. It is assumed that facial images form a connected subregion in the image
space. The eigenvectors map the most significant variations between faces and are
preferred over other correlation techniques that assume every pixel in an image is
of equal importance (see, for instance, [42]). Since each image contributes to each
of the eigenvectors, the eigenvectors resemble ghostlike faces when displayed. For
this reason, they are often referred to in the literature as holons [19] or eigenfaces
[78], and the new coordinate system is referred to as the face space [78]. Exam-
ples of eigenfaces are shown in Figure 4. Individual images can be projected onto
the face space and represented exactly as weighted combinations of the eigenface
components, as illustrated in Figure 5.
The resulting vector of weights that describes each face can be used both in face
classification and in data compression. Classification is performed by projecting a
Fig. 4 The first ten eigenfaces of the 480 stimulus faces generated for our experiments, with
the eigenfaces ordered by magnitude of the corresponding eigenvalue
414 S. Brahnam and L. Nanni
Fig. 5 Illustration of the linear combination of eigenfaces. The face to the left can be repre-
sented as a weighted combination of eigenfaces plus ψ the average face (see Equation 3)
new image onto the face space and comparing the resulting weight vector to those
of a given class [78]. Compression is achieved by reconstructing images using only
those few eigenfaces that account for the most variability [74]. PCA classification
and compression are discussed in more detail below.
PCA Classification
The principal components of a set of images can be derived directly as follows. Let I(x, y) be a two-dimensional array of intensity values of size N × N. I(x, y) may also be represented as a single point, a one-dimensional vector Γ of size N². Let the set of face images be Γ_1, Γ_2, Γ_3, ..., Γ_M, and let

Φ_k = Γ_k − Ψ                                                        (2)

represent the mean-normalized column vector for a given face Γ_k, where

Ψ = (1/M) Σ_{k=1}^{M} Γ_k                                            (3)
is the average face of the set. PCA seeks the set of M orthonormal vectors, uk ,
and their associated eigenvalues, λk , which best describes the distribution of the
image points. The vectors uk and scalars λk are the eigenvectors and eigenvalues,
respectively, of the covariance matrix:
C = (1/M) Σ_{k=1}^{M} Φ_k Φ_k^T = A A^T                              (4)
PCA is closely associated with the singular value decomposition (SVD) of a data
matrix. SVD can be defined as follows:
Φ = U S V^T                                                          (5)

where S is a diagonal matrix whose diagonal elements are the singular values of Φ, and U and V are unitary matrices. The columns of U are the eigenvectors of Φ Φ^T and are referred to as eigenfaces. The columns of V are the eigenvectors of Φ^T Φ and are not used in this analysis.
Faces can be classified by projecting a new face Γ onto the face space as follows:

ω_k = u_k^T (Γ − Ψ)                                                  (6)
PCA Representation
Since the eigenfaces are ordered, with each one accounting for a different amount of variation among the faces, images can be reconstructed using only those few eigenfaces, M′ ≪ M in Equation 4, that account for the most variability [74]. As
noted above, PCA results in a dramatic reduction of dimensionality and maps the
most significant variations in a dataset. For this reason, it is often used to transform
and reduce features when performing other classification procedures. In the experi-
ments reported in this chapter, we retain those components that account for 0.98 of
the variance.
Outline of PCA
The basic steps necessary to perform PCA training and testing using face images
are outlined in Table 1. These steps follow from the presentation given above.
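As a concrete illustration of these steps, the following sketch (our own illustrative code with assumed helper names, not the chapter's actual implementation) computes the average face, extracts eigenfaces via SVD, retains 98% of the variance as in our experiments, and classifies by nearest neighbor in face space:

```python
import numpy as np

def pca_train(images, var_retained=0.98):
    """Train-stage sketch: images is an (M, N*N) array, one flattened
    face per row. Returns the average face, the retained eigenfaces,
    and the training projections."""
    psi = images.mean(axis=0)                  # average face (Eq. 3)
    phi = images - psi                         # mean-normalized faces (Eq. 2)
    # Eigenfaces are the left singular vectors of the data matrix (Eq. 5)
    u, s, _ = np.linalg.svd(phi.T, full_matrices=False)
    var = (s ** 2) / np.sum(s ** 2)
    k = np.searchsorted(np.cumsum(var), var_retained) + 1
    eigenfaces = u[:, :k]                      # keep 98% of the variance
    weights = phi @ eigenfaces                 # training projections (Eq. 6)
    return psi, eigenfaces, weights

def pca_classify(face, psi, eigenfaces, weights, labels):
    """Project an unknown face into face space and return the label of
    its nearest training neighbor (1-NN)."""
    w = (face - psi) @ eigenfaces
    return labels[np.argmin(np.linalg.norm(weights - w, axis=1))]
```

In practice the distance measure and the number of retained components are tunable; the 0.98 variance threshold mirrors the setting reported later in this chapter.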
Support Vector Machines (SVMs), introduced in [84], belong to the class of max-
imum margin classifiers. They perform pattern recognition between two classes by
finding a decision surface that has maximum distance to the closest points in the
training set. The data points that define the maximum margin are called support
vectors.
SVMs are designed to solve two-class problems. SVMs produce the pattern
classifier 1) by applying a variety of kernel functions (linear, polynomial, radial
basis function, and so on) as the possible sets of approximating functions, 2)
by optimizing the dual quadratic programming problem, and 3) by using struc-
tural risk minimization as the inductive principle, as opposed to classical sta-
tistical algorithms that maximize the absolute value of an error or of an error
squared.
Different types of SVM classifiers are used depending upon the type of input pat-
terns: a linear maximal margin classifier is used for linearly separable data, a linear
soft margin classifier is used for linearly nonseparable, or overlapping, classes, and
a nonlinear classifier is used for classes that are overlapped as well as separated by
nonlinear hyperplanes. Each of these cases is outlined below. Readers interested in
using SVM should consult [22].
Outline of SVM
where α and b are the solutions of a quadratic programming problem. Unknown test
data xt can be classified by simply computing Equation 8.
Examining Equation 8, it can be seen that the hyperplane is determined by those training data x_i that have corresponding α_i > 0. Such training data are called support vectors. Thus, the optimal separating hyperplane is determined not by the training data per se but rather by the support vectors.
The objective in this case is to separate the two classes of training data with a minimal number of errors. To accomplish this, non-negative slack variables, ξ_i, i = 1, 2, ..., k, are introduced into the system. The penalty, or regularization, parameter C is also introduced to control the cost of errors. The computation of the linear soft margin classifier is otherwise the same as that of the linear maximal margin classifier; thus, we can obtain the optimal separating hyperplane (OSH) using Equations 7 and 8.
Nonlinear classifier
In this case, kernel functions such as the polynomial or RBF are used to transform
the input space to a feature space of higher dimensionality. In the feature space,
a linear separating hyperplane is sought that separates the input vectors into two
classes. In this case, the hyperplane and decision rule for the nonlinear training
pattern is Equation 9.
f(x) = sign( Σ_{i=1}^{K} α_i y_i K(x_i, x) + b )                     (9)

where α_i and b are the solutions of a quadratic programming problem and K(x_i, x) is a kernel function.
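The decision rule of Equation 9 can be evaluated directly once the α_i and b are known. The sketch below (our own illustrative code, with hand-picked α values rather than the output of a real quadratic-programming solver) shows an RBF-kernel classifier handling XOR-like data that no linear hyperplane can split:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_decision(x, support_vectors, alphas, ys, b, gamma=1.0):
    """Nonlinear SVM decision rule of Equation 9:
    f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b).
    The alphas below are illustrative, not a solved dual problem."""
    s = sum(a * y * rbf_kernel(sv, x, gamma)
            for a, y, sv in zip(alphas, ys, support_vectors))
    return int(np.sign(s + b))

# XOR-like toy data: four support vectors with equal weight.
sv = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
ys = np.array([1, 1, -1, -1])
alphas = np.array([1., 1., 1., 1.])
print(svm_decision(np.array([0.1, 0.1]), sv, alphas, ys, b=0.0))  # 1
```

In a real system, the α_i and b come from solving the dual quadratic program with an SVM package; only the kernel evaluation and sign rule are shown here.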
Outline of SUB
The algorithm for the creation of the subspace related to each class is divided into
two phases:
1. Normalization: all the objects in each class are normalized such that their Euclidean distance to the origin is one. This normalization makes it possible to use the norm of the projection of a pattern onto a subspace as a similarity measure;
2. Subspace generation: for each class, a PCA subspace is calculated (see the PCA
section for details).
The algorithm for the classification of each test image is divided into two phases:
1. Projection of the pattern: a map between the original space and the reduced
eigenspace is performed by means of the operator of projection (see the PCA
section for details);
2. Distance calculation: the norm of the projection of the pattern onto each subspace is used as the similarity measure between the input vector and the class related to that subspace. The input vector is then classified according to the maximal similarity value argmax_{j=1,...,s} [log(‖y_j‖²) − log(1 − ‖y_j‖²)], where s is the number of classes and y_j is the input vector x projected onto the subspace of class j.
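A minimal sketch of this subspace classifier (our own illustrative implementation, not the toolbox routine used in the experiments) follows; each class subspace is built from unit-normalized training vectors, and a test vector is assigned to the class whose subspace captures the largest share of its norm:

```python
import numpy as np

def train_subspaces(X, y, k=2):
    """Build a k-dimensional PCA subspace per class from the
    unit-normalized training vectors of that class (phases 1 and 2)."""
    subspaces = {}
    for c in np.unique(y):
        Xc = X[y == c]
        Xc = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)  # phase 1
        u, _, _ = np.linalg.svd(Xc.T, full_matrices=False)   # phase 2
        subspaces[c] = u[:, :k]
    return subspaces

def classify(x, subspaces):
    """Project x onto each class subspace and pick the class with the
    largest similarity (projection norm) per the decision rule above."""
    x = x / np.linalg.norm(x)
    def score(U):
        n2 = np.sum((U.T @ x) ** 2)            # squared projection norm
        return np.log(n2) - np.log(1 - n2)
    return max(subspaces, key=lambda c: score(subspaces[c]))
```

The log-ratio score is monotone in the projection norm, so classification reduces to picking the subspace that best reconstructs the input direction.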
The LMNN algorithm was first presented in [50]. Marquardt [53] rediscovered the
algorithm approximately twenty years later. It is now a widely used optimization
algorithm, solving the problem of nonlinear least squares minimization using Gauss-
Newton’s iteration in combination with gradient descent. It is considered one of the
fastest methods for training moderate sized feedforward neural networks. For a more
comprehensive tutorial see [63].
Outline of LMNN
Gradient descent is a simple method for finding the minima in a function but suffers
from a number of convergence problems. When the gradient is small, intuitively it
makes sense to take large steps down the gradient. Conversely, when the gradient is
large, it would be logical to take smaller steps. The exact opposite occurs in gradient
descent, which updates a parameter at each step by adding a negative of the scaled
gradient:
xi+1 = xi − λ ∇ f (10)
By using the second derivative and by expanding the gradient of f in a Taylor series, Equation 10 can be improved as follows:

∇f(x) = ∇f(x_0) + ∇²f(x_0)(x − x_0) + higher-order terms in (x − x_0)        (11)
Ignoring the higher-order terms by assuming f to be quadratic around x_0, and solving for the minimum by setting the left-hand side of Equation 11 to 0, we obtain
Newton’s method:
xi+1 = xi − (∇2 f (xi ))−1 ∇ f (xi ) (12)
where x0 is replaced by xi and x by xi+1 .
Because the network error function is a sum of squares, the Hessian can be approximated and the gradient computed as

H = J^T J                                                            (13)
g = J^T e                                                            (14)
where J is the Jacobian matrix that contains the first derivatives of the network
errors with respect to the weights and biases, and e is the vector of network errors.
Levenberg proposed an algorithm that combines the above equations, damping the approximate Hessian before inverting it:

x_{i+1} = x_i − (H + λ I)^{−1} g
By using singular value decomposition (SVD) and other techniques to compute the inverse, the cost of each update can be kept reasonable when the parameters number only in the hundreds, but it eventually becomes prohibitive when the number of weights increases into the thousands.
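Combining the Gauss-Newton approximation H = J^T J, g = J^T e with a damping term λI, one update step can be sketched as follows (illustrative code, not the toolbox implementation; lam is the Levenberg damping factor):

```python
import numpy as np

def lm_step(weights, jacobian, errors, lam):
    """One Levenberg-Marquardt update:
    w_next = w - (J^T J + lam*I)^(-1) J^T e.
    Small lam approximates Gauss-Newton; large lam approximates
    gradient descent with a small step size."""
    H = jacobian.T @ jacobian           # Eq. 13
    g = jacobian.T @ errors             # Eq. 14
    step = np.linalg.solve(H + lam * np.eye(H.shape[0]), g)
    return weights - step

# Fit y = w*x by least squares; for a linear model, one step with a
# tiny lam lands essentially on the exact solution.
x = np.array([1., 2., 3.])
t = np.array([2., 4., 6.])
w = np.array([0.0])
J = x.reshape(-1, 1)                    # d(error)/dw for error = w*x - t
e = w[0] * x - t
w = lm_step(w, J, e, lam=1e-9)
print(w)                                # approximately [2.0]
```

In training, λ is typically decreased after a successful step and increased after a failed one, blending between the two regimes.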
The term bagging was first introduced in [10] as an acronym of Bootstrap AGGre-
gatING. The idea is to generate random bootstrap training subsets, with replace-
ment, from the master training set. Base classifiers are then trained on each subset
and the results combined using a majority vote rule. That is, the combiner evaluates
the testing samples by querying each of the base classifiers on the sample and then
outputting the majority opinion.
Breiman's [10] two main arguments for the effectiveness of bagging are the following: 1) running many trials on uniform samples of a population yields statistical results with lower variance, and 2) the majority opinion reduces noise-induced errors made by a small minority of the base classifiers.
Key to the success of this algorithm is the selection of appropriate base classifiers. To guarantee diversity, the classifiers should be unstable, i.e., small variations in the training set should produce large changes in the resulting classifiers; otherwise the ensemble will not outperform the individual classifiers. Typical unstable classifiers are neural networks, decision trees, regression trees, and linear regression.
Outline of Bagging
Build the final decision rule by combining the results of the classifiers. Several decision rules can be used to combine the classifiers, e.g., majority voting, the sum rule, and the max rule [40].
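The bootstrap-and-vote procedure can be sketched in a few lines (our own illustrative code, with a 1-NN base learner standing in for the unstable classifiers discussed above):

```python
import numpy as np

def bagging_train(X, y, n_estimators, train_fn, rng):
    """Draw bootstrap samples (with replacement) from the master
    training set and train one base classifier on each. train_fn is
    any function (X, y) -> predict_fn."""
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        models.append(train_fn(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Majority vote over the base classifiers' outputs."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

# Simple base learner for illustration: 1-nearest-neighbor.
def make_1nn(X, y):
    return lambda x: y[np.argmin(np.linalg.norm(X - x, axis=1))]

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0.1, 0.], [1., 1.], [1.1, 1.]])
y = np.array([0, 0, 1, 1])
ensemble = bagging_train(X, y, 25, make_1nn, rng)
print(bagging_predict(ensemble, np.array([0.05, 0.05])))  # majority vote
```

Swapping majority voting for the sum or max rule only changes `bagging_predict`; the bootstrap training loop is unchanged.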
Class Switching is an ensemble method based on the creation of new training sets
obtained by changing the class labels of the training patterns. The class label of each
example is switched according to a probability function that depends on an overall
switching rate. Thus, in each new training set, the label of a fixed fraction, p, of the
training patterns of the master training set is selected at random for switching.
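Generating one switched training set is simple; the following sketch (our own illustrative code, assuming binary 0/1 labels) flips exactly the fraction p of labels chosen at random:

```python
import numpy as np

def switch_labels(y, p, rng):
    """Build one class-switching training set: flip the class label of
    a randomly chosen fraction p of the training patterns. Binary
    labels (0/1) are assumed for this sketch."""
    y_new = y.copy()
    n_flip = int(round(p * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_new[idx] = 1 - y_new[idx]
    return y_new

rng = np.random.default_rng(0)
y = np.zeros(10, dtype=int)
print(switch_labels(y, p=0.3, rng=rng).sum())  # 3 labels flipped
```

Each base classifier in the ensemble is then trained on its own switched copy of the master training set.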
2.3 Resources
Because of its excellent visualization tools and platform independence, MATLAB
[76] by MathWorks is commonly used for experimenting with face classification
algorithms. Numerous toolboxes have been developed that provide MATLAB users
with routines for handling images and statistical pattern recognition tasks. Math-
Works, for instance, offers an excellent image processing toolbox. MathWorks also
produces a neural network toolbox for designing and visualizing neural network al-
gorithms, with built-in support for many common neural network algorithms, such
as LMNN.
An excellent MATLAB toolbox for experimenting with statistical pattern recog-
nition is PRTools [83]. It is free for academic research. PRTools provides over 200
routines, including PCA, LMNN, Oja’s subspace, and SVM. The SVM implemen-
tation in PRTools, however, is limited. For a more comprehensive package, the OSU
SVM MATLAB toolbox developed by Ohio State University is an excellent choice.
It is available at http://sourceforge.net. Links to additional software and resources
are available at http://www.face-rec.org.
3 Study Design
In the experiments presented in this chapter, single classifiers and classifier ensem-
bles are trained to detect the social meanings people perceive in facial morphology.
Our first concern in designing our study was developing a sound ground truth for
this problem domain. Our goal was to collect a set of faces that exhibit strong hu-
man consensus in a comprehensive set of trait categories. The traits selected for this
study are a modification of Rosenberg’s [25, 29, 71] meta-analysis of a broad range
of categories and include the following: dominance, intelligence, maturity, sociality,
trustworthiness, and warmth.
As noted in the introduction, our task is unusual in that our ground truth is not
based on factual information about the subjects’ faces. In most face recognition and
classification applications, faces are associated with classes that are derived either
from the subject’s self-report (age, gender, and emotional state) or from different
views, spatial as well as temporal, of the same person. The division of faces into
relevant classes poses few problems, as the classes are clearly definable. In the clas-
sification task of matching human impressions of faces, however, the division of
faces into relevant trait classes is not a straightforward process. It is complicated
by the fact that most faces fail to elicit strong opinions in any given trait dimension
and by the fact that human beings, while consistent in their ratings, are not in total
agreement.
Figure 6 outlines the steps we took to develop our database of faces. In Section
3.1, we describe the process we used to generate 480 stimulus faces. We also discuss
some of the issues involved in selecting different facial representations (e.g., artifi-
cially constructed faces, 2-D photographs, and 3-D scans) and justify our decision
to artificially construct faces from the popular composite program FACES [30]. In
Fig. 6 Steps taken to generate a database of representative faces for each trait dimension
(T1−6 ) with two classes, high (CH ) and low (CL )
Section 3.2, we describe our experimental design for assessing subject ratings of the
stimulus faces along the six trait dimensions using a 3-point scale. In Section 3.3,
we describe the process we used in steps 3 and 4 for determining face membership
into the two trait classes of high and low for each trait dimension. It should be noted
here that a stimulus face can belong to more than one trait database. Faces can be
rated as both submissive and trustworthy, for instance. However, within any trait
database, a stimulus face can only belong to one of the two trait classes. A face, for
example, cannot be both high and low in perceived dominance.
thought that using photographs of actual faces would address this potential prob-
lem. However, photographs are two-dimensional representations, and it could be
argued that people form impressions of faces in situ based on multidimensional
views. Three dimensional scans of actual faces also present representational dilem-
mas. How faces are seen in space for instance could affect viewer ratings. Will the
viewer control how the scans are viewed or will the scanned faces move on their
own? Even judging films of faces is problematical, as the perspective of the camera
is typically artificial and stationary.
In psychological studies, large databases of faces are not required. It is not un-
common for subjects to evaluate fewer than twenty faces. In building classification
systems, larger numbers of samples are necessary. Furthermore, the faces need to
strongly represent the various trait classes. Most faces are not extreme in their attri-
butions, so finding large numbers of faces that distinctly represent various traits is
difficult.
To generate as many representative faces as possible, we decided that it would
be best to construct faces artificially. We asked four college art students (all female
seniors) to generate 120 faces (60 female and 60 male) using the program FACES.
This produced a total of 480 stimulus faces. The artists were given three months to
complete the task and were asked to generate even groups of faces (5 male and 5
female) that they thought would be perceived by others as intelligent, unintelligent,
mature, immature, warm, cold, social, unsocial, dominant, submissive, trustworthy,
and untrustworthy. The artists were given the same definitions of these terms as
were later given the subjects who rated the stimulus faces using these same trait
descriptors (see Appendix for these definitions). Thus, we hoped to obtain at least 40
faces in each trait class that would be verified by subjects to produce the impressions
intended.
The artists were also instructed to use as many different facial features in the
program’s database as possible, with the caveat that they make the emotional ex-
pressions of the faces as neutral as possible. The program FACES contains a fairly
large set of individual features: 512 eyes, 541 noses, 570 lips, 423 jaws, 480 eye-
brows, and 63 foreheads, to list some of the more important features, so the faces
were generally unique in appearance, as illustrated in Figure 7.
The majority of students were white (69), followed by African (5), Asian (3), His-
panic (2) and other (1).
Dependent Measures. Each subject judged a full set of 120 faces created by one
of the four artists along the six trait dimensions using a forced 3-point scale with
associated descriptors. Thus, each image was judged by 20 subjects. The 1 and
3 values were given the bipolar trait descriptors of dominant/submissive, intel-
ligent/unintelligent, mature/immature, social/unsocial, trustworthy/untrustworthy,
and warm/cold, and their positions were randomized for each trait dimension and
for each image. Neutral was always located at the middle 2 value.
Subjects were given access to the trait definitions and, in some cases, behav-
ioral potential questions modeled after Berry and Brownlow [7] and Zebrowitz and
Motepare [88]. The Appendix provides the description of the traits and the behav-
ioral potential questions that were given to both the artists and the subjects.
Apparatus. The program that administered the survey was located on a campus
server and was made available to the participants in the computer labs located across
campus. Due to the large number of faces each subject was asked to rate, they were
given one week to complete the task.
Table 2 Rater Means and Standard Deviations Per Trait of the 460 Stimulus Faces
Fig. 7 Samples from the two classes (high and low) of warmth. The top row faces were
rated significantly higher in warmth. The bottom row faces were rated significantly lower in
warmth
Table 3 Number of Images in the Two Classes (High and Low) of Each Trait Dimension
2. That the mean rating was less than 1.6 for low membership and greater than 2.4
for high membership;
3. That the mode matched the class (1 for low and 3 for high)
To illustrate class membership, in Figure 7, a sample of faces that fell into the high
and low classes of warmth are presented.
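Applied to the ratings collected per face, the mean and mode criteria above can be sketched as follows (our own illustrative code; only the criteria listed above are implemented):

```python
import numpy as np
from statistics import mode

def assign_class(ratings):
    """Assign a face to the low or high class of one trait dimension
    from its 3-point ratings, using the mean and mode criteria above.
    Returns 'low', 'high', or None if the face is excluded."""
    m = np.mean(ratings)
    if m < 1.6 and mode(ratings) == 1:
        return "low"
    if m > 2.4 and mode(ratings) == 3:
        return "high"
    return None

print(assign_class([1, 1, 1, 2, 1]))  # low  (mean 1.2, mode 1)
print(assign_class([2, 2, 3, 2, 1]))  # None (mean 2.0: no strong consensus)
```

Faces failing both criteria are excluded from that trait database, which is why most of the 480 stimulus faces contribute to only a few trait classes.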
Table 3 lists the total number of images that fell into the two classes for each
trait dimension. The average number of images in each class is 111 (minimum 39
and maximum 151). Except for low intelligence (39), the number of images that fell
into each class greatly exceeded our minimum expectation of 40 faces. Since trait
dimensions are correlated (recall in the introduction how morphological features
of the overgeneralization effects were associated with clusters of traits), the total
number of images is greater than 460 because many images produced significant
trait impressions in more than one dimension.
4 Classification Experiments
In this section we describe our classification experiments using the six trait
databases. In Section 4.1 we describe the basic system architecture, and in Section
4.2 we present our experimental results.
Fig. 9 Comparison of single classifier performance (average AROC obtained in 20 runs) for
the six trait dimensions
Fig. 10 Comparison of single classifier performance averaged across all six trait dimensions
Fig. 11 Comparison of ensemble performance (average AROC obtained in 20 runs) for the
six trait dimensions
Fig. 12 Comparison of ensemble classifier performance averaged across all six trait
dimensions
Fig. 13 Comparison of ensemble and single classifier performance averaged across all six
trait dimensions
4.2 Results
The performance indicator adopted in this work is the area under the Receiver Operating Characteristic curve (AROC) [51]. As explained in Section 2, the ROC curve is a two-dimensional plot of classification performance, charting the probability of
Fig. 14 Comparison of best single (PCA+SVM) and ensemble (PCA+RS) performance (av-
erage AROC obtained in 20 runs) with subject (Raters) performance (average AROC) for the
6 trait dimensions
classifying the genuine examples correctly against the rate of incorrectly classifying
impostor examples.
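AROC also admits a simple probabilistic reading: it equals the probability that a randomly drawn genuine example receives a higher score than a randomly drawn impostor example. A sketch of this rank-based computation (our own illustrative code, not the evaluation script used in the experiments):

```python
import numpy as np

def aroc(scores_genuine, scores_impostor):
    """Area under the ROC curve via the Mann-Whitney equivalence:
    the fraction of (genuine, impostor) pairs in which the genuine
    example scores higher, with ties counting one half."""
    g = np.asarray(scores_genuine, dtype=float)
    i = np.asarray(scores_impostor, dtype=float)
    wins = (g[:, None] > i[None, :]).sum() \
         + 0.5 * (g[:, None] == i[None, :]).sum()
    return wins / (len(g) * len(i))

print(aroc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 1.0 (perfect separation)
print(aroc([0.5, 0.5], [0.5, 0.5]))            # 0.5 (chance performance)
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation, which frames the classifier and rater scores reported below.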
In Figure 9, we compare the performance of single classifiers on each of the six
trait dimensions. SVM and LMNN greatly outperformed NN and SUB on the trait
dimensions of warmth, sociality, dominance, and trustworthiness. Looking at Ta-
ble 4 and Figure 10, we find that the average performance of SVM and LMNN are
very close, with both SVM and LMNN performing best across all six trait dimen-
sions. However, referring back to Figure 9, we see that SVM and LMNN performed
relatively poorly, compared with SUB and NN, on the trait dimension of maturity.
None of the single classifiers in our experiments was able to perform well on all six
dimensions.
In Figures 11 and 12, we compare the average AROC of the ensemble experi-
ments using the six databases. As seen in Table 4, RS, BA, and CW perform com-
paratively well, with RS performing slightly better than BA and CW. As with the
single classifier systems, ensembles had more difficulty classifying faces perceived
as intelligent and mature.
Looking at Table 4 and Figure 13, we can compare results between the single
classifier systems and the ensembles. The ensembles clearly outperform NN and
SUB. SVM and LMNN are close in performance to the ensembles; however, as we
can see in Table 4, the ensembles, unlike SVM and LMNN, performed compara-
tively well across all six trait dimensions.
Fig. 15 Comparison of best ensemble classification (PCA+RS) and best single classifier
(PCA+SVM) performance with subject (Raters) performance averaged across all six trait
dimensions
Looking at Table 4, where we report the average AROC obtained by the subjects,
and Figures 14 and 15, we can see that the performance obtained by ensembles is
similar to the performance of the average rater, with RS exactly matching the raters
in the averaged performance of all six trait dimensions. This result leads us to believe
that machines are as capable of classifying faces according to the impressions they
make on the general observer as are most human beings.
5 Conclusion
In this chapter we present unique face classification experiments using a variety of
collaborative methods. The experiments are unique because the systems were not
asked to classify faces according to such factual information as identity and gender
but rather the systems had to match the human perception of faces in terms of the
social impressions they make on the average observer.
One contribution of the study reported in this chapter was the development of
a sound ground truth for this problem domain. Our goal was to collect a set of
faces that exhibit strong human consensus in a comprehensive set of trait cate-
gories. To accomplish this objective, four artists were asked to construct 480 stimu-
lus faces, using the composite program FACES, with an eye towards making faces
they thought were clearly intelligent, unintelligent, mature, immature, warm, cold,
social, unsocial, dominant, submissive, trustworthy, and untrustworthy. Subjects
then rated the 480 faces using the same twelve descriptors. Since traits are corre-
lated, this process succeeded in creating trait classes that averaged over one hundred
faces each, a vast improvement over the databases of faces we used in earlier work
(see [9]).
Single classifiers and ensembles were then trained to match the bipolar extremes
of the faces in each of the six trait dimensions of intelligence, maturity, warmth,
sociality, dominance, and trustworthiness. With performance measured by AROC
and averaged across all six dimensions, results show that single classifiers, espe-
cially linear SVMs (0.74) and LMNN (0.73), performed as well as human raters
(0.77). These single classifiers, however, performed poorly in the trait dimension
of maturity. Ensembles of 100 LMNNs, constructed using BA (0.76), RS (0.77), and CW (0.75), compared equally well to rater performance but were better than
the single classifiers at handling all six trait dimensions. The random subspace
AROC, averaged across the six dimensions, exactly matched rater performance. We
concluded that machine learning methods, especially ensembles, are as capable of
perceiving the social impressions faces make on the general observer as are most
human beings.
Although research shows that people perceive personality even in abstract draw-
ings of faces [15], one shortcoming in developing the system reported in this chapter
is the possibility that classifiers trained on artificially constructed faces will not gen-
eralize to natural faces. We are currently developing studies to determine how well
our ensembles, trained with this dataset, are able to match human raters of pho-
tographs of people. In addition, we are developing a dataset of photographs of faces
that have equally large numbers of faces in each of the twelve trait classes.
As noted in the introduction, developing models of the human perception of the
social meanings of faces may have value in a number of fields, including social psy-
chology and human-computer interaction. Certainly, the human-like interfaces and
robots of the future will need to be able to see faces and other objects as human
beings see them, if they are to have more than a superficial social engagement with
us. It is not enough in human society simply to recognize what an object is; one must
also be aware of the cultural layers of meanings that envelop each object. Our exper-
iments demonstrate that it is possible for machines to match some of these cultural
meanings to attributes possessed by the objects. For socially interactive interfaces
and robots to be believable, however, they will need the ability to integrate a host
of social impressions. Building machines that perceive the social meanings of ob-
jects will involve further research in the exciting area of collaborative computational
intelligence.
Predicting Trait Impressions of Faces Using Classifier Ensembles 435
Appendix
Table 5 Definition of each trait dimension and behavioral potential questions (modelled after
Berry and Brownlow [7] and Zebrowitz and Montepare [88]) as given to the subjects who
evaluated the stimulus faces
Dominant, Submissive, Neutral
Here we are looking at how dominating the person looks.
Dominant: Is a person who is most likely to tell other people what to do.
Submissive: Is a person who usually follows others and is not very assertive.
A helpful question might be: "Does s/he look like someone who would be the kind of roommate who would comply with most of your wishes about the furniture arrangement, quiet hours, and house rules?"

Intelligent, Unintelligent, Neutral
Here we are looking at how intelligent the person looks.
Intelligent: Is a person who is possibly very educated, capable, and interested in intellectual work.
Unintelligent: Is a person who probably does not value school as s/he was not good at school subjects.
A helpful question might be: "Does s/he look like someone you would learn from when discussing such topics as art, politics, philosophy, or science?"

Mature, Immature, Neutral
Here we are looking at how responsible the person looks.
Mature: Is a person who acts like an adult and behaves responsibly.
Immature: Is a person who behaves in a childish or irresponsible manner.
A helpful question might be: "Does s/he look like someone you could trust to take on important responsibilities?"

Trustworthy, Untrustworthy, Neutral
Here we are looking at how honest the person looks.
Trustworthy: Is a person who is mostly honest and is not likely to steal, lie, or cheat.
Untrustworthy: Is a person who is often not honest and who possibly steals, lies, or cheats.
A helpful question might be: "Does s/he look like someone you would ask to watch your backpack while you made a visit to the restroom?"

Social, Unsocial, Neutral
Here we are looking at how social the person looks.
Social: Is a person who is most likely very outgoing, extroverted, and who enjoys parties and other social activities.
Unsocial: Is a person who is introverted, a loner, shy, and who would prefer to stay home rather than go out.
A helpful question might be: "Does s/he look like someone you would invite to a party to enliven it?"

Cold, Warm, Neutral
Here we are looking at how approachable the person is.
A helpful question might be: "Does s/he look like someone who would turn a cold shoulder to your attempts at friendly conversation?"
References
1. Abate, A.F., Nappi, M., Riccio, D., Sabatino, G.: 2D and 3D face recognition: A survey. Pattern Recognit. Lett. 14, 1885–1906 (2007)
2. Albright, L., Malloy, T.E., Dong, Q., Kenny, D.A., Fang, X.: Cross-cultural consensus in
personality judgments. J. Personal and Soc. Psychol. 3, 558–569 (1997)
3. Alcock, D., Solano, J., Kayson, W.A.: How individuals’ responses and attractiveness
influence aggression. Psychol. Rep. 3(2), 1435–1438 (1998)
4. Alpaydin, E.: Introduction to machine learning. MIT Press, Cambridge (2004)
5. Altınçay, H., Demirekler, M.: An information theoretic framework for weight estimation
in the combination of probabilistic classifiers for speaker identification. Speech Com-
mun. 4, 255–272 (2000)
6. Bellman, R.: Adaptive control process: A guided tour. Princeton University Press, Prince-
ton (1961)
7. Berry, D.S., Brownlow, S.: Were the physiognomists right? Personal and Soc. Psychol.
Bull. 2, 266–279 (1989)
8. Berry, D.S., McArthur, L.Z.: Perceiving character in faces: The impact of age-related
craniofacial changes on social perception. Psychol. Bull. 1, 3–18 (1986)
9. Brahnam, S.: Modeling physical personalities for virtual agents by modeling trait im-
pressions of the face: A neural network analysis. The Graduate Center of the City of
New York, Department of Computer Science, New York (2002)
10. Breiman, L.: Bagging predictors. Mach. Learn. 2, 123–140 (1996)
11. Breiman, L.: Random forest. Mach. Learn. 1, 5–32 (2001)
12. Bruce, V.: Recognising faces. Lawrence Erlbaum Associates Publishers, London (1988)
13. Brunelli, R., Poggio, T.: Hyperbf networks for gender classification. In: DARPA Image
Understanding Workshop, pp. 311–314 (1992)
14. Brunelli, R., Poggio, T.: Face recognition: Features versus templates. IEEE Trans. Pattern
Anal. and Mach. Intell. 10, 1042–1052 (1993)
15. Brunswik, E.: Perception and the representative design of psychological experiments.
University of California Press, Berkeley (1947)
16. Bull, R., Rumsey, N.: The social psychology of facial appearance. Springer, Heidelberg
(1988)
17. Burton, A.M., Bruce, V., Dench, N.: What’s the difference between men and women?
Evidence from facial measurement. Percept. 2, 153–176 (1993)
18. Chellappa, R., Wilson, C.L., Sirohey, S.: Human and machine recognition of faces: A
survey. Proceedings of the IEEE, pp. 705–740 (1995)
19. Cottrell, G.W., Fleming, M.K.: Face recognition using unsupervised feature extraction.
In: International Conference on Neural Networks, pp. 322–325 (1990)
20. Cottrell, G.W., Metcalfe, J.: EMPATH: Face, emotion, and gender recognition using
holons. In: Touretzky, D. (ed.) Adv. Neural Inf. Process Syst., pp. 564–571. Morgan
& Kaufman, San Mateo (1991)
21. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1, 21–27 (1967)
22. Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other
kernel-based learning methods. Cambridge University Press, Cambridge (2000)
23. Czyz, J., Kittler, J., Vandendorpe, L.: Multiple classifier combination for face-based iden-
tity verification. Pattern Recognit. 7, 1459–1469 (2004)
24. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2000)
25. Eagly, A.H., Ashmore, R.D., Makhijan, M.G., Longo, L.C.: What is beautiful is good, but
...: A meta-analytic review of research on the physical attractiveness stereotype. Psychol.
Bull. 1, 109–128 (1991)
26. Edelman, B.E., Valentin, D., Abdi, H.: Sex classification of face areas: How well can a
linear neural network predict human performance. J. Biol. Syst. 3, 241–264 (1998)
27. Efron, B.: The jackknife, the bootstrap and other resampling plans. SIAM, Philadelphia
(1982)
28. Enlow, D.H., Hans, M.G.: Essentials of facial growth. W. B. Saunders Company,
Philadelphia (1996)
29. Feingold, A.: Good-looking people are not what we think. Psychol. Bull. 2, 304–341
(1992)
30. Freierman, S.: Constructing a real-life Mr. Potato Head. Faces: The ultimate composite picture. The New York Times D:6 (2000)
31. Golomb, B.A., Lawrence, D.T., Sejnowski, T.J.: Sexnet: A neural network identifies sex from human faces. Adv. Neural Inf. Process Syst., 572–577 (1991)
32. Guo, G., Li, S.Z., Chan, K.L.: Support vector machines for face recognition. Image and
Vis. Comput., 631–638 (2001)
33. Heisele, B., Ho, P., Poggio, T.: Face recognition with support vector machines: Global
versus component-based approach. In: The Eighth IEEE International Conference on
Computer Vision, Vancouver, BC, pp. 688–694 (2001)
34. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans.
Pattern Anal. and Mach. Intell. 8, 832–844 (1998)
35. Jain, A., Huang, J.: Integrating independent components and linear discriminant analysis
for gender classification. In: Sixth IEEE International Conference on Automatic Face
and Gesture Recognition, pp. 159–163 (2004)
36. Jain, A.K., Dubes, R.C., Chen, C.C.: Bootstrap techniques for error estimation. IEEE
Trans. Pattern Anal. and Mach. Intell. 5, 628–633 (1987)
37. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans.
Pattern Anal. and Mach. Intell. 1, 4–37 (2000)
38. Kanghae, S., Sornil, O.: Face recognition using facial attractiveness. In: The 2nd Interna-
tional Conference on Advances in Information Technology, Bangkok, Thailand (2007)
39. Keating, C.F., Mazur, A., Segall, M.H.: A cross-cultural exploration of physiognomic
traits of dominance and happiness. Ethol. and Sociobiol., 41–48 (1981)
40. Kittler, J.: On combining classifiers. IEEE Trans. Pattern Anal. and Mach. Intell. 3, 226–
239 (1998)
41. Kohonen, T.: Associative memory: A system theoretic approach. Springer, Berlin (1977)
42. Kosugi, M.: Human-face search and location in a scene by multi-pyramid architecture
for personal identification. Syst. and Comput. Jpn. 6, 27–38 (1995)
43. Kuncheva, L.I.: Clustering-and-selection model for classifier combination. In: Knowledge-Based Intelligent Engineering Systems and Allied Technologies, Brighton, pp. 185–188 (2000)
44. Kuncheva, L.I.: Combining pattern classifiers: Methods and algorithms. Wiley, New
York (2004)
45. Kuncheva, L.I.: Diversity in multiple classifier systems. Inf. Fusion 1, 3–4 (2005)
46. Kuncheva, L.I., Whitaker, C.J.: Measures of Diversity in Classifier Ensembles and their
Relationship with the ensemble accuracy. Mach. Learn. 2, 181–207 (2003)
47. Langlois, J.H., Kalakanis, L., Rubenstein, A.J., Larson, A., Hallam, M., Smoot, M.: Max-
ims or myths of beauty? A meta-analytic and theoretical review. Psychol. Bull. 3, 390–
423 (2000)
48. Lanitis, A., Taylor, C.J., Cootes, T.F.: Automatic interpretation and coding of face images
using flexible models. IEEE Trans. Pattern Anal. and Mach. Intell. 7, 743–756 (1997)
49. Lanitis, A., Taylor, C.J., Cootes, T.F.: Toward Automatic Simulation of Aging Effects on
Face Images. IEEE Trans. Pattern Anal. and Mach. Intell. 4, 442–455 (2002)
50. Levenberg, K.: A Method for the solution of certain nonlinear problems in least squares.
Q Appl. Math. 2, 164–168 (1944)
51. Ling, C.X., Huang, J., Zhang, H.: Auc: A better measure than accuracy in comparing
learning algorithms. In: Canadian Conference on Artificial Intelligence 2003, Halifax,
Canada, pp. 329–341 (2003)
52. Lu, X., Jain, A.K.: Ethnicity identification from face images. In: SPIE: Biometric Tech-
nology for Human Identification Conference: Biometric Technology for Human Identi-
fication, Orlando, FL, pp. 114–123 (2004)
53. Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM
J. Appl. Math., 431–441 (1963)
54. Martı́nez-Muñoz, G., Suárez, A.: Switching class labels to generate classification ensem-
bles. Pattern Recognit. 10, 1483–1494 (2005)
55. Martinez, A.M., Benavente, R.: The ar face database. CVC Technical Report #24 (1998),
http://rvl1.ecn.purdue.edu/˜aleix/aleix_face_DB.html
56. McArthur, L.Z., Baron, R.M.: Toward an ecological theory of social perception. Psychol.
Rev. 3, 215–238 (1983)
57. Melville, P., Mooney, R.J.: Constructing diverse classifier ensembles using artificial
training examples. In: International Joint Conferences on Artificial Intelligence, pp. 505–
510 (2003)
58. Metz, C.E.: Basic principles of ROC analysis. Semin. Nucl. Med. 4, 283–298 (1978)
59. Mitsumoto, S.T., Kawai, H.: Male/female identification from 8 x 6 very low resolution
face images by a neural network. Pattern Recognit. 2, 331–335 (1996)
60. Moghaddam, B., Yang, M.-H.: Gender classification with support vector machines. In:
Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FG),
pp. 306–311 (2000)
61. Moghaddam, B., Yang, M.-H.: Learning gender with support faces. IEEE Trans. Pattern
Anal. and Mach. Intell. 5, 306–311 (2002)
62. Mulford, M., Orbell, J., Shatto, C., Stockard, J.: Physical attractiveness, opportunity, and
success in everyday exchange. American J. Sociol. 6, 1565–1592 (1998)
63. Nocedal, J., Wright, S.J.: Numerical optimization. Springer, New York (1999)
64. O’Toole, A.J., Abdi, H., Deffenbacher, K.A., Bartlett, J.C.: Classifying faces by race and
sex using an autoassociative memory trained for recognition. In: 13th Annual Conference
on Cognitive Science, Hillsdale, NJ, pp. 847–851 (1991)
65. O’Toole, A.J., Deffenbacher, K.A.: The perception of face gender: The role of stimulus
structure in recognition and classification. Mem. and Cogn., 146–160 (1997)
66. Oja, E.: Subspace Methods of Pattern Recognition. Research Studies Press Ltd., Letch-
worth (1983)
67. Oja, E.: Principal components, minor components and linear neural networks. Neural
Netw., 927–935 (1992)
68. Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. J. Artif. Intell.
Res., 169–198 (1999)
69. Padgett, C., Cottrell, G.W.: A simple neural network models categorical perception of
facial expressions. In: Proceedings of the 20th Annual Cognitive Science Conference,
Madison, WI, pp. 806–807 (1998)
70. Phillips, P.J.: Support vector machines applied to face recognition. Adv. Neural Inf. Pro-
cess Syst., 803–809 (1998)
71. Rosenberg, S.: New approaches to the analysis of personal constructs in person percep-
tion. In: Land, A.L., Cole, J.K. (eds.) Nebraska symposium on motivation, pp. 179–242.
University of Nebraska Press, Lincoln (1977)
72. Rowley, H.A., Shumeet, B., Kanade, T.: Neural network-based face detection. IEEE
Trans. Pattern Anal. and Mach. Intell. 1, 23–38 (1998)
73. Russell, S., Norvig, P.: Artificial intelligence: A modern approach. Prentice Hall, Upper
Saddle River (2002)
74. Sirovich, L., Kirby, M.: Low dimensional procedure for the characterization of human
faces. J. Opt. Soc. Am. 3, 519–524 (1987)
75. Swets, D.L., Weng, J.: Using discriminant eigenfeatures for image retrieval. IEEE Trans.
Pattern Anal. and Mach. Intell. 8, 831–837 (1996)
76. The MathWorks, Using MATLAB: The language of technical computing. The Math-
works, Inc., Natick, MA (2000)
77. Todd, J.T., Mark, L.S.: The perception of human growth. Sci. Am. 2, 132–144 (1980)
78. Turk, M.A., Pentland, A.P.: Eigenfaces for recognition. J. Cogn. Neurosci. 1, 71–86
(1991)
79. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: IEEE Computer So-
ciety Conference on Computer Vision and Pattern Recognition, Silver Spring, MD, pp.
586–591 (1991)
80. Valentin, D., Abdi, H., Edelman, B.E., O’Toole, A.J.: Principal component and neural
network analyses of face images: What can be generalized in gender classification? J.
Math. Psychol. 4, 398–413 (1997)
81. Valentin, D., Abdi, H., O’Toole, A.J.: Categorization and identification of human face
images by neural networks: A review of the linear autoassociative and principal compo-
nent approaches. J. Biol. Syst. 3, 413–429 (1994)
82. Valentin, D., Abdi, H., O’Toole, A.J., Cottrell, G.W.: Connectionist models of face pro-
cessing: A survey. Pattern Recognit. 9, 1209–1230 (1994)
83. van der Heijden, F., Duin, R.P.W., de Ridder, D., Tax, D.M.J.: Classification, parameter
estimation, and state estimation: An engineering approach using MATLAB. John Wiley
& Sons, Ltd., Chichester (2004)
84. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
85. Wechsler, H., Gutta, S., Philips, P.J.: Gender and ethnic classification of Face Images. In:
3rd Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, pp. 194–199
(1998)
86. Whitaker, C.J., Kuncheva, L.I.: Examining the relationship between majority vote ac-
curacy and diversity in bagging and boosting (2003), http://www.informatics.
bangor.ac.uk/kuncheva/papers/lkcw_tr.pdf
87. Zebrowitz, L.A.: Reading faces: Window to the soul? Westview Press, Boulder (1998)
88. Zebrowitz, L.A., Montepare, J.M.: Impressions of babyfaced individuals across the life
span. Dev. Psychol. 6, 1143–1152 (1992)
89. Zebrowitz, L.A., Montepare, J.M.: Social psychological face perception: Why appear-
ance matters. Soc. and Personality Psychol. Compass 3, 1497–1517 (2008)
90. Zebrowitz, L.A., Montepare, J.M., Lee, H.K.: They don’t all look alike: Individuated
impressions of other racial groups. J. Personal and Soc. Psychol. 1, 85–101 (1993)
91. Zenobi, G., Cunningham, P.: Using diversity in preparing ensembles of classifiers based
on different feature subsets to minimize generalization error. In: 12th Conference on
Machine Learning, pp. 576–587 (2001)
92. Zhao, W., Chellappa, R., Phillips, P.J., Rosenfeld, A.: Face recognition: A literature sur-
vey. ACM Comp. Surv. 4, 399–458 (2000)
The Analysis of Crowd Dynamics: From
Observations to Modelling
1 Introduction
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 441–472.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
442 B. Zhan et al.
Fig. 1 The example frames and the built background images from three different scenes. Left
to right: three different scenes; top to bottom, three example frames and the built background
images, respectively
The Analysis of Crowd Dynamics: From Observations to Modelling 443
Optical flow (or optic flow) is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. The survey by Beauchemin [11] investigates existing optical flow techniques, including: 1) differential methods; 2) frequency-based methods; 3) correlation-based methods; 4) multiple motion methods; and 5) template refined methods.
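To illustrate the differential class of methods, the sketch below estimates flow at a single pixel in the Lucas-Kanade style. It is a minimal, pure-Python stand-in under the brightness-constancy assumption; practical implementations add Gaussian smoothing and image pyramids:

```python
def lucas_kanade(frame1, frame2, cx, cy, win=2):
    """Differential optical flow at pixel (cx, cy), Lucas-Kanade style.

    Assumes brightness constancy Ix*u + Iy*v + It = 0 and solves the
    2x2 least-squares system over a (2*win+1)^2 window. Frames are 2D
    lists of grey values; returns (u, v), or None when the system is
    singular (the aperture problem).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - win, cy + win + 1):
        for x in range(cx - win, cx + win + 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # d/dx
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # d/dy
            it = frame2[y][x] - frame1[y][x]                  # d/dt
            sxx += ix * ix; sxy += ix * iy; syy += iy * iy
            sxt += ix * it; syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# A synthetic pattern moving one pixel to the right between two frames:
f1 = [[x * y for x in range(12)] for y in range(12)]
f2 = [[(x - 1) * y for x in range(12)] for y in range(12)]
```

On this synthetic pair, `lucas_kanade(f1, f2, 6, 6)` recovers a flow of approximately (1, 0).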
Section 4 describes the methods used to model crowd dynamics. First a statistical method is introduced, which focuses on defining the main paths of the crowded scene [95]. Then a neural network based approach is proposed to capture the crowd dynamics while reducing the dimensionality of the input data. The self-organizing map technique is employed for this purpose, and results have been generated for different types of crowded scenes. Section 5 discusses the obtained results and sheds some light on future directions of work on crowd analysis.
2 Background
Steady population growth, along with worldwide urbanization, has made the crowd phenomenon more frequent. It is not surprising, therefore, that crowd analysis has received attention from both technical and social research disciplines. The crowd phenomenon is of great interest in a large number of applications:
Crowd Management: Crowd analysis can be used for developing crowd management strategies, especially for increasingly frequent and popular events such as sports matches, large concerts, public demonstrations and so on, to avoid crowd related disasters and ensure public safety.
Public Space Design: Crowd analysis can provide guidelines for the design of public spaces, e.g. to make the layout of shopping malls more convenient for customers or to optimize the space usage of an office.
police or military groups [22]. The Police Academy of the Netherlands and School
of Psychology of University of Liverpool are cooperating on a project funded by the
UK Home Office: “A European study of the interaction between police and crowds
of foreign nationals considered to pose a risk to public order” [1].
On the other hand, computational methods, such as those employed in computer
graphics and vision methods, focus on extracting quantitative features and detecting
events in crowds, synthesizing the phenomenon with mathematical and statistical
models. For example, an early project funded by the EPSRC in the UK was con-
cerned with measuring crowd motion and density and hence potentially dangerous
situations [25] [87] [93]. The EU funded projects PRISMATICA [75] and ADVI-
SOR [2], completed in 2003, were concerned with the management of public trans-
port networks through CCTV cameras. The UK EPSRC funded project BEHAVE,
was concerned with the pre-screening of video sequences for the detection of abnormal or crime-oriented behaviour [12]. ISCAPS [42], started in 2005 by a consortium of 10 European ICT companies and academic organizations, aims to provide automated surveillance of crowded areas. SERKET, a recently started EU project, aims to develop methods to prevent terrorism [40].
Figure 2 illustrates the processes involved in crowd analysis. In a crowd scene
the attributes of importance are crowd density, location, speed, etc. This information
can be extracted either manually or automatically using computer vision techniques.
Crowd models can then be built based on the extracted information. Event discovery
is achieved using pre-compiled knowledge of the scene or using the computational
model, although both approaches can be combined. In both cases the model is up-
dated with newly extracted information.
changes often introduce noise; the scene typology affects the type of process required to extract the most accurate information from a dynamic scene.
Visual surveillance methods have been developed to estimate the motion of objects and people in the scene, in isolation or in groups; a review can be found in [37]. When video of very crowded scenes is analysed, conventional computer vision methods are not appropriate; in these cases, methods must be designed to cope with extreme clutter. Features from conventional image processing, such as colour, shape and texture, are still employed, but more sophisticated methods have been developed to retrieve crowd information. The following sections review the existing state of the art.
An important crowd feature is crowd density, and it is natural to think that crowds of different densities should receive different levels of attention.
Research methods have been proposed for crowd analysis which employ background removal techniques, such as [93], [61] and [26]. These studies make use of examples to map global shape features directly to configurations of humans, and work under the typical assumption that the number of foreground pixels is proportional to the number of people, which holds only when there are no serious occlusions between people.
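The foreground-proportionality assumption can be sketched as follows: build a background by a per-pixel median over frames, count pixels that deviate from it, and scale by an average person size. The threshold and the `pixels_per_person` calibration constant here are illustrative, not values from the cited studies:

```python
def median_background(frames):
    """Per-pixel median over a set of frames: a simple built background."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[sorted(f[y][x] for f in frames)[len(frames) // 2]
             for x in range(w)] for y in range(h)]

def estimate_crowd_size(frame, background, thresh=20, pixels_per_person=40):
    """Count foreground pixels against the background, then scale by an
    assumed average person footprint (an invented calibration constant).
    Valid only while people do not seriously occlude each other."""
    fg = sum(abs(frame[y][x] - background[y][x]) > thresh
             for y in range(len(frame)) for x in range(len(frame[0])))
    return fg / pixels_per_person
```

In practice the person-footprint constant would be calibrated from labelled frames of the target scene.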
Image processing and pattern recognition techniques are also used for the anal-
ysis of the scene to estimate the crowd density. Marana et al. [64] assume that im-
ages of low-density crowds tend to present coarse texture, while images of dense
crowds tend to present fine textures. Self-organizing neural maps [65] combined
with Minkowski fractal dimensions [63] are employed to deduce the crowd density
from the texture of the image. The work by Marana is compared in [76] with an-
other method that uses Chebyshev moments. An optimization of performance under
different illumination conditions is discussed. Lin et al. [59] present a system that
estimates the crowd size through the recognition of the head contour using Haar
wavelet transform (HWT) and support vector machines (SVM).
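The coarse-versus-fine texture intuition behind these methods can be illustrated with a deliberately crude proxy. The sketch below (not the SOM, fractal-dimension or HWT/SVM techniques cited above, and with an arbitrary gradient-energy threshold) simply measures the fraction of strong local grey-level transitions:

```python
def texture_fineness(img):
    """Fraction of pixels with a strong local grey-level transition.

    A crude proxy for the assumption that dense crowds present fine
    texture (many strong local transitions) while sparse scenes present
    coarse texture. The threshold of 100 on the squared gradient is an
    illustrative choice.
    """
    h, w = len(img), len(img[0])
    strong = total = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]   # forward differences
            gy = img[y + 1][x] - img[y][x]
            total += 1
            if gx * gx + gy * gy > 100:
                strong += 1
    return strong / total

# Fine texture (checkerboard) vs coarse texture (smooth ramp):
fine = [[255 * ((x + y) % 2) for x in range(10)] for y in range(10)]
smooth = [[x for x in range(10)] for y in range(10)]
```

On these examples the checkerboard scores far higher than the smooth ramp, matching the fine-texture/dense-crowd assumption.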
Alternative methods combine several techniques to achieve more accurate and reliable measurements. For example, in [87], an edge-based technique is integrated
with background removal using a Kalman filter. Marana et al. [62] use different
methods including Fourier and Fractal analysis and classifiers to estimate the crowd
density level. Kong et al. in [52] [53] employ background subtraction and edge detection; their work uses the extracted edge orientation and blob size histograms as features. The relationship between the feature histograms and the number of pedestrians is learned from labelled training data. Intuitively, combining more cues can lead to a more accurate solution.
2.1.2 Recognition
Conventional visual surveillance focuses on object detection and tracking. In
essence, image processing techniques are employed to extract the chromatic and
The Analysis of Crowd Dynamics: From Observations to Modelling 447
shape information of the moving objects and the background for detecting and track-
ing purposes.
For crowd dynamics modelling, detecting and tracking are also important as they
provide the location and velocity features of the dynamics. Crowded scenes add
a degree of complexity to the conventional detection and tracking problem of sin-
gle individuals. In the following sections the focus will be on methodologies for
crowded situations.
The face is the most discriminating feature of the human body, and many researchers try to detect pedestrians through face detection. The majority of the existing research employs supervised learning methods to detect faces in crowded situations, for example [85], [58], [43], [38].
Pedestrian detection and tracking is a well-studied problem in computer vision. Many methods have been proposed, such as using the aforementioned background removal technique, or combining chromatic and shape information of the tracked pedestrians. The following sections discuss methods that try to provide a solution for pedestrian detection in crowded scenes.
Occlusion, caused by the high clutter of pedestrians in a crowd scene, is the major challenge for crowd detection. Research is being carried out to address the problem by using human body parts, for example [91], [28], [57]. Besides conventional cues of pedestrian appearance, spatio-temporal cues are also used for detection. Brostow et al. [17] tackle the problem by tracking simple image features and probabilistically grouping them into clusters representing independently moving entities. In extremely cluttered scenes, individual pedestrians cannot be properly segmented in the image. However, sometimes a crowd within which the pedestrians share a similar purpose can be recognized. Reisman et al. [79] propose a scheme
that uses slices in the spatial-temporal domain to detect inward motion as well as
intersections between multiple moving objects. The system calculates a probabil-
ity distribution function for left and right inward motion and uses these probability
distribution functions to infer a decision for crowd detection.
2.1.3 Tracking
Tracking has been proposed to localize the interested object in time-space. Also
the velocity feature can be derived afterwards. Though as a natural extension of
detection, tracking has its own problem to recognize and identify pedestrians in
the consecutive frames. Tracking could be regarded as the most popular topic in
visual surveillance, however currently for crowd analysis, most of the techniques
are validated only for multiple (e.g. up to 10) people.
As discussed in the last subsection, occlusions occur very frequently when there are many objects and people in the scene. Tracking techniques have to overcome this problem in order to track continuously before, during and after the occurrence of occlusions. A comprehensive review of occlusion handling can be found in [30]. A formulation of the occlusion problem is provided, and the techniques are divided into two groups: the merge-split approach, which addresses the
The first method employs a modified version of the Harris interest point detector [31]. The Harris interest point detector provides a repeatable and distinctive descriptor of the image features and is view-point and illumination invariant. The detector extracts feature points making use of the three chromatic channels, combined in the matrix M:

    M = G(σ) ⊗ ( Cx·Cx   Cx·Cy )
                ( Cy·Cx   Cy·Cy )                                    (1)
In this operation the image is first smoothed using a standard Gaussian operator (of deviation σ). Cx and Cy are, respectively, the gradients in the x and y directions of the pixel chromatic triplet. They are estimated by applying the Gaussian derivative operator G(σ) to the smoothed image; this is efficiently implemented using the method from [89]. The interest points are then extracted using the term R, which is calculated as a combination of the eigenvalues λ1 and λ2 of the matrix M:

    R = det(M) − κ · trace(M)² = λ1 λ2 − κ (λ1 + λ2)²                (2)

where κ is a constant with 0.04 ≤ κ ≤ 0.06. The points with a local maximum of R are selected as interest points. A multi-scale approach is used, generating the interest points at the lowest (finest scale) layer and then projecting them up to the top (coarsest scale) layer of the generated pyramid.
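The eigenvalue combination behind R can be sketched in a few lines. This is a grey-level, single-scale stand-in: the chapter's detector uses the three chromatic channels, Gaussian smoothing and a pyramid, whereas here a plain 3x3 window sum replaces the smoothing:

```python
def harris_response(img, k=0.05):
    """Per-pixel Harris response R = det(M) - k * trace(M)^2, i.e. the
    eigenvalue combination lam1*lam2 - k*(lam1 + lam2)^2 of the
    windowed structure matrix M (k chosen inside the 0.04-0.06 range).
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):            # 3x3 window sum stands in
                for dx in (-1, 0, 1):        # for the Gaussian G(sigma)
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx; sxy += gx * gy; syy += gy * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r

# A bright square whose top-left corner sits at (6, 6):
img = [[255 if (x >= 6 and y >= 6) else 0 for x in range(12)]
       for y in range(12)]
```

On this test image R is positive at the square's corner, negative along its straight edges, and zero in flat regions, which is exactly the behaviour the local-maximum selection exploits.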
The matching is carried out in two steps: searching for the candidate matching points by similarity, and then applying the topological constraints described later.
The second method is developed using local descriptors, but also incorporates shape information. Inspired by the methodology used in deformable object tracking, edge information is extracted and descriptor points are taken as points of local maximum curvature along an edge. The information about an edge is maintained and used to impose the edgelet constraint and refine the estimate. Thus the advantage of point features, which are flexible to track, and the advantage of edge features, which maintain structural information, are combined here.
The Canny edge detector is employed to extract the edge information of a given
frame. Each Canny edge is a chain of points, and all the edges are stored in an edge
list. Figure 4 shows an example image frame and the extracted edge chains with associated bounding boxes. It can be observed that, even in a scene depicting a crowd of moderate density, edge chains can occlude each other, increasing the descriptor matching complexity.
The Canny edge detector is an approach which is optimal for a step edge corrupted by white noise. The optimality of the detector relates to three criteria. The detection criterion demands a low error rate: edges occurring in images should not be missed, and there should be no responses to non-edges. The second criterion is that the edge points be well localized: the distance between the edge pixels found by the detector and the actual edge is to be minimal. A third criterion is to minimize multiple responses to a single edge. Based on these criteria, the Canny edge detector was proposed and has become one of the most popular edge detectors [82].
Interest points can be quickly extracted for a sequence of frames, for instance with the Harris corner operator used in the previous section. However, Harris interest points only represent the local characteristics of an image in isolation, and the shape information of the moving person/people is lost. In this implementation the interest points are therefore taken from the edges, and the constraint is imposed that they lie on a specific edge. Each edge can be represented by a parameterized curve:
x = x(t), (3)
y = y(t). (4)
    κ = (X′ Y″ − Y′ X″) / (X′² + Y′²)^(3/2)                          (8)

where primes denote derivatives with respect to t.
Corner points are defined and extracted as the local maxima of the absolute value
of curvature on each edge. Thus the edge representation is changed from a point
sequence to a corner point sequence, resulting in a list of corner point sequences for
all the edges of the image.
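The corner extraction step can be sketched directly from Eq. (8): approximate the derivatives of the point chain by finite differences, and keep the local maxima of |κ|. This is an illustrative reconstruction, not the authors' implementation:

```python
def curvature(points):
    """|kappa| at each interior point of a chain, via finite differences
    of the parameterized curve x(t), y(t) (cf. Eq. 8):
        kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
    """
    ks = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        dx, dy = (x2 - x0) / 2.0, (y2 - y0) / 2.0      # x'(t), y'(t)
        ddx, ddy = x2 - 2 * x1 + x0, y2 - 2 * y1 + y0  # x''(t), y''(t)
        denom = (dx * dx + dy * dy) ** 1.5 or 1e-12    # guard degenerate chains
        ks.append(abs(dx * ddy - dy * ddx) / denom)
    return ks

def corner_indices(points):
    """Indices of interior points that are local maxima of |curvature|."""
    ks = curvature(points)
    return [i + 1 for i in range(1, len(ks) - 1)
            if ks[i] > ks[i - 1] and ks[i] >= ks[i + 1]]
```

For an L-shaped chain such as `[(0,0), (1,0), (2,0), (3,0), (3,1), (3,2), (3,3)]`, the single corner point at the bend (3, 0) is returned.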
Given two consecutive frames It and It+1, the motion is estimated for each extracted point of interest. For each corner point with coordinate (x, y) in It, a rectangular search window is defined centred at (x, y) in It+1. A look-up table (LUT) containing corner points and edge information is generated to enhance the matching. Correspondence is established by comparing, via the LUT, the curvature of the corner points in the search window against that of the reference point; the matching error is computed from the curvature.
Complex dynamics and frequent occlusions generated in crowd scenes make the
estimation of motion a very complex task. Point matching in isolation is too fragile
and prone to errors to provide a good motion estimator. If the interest points are
extracted on edge chains, then the edge constraint can be imposed and used.
For an image frame It , every edge is split into sub-sequences of uniform length,
so-called edgelets. There are two reasons for doing this:
to avoid very long edges that may span several different objects, and
to improve the matching of edge fragments that are produced by occlusions.
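A sketch of the splitting step, with the edgelet length and the rule for discarding a short trailing fragment both assumed for illustration:

```python
def split_into_edgelets(edge_points, length=10):
    """Split one edge (a sequence of points) into uniform-length edgelets.

    This avoids matching one long edge that spans several objects and
    helps match the edge fragments produced by occlusions. The trailing
    remainder is kept only if it is long enough to match meaningfully
    (the half-length threshold is an assumption)."""
    edgelets = [edge_points[i:i + length]
                for i in range(0, len(edge_points), length)]
    if edgelets and len(edgelets[-1]) < length // 2:
        edgelets.pop()
    return edgelets
```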
For each corner point there are n candidate matching points. Each candidate point
belongs to an edgelet, so there are m (m ≤ n) candidate matching edgelets. To
find the best match, three parameters are computed for each candidate: the energy
cost, the variation of the displacements and the match length; these are combined into a single
matching score. The edgelet length is assumed to be small enough that an edgelet
does not split again into two or more matches, so that its candidate points
all correspond to the same candidate sequence.
The matching is carried out over every point of the given edgelet, and an overall
matching will be examined to determine the matched edgelet.
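The chapter names the three parameters but not the formula that combines them, so the weighted combination below is purely illustrative:

```python
import numpy as np

def matching_score(energy_cost, displacements, match_length,
                   w_energy=1.0, w_var=1.0, w_len=1.0):
    """Combine the three per-candidate measures named in the text into a
    single score (lower = better). The weighting scheme is an
    illustrative assumption; the chapter does not give the exact formula.

    energy_cost:   accumulated curvature-matching error along the edgelet
    displacements: list of (dx, dy) vectors of the matched point pairs
    match_length:  number of points successfully matched
    """
    d = np.asarray(displacements, dtype=float)
    variation = float(d.std(axis=0).sum()) if len(d) else 0.0
    # longer matches are more reliable, so match length reduces the score
    return w_energy * energy_cost + w_var * variation - w_len * match_length

def best_match(candidates):
    """Pick the candidate edgelet with the lowest combined score."""
    return min(candidates, key=lambda c: matching_score(*c))

a = (1.0, [(1, 0), (1, 0), (1, 0)], 3)   # consistent displacements, long match
b = (0.5, [(1, 0), (3, 2)], 2)           # cheaper energy but erratic motion
```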
Fig. 5 Two scenes of different complexity levels are illustrated. The original frames (left)
and the extracted corner points (right) which are marked with red crosses on grey edges
The two motion estimation methods described above were validated and compared. In both
algorithms, constraints are applied to improve the robustness of the matching between
individual descriptors. The first algorithm carries out a local check of the
spatio-temporal consistency of the colour gradient, supported by local topology
constraints; the second uses the points of locally extreme curvature along
Canny edges and applies contour constraints.
In this test only the quality of matching of individual local descriptors is consid-
ered. For each pair of consecutive frames, local descriptors in the initial frame are
compared with their corresponding local descriptors, found by the two presented al-
gorithms, in the target/second frame, respectively. Two measures, Mean Similarity
(MS) and Mean Absolute Error (MAE), are used here.
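The chapter does not define MS and MAE in closed form; the sketch below assumes MS is the mean normalized correlation between matched descriptor pairs, and MAE the mean absolute component-wise error:

```python
import numpy as np

def mean_similarity(desc_a, desc_b):
    """Mean Similarity (MS) over matched descriptor pairs, here taken as
    the mean normalized correlation (an assumption: the chapter does not
    spell out its similarity function)."""
    sims = []
    for a, b in zip(desc_a, desc_b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom else 0.0)
    return float(np.mean(sims))

def mean_absolute_error(desc_a, desc_b):
    """Mean Absolute Error (MAE) between matched descriptor pairs."""
    return float(np.mean([np.mean(np.abs(np.asarray(a, float) -
                                         np.asarray(b, float)))
                          for a, b in zip(desc_a, desc_b)]))
```

Computed for every pair of consecutive frames, these give one MS and one MAE value per frame, which is what the plots in Figure 7 show over time.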
Figure 7 shows the plots of MS and MAE for the two algorithms
tested on the three sequences. MS and MAE are calculated for every frame along
each sequence. In each plot the x axis represents time (the frame number) and
the y axis the value of MS or MAE, respectively. For both
algorithms the MS and MAE on the three test sequences are good, though
in most cases the second algorithm has a higher MS and a lower MAE. Moreover,
along the time scale the MS and MAE produced by the first algorithm fluctuate
considerably, while the second produces more stable results. It can be concluded that the
second algorithm performs better than the first.
Fig. 7 MS (top row) and MAE (bottom row) over time for the three test sequences (from top to bottom: sequence 1, sequence 2 and sequence 3);
red lines for Algorithm 1, green lines for Algorithm 2. Algorithm 2 maintains a higher MS and a lower MAE
Fig. 8 Recall (top row) and Precision (bottom row) over time for the three test sequences (from top to bottom: sequence 1, sequence 2 and sequence
3); red lines for Algorithm 1, green lines for Algorithm 2. Algorithm 2 has higher values of Recall
B. Zhan et al.
The Analysis of Crowd Dynamics: From Observations to Modelling 459
Fig. 9 Number of MCCs over time for the three test sequences, red lines for Algorithm 1,
green lines for Algorithm 2 (from top to bottom: sequence 1, sequence 2 and sequence 3).
Algorithm 2 detects many more MCCs in all three video sequences
Precision for sequence 3 is lower than for the other two sequences. One possible reason
is that, since sequence 3 is a far-field view of a crowded scene, when the
bounding box of an MCC is mapped to the second frame, local descriptors of other MCCs
may be included, introducing noise.
When comparing the Recall results, it can be seen that the values for Algorithm 2
are always higher, though for sequences 2 and 3 the Precision values for Algorithm 1
are slightly higher. Here another measure should be taken into consideration:
the number of MCCs detected by each algorithm. According to the plots
in Figure 9, in sequence 1 the average number of MCCs detected by Algorithm 1
is around 20, while for Algorithm 2 it is around 100; in sequence 2 the
numbers are around 20 and 200, respectively; and in sequence 3 around
40 and 280, respectively. Algorithm 2 detects many more MCCs, especially in
sequences 2 and 3. Given this, and the fact that Algorithm 2 also produces higher
Recall, it can be deduced that its slightly lower Precision merely indicates that
more noise has been introduced into the assessment.
For each frame, foreground features are accumulated for every pixel, so that after
a relatively long video sequence the accumulator of foreground occurrence
over the whole image carries meaningful information.
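This accumulation step can be sketched directly; the normalization to an occurrence frequency per pixel is an illustrative choice:

```python
import numpy as np

def accumulate_foreground(foreground_masks):
    """Accumulate per-pixel foreground occurrence over a video sequence.

    Each mask is an H x W array of 0/1 values (1 = foreground). After a
    long enough sequence the accumulator indicates where in the image
    the crowd tends to occur."""
    masks = np.asarray(foreground_masks, dtype=float)
    counts = masks.sum(axis=0)
    return counts / len(masks)        # occurrence frequency per pixel
```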
features of the human brain, which represent different sensory input by topologi-
cally ordered computational maps. SOMs are widely used in mapping multidimen-
sional data onto a low-dimensional map. Examples of applications include the analysis
of banking data, linguistic data [50] and image classification [55]. This section
proposes a system that learns crowd dynamics with a SOM. The system takes
dynamics information as input and generates a SOM that captures the dominant
recurrent dynamics.
The most common SOMs have neurons organized as nodes in a one- or two-
dimensional lattice. The neurons of a SOM are activated by input patterns in
the course of a competitive learning process. At any moment in time only one
output neuron is active, the so-called winning neuron. Input patterns come from
an n-dimensional input space and are mapped to the one- or two-dimensional
output space of the SOM. Every neuron has a weight vector that belongs to the
input space.
The desirable SOM in this application should capture the two major components
of crowd dynamics: occurrence and orientation. Thus a four-dimensional input
space is chosen as the weight space of the SOM, which can be represented
as f : (x, y, θ , ρ ). Each datum from the input space can be interpreted as the location
where the crowd moves, together with the motion vector in the form of angle (θ ) and magnitude
(ρ ). The SOM used in this experiment is organized in a two-dimensional space and
represented by a square lattice.
There are two phases in tuning the SOM with an input pattern I: competing
and updating. In the competing phase every neuron is compared with I: the
similarity of I to the weights of all the neurons is computed, and the neuron
N(iw , jw ) (denoted by the neuron's lattice coordinates) with the highest similarity
is selected as the winning neuron. In the updating phase, for each neuron N(i, j), a
distance is calculated as:
d^{2} = (i - i_{w})^{2} + (j - j_{w})^{2} (9)

h(n) = \exp\left(-\frac{d^{2}}{2\sigma^{2}(n)}\right) (10)
where n denotes the time, which can also be read as the number of iterations,
and σ 2 (n) decreases with time. The weight of each neuron N(i, j) at time n + 1
is then defined by:

w(n+1) = w(n) + \eta(n)\, h(n)\, \big(I - w(n)\big) (11)

where w(n) and w(n + 1) are the weights of the neuron at times n and n + 1, and η (n) is the
learning-rate function, which always decreases with time.
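The competing and updating phases can be sketched end to end. The lattice distance and neighbourhood function follow Eqs. (9) and (10); the weight update uses the standard Kohonen form w(n+1) = w(n) + η(n) h(n) (I − w(n)); the decay schedules for σ and η are assumptions, since the chapter leaves them unspecified:

```python
import numpy as np

def train_som(inputs, lattice=8, dim=4, iters=2000, seed=0):
    """Minimal SOM implementing the competing/updating phases of the
    text: winner selection, lattice distance (Eq. 9), Gaussian
    neighbourhood (Eq. 10) and the standard Kohonen weight update.
    The decay schedules for sigma and eta are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    w = rng.random((lattice, lattice, dim))          # weights live in input space
    ii, jj = np.meshgrid(np.arange(lattice), np.arange(lattice),
                         indexing="ij")
    for n in range(iters):
        I = inputs[rng.integers(len(inputs))]
        sigma = lattice / 2.0 * np.exp(-n / iters)   # sigma(n) decreases with time
        eta = 0.5 * np.exp(-n / iters)               # eta(n) decreases with time
        # competing phase: the winner is the most similar neuron
        iw, jw = np.unravel_index(
            np.argmin(((w - I) ** 2).sum(axis=2)), (lattice, lattice))
        # updating phase
        d2 = (ii - iw) ** 2 + (jj - jw) ** 2         # Eq. (9)
        h = np.exp(-d2 / (2.0 * sigma ** 2))         # Eq. (10)
        w += eta * h[..., None] * (I - w)            # Kohonen update
    return w
```

With a constant input, the best-matching neuron converges to that input, which is the minimal sanity check for the competing/updating loop.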
4.3.2 Visualization
Figure 10 illustrates three different video sequences with different dynamics. These
video sequences have been input into the system, and Figure 11 shows the output
SOMs. In the figure the SOMs are visualized in the input space, i.e. by showing the weight
vector of each neuron. In the visualization, the coloured arrows and their locations
come from the weight vectors of the neurons: the location of each arrow is given by the first
two components of the weight vector (x, y), and the arrow itself shows the last two
components, the motion components (θ , ρ ). The different colours of the arrows
also indicate the different orientations of the motion.
In the first video (the left column in Figure 10) the major crowd is moving from
bottom left to top right of the scene. There is another crowd flow from bottom right
of the scene which joins the major flow. In its SOM (the first one in Figure 11)
the neurons with green arrows are clearly from the major flow and the ones with
red and purple arrows are from the minor flow. In the second video (the middle
column in Figure 10) the scene is an entrance to a public space, so most of the
people move from the top to the bottom of the scene. The crowd in the upper part of the
scene is sparser and moves faster than the crowd in the lower part
of the scene. There is also a minor flow, which joins the major flow from the right
of the scene. In the resulting SOM (the second SOM in Figure 11), again the flows are
clearly indicated. Furthermore, the SOM takes an umbrella shape, which represents
the shape of the flow as constrained by the obstacles in the scene. In the third video
(the right column in Figure 10) the scene is a large open area with multiple crowd
flows. The major flow is moving from right to left; however there are several minor
flows, most of which are in the lower part of the scene. Again the SOM (the third in
Figure 11) captures the major dynamics and also some minor flows. From the three
examples, it can be concluded that the SOMs not only preserves the dominant mo-
tion vector, but also represents the shape of the regions with dominant motion of the
scenes.
5 Discussion
This chapter has described novel methods for an intelligent system that can au-
tomatically analyze crowd phenomena. The methods are based on computer vision
techniques combined with statistical techniques and a neural network. In particular,
local-descriptor matching with refined constraints is proposed to tackle the problem
of crowd motion measurement. Two novel algorithms for estimating the motion
of a crowd in complex scenes are presented, evaluated and compared in this chapter.
The first algorithm employs Harris corner points, with topological constraints applied
to make the matching of the points more robust. The second algorithm makes
use of shape information: local curvature maxima are used as local descriptors,
and edgelet constraints are enforced for the refined matching.
Statistical methods using Probability Density Functions (PDFs) are employed to learn the
crowd dynamics by mining the main paths of the crowded scene. Two PDFs (PDFocc
and PDFor ) are generated during this process. A path-recovering method is developed
by calculating the probability along the path using the PDFs. The results show
that this is a simple approach yielding reasonable results. Another approach to
learning crowd dynamics adapts Self-Organizing Maps to capture the main recurrent
dynamics. There are several possible extensions of this work. In particular,
for the latter approach, analyzing the organization of the SOM would make it possible
to understand the characteristics of the dynamics. Also, the development of a metric for
comparing SOMs could be very useful for enhancing the automatic classification of
crowded scenes.
Acknowledgements. This work was partially supported by the British Telecom Group PLC.
References
1. Adang, O.M., Stott, C.: A European study of the interaction between police and
crowds of foreign nationals considered to pose a risk to public order, http://
policestudies.homestead.com/Euro2004.html
2. ADVISOR: http://advisor.matrasi-tls.fr/
3. AEA Technology: A technical summary of the AEA EGRESS code. Technical Report 1
(2002)
4. Andrade, E., Fisher, R.: Simulation of crowd problems for computer vision. In: First
International Workshop on Crowd Simulation, vol. 3, pp. 71–80 (2005)
5. Andrade, E., Fisher, R.: Hidden Markov models for optical flow analysis in crowds. In:
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006),
Washington, DC, USA, vol. 01, pp. 460–463. IEEE Computer Society, Los Alamitos
(2006)
6. Andrade, E., Fisher, R.: Modelling crowd scenes for event detection. In: Proceedings
of the 18th International Conference on Pattern Recognition (ICPR 2006), vol. 01, pp.
175–178. IEEE Computer Society, Washington (2006)
7. Andrade, E.L., Blunsden, S., Fisher, R.B.: Performance analysis of event detection
models in crowded scenes. In: Proc. Workshop on Towards Robust Visual Surveillance
Techniques and Systems at Visual Information Engineering 2006, Bangalore, India, pp.
427–432 (2006)
8. Antonini, G., Bierlaire, M., Weber, M.: Simulation of pedestrian behaviour using a
discrete choice model calibrated on actual motion data. In: 4th STRC Swiss Transport
Research Conference, Ascona (2004)
9. Antonini, G., Venegas, S., Thiran, J.P.: A discrete choice pedestrian behaviour model in
visual tracking systems. In: Advanced Concepts for Intelligent Vision Systems, Brus-
sels, Belgium, pp. 273–280 (2004)
10. Banerjee, S., Grosan, C., Abraham, A.: Emotional ant based modeling of crowd dynam-
ics. In: Seventh International Symposium on Symbolic and Numeric Algorithms for
Scientific Computing (SYNASC 2005), pp. 279–286 (2005)
11. Beauchemin, S., Barron, J.: The computation of optical flow. ACM Computing Surveys
(CSUR) 27(3), 433–466 (1995)
12. BEHAVE: http://www.homepages.informatics.ed.ac.uk/rbf/
BEHAVE/
13. Blackman, S.: Multiple hypothesis tracking for multiple target tracking. IEEE
Aerospace and Electronic Systems Magazine 19(1), 5–18 (2004)
14. Boghossian, B., Velastin, S.: Motion-based machine vision techniques for the man-
agement of large crowds. In: The 6th IEEE International Conference on Electronics,
Circuits and Systems, vol. 2 (1999)
15. Bouguet, J.: Pyramidal Implementation of the Lucas Kanade Feature Tracker Descrip-
tion of the algorithm. Intel Corporation, Microprocessor Research Labs (2000)
16. Brenner, M., Wijermans, N., Nussle, T., de Boer, B.: Simulating and controlling civilian
crowds in robocup rescue. In: Proceedings of RoboCup 2005: Robot Soccer World Cup
IX. Osaka (2005)
17. Brostow, G., Cipolla, R.: Unsupervised Bayesian Detection of Independent Motion in
Crowds. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, vol. 1, pp. 594–601. IEEE Computer Society, Wash-
ington (2006)
18. Cai, Y., de Freitas, N., Little, J.J.: Robust visual tracking for multiple targets. In:
Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 107–
118. Springer, Heidelberg (2006)
19. Chan, M.T., Hoogs, A., Bhotika, R., Perera, A., Schmiederer, J., Doretto, G.: Joint
recognition of complex events and track matching. In: CVPR 2006: Proceedings of the
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recogni-
tion, pp. 1615–1622. IEEE Computer Society, Washington (2006) http://dx.doi.
org/10.1109/CVPR.2006.160
20. Chang, T., Gong, S., Ong, E.: Tracking multiple people under occlusion using multiple
cameras. In: British Machine Vision Conference, pp. 566–575 (2000)
21. Crowd Dynamics: http://www.crowddynamics.com/
22. Crowd MAGS: http://www2.ift.ulaval.ca/muscamags/
Dnd-crowdmags-project.htm
23. Cupillard, F., Bremond, F., Thonnat, M.: Behaviour recognition for individuals, groups
of people and crowd. IEE Seminar Digests 7 (2003)
24. Cupillard, F., Bremond, F., Thonnat, M., INRIA, F.: Group behavior recognition with
multiple cameras. In: Sixth IEEE Workshop on Applications of Computer Vision, 2002
(WACV 2002). Proceedings, pp. 177–183 (2002)
25. Davies, A., Yin, J., Velastin, S.: Crowd monitoring using image processing. Electronics
& Communication Engineering Journal 7(1), 37–47 (1995)
26. Dong, L., Parameswaran, V., Ramesh, V., Zoghlami, I.: Fast Crowd Segmentation Using
Shape Indexing, Rio de Janeiro, Brazil (2007)
27. Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for
Bayesian filtering (2000)
28. Elgammal, A., Davis, L.: Probabilistic framework for segmenting people under occlu-
sion. In: Eighth IEEE International Conference on Computer Vision, 2001. ICCV 2001.
Proceedings, vol. 2, pp. 145–152 (2001)
29. Gabriel, P., Hayet, J., Piater, J., Verly, J.: Object tracking using color interest points. In:
Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, pp.
159–164 (2005)
30. Gabriel, P., Verly, J., Piater, J., Genon, A.: The state of the art in multiple object tracking
under occlusion in video sequences. Advanced Concepts for Intelligent Vision Systems,
166–173 (2003)
31. Gouet, V., Boujemaa, N.: About optimal use of color points of interest for content-based
image retrieval. Technical Report pp. RP–4439 (2002)
32. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper
Saddle River (1994)
33. Helbing, D., Farkas, I., Vicsek, T.: Simulating Dynamical Features of Escape Panic.
Letters to Nature 407, 487–490 (2000)
34. Helbing, D., Molnár, P.: Social force model for pedestrian dynamics. Physical Review
E 51(5), 4282–4286 (1995)
35. Helbing, D., Molnar, P.: Self-organization phenomena in pedestrian crowds
(1997), http://www.citebase.org/abstract?id=oai:arXiv.org:
cond-mat/9806152
36. Horn, B., Schunck, B.: Determining Optical Flow. Artificial Intelligence 17(1-3), 185–
203 (1981)
37. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object
motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics, Part C:
Applications and Reviews 34(3), 334–352 (2004)
38. Huang, C., Ai, H., Li, Y., Lao, S.: Vector boosting for rotation invariant multi-view
face detection. In: Tenth IEEE International Conference on Computer Vision, vol. 1,
pp. 446–453 (2005)
39. Hughes, R.: A continuum theory for the flow of pedestrians. Transportation Research
Part B: Methodological 36(6), 507–535 (2002)
40. INRIA: http://www.inria.fr/rapportsactivite/RA2005/orion/
uid1.html
41. Isard, M., Blake, A.: A mixed-state CONDENSATION tracker with automatic model-
switching. In: IEEE International Conference on Computer Vision, pp. 107–112 (1998),
http://citeseer.ist.psu.edu/isard98mixedstate.html
42. ISCAPS: http://www.iscaps.reading.ac.uk/home.htm
43. Jones, M., Viola, P.: Fast multi-view face detection. Mitsubishi Electric Research Lab
TR-20003-96 (2003)
44. Kang, H., Kim, D., Bang, S.: Real-time multiple people tracking using competitive
condensation. Proc. of the Intl. Conference on Pattern Recognition 1, 413–416 (2002)
45. Karlsson, R., Gustafsson, F.: Monte Carlo data association for multiple target tracking.
Target Tracking: Algorithms and Applications (Ref. No. 2001/174), IEE 1 (2001)
46. Khan, S.M., Shah, M.: A multiview approach to tracking people in crowded scenes
using a planar homography constraint. In: Leonardis, A., Bischof, H., Pinz, A. (eds.)
ECCV 2006. LNCS, vol. 3954, pp. 133–146. Springer, Heidelberg (2006)
47. Khan, Z., Balch, T., Dellaert, F.: MCMC-based particle filtering for tracking a vari-
able number of interacting targets. IEEE Transactions on Pattern Analysis and Machine
Intelligence 27(11), 1805–1819 (2005)
48. Kim, K., Davis, L.S.: Multi-camera tracking and segmentation of occluded people on
ground plane using search-guided particle filtering. In: Leonardis, A., Bischof, H., Pinz,
A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 98–109. Springer, Heidelberg (2006)
49. Kirchner, A., Schadschneider, A.: Simulation of evacuation processes using a bionics-
inspired cellular automaton model for pedestrian dynamics. Physica A: Statistical Me-
chanics and its Applications 312(1-2), 260–276 (2002)
50. Kirt, T., Vainik, E., Võhandu, L.: A method for comparing self-organizing maps: case
studies of banking and linguistic data. In: Eleventh East-European Conference on Ad-
vances in Databases and Information Systems ADBIS, pp. 107–115. Technical Univer-
sity of Varna, Varna (2007)
51. Koller-Meier, E., Ade, F.: Tracking multiple objects using the Condensation algorithm.
Robotics and Autonomous Systems 34(2-3), 93–105 (2001)
52. Kong, D., Gray, D., Tao, H.: Counting Pedestrians in Crowds Using Viewpoint Invariant
Training. In: British Machine Vision Conference (2005)
53. Kong, D., Gray, D., Tao, H.: A viewpoint invariant approach for crowd counting. In:
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006),
vol. 03, pp. 1187–1190 (2006)
54. Kretz, T., Schreckenberg, M.: F.a.s.t. - floor field- and agent-based simulation tool
(2006)
55. Lefebvre, G., Laurent, C., Ros, J., Garcia, C.: Supervised Image Classification by SOM
Activity Map Comparison. In: Proceedings of the 18th International Conference on
Pattern Recognition (ICPR 2006), vol. 02, pp. 728–731 (2006)
56. Legion: http://www.legion.biz/about/index.html
57. Leibe, B., Seemann, E., Schiele, B.: Pedestrian detection in crowded scenes. In: IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
CVPR 2005, vol. 1 (2005)
58. Li, S.Z., Zhu, L., Zhang, Z., Blake, A., Zhang, H., Shum, H.: Statistical learning of
multi-view face detection. In: Proceedings of the 7th European Conference on Com-
puter Vision-Part IV, pp. 67–81. Springer, Heidelberg (2002)
59. Lin, S., Chen, J., Chao, H.: Estimation of number of people in crowded scenes using
perspective transformation. IEEE Transactions on Systems, Man and Cybernetics, Part
A 31(6), 645–654 (2001)
60. Lucas, B., Kanade, T.: An iterative image registration technique with an application
to stereo vision. International Joint Conference on Artificial Intelligence 81, 674–679
(1981)
61. Ma, R., Li, L., Huang, W., Tian, Q.: On pixel count based crowd density estimation for
visual surveillance. In: IEEE Conference on Cybernetics and Intelligent Systems, vol. 1
(2004)
62. Marana, A., da Costa, L., Lotufo, R., Velastin, S.: On the Efficacy of Texture Analysis
for Crowd Monitoring. In: Proceedings of the International Symposium on Computer
Graphics, Image Processing, and Vision, vol. 00, p. 354 (1998)
63. Marana, A., Da Fontoura Costa, L., Lotufo, R., Velastin, S.: Estimating crowd density
with Minkowski fractal dimension. In: IEEE International Conference on Acoustics,
Speech, and Signal Processing, 1999. ICASSP 1999. Proceedings, vol. 6, pp. 3521–
3524 (1999)
64. Marana, A., Velastin, S., Costa, L., Lotufo, R.: Estimation of crowd density using image
processing. In: IEE Colloquium on Image Processing for Security Applications (Digest
No: 1997/074), vol. 11 (1997)
65. Marana, A., Velastin, S., Costa, L., Lotufo, R.: Automatic estimation of crowd density
using texture. Safety Science 28(3), 165–175 (1998)
66. Marques, J., Jorge, P., Abrantes, A., Lemos, J.: Tracking Groups of Pedestrians in Video
Sequences. In: IEEE 2003 Conference on Computer Vision and Pattern Recognition
Workshop, vol. 9, p. 101 (2003)
67. Mathes, T., Piater, J.: Robust non-rigid object tracking using point distribution models.
In: Proc. of British Machine Vision Conference (BMVC), vol. 2 (2005)
68. Maurin, B., Masoud, O., Papanikolopoulos, N.: Monitoring crowded traffic scenes. In:
The IEEE 5th International Conference on Intelligent Transportation Systems, 2002.
Proceedings, pp. 19–24 (2002)
69. McKenna, S., Jabri, S., Duric, Z., Rosenfeld, A., Wechsler, H.: Tracking groups of
people. Computer Vision and Image Understanding 80(1), 42–56 (2000)
70. Mittal, A., Davis, L.: M2Tracker: A Multi-View Approach to Segmenting and Tracking
People in a Cluttered Scene. International Journal of Computer Vision 51(3), 189–203
(2003)
71. Mokhtarian, F., Abbasi, S., Kittler, J.: Robust and efficient shape indexing through cur-
vature scale space. In: Proc. British Machine Vision Conference, vol. 62 (1996)
72. Musse, S., Thalmann, D.: A Model of Human Crowd Behavior: Group Inter-
Relationship and Collision Detection Analysis. In: Proc. Workshop of Computer Ani-
mation and Simulation of Eurographics, vol. 97, pp. 39–51 (1997)
73. Okuma, K., Taleghani, A., de Freitas, N., Little, J., Lowe, D.: A boosted particle filter:
Multitarget detection and tracking. European Conference on Computer Vision 1, 28–39
(2004)
74. Pan, X., Han, C., Dauber, K., Law, K.: Human and social behavior in computational
modeling and analysis of egress. Automation in Construction 15(4), 448–461 (2006)
75. PRISMATICA: http://prismatica.king.ac.uk/
76. Rahmalan, H., Nixon, M., Carter, J.: On Crowd Density Estimation for Surveillance.
In: The Institution of Engineering and Technology Conference on Crime and Security,
pp. 540–545 (2006)
77. Rasmussen, C., Hager, G.: Joint probabilistic techniques for tracking multi-part objects.
In: 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recog-
nition, 1998. Proceedings, pp. 16–21 (1998)
78. Reid, D.: An algorithm for tracking multiple targets. IEEE Transactions on Automatic
Control 24(6), 843–854 (1979)
79. Reisman, P., Mano, O., Avidan, S., Shashua, A., Ltd, M., Jerusalem, I.: Crowd detection
in video sequences. In: IEEE 2004 Intelligent Vehicles Symposium, pp. 66–71 (2004)
80. Sidenbladh, H., Wirkander, S.: Tracking random sets of vehicles in terrain. In: Proc.
2003 IEEE Workshop on Multi-Object Tracking, vol. 9, p. 98 (2003)
81. Smith, K., Gatica-Perez, D., Odobez, J.: Using particles to track varying numbers of
interacting people. In: Proceedings of the 2005 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 962–969 (2005)
82. Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis, and Machine Vision.
Tech. rep. (1998) ISBN 0-534-95393-X
83. Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking.
In: 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recog-
nition (1999)
84. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge
(1998)
85. Swets, D., Punch, B.: Genetic algorithms for object localization in a complex scene. In:
IEEE International Conference on Image Processing, pp. 595–598 (1995)
86. Stanford University: CIFE Seed Project 2004-2005, 2005-2006, http://eil.
stanford.edu/egress/
87. Velastin, S., Yin, J., Davies, A., Vicencio-Silva, M., Allsop, R., Penn, A.: Automated
measurement of crowd density and motion using imageprocessing. In: Seventh Interna-
tional Conference on Road Traffic Monitoring and Control, 1994, pp. 127–132 (1994)
88. Venegas, S., Knebel, S., Thiran, J.: Multi-object tracking using particle fil-
ter algorithm on the top-view plan. Technical report, LTS-REPORT-2004-003,
EPFL (2004), http://infoscience.epfl.ch/getfile.py?mode=best&
recid=87041
89. van Vliet, L., Young, I., Verbeek, P.: Recursive Gaussian derivative filters. In: Proc. 14th
International Conference on Pattern Recognition (ICPR 1998), vol. 1, pp. 509–514.
IEEE Computer Society Press, Los Alamitos (1998), http://citeseer.comp.
nus.edu.sg/565386.html
90. Vu, V., Bremond, F., Thonnat, M.: Human Behaviour Visualisation and Simulation for
Automatic Video Understanding. In: Proc. of the 10th Int. Conf. in Central Europe on
Computer Graphics, Visualization and Computer Vision (WSCG 2002), Plzen–Bory,
Czech Republic, pp. 485–492 (2002)
91. Wu, B., Nevatia, R.: Detection of Multiple, Partially Occluded Humans in a Single Im-
age by Bayesian Combination of Edgelet Part Detectors. In: Tenth IEEE International
Conference on Computer Vision, 2005. ICCV 2005, vol. 1, pp. 90–97 (2005)
92. Wu, B., Nevatia, R.: Tracking of multiple, partially occluded humans based on static
body part detection. In: CVPR 2006: Proceedings of the 2006 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 951–958 (2006)
93. Yin, J., Velastin, S., Davies, A.: Image Processing Techniques for Crowd Density Esti-
mation Using a Reference Image. Proc. 2nd Asia-Pacific Conf. Comput. Vision 3, 6–10
(1995)
94. Zhan, B., Remagnino, P., Velastin, S.: Analysing Crowd Intelligence. In: Second AIxIA
Workshop on Ambient Intelligence (2005)
95. Zhan, B., Remagnino, P., Velastin, S.: Mining paths of complex crowd scenes. In: Ad-
vances in Visual Computing: First International Symposium, pp. 126–133 (2005)
96. Zhan, B., Remagnino, P., Velastin, S.: Mining paths of complex crowd scenes. Lecture
notes in computer science pp. 126–133 (2005), ISBN/ISSN 3-540-30750-8
97. Zhan, B., Remagnino, P., Velastin, S.: Visual analysis of crowded pedestrian scenes. In:
XLIII Congresso Annuale AICA, pp. 549–555 (2005)
98. Zhan, B., Remagnino, P., Velastin, S., Bremond, F., Thonnat, M.: Matching gradient
descriptors with topological constraints to characterise the crowd dynamics. In: IET
International Conference on Visual Information Engineering, 2006. VIE 2006, pp. 441–
446 (2006), ISSN: 0537-9989, ISBN: 978-0-86341-671-2
99. Zhan, B., Remagnino, P., Velastin, S., Monekosso, N., Xu, L.: A Quantitative Compar-
ison of Two New Motion Estimation Algorithms. In: Bebis, G., Boyle, R., Parvin, B.,
Koracin, D., Paragios, N., Tanveer, S.-M., Ju, T., Liu, Z., Coquillart, S., Cruz-Neira, C.,
Müller, T., Malzbender, T. (eds.) ISVC 2007, Part I. LNCS, vol. 4841, pp. 424–431.
Springer, Heidelberg (2007)
100. Zhan, B., Remagnino, P., Velastin, S.A., Monekosso, N., Xu, L.Q.: Motion estima-
tion with edge continuity constraint for crowd scene analysis. In: Bebis, G., Boyle, R.,
Parvin, B., Koracin, D., Remagnino, P., Nefian, A., Meenakshisundaram, G., Pascucci,
V., Zara, J., Molineros, J., Theisel, H., Malzbender, T. (eds.) ISVC 2006, Part II. LNCS,
vol. 4292, pp. 861–869. Springer, Heidelberg (2006)
101. Zhao, T., Nevatia, R.: Tracking multiple humans in complex situations. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence 26(9), 1208–1221 (2004)
102. Zhao, T., Nevatia, R.: Tracking multiple humans in crowded environment. In: Proceed-
ings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, vol. 2, pp. II–406–II–413 (2004)
Part VI
Communications for CI Systems
Computational Intelligence for the Collaborative
Identification of Distributed Systems
1 Introduction
In the last few years collaborative signal processing with distributed sources of data,
signals, images and natural phenomena has been gaining importance.
Giorgio Biagetti, Paolo Crippa, Francesco Gianfelici, and Claudio Turchetti
DIBET – Dipartimento di Ingegneria Biomedica, Elettronica e Telecomunicazioni, Università
Politecnica delle Marche, Via Brecce Bianche 12, I-60131 Ancona, Italy,
e-mail: [email protected], [email protected],
[email protected], [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 475–500.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
476 G. Biagetti et al.
used to guarantee that every ordered pair of nodes can communicate in λ time slots.
Baek et al. established in [2] that if the sensed data is bursty in space and time, then
one can reap substantial benefits from aggregation and balancing. Georgiadis et al.
determined in [20] that, starting from a characterization of the optimal superflow,
one obtains a structural decomposition of the network into a sequence of
disjoint subregions of decreasing overload, such that traffic flows only from re-
gions of higher overload to regions of lower overload; it is then possible to state that the
optimal superflow represents the smoothest trajectory to overflow, followed by the
network in case of instability. Toumpis et al. developed in [36] effective equations
for the design of large wireless sensor networks that can be deployed in the most
efficient manner, not only avoiding the formation of bottlenecks, but also striking
the optimal balance between reducing congestion and having the data packets fol-
low short routes. Dousse et al. determined in [12] that communications occurring
at a fixed nonzero rate imply a fraction of the nodes to be disconnected. Gastpar
et al. established in [18] that if all nodes act purely as relays for a single source-
destination pair, then capacity grows with the logarithm of the number of nodes.
Haenggi showed in [22] that the density function of the distance to the n-th nearest
neighbor of a homogeneous process in R^M is governed by a generalized gamma
distribution. Barros et al. determined in [3] that the information-as-flow view provides
an algorithmic interpretation for several results, among which perhaps the most im-
portant one is the optimality of implementing codes using a layered protocol stack.
Chamberland et al. showed in [6] that, on the basis of the Gärtner-Ellis theorem and
similar large-deviation theory results, performance improves monotonically with
sensor density, whilst a finite sensor density is optimal in the stochastic signal case.
Xiao et al. proposed in [41] a new decentralized estimation scheme that is universal
in the sense that each sensor compression scheme requires only knowledge of the
local SNR, rather than of the noise probability distribution functions (pdf), while
the final fusion step is also independent of the local noise pdfs. Franceschetti et al.
determined in [15] that, on the basis of the
Chen-Stein method of Poisson approximation, it is possible to show derivations and
generalizations that are able to improve upon and simplify previous results that ap-
peared in the literature. Xue et al. showed in [42]: (i) the exact threshold function for
μ-coverage for wireless networks modeled as points uniformly distributed in a unit
square, with every node connecting to its nearest neighbors, and (ii) that the network
will be connected with probability approaching one. Yang et al. established in [43]
that cooperative SNs with a mobile access point and no energy constraints but pos-
sibly misinformed nodes have the same capacity as with no misinformed sensors, if
polling can be performed. Luo presented in [29] that: (i) the sensors are necessary
and sufficient to jointly estimate the unknown parameter within a fixed root mean
square error, and (ii) the optimal decentralized estimation scheme suggests allocat-
ing sensors to estimate the i-th bit. Liu et al. introduced in [28] an iterative algorithm
to construct distributed quantizers that are person-by-person optimal. Tay et al. de-
rived in [35] an asymptotically optimal strategy for the case where sensor decisions
are only allowed to depend on locally available information, and global sharing of
side information does not improve asymptotic performance, when the “Type I” error
is constrained to be small.
y_ℓ = [y_ℓ(0) · · · y_ℓ(L_S − 1)]^T ,   ℓ = 1, . . . , S    (5)
is the discrete-space discrete-time representation of the scalar field y (t, p). This
representation holds for every system that possesses the properties of uniformity
and causality.
It is well known that if Y is in a Hilbert space, then y, defined by (6) and with
its realizations y ∈ R^{L×1}, can be represented by the Discrete Karhunen-Loève
Transform (DKLT) [11], also called the canonical representation. The DKLT and its
inverse can be written in matrix form as
y = Φ k(x) ,   k(x) = Φ^T y    (7)

R_yy Φ = Φ Γ    (8)
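The eigenvector relation (8) and the transform pair (7) can be sketched numerically as follows; this is an illustrative NumPy fragment (the covariance matrix and sizes are made up, not taken from the chapter's examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative L x L autocorrelation matrix R_yy (symmetric, PSD).
L = 8
A = rng.standard_normal((L, L))
R_yy = A @ A.T / L

# Eq. (8): R_yy Phi = Phi Gamma -- the columns of Phi are the eigenvectors
# of R_yy, and Gamma collects the corresponding eigenvalues.
Gamma, Phi = np.linalg.eigh(R_yy)

# Eq. (7): forward and inverse DKLT of one realization y.
y = rng.standard_normal(L)
k = Phi.T @ y                 # k(x) = Phi^T y (decorrelated coefficients)
y_rec = Phi @ k               # y = Phi k(x), exact since Phi is orthogonal
```

Since Φ is orthogonal, the inverse transform reconstructs y exactly; compression enters only when the coefficients k are truncated.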
Fig. 1 The curves C_yj(x) described by the components k_j(x), for j = 1, . . . , 6, of a Duffing
ordinary differential equation
With the above considerations in mind, it can be stated that once the structure of
the functional G[x, W] has been defined, the identification of the nonlinear system
is equivalent to the estimation of the matrix W from an ensemble of the system’s
input-output pairs.
In order to derive the identification algorithm, it is necessary to relate the stochas-
tic properties of the system (that allowed the development of the general theory)
to the available ensemble of realizations. Let us then refer to these N realizations
of x as x(i) ∈ RMx ×1 , with i = 1, . . . , N , and to the corresponding realizations
of y as y(i) ∈ RL×1 , with i = 1, . . . , N . Both can be put in matrix form as
X = [x(1) x(2) · · · x(N ) ] and Y = [y(1) y(2) · · · y(N ) ], where X ∈ RMx ×N and
Y ∈ R^{L×N}. A commonly used estimate R̂ of the autocorrelation matrix is given by

R ≈ R̂ = (1/N) Y Y^T    (11)

where R ∈ R^{L×L}.
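The ensemble estimate (11) can be checked with a small NumPy sketch; the dimensions and the true covariance below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ensemble of N realizations of an L-dimensional output y, stacked as
# columns of Y, as in the text: Y = [y^(1) y^(2) ... y^(N)].
L, N = 6, 5000
C = rng.standard_normal((L, L)) / np.sqrt(L)
R_true = C @ C.T                        # true autocorrelation matrix
Y = C @ rng.standard_normal((L, N))     # zero-mean samples with covariance R_true

# Eq. (11): R ≈ R_hat = (1/N) Y Y^T
R_hat = Y @ Y.T / N
```

R̂ is symmetric by construction and converges to the true autocorrelation matrix as the ensemble size N grows.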
R̂ U = U Λ (12)
y ≈ U G[x, W] (16)
K ≈ WG . (19)
In general, since G[x, W] is a nonlinear mapping, learning the proper weight matrix
W is usually computationally very expensive. In this chapter we present an identification
algorithm that is defined by means of an approximating mapping based on
neural networks.
One of the main features of neural networks, allowing them to collect informa-
tion from the environment, is their ability to learn by experience. Learning is a
crucial activity for an intelligent system, whether artificial or biological, that aims
at modeling the real world it interacts with. From a mathematical point of view,
learning is related to the ability of the neural networks to approximate some classes
of input-output functions. Several works have demonstrated that MLPs [9, 16, 26],
RBF networks [31] and AINNs [7, 37] possess this property with reference to some
classes of functions. These results show that neural networks of these kinds are
capable of approximating arbitrarily well any function belonging to a certain class,
the degree of accuracy depending on the learning algorithm as well as on the number
of neurons available. Therefore, in recent years neural networks have gained much
popularity in the modeling of natural and artificial phenomena, in the identification
and control of dynamical systems, as well as in the approximation of deterministic or
random input–output functions representing unknown systems and/or their control
laws [4, 34, 14, 25, 40, 24, 39, 33, 38].
In the following we give a brief introduction to the RBF neural networks that we
consider the best choice for our identifier. A radial basis function is a real-valued
function γ(x) whose value depends only on the distance from some point x_C,
called a center, so that γ(x, x_C) = γ(‖x − x_C‖). The norm ‖·‖ is usually the
Euclidean distance. Radial basis functions are typically used to build up function
approximations of the form
f(x) = Σ_{i=1}^{M_n} w_i γ(‖x − x_{C_i}‖)    (20)
[G[x, W]]_j = Σ_{l=1}^{M_n} [ω_j]_l exp( −[χ_j]_l Σ_{m=1}^{M_x} ([x]_m − [Ξ_j]_{l,m})² ) ,   j = 1, . . . , V    (21)
where M_n is the number of neurons in the RBF network, M_x is the dimension of
vector x, and ω_j ∈ R^{M_n×1}, χ_j ∈ R^{M_n×1}, and Ξ_j = [ξ_j^1 ξ_j^2 · · · ξ_j^{M_x}] ∈ R^{M_n×M_x},
with ξ_j^m ∈ R^{M_n×1}, for m = 1, . . . , M_x, are vectors or matrices of weights within
W = [w_1 w_2 · · · w_V]^T ∈ R^{V×M}, with M = M_n(M_x + 2), defined so that
w_j = [ω_j^T χ_j^T (ξ_j^1)^T (ξ_j^2)^T · · · (ξ_j^{M_x})^T]^T. Despite their complexity,
neural network-based approximations allow for great flexibility in the choice of the number of free
parameters and scale gracefully when M_x increases, making them an
interesting option in many circumstances.
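The Gaussian RBF mapping of Eq. (21) can be evaluated directly; the following sketch uses illustrative sizes and random weights (the function name and the parameter values are ours, not the chapter's):

```python
import numpy as np

def rbf_output(x, omega, chi, Xi):
    """One output [G[x, W]]_j of the Gaussian RBF mapping of Eq. (21).

    omega: (Mn,) linear weights, chi: (Mn,) spread parameters,
    Xi: (Mn, Mx) centers, one row per neuron.
    """
    d2 = np.sum((x[None, :] - Xi) ** 2, axis=1)   # squared distances to centers
    return np.sum(omega * np.exp(-chi * d2))

rng = np.random.default_rng(2)
Mn, Mx = 5, 3                        # neurons and input dimension
omega = rng.standard_normal(Mn)
chi = np.abs(rng.standard_normal(Mn))
Xi = rng.standard_normal((Mn, Mx))

x = rng.standard_normal(Mx)
val = rbf_output(x, omega, chi, Xi)
```

Note that when all spreads χ are zero every basis function evaluates to one, so the output reduces to the sum of the linear weights, a convenient sanity check.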
with Φ_ii ∈ R^{L_S×V_i}, i = 1, . . . , S. As clearly stated by (22), each sensor transmits
only its own subvector of coefficients (or an approximation of it), so that the output
vector y cannot be reconstructed with negligible error. This means that the identification
approach previously discussed cannot be applied directly to this case, due to the lack
of complete knowledge of the output y. It is easy to verify that the marginal KLT
will lead to a suboptimal solution to this problem. In general we can search for a
solution of the kind
[ h_1(x)  h_2(x)  · · ·  h_S(x) ]^T = diag[ Ψ_1, Ψ_2, . . . , Ψ_S ]^T [ y_1  y_2  · · ·  y_S ]^T    (23)

or, more compactly,

h(x) = Ψ^T y    (24)

with Ψ = diag[Ψ_1, . . . , Ψ_S] ∈ R^{L×V} and h ∈ R^{V×1}, V = Σ_{i=1}^S V_i. In this case
the accuracy of the distributed identification is related both to the approximation of
the mapping k(x) and to the minimization of the error E‖y − ŷ‖² between the
real system output y and its estimate ŷ, based on the sensors' observations, and
given by

ŷ = RΨ (Ψ^T R Ψ)^{−1} h(x) .    (25)
However, in this case, to the best of the authors' knowledge, and as also pointed out
in [17], no closed-form solution to this problem is known.
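The reconstruction formula (25) is straightforward to apply once the block-diagonal Ψ is assembled; this is an illustrative two-sensor NumPy sketch with made-up sizes and a random covariance:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two sensors, each observing an L_S-dimensional slice of y (L = 2 L_S),
# and each transmitting V_i coefficients h_i = Psi_i^T y_i.
L_S, V1, V2 = 4, 2, 3
L = 2 * L_S

A = rng.standard_normal((L, L))
R = A @ A.T / L                        # illustrative covariance of y

Psi1 = rng.standard_normal((L_S, V1))
Psi2 = rng.standard_normal((L_S, V2))
Psi = np.zeros((L, V1 + V2))           # block-diagonal Psi = diag[Psi_1, Psi_2]
Psi[:L_S, :V1] = Psi1
Psi[L_S:, V1:] = Psi2

y = rng.standard_normal(L)
h = Psi.T @ y                          # compressed observations, Eq. (24)

# Eq. (25): reconstruction y_hat = R Psi (Psi^T R Psi)^{-1} h(x).
y_hat = R @ Psi @ np.linalg.solve(Psi.T @ R @ Psi, h)
```

Note that Ψ^T ŷ = h by construction, i.e. the reconstruction is always consistent with what the sensors actually transmitted.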
The algorithm developed by Gastpar et al., also known as the distributed Karhunen-Loève
transform, is an iterative procedure that aims at finding the matrix Ψ that
yields the minimum mean square error estimate ŷ in (25).
h_1(x) = Ψ_1^T y_1 .    (31)
4.2 S Sensors
Let us now consider the general case of S sensors, in which all the variables
y_1, . . . , y_S are known except for the j-th variable y_j. Thus, assuming the representation
of y_1, . . . , y_{j−1}, y_{j+1}, . . . , y_S given by the S − 1 sensors to be fixed, we
determine the representation of y_j such that E‖y − ŷ‖² is minimum, ŷ
being as usual the approximation of y.
The approximation provided by the sensors with the fixed representation can be
expressed as k_i(x) = Ψ_i^T y_i + z_i with i = 1, . . . , S, i ≠ j, where the z_i are Gaussian
random variables of zero mean and covariance matrix R_{z_i} ∈ R^{V_i×V_i} (z_i ∈ R^{V_i}).
Thus the covariance matrix R ∈ RL×L of y can be written as
R = [ R_11  R_12  · · ·  R_1S
      R_21  R_22  · · ·  R_2S
       ·     ·    · · ·   ·
      R_S1  R_S2  · · ·  R_SS ]    (32)

where R_ik = E[y_i y_k^T], i, k = 1, . . . , S, and R_ik ∈ R^{L_S×L_S}. Now let Ξ_j ∈
R^{L_S(S−1)×(V−V_j+L_S)} be the matrix defined by
Ξ_j = [ R_11 Ψ_1       · · ·  R_{1,j−1} Ψ_{j−1}      R_1j       R_{1,j+1} Ψ_{j+1}      · · ·  R_1S Ψ_S
        R_21 Ψ_1       · · ·  R_{2,j−1} Ψ_{j−1}      R_2j       R_{2,j+1} Ψ_{j+1}      · · ·  R_2S Ψ_S
         ·                       ·                     ·              ·                          ·
        R_{j−1,1} Ψ_1  · · ·  R_{j−1,j−1} Ψ_{j−1}    R_{j−1,j}  R_{j−1,j+1} Ψ_{j+1}    · · ·  R_{j−1,S} Ψ_S
        R_{j+1,1} Ψ_1  · · ·  R_{j+1,j−1} Ψ_{j−1}    R_{j+1,j}  R_{j+1,j+1} Ψ_{j+1}    · · ·  R_{j+1,S} Ψ_S
         ·                       ·                     ·              ·                          ·
        R_S1 Ψ_1       · · ·  R_{S,j−1} Ψ_{j−1}      R_Sj       R_{S,j+1} Ψ_{j+1}      · · ·  R_SS Ψ_S ] ·
[ Q_11 + R_{z_1}  Q_12            · · ·  Q_{1,j−1}                 Ψ_1^T R_1j           Q_{1,j+1}                 · · ·  Q_1S
  Q_21            Q_22 + R_{z_2}  · · ·  Q_{2,j−1}                 Ψ_2^T R_2j           Q_{2,j+1}                 · · ·  Q_2S
   ·                ·                       ·                          ·                     ·                              ·
  Q_{j−1,1}       Q_{j−1,2}       · · ·  Q_{j−1,j−1} + R_{z_{j−1}} Ψ_{j−1}^T R_{j−1,j}  Q_{j−1,j+1}               · · ·  Q_{j−1,S}
  R_j1 Ψ_1        R_j2 Ψ_2        · · ·  R_{j,j−1} Ψ_{j−1}         R_jj                 R_{j,j+1} Ψ_{j+1}         · · ·  R_jS Ψ_S
  Q_{j+1,1}       Q_{j+1,2}       · · ·  Q_{j+1,j−1}               Ψ_{j+1}^T R_{j+1,j}  Q_{j+1,j+1} + R_{z_{j+1}} · · ·  Q_{j+1,S}
   ·                ·                       ·                          ·                     ·                              ·
  Q_S1            Q_S2            · · ·  Q_{S,j−1}                 Ψ_S^T R_Sj           Q_{S,j+1}                 · · ·  Q_SS + R_{z_S} ]^{−1}    (33)
where

Q_ik = Ψ_i^T R_ik Ψ_k ,   i, k = 1, . . . , S ,   i, k ≠ j    (34)

and Q_ik ∈ R^{V_i×V_k}. Let the matrix Ξ*_j ∈ R^{L_S(S−1)×L_S} consist of the columns of
Ξ_j from Σ_{i=1}^{j−1} V_i + 1 to Σ_{i=1}^{j−1} V_i + L_S. Then Ξ*_j can be cast as
Ξ*_j = [ Ξ̄*_j
         Ξ̲*_j ]    (35)

with Ξ̄*_j ∈ R^{L_S(j−1)×L_S} and Ξ̲*_j ∈ R^{L_S(S−j)×L_S}, j = 1, . . . , S. Let
I_{L_S} ∈ R^{L_S×L_S} be the identity matrix; thus, we can obtain a new matrix
R_{w,j} = [ Ξ̄*_j
            I_{L_S}
            Ξ̲*_j ] ( R_jj − C_j H_j^{−1} C_j^T ) [ Ξ̄*_j^T  I_{L_S}  Ξ̲*_j^T ]    (36)
where
C_j = [ R_j1 Ψ_1  R_j2 Ψ_2  · · ·  R_{j,j−1} Ψ_{j−1}  R_{j,j+1} Ψ_{j+1}  · · ·  R_jS Ψ_S ]    (37)
and
H_j = [ Q_11 + R_{z_1}  Q_12            · · ·  Q_{1,j−1}                 Q_{1,j+1}                 · · ·  Q_1S
        Q_21            Q_22 + R_{z_2}  · · ·  Q_{2,j−1}                 Q_{2,j+1}                 · · ·  Q_2S
         ·                ·                       ·                          ·                              ·
        Q_{j−1,1}       Q_{j−1,2}       · · ·  Q_{j−1,j−1} + R_{z_{j−1}} Q_{j−1,j+1}               · · ·  Q_{j−1,S}
        Q_{j+1,1}       Q_{j+1,2}       · · ·  Q_{j+1,j−1}               Q_{j+1,j+1} + R_{z_{j+1}} · · ·  Q_{j+1,S}
         ·                ·                       ·                          ·                              ·
        Q_S1            Q_S2            · · ·  Q_{S,j−1}                 Q_{S,j+1}                 · · ·  Q_SS + R_{z_S} ]    (38)
with R_{w,j} ∈ R^{L×L}, C_j ∈ R^{L_S×(V−V_j)}, and H_j ∈ R^{(V−V_j)×(V−V_j)}. The matrix R_{w,j}
can be reduced to the diagonal form

R_{w,j} = D_{w,j} Λ_j D_{w,j}^T    (39)
where
Λ_j = diag( λ_{w,j}^{(1)}, . . . , λ_{w,j}^{(N)} )    (40)

with λ_{w,j}^{(1)} ≥ λ_{w,j}^{(2)} ≥ . . . ≥ λ_{w,j}^{(N)}, and with D_{w,j} ∈ R^{L×N} and Λ_j ∈ R^{N×N}; thus
we can obtain the representation for the j-th sensor
Ψ_j^T = ( D_{w,j}^{(V_j)} )^T [ Ξ̄*_j
                                I_{L_S}
                                Ξ̲*_j ]    (41)

where D_{w,j}^{(V_j)} ∈ R^{L×V_j} denotes the matrix consisting of the first V_j columns of the
matrix D_{w,j}. In this way, by truncating the representation to the first V_j eigenvectors
of D_{w,j}, we obtain the parameters

h_j(x) = Ψ_j^T y_j .    (42)
5 Experimental Results
Fig. 2 Domain of the equation used in the first example and mesh used for its solution.
Highlighted are the positions in which the sensors were placed
labeled 1–4, on randomly chosen knots, and selected the best rate allocation for a
fixed total rate V = 10, which turned out to be V_1 = 1, V_2 = 2, V_3 = 5, V_4 = 2, i.e.,
sensor 1 has one output, sensor 2 has two outputs, sensor 3 has five outputs,
and sensor 4 has two outputs.
The results are shown in Fig. 3, which displays the outputs from the sensor network
(dots) and from the trained neural network (solid line) used to approximate
their dependence on the input parameter. In the subfigures (a)–(j) all ten outputs
of the four sensors are reported. The close match between the
sensor outputs and the curve fitting performed by the trained neural network demonstrates
the very good performance of the RBF neural network based identification
algorithm.
The results of the overall identification process for the system generated by the
PDE (45) and the comparison between the distributed and the marginal KLT-based
techniques are reported in Fig. 4, which displays the inputs to the sensor network (red solid
line) and the estimated system output (blue dashed line), along with the approximation
that would have been achieved by a simple marginal KLT-based (dotted black line)
encoding of the system output at the same rate. It can be easily
seen that a substantial improvement of this methodology over the marginal KLT is
achieved for the same rate V.
Fig. 3 Outputs from the sensor network (dots) and from the trained neural network (solid
line) used to approximate their dependence on the input parameter. (a) is the only output of
sensor 1, (b)–(c) are the two outputs of sensor 2, (d)–(h) the five outputs of sensor 3, and
(i)–(j) the two of sensor 4
Fig. 4 Inputs to the sensor network (red solid line) and estimated system output (blue dashed
line). The dotted black line represents the approximation that would have been achieved by a
simple marginal KLT-based encoding of the system output at the same rate
Fig. 5 Domain of the equation used in the second example and mesh used for its solution.
Highlighted are the positions in which the sensors were placed
d ∂²y/∂t² − ∇ · (c∇y) + a y = f    (46)

where y is the unknown, d is a complex-valued function on a bounded domain D in
the plane, and c, a, f are coefficients; c, a, f, and d can all depend on time t.
The well-known wave equation is a special case of this.
We solved the following PDE,

∂²y/∂t² − c ∇²y = 0    (47)

corresponding to (46) with parameters d, c, a, and f chosen so as to model sound
propagation in a U-shaped air duct at a speed √c = 343 m/s, on the mesh shown in
Fig. 5, where the equation domain D is also reported. The boundary conditions were
set to simulate total reflection at all the surfaces, except at the top left segment, which
sourced into the domain a sine wave with fixed amplitude and a random frequency
ω/2π between 0.5 kHz and 2 kHz. The simulated time-span and the sensor
locations were chosen to highlight different phenomena that may occur in wave
propagation: sensor 1 is most sensitive to the excitation itself and is scarcely influenced
by reflections, sensors 2 and 3 are subject to multipath-induced interference
near the bend, and sensor 4 to standing waves.
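To give a feel for the dynamics being identified, here is a minimal one-dimensional leapfrog discretization of Eq. (47); the chapter's examples solve the two-dimensional problem on the mesh of Fig. 5 with a PDE solver, so the grid size, source frequency, and boundary handling below are illustrative only:

```python
import numpy as np

# Squared propagation speed c of Eq. (47): sqrt(c) = 343 m/s.
c = 343.0 ** 2
nx, dx = 200, 0.01                     # 2 m of duct, 1 cm grid spacing
dt = 0.8 * dx / np.sqrt(c)             # CFL-stable leapfrog time step
steps = 300

y_prev = np.zeros(nx)
y = np.zeros(nx)
for n in range(steps):
    lap = np.zeros(nx)
    lap[1:-1] = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dx ** 2
    y_next = 2.0 * y - y_prev + dt ** 2 * c * lap
    # Left end: sinusoidal source (cf. the chapter's excited boundary segment);
    # right end: fixed, so the wave is reflected back into the domain.
    y_next[0] = np.sin(2.0 * np.pi * 1000.0 * (n + 1) * dt)   # 1 kHz excitation
    y_next[-1] = 0.0
    y_prev, y = y, y_next
```

The time step is chosen so that √c·dt/dx = 0.8 < 1, which keeps the explicit scheme stable; reflections at the fixed end are what give rise to the standing-wave phenomena mentioned for sensor 4.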
Fig. 6 Relative mean square error of the reconstructed signals plotted versus the number of
iterations of the DKLT algorithm, for various rates and rate allocations. The four numbers
given in the legend are V_1, V_2, V_3, and V_4, respectively (legend entries: V = 24: 6 6 6 6;
V = 48: 20 16 8 4; V = 48: 12 12 12 12; V = 48: 4 8 16 20; V = 80: 20 20 20 20)
We conducted various experiments with different rate allocations, with the results
reported in Fig. 6. As can be seen, the rate allocation plays an important role.
For the three cases with a total rate of 48, it is slightly better to
allocate more outputs to sensor 4 than to sensor 1, as is reasonable, since it senses the
most “complex” signal of the four.
The results for V_1 = V_2 = V_3 = V_4 = 12 are shown in Fig. 7, which displays the
outputs from the sensor network (dots) and from the trained neural network (solid
line) used to approximate their dependence on the input parameter. In the subfigures
(a)–(h) the first two outputs of each of the four sensors are reported. Also
in this case, the match between the sensor outputs and the curve fitting performed
by the trained neural network demonstrates the very good performance of the RBF
neural network identification algorithm.
The results of the overall identification process for the system generated by the
PDE (47) and the comparison results between our distributed technique and the
marginal KLT-based technique are reported in Fig. 8. The inputs to the sensor
Fig. 7 Outputs from the sensor network (dots) and from the trained neural network (solid
line) used to approximate their dependence on the input parameter. (a), (c), (e), and (g) are
the most significant outputs of sensors 1–4, respectively, while (b), (d), (f), and (h) are the
second most significant outputs of the same sensors
Fig. 8 Inputs to the sensor network (red solid line) and estimated system output (blue dashed
line). The dotted black line represents the approximation that would have been achieved by a
simple marginal KLT-based encoding of the system output at the same rate
network (red solid line), the estimated system output (blue dashed line), and the
approximation that would have been achieved by marginal KLT-based encoding (dotted black
line) are displayed. Also in this case, a substantial improvement of this
methodology over the marginal KLT can be recognized for the same rate V.
6 Conclusions
In this chapter an innovative framework for the collaborative identification of distributed
systems has been presented. The approach is based on a centralized intelligent
identifier that performs the best possible identification in a distributed setting on a chosen
ensemble of realizations, with no constraints in terms of model kind and/or
model order. Methodologically, we defined a stochastic setting where the system
model order. Methodologically, we defined a stochastic setting where the system
to be identified generates nondeterministic signals, i.e., stochastic processes, from
given initial conditions and random parameters of input signals. In this way, the set
of input-output pairs so obtained shows complex but identifiable geometrical rela-
tionships in the output space of the sensor network. As a subsequent step we defined
a computational intelligence technique for approximating the previously mentioned
mappings that is able to globally identify the distributed system. The global opti-
mization of the identification performance was performed in a collaborative setting,
exploiting and developing the cooperation mechanisms that underpin other related
methodologies such as the distributed KLT. The effectiveness of the proposed col-
laborative algorithm has been demonstrated in the identification of two distributed
systems whose behavior is described as the solution of a partial differential equation.
References
1. Agre, J., Clare, L.: An integrated architecture for cooperative sensing networks. IEEE
Computer 33(5), 106–108 (2000)
2. Baek, S.J., de Veciana, G.: Spatial model for energy burden balancing and data fu-
sion in sensor networks detecting bursty events. IEEE Transactions on Information The-
ory 53(10), 3615–3628 (2007)
3. Barros, J., Servetto, S.D.: Network information flow with correlated sources. IEEE
Transactions on Information Theory 52(1), 155–170 (2006)
4. Belli, M.R., Conti, M., Crippa, P., Turchetti, C.: Artificial neural networks as approxi-
mators of stochastic processes. Neural Networks 12(4-5), 647–658 (1999)
5. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Ox-
ford (1995)
6. Chamberland, J.F., Veeravalli, V.V.: How dense should a sensor network be for detection
with correlated observations? IEEE Transactions on Information Theory 52(11), 5099–
5106 (2006)
7. Conti, M., Turchetti, C.: Approximation of dynamical systems by continuous-time re-
current approximate identity neural networks. Neural, Parallel & Scientific Computa-
tions 2(3), 299–322 (1994)
8. Crippa, P., Turchetti, C., Pirani, M.: Stochastic model of neural computing. In: Proc.
Eighth International Conference on Knowledge-Based Intelligent Information and Engi-
neering Systems (KES 2004), Wellington, New Zealand, vol. 2, pp. 683–690 (2004)
9. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control
Signals Systems 2, 303–314 (1989)
10. Dana, A.F., Hassibi, B.: On the power efficiency of sensory and ad hoc wireless networks.
IEEE Transactions on Information Theory 52(7), 2890–2914 (2006)
11. Dougherty, E.R.: Random Processes for Image and Signal Processing. SPIE—IEEE Se-
ries on Imaging Science and Engineering, Bellingham (1998)
12. Dousse, O., Franceschetti, M., Thiran, P.: On the throughput scaling of wireless relay
networks. IEEE Transactions on Information Theory 52(6), 2756–2761 (2006)
13. Dukes, P.J., Syrotiuk, V.R., Colbourn, C.J.: Ternary schedules for energy-limited sensor
networks. IEEE Transactions on Information Theory 53(8), 2791–2798 (2007)
14. Ferrari, S., Stengel, R.F.: Smooth function approximation using neural networks. IEEE
Transactions on Neural Networks 16(1), 24–38 (2005)
15. Franceschetti, M., Meester, R.: Critical node lifetimes in random networks via the Chen-
Stein method. IEEE Transactions on Information Theory 52(6), 2831–2837 (2006)
16. Funahashi, K.: On the approximate realization of continuous mappings by neural net-
works. Neural Networks 2(3), 183–192 (1989)
17. Gastpar, M., Dragotti, P.L., Vetterli, M.: The distributed Karhunen-Loève transform.
IEEE Transactions on Information Theory 52(12), 5177–5196 (2006)
18. Gastpar, M., Vetterli, M.: On the capacity of large Gaussian relay networks. IEEE Trans-
actions on Information Theory 51(3), 765–779 (2005)
19. Gastpar, M., Vetterli, M., Dragotti, P.L.: Sensing reality and communicating bits: a dan-
gerous liaison. IEEE Signal Process. Mag. 23(4), 70–83 (2006)
20. Georgiadis, L., Tassiulas, L.: Optimal overload response in sensor networks. IEEE Trans-
actions on Information Theory 52(6), 2684–2696 (2006)
21. Girosi, F., Poggio, T.: Networks and the best approximation property. Journal Biological
Cybernetics 63(3), 169–176 (1990)
22. Haenggi, M.: On distances in uniformly random networks. IEEE Transactions on Infor-
mation Theory 51(10), 3584–3586 (2006)
23. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall,
Upper Saddle River (1999)
24. Huang, G.B., Chen, L., Siew, C.K.: Universal approximation using incremental construc-
tive feedforward networks with random hidden nodes. IEEE Transactions on Neural Net-
works 17(4), 879–892 (2006)
25. Huang, G.B., Saratchandran, P., Sundararajan, N.: A generalized growing and pruning
RBF (GGAP-RBF) neural network for function approximation. IEEE Transactions on
Neural Networks 16(1), 57–67 (2005)
26. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal
approximators. Neural Networks 2(3), 395–403 (1989)
27. Kunniyur, S.S., Venkatesh, S.S.: Threshold functions, node isolation, and emergent la-
cunae in sensor networks. IEEE Transactions on Information Theory 52(12), 5352–5372
(2006)
28. Liu, B., Chen, B.: Channel-optimized quantizers for decentralized detection in sensor
networks. IEEE Transactions on Information Theory 52(7), 3349–3358 (2006)
29. Luo, Z.Q.: Universal decentralized estimation in a bandwidth constrained sensor net-
work. IEEE Transactions on Information Theory 51(6), 2209–2210 (2005)
30. Ogunfunmi, T.: Adaptive Nonlinear System Identification — The Volterra and Wiener
Model Approaches. Springer, New York (2007)
31. Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks.
Neural Computation 2(3), 246–257 (1991)
32. Poggio, T., Girosi, F.: Networks for approximation and learning. Proceedings of the
IEEE 78(9), 1481–1497 (1990)
33. Scardovi, L., Baglietto, M., Parisini, T.: Active state estimation for nonlinear systems:
A neural approximation approach. IEEE Transactions on Neural Networks 18(4), 1172–
1184 (2007)
34. Schilling, R.J., Carroll Jr., J.J., Al-Ajlouni, A.F.: Approximation of nonlinear systems
with radial basis function neural networks. IEEE Transactions on Neural Networks 12(1),
1–15 (2001)
35. Tay, W.P., Tsitsiklis, J.N., Win, M.Z.: Asymptotic performance of a censoring sensor
network. IEEE Transactions on Information Theory 53(11), 4191–4209 (2007)
36. Toumpis, S., Tassiulas, L.: Optimal deployment of large wireless sensor networks. IEEE
Transactions on Information Theory 52(7), 2935–2953 (2006)
37. Turchetti, C., Conti, M., Crippa, P., Orcioni, S.: On the approximation of stochastic
processes by approximate identity neural networks. IEEE Transactions on Neural Net-
works 9(6), 1069–1085 (1998)
38. Turchetti, C., Crippa, P., Pirani, M., Biagetti, G.: Representation of nonlinear random
transformations by non-Gaussian stochastic neural networks. IEEE Transactions on Neu-
ral Networks 19(6), 1033–1060 (2008)
39. Wedge, D., Ingram, D., McLean, D., Mingham, C., Bandar, Z.: On global-local arti-
ficial neural networks for function approximation. IEEE Transactions on Neural Net-
works 17(4), 942–952 (2006)
40. Wu, J.M., Lin, Z.H., Hsu, P.H.: Function approximation using generalized adalines. IEEE
Transactions on Neural Networks 17(3), 541–558 (2006)
41. Xiao, J.J., Luo, Z.Q.: Decentralized estimation in an inhomogeneous sensing environ-
ment. IEEE Transactions on Information Theory 51(10), 3564–3575 (2005)
42. Xue, F., Kumar, P.R.: On the -coverage and connectivity of large random networks. IEEE
Transactions on Information Theory 52(6), 2289–2298 (2006)
43. Yang, Z., Tong, L.: Cooperative sensor networks with misinformed nodes. IEEE Trans-
actions on Information Theory 51(12), 4118–4133 (2005)
Collaboration at the Basis of Sharing Focused
Information: The Opportunistic Networks
Abstract. There is no doubt that the sharing of information lies at the basis of any
collaborative framework. While this is the key mechanism of social computation
paradigms such as ant colonies and neural networks, it also represented the Achilles'
heel of many parallel computation protocols of the eighties. In addition to the computational
overhead due to the transfer of information in these protocols, a modern
drawback is constituted by intrusions into the communication channels, e.g. spamming
in e-mails, injection of malicious code, or, in general, attacks
on the data communication. While swarm intelligence and connectionist paradigms
overcome these drawbacks with a fault-tolerant broadcasting of data – every agent has
massive access to any message reaching it – in this chapter we discuss, within
the paradigm of opportunistic networks, an automatically selective communication
protocol particularly suited to setting up a robust collaboration within a very local community
of agents. Like medieval monks who escaped world chaos and violence by
taking refuge in small and protected communities, modern people may escape the
information avalanche by forming virtual communities that do not in any case relinquish
most ICT (Information and Communication Technology) benefits. A communication
middleware to obtain this result is represented by opportunistic networks.
1 Introduction
Beyond its sociological concept, an ideal form of collaboration between comput-
ing devices is represented by the paradigm of parallel computing. Having a highly
Bruno Apolloni, Guglielmo Apolloni, Simone Bassis, and Gian Luca Galliani
Department of Computer Science, University of Milan, Via Comelico 39/41,
20135 Milan, Italy
e-mail: apolloni,[email protected],
[email protected],gian luca [email protected]
Gianpaolo Rossi
Department of Information and Technology, University of Milan, Via Comelico 39/41,
20135 Milan, Italy
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 501–524.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
distance somewhat greater than mere voice range. The device has electronic functions so that the communication is automatic: nobody is burdened with intentionally repetitive message sending or storing, or with performing, in general, a series of typical database operations. Conversely, people may decide to whom to send a message and from whom to receive one. Cooperation starts when two people, a sender and a receiver, decide that it is appropriate to forward a message to someone not directly reachable by the sender, so that retransmission through the receiver's device can favor the forwarding. This is the scheme of multi-hop communication, a scheme that is neither robust nor reliable per se – i.e. for the task of transmitting a message from any agent A to any agent B – but may prove extremely compliant with the intents of a virtual community.
According to the literature, we may describe the main features of this framework
using the metaphor of word-of-mouth recommendation [42] – i.e. a transmission of
specific information to selected people, as we may do with private conversations –
and the following functionalities of a communication network:
• device vicinity exploration. The range limitation of radio-devices requires transmitting and receiving devices to be co-located. The movement limitations of their owners – the agents – and their sparseness over a territory impose a speed limitation on message transmission. Paired with the transmission strategies (number of repetitions of the message forwarding and the like), this identifies an effective transmission range [13] of each agent w.r.t. another one within a group of agents. The coincidence of these agents' interests identifies the range of activity of an opportunistic community;
• user profile. A connection between two agents is refused if they do not share a common interest in the message to be transmitted. This implies two levels of message: a beacon message1 to identify the presence of candidate receivers and the preferential transmission band, and the true message to be transmitted. We may imagine a graduation of affinity between the mating agents. However, we prefer to discard the role of pure retransmission functionality as a service provider [28]: messages are conveyed exclusively through agents interested in the information they carry;
• data dissemination. In the old children's telegraph game, the kids line up in a row. The one in the leftmost position whispers a sentence into the ear of the kid on his right, and so on. The general result is that the last kid on the right hears a sentence almost completely different from the original. The cause may be mere background noise or cheating on the part of one or more kids. In an opportunistic network this must not occur: no manipulation of a message is admitted, only its forwarding. The sole corruption may come from the total disappearance of the message, due to the transmission policies [35]. This may be a drawback for some members of the community, but it also represents
1 In a wireless network, beaconing refers to the continuous transmission of small packets
(called beacons) that advertise the presence of a base station (access point). The mobile
units sense the beacons and attempt to establish a wireless connection. In opportunistic
networks all mobile units may both transmit and sense beacons.
a benefit, since it defines a spatial and temporal frontier for the community: no spam, focused information, and growing confidence.
• privacy/distributedness of the information. An opportunistic network is a completely distributed system. Each device is the router of the next move, with the sole responsibility on the part of the receiver to refuse the communication. To decide a transaction, sender and receiver remain anonymous but must declare their interest profiles. The degree of identity protection may be higher or lower depending on the cryptographic protocols employed, where a break-even point must be sought between safety and transmission/computation load. Moreover, we may adopt a reputation system to avoid spammer intrusions [27].
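The acceptance rule implied by the bullets above can be condensed into a toy sketch. The function names, the threshold, and the use of a Jaccard overlap as the "graduation of affinity" are our own illustrative assumptions, not the authors' actual protocol: a receiver accepts a message only when its declared interest profile sufficiently overlaps the message topics.

```python
def affinity(profile_a, profile_b):
    # graduation of affinity between mating agents: Jaccard overlap of interests
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0

def accept(receiver_profile, message_topics, threshold=0.2):
    # refuse the connection when no sufficient common interest exists
    return affinity(receiver_profile, set(message_topics)) >= threshold

print(accept({"photo", "art"}, ["photo"]))   # shared interest -> True
print(accept({"business"}, ["photo"]))       # no overlap -> False
```

Under this rule a message is relayed only along chains of agents that all share an interest in its content, which is exactly what confines the message to the virtual community.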
We follow this story from the different perspectives of a kid and his father from a Far East country, say China, visiting a world expo in Europe as part of a group, and a hostess coming to help them (see the strip in Fig. 1). Both have the above r-d, but each of them wants to see and do something different. After his father has been talking about future business deals at the various stands, the son receives on the r-d an announcement about a photography exhibition of interest to him at another pavilion – the yellow one. He tells his father and, assuming his father has understood, goes to the yellow pavilion. But the father did not really get the message, so the face-to-face communication between the two is broken. We may imagine on the one hand the kid waiting for his father in a recognizable puzzled posture (e.g. sitting for a long time looking lost and concerned), and on the other the father realizing he has lost the kid when ending one discussion and about to move on to the next scheduled appointment. We may figure many ways of exploiting the r-d facility. One is that a hostess recognizes the kid's trouble. She doesn't speak Chinese, but may select a local pre-coded message on the r-d – say number 12: "are you lost?" – which will appear on the LCD both in English and in Chinese, so that the kid can understand it.
Then the same device may be used to forward both an emergency message to the
leader of the group (assumed to be nearby) and to the expo security staff, as well
as a specific message to the relatives of the kid who will recognize the message ID.
Both messages should ensure a happy ending.
This simple case highlights some strategic features expected from the r-d:
1. Besides the radio waves, the local communication channels of the device should be both audio, for emergency signals, and visual, committed to a small LCD screen. Three buttons – up, down and ok – will manage the communications.
2. The memory of the device should store an ID of its user, for instance made up of: language, age, sex and preferences about the message content he likes to receive. The storing may be done either remotely, via the Internet, or directly upon device delivery.
3. In addition, the user may decide to store special messages he receives, which may represent a record of his visits and hints for future tourist proposals.
4. The opportunistic functionality of the device concerns the circulation of service and emergency messages. It is complemented by some local functionalities, consisting of the translation of pre-coded inquiries that may be either sent remotely or formulated locally (such as "are you lost?").
Fig. 2 Joint traces of two cars (plain and dashed curves respectively) when: (a) both move according to a Brownian motion behavior; (b) the former moves only in one quadrant (absolute value of the Brownian motion components) from a trigger time on; and (c) an oracle rotates this trajectory toward the other car with some approximation (quantified by the radius of a proximity circle)
corresponds to taking the absolute value of the elementary steps in that direction,
so as to work with Chi distributed addends in place of Gaussian ones (see Fig. 2(b)).
In order to overcome analytical complications we propose this simple scheme. As the difference of two Gaussian variables is a Gaussian variable too, we may use (1) also to describe the components of the distance $\Delta$ between the two cars before $\tau$. We just need to multiply them by $\sqrt{2}$, so that $X_\Delta(t) \sim N_{0,\sqrt{2t}}$ and similarly for $Y_\Delta(t)$. Moreover, if we move to polar coordinates $(r, \theta)$ with $x = r\cos\theta$, $y = r\sin\theta$, the density function $f_\Delta$ of $\Delta$ becomes

$$f_\Delta(r, \theta) = \frac{1}{4\pi t}\, r\, \mathrm{e}^{-\frac{r^2}{4t}} \qquad (2)$$

which is the joint density function of $(R, \Theta)$, with $R$ a Chi variable with 2 degrees of freedom scaled by a factor $\sqrt{2t}$ and $\Theta$ a variable uniformly distributed in $[0, 2\pi)$. Our assumption about the pursuit is that, with reference to the distances
D1 and D2 of the two cars from the position of the first one at time τ , you are able
to maneuver Θ1 from τ on, so that when D1 = D2 also Θ1 = Θ2 (see Fig. 2(c)). As
mentioned before, per se the probability of a match between two points representing
the cars is null. Thus your task is unrealistic. However, intentionality recovers fea-
sibility thanks to the fact that in practice it is enough that the angles are sufficiently
close to entangle the two cars. The actual correspondence with the pursuit dynamics
is facilitated by some free coefficients which will be embedded in the model.
With this assumption we are interested in the time $t$ when $D_1 = D_2$. Given the continuity of the latter, we may measure only a probability density in $t$. In other words, to any sign change of the difference $D_1 - D_2$ as the two cars run there corresponds a matching time, as a specification of a continuous variable $T$. Since both $D_1$ and $D_2$ scale with the square root of time, expressing their dependence on the trigger time $\tau$ and the pursuit time $t$, we have
$$D_1(t) = \sqrt{t}\,\chi_{2_1}; \qquad D_2(t) = \sqrt{2\tau + t}\,\chi_{2_2} \qquad (3)$$

where $\chi_2$ is a random variable with density function $f_{\chi_2}(z) = z\,\mathrm{e}^{-\frac{z^2}{2}}$. Thus, after equating $D_1(t)$ with $D_2(t)$ we obtain

$$1 = \frac{D_2(t)}{D_1(t)} = \frac{\chi_{2_2}\sqrt{2\tau + t}}{\chi_{2_1}\sqrt{t}} \qquad (4)$$

under the condition $\chi_{2_1} \geq \chi_{2_2}$. Denoting with $T$ the random variable with specifications $t$ and $\mathcal{T}$ the one with specifications $\tau$, this equation finds a stochastic solution in the random variable

$$V = \frac{T}{\mathcal{T}} = 2\left(\frac{\chi^2_{2_1}}{\chi^2_{2_2}} - 1\right)^{-1} \qquad (5)$$

It follows the same distribution law as the ratio between two unconstrained Chi square variables, i.e. an F variable with parameters (2, 2) [19], whose cumulative distribution function (CDF) reads

$$F_V(v) = \left(1 - \frac{1}{1 + v}\right) I_{[0,\infty)}(v) = \frac{v}{1 + v}\, I_{[0,\infty)}(v) \qquad (6)$$

where $I_{[a,b]}(x)$ is the indicator function of $x$ w.r.t. the interval $[a, b]$, thus being 1 for $a \leq x \leq b$, 0 otherwise.
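The passage from (5) to (6) can be checked with a quick Monte Carlo sketch under our reading of the derivation: $\chi_2$ variables are Rayleigh deviates, and keeping the pairs with $\chi_{2_1} \geq \chi_{2_2}$ makes $V$ positive with the claimed F(2,2) law. Sample sizes and the seed are arbitrary choices.

```python
import math
import random

random.seed(0)

def chi_2():
    # Chi variable with 2 degrees of freedom (Rayleigh): sqrt(-2 ln U)
    return math.sqrt(-2.0 * math.log(1.0 - random.random()))

def sample_V(n):
    # draw chi pairs, keep those satisfying the positivity constraint, apply (5)
    out = []
    while len(out) < n:
        c1, c2 = chi_2(), chi_2()
        if c1 > c2:
            out.append(2.0 / (c1 ** 2 / c2 ** 2 - 1.0))
    return out

vs = sample_V(100_000)
for v in (0.5, 1.0, 3.0):
    empirical = sum(x <= v for x in vs) / len(vs)
    print(f"v={v}: empirical CDF {empirical:.3f} vs v/(1+v) = {v/(1+v):.3f}")
```

The empirical CDF tracks $v/(1+v)$ at every probed point, confirming that the constrained ratio in (5) is distributed as in (6).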
Let us make some general considerations about processes with memory. To start from the very beginning, for any ordered variable $T$, such that only events on its sorted values are of interest to us, the following master equation holds

$$P(T > t \mid T > k) = P(T > q \mid T > k)\, P(T > t \mid T > q) \qquad \forall\, k \leq q \leq t \qquad (7)$$

It comes simply from the fact that in the expression of the conditional probability we may separate the conditioned variables from the conditioning ones. While (7) denotes the time splitting in the fashion of the Chapman–Kolmogorov theorem [37] as a general property of any sequence of data, equation (8) highlights that the events $(T > t)$ and $(T > k)$ are by definition never independent. What is generally the target of the memory divide in random processes is the time $t - k$ elapsing between two events. In this perspective, the template of memoryless phenomena descriptors is the Poisson process, whose basic property is $P(T > t) = P(T > q)\,P(T > t - q)$ for $t > q$. It says that if a random event (for instance a hard disk failure) did not
Fig. 3 CCDF LogLogPlot when T follows: (a) a Pareto law with α = 1.1 and k = 1; (b) a negative exponential law with λ = 0.1. Parameters were chosen to have the same mean 11
occur before time q and you ask what will happen within time t, you must forget this former situation (it means that the disk became neither more robust nor weaker), since your true question concerns whether or not the event will occur at a time t − q. Hence your true variable is τ = T − q, and the above property is satisfied by the negative exponential distribution law with $P(T > t) = \mathrm{e}^{-\lambda t}$, for constant λ,² since with this law (7) reads
Note that this distribution, commonly called the Pareto distribution, is defined only for t ≥ k, with k > 0 denoting the true time origin, where α identifies the distribution through the scale of its logarithm. The main difference w.r.t. the negative exponential distribution is highlighted by the LogLogPlots of the complementary cumulative distribution function (CCDF) $\overline{F}_T$ in Fig. 3: a line segment for the Pareto curve (see picture (a)), in contrast to a more than linearly decreasing curve for the exponential distribution (picture (b)).
The difference between the graphs in Fig. 3 shows that, for the same mean value of the variable, we may expect its occurrence at a more delayed time if we maintain memory of it as a target to be achieved rather than if we rely on chance.
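The contrast in Fig. 3 is easy to reproduce numerically. With the chapter's parameters (α = 1.1, k = 1 for the Pareto law; λ = 0.1 for the negative exponential), the two CCDFs start at comparable values but part ways by many orders of magnitude in the tail:

```python
import math

alpha, k = 1.1, 1.0   # Pareto: mean = alpha*k/(alpha-1) = 11
lam = 0.1             # negative exponential: mean = 1/lam

def pareto_ccdf(t):
    # P(T > t) = (k/t)^alpha for t >= k
    return (k / t) ** alpha if t >= k else 1.0

def exp_ccdf(t):
    # P(T > t) = exp(-lam * t)
    return math.exp(-lam * t)

for t in (10, 100, 1000):
    print(f"t={t}: Pareto {pareto_ccdf(t):.2e}, exponential {exp_ccdf(t):.2e}")
```

At t = 10 the exponential tail is still the heavier one, but by t = 1000 the Pareto CCDF exceeds the exponential one by dozens of orders of magnitude: this is exactly the linear versus super-linearly decreasing behavior seen in the LogLogPlots.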
We recover the distribution of V coming from the dodgem car model by extending
(6) as follows
2 Variants with λ = 1/β (t) allow simple adaptation of the law to more complex phenomena
when β (t) is not far from being a constant.
$$F_V(v) = 1 - \frac{b}{b + (v/c + 1)^a} \qquad (11)$$
which we call shifted Pareto, with b and c playing the role of both scale and shift parameters. The latter stands for a key feature of a memory-dependent model, the former for a memoryless framework. The exponent a plays a role similar to α's. With this formula we aim to approximately capture many variants of v, for both modeling variations and adapting the model to real data. For instance we almost recover

$$F_V(v) = \frac{2\,v^{2/\alpha}}{v^{2/\alpha} + (v + 2)^{2/\alpha}}\, I_{[0,\infty)}(v) \qquad (12)$$

obtained by extending the dependence on t from square root to a generic power 1/α in model (1). Though somewhat structurally different, (11) almost coincides with (12). In particular, with a = b = c = 1 and v replaced by v − 1 it takes the form (6), into which (12) translates when α = 2. Actually, we may get satisfactory approximations in a relatively wide range of parameters. Moreover V in (11) ranges from −c to +∞. Hence, when we refer to a variable in [0, +∞), we use the truncated version of (11) that is given by
$$F_V(v) = 1 - \frac{b + c}{b + (v/c + 1)^a} \qquad (13)$$
To obtain the pursuit times we need to multiply v by the trigger time; we must also
add the latter to the product in order to obtain the contact times. In the next section
we will see that both contact times and intercontact times remain approximately in
the same family, provided we have a suitable distribution law of the trigger times.
We will also study some manageable deviations from this model.
From our model we are left with a power law describing the ratio between pursuit and trigger times. Since t = vτ, to complete the description of contact times we need a model for the trigger time too. Let $f_{\mathcal{T}}$ be its probability density function (PDF), defined in a range $(\tau_{\inf}, \tau_{\sup})$. Since $t + \tau = (v + 1)\tau$, we obtain $F_W$ with $W = T + \mathcal{T}$ by computing

$$F_W(w) = \int_{\tau_{\inf}}^{\max\{w,\,\tau_{\sup}\}} F_V(w/\tau - 1)\, f_{\mathcal{T}}(\tau)\, \mathrm{d}\tau \qquad (14)$$
Fig. 4 CCDF LogLogPlot of contact times with a trigger time varying according to distribution law: (a) uniform; (b) Pareto; and (c) negative exponential

$$F_{\mathcal{T}}(\tau) = 1 - \tau^{-\lambda} \qquad (15)$$
Fig. 5 Recovering the intercontact ECCDF shape through our mobility model
We can presently find in the literature a few real user-trace databases, which are generally limited, specific to particular environments and difficult to manage. However, a common feature emerging from these traces is that the distributions underlying mobility phenomena are heavy-tailed, having the Pareto law as a common template [7, 25, 33, 30, 10]. Actually, the CCDF synthesizing these tracks shows two different traits that appear linear (see Fig. 6(a)), thus suggesting descriptions in terms of: a double Pareto curve (a lower power curve followed by a greater power one); or, alternatively, a temporal sequencing of a Pareto trait continuing with an exponential distribution that quickly nears 0 [29] (see Figs. 6(b) and (c), respectively); or, in contrast, the sequencing of an exponential trait and a Pareto trait, as we suggested in the previous sections.
This behavior has no well-assessed theoretical model in the literature. Synthetic models therefore normally produce quite different time distributions, which find justification more in abstract (possibly simplifying) hypotheses on human mobility than in experimental feedback. Thus, at a very elementary level, we find a Random Walk Mobility Model [8] described in terms of a Brownian process [17], where each user randomly chooses a direction and a speed at each time-step. The interaction between users is enhanced in two directions: i) the dynamics of the single agent, and ii) the correlation between their moves. As for the former, in the Random
Fig. 6 CCDF LogLogPlot of a T randomly varying according to: (a) the Cambridge/Haggle Crawdad dataset; (b) a double Pareto law; and (c) a power law followed by an exponential one
WayPoint Mobility Model [48, 26] each user randomly selects a target point in the region where users move and goes toward it (the pursuit phase) with a speed uniformly chosen in a fixed interval. Once he arrives, he remains there for a fixed time and then starts moving again with parameters drawn independently of the previous ones. The model is made more sophisticated in various ways. For instance, according to the Gauss-Markov model, the direction and speed at time t depend on their values at time t − 1 [36]; alternatively, the path toward the goal region takes into account the obstacles represented by buildings [12], etc. As for the second direction, correlations between users are considered in the Group Mobility Model and its variants [9].
3.2.1 Requirements
The design of the Pocket Trace Recorder, or PTR, has both functional and architectural requirements. The former are related to trace collection, recording and transfer to a server station for off-line analysis. The primary focus of the PTR's design is the collection of data that describe the contacts among encountering devices. The distribution of the intercontact times between mobile devices is the key ingredient for estimating the delays in the system and for generating a mobility model seamlessly reproducing real human mobility. On the other hand, the number of contacts and how long they last, for both individual and group interactions, provide significant information about the network capacity and the way people move according to some spatial, social or functional law [36].
The Pocket Trace Recorder is a portable radio device with the overall architecture described in Fig. 7. It uses the Cypress CY8C29566 microcontroller and the AUREL radio module, model RTX-RTLP. The radio range has been limited to 10 meters in order to maintain a sparse PTR distribution even in an office area and to limit power consumption. This combination allows a very low power consumption that lets the experiments last for the required time on common NiMH AA 1.2 V batteries. Each PTR has a 1 MB flash memory where more than 50,000 contacts can be stored.
The PTR firmware implements the first two layers of the ISO-OSI model [32]: Manchester coding is used at the physical layer, while a non-persistent CSMA MAC protocol that regulates access to the 2400 b/s channel characterizes the data-link layer. Within the latter layer, beacons are the only frames that a device has to exchange with its neighbors. The beacon payload is composed of: the PTR identifier; a set of bits representing the internal state (used for diagnosis purposes); and the current time. The local time is set at configuration time. The total
clock drift in 3-week experiments has been evaluated at 10-12 seconds and, so far, we have not executed any run-time clock synchronization. After sending its beacon, a PTR enters a sleep mode and wakes up whenever it receives a beacon from the neighborhood.
Each PTR uses a USB interface to communicate with the Pocket Viewer, the desktop application software, which has been used to configure the devices, collect the recorded data at the end of the experiment, and support data analysis and device monitoring.
Fig. 8 Overview of the PTR connections' timing. Part I: intercontact times; Part II: intracontact times. On each part, line (1): episode lengths l vs. their starting time t (in hours); line (2): episode lengths ECCDF with time expressed in seconds; column (a): encounters between a specific PTR pair; column (b): encounters of a specific PTR with all remaining ones; column (c): only for intercontacts, net intercontact times independently of the mate ID
being characterized by a reduced number of parameters (see (13)) that are relatively easy to infer. Fig. 8 reports an excerpt of the experimental distributions. Namely we see: 1) the log of intercontact and intracontact lengths against the daily times, and 2) their synthesis in terms of ECCDF. For each item we have an exemplar related to the above three layers: single pair, one versus others, and each versus every other. The type (a) diagrams highlight a common path that the PTRs encounter during the course of a day, whereas a more detailed analysis could reveal biases with the weekdays and preferences of the single PTR owners. All these biases seem however to be well absorbed by the Pareto-like distribution, as shown in Fig. 8. Here the experimental graphs are satisfactorily recovered with curves as in (13) through confidence regions (whose computation, together with a general inference procedure, is deferred to the next section), apart from a slight superposed hump that we
Fig. 9 The non-intentional features of the encounters. (a) Scatterplot of intercontact-intracontact times drawn from the same records as in column (b) of Fig. 8. (b) Exponential CDF of the first intercontact times when the contacts have almost zero duration
already observed on the sloping trait of the picture in Fig. 5 and interpret through
the simple constructive model in the previous section.
The experimental data give us feedback on this matter through the pictures in Fig. 9. From part (a) of this figure – representing the scatterplot of intercontact-intracontact times – we see that most of the intracontact times lasting less than 2 seconds are linked to small intercontact times, namely those located within the plateau. Whereas the correlation between the two times (around 0.27) depends on the PTR's blindness to other contacts once one contact has been established, the distribution of these times within the plateau when the intracontact time is less than 2 seconds means the encounters are non-intentional. Indeed, as shown in Fig. 9(b), the distribution is exponential, reflecting only chance crossings of almost null duration.
$$t = c\left(\frac{1 + bu}{1 - u}\right)^{\frac{1}{a}} - c \qquad (17)$$
2. Master equations. The actual connection between the model and the observed data is cast in terms of a set of relations between statistics on the data and unknown parameters that come as a corollary of the sampling mechanism. With these relations we may inspect the values of the parameters that could have generated a sample with the observed statistic from a particular setting of the seeds. Hence, if we draw seeds according to their known distribution – uniform in our case – we get a sample of compatible parameters in response [4]. In order to ensure clean properties of this sample, it is enough to involve statistics that are sufficient w.r.t. the parameters [41] in the master equations. Unfortunately, because of the shift terms, the parameters are so embedded in the density function of T that we cannot identify such statistics for them. Rather we focus on the statistics $t_{\mathrm{med}} = t_{(\lfloor m/2 \rfloor + 1)}$, $s_1 = \sum_{i=1}^{m} \sqrt[r]{t_{(i)}/t_{\mathrm{med}}} - 1$ and $s_2 = \sum_{i=\lfloor m/2 \rfloor + 1}^{m} \log\left(t_{(i)}/t_{\mathrm{med}}\right)$, where $t_{(i)}$ denotes the i-th item within the sample sorted in ascending order, and propose the following master equations, with analogous notation for $u_{(i)}$ and $u_{\mathrm{med}}$:

$$t_{\mathrm{med}} = c\left(\frac{1 + bu_{\mathrm{med}}}{1 - u_{\mathrm{med}}}\right)^{\frac{1}{a}} - c \qquad (18)$$

$$s_1 = \sum_{i=1}^{m} \sqrt[r]{\left(\frac{1 + bu_{(i)}}{1 - u_{(i)}} \Big/ \frac{1 + bu_{\mathrm{med}}}{1 - u_{\mathrm{med}}}\right)^{\frac{1}{a}}} - 1 \qquad (19)$$

$$s_2 = \sum_{i=\lfloor m/2 \rfloor + 1}^{m} \left[\log\frac{1 + bu_{(i)}}{1 - u_{(i)}} - \log\frac{1 + bu_{\mathrm{med}}}{1 - u_{\mathrm{med}}}\right] + \left(\frac{m}{2} + 1\right)\log c \qquad (20)$$
We cannot state a general course of these statistics with the parameters. However, in the region where we expect to find a solution this course is monotone in a and b, with a tilting attitude of c. In greater detail, the true ratio term in $s_1$ and $s_2$ should be $\left(\left(\frac{1 + bu_{(i)}}{1 - u_{(i)}}\right)^{\frac{1}{a}} - 1\right) \Big/ \left(\left(\frac{1 + bu_{\mathrm{med}}}{1 - u_{\mathrm{med}}}\right)^{\frac{1}{a}} - 1\right)$, which we approximately simplify as in (19) to make the solution easier.
simplify as in (19) to make the solution easier. Moreover, the last term in (20)
does not derive from the explaining function of T . Rather, it is added to introduce
a shaking in the numeric search of a solution, in the assumption that c is close to
1, hence its logarithm to 0, while shifts from this value raise tmed and decrease
s2 . To achieve the first condition together with a suitable range of values, we
normalize the ti s with a wise shift and rescaling. We also modulate r, ruling a
balance between second and third statistics, in order to avoid disastrous paths
followed by the search algorithm leading to diverge. This denotes the fact that at
the moment, due to the difficulties of the involved numerical optimization tasks,
the estimation procedure is not completely automatic.
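The seed mechanism underlying these master equations can be illustrated with the explaining function (17): drawing uniform seeds u and mapping them through (17) must reproduce the CDF (13). The parameter values below are illustrative, and we take c = 1, the case in which (17) inverts (13) exactly.

```python
import random

random.seed(1)
a, b, c = 1.5, 2.0, 1.0   # illustrative parameters; c = 1 makes (17) invert (13)

def explain(u):
    # eq. (17): map a uniform seed u into a sample of T
    return c * ((1 + b * u) / (1 - u)) ** (1 / a) - c

def cdf(t):
    # eq. (13): truncated shifted-Pareto CDF
    return 1 - (b + c) / (b + (t / c + 1) ** a)

ts = sorted(explain(random.random()) for _ in range(50_000))
for q in (0.25, 0.5, 0.9):
    t_q = ts[int(q * len(ts))]
    print(f"sample quantile {q}: model CDF there = {cdf(t_q):.3f}")
```

Running the mapping the other way round is the inference step: given observed statistics, one searches for the (a, b, c) that make seeds drawn uniformly reproduce them, obtaining a sample of compatible parameters.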
protocol optimization theory based on specific hypotheses that definitely shed light on the problem, but require deep simulations and experimental campaigns to yield truly operational results.
Our guess is that, on the one hand, non-intensive communication networks which are moderately delay tolerant, like the one sketched in our example, may find reasonable solutions with the above drawbacks as caveats. On the other hand, we still need to assess some leading principles, such as the Pareto-like laws ruling intentionality, to gain a robust vision of the phenomenon in a substantially distribution-free mode that will direct the results in more complex cases. As in any new operational field, these principles will be the offspring of the pioneering applications that precede them, applications which take advantage of the novelty of the methodology while even broad results remain satisfactory and rewarding.
5 Conclusions
In the previous century, scientists elaborating ideas around computing machines devoted primary attention to the program instructions, i.e. to how to process data. In contrast, in the new century the focus seems mainly on communication protocols, i.e. on how to communicate the data to be processed. In this vein, a state-of-the-art template of communication paradigms is the Internet, where in essence the most efficient communication protocol consists in the absence of social constraints, apart from some technological utilities. Hence no special permission or authentication is required in general, and on demand a user may typically receive thousands of records in response to an inquiry. The limitations of the web are obvious to everybody: a mass of data is available to any node of the network, with no guarantee of its value. Thus, except for skillful users capable of extracting the desired information, the vast majority of people are bombarded by a huge set of data, with very little time and limited skills for processing them. At its worst, the single user is affected by a set of mainly unstructured stimuli whose cumulative effect is not far from Gaussian noise. To desaturate this information flow we may operate either from the bottom, with feature selection methods such as random subspace [6, 24], or directly on the source of the data, by selecting those of interest to us. Opportunistic networks are a tool for the latter option, where the selection derives from three factors: i) topological reasons, given the short range of the transponder devices; ii) transmission protocols, which are based not on the message content but on the transmission partners' IDs; and, above all, iii) common intentions of the transmission partners, automatically giving rise to virtual communities at the basis of a collaborative computational intelligence.
In this chapter we have shown, through the analysis of real-life intercontact times, that the intentionality of the agents is at the root of the memory-endowed processes forming the communities. From a statistical perspective this is due to the longer tails of the time distributions w.r.t. those of memoryless processes. Indeed, these tails allow us to relax the typical synchronization constraints that are at the root of conventional parallel processing mechanisms. Rather, when the parallelism is exploited
References
1. Andrews, G.E., Askey, R., Roy, R.: Special functions (Encyclopedia of Mathematics and
its Applications), vol. 71. Cambridge University Press, Cambridge (1999)
2. Algorithmic Inference, Wikipedia (2009), http://en.wikipedia.org/wiki/
Algorithmic_inference
3. Apolloni, B., Bassis, S., Gaito, S., Malchiodi, D.: Appreciation of medical treatments
by learning underlying functions with good confidence. Current Pharmaceutical De-
sign 13(15), 1545–1570 (2007)
4. Apolloni, B., Malchiodi, D., Gaito, S.: Algorithmic Inference in Machine Learning, 2nd
edn. Advanced Knowledge International, Magill, Adelaide (2006)
5. Brooks, F.P.: The Mythical Man-Month: Essays on Software Engineering, 20th Anniver-
sary edn. Addison-Wesley Professional, Reading (1995)
6. Bertoni, A., Folgieri, R., Valentini, G.: Bio-molecular cancer prediction with ran-
dom subspace ensembles of support vector machines. Neurocomputing 63(C), 535–539
(2005)
7. Bhattacharjee, D., Rao, A., Shah, C., Shah, M., Helmy, A.: Empirical modeling of campus-wide pedestrian mobility: Observations on the USC campus. In: Proceedings of the IEEE Vehicular Technology Conference, pp. 2887–2891 (2004)
8. Boudec, J.Y.L., Vojnovic, M.: Perfect simulation and stationarity of a class of mobility
models. In: INFOCOM, pp. 2743–2754 (2005)
9. Camp, T., Boleng, J., Davies, V.: A survey of mobility models for Ad Hoc network
research. Wireless Communications & Mobile Computing (WCMC): Special issue on
Mobile Ad Hoc Networking: Research, Trends and Applications 2 (2002)
10. Chaintreau, A., Hui, P., Diot, C., Gass, R., Scott, J.: Impact of human mobility on oppor-
tunistic forwarding algorithms. IEEE Transactions on Mobile Computing 6(6), 606–620
(2007)
11. Chen, X., Murphy, A.: Enabling disconnected transitive communication in mobile ad hoc
networks. In: Proceedings Workshop Principles of Mobile Computing, pp. 21–27 (2001)
12. Choffnes, D., Bustamante, F.: An integrated mobility and traffic model for vehicular
wireless networks. In: Proc. of the 2nd ACM International Workshop on Vehicular Ad
Hoc Networks (VANET) (2005)
13. Deng, J., Han, Y.S., Chen, P.N., Varshney, P.K.: Optimal transmission range for wire-
less ad hoc networks based on energy efficiency. IEEE Transactions on Communica-
tions 55(9) (2007)
14. Dorigo, M., Stutzle, T.: Ant Colony Optimization. Bradford Books. The MIT Press,
Cambridge (2004)
15. Drabkin, V., Friedman, R., Kliot, G., Segal, M.: RAPID: Reliable probabilistic dissemi-
nation in wireless ad-hoc networks. In: SRDS 2007: 26th IEEE International Symposium
on Reliable Distributed Systems, Beijing, China (2007)
16. Efron, B., Tibshirani, R.: An introduction to the Boostrap. Chapman and Hall/Freeman,
New York (1993)
17. Einstein, A.: Investigations on the theory of the Brownian Movement. Dover Publication
Ltd (1956)
Collaboration for Sharing Focused Information: The Opportunistic Networks 523
18. Ferro, E., Potorti, F.: Bluetooth and wi-fi wireless protocols: A survey and a comparison.
IEEE Wireless Communications 12(1), 12–26 (2005)
19. Fisher, M.A.: On the mathematical foundations of theoretical statistics. Philosophical
Transactions of the Royal Society of London Ser. A 222, 309–368 (1925)
20. Georgii, H.O.: Gibbs measures and phase transitions. de Gruyter, Berlin (1988)
21. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobil-
ity patterns. Nature 453, 779–782 (2008)
22. Grossglauser, M., Tse, D.: Mobility increases the capacity of ad hoc wireless networks.
IEEE/ACM Transaction on Networking 10(4), 477–486 (2002)
23. Gustafson, J.L.: Reevaluating amdahl’s law. Communications of the ACM 31, 532–533
(1988)
24. Ho, T.: The random subspace method for constructing decision forests. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
25. Jain, R., Lelescu, D., Balakrishnan, M.: An empirical model for user registration patterns
in a campus wireless lan. In: Proceedings of the Eleventh Annual International Confer-
ence on Mobile Computing and Networking (mobiCom), pp. 170–184 (2005)
26. Johnson, D., Maltz, D.: Dynamic source routing in Ad Hoc wireless networks. In:
Imielinski, T., Korth, H.F. (eds.) Mobile Computing, pp. 153–181. Kluwer Academic
Publishers, Dordrecht (1996)
27. Josang, A., Ismail, R., Boyd, C.: A survey of trust and reputation systems for online
service provision. Decision Support Systems 43(2), 618–644 (2007)
28. Kangasharju, J., Heinemann, A.: Incentives for opportunistic networks. In: AINAW
2008: Proceedings of the 22nd International Conference on Advanced Information Net-
working and Applications - Workshops, pp. 1684–1689. IEEE Computer Society, Wash-
ington (2008)
29. Karagiannis, T., Le Boudec, J., Vojnovic, M.: Power law and exponential decay of inter
contact times between mobile devices. Tech. rep., Microsoft Research, Cambrdidge, UK
(2007)
30. Kim, M., Kotz, D., Kim, S.: Extracting a mobility model from real user traces. In: Pro-
ceedings of IEEE INFOCOM (2006)
31. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Sci-
ence 220(4598), 671–680 (1983)
32. Kurose, J.F., Ross, K.W.: Computer Networking – A Top-Down Approach, 4th edn. Pear-
son Addison Wesley, London (2008)
33. Lelescu, D., Kozat, U., Jain, R., Balakrishnan, M.: Model T++: An empirical joint space-
time registration model. In: Proceedings of ACM MOBIHOC, pp. 61–72 (2006)
34. McCulloch, W., Pitts, W.: A logical calculus of ideas immanent in nervous activity. Bul-
letin of Mathematical Biophysics 5, 115–133 (1943)
35. Muqattash, A., Krunz, M.: CDMA-based MAC protocol for wireless ad hoc networks.
In: MobiHoc 2003: Proceedings of the 4th ACM international symposium on Mobile ad
hoc networking & computing, pp. 153–164. ACM, New York (2003)
36. Musolesi, M., Mascolo, C.: A community based mobility model for Ad Hoc network
research. In: REALMAN 2006: Proceedings of the second international workshop on
multi-hop Ad Hoc networks: from theory to reality, pp. 31–38. ACM Press, New York
(2006), http://doi.acm.org/10.1145/1132983.1132990
37. Papoulis, A.: Probability, Random Variables, and Stochastic Processes, 2nd edn.
McGraw-Hill, New York (1984)
38. Rohatgi, V.K.: An Introduction to Probablity Theory and Mathematical Statistics. Wiley
Series in Probability and Mathematical Statistics. John Wiley & Sons, New York (1976)
524 B. Apolloni et al.
39. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-
propagating errors. Nature 323, 533–536 (1986)
40. Schuler, D.: Social computing. Communication of ACM 37(1), 28–29 (1994)
41. Stigler, S.: Studies in the history of probability and statistics. xxxii: Laplace, fisher and
the discovery of the concept of sufficiency. Biometrika 60(3), 439–445 (1973)
42. Straub, T., Heinemann, A.: An anonymous bonus point system for mobile commerce
based on word-of-mouth recommendation. In: Liebrock, L.M. (ed.) Proceedings of the
2004 ACM Symposium on Applied Computing, pp. 766–773. ACM Press, New York
(2004)
43. Surowiecki, J.: The Wisdom of Crowds: Why the Many Are Smarter Than the Few
and How Collective Wisdom Shapes Business, Economies, Societies and Nations. Little,
Brown (2004)
44. Tapscott, D., Williams, A.D.: Wikinomics: How Mass Collaboration Changes Every-
thing. Penguin Books Ltd, Lodon (2007)
45. Toh, C.K.: Ad Hoc Mobile Wireless Networks. Prentice Hall Publishers, Englewood
Cliffs (2002)
46. Tseng, Y.C., Ni, S.Y., Chen, Y.S., Sheu, J.P.: The broadcast storm problem in a mobile
ad hoc network. Wireless Networks 8(2/3), 153–167 (2002)
47. Wilks, S.S.: Mathematical Statistics. Wiley Publications in Statistics. John Wiley, New
York (1962)
48. Yoon, J., Liu, M., Noble, B.: Random waypoint considered harmful. In: Proceedings of
INFOCOM, IEEE (2003), http://citeseer.ist.psu.edu/yoon03random.
html
Part VII
Artificial Immune Systems
Exploiting Collaborations in the Immune
System: The Future of Artificial Immune
Systems
1 Introduction
The natural immune system, one of nature’s most complex and fascinating systems,
first provided inspiration for computer scientists in the 1990s. Since then, the rapidly
evolving paradigm of Artificial Immune Systems (AIS) has developed, with a
Emma Hart
Napier University, Edinburgh
e-mail: [email protected]
Chris McEwan
Napier University, Edinburgh
e-mail: [email protected]
Despina Davoudani
Napier University, Edinburgh
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 527–558.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
528 E. Hart, C. McEwan, and D. Davoudani
can already be achieved by other, more traditional methods². To this end, we argue
that the application of immune inspiration to these problem domains is failing to
exploit the true potential of the metaphor of the immune system as a complex system.
Secondly, research and therefore algorithm development has increasingly
become driven by engineering requirements; algorithms are developed to solve a
particular problem, and then tweaked and tuned to solve the particular problem in-
stance at hand. This prevents the development of principled algorithms which can
be generalised to other domains. This trend goes hand-in-hand with an increasing
tendency for algorithms to move further and further away from the biology that
originally inspired them, a practice referred to by Stepney et al. [44] as “reasoning
by metaphor”. Timmis argues further in [50] that the limited perspective of the
immune system exploited in current AIS algorithms will ultimately limit the success
of the field. On a more positive note, this trend is beginning to be reversed: recent
publications such as [48, 34, 25] present new, inter-disciplinary viewpoints which
have roots firmly in the underlying immunology.
Thirdly, and perhaps underlying both of the previous points, there has been a
tendency to attempt to exploit what is commonly regarded as classical immunology.
This refers to the common assumption that the immune system’s major function is
to separate self from non-self, where self defines the ‘normal’ state of the immune
system’s host and non-self everything else. That this is the general perception is
perhaps best illustrated by the Wikipedia definition of the immune system: “The
immune system defends the body by recognizing agents that represent self and those
that represent non-self, and launching attacks against harmful members of the latter
group”. It is not surprising, therefore, that this viewpoint often leads those working
outside the immediate field to associate AIS algorithms mostly with domains such
as security and anomaly detection, as a form of defence.
However, despite the ubiquity of the self/non-self theory of immunology, a num-
ber of alternative camps exist in the immunological world, each with sizeable fol-
lowings, and proposing a number of different theories which question not only the
mechanisms by which the immune system is held to operate but more fundamen-
tally, the actual role of the immune system itself. Such theories are not necessarily
mutually exclusive with classical immunology; rather, they present a different
perspective on both the functioning and the role of the immune system, one which
has significant
potential for engineering systems. For example, while host defence is clearly a crit-
ical function, [12, 45] have proposed that it cannot be the only function of interest.
In a radical departure from accepted thinking, recent work by leading immunologist
Irun Cohen has suggested that the immune system plays a much more fundamen-
tal role in the body than simply protecting it from harm, and instead, its function
is that of body maintenance. Although this may seem a subtle semantic point, in
this view (expounded in detail by Cohen in [13]), detection of harmful situations
becomes merely a special case of overall body maintenance. The immune system
² Furthermore, many alternative, perhaps older, fields offer algorithms which are backed
up by theory (something currently lacking in AIS), and hence are more readily
accepted.
instead is seen as a cognitive system (in the sense that it computes state and acts
upon that state) which continuously provides body maintenance to the host. In the
body, the term maintenance covers a wide and diverse spectrum of functions, rang-
ing from healing of cuts and bruises, to inflammation, to mending broken bones. In
[14] Cohen likens the functioning of the immune system to that of a computational
machine, in that it must compute its current state and act upon it. His suggestion
that, by computing both its internal and external state, the immune system can be
re-configurable, adaptive and secure, and can provide self-healing functionality in an
unpredictable and dynamic environment, is clearly appealing from an engineering
perspective.
Regarding the immune system as a system which provides maintenance rather
than defence considerably widens the scope of engineered applications which might
benefit from an immune inspired approach. Additionally, the properties just men-
tioned suggest application of immune-inspired mechanisms to a far richer field than
is currently apparent. The motivation of this article therefore is as follows. First,
we ground the discussion by focusing on the features and properties of challeng-
ing engineered systems which necessitate novel computational approaches. We then
describe a number of immune-mechanisms which are yet to be fully exploited in
computational systems. The discussion re-positions the immune system as a com-
plex, collaborative system of multiple signals and actors. A number of examples of
systems in which steps are currently being taken to implement some of the mecha-
nisms are then described. We conclude with a discussion of an emerging field, that
of immuno-engineering which promises a methodology which will facilitate maxi-
mum exploitation of immune mechanisms in the future.
note that the complexity introduced to software systems by several emerging com-
puting scenarios goes beyond the capabilities of traditional computer science and
software engineering abstractions [59]. We propose that when attempting to ad-
dress such challenges in building and maintaining truly complex systems, then the
immune system metaphor is ripe for exploration [27].
The properties we wish to endow on complex engineered systems are exhibited
by the immune systems of many complex living systems. Such systems possess an
immune system which comprises an innate component which endows the host
with rapid, pre-programmed responses and an adaptive component which is capa-
ble of learning through experience. Much of the desired functionality of the sys-
tem arises from the interplay between these subsystems and the regulatory effect
they have on each other. Together, these operate over multiple time-scales, from
seconds to the entire lifetime of the organism, endowing a system with the ability
to function, and maintain itself over its lifetime. Application of immune-inspired
mechanisms to engineered systems which are required to exhibit the same kind of
properties will need to be in stark contrast to the ‘traditional’ type of AIS algorithm
development prevalent in the literature. For example, many efforts have been made
to derive optimisation or classification algorithms by looking to a natural system
which arguably does not exhibit the hoped-for functionality at all (optimisation) or
where the functionality is only one small part of a bigger picture (classification). The
features of applications which are likely to profit from this area are summarised in
previous research by one of the authors in collaboration with Timmis [28], and are
listed below. We stress that the biggest rewards are likely to be observed in applica-
tions which exhibit all of these features — systems which exhibit only one or two
features from the list are unlikely to profit fully.
1. They will be embodied
2. They will exhibit homeostasis
3. They will benefit from interactions between the innate and adaptive immune sys-
tems
4. They will consist of multiple, heterogeneous interacting components
5. They will be easily and naturally distributed
6. They will be required to perform life-long learning
The properties above are mainly self-explanatory, with perhaps the exception
of the first two: Embodiment from the point of view of the type of systems we
describe is perhaps best understood in terms described succinctly by Stepney in
[43]. A system which is embodied in its environment is a system which can sense
and manipulate its own environment, with its internal state depending on what it
senses, and its actions depending on its state. The system’s actions may change the
environment, which affects what it subsequently senses... and thus its subsequent
actions, producing a complex dynamical feedback loop. Homeostasis in a system
generally describes the ability of an organism to regulate its internal environment
such that it remains in a stable and constant condition; it has a natural mapping to
an engineered system which must maintain itself in some viability zone [7] in order
to operate successfully.
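As a minimal sketch of what such a viability zone might mean for an engineered component, consider simple negative feedback pulling an internal variable back toward a set point. The target, gain and zone values below are hypothetical, invented purely for illustration; they are not taken from [7] or from this chapter.

```python
# Toy homeostatic controller: negative feedback returns a disturbed
# internal variable to a "viability zone". All constants are hypothetical.

def homeostatic_step(state, target=0.5, gain=0.3, disturbance=0.0):
    """Move `state` a fraction `gain` of the way back toward `target`."""
    return state + gain * (target - state) + disturbance

state = 0.9                    # perturbed outside the comfortable region
viability_zone = (0.4, 0.6)    # hypothetical zone of viable operation
for _ in range(20):
    state = homeostatic_step(state)
assert viability_zone[0] <= state <= viability_zone[1]
```

Real homeostatic mechanisms layer many such loops over different time-scales; the single-loop version is only the simplest possible caricature.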
Neal and Trapnell provide further elaboration on point (4) above; in [37] they
discuss in detail the numerous actors and interactions in the immune system. The
arguments presented in [28] are extended, stating that appropriate exploitation of
immune collaborations will have the following benefits:
• Systems requiring multiple representations of actors, information and interac-
tions should become tractable and appropriate
• Systems requiring distributed detection of correlations of events and/or patterns
can be considered
• Systems requiring close integration and distribution of pattern recognition mech-
anisms with existing complex engineered systems can be considered.
This is a compelling view. Therefore, in the remainder of the article we first pro-
vide a brief overview of some of the mechanisms that are observed in the natural
immune system in which collaboration plays a crucial role, and which we believe
provide much inspiration for computer scientists solving complex problems in
the future. The material covers two perspectives. Firstly, we consider mechanisms
apparent in the natural immune system which are currently unexploited in artifi-
cial systems. Secondly, we discuss recent work in immunology which attempts to
reposition the immune system away from a pure defence mechanism to a complex,
self-organising computational system. Finally, we suggest some future directions
for AIS research; complex problems require complex solutions — paying closer at-
tention to the intricacies of the immune systems rather than resorting to “reasoning
by metaphor” can potentially reap rich rewards.
The natural immune system is one of the most complex systems in nature, consisting
of numerous players and mechanisms. Research in immunology flourishes world-
wide: in 1970, Jerne [32] estimated that the number of immunologists in the world
had tripled every 20 years since the late 19th century, and Coutinho [15] recently
estimated the current number at around 40,000, with a new paper in immunology
published on average every 15 minutes. Despite this, many of its mysteries remain
unsolved, and a particular paradox of the field is that, despite huge advances in
basic science, few clinical applications have resulted [15]. There
are several competing (though not necessarily mutually exclusive) theories as to
how immune function is achieved, each associated with a plethora of mechanisms.
Although unsatisfactory from a clinical point of view, this is not necessarily a hin-
drance for the computer scientist — indeed, one might argue that exactly the oppo-
site is true. Unencumbered by the need to explain experimentally observed data, the
computer scientist is at liberty to pick and choose from the theories of the immune
system described in the literature³.
³ The computer scientist does not have a completely free hand; as described in Section 6
and advocated by [44], immune-inspired algorithms should result from careful abstraction
of natural mechanisms and be underpinned by theory.
role of the latter two players here; a further key innate player, the dendritic cell, is
discussed in more detail in the next section. The system works efficiently due to the
cooperation of the various cells; only via a collaborative effort can the innate system
respond quickly and strongly when under attack. The role of some of the actors and
the mechanisms by which they cooperate are now described.
2.2.1 Macrophages
2.2.2 Natural Killer Cells
Another important class of cells is the family of cells known as Natural Killer cells
(NK cells). These cells kill invaders by forcing them to commit suicide. NK cells
appear to target invaders directly; a two-signal exchange between a potential target
cell and the NK is thought to determine whether or not the target is killed. Further-
more, NK cells also release the activating cytokine IFN-γ. As with macrophages,
NK cells require activating signals from the environment; several signals, produced
only when the body is under attack, have been identified.
2.2.3 Cooperation
The clever cooperation between these cells which results in an efficient system is
shown in Figure 1. For example, signals from an invading bacterium activate the
NK cell; this responds by producing IFN-γ, the signal that activates macrophages.
Fig. 1 Cooperation between macrophages and natural killer cells provides a self-regulatory
system, adapted from [42]
The activated macrophage becomes hyper-activated when it receives this signal and
the signal from the invading bacterium. The hyper-activated macrophage produces
the killer cytokine TNF. This cytokine is detected by the macrophage itself, which
causes it to release another molecule, IL-12. Together, the released IL-12 and the TNF
activate NK cells, which produce more IFN-γ... and so a positive feedback loop is set
up which in turn primes more macrophages. The cooperation between the different
classes of cells plays a dual role; it provides reinforcement via positive feedback
loops and also provides confirmation to cells that their diagnosis of a situation is
correct.
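The loop of Figure 1 can be caricatured as a discrete-time sketch. Everything below — the rate constants, the decay term, the saturation cap and the update rule — is invented for illustration; it is not a model taken from the chapter or from [42].

```python
# Toy discrete-time sketch of the macrophage/NK positive feedback loop.
# All constants are invented for illustration.

def step(ifn, tnf_il12, bacteria, k=0.5, decay=0.2):
    """One round of signalling: bacterial signal + IFN-γ hyper-activate
    macrophages, which emit TNF/IL-12, which re-stimulate NK cells."""
    mac_activity = bacteria * ifn                 # both signals required
    new_tnf_il12 = (1 - decay) * tnf_il12 + k * mac_activity
    new_ifn = (1 - decay) * ifn + k * new_tnf_il12
    return min(new_ifn, 10.0), min(new_tnf_il12, 10.0)  # crude saturation

ifn, tnf_il12 = 0.1, 0.0          # a whiff of IFN-γ, bacteria present
for _ in range(15):
    ifn, tnf_il12 = step(ifn, tnf_il12, bacteria=1.0)
assert ifn > 1.0                  # the loop has amplified the signal

ifn2, tnf2 = 0.1, 0.0             # no bacterial signal: loop never ignites
for _ in range(15):
    ifn2, tnf2 = step(ifn2, tnf2, bacteria=0.0)
assert ifn2 < 0.1
```

The point of the sketch is the confirmation role described above: amplification only occurs when both the bacterial signal and the cytokine signal are present, so neither cell type escalates on its own evidence alone.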
Currently, there exist very few applications in AIS which attempt to exploit the
immunological features described in Sections 2.2.1 to 2.2.3. On the one hand, the
innate immune system is inherently simpler than the adaptive immune system;
yet, until very recently, almost all focus in the computational literature was on the
adaptive system. On the other hand, many natural systems function perfectly well with
an innate immune system alone; perhaps computer scientists ought to look to the innate
immune system in the first place in order to design artificial systems. For example,
it seems clear that progress could be made in a number of robotics and control
applications (particularly those concerned with fault tolerance) by exploiting the
type of feedback loops exemplified in the immune system just described, without
having to resort to more complex systems at all.
the adaptive system, the innate system essentially computes the state of the body,
and returns this state to the adaptive system. The state information returned to the
adaptive system includes information regarding the type and location of the attack,
thus imparting knowledge on how and where to react. This is achieved via dendritic
cells, which scout the body’s tissues to determine their state, integrate the information,
and then signal to the adaptive system whether or not to react.
For this reason, the dendritic cell is often referred to as the ‘sentinel’ of the im-
mune system [42]. Dendritic cells reside in the epithelial tissues of the body (for
example the skin). These cells migrate through the tissue, sampling the tissue in
their vicinity. Essentially, the dendritic cells soak up molecular debris (for exam-
ple, bacteria or other pathogenic material) and additionally, sense molecular signals
present in the tissue. Some of these signals derive from safe or normal events, such
as regular, pre-programmed cell death (apoptosis). Other signals derive from
potentially dangerous events; for example, an exogenous signal known as a PAMP
is produced exclusively by pathogens. Another class of endogenous signals, known
as danger signals, is produced by cells which die as a result of stress or from
attack. Exposure to sufficient levels of either signal results in the dendritic cell ma-
turing into one of two states, known as semi-mature or mature. The matured cell
then migrates back to the nearest lymph node through a complex system of lym-
phatic vessels via a process known as chemotaxis.
The lymph nodes function as molecular dating agencies where the different im-
mune cells of the body congregate — their small volume increases the probability
of cellular interactions. In particular, the dendritic cells that reach the lymph node
carry back a snapshot of the current state of the tissues. The snapshot contains two
important pieces of information: antigen, i.e. material causing a problem, and also
the signals representing the context under which the material was collected. This
snapshot is viewed by the reactive immune cells, in particular T-Cells, and a pro-
cess of communication and collaboration between cells ensues which ultimately
results in activation or tolerance of the immune system, depending on the content
and context of the information presented. Returning DCs which have been exposed
to antigenic material in the context of safe signals are known as semi-mature; these
induce tolerance in the T-cells present in the lymph node. Those DCs which have
returned having been exposed to antigen in the context of PAMP and Danger sig-
nals induce a reactive response. This is shown in Figure 2. Thus, antigens are no
longer considered dangerous per se; it is their context and resulting effect on the
environment that determine their ultimate outcome.
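The maturation decision just described can be sketched as a toy signal-integration routine. The signal names follow the text, but the threshold, the comparison rule and the dictionary representation of tissue samples are entirely our own invention:

```python
# Toy sketch of a dendritic cell integrating safe, danger and PAMP
# signals accumulated while sampling tissue. Threshold and weights
# are hypothetical.

def dc_fate(samples, threshold=1.0):
    """Accumulate sampled signals and return the cell's maturation fate."""
    safe = sum(s.get("safe", 0.0) for s in samples)
    danger = sum(s.get("danger", 0.0) + s.get("pamp", 0.0) for s in samples)
    if max(safe, danger) < threshold:
        return "immature"            # not enough signal to migrate at all
    return "mature" if danger > safe else "semi-mature"

# Antigen collected amid apoptotic (safe) debris -> tolerance-inducing
assert dc_fate([{"safe": 0.6}, {"safe": 0.7}]) == "semi-mature"
# The same sampling walk amid PAMP/danger signals -> reaction-inducing
assert dc_fate([{"pamp": 0.8}, {"danger": 0.5}]) == "mature"
```

Note how the same antigen leads to opposite fates depending purely on the context signals gathered alongside it, which is the shift in perspective the paragraph above describes.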
This aspect of dendritic cell behaviour has inspired computational research con-
cerned with anomaly detection, in which the basic problem is to determine whether
to ignore or react to patterns of information in a system (for example, in a computer).
This approach shifts the focus of a detection algorithm to understanding the effects
of an intrusion on a system, rather than the signature of the intrusion itself. This has
potentially significant advantages, in that the effects on a system may be easier to
measure than patterns of incoming information, which may be numerous and diverse
in nature, e.g. [25]. Furthermore, the dendritic cell algorithm (DCA) of
Greensmith et al. encapsulates a time-dependent method of reacting to the effects of
an accumulation of signal over time, rather than reacting to individual incoming
patterns or signatures.
Fig. 2 The pathways of dendritic cell differentiation are dependent on the amount and type
of signal received. Mature cells eventually activate an immune response in the lymph node.
Semi-mature cells collect antigen, i.e. non-self material, but do not activate a response to this
material
The dendritic system has also inspired applications in the broad
area of wireless sensor networks [17, 16, 18]. These applications exploit two es-
sential properties of this system, the first concerned with the action of the dendritic
cells themselves and the second with the lymphatic dating agency just discussed.
The former mechanism exploits the notion of DCs as mobile agents which scout
an environment, collecting information about the contents of that environment. The
latter is concerned with how the collective information returned by numbers of DCs
to the lymph node is integrated and interpreted, ultimately resulting in a system-
wide response. We return to this in more detail later in the chapter, in Section 3.1, in
a discussion of the cognitive immune system.
B-Cells are white blood cells produced by the bone-marrow that mature to pro-
duce antibodies. Each antibody can bind to a specific set of antigens according to
its molecular shape, with the specific antigen to which it binds referred to as its
cognate antigen. This process gives rise to the popular “lock and key” metaphor of
immunology, describing the process by which antibodies (keys) bind to specific
antigens (locks). Most AIS models adopt a simplified view of the immunology, ex-
ploiting the fact that ultimately, recognition of a cognate antigen by a B-cell causes
it to proliferate, producing clones which have receptors which recognise the same
antigen - a process known as clonal selection. However, the process is actually more
complex:
T-Cells are similar to B-Cells in appearance, and display on their surface
antibody-like receptors known as TCRs (T-cell receptors). As in B-cells, these re-
ceptors also bind to their cognate antigen; however, the process has a number of
crucial differences. Firstly, T-Cells only recognise protein antigen, unlike B-cells
which can recognise any organic molecules. Most importantly however, T-Cells can
only recognise antigen that is presented to them displayed on the surface of a further
class of molecule known as MHC, in contrast to B-cells which require no help to
recognise their cognate antigens. MHC processes antigen into peptide fragments,
which are displayed on the MHC surface. T-cells recognise short linear chains of
these peptides which correspond to contiguous sequences in the primary structure
of the antigen itself. B-cell binding, on the other hand, involves direct binding of the
BCR to the antigen; B-cells therefore recognise the secondary, or surface, structure of
the antigen, i.e. amino acids that are discontinuous in the primary structure but are
brought together in the folded protein. These differences in the level of abstraction
at which recognition occurs play an important role in the collaborative response of
these cells, one which we exploit in a machine learning context, as explained
later in Section 4.1.
Recognition of cognate antigen by T-Cells causes them to become activated, at
which point they secrete cytokines. These cytokines produced by the T-Cell provide
a crucial second signal to a B-Cell which has also recognised a cognate antigen,
confirming to the B-Cell that it should become activated. Without this signal, B-
Cells which have recognised cognate antigen through their own receptors (BCRs)
do not become activated. Thus, the interaction between the T-Cell and B-Cell is
critical in turning on the immune response.
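The two recognition modes and the two-signal activation rule can be made concrete in a toy illustration. The string representation of an antigen's primary structure, and every name and value below, are our own invention, used only to show the difference in abstraction level:

```python
# Toy sketch: TCRs see contiguous peptides of the primary structure,
# BCRs see (possibly discontinuous) surface residues, and B-Cell
# activation requires both signals. All names here are hypothetical.

ANTIGEN = "MKTAYIAKQR"            # hypothetical primary structure

def tcr_recognises(antigen, peptide):
    """T-Cells bind short contiguous peptides presented on MHC."""
    return peptide in antigen

def bcr_recognises(antigen, epitope):
    """B-Cells bind surface residues that may be discontinuous in the
    primary structure (epitope given as position -> residue)."""
    return all(antigen[i] == aa for i, aa in epitope.items())

def b_cell_activated(antigen, epitope, peptide):
    """Two-signal rule: BCR recognition (signal 1) alone is not enough;
    cytokine help from a T-Cell recognising the same antigen (signal 2)
    is also required."""
    return bcr_recognises(antigen, epitope) and tcr_recognises(antigen, peptide)

surface_epitope = {0: "M", 4: "Y", 9: "R"}    # discontinuous positions
assert b_cell_activated(ANTIGEN, surface_epitope, "AYIA")      # both signals
assert not b_cell_activated(ANTIGEN, surface_epitope, "QQQQ")  # no T-Help
```

The conjunction in `b_cell_activated` is the computational essence of the collaboration: two independent views of the same antigen must agree before the response is switched on.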
In theoretical immunology, many of the seminal models (typically described as
coupled differential equations) are concerned with the dynamics of the clonal selec-
tion process – how receptors bind ligands and then induce proliferation, secretion
and mutation of antibodies in the immune repertoire. This is extended in idiotypic
network theories [32] by incorporating the ability of antibodies to bind both antigen
and other antibodies, but the fundamental processes in the models remain similar. In
both cases, inter-clonal competition, i.e. the relative ability of clones to proliferate,
is a function of fitness in the ability to bind ligands and thus be selected, but there is no
real sense of co-operation amongst clones.
A lineage of work in theoretical immunology (starting notably from Jerne [32]
and continuing through to work by Carneiro [9]) has evolved these models to
incorporate co-operative interaction with T-Helper cells. The two modes of possible
interaction are illustrated in Figure 3, where they induce positive and negative
feedback loops, respectively.
Fig. 3 Left: Two modes of B-T co-operation. (i) B1 achieves the sustained surface proximity
to T necessary for T-Help via MHC-Presentation. By virtue of T-Cell receptor/B-Cell receptor
(TCR/BCR) morphological similarity, the resulting secreted antibody has little affinity with
the TCR and both clones proliferate in an explosive positive feedback loop. (ii) B2 interacts
via direct BCR recognition of the TCR. The resulting secreted antibody thus has affinity
and directly suppresses T in a negative feedback loop. Right: an illustration of how these
complementary interactions “close the loop” and drive each other into a stable configuration
via idiotypic interactions between B-Clones
B-Clone activation (and thus antibody secretion) is limited by available T-Help;
T-Help, in turn, is limited by the suppressive effects of TCR-specific antibody on
T-Clones. The complementary waves of immune response initiated by the antigen
(then by antibody, then anti-antibody etc) thus self-regulates, with ultimate tolerance
or immunity resulting from the indirect competition between T-Clones and antigen
to survive suppression from antibody. As B-Clones compete to garner available T-
Help, only the fittest achieve activation and secretion, relegating weaker clones out
of the repertoire and focusing it to best reflect the antigenic environment of the host.
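The competition for limited T-Help can be sketched in a few lines. The fitness values, decay rate and update rule below are our own, purely illustrative choices, not taken from the models cited above:

```python
# Toy model of B-Clones competing for a limited pool of T-Help: each
# clone captures a share of help proportional to fitness x size, so
# the fittest clone dominates while weaker clones are relegated.
# All constants are invented for illustration.

def compete(clones, t_help=1.0, decay=0.1, steps=100):
    """clones: name -> (fitness, initial size); returns final sizes."""
    sizes = {name: size for name, (fit, size) in clones.items()}
    for _ in range(steps):
        weights = {n: clones[n][0] * sizes[n] for n in sizes}
        total = sum(weights.values())
        for n in sizes:
            share = weights[n] / total          # fraction of T-Help won
            sizes[n] += t_help * share - decay * sizes[n]
    return sizes

final = compete({"strong": (0.9, 1.0), "weak": (0.3, 1.0)})
assert final["strong"] > final["weak"] > 0.0
```

Because growth is funded only by the captured share of a fixed help budget, the repertoire self-limits: a clone cannot expand without another clone's share shrinking, which mirrors the focusing effect described above.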
Thus there are several forms of asymmetry in the model: (i) T-Clones act as a
driving force, limiting factor and target of the immune response, much like their
cognate antigen; however, T-Help is provided to B-Clones via a separate mechanism
that is, crucially, independent of the “shape” of the BCR; (ii) TCR-specific B-Clones
have an inherent advantage in binding T-Clones, and thus in receiving T-Help,
because their binding is based on a BCR-TCR rather than an MHC-TCR pathway
and there are many more BCRs on a B-Cell’s surface; (iii) complementary B-Clones
can only sustain traditional oscillatory relationships while both antigen and T-Clone
survive; the eradication of either T-Clone (tolerance) or antigen (immunity) leads to
a collapse of the repertoire in which one response dominates and the other is
suppressed; and (iv) T-Clones only bind to single points in the shape-space, whereas
B-Clones can bind to several, non-contiguous points (to simulate protein surface
binding, where distant peptides in the protein’s primary structure are brought
together on the surface of the folded secondary structure).
The latter two asymmetries are described in detail in [34]; they are motivated
by the wish to extend Carneiro's model beyond its original context [35] and apply
it in a machine-learning setting, but the principles remain relevant in an
immunological context. Figure 4 illustrates the mechanism in the case of a
tolerance-inducing response to a high-dose antigen. The TCR-specific response is
able to eradicate T-Clones before the antigen-specific response can eradicate anti-
gen. The absence of TCR leaves the tolerance-inducing B-Clone in a slow decaying
resting state. The immunity-inducing B-Clone, still stimulated by the now tolerated
antigen, transitions into an induced state where lack of available T-Help forces the
clone into a fast-decaying anergic death. The situation for immunity is equivalent
but reversed. In both cases, the difference between slow and fast decay ensures that
the immune system “remembers” the correct response for an extended period and
rapidly “forgets” the wrong response.
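The memory/forgetting asymmetry amounts to nothing more than two exponential decay rates. The rates below are assumed values for illustration only:

```python
import math

SLOW, FAST = 0.01, 0.5   # assumed decay rates: resting vs anergic clones

def population(initial, rate, t):
    """Exponential decay of a clone population after the competition resolves."""
    return initial * math.exp(-rate * t)

# After 20 time units the dominant (resting) clone is largely intact,
# while the recessive (anergic) clone has effectively vanished.
remembered = population(1.0, SLOW, 20)   # ~0.82 of the clone remains
forgotten  = population(1.0, FAST, 20)   # ~4.5e-5: effectively gone
```

The separation of time scales is the point: the "correct" response persists long enough to shape future behaviour, while the "wrong" one is cleared quickly.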
From a computational perspective, there are a number of benefits in being able
to produce such behaviours in an engineered system. Later (Section 4.1), we de-
scribe how this model can be interpreted in a machine learning context, in a doc-
ument classification or query expansion application. In this type of application, a
system must learn how to respond to incoming data, for example, information con-
tained in a stream of documents. In such a scenario, the system might respond to
words in the documents, depending on the context in which they are presented and
the previous history of presentation.
Fig. 4 Left: A schematic illustration of how eradication of either T-Clone or antigen (in
this case, T-Clone) causes different state changes in immunity- and tolerance-inducing B-
Clones, leading to fast decay (forgetting) and slow decay (memory). In both cases the loss of
available T-Help from B-T co-operation is the trigger for rapid decay. Right: A graph of the
dynamics of competing tolerance-inducing and immunity-inducing responses to a high-dose
antigen. In this case tolerance out-competes immunity and the dominant response transitions
into a slow-decaying memory state while the recessive response is quickly eliminated.
The concepts governing the B/T cell interactions provide a possible mechanism for this,
by which the response to any word (or other feature in a different domain) is dynamic,
governed by the competing
B/T responses. The balance of the competing responses ultimately produces a winner,
which determines whether tolerance or immunity to a feature is observed. Crucially
for AIS, this mechanism does not require any a priori knowledge of good/bad; these
are critical points, as most AIS literature6 relies on either pre-labelling data into one
class or another, or presenting examples of only one class. The latter approach assumes
that data not represented by examples in the presented class belongs to the other class,
which is clearly not valid in a number of applications. Furthermore, the notion that
many features of a set of data may not be sufficiently discriminatory to provide any
classification ability is generally ignored in the AIS literature - all features are
implicitly assumed to be discriminatory.
This has a direct mapping to a machine learning context. For example, consider
the task of trying to discern relevant information that may be contained in a con-
tinuous stream of incoming documents. Words in the documents map to proteins;
some of these words can be considered random noise, and therefore should be toler-
ated. Other words, although not noise, are too ubiquitous to be useful. These words
also must be tolerated - the remainder provoke some level of response. Just as in
the immune system, where the morphological and dynamic dependencies between
proteins determine their current ‘label’ as opposed to some pre-assigned labelling,
the categorisation of a word in a document stream depends not on the word itself but
the context it finds itself in. The key factor is that the model does not at any stage
treat words independently7, but that the response to any word depends on its inter-
action with other words. Interactions might include the context the word is found in
(e.g. the sentence the word appears in), how similar the word is to other words being
presented, and the dynamics of the presentation of the word over time. Determining
what to react to and what to tolerate is precisely what the model of tolerance derived
from the B-T cell interaction model attempts to achieve. The co-operation between
T-Clones and B-Clones is the essential mechanism that makes this work:
• The degree of suppression toward T-Clones and their cognate antigen is essentially
a judgement on the signal-to-noise ratio of that particular antigenic shape:
weakly significant (low-dose) and weakly discriminatory (persistent/high-dose)
antigen are actively tolerated, while those in-between invoke various degrees of
immune response. The asymmetry and magnitude of the response act as a confidence
margin on the appropriateness of the response.
• The B-Clone repertoire is a constructive representation of the antigenic en-
vironment, made viable by clonal selection against present surface patterns.
Competitive exclusion over available T-Help regulates the complexity of this
representation. The extremes of tolerance and immunity by extinction further
emphasise the structure underlying the viable repertoire, by removing inappropriately
responding cells and deactivating useful but currently unnecessary cells.
The high turnover and disposability of individual components of the repertoire
allows the representation to adapt rapidly to changes in the environment.
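In a document-stream setting, this signal-to-noise judgement can be caricatured in a few lines. The thresholds and the response function below are assumptions for illustration, not part of the model: rare words are tolerated as noise, ubiquitous words are tolerated as non-discriminatory, and only words in between provoke a graded response.

```python
from collections import Counter

def word_responses(documents, low=0.05, high=0.5):
    """Map each word's document frequency to a toy 'immune response' level."""
    n = len(documents)
    doc_freq = Counter(w for doc in documents for w in set(doc.split()))
    out = {}
    for word, df in doc_freq.items():
        rate = df / n
        if rate < low or rate > high:
            out[word] = 0.0                # actively tolerated: noise or ubiquitous
        else:
            out[word] = rate * (1 - rate)  # graded response to discriminative words
    return out

docs = ["the immune system adapts", "the immune repertoire adapts",
        "the stock market fell"]
# "the" appears in every document and is tolerated; "stock" provokes a response.
```

A full realisation of the model would make the response depend on context and presentation dynamics rather than on frequency alone; the static thresholds here only illustrate the tolerate/respond partition.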
Individual cells in the repertoire have a small window of contribution and contri-
butions overlap considerably. The capacity for successful clones to proliferate lends
them extra weight when integrating these responses into an executive decision. This
6 An exception being the DCA algorithm of Greensmith et al [25].
7 Which would simply reduce the model to a statistical model.
increases confidence in the system-wide response, even if there is much contradiction
at the component level: the variance of individual responses is reduced. Conversely,
the higher levels of suppression toward more confident responses change
the antigenic environment of the host. This makes redundant aspects of the reper-
toire less viable; that is, bias in the preceding responses is compensated in later
responses. Furthermore, this additional diversity in the data (antigen), as well as
diversity in the repertoire, allows the system as a whole to perform better than its
single best component, which is the theoretical upper bound of a majority vote (see
e.g. [33]).
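The integration of overlapping, partly contradictory component responses resembles the weighted-majority scheme of Littlestone and Warmuth [33]. A minimal sketch follows; the boolean vote encoding and the parameter `beta` are illustrative assumptions, with weight playing the role of clonal proliferation:

```python
def weighted_vote(votes, weights):
    """Aggregate boolean component responses by weight (weight ~ proliferation)."""
    score = sum(w if v else -w for v, w in zip(votes, weights))
    return score > 0

def reweight(votes, weights, truth, beta=0.5):
    """Down-weight components that responded incorrectly, as in [33]."""
    return [w * (beta if v != truth else 1.0) for v, w in zip(votes, weights)]

votes, weights = [True, True, False], [1.0, 1.0, 1.0]
decision = weighted_vote(votes, weights)        # majority carries the decision
weights = reweight(votes, weights, truth=True)  # the dissenting component loses weight
```

Repeated reweighting concentrates influence on reliable components, which is exactly the "extra weight" that successful clones acquire through proliferation.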
which models might be derived that capture features currently outwith the scope
of isolated models of statistical learning - how to adapt over time, how to
self-regulate, how to co-evolve with adversaries, and how to distribute responsibility
and manage resources under physical constraints.
The immune system must maintain a healthy host and deploy lethal effector
mechanisms. The cost of errors in either case can be fatal or disabling. It runs
constantly over the lifetime of the host, interacting with the host’s physiology, the
external environment and other hosts. This is an application area severely under-
developed in the computational intelligence literature. It is not difficult to imagine
scenarios where these behaviours are essential, particularly as computing devices
become increasingly ubiquitous, ad-hoc and unmanaged. The immune metaphor
provides a coherent, economic and comprehensive framework for thinking about
and tackling these domains.
programmable and extracts information and acts upon it in collaboration with its lo-
cal neighbours. Thus, computational tasks as well as data are disseminated through
the network. Secondly, specks have a much smaller range of communication than
in typical networks, being of the order of tens of centimetres, rather than metres
or kilometres. This switches the burden of energy-usage to reception rather than
transmission, in contrast to a WSN. Specks are designed to be inherently mobile
rather than static, introducing further engineering and software constraints. Finally,
SpeckNets operate in an asynchronous manner, with different operations occurring
dynamically according to need, with aperiodic data transfer.
Thus, the difficulties associated with the specks’ engineering constraints com-
bined with challenging issues in the software development of such a computational
system require a fundamentally new approach to development. The interactions be-
tween many autonomous, spatially distributed units must result in coherent global
behaviours. As powerful central units are not part of the network, the system must
find its way to co-ordination through alternative pathways. Furthermore, the system
must be able to cope with unpredictable conditions, such as erratic communica-
tion and open, sometimes harsh, environments. The domain is a perfect example
of a system which exhibits the properties defined by Hart and Timmis in [28] in
Section 1.2.
550 E. Hart, C. McEwan, and D. Davoudani
1. Dendritic cells circulate through body tissues, sampling exogenous and endoge-
nous signals.
2. Dendritic cells return to the lymph nodes when they become mature, where they
deliver a snapshot of the current environment.
3. The maturation state under which DCs return to the lymph provides crucial in-
formation to the system regarding how it should react.
4. The lymph nodes in the body are distributed; the large lymph nodes are strategically
located in areas of the body that are closer to sources of input from the
environment.
From this, in [16, 18] we have derived the outline of a model which maps tissues
in the body to specks, and messages sent between specks to dendritic cells. We
currently distinguish two different types of specks:
• Tissue specks correlate to tissues in the body, and contain sensors for monitoring
the external environment (e.g. pressure, temperature etc.). They can also provide
endogenous signals, for example relating to their own internal state (i.e. battery
power). These specks constitute the majority of specks in any given environment.
• Integration specks correspond to lymph nodes. These specks receive informa-
tion from dendritic cells, process it, and determine an appropriate response.
These specks may have greater processing power than tissue specks (but are not
required to).
A typical environment will contain a high ratio of tissue specks to integration
specks. Although lymph nodes in the body are strategically placed, this is not feasible
in a typical speck deployment, where ultimately, applications are envisaged
in which thousands of specks may be sprayed at random into an environment. To
take account of this, we have studied a number of models of SpeckNets which adopt
random placements of integration specks. Dendritic cells are mapped to scouting
messages. Messages originate at integration specks and traverse the tissue specks,
where they collect both exogenous and endogenous environmental information from
each speck visited. Eventually they return to an integration speck, where the collected
information is filtered and aggregated. A decision may then be made by
the integration node to act upon the collective information. This may result in one
or more of several possible actions. For example, consider an application in which
a SpeckNet might be used to monitor temperature in an environment. As well as
maintaining a functioning network, the SpeckNet should be able to regulate fluctu-
ations in temperature in local regions by activating or deactivating heat sources, as
well as indicate unusual sources of heat, for example, a fire. Integration nodes
therefore might send out effector messages which modify the external environment, e.g.
turning a heat source on or off; alternatively, the integration node may modify the
endogenous variables of the system, for example, alerting tissue specks to modify
their endogenous parameters or increasing the rate at which it sends out scouting
messages in order to gather more information.
An overview of the model is given in Figure 8. Scouting messages originate from
an integration speck and are relayed through the SpeckNet in essentially a random
Fig. 8 A high-level overview of the immune-inspired model used to achieve functionality,
self-regulation and maintenance in a SpeckNet
walk. As they visit tissue specks, they collect information from the speck regard-
ing its local perception of the environment. For example, this may take the form of
an average value of a sensor reading over a specified window size. The parameters
used for processing sensor readings from the tissues and the functions applied to
local sensor values depend on the application and are set accordingly. The scouting
message aggregates this information as it travels (again in an application depen-
dent manner) and eventually either expires, matures or semi-matures depending on
the information it has collected. The conditions which cause a scouting message to
mature may for example include measuring variance of exogenous signals or mon-
itoring of endogenous signals such as battery power. This change of state triggers a
return to the integration node where the scouting message presents its information.
The integration node is able to estimate context based on the proportion of differ-
ent types of messages returning and also by aggregating information contained in
the expired messages. Once again, aggregating information from the network via
scouting messages is configurable with respect to the type of information needed.
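The scouting-message life cycle can be sketched as a small state classifier. The state names, thresholds and the choice of variance as the maturation trigger below are assumptions for illustration; the real conditions are application-dependent, as noted above.

```python
import statistics

MATURE, SEMI_MATURE, EXPIRED = "mature", "semi-mature", "expired"

def scout_state(readings, min_hops=5, var_threshold=4.0):
    """Classify a scouting message from the sensor readings it has aggregated."""
    if len(readings) < min_hops:
        return EXPIRED                        # too little information collected
    if statistics.pvariance(readings) > var_threshold:
        return MATURE                         # high variance: something unusual
    return SEMI_MATURE                        # calm environment

def estimate_context(returned_states):
    """Integration-speck view: fraction of returning messages that matured."""
    if not returned_states:
        return 0.0
    return returned_states.count(MATURE) / len(returned_states)
```

For example, a message carrying the steady readings [20, 20, 21, 20, 20] returns semi-mature, whereas the oscillating [10, 30, 10, 30, 10] matures; the integration speck then estimates context from the proportion of mature returns.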
The model just described has been tested using SpeckSim [1], a simulation tool
developed to model a SpeckNet environment, taking account of the physical
constraints apparent in such a network. For example, radio communication
is asynchronous, messages can be lost, and specks can go down at any time
(either temporarily or permanently). Experimentation with SpeckNets under a num-
ber of random configurations using a heat-monitoring scenario has confirmed that
integration nodes can successfully obtain a local picture of their environment, and
detect local changes. There is clearly a balance to be struck concerning the local-
ity of the information obtained by an integration node — increasing the number of
these nodes in the system results in each node obtaining a very local picture; fewer
nodes give a more global overview. [16, 18] describe these experiments in detail,
presenting technical results. Continuing research in this vein is now focusing on the
processes that occur in the integration nodes which integrate the information from
returning scouting messages and formulate a response. One avenue of investigation
focuses on collaborative voting methods, where a majority vote based on returned
states might determine the eventual outcome. Another research direction will focus
on the collaboration between B- and T-cells evident in the Carneiro model described
in Section 3, which would allow the integration nodes to learn a model enabling
tolerance or reactivity to emerge naturally, based on the stream of information
arriving at the nodes.
The previous section has illustrated how one example of a collaborative mechanism
in the natural immune system can be practically exploited in an engineered system.
Taking a step back from this, we conclude the chapter by examining how systems such
as the one described might be designed in the future via a principled approach which
results in generic, well-understood mechanisms which have wide applicability to a
range of domains, rather than being problem or domain specific.
Timmis et al. propose a new branch of engineering, known as immuno-engineering,
inspired by the work of Orosz [39]. Orosz defines immuno-ecology
and immuno-informatics (definitions 1 and 2 respectively), stating that
immuno-informatics addresses the mechanisms by which the immune system
converts stimuli into information, how it processes and communicates that information,
and how the information is used to promote an effective immuno-ecology [39].
Following on from this, Timmis et al. outline a vision for a new type of engineering
they term immuno-engineering (definition 3) in [49], which they argue can be used
for the development of biologically grounded and theoretically understood AIS.
This is envisaged to enable the construction of robust, engineered artifacts via a
bottom-up approach to engineering.
Fig. 9 The conceptual framework proposed by Stepney et al, taken from [44]
7 Conclusions
The natural immune system is a vastly complex system, which functions due to
a complicated web of interactions that occur between multiple cells and multiple
signals. Although AIS has developed into a strong and thriving field in its own right
over the past decade or so, in this article we have argued that the true potential of the
immune system has not yet been exploited in computational systems. The metaphor
is much richer than one might perceive given a glance through the AIS literature. We
have outlined some of the functional properties of the immune system that emerge as
a result of these collaborations, and shown how these properties are desirable, if not
essential, in engineered systems, particularly those that are required to function au-
tonomously. Some of the immune mechanisms which might be exploited to achieve
this functionality have been described in order to point the reader in the right
direction. These mechanisms must be considered carefully in order to prevent a recourse
to reasoning by metaphor; the conceptual framework and the immuno-engineering
approach described provide a scientifically appropriate method for achieving this.
Any field, whatever the inspiration, must ultimately prove its worth by standing on
firm theoretical foundations, rather than simply exploiting its novelty. Exploiting
the similarities between immune mechanisms and other more traditional domains,
such as the boosting domain discussed earlier in the article, may offer some mileage in
this respect. Progress may also be made via the immuno-engineering approach de-
scribed in Section 6 which advocates a principled abstraction of immune metaphors,
through a series of abstractions, at a mathematical and computational level.
Our argument is perhaps summed up most succinctly by the words of Neal et al.
in [37], who propose
...the importance of the nature of the interactions in the immune system leads naturally
to the expectation that far more complex and ambitious immune inspired computation
than is currently attempted is required and should be possible.
References
1. http://www.specknet.org/dev/specksim
2. Artificial Immune Systems, Proceedings of International Conferences, Lecture Notes in
Computer Science. Springer (2001-2008)
3. Andrews, P., Timmis, J.: Alternative inspiration for artificial immune systems: Exploiting
Cohen's cognitive immune model. In: In Silico Immunology. Springer (2007)
4. Andrews, P., Timmis, J.: Alternative inspiration for artificial immune systems: Exploiting
Cohen's cognitive model. In: In Silico Immunology. Springer (2007)
5. Andrews, P.S., Timmis, J.: Inspiration for the next generation of artificial immune systems.
In: Jacob, C., Pilat, M.L., Bentley, P.J., Timmis, J.I. (eds.) ICARIS 2005. LNCS, vol. 3627,
pp. 126–138. Springer, Heidelberg (2005)
6. Arvind, D., Elgaid, K., Krauss, T., Paterson, A., Stewart, R., Thayne, I.: Towards an
integrated design approach to specknets. In: IEEE Int. Conf. Communications 2007, ICC
2007, pp. 3319–3324 (2007)
7. Bersini, H.: Why the first glass of wine is better than the seventh. In: Jacob, C., Pilat,
M.L., Bentley, P.J., Timmis, J.I. (eds.) ICARIS 2005. LNCS, vol. 3627, pp. 100–111.
Springer, Heidelberg (2005)
8. Burnet, F.: The clonal selection theory of acquired immunity. Cambridge University
Press, Cambridge (1959)
9. Carneiro, J., Coutinho, A., Faro, J., Stewart, J.: A model of the immune network with
b-t cell co-operation. i - prototypical structures and dynamics. Journal of Theoretical
Biology 182, 513–529 (1996)
10. Carneiro, J., Coutinho, A., Stewart, J.: A model of the immune network with b-t cell co-
operation. ii - the simulation of ontogenisis. Journal of Theoretical Biology 182, 531–547
(1996)
11. de Castro, L., Von Zuben, F.: Learning and optimization using the clonal selection prin-
ciple. IEEE Transactions on Evolutionary Computation 6(3), 239–251 (2002)
12. Cohen, I.: The cognitive paradigm and the immunological homunculus. Immunology
Today (1992)
13. Cohen, I.: Tending Adam’s garden: evolving the cognitive immune self. Elsevier Aca-
demic Press, Amsterdam (2000)
14. Cohen, I.: Real and artificial immune systems: computing the state of the body. Nature
(2007)
15. Coutinho, A.: Immunology at the crossroads. European Molecular Biology Organisation
Reports 3(11), 1008–1011 (2002)
16. Davoudani, D., Hart, E.: Computing the state of specknets: An immune-inspired approach.
In: Proc. Int. Symposium on Performance Evaluation of Computer and Telecommunication
Systems (SPECTS 2008), Edinburgh, UK, June 16–18 (2008)
17. Davoudani, D., Hart, E., Paechter, B.: An immune-inspired approach to speckled com-
puting. In: de Castro, L.N., Von Zuben, F.J., Knidel, H. (eds.) ICARIS 2007. LNCS,
vol. 4628, pp. 288–299. Springer, Heidelberg (2007)
18. Davoudani, D., Hart, E., Paechter, B.: Computing the state of specknets: Further analysis
of an innate immune-inspired model. In: Bentley, P.J., Lee, D., Jung, S. (eds.) ICARIS
2008. LNCS, vol. 5132, pp. 95–106. Springer, Heidelberg (2008)
19. De Castro, L.N., Timmis, J.: Artificial Immune Systems: A New Computational Intelli-
gence Approach. Springer, London (2002)
20. Farmer, J.D., Packard, N.H., Perelson, A.S.: The immune system, adaptation and ma-
chine learning. Physica 22, 187–204 (1986)
21. Forrest, S., Hofmeyr, S., Somayaji, A.: Computer immunology. Commun. ACM 40(10),
88–96 (1997)
22. Forrest, S., Perelson, A., Allen, L., Cherukuri, R.: Self-nonself discrimination in a computer.
In: Proceedings of the IEEE Symposium on Research in Security and Privacy, pp.
202–212 (1994)
23. Freund, Y., Schapire, R.E.: A decision theoretic generalisation of on-line learning and
an application to boosting. Journal of Computer and System Sciences 55(1), 119–139
(1997)
24. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of
boosting (1998), http://citeseer.ist.psu.edu/friedman98additive.html
25. Greensmith, J., Aickelin, U., Twycross, J.: Articulation and clarification of the dendritic
cell algorithm. In: Bersini, H., Carneiro, J. (eds.) ICARIS 2006. LNCS, vol. 4163, pp.
404–417. Springer, Heidelberg (2006)
26. Hart, E., Bersini, H., Santos, F.: How affinity influences tolerance in an idiotypic net-
work. J. Theor. Biology (2007)
27. Hart, E., Davoudani, D., McEwan, C.: Immunological inspiration for building a new
generation of autonomic systems. In: Autonomics, p. 9 (2007)
28. Hart, E., Timmis, J.: Application areas of ais: The past, the present and the future. Ap-
plied Soft Computing 8, 191–201 (2008)
29. Hofmeyr, S., Forrest, S.: Immunity by design. In: Proceedings of GECCO 1999, pp.
1289–1296 (1999)
30. Holland, J.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992)
31. Janeway, C.A., Travers, P., Walport, M., Schlomchik, M.: Immunobiology. Garland
(2001)
32. Jerne, N.: Towards a network theory of the immune system. Annals of Immunology (Inst.
Pasteur) 125, 373–389 (1974)
33. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Information and
Computation 108, 212–261 (1994)
34. McEwan, C., Hart, E., Paechter, B.: Revisiting the central and peripheral immune system.
In: de Castro, L.N., Von Zuben, F.J., Knidel, H. (eds.) ICARIS 2007. LNCS, vol. 4628,
pp. 240–251. Springer, Heidelberg (2007)
35. McEwan, C., Hart, E., Paechter, B.: Towards a model of immunological tolerance and
autonomous learning. Submitted to Natural Computing (2008)
36. Neal, M.: Meta-stable memory in an artificial immune network. In: Timmis, J., Bentley,
P.J., Hart, E. (eds.) ICARIS 2003. LNCS, vol. 2787, pp. 168–180. Springer, Heidelberg
(2003)
37. Neal, M., Trapnell, B.J.: Go Dutch: Exploit interactions and environments with artificial
immune systems. In: In Silico Immunology. Springer, Heidelberg (2007)
38. Neuman, Y.: The immune self, the sign and the testes. Semiotics, Evolution, Energy,
Development, 85–109 (2005)
39. Orosz, C.: An introduction to immuno-ecology and immuno-informatics. In: Design
Principles for Immune System and Other Distributed Autonomous Systems. Oxford
Univ. Press, Oxford (2001)
40. Perelson, A.S., Weisbuch, G.: Immunology for physicists. Review of Modern Physics 69
(1997)
41. Schapire, R.E.: The strength of weak learnability. Machine Learning 5, 197–227 (1990),
http://citeseer.ist.psu.edu/schapire90strength.html
42. Sompayrac, L.: How the immune system works, 3rd edn. Blackwell Publishing, Malden
(2008)
43. Stepney, S.: Embodiment. In: In Silico Immunology. Springer, Heidelberg (2006)
44. Stepney, S., Smith, R., Timmis, J., Tyrrell, A., Neal, M., Hone, A.: Conceptual frame-
works for artificial immune systems. Journal of Unconventional Computing 1(3), 315–
338 (2005)
45. Stewart, J.: Cognition without neurons: adaptation, learning and memory in the immune
system. In: CC AI, pp. 7–30 (1994)
46. Stewart, J., Coutinho, A.: The affirmation of self: A new perspective on the immune
system. Artificial Life 10, 261–276 (2004)
47. Stibor, T., Timmis, J., Eckert, C.: On the use of hyperspheres in artificial immune systems
as antibody recognition regions. In: Bersini, H., Carneiro, J. (eds.) ICARIS 2006. LNCS,
vol. 4163, pp. 215–228. Springer, Heidelberg (2006)
48. Timmis, J., Andrews, P., Owens, N., Clark, E.: An interdisciplinary perspective on arti-
ficial immune systems. Evolutionary Intelligence 1(1), 5–26 (2008)
49. Timmis, J., Hart, E., Neal, M., Stepney, S., Tyrrell, A.: Immuno-engineering. In: 2nd
IFIP International Conference on Biologically Inspired Collaborative Computing. IEEE
Press, Los Alamitos (2008)
50. Timmis, J.: Artificial immune systems - today and tomorrow. Natural Computing 6(1),
1–18 (2007)
51. Varela, F., Coutinho, A., Dupire, B., Vaz, N.: Cognitive networks: Immune, neural and
otherwise. J. Theoretical Immunology (1988)
52. Vargas, P., de Castro, L., Michelan, R., Von Zuben, F.: An immune learning classifier
network for autonomous navigation. In: Timmis, J., Bentley, P.J., Hart, E. (eds.) ICARIS
2003. LNCS, vol. 2787, pp. 69–80. Springer, Heidelberg (2003)
53. Vargas, P., de Castro, L., Von Zuben, F.: Mapping artificial immune systems into learning
classifier systems. In: Lanzi, P.L., Stolzmann, W., Wilson, S.W. (eds.) IWLCS 2003.
LNCS (LNAI), vol. 2661, pp. 163–186. Springer, Heidelberg (2003)
54. Voigt, D., Wirth, H., Dilger, W.: A computational model for the cognitive immune system
theory based on learning classifier systems. In: de Castro, L.N., Von Zuben, F.J., Knidel,
H. (eds.) ICARIS 2007. LNCS, vol. 4628, pp. 264–275. Springer, Heidelberg (2007)
55. Watkins, A., Timmis, J.: Exploiting parallelism inherent in airs: an artificial immune clas-
sifier. In: Nicosia, G., Cutello, V., Bentley, P.J., Timmis, J. (eds.) ICARIS 2004. LNCS,
vol. 3239, pp. 427–438. Springer, Heidelberg (2004)
56. Whitbrook, A., Aickelin, U.J.G.: Idiotypic immune networks in mobile robot control.
IEEE Transactions on Systems, Man and Cybernetics, Part B 37(6), 1581–1598 (2007)
57. Wong, K., Arvind, D.: Speckled computing: Disruptive technology for networked infor-
mation appliances. In: Proceedings of the IEEE International Symposium on Consumer
Electronics (ISCE 2004), pp. 219–223 (2004)
58. Wong, K., Arvind, D.K.: Specknets: New challenges for wireless communication pro-
tocols. In: Third International Conference on Information Technology and Applications.
ICITA 2005, vol. 2, pp. 728–733 (2005)
59. Zambonelli, F., Van Dyke Parunak, H.: Signs of a revolution in computer science and
software engineering. In: Petta, P., Tolksdorf, R., Zambonelli, F. (eds.) ESAW 2002.
LNCS, vol. 2577, pp. 13–28. Springer, Heidelberg (2003) (revised papers)
Part VIII
Parallel Evolutionary Algorithms
Evolutionary Computation: Centralized,
Parallel or Collaborative
Heinz Mühlenbein
Abstract. This chapter discusses the nature and the importance of spatial inter-
actions in evolutionary computation. The current state of evolution theories is
discussed. An interaction model is investigated which we have called Darwin’s
continent-island cycle conjecture. Darwin argued that such a cycle is the most efficient
for successful evolution. This bold conjecture has so far gone largely unnoticed.
We confirm Darwin's conjecture using an evolutionary game based on the
iterated prisoner’s dilemma. A different interaction scheme, called the stepping-
stone model is used by the Parallel Genetic Algorithm PGA. The PGA is used to
solve combinatorial optimization problems. Then the Breeder Genetic Algorithm
BGA used for global optimization of continuous functions is described. The BGA
uses competition between subpopulations applying different strategies. This kind of
interaction is found in ecological systems.
1 Introduction
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 561–595.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
knowledge. In this chapter I will concentrate on the third approach. It relies on the-
ories of evolution and of computation. The theory of computation is well advanced,
so the problems of evolutionary computation lie in implementing theories of evolu-
tion. If a convincing constructive (or even mathematical) theory of evolution existed,
then evolutionary computation would be just a matter of implementation - which of
the major evolutionary forces to implement in what detail.
But does biology possess a constructive theory of evolution? Here the opinions
differ considerably. The main stream theory of evolution is called New or Modern
Synthesis. Its followers claim that it reconciles Darwin’s idea of continuous small
variations with gene flows derived from population genetics. The second major force
of the Modern Synthesis is Darwin’s concept of natural selection. But are these two
forces sufficient to explain the wonders of evolution at least in some broad terms?
There is no doubt that the Modern Synthesis is able to explain the change of gene
frequencies on a small time scale. If there is enough diversification, then the theory
correctly predicts further changes for a short time. But can it explain evolution
on a large time scale, with new species arising and old species vanishing?
The outline of the chapter is as follows. First I recall the state of the art of evolu-
tion theories, because they are used as models for evolutionary computation. Then I
describe different genetic algorithms: centralized, parallel and co-evolutionary. The
major part of the chapter deals with the investigation of Darwin's conviction that
space is as important as time for evolution to take place. In particular we will ana-
lyze an important conjecture of Darwin's which has gone unnoticed so far. We have
called it the Continent-island cycle conjecture. This conjecture is analyzed using
evolutionary games. Here a number of different spatial distributions are compared.
Then I describe the parallel genetic algorithm PGA and its use in combinatorial opti-
mization. In the final section co-evolution of sub-populations is used for continuous
optimization.
The term collaboration does not appear in textbooks of biology. In a restricted
form collaboration is investigated in ecology. Collaboration in the general sense is
considered to be an important component of human societies and therefore part of
sociology. Biology researches cooperation driven by interactions: between individ-
uals, between species, between geographically distributed sub-populations, within
insect colonies, etc. In this chapter we investigate spatial distributions which vary over
time. The most interesting distribution is the continent-island cycle. This might also
be a model for successful collaboration in human societies.
in the Anglo-Saxon countries because of the battle of its supporters against some
orthodox believers of religion2.
In Germany Ernst Haeckel was a strong advocate of Darwin’s theory. Neverthe-
less he wrote as early as 1863 - only four years after the publication of the Origin:
“Darwin’s evolution theory is by no means finished, instead it gives only the first
outline of a future theory. On the one hand we are not aware of all the other rela-
tions, which may be equally important in the origin of species to natural selection,
which was emphasized far too much by Darwin. And in many cases the external
conditions of existence of inorganic nature, like climate and habitat, geographic and
topographic conditions, to which the organisms have to adapt, should be considered
no less important than these relations... Another, and no doubt the most important
shortcoming of Darwin’s theory lies in the fact, that it gives no indication of the
spontaneous creation or the first genesis of the one or the few oldest organisms from
which all other organisms developed [19]3 ”.
It is outside the scope of this paper to discuss the above problems in detail. They
are still controversial in biology. In order to refresh the reader's memory, I recall
some important terms in evolutionary biology:
• Genotype: The molecular basis of inheritance as defined by genes and chromo-
somes.
• Phenotype: The actual appearance of the living beings.
• Species: A group of organisms capable of inter-breeding and producing fertile
offspring. More precise measures are based on the similarity of DNA or mor-
phology.
Another important concept in population genetics is fitness. It describes the
capability of an individual of a certain genotype to reproduce, and is usually equal
to the proportion of the individual's genes in the next generation. An individual's fitness
is manifested through its phenotype. As the phenotype is affected by both genes
and environment, the fitnesses of different individuals with the same genotype are
not necessarily equal, but depend on the environment in which the individuals live.
However, the fitness of the genotype is considered to be an averaged quantity; it will
reflect the outcomes of all individuals with that genotype.
This is a very careful definition, but how can this fitness be measured? It requires the
next generation! I will not discuss this complicated issue further. All mathematical
models of population genetics assume that the fitness is given. In the simplest case
of a single gene with two alleles a and A, we have the genotypes aa, aA, AA with
corresponding fitness values w00, w01, w11.
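For illustration, the textbook one-locus selection model with these fitness values can be written down directly; the function below is a standard sketch, not taken from the chapter, and assumes random mating (Hardy-Weinberg proportions):

```python
# Sketch of the standard one-locus, two-allele selection model: given the
# genotype fitnesses w00 (aa), w01 (aA), w11 (AA) and the frequency p of
# allele A, compute p in the next generation.

def next_allele_frequency(p, w00, w01, w11):
    q = 1.0 - p
    # Mean population fitness under Hardy-Weinberg genotype proportions.
    w_bar = q * q * w00 + 2 * p * q * w01 + p * p * w11
    # Marginal fitness of allele A.
    w_A = q * w01 + p * w11
    return p * w_A / w_bar

# If A is favored (w11 > w01 > w00), its frequency rises generation by
# generation, as the Modern Synthesis predicts on a short time scale.
p = 0.1
for _ in range(50):
    p = next_allele_frequency(p, w00=0.8, w01=0.9, w11=1.0)
print(p)
```

Note that this is exactly the short-time-scale prediction discussed above; the model says nothing about new species arising or old ones vanishing.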
In order to illustrate the current state of the art of evolution theories, I briefly describe
two representative examples. The first one is presented in the book by Maynard
Smith and Szathmary [47]. They see evolution as the evolution of complexity in
terms of genetic information and how it is stored, transmitted, and translated. This
2 This controversy is still not settled, if one considers the many supporters of “intelligent
design” in the US.
3 Translation by the author.
Table 1 The major transitions in evolution:

before → after
replicator molecules → population of molecules in compartments
independent replicator → chromosomes
RNA as gene and enzyme → DNA and protein
procaryote → eucaryote
asexual clones → sexual population
protist → plants, animals, fungi
solitary individuals → colonies
societies of primates → human societies
approach has led them to identify several major transitions, starting with the origin
of life and ending with the origin of human language (see Table 1).
The authors “solve” some of the transition problems with a very narrow version
of the Modern Synthesis: “We are supporters of the gene centered approach pro-
posed by Williams and refined by Dawkins.” In the gene centered approach, also
called the selfish gene concept [7], the genes are the major actors. They possess an
internal force to proliferate as much as possible.
Let me illustrate the gene centered approach with the kin selection concept. In the
gene centered approach fitness measures the quantities of copies of the genes of an
individual in the next generation. It doesn’t really matter how the genes arrive in the
next generation. That is, for an individual it is equally beneficial to reproduce itself, or
to help relatives with similar genes to reproduce, as long as at least the same number
of copies of the individual’s genes get passed on to the next generation. Selection
which promotes this kind of helper behavior is called kin selection. It has even been
put into a mathematical rule by Hamilton [20]! An altruistic act will be done if

rB > C

Here C means the cost in fitness to the actor, B the benefit in fitness and r the relat-
edness. Let us discuss a simple example. Consider a father and his children, who
are drowning. Here r = 0.5. Let us assume that the father has to make a sacrifice;
this means C = 1. First assume B = 1. Then the father will not make a sacrifice for a
single child; at least three children are needed. But if the father is not able to father
new children, then he makes the sacrifice for a single child! (see also the discussion
of altruism in [47]).
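Hamilton's rule and the father example can be checked mechanically; a tiny sketch (the helper name is mine):

```python
# Hamilton's rule: an altruistic act is favored when r * B > C, where r is
# the relatedness, B the total fitness benefit to the recipients, and C the
# cost to the actor.

def altruism_favored(r, benefit, cost):
    return r * benefit > cost

# Father example from the text: r = 0.5 per child, cost C = 1, benefit
# B = 1 per child saved.
assert not altruism_favored(0.5, 1 * 1, 1)   # one child: 0.5 is not > 1
assert not altruism_favored(0.5, 2 * 1, 1)   # two children: 1.0 is not > 1
assert altruism_favored(0.5, 3 * 1, 1)       # three children: 1.5 > 1
```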
The selfish gene concept has been opposed by a small group in biology, most no-
tably the late Stephen J. Gould. Recently even philosophers of science have formulated
a fundamental critique: “The synthetic theory bypassed what were at the time intractable
questions of the actual relationship between stretches of chromosomes and phenotypic
traits. Although it was accepted that genes must, in reality, generate phenotypic
differences through interaction with other genes and other factors in development,
genes were treated as black boxes that could be relied on to produce phenotypic
variation with which they were known to correlate [17].”
Evolutionary Computation: Centralized, Parallel or Collaborative 565
Metaphorically speaking: Each organism travels on a unique trace in this four di-
mensional space.
One of the major weaknesses of the Modern Synthesis is the separation of the indi-
viduals and the environment. The fitness is averaged over individuals and environ-
ments. Let O(t) = (O1(t), ..., ON(t)) denote the vector of individuals at generation
t. Then we can formulate a simple system model. Each individual Oi(t) (mainly
It seems impossible to obtain numerical values for this fitness. Therefore theo-
retical biology has made many simplifications: the environment is kept fixed, i.e.,
g(E(t)) = const, the influence of other individuals is described by some averages of
the population, etc. The above model is still too simple, because each individual
develops in close interaction with its environment.
The model given by (2) has not yet been used in population genetics, but special-
ized cases are commonly applied in population dynamics [21] or ecology. Given
two species with population sizes N and M, the following equations are used.
The population sizes of the next generation depend on the interaction of the
two species at generation t. The interaction can be positive, meaning that both
species support each other. If the interaction is negative we have the classical
predator-prey system [18].
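A minimal numerical sketch of such an interaction, assuming a standard discrete-time Lotka-Volterra form with made-up parameters (the chapter's own equations are not reproduced here):

```python
# Discrete-time predator-prey sketch: prey N grow and are eaten, predators
# M starve and feed on prey. Negative cross-terms give the classical
# predator-prey dynamics mentioned in the text. All parameters are
# illustrative only.

def step(n_prey, m_pred, a=0.1, b=0.002, c=0.1, d=0.001):
    n_next = n_prey + a * n_prey - b * n_prey * m_pred  # growth minus predation
    m_next = m_pred - c * m_pred + d * n_prey * m_pred  # starvation plus feeding
    return max(n_next, 0.0), max(m_next, 0.0)

n, m = 100.0, 20.0
for _ in range(100):
    n, m = step(n, m)
print(n, m)   # both populations persist and oscillate around an equilibrium
```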
The development problem in evolutionary models has been addressed recently
by the developmental system theory [40]. Unfortunately the theory is very informal;
it has been formulated from a philosopher's point of view. Therefore I will describe
the nucleus of an evolution theory as it has been stated by Anatol Rapoport [43].
The theory is based on the concept of an organism. “According to a soft defini-
tion, a system is a portion of the world that is perceived as a unit and that is able to
maintain its identity in spite of changes going on in it. An example of a system par
excellence is a living organism. But a city, a nation, a business firm, a university are
organisms of a sort. These systems are too complex to be described in terms of suc-
cession of states or by mathematical methods. Nevertheless they can be subjected to
methodological investigations [43].”
Rapoport then defines: “Three fundamental properties of an organism appear in
all organism-like systems. Each has a structure. That is, it consists of inter-related
parts. It maintains a short-term steady state. That is to say, it reacts to changes in
the environment in whatever way is required to maintain its integrity. It functions.
It undergoes slow, long term changes. It grows, develops, or evolves. Or it degen-
erates, disintegrates, dies. Organisms, ecological systems, nations, institutions, all
have these three attributes: structure, function, and history, or, if you will, being,
acting, and becoming.”
Taking the importance of individual development into account, I divide becom-
ing into developing and evolving. Development is the process creating a grown-up
There exists a myriad of evolutionary algorithms which model parts of the general
evolutionary models described in the previous section. The most popular algorithm is the
genetic algorithm GA, which models evolution by sexual reproduction and natural
selection. The GA was invented by Holland [22]. The optimization problem is given
by a fitness function F(x).
Genetic Algorithm

STEP 0: Define a genetic representation of the problem; set t = 0
STEP 1: Create an initial population P(0) = x_1^0, ..., x_N^0
STEP 2: Compute the average fitness F̄ = ∑_{i=1}^N F(x_i)/N. Assign each
        individual the normalized fitness value F(x_i^t)/F̄
STEP 3: Assign each x_i a probability p(x_i, t) proportional to its normalized
        fitness. Using this distribution, randomly select N vectors from P(t).
        This gives the set S(t)
STEP 4: Pair all of the vectors in S(t) at random, forming N/2 pairs. Apply
        crossover with probability p_cross to each pair, and other genetic
        operators such as mutation, forming a new population P(t+1)
STEP 5: Set t = t + 1, return to STEP 2
In the simplest case the genetic representation is just a bit-string of length n, the
chromosome. The positions of the strings are called loci (sing. locus) of the chro-
mosome. The variable at a locus is called a gene, its value an allele. The set of chro-
mosomes is called the genotype which defines a phenotype (the individual) with a
certain fitness. The crossover operator links two searches. Part of the chromosome
of one individual (search point) is inserted into the second chromosome giving a
new individual (search point). We will later show with examples why and when
crossover helps the search.
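The steps above can be written out directly; a minimal sketch in Python, using OneMax (the number of ones in the bit-string) as a stand-in fitness function. The parameter values are illustrative, not the chapter's:

```python
# Minimal GA following STEP 0-5: bit-string representation, proportional
# selection, one-point crossover, bit-flip mutation.
import random

def genetic_algorithm(n=20, pop_size=30, p_cross=0.8, p_mut=0.02,
                      generations=100, seed=1):
    rng = random.Random(seed)
    fitness = lambda x: sum(x)                     # F(x): OneMax
    # STEP 0/1: representation and initial population.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # STEP 2/3: fitness-proportional selection.
        weights = [fitness(x) + 1e-9 for x in pop]
        selected = rng.choices(pop, weights=weights, k=pop_size)
        # STEP 4: pair, cross over, mutate.
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = selected[i][:], selected[i + 1][:]
            if rng.random() < p_cross:
                cut = rng.randrange(1, n)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for j in range(n):
                    if rng.random() < p_mut:
                        child[j] ^= 1              # bit-flip mutation
                nxt.append(child)
        pop = nxt                                  # STEP 5: next generation
    return max(pop, key=fitness)

best = genetic_algorithm()
print(sum(best))   # close to the optimum n = 20
```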
So Darwin postulates that the islands should reconvert to a large continent. There
will again be severe competition eliminating the specialized forms. This briefly
subgroups, with migration sufficiently restricted (less than one migrant per genera-
tion) and size sufficiently small to permit appreciable local differentiation.
Four different models for spatially structured populations have been investigated
mathematically:
• the one-island model
• the island model
• the stepping stone model
• the isolation by distance model
In the one-island model, an island and a large continent are considered. The large
continent continuously sends migrants to the island. In the island model, the popu-
lation is pictured as subdivided into a series of randomly distributed islands among
which migration is random.
In the stepping-stone model migration takes place between neighboring islands
only. One and two dimensional models have been investigated. The isolation by
distance model treats the case of continuous distribution where effective demes are
isolated by virtue of finite home ranges (neighborhoods) of their members. For
mathematical convenience it is assumed that the position of a parent at the time it
gives birth relative to that of its offspring when the latter reproduces is normally
distributed.
Felsenstein [9] has shown that the isolation by distance model leads to unrealis-
tic clumping of individuals. He concluded that this model is biologically irrelevant.
There have been many attempts to investigate spatial population structures by com-
puter simulations, but they did not have a major influence on theoretical biology. A
good survey of the results of the different population models can be found in [10].
Population models with oscillation like Darwin’s continent-island cycle have not
been dealt with.
The issue raised by Wright and Fisher is still not settled. Phase 3 of Wright’s the-
ory has been recently investigated by Crow [6]. He concludes: “The importance of
Wright's shifting-balance theory remains uncertain, but we believe whatever weak-
nesses it may have, they are not in the third phase.”
The problem of spatial population structures is now reappearing in the theory of
genetic algorithms. The plain GA is based on Fisher's model. It is a well-known
fact that the GA suffers from the problem of premature convergence. In order to
solve this problem, many genetic algorithms enforce diversification explicitly, vi-
olating the biological metaphor. A popular method is to accept an offspring only
if it is genetically more than a certain factor different from all the members of the
population.
Our parallel genetic algorithm PGA tries to introduce diversification more
naturally, by a spatial population structure. Fitness evaluation and mating are restricted to
neighborhoods. In the PGA we have implemented the isolation by distance model
and the stepping stone model. The three phases of Wright's theory can actually be
observed in the PGA. But the relative importance of the three phases is different
from what Wright believed. The small populations do not find better peaks by random
processes. The biggest changes of the population occur at the time after migration
The creative forces of evolution take place at migration and a few generations
afterwards. Wright's argument that better peaks are found just by chance in small
subpopulations is wrong.
In our opinion the most important part of Wright's theory is what Wright pos-
tulated as “the appearance of still more successful centers of diffusion at points of
contact”. The difference between evolution on a large continent and on small isolated
islands has been recently investigated in [42].
We believe that static fitness functions cannot model natural evolution. In a real
environment the fitness of an individual depends on the outcome of its interactions
with other organisms in the environment. The fitness cannot be specified in advance.
Therefore we used an evolutionary game to simulate complex spatial population
structures.
Move C D
C 3/3 0/5
D 5/0 1/1
Given these payoffs, it is easily shown that mutual defection is the only Nash
equilibrium. Of course, the intrigue of the Prisoner’s Dilemma is that this unique
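That mutual defection is the only Nash equilibrium of the one-shot game can be verified by brute force over the payoff table above; a minimal sketch:

```python
# Payoffs (row player, column player) from the table above.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def is_nash(a, b):
    moves = ('C', 'D')
    # Neither player can gain by unilaterally deviating.
    row_ok = all(PAYOFF[(a, b)][0] >= PAYOFF[(x, b)][0] for x in moves)
    col_ok = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, y)][1] for y in moves)
    return row_ok and col_ok

equilibria = [(a, b) for a in 'CD' for b in 'CD' if is_nash(a, b)]
print(equilibria)   # [('D', 'D')]
```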
There have been many attempts to investigate the IPD with genetic algorithms. The
first simulation was performed by Axelrod [2]. Axelrod considered strategies where
the moves are based on the game’s past three-move history. The major focus of Axel-
rod’s study was on strategies evolving against a fixed environment. Each individual
played against eight representative strategies. Marks [31] extended the investigation
to bootstrap evolution, where the individuals play against each other. Miller [33]
used finite automata to represent strategies. Furthermore he investigated the effect
of informational accuracy on the outcome of the simulation. All three researchers
used the plain genetic algorithm for evolving the population. They were inter-
ested in equilibrium states and “optimal” strategies. We concentrate on the evolution
of the behavior of the total population.
The PGA has been extended to simulate different population structures. The ma-
jor enhancements of the PGA over the plain genetic algorithm are the spatial popula-
tion structure, the distributed selection and the local hill-climbing. The individuals
are active. They look for a partner for mating in their neighborhood. The partner is
chosen according to the preference of the individuals. The best individual in a neigh-
borhood has the chance to get as many offspring as the global best individual of the
population. The PGA therefore has a very “soft” selection scheme. Each individual
has the chance that on average 50% of its genes are contained in the chromosome of
an offspring. The offspring replaces the parent. In order not to complicate the sim-
ulations the individuals are not allowed to improve their fitness by learning. This
means their strategy is fixed during their lifetime.
We now turn to the problem of genetic representation of strategies.
strategy
C * * * C C ALL-C
D * D D * * ALL-D
C * D C D C TIT-FOR-TAT
The sign * denotes that the allele on this locus does not have any influence on the
performance of the strategy. The ALL-C strategy in row one is defined as follows.
The player starts with C, then only two outcomes are possible, CD or CC. In both
cases the player plays C. The outcomes DD and DC are not possible, therefore
the entries in these columns are irrelevant. Altogether there are twelve different
bit-strings which define an ALL-C strategy. The problem of this straightforward
genetic representation is that we have a distinction between the representation and
the interpretation. The program which interprets the representation is not part of the
genetic specification and therefore not subject to the evolution process.
But we have a clear distinction between genotype, phenotype and behavior. The
genotype is mapped into some phenotype, the phenotype together with the environ-
ment (in our case the other phenotypes) defines the strategy. Let us take the famous
TIT-FOR-TAT as an example. In TIT-FOR-TAT the player makes the move the op-
ponent made in the previous game. In an environment where only C is played, TIT-FOR-
TAT cannot be distinguished from an ALL-C player. A different behavior can only
be recognized if there exists an individual who occasionally plays D.
The mapping from genotype to phenotype is many-to-one. This makes a behavior-
oriented interpretation of a given genetic representation very difficult. There exists no
simple structure in the genotype space. The Hamming distance between two ALL-
C genetic representations can be as large as four, whereas the Hamming distance
between two very different strategies like ALL-C and ALL-D can be as small as
one. An example is shown below.
strategy
C C D D C C ALL-C
C C D D C D ALL-D
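The chapter's exact six-locus encoding is not fully recoverable from the tables above, so the sketch below uses a generic memory-one encoding of my own (locus 0: first move; loci 1-4: replies to the previous outcomes CC, CD, DC, DD; 0 = C, 1 = D). It illustrates the two points in the text: the genotype-to-behavior mapping is many-to-one, and a small Hamming distance can separate very different behaviors:

```python
# Index of the genome locus that answers each previous-round outcome
# (my move, opponent's move); locus 0 holds the first move.
OUTCOMES = {('C', 'C'): 1, ('C', 'D'): 2, ('D', 'C'): 3, ('D', 'D'): 4}

def next_move(genome, my_last, opp_last):
    if my_last is None:                           # first move of the game
        return 'CD'[genome[0]]
    return 'CD'[genome[OUTCOMES[(my_last, opp_last)]]]

def hamming(g1, g2):
    return sum(a != b for a, b in zip(g1, g2))

def play(g1, g2, rounds=6):
    hist, m1, m2 = [], None, None
    for _ in range(rounds):
        n1, n2 = next_move(g1, m1, m2), next_move(g2, m2, m1)
        m1, m2 = n1, n2
        hist.append((m1, m2))
    return hist

canonical_all_c = [0, 0, 0, 0, 0]   # always plays C
hidden_all_c    = [0, 0, 0, 1, 1]   # identical behavior: DC/DD never reached
all_d           = [1, 1, 1, 1, 1]   # always plays D
one_flip        = [0, 0, 1, 0, 0]   # one bit away from canonical ALL-C

# Many-to-one: two different genomes, one and the same behavior.
assert play(canonical_all_c, all_d) == play(hidden_all_c, all_d)
assert hamming(canonical_all_c, hidden_all_c) == 2
# One bit flip, and the behavior against a defector changes completely.
assert hamming(canonical_all_c, one_flip) == 1
assert play(canonical_all_c, all_d) != play(one_flip, all_d)
```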
We see that for k = 0 the invaders play against the inhabitants only, the case k = 1
gives the panmictic population normally considered in the theory of evolutionary
games. Here the plays are performed according to the frequency of the actors. In
the case of k > 1 we have a clustering effect. The players play more often within
their groups (inhabitants, invaders). For k = 1/s the effect is most dramatic. The
mixed terms with P(I, J) vanish, thus the invaders and the inhabitants play within
their group only. This is a very crude model of a structured population, but it can be
used to show some important points.
A strategy is called collectively stable if no strategy can invade it. A new strategy is
said to invade if the newcomer gets a higher score than the native strategy, this means
that F(I) < F(J). In order to obtain a simple formula, we assume that s is small
and approximate F(I) by P(I, I). Thus the small number of plays between inhabitants
and invaders is ignored. We get
It is now easily seen that even ALL-C can withstand the invasion of ALL-D if
there is a strong preference for each strategy to play only against itself. With
our payoff values we obtain that ALL-C will not be invaded by ALL-D if k > 0.5 s⁻¹.
But the other invasion is also possible: ALL-C can invade an ALL-D population as
long as they “stick together”. This means they play, even after the invasion, much
more against each other than against ALL-D.
In a one-dimensional spatial population structure with fixed neighborhoods the
situation is more difficult. The contest between the strategies happens at the bound-
ary of the neighborhoods, whereas the individuals in the interior play only against
members of their own group. In this spatial structure the success of the invasion is
therefore totally determined by the outcomes at the boundary.
It is almost impossible to investigate realistic spatial population structures by
analytical methods; one has to use simulations. This was first done by Axelrod ([1],
pp. 158-168). Axelrod investigated a simple 2-D structure where each player had
four neighbors. The selection was very strong. If a player had one or more neighbors
which had been more successful, the player converted to the strategy of the most
successful of them. Axelrod’s major conclusion was that mutual cooperation can be
sustained in a (not too highly connected) territorial system at least as easily as it
can be in a freely mixing system. We will extend Axelrod’s work. First, different
population structures are compared and second, the strategies evolve controlled by
the genetic algorithm.
[Three figures: minimum, maximum and average payoff of the population plotted
over 200 generations.]
The effect was dramatic. Now the population always settled on non-cooperative
behavior. The situation changed with our second scheme, which we called the
family game. Each mating produces two offspring. After the mating the family
consisting of the two parents and the two offspring plays an IPD tournament. The
winner replaces the parent. With this selection scheme the population settled on
cooperative behavior.
The explanation of this result is simple. In the IPD non-cooperative strategies can
be eliminated if the cooperative individuals stick together. In a single contest, ALL-
D can never be beaten. It is outside the scope of this paper to compare the family
game with kin selection proposed in sociobiology [54].
In Figure 4 the continent-island cycle is shown. One easily recognizes the cycle
(20 generations island, 20 generations continent). During the continent phase the
variance is reduced, during the island phase it is increased.
In Figure 5 the average fitness of the population is shown for five different pop-
ulation structures. The simulation started with a homogeneous ALL-D population.
We investigated whether the populations will change to cooperation. We see that the
population which is subjected to the continent-island cycle is first to arrive at co-
operation. This result was consistent in ten runs. A closer analysis of the strategies
showed that the winning cooperative strategies are not naive like ALL-C, but they
resemble TIT-FOR-TAT.
In a further set of experiments we changed the game during the course of the sim-
ulation; for instance, we changed the IPD to the chicken game. The spatially structured
[Figure 4: payoff of the population versus generations (0-200) under the
continent-island cycle.]
[Figure 5: average fitness versus generations (0-200); start population: 50 ALL-D;
curves for the panmictic population, random neighbours, ring population,
continent-ring oscillation, and continent cycle.]
populations adapted much faster to the new game than a large panmictic popula-
tion. This is one of the extensions that had already been proposed by Axelrod for
investigation ([1], p. 221).
There have been attempts to “prove” that genetic algorithms make a nearly opti-
mal allocation of trials. This result is called the “Fundamental Theorem of Genetic
Algorithms” (Goldberg [13]). We have shown already in [34] that the above claim
is only valid for simple optimization problems. In fact, in [37] we have proven a
correct schema theorem, based on Boltzmann selection and our Estimation of Dis-
tribution family of algorithms [29, 36].
The search strategy of a genetic algorithm can be explained in simple terms. The
crossover operator defines a scatter search [12] where new points are drawn out
of the area which is defined by the old or “parent” points. The more similar the
parents are, the smaller will be the sampling area. Thus crossing-over implements
an adaptive step-size control.
But crossing-over also explores the search space. Let us assume that the com-
binatorial problem has the building block feature. We speak of a building block fea-
ture if the substrings of the optimal solutions are contained in other good solutions.
In this case it seems a good strategy to generate new solutions by patching together
substrings of the old solutions. This is exactly what the crossover operator does.
We want to recall that in the PGA the crossover operator is not applied to all TSP
configurations, but only to configurations which are local minima. Our local search
is a fast version of the 2-opt heuristic developed by Lin [28]. It is a 2-opt without
checkout. It gives worse solutions than 2-opt, but the solution time scales only lin-
early with the number of cities.
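For concreteness, here is a plain 2-opt local search (the textbook variant with repeated re-checking; the chapter's fast version omits the checkout pass, trading solution quality for linear time):

```python
# 2-opt for the TSP: repeatedly reverse a tour segment whenever the
# reversal shortens the tour, until no improving reversal remains.
import math, random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = cand, True
    return tour

random.seed(3)
pts = [(random.random(), random.random()) for _ in range(12)]
start = list(range(12))
best = two_opt(start, pts)
assert tour_length(best, pts) <= tour_length(start, pts)
```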
We have later found that the efficiency of the PGA increases with the quality of
the local search. But the major goal of the PGA work on the TSP was to investi-
gate the problem-independent aspects, i.e., the population structure and the selection
schedule. Therefore many generations were needed, which could only be obtained
by a fast local search method.
We turn to a popular benchmark problem, the ATT-532 problem solved to opti-
mality in [41]. The PGA with a population size of 64 and truncated 2-opt as local
search method got a tour length of 0.10% above optimal in ten runs of t = 1080s
(1000 generations, 15000 local searches) on a 64-processor system; the average fi-
nal tour length was 0.19% above optimal [14]. This is a substantial improvement
over the results in [49] for genetic 2-opt search. It demonstrates the robustness of
the parallel genetic algorithm. The PGA finds good solutions with a simple local
search also.
This implementation had some influence in the development of heuristics for the
TSP. There is a section about the PGA implementation in Johnson and
McGeoch's seminal paper [26].
We will compare our heuristic with a very fast and efficient heuristic proposed
by Johnson [24]. It is called iterated Lin-Kernighan search. In his implementation a
new start configuration is obtained by an unbiased 4-opt move of the tour at hand.
Then a new L-K search is started. If the search leads to a tour with a smaller tour
length, the new tour is accepted.
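The iterated search loop described above can be sketched as follows; the double-bridge move is a standard choice for an unbiased 4-opt kick, and `local_search` below is only a simple stand-in for the Lin-Kernighan procedure:

```python
# Iterated local search: kick the current tour with a random double-bridge
# (4-opt) move, re-optimize, and accept only improvements.
import math, random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def double_bridge(tour, rng):
    # Cut the tour into four segments A|B|C|D and reassemble as A|C|B|D.
    i, j, k = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:i] + tour[j:k] + tour[i:j] + tour[k:]

def local_search(tour, pts):
    # Placeholder for L-K: one first-improvement sweep of segment reversals.
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            cand = tour[:i] + tour[i:j][::-1] + tour[j:]
            if tour_length(cand, pts) < tour_length(tour, pts):
                tour = cand
    return tour

def iterated_search(pts, kicks=30, seed=7):
    rng = random.Random(seed)
    best = local_search(list(range(len(pts))), pts)
    for _ in range(kicks):
        cand = local_search(double_bridge(best, rng), pts)
        if tour_length(cand, pts) < tour_length(best, pts):
            best = cand                      # accept only shorter tours
    return best

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(15)]
tour = iterated_search(pts)
assert sorted(tour) == list(range(15))       # still a valid permutation
```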
Johnson reports the following results. In time t = 2700s (500 L-K searches) the
optimal tour (length 27686) was output in 6 of 20 IterL-K runs; the average final
tour length was 0.05% above optimal. Multiple L-K runs gave much worse results. A
single L-K run averages 0.98% above optimal in time t = 120s. 100 L-K runs gave
a tour length of 0.25% above optimal. It needed 20000 L-K runs (t = 530 hours) to
obtain a tour of length 27705.
Why is IterL-K more efficient than independent L-K runs? The success of IterL-
K depends on the fact that good local L-K minima are clustered together and not
randomly scattered. The probability of finding a good tour is higher near a good tour
than near a bad tour.
We have shown in [34] that 2-opt local minima are clustered. Furthermore we
could show the following relation: The better the solutions are, the more similar
they are. This relation is the reason for the success of Johnson’s IterL-K. The rela-
tion holds, if the problem has the building block feature, which is necessary for the
success of the crossover operator of our genetic algorithm.
Iterated hill-climbing needs a fine-tuned mutation rate to reach with high probability
the attractor region of a new local minimum. In the TSP case Johnson found that a
simple 4-opt move is sufficient. In other combinatorial problems it is more difficult
to find a good mutation rate and a good local heuristic like the Lin-Kernighan search
for the TSP. Therefore we share the opinion of Johnson that the TSP is in practice
much less formidable than its reputation would suggest [24]. An in-depth evaluation
of heuristics for the solution of large TSP problems can be found in [26].
min_P ∑_{1≤i<j≤n, g_i ≠ g_j} w_ij

σ(P) is defined as

σ²(P) = (1/m) ∑_{i=1}^{m} |P_i|² − ((1/m) ∑_{i=1}^{m} |P_i|)²
In order to solve the GPP, we have to define the genetic representation and the
genetic operators. In the simplest representation, the value (allele) gi on locus i
on the chromosome gives the number of the partition to which node vi belongs.
But this representation is highly degenerate. The number of a partition does not
have any meaning for the partitioning problem. An exchange of two partition num-
bers will still give the same partition. Altogether m! chromosomes give the same
fitness value.
F(G) = ∑_{1≤i<j≤n, g_i ≠ g_j} w_ij
All m! chromosomes code the same partitioning instance, the same “phenotype”.
The genetic representation does not capture the structure of the problem. We did not
find a better genetic representation, so we decided that the crossover operator has to
be “intelligent”. Our crossover operator inserts complete partitions from one chro-
mosome into the other, not individual nodes. It computes which partitions are the
most similar and exchanges these partitions. Mathematically speaking, the crossover
operator works on equivalence classes of chromosomes.
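The degeneracy can be demonstrated directly: with the chromosome g as above, the fitness (cut weight) is invariant under any of the m! relabelings of the partition numbers. A small sketch with a made-up graph:

```python
# The chromosome g assigns each node a partition number; the fitness is the
# total weight of the cut edges (g_i != g_j). Any permutation of the
# partition labels gives the same fitness, which is exactly the m!-fold
# degeneracy discussed in the text.
from itertools import permutations

def cut_weight(g, edges):
    # edges: dict mapping node pairs (i, j) with i < j to weights w_ij
    return sum(w for (i, j), w in edges.items() if g[i] != g[j])

# A 4-node cycle graph, partitioned into m = 2 parts.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (0, 3): 2.0}
g = [0, 0, 1, 1]                      # cut edges: (1,2) and (0,3)
base = cut_weight(g, edges)
print(base)                           # 4.0

# All m! relabelings of the partition numbers give the same cut weight.
for perm in permutations(range(2)):
    relabeled = [perm[x] for x in g]
    assert cut_weight(relabeled, edges) == base
```

This invariance is what motivates a crossover operator that works on whole partitions (equivalence classes) rather than on raw partition numbers.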
Figure 6 shows an example. The problem is to partition the 4×4 grid into four
partitions.
The crossover operator works as follows. Partition 2 has to be inserted into B.
The crossover operator finds that partition 4 of B is the most similar to partition
2 in A. It identifies partition 2 of A with partition 4 of B. Then it exchanges the
alleles 2 and 4 in chromosome B to avoid the problems arising from symmetrical
solutions. In the crossover step it implants partition 2 of chromosome A into B.
After identifying all gene loci and alleles which lead to a non-valid partition,
a repair operator is used to construct a new valid chromosome. Mutation is done after
the crossover and depends on the outcome of the crossover. In the last step a local
hill-climbing algorithm is applied to the valid chromosome.
For local hill-climbing we can use any popular sequential heuristic. It should be
fast, so that the PGA can produce many generations. In order to solve very large
problems, it should be of order O(n) where n is the problem size. Our hill-climbing
algorithm is of order O(n²), but with a small constant. In order to achieve this small
constant, a graph reduction is made. The general outline of our hill-climbing algo-
rithm is as follows:
Local search for the GPP
1. Reduce the size of the graph by combining up to r nodes into one hyper-node
2. Apply the 2-opt of Kernighan and Lin [27] to the reduced graph. For the GPP it
is defined as follows:
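The classical Kernighan and Lin swap gain underlying this 2-opt can be sketched as follows (standard textbook form; the toy graph and helper names are illustrative, not taken from the chapter):

```python
def kl_swap_gain(a, b, chrom, weights):
    """Kernighan-Lin gain for swapping node a with node b (nodes in
    different partitions): gain = D_a + D_b - 2*w(a, b), where
    D_v = external cost - internal cost of node v."""
    def w(i, j):
        return weights.get((min(i, j), max(i, j)), 0)
    def D(v):
        ext = sum(w(v, u) for u in range(len(chrom))
                  if u != v and chrom[u] != chrom[v])
        intl = sum(w(v, u) for u in range(len(chrom))
                   if u != v and chrom[u] == chrom[v])
        return ext - intl
    return D(a) + D(b) - 2 * w(a, b)

# swapping nodes 1 and 3 repairs a bad bisection of this toy graph
weights = {(0, 1): 5, (2, 3): 5, (0, 2): 1}
print(kl_swap_gain(1, 3, [1, 2, 2, 1], weights))  # → 10
```

A positive gain means the swap lowers the cut cost; the full Kernighan-Lin pass greedily applies the best swaps and keeps the best prefix of the swap sequence.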
586 H. Mühlenbein
[Fig. 6 Crossover for the GPP on the 4×4 grid: chromosomes A and B are shown as 4×4 arrays of partition numbers. A symmetry check first exchanges the alleles 2 and 4 in B; crossing-over then implants partition 2 of A into B, leaving open alleles and open gene loci (marked "?"); a repair step finally produces a valid partition.]
[Figure: maximum, average and minimum costs versus generation (0–500); the costs fall from about 1200 towards the best value of 431.]
For EVER918 the PGA found the best solution computed so far [53]. Further
investigations have indicated that it will be difficult to construct an efficient iterative
Lin-Kernighan search for the m-GPP. First, the quality of an average L-K solution
is bad for the GPP. Second, it is difficult to determine a good mutation rate which
jumps out of the attractor region of the local minimum. This has been demonstrated
in [53]. There the following relation was shown: the better the local minimum, the
larger its attractor region.
In summary: the m-GPP problem is more difficult to solve than the TSP. The
PGA obtained better results than other known heuristics.
zi = xi + ρi · δ (k) (10)
Evolutionary Computation: Centralized, Parallel or Collaborative 589
for j = 1, . . . ν
and i − j > 0, i+ j ≤ 0
dN1/dt = r1 · N1 · (1 − N1/K1 − α12 · N2/K2)   (14)

dN2/dt = r2 · N2 · (1 − N2/K2 − α21 · N1/K1)   (15)
Here N1, N2 denote the population sizes of the two species, r1, r2 are the growth
rates, K1, K2 the carrying capacities, and α12, α21 the interaction coefficients. This
equation has been studied intensively [8]. It is very useful for understanding the
complex patterns which may arise from two interacting species. For a competition
scheme to be implemented these equations cannot be used because the interaction
coefficients cannot be specified in advance. In analogy to the above model the fol-
lowing model has been implemented.
The gain criterion (G) defines how the population size of each group is modi-
fied according to its quality. Normally, the size of the group with the best quality
increases, the sizes of all other groups decrease. The following scheme increases
the size of the best group (w0 ) by the accumulated loss of the others. The loss of a
group is proportional to its population size. The loss factor κ ∈ [0; 1] defines the rate
of loss.
The change of the population sizes is computed from the following equations:
ΔNi = { ∑_{j=1, j≠i}^S κ · N_j^t    if Qi(w) > Qj(w) for all j ≠ i
      { −κ · N_i^t                  otherwise                        (17)

where N_i^t denotes the size of group i at generation t and S denotes the number
of groups. The loss factor κ is normally set to 0.125.
The population size of each group of the next generation is given by:
N_i^{t+1} = { N_i^t + ΔNi    if N_i^t + ΔNi ≥ N_i^min
            { N_i^min        otherwise                               (18)
The size of a group is only reduced if it is greater than the minimal
size N_i^min.
This gain criterion leads to a fast adaptation of the group sizes. Each group loses
the same percentage of individuals.
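The gain criterion of equations (17) and (18) can be sketched as follows (illustrative Python; the group qualities are assumed to be given):

```python
def compete(sizes, qualities, kappa=0.125, n_min=4):
    """Gain criterion of Eqs. (17)/(18): the group with the best
    quality gains the accumulated losses of all other groups; every
    other group loses the fraction kappa of its size, clamped from
    below by the minimal group size n_min."""
    best = max(range(len(sizes)), key=lambda i: qualities[i])
    delta = [-kappa * n for n in sizes]
    delta[best] = sum(kappa * n for i, n in enumerate(sizes) if i != best)
    return [max(n + d, n_min) for n, d in zip(sizes, delta)]

print(compete([40, 40, 40], [0.9, 0.5, 0.2]))  # → [50.0, 35.0, 35.0]
```

The total population size stays constant as long as no group hits the minimal size, matching the remark above that each losing group gives up the same percentage of individuals.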
The evaluation interval η and the migration interval θ are rather robust exter-
nal parameters. Normally we set η = 4 and θ = 16.
If one compares equation 17 with the generalized Lotka-Volterra equation 14,
the following major difference can be observed. Our equations are linear whereas
the Lotka-Volterra equations contain the nonlinear term Ni · N j . The reason for this
difference is that the Lotka-Volterra equations model individual competition. If there
are many predators and each one captures two prey, then the reduction of the prey
population depends on the number of predators. In contrast, our competition scheme evaluates
whole groups by taking the best individual as evaluation criterion.
The current competition scheme seems appropriate in cases when the strategies
used by the different groups differ substantially. Sometimes a competition model
might be better where even the size of the total population may vary.
the limited resource by one individual of a species — the higher the consumption
factor, the lower the number of individuals which can be supported by that resource.
We implemented this extension by introducing a normalized population size Ñ.
Ñi = γi · Ni (19)
The gain criterion of equation 17 is now applied to the normalized population sizes.
The sum of the normalized population sizes remains constant because it is limited
by the limited resource K.
∑_{i=1}^S Ñi = K   (20)
For γi = 1.0 for i = 1, . . . , S we obtain the basic model. In contrast to the basic
model, the sum of the real population sizes varies during a simulation. This extended
competition scheme can be very effective for multi-modal problems where it is use-
ful to locate the region of attraction of the global (or a good local) optimum by a
breadth search and to do the fine adaptation by an exploring strategy afterwards. In
this case the strategy performing breadth search gets a lower γ than the other strat-
egy. So the total population size is high at the beginning when the breadth search
works and low at the end when the fine adaptation is done. Thus, the whole population
size is adapted during the run by the competition model.
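The extended scheme of equations (19) and (20) can be sketched as follows (a self-contained Python illustration; the particular γ values are hypothetical):

```python
def compete_normalized(sizes, qualities, gammas, kappa=0.125):
    """Extended competition of Eqs. (19)/(20): apply the gain
    criterion to the normalised sizes N~_i = gamma_i * N_i (whose sum
    K stays constant) and map back. A strategy with a small gamma_i
    consumes less of the shared resource, so its real population
    N_i = N~_i / gamma_i can be large."""
    norm = [g * n for g, n in zip(gammas, sizes)]            # Eq. (19)
    best = max(range(len(norm)), key=lambda i: qualities[i])
    delta = [-kappa * n for n in norm]
    delta[best] = sum(kappa * n for i, n in enumerate(norm) if i != best)
    new_norm = [n + d for n, d in zip(norm, delta)]
    assert abs(sum(new_norm) - sum(norm)) < 1e-9             # Eq. (20): sum = K
    return [n / g for n, g in zip(new_norm, gammas)]

# breadth-search strategy (gamma = 0.5) vs. fine-tuning strategy (gamma = 1.0)
print(compete_normalized([80, 40], [0.9, 0.4], [0.5, 1.0]))  # → [90.0, 35.0]
```

Note that the sum of the real sizes (80 + 40 = 120 before, 90 + 35 = 125 after) varies even though the normalised sum stays constant, as described above.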
Numerical results for difficult test functions can be found in [45]. A discussion
about the evolution of the population sizes during a run can be found in [44].
8 Conclusion
Complex spatial population structures are seldom used in evolutionary computa-
tion. In this chapter we have investigated the stepping-stone model, competing sub-
populations, and Darwin’s continent-island cycle. For Darwin’s conjecture an evo-
lutionary algorithm was used where the fitness of each individual is given by the
competition with other individuals. The competition is modeled by evolutionary
games. The parallel genetic algorithm PGA uses the stepping-stone interaction. It
runs totally in parallel. The selection is distributed and done by each individual in
its neighborhood. Faster convergence can be obtained by the Breeder Genetic Algo-
rithm BGA. It models breeding as it is done by a human breeder. For really difficult
optimization problems the competing BGA has been developed. It uses compet-
ing sub-populations which are bred using different strategies. Occasionally good
individuals migrate to other sub-populations. The sizes of the sub-populations are
adjusted according to their performance.
Darwin’s cycle model seems also a good starting point for investigating the devel-
opment of new ideas in human societies, be it in science or art. It takes small groups
or even a single individual to try out new ideas. But for the ideas to be accepted
a large community is needed. In a large community many individuals evaluate the
new ideas, only the most promising eventually survive.
References
1. Axelrod, R.: The evolution of cooperation. Basic, New York (1984)
2. Axelrod, R.: The evolution of strategies in the iterated prisoner’s dilemma. In: Davis, L.
(ed.) Genetic algorithms and Simulated Annealing, pp. 32–41. Morgan Kaufmann, Los
Altos (1987)
3. Batz, M.: Evolution von Strategien des Iterierten Gefangenen Dilemma. Master’s thesis,
Universität Bonn (1991)
4. Cavalli-Sforza, L.L., Feldman, M.W.: Cultural Transmission and Evolution: A Quantita-
tive Approach. Princeton University Press, Princeton (1981)
5. Cohoon, J.P., Hedge, S.U., Martin, W.N., Richards, D.: Punctuated equilibria: A parallel
genetic algorithm. In: Grefenstette, J.J. (ed.) Proceedings of the Second International
Conference on Genetic Algorithms, pp. 148–154. Lawrence Erlbaum, Mahwah (1987)
6. Crow, J.F., Engels, W.R., Denniston, C.: Phase three of Wright's shifting balance theory.
Evolution 44, 233–247 (1990)
7. Dawkins, R.: The Selfish Gene, 2nd edn. Oxford University Press, Oxford (1989)
8. Emlen, J.M.: Population Biology: The Coevolution of Population Dynamics and Behav-
ior. Macmillan Publishing Company, New York (1984)
9. Felsenstein, J.: A pain in the torus: Some difficulties with models of isolation by distance.
Amer. Natur. 109, 359–368 (1975)
10. Felsenstein, J.: The theoretical population genetics of variable selection and migration.
Ann. Rev. Genet. 10, 253–280 (1976)
11. Fisher, R.A.: The Genetical Theory of Natural Selection. Dover, New York (1958)
12. Glover, F.: Heuristics for integer programming using surrogate constraints. Decision Sci-
ences 8, 156–166 (1977)
13. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading (1989)
14. Gorges-Schleuter, M.: Asparagos: An asynchronous parallel genetic optimization strat-
egy. In: Schaffer, H. (ed.) 3rd Int. Conf. on Genetic Algorithms, pp. 422–427. Morgan-
Kaufmann, San Francisco (1989)
15. Gorges-Schleuter, M.: Genetic Algorithms and Population Structures - A Massively Par-
allel Algorithm. PhD thesis, University of Dortmund (1991)
16. Gould, S.J., Eldredge, N.: Punctuated equilibria: the tempo and mode of evolution re-
considered. Paleobiology 3, 115–151 (1977)
17. Griffiths, P.E.: The philosophy of molecular and developmental biology. In: Blackwell
Guide to Philosophy of Science. Blackwell Publishers, Malden (2002)
18. Gurney, W.S.C., Nisbet, R.M.: Ecological Dynamics. Oxford University Press, New York
(1998)
19. Haeckel, E.: Über die Entwicklungstheorie Darwin’s. In: Gemeinverständliche Vorträge
und Abhandlungen aus dem Gebiet der Entwicklungslehre. Emil Strauss, Bonn (1902)
20. Hamilton, W.D.: The genetical evolution of social behavior I and II. Journal of Theoret-
ical Biology 7, 1–16, 17–52 (1964)
21. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge
University Press, Cambridge (1998)
22. Holland, J.H.: Adaptation in Natural and Artificial Systems. Univ. of Michigan Press,
Ann Arbor (1975/1992)
23. Jablonka, E., Lamb, M.J.: Evolution in Four Dimensions. MIT Press, Cambridge (2005)
24. Johnson, D.S.: Local optimization and the traveling salesman problem. In: Paterson, M.S.
(ed.) Automata, Languages and Programming. LNCS, vol. 496, pp. 446–461. Springer,
Heidelberg (1990)
25. Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simu-
lated annealing: An experimental evaluation; part i, graph partitioning. Operations Re-
search 37, 865–892 (1989)
26. Johnson, D.S., McGeoch, L.A.: The traveling salesman problem: a case study. In: Aarts,
E., Lenstra, J.K. (eds.) Local Search in Combinatorial Optimization, pp. 215–310. Wiley,
Chichester (1997)
27. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell
System Technical Journal 49, 291–307 (1970)
28. Lin, S.: Computer solutions of the traveling salesman problem. Bell. Syst. Techn.
Journ. 44, 2245–2269 (1965)
29. Mahnig, T., Mühlenbein, H.: A new adaptive Boltzmann selection schedule SDS. In:
Proceedings of the 2001 Congress on Evolutionary Computation, pp. 183–190. IEEE
Press, Los Alamitos (2001)
30. Manderick, B., Spiessens, P.: Fine-grained parallel genetic algorithm. In: Schaffer, H.
(ed.) 3rd Int. Conf. on Genetic Algorithms, pp. 428–433. Morgan-Kaufmann, San Fran-
cisco (1989)
31. Marks, R.E.: Breeding hybrid strategies: Optimal behavior for oligopolist. In: Schaffer,
H. (ed.) 3rd Int. Conf. on Genetic Algorithms, pp. 198–207. Morgan Kaufmann, San
Mateo (1989)
32. Maturana, H.R., Varela, F.J.: Autopoiesis and Cognition: The Realization of the Living.
D. Reidel, Boston (1980)
33. Miller, J.K.: The coevolution of automata in the repeated prisoner’s dilemma. Technical
report, Santa Fe Institute (1989)
34. Mühlenbein, H.: Evolution in time and space - the parallel genetic algorithm. In: Rawl-
ins, G. (ed.) Foundations of Genetic Algorithms, pp. 316–337. Morgan Kaufmann, San
Mateo (1991)
35. Mühlenbein, H., Gorges-Schleuter, M., Krämer, O.: Evolution algorithms in combinato-
rial optimization. Parallel Computing 7, 65–88 (1988)
36. Mühlenbein, H., Höns, R.: The factorized distribution algorithm and the minimum rel-
ative entropy principle. In: Pelikan, M., Sastry, K., Cantu-Paz, E. (eds.) Scalable Opti-
mization via Probabilistic Modeling, pp. 11–37. Springer, New York (2006)
37. Mühlenbein, H., Mahnig, T.: Evolutionary optimization and the estimation of search
distributions with applications to graph bipartitioning. Journal of Approximate Reason-
ing 31(3), 157–192 (2002)
38. Mühlenbein, H., Schlierkamp-Voosen, D.: Predictive Models for the Breeder Genetic
Algorithm I. Continuous Parameter Optimization. Evolutionary Computation 1, 25–49
(1993)
39. Mühlenbein, H., Schlierkamp-Voosen, D.: The science of breeding and its application to
the breeder genetic algorithm. Evolutionary Computation 1, 335–360 (1994)
40. Oyama, S.: Evolutions’s Eye. Duke University Press, Durham (2000)
41. Padberg, M., Rinaldi, G.: Optimization of a 532-city symmetric traveling salesman prob-
lem by branch and cut. Op. Res. Let. 6, 1–7 (1987)
42. Parisi, D., Ugolini, M.: Living in enclaves. Complexity 7, 21–27 (2002)
43. Rapaport, A.: Modern systems theory – an outlook for coping with change. General
Systems XV, 15–25 (1970)
44. Schlierkamp-Voosen, D., Mühlenbein, H.: Strategy adaptation by competing subpopula-
tions. In: Davidor, Y., Schwefel, H.-P., Männer, R. (eds.) PPSN 1994. LNCS, vol. 866,
pp. 199–208. Springer, Heidelberg (1994)
45. Schlierkamp-Voosen, D., Mühlenbein, H.: Adaptation of population sizes by competing
subpopulations. In: Proceedings IEEE Conference on Evolutionary Computation, pp.
330–335. IEEE Press, New York (1996)
46. Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press,
Cambridge (1982)
47. Maynard Smith, J., Szathmary, E.: The Major Transitions in Evolution. W.H. Freeman,
Oxford (1995)
48. Tanese, R.: Distributed genetic algorithm. In: Schaffer, H. (ed.) 3rd Int. Conf. on Genetic
Algorithms, pp. 434–440. Morgan-Kaufmann, San Francisco (1989)
49. Ulder, N.L.J., Pesch, E., van Laarhoven, P.J.M., Bandelt, H.-J., Aarts, E.H.L.: Improving
TSP exchange heuristics by population genetics. In: Maenner, R., Schwefel, H.-P. (eds.)
Parallel Problem Solving from Nature, pp. 109–116. Springer, Heidelberg (1991)
50. Voigt, H.-M., Mühlenbein, H.: Gene Pool Recombination and the Utilization of Covari-
ances for the Breeder Genetic Algorithm. In: Michalewicz, Z. (ed.) Proc. of the 2nd IEEE
International Conference on Evolutionary Computation, pp. 172–177. IEEE Press, New
York (1995)
51. Voigt, H.-M., Mühlenbein, H., Cvetković, D.: Fuzzy recombination for the continuous
breeder genetic algorithm. In: Eshelman, L.J. (ed.) Proc. of the Sixth Int. Conf. on Ge-
netic Algorithms, pp. 104–112. Morgan Kaufmann, San Francisco (1995)
52. von Laszewski, G.: Ein paralleler genetischer Algorithmus für das Graph Partition-
ierungsproblem. Master’s thesis, Universität Bonn (1990)
53. von Laszewski, G.: Intelligent structural operators for the k-way graph partitioning prob-
lem. In: Belew, R.K., Booker, L. (eds.) Proceedings of the Fourth International Conference
on Genetic Algorithms, pp. 45–52. Morgan Kaufmann, San Mateo (1991)
54. Wilson, D.S., Dugatkin, L.A.: Nepotism vs tit-for-tat, or, why should you be nice to your
rotten brother. Evol. Ecology 5, 291–299 (1991)
55. Wright, S.: The distribution of gene frequencies in populations. Proc. Nat. Acad. Sci 24,
253–259 (1937)
56. Wright, S.: Factor interaction and linkage in evolution. Proc. Roy. Soc. Lond. B 162,
80–104 (1965)
Part IX
CI for Clustering and Classification
Fuzzy Clustering of Likelihood Curves for
Finding Interesting Patterns in Expression
Profiles
Abstract. Peptides derived from proteins are routinely analysed in so-called bottom-
up proteome studies to determine the amounts of corresponding proteins. Such stud-
ies easily sequence and analyse thousands of peptides per hour by the combination
of liquid chromatography and mass spectrometry instruments (LC-MS). However,
quantified peptides belonging to the same protein do not necessarily exhibit the
same regulatory information in all cases. Several causes can produce these regu-
latory inconsistencies at the peptide level. Quantitative data might be simply in-
fluenced by specific properties of the analytical procedure. However, it can also
indicate meaningful biological processes such as the post-translational modification
(PTM) of amino acids regulated in individual protein regions. This article describes
a fuzzy clustering approach allowing the automatic detection of regulatory peptide
clusters within individual proteins. The approach utilises likelihood curves to sum-
marise the regulatory information of each peptide, based on a noise model of the
used analytical workflow. The shape of these curves directly correlates with both
the regulatory information and the underlying data quality, serving as a representa-
tive starting point for fuzzy clustering of peptide data assigned to one protein.
1 Introduction
Cellular processes are mediated by proteins acting e.g. as enzymes in different
metabolic or signalling pathways. Their activity is determined by (i) their abundance
Claudia Hundertmark
Department for Cell Biology, Helmholtz Centre for Infection Research,
Inhoffenstr. 7, D-38124 Braunschweig, Germany
e-mail: [email protected]
Lothar Jänsch
Department for Cell Biology, Helmholtz Centre for Infection Research,
Inhoffenstr. 7, D-38124 Braunschweig, Germany
e-mail: [email protected]
Frank Klawonn
Department of Computer Science, University of Applied Sciences,
Braunschweig/Wolfenbuettel, Salzdahlumer Str. 46/48, D-38302 Wolfenbuettel, Germany
e-mail: [email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 599–622.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
600 C. Hundertmark, L. Jänsch, and F. Klawonn
controlled by gene expression, and (ii) modifications made following (post) their
synthesis (translation) at ribosomes. These post-translational modifications (PTMs)
alter the chemical structure of protein-constituting amino acids. The modifications
occur only at specific regions of the protein sequence and often control essential
intra- and intermolecular binding and activity properties of the modified proteins.
Therefore, proteome research, i.e. the systematic characterisation of proteins, aims
to develop quantitative strategies suitable for both protein expression and PTM anal-
yses. In so-called bottom-up approaches proteins are routinely digested first into
peptides, resulting in complex samples comprising unmodified and modified pro-
tein regions. Following this, all peptides are separated by liquid chromatography
and can be analysed quantitatively by mass spectrometry (LC-MS). Thus, a com-
parative investigation of peptides derived from cells in different physiological states
or/and under variable environmental conditions provides data that characterizes pro-
teins as well as the post-translational modifications that are involved in biological
processes.
If a protein has changed its function at the expression level, it is likely that pep-
tides representing different regions of this protein will be regulated in a consistent
manner. In contrast, scientists have to consider PTMs if individual peptides
representing individual protein regions are differently regulated.
Regulatory information can simply be presented as ratios (regulation factor,
expression ratio), which are calculated from pairwise comparisons of detected
amounts of the same peptide under different conditions. However, calculating the
expression states of proteins itself requires a statistical strategy to combine the
regulatory information of peptides belonging to the same protein correctly. Signal
intensities in mass spectrometry are compromised by noise, resulting in variable
expression ratios even under assumed constant experimental conditions. Thus, reg-
ulatory peptide data for one protein are often similar but never identical, raising the
question: which variation is caused by simple noise and which indicates individually
regulated protein regions that should be excluded from general protein expression
calculations?
We have recently established a noise model-based workflow for the iTRAQTM
technology frequently used in quantitative proteome research [8]: Different types of
iTRAQTM reporter molecules are linked to peptides and produce sample-specific
ions (reporter masses of 114, 115, 116 or 117 Dalton) during MS analyses. The
measured relative reporter intensities correlate with the relative ratios of peptides in
comparatively analysed biological samples. Following this, a mathematical model
calculates the noise inherent in a peptide's regulatory information, which generally
decreases with increasing intensity of the detected iTRAQTM reporter signals. The
noise model allows calculation of both the most likely regulation factor and the
probability of alternative regulations, based on the underlying MS data qualities.
Both aspects are graphically summarised in likelihood curves that were established
systematically for all regulatory peptide data. Overlapping curves of peptides be-
longing to the same protein often form a kind of main cluster indicating the general
expression state of the total protein. In contrast, the regulation of individual protein
regions is probably significant, if the likelihood curve of the corresponding peptide
Fuzzy Clustering of Likelihood Curves for Finding Interesting Patterns 601
does not substantially overlap with other curves of the main cluster. Those outlying
curves are called outliers. Importantly, these cases can detect regulated PTMs but
also can reveal peptides that cannot be assigned unambiguously to one protein.
Therefore, computer-aided clustering of regulatory peptide data is a feasible re-
source for both quality management in proteome studies and the detection of impor-
tant biological processes, such as post-translational modifications. In this chapter a
prototype-based fuzzy clustering approach is presented to inspect the variations of
regulatory peptide data at the protein level. In a first step, regulatory information is
calculated and visualised by a probabilistic approach resulting in likelihood curves
for each individual peptide. Then, the likelihood calculations for all peptides be-
longing to one protein are inspected by fuzzy clustering in order to detect outlying
curves. Since the algorithm for the detection of peptide clusters is based on fuzzy
clustering, our collaborative approach combines probabilistic concepts as well as
principles from soft computing. However, fuzzy clustering is usually based on data
points and its application to likelihood curves was a challenging task. An integrative
concept is presented and discussed in this article with particular respect to distance
and quality measures.
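Prototype-based fuzzy clustering on plain data points, fuzzy c-means, can be sketched as follows (a minimal 1-D illustration; the chapter adapts this scheme by replacing the point-to-prototype distance with a distance between likelihood curves):

```python
def fuzzy_c_means(xs, m=2.0, steps=50):
    """Minimal 1-D fuzzy c-means with two prototypes: alternate
    between membership and prototype updates. u[i][k] is the degree
    to which point xs[i] belongs to cluster k (fuzzifier m > 1)."""
    protos = [min(xs), max(xs)]              # deterministic init, c = 2
    c = len(protos)
    for _ in range(steps):
        u = []
        for x in xs:
            d = [abs(x - p) + 1e-12 for p in protos]
            u.append([1.0 / sum((d[k] / d[l]) ** (2 / (m - 1)) for l in range(c))
                      for k in range(c)])
        protos = [sum(u[i][k] ** m * xs[i] for i in range(len(xs))) /
                  sum(u[i][k] ** m for i in range(len(xs)))
                  for k in range(c)]
    return protos, u

protos, u = fuzzy_c_means([0.45, 0.5, 0.55, 1.9, 2.0, 2.1])
print(sorted(round(p, 2) for p in protos))   # prototypes near 0.5 and 2.0
```

In the chapter's setting, the prototypes are representative likelihood curves rather than points, and a peptide whose curve has low membership in the main cluster is flagged as an outlier.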
2.1 Proteomics
Protein-dependent gene regulation determines the selection and synthesis of differ-
ent mRNAs, resulting in the translation of proteins at ribosomes. Proteins are large
organic compounds made of 20 different types of amino acid and consist of up to
several thousands of amino acids. Short sequences comprising less than 100 amino
acids are usually termed peptides.
Once produced, proteins mediate, control and regulate almost all cellular pro-
cesses and establish the physiological and reactive capacities of organisms. Some
proteins have structural or mechanical functions, such as actin in the cytoskeleton or
myosin in muscles. However, the majority of proteins act as enzymes, which catalyse
chemical reactions. Each organic compound (e.g. metabolite) is the product of enzy-
matic activities. In addition, proteins constitute a dynamic network which transmits
and integrates environmental and internal signals that are indispensable for cellu-
lar communications. The molecular interactions of signalling proteins are frequently
regulated by post-translational modifications which are again catalysed by enzymes.
Currently, about 200 [13] different modifications have been described for proteins,
altering their structure and concomitantly their activity state, localisation or stability.
In contrast to the genome, the proteome, i.e. the sum of all proteins of a cell,
is per se highly dynamic and varies significantly with regard to its qualitative and
quantitative composition during the cell cycle and depending on the environmental
conditions. Proteomics aims at the identification and representative characterisation
of all proteins in a cell under defined conditions. Since technologies for absolute
protein quantifications are still limited and the corresponding strategies very time
and cost intensive, proteome studies are usually comparative, yielding relative quan-
tifications. The quantifications are based either on staining procedures and resulting
signals from gel separated proteins or in the case of LC-MS/MS based on labelling
strategies and sample specific ion intensities (see below).
under specific conditions and to define the networks, processes and signalling path-
ways involved. LC-MS/MS typically investigates protein-derived peptides. Thus,
strategies for the quantification and characterisation of proteins should preferen-
tially assess the peptide level.
Besides SILAC [10], iTRAQTM – introduced by [12] in 2004 – became the stan-
dard for relative quantifications of automatically sequenced peptides. iTRAQTM al-
lows differential labelling and relative as well as absolute peptide quantification of
up to eight different samples in parallel. During the labelling process only one type
out of eight iTRAQTM molecules is linked covalently to every peptide from one bi-
ological sample. All eight iTRAQTM molecules have the same structure and molec-
ular weight, but differ in the distribution of incorporated isotopes (Figure 1). In the
intact molecule the total mass is balanced and each labelling reaction introduces an
identical mass shift. However, under the conditions of peptide sequencing (MS/MS)
iTRAQTM also produces fragment ions that differ in mass and serve as sample spe-
cific reporters: Same peptides (with identical amino acid sequence) from different
biological samples, which were labelled differentially and are subsequently pooled
exhibit the same biochemical properties and total masses. Consequently, identical
peptides from different samples co-elute at the same time from chromatographic
columns, enter the MS device with the same molecular weight and are subjected
commonly to the fragmentation process. The ratios of the released iTRAQTM re-
porter ions correlate with the relative abundance of the analysed peptides as part of
the investigated samples.
Fig. 1 Chemical constitution of the iTRAQTM molecules: reporter group, balance group and
reactive group (taken from [11])
Fig. 2 iTRAQTM workflow: proteins from samples A and B are digested, peptides are
iTRAQTM labelled and combined. When performing LC-MS/MS peptide bonds between
the amino acids as well as bonds of iTRAQTM molecules and peptides are broken. Subse-
quently, peptides are used for identification and iTRAQTM molecules are used for relative
quantification
Table 1 Two different peptides are found in both analysed samples. The samples were la-
belled quantitatively with iTRAQTM reagents 115 and 117 that were experimentally found
with ion intensities of 200 and 400, respectively. Therefore, the 115-labelled sample con-
tains half the amount of peptide 1 compared with the 117-labelled sample (regulation factor = 0.5). In
contrast, the fragmentation pattern of peptide 2 exhibited reporter intensities of 80 and 40
indicating a regulation factor of 2
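The regulation factors in Table 1 are plain intensity ratios; a minimal sketch (channel assignment as in the caption):

```python
def regulation_factor(intensity_115, intensity_117):
    """Ratio of the two reporter ion intensities; < 1 means the
    115-labelled sample contains less of the peptide than the
    117-labelled sample, > 1 means it contains more."""
    return intensity_115 / intensity_117

print(regulation_factor(200, 400))  # peptide 1 → 0.5
print(regulation_factor(80, 40))    # peptide 2 → 2.0
```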
expression ratio calculation more when low intensities are detected than when
high intensities are detected. Therefore, the noise inherent in intensity measurements must
be estimated. How can this be done?
Each intensity is measured several times. In the ideal case without noise and
without regulation, all measurements should be identical. Since noise cannot be
ruled out, it is impossible to know the true intensities. Due to the fact that normally
only very few samples (between two and eight) are analysed in parallel, the available
data are not amenable to standard statistical methods. Hence, estimating the noise from
the sample to be analysed itself is not recommended. In order to specify the noise char-
acteristics, we prepared a special training dataset. We repeatedly analysed selected
synthesised and iTRAQTM labelled peptides, generating intensities over a broad
dynamic range. Of course, the obtained data are only reliable specifically for that
instrument, which was used for the measurements.
The noise follows a log-normal distribution whose variance depends on the (true)
intensity. It does not seem appropriate to assume a normal distribution of the noise
directly, since intensities are always non-negative. Since calculations are much eas-
ier with normal distributions, in most cases we will consider the data after taking
their logarithm.
The general problem to be solved is as follows. A data set of the following form
is given: y_1^(1), . . . , y_1^(k_1), . . . , y_n^(1), . . . , y_n^(k_n). (Here we use the transformed data.)
Fig. 3 Regulation factors versus intensity of the iTRAQTM 115 labelled sample (both logarithmically transformed). Intensity-dependent noise: ratios derived from high intensities are significantly more accurate than ratios derived from low intensities
y_i^(1), . . . , y_i^(k_i) represents k_i noisy measurements of the same (logarithmic) unknown
intensity μi.
We assume that the subsample y_i^(1), . . . , y_i^(k_i) originates from independent samples
of a normal distribution with unknown mean μi and unknown standard deviation σi. From
experiments we know that the variances follow a certain tendency. Small intensities
are less reliable (more noisy) than larger ones. In order to take this into account, we
assume that

σ(μi) = a + r · e^(−λ μi)   (1)
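Equation (1) can be sketched directly (the parameter values a, r, λ below are illustrative, not the fitted ones from the chapter):

```python
import math

def noise_sd(mu, a=0.05, r=1.0, lam=0.8):
    """Equation (1): sigma(mu) = a + r * exp(-lambda * mu). The sd of
    the log-intensity noise decays towards the floor a as the
    (log-)intensity mu grows."""
    return a + r * math.exp(-lam * mu)

for mu in (1.0, 4.0, 8.0):
    print(round(noise_sd(mu), 4))   # decreasing noise with intensity
```

This reproduces the tendency visible in Fig. 3: ratios derived from high intensities are far more accurate than ratios derived from low intensities.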
The likelihood of the whole data set is

L = ∏_{i=1}^n ∏_{j=1}^{k_i}  1 / ( (a + r e^(−λ μi)) √(2π) ) · exp( −(y_i^(j) − μi)² / ( 2 (a + r e^(−λ μi))² ) ).   (2)

The factors are simply the densities of normal distributions with mean μi and devi-
ation σi = a + r e^(−λ μi).
As mentioned before, the maximisation of L does not only involve the determi-
nation of the parameters a, r and λ , but also the estimation of the μi . Assuming
the parameters a, r and λ to be fixed at the moment, we estimate the μi -values in
the following way. Since the maximisation of the log-likelihood is equivalent to
the maximisation of the likelihood itself, we consider – as usual in maximum like-
lihood estimation – the log-likelihood. When the parameters a, r and λ are fixed,
the μi -values can be optimised independently. This means we have to maximise the
log-likelihoods

L̃i = ∑_{j=1}^{k_i} ( −ln(√(2π)) − ln(h(μi)) − (y_i^(j) − μi)² / (2 h(μi)²) )   (3)

with h(μi) = a + r e^(−λ μi).
dL̃i/dμi = ∑_{j=1}^{k_i} ( −h′(μi; θ)/h(μi; θ) + (y_i^(j) − μi) · (h(μi; θ) + (y_i^(j) − μi) · h′(μi; θ)) / h³(μi; θ) ) = 0   (4)
Multiplying by h³(μi; θ) > 0 gives the equivalent condition

∑_{j=1}^{k_i} ( λ r e^(−λ μi) · (a + r e^(−λ μi))² + (y_i^(j) − μi) · ( (a + r e^(−λ μi)) + (y_i^(j) − μi) · (−λ r e^(−λ μi)) ) ) = 0   (5)
Solving (5) for $\mu_i$ yields the maximum likelihood estimate for $\mu_i$, assuming the
parameters $a$, $r$ and $\lambda$ to be fixed. This is done numerically by a simple
bisection strategy. As one boundary for the bisection, we choose the mean value of the
$y_i^{(j)}$. The second one is determined by systematically searching to the left and right
of this value until the sign of (5) changes.
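This procedure can be sketched as follows; the helper names, the bracket-expansion step size and the tolerance are illustrative choices, not taken from the chapter:

```python
import math

def bisect_root(g, lo, hi, tol=1e-8):
    """Bisection for a root of g in [lo, hi]; g must change sign on the bracket."""
    sign_lo = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate_mu(y, a, r, lam, step=0.1):
    """ML estimate of mu_i for the samples y with a, r, lambda fixed,
    by solving Eq. (5) numerically with the bisection strategy above."""
    def g(mu):  # left-hand side of Eq. (5)
        h = a + r * math.exp(-lam * mu)
        hp = -lam * r * math.exp(-lam * mu)
        return sum(lam * r * math.exp(-lam * mu) * h * h
                   + (x - mu) * (h + (x - mu) * hp) for x in y)
    mu0 = sum(y) / len(y)          # first bracket boundary: the sample mean
    s0 = g(mu0) > 0
    d = step                       # expand left/right until the sign of (5) changes
    while (g(mu0 + d) > 0) == s0 and (g(mu0 - d) > 0) == s0:
        d *= 2.0
    other = mu0 + d if (g(mu0 + d) > 0) != s0 else mu0 - d
    return bisect_root(g, min(mu0, other), max(mu0, other))
```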
The optimisation of the parameters $a$, $r$ and $\lambda$ is carried out by a stochastic heuristic
algorithm, an evolution strategy [1] with adaptive mutation rates, population size
= 10, number of children = 25, maximum tolerated number of successive generations
in which no improvement could be achieved = 20 and maximum number of
iterations = 200. The fitness of a parameter combination $(a, r, \lambda)$ is given by (2),
where the $\mu_i$ are determined as described above based on solving (5).
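A minimal sketch of such a (10, 25)-evolution strategy with self-adaptive mutation rates is given below; it is a generic textbook variant, not the authors' implementation, and the learning rate `tau`, the initial ranges and the initial step sizes are assumptions:

```python
import math, random

def evolution_strategy(fitness, dim=3, mu=10, lam=25, max_iter=200, patience=20):
    """(mu, lambda)-ES with self-adaptive mutation rates; maximises `fitness`."""
    tau = 1.0 / math.sqrt(2.0 * dim)
    pop = [([random.uniform(0.0, 1.0) for _ in range(dim)],
            [0.1] * dim) for _ in range(mu)]
    best, best_f, stall = None, -float('inf'), 0
    for _ in range(max_iter):
        children = []
        for _ in range(lam):
            x, s = random.choice(pop)
            s2 = [si * math.exp(tau * random.gauss(0, 1)) for si in s]   # adapt step sizes
            x2 = [xi + si * random.gauss(0, 1) for xi, si in zip(x, s2)] # mutate parameters
            children.append((x2, s2))
        children.sort(key=lambda c: fitness(c[0]), reverse=True)
        pop = children[:mu]                      # comma selection: parents are discarded
        f = fitness(pop[0][0])
        if f > best_f + 1e-12:
            best, best_f, stall = pop[0][0], f, 0
        else:
            stall += 1
            if stall >= patience:                # no improvement for `patience` generations
                break
    return best, best_f
```

For the chapter's application, `fitness` would evaluate (2) for a candidate $(a, r, \lambda)$, with the $\mu_i$ obtained by solving (5).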
All of the likelihood plots that are presented in the following examples are de-
rived from proteins of differentially stimulated and quantitatively analysed cells.
An approach to the calculation of the best fitting regulation factor (expression ratio)
for a group of peptides was derived from the noise model. Generally, this method
can be applied to two different cases; firstly, the calculation of protein regulation
factors. Several proposals were submitted in the last few years [4, 9], but until now
none of those considered the reliability of the underlying peptide signals. The sec-
ond application is the calculation of expression ratios of multiple observed identical
peptides. As usual in MS/MS, the identification of every detected peptide is linked
to an MS/MS spectrum. Since multiple selection for fragmentation of one peptide
is not unusual, MS/MS datasets may contain redundancies in the form of multiple
measured peptides. In the following we will refer to every single identification of a
peptide as a matched MS/MS spectrum.
According to the noise model, expression ratios derived from high intensities
have to outweigh the expression ratios derived from low intensities. In order to find
the most suitable expression ratio for a peptide from a group of possibly different ex-
pression ratios we scan all expression ratios c j inside an individually specified inter-
val [cmin , . . . , cmax ] for all MS/MS spectra that match the actual considered peptide.
A likelihood value l j is determined corresponding to every c j . The most suitable
overall expression ratio cbest is that one which results in the maximum likelihood
value lbest .
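The scan can be sketched as follows; the grid resolution and the `likelihood` callback are assumptions, since the chapter does not specify how the interval is discretised:

```python
def best_expression_ratio(likelihood, c_min, c_max, steps=1000):
    """Scan candidate expression ratios c_j in [c_min, c_max] and return
    (c_best, l_best): the ratio with the maximum likelihood value."""
    c_best, l_best = None, -float('inf')
    for j in range(steps + 1):
        c_j = c_min + (c_max - c_min) * j / steps
        l_j = likelihood(c_j)        # likelihood of ratio c_j over all matched spectra
        if l_j > l_best:
            c_best, l_best = c_j, l_j
    return c_best, l_best
```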
In addition to the calculation of a protein’s and peptide’s best fitting regulation
factor, this method can be used for the detection of mismatched peptides. Often,
parts of related proteins are identical in their amino acid sequence. A peptide that
was identified by MS/MS can not be matched unambiguously to one of the related
proteins if its sequence is part of more than one protein. Thus, for regulated peptides
it has to be tested whether their sequence occurs in only one protein. If it does
not, such regulatory events can indicate significant mistakes in the identification
reports. The determination of outliers is done by fuzzy clustering (Section 4).
Fig. 4 Protein CSK21 HUMAN: 5 peptides combined to a single likelihood curve represent-
ing the total protein
can be combined in such a way that some special peptides can be plotted separately
in contrast to the remaining peptides of the protein, which are given by the protein
curve (Figure 6).
Fig. 6 Protein K1C14 HUMAN: 3 peptides combined to the protein likelihood curve, 1 pep-
tide represented by an individual curve
function with constraints. The most common approach is the so-called probabilistic
clustering with the objective function
$$f = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d_{ij} \qquad (6)$$
In this equation it is assumed that the number of clusters $c$ is fixed. How to determine
the number of clusters will be discussed later on. $u_{ij}$ is the membership degree
of data object $x_j$ to the $i$-th cluster. $d_{ij}$ is some distance measure specifying the
distance between data object $x_j$ and cluster $i$, for instance the (squared) Euclidean
distance of $x_j$ to the $i$-th cluster centre when the data objects are simple points, not
likelihood curves as in the case of the investigations here. The parameter $m > 1$,
called the fuzzifier, controls by how much clusters may overlap. The constraints (7),
$\sum_{i=1}^{c} u_{ij} = 1$ for all $j$, lead to the name probabilistic clustering, since in this case the membership degree
$u_{ij}$ can also be interpreted as the probability that $x_j$ belongs to cluster $i$. The
parameters to be optimised are the membership degrees $u_{ij}$ and the cluster parameters,
which are not given explicitly here. They are hidden in the distances $d_{ij}$. Since this
is a non-linear optimisation problem, the most common approach to minimise the
objective function (6) is to alternatingly optimise either the membership degrees or
the cluster parameters while considering the other parameter set as fixed. Assuming
the cluster parameters and therefore the values di j as fixed, the best choice for the
membership degrees is given by
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{\frac{1}{m-1}}}. \qquad (8)$$
If $d_{ij} = 0$ for one or more clusters, one must deviate from (8) and assign $x_j$ with
membership degree 1 to one of the clusters with $d_{ij} = 0$, choosing $u_{ij} = 0$ for the
other clusters $i$.
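A sketch of this membership update, including the special case for zero distances (function and variable names are illustrative):

```python
def update_memberships(d, m=2.0):
    """Probabilistic membership update, Eq. (8): d[i][j] is the distance of
    data object j to cluster i; returns the membership degrees u[i][j]."""
    c, n = len(d), len(d[0])
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        zero = [i for i in range(c) if d[i][j] == 0.0]
        if zero:                      # special case: assign x_j crisply to one
            u[zero[0]][j] = 1.0       # cluster with d_ij = 0, all others get 0
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i][j] / d[k][j]) ** (1.0 / (m - 1.0))
                                for k in range(c))
    return u
```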
The update equation for the cluster parameters or cluster prototypes strongly de-
pends on the type of cluster. For the specific case of likelihood curves, an algorithm
is proposed in the following section.
Cluster validity measures are used to validate a clustering result in general and
also to determine the number of clusters. In order to fulfil the latter task, the cluster-
ing might be carried out with different numbers of clusters and the one yielding the
best value of the validity measure is assumed to have the correct number of clusters.
A straightforward validity measure is the objective function (6) itself. However,
(6) will always decrease with an increasing number of clusters. Therefore, if the number
of clusters is determined based on (6), the procedure is as follows. The number
of clusters $c$ is increased step by step starting from $c = 1$, and (6) is evaluated each
Discretisation leads to
$$d_{ij} = 1 - \sum_{t=1}^{l_x} \min\{x_j^{(t)}, v_i^{(t)}\}. \qquad (12)$$
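Assuming each likelihood curve is represented by its values over the same $l_x$ discretisation intervals, (12) amounts to the following (a direct transcription; the normalisation assumption in the comment is ours):

```python
def curve_distance(x_j, v_i):
    """Eq. (12): distance between a discretised likelihood curve x_j and a
    prototype v_i, both given as lists of values over the same l_x intervals
    (curves normalised so that a curve's values sum to at most 1).
    The overlap sum(min(...)) is 1 for identical normalised curves."""
    return 1.0 - sum(min(a, b) for a, b in zip(x_j, v_i))
```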
For the identification of the best partition of all peptide curves into c clusters (c
fixed) c prototypes are initialised firstly. Subsequently, for all prototypes i ∈ [1 . . . c]
and all peptides j ∈ [1 . . . n] the membership degrees ui j are calculated by Eq. (8).
The initialisation and update scheme for the cluster prototypes is described in detail
in the following.
The listing below gives a general idea of the algorithm.

result := empty list;
for ( 1 ≤ c ≤ n ) {
    f(p_old) := ∞;
    p := initialise_prototypes(curves, c);
    d := calculate_distances(curves, p);
    u := calculate_u(d);
    f(p) := evaluate_cluster(u, d);
    while ( |f(p) − f(p_old)| > ε ) {
        p_old := p;
        f(p_old) := f(p);
        p := update_prototypes(p, curves, u);
        d := calculate_distances(curves, p);
        u := calculate_u(d);
        f(p) := evaluate_cluster(u, d);
    }
    result[c−1] := p_old;
}
return result;
Initialisation
Updating
The aim of repeated updating is the generation of a new set of prototypes from the
previous set of prototypes. In the case of crisp clustering, where a likelihood curve
$l_j$ either belongs to a cluster $i$ ($u_{ij} = 1$) or not ($u_{ij} = 0$), the update procedure is
explained very easily: the more curves overlap at a position $x$, the more
the objective function for the clustering is reduced when the prototype curve
has a high value there as well. At first, areas with a high number of overlapping
curves are added to the prototype. Step by step, less overlapping areas are added
as well, and the procedure finishes when the area under the prototype likelihood
curve reaches the value 1. Therefore, the new prototype likelihood curve is composed
of those areas where the most likelihood curves overlap. Besides the
number of overlapping curves, the weight
$$w_{ij}^{(t)} = \sum_{k=1}^{n} u_{ik}^{m}\, l_k^{(t)} \qquad (13)$$
of a point $t$ of curve $j$ with respect to prototype $i$, where $l_k^{(t)}$ denotes the value of likelihood curve $k$ in the $t$-th interval, strongly affects the development of a prototype $i$ from the likelihood curves $l_j$ and the
former prototype.
The update procedure for the actual prototype $i$ is as follows: we assume that
the horizontal axis is divided into $T$ intervals of equal length $l$. Initially, all points
$p_j^{(t)}$ that are part of the likelihood curve $j$, $1 \le j \le n$, are evaluated by application
of equation (13) with respect to the considered prototype $i$. Therefore, points belonging
to a likelihood curve which is similar to prototype $i$ (high membership degree $u_{ij}$)
are evaluated better than those belonging to a curve that overlaps less with
prototype $i$. Furthermore, the weight increases with every additional curve that
overlaps with curve $j$ in the considered interval.
A simple heuristic strategy to add the most interesting points to the new prototype
is the following: all points are sorted in decreasing order with respect to
their weights, regardless of the peptide curve they belong to. One after
another, the points with the highest weights are added to the prototype likelihood
curve under the constraint that every newly added point must be directly adjacent
to the points added so far. This means that the x-coordinate of the new point $x_p$ must
not exceed the borders of the present interval of support of the partially constructed
likelihood curve $[x_{min}, \ldots, x_{max}]$ by more than one interval length $l$ ($x_{min} - l \le x_p \le x_{max} + l$).
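This greedy strategy can be sketched as follows; the representation of points as (x, value, weight) tuples and the stopping criterion via a target area are illustrative assumptions:

```python
def build_prototype(points, interval_len, target_area=1.0):
    """Greedy prototype construction: `points` is a list of (x, value, weight)
    tuples collected from all curves of the cluster. Points are added in order
    of decreasing weight, but only if they extend the current support interval
    [x_min, x_max] by at most one interval length; stop once the area under
    the prototype curve reaches `target_area`."""
    ranked = sorted(points, key=lambda p: p[2], reverse=True)
    proto, area = {}, 0.0
    x_min = x_max = None
    while ranked and area < target_area:
        for idx, (x, val, _w) in enumerate(ranked):
            if x_min is None or (x_min - interval_len <= x <= x_max + interval_len):
                break                  # highest-weight point that is adjacent
        else:
            break                      # no remaining point is adjacent: stop
        x, val, _w = ranked.pop(idx)
        if x in proto:
            continue
        proto[x] = val
        area += val * interval_len
        x_min = x if x_min is None else min(x_min, x)
        x_max = x if x_max is None else max(x_max, x)
    return proto
```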
Fuzzy Clustering of Likelihood Curves for Finding Interesting Patterns 617
Fig. 7 Protein PCTK1 HUMAN: Resulting prototype after initialisation with the labelled
curve and all possible updates. The prototype likelihood curve consists mainly of those areas
where most of the data likelihood curves overlap
Figure 7 presents the resulting prototype after initialisation with the labelled
curve and all possible updates. In this example we assumed that all six pep-
tides build a single cluster (c = 1) and the prototype representing this cluster was
calculated.
$$\Delta = |f_{c-1} - f_c| \qquad (14)$$
4.3 Examples
In order to demonstrate the difficulties described in Section 4.2 in determining
the optimal number of clusters $c_{opt}$, and to present some results, we now give two
examples. Each of the following figures consists of a protein likelihood plot
(top) and a combined illustration of the validity measures partition entropy and partition
coefficient, as well as the objective function.
First of all, we show a plot containing 5 curves (Figure 8). By visual inspection,
they form a single cluster. The calculated overlaps of all 5 curves with the
final prototype for $c = 1$ are presented in Table 2. Since the overlap of all 5 curves
with the prototype is greater than half of each curve's total area, the algorithm
terminates with the outcome that all peptide curves are to be considered as one single
cluster. The validity measures, on the other hand, give no clear result. The partition
entropy quite clearly proposes 2 clusters, while the partition coefficient tends
slightly towards 1 cluster. The slope of the objective function gives no information
about the quality of a single cluster in principle and is not to be regarded in this case.
Table 2 Overlapping areas of likelihood curves and resulting prototype in the case of c = 1.
As the overlaps are at least half of the curve’s total area in every case, the clustering algorithm
returns that one single cluster was found
The likelihood plot of the second example (Figure 9) clearly shows 2 clusters,
each containing 2 curves. The results of the analysis of the area overlaps of peptide
curves and prototype in the case of $c = 1$ are given in Table 3. As 2 of the 4 curve
overlaps are less than 50%, the existence of one single cluster can be excluded. Partition
entropy as well as the slope of the objective function yield the result that the data
forms 2 clusters. However, the interpretation of the partition coefficient is not easy,
Fig. 8 Protein LYN HUMAN: Top: 5 peptide curves clustering into 1 cluster. Bottom: parti-
tion entropy giving copt = 2 (+), objective function resulting copt = 2 (•), partition coefficient
giving copt = 1 (∗)
Fig. 9 Protein MK01 HUMAN: Top: 4 peptide curves clustering into 2 clusters. Bottom:
partition entropy giving copt = 2 (+), objective function resulting copt = 2 (•), partition coef-
ficient ranging between copt = 1 and copt = 2 (∗)
Table 3 Overlapping areas of likelihood curves and resulting prototype in the case of c = 1.
As the overlaps are not at least half of the curve's total area in every case, the clustering
algorithm returns that more than one single cluster was found
as its value is between 1 and 2 clusters. What is the reason for this ambiguous behaviour?
If the optimal number of clusters is $c_{opt} = 2$, as in this case, the partition
coefficient for $c = 2$ should be higher than the partition coefficient for $c = 3$. However,
with $n = 4$ likelihood curves distributed over 3 clusters represented by 3 different
prototypes, nearly every curve has its own prototype, so the partition coefficient
for $c = 3$ will rarely be significantly less than that for $c = 2$. In fact, samples
where the partition coefficient for $c = 3$ is greater than or equal to that for $c = 2$ do
occur. This kind of problem is caused by the very low number of objects to be clustered.
Depending on the initialisation and the further development of the prototypes, both partition
coefficient and partition entropy can be affected. As the slope of the objective function, on
the other hand, is a more reliable measure, we prefer to use it instead of the
established validity measures in our special case.
5 Conclusions
We have introduced a noise model and derived likelihood curves for the visualisation
of regulatory information and the analysis of the robustness of quantitative LC-
MS/MS data after iTRAQTM -labelling. Furthermore, we presented an approach for
fuzzy clustering of likelihood curves, in order to reveal erroneous measurements, the
assignment of a peptide to a wrong protein or special modifications of peptides. The
aim of clustering likelihood curves is to group the curves and to discover proteins
where the peptide curves are split into two or more clusters. Problems
in determining the number of clusters by means of the well-established
validity measures, namely the partition coefficient and the partition entropy, were solved by defining
new criteria, specific to the available data, for detecting both a single cluster
and multiple clusters, as well as the number of clusters in the latter case.
By means of the workflow – estimation of the noise inherent in quantitative
LC-MS/MS data analysed using the iTRAQTM method, followed by calculation and
clustering of likelihood curves – advanced systematic analyses of those data
are possible.
The presented approach is easily transferable to other noisy data, provided the noise
can be specified by a noise model. After a one-off estimation of the model parameters from a
suitable biological sample, all derived applications, such as the calculation and clustering
of likelihood curves for the visualisation of regulatory information and robustness,
can be used.
References
1. Bäck, T.: Evolutionary Algorithms in Theory and Practice. Oxford University Press, Ox-
ford (1996)
2. Bezdek, J.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum
Press, New York (1981)
3. Bezdek, J., Keller, J., Krishnapuram, R., Pal, N.: Fuzzy Models and Algorithms for Pat-
tern Recognition and Image Processing. Kluwer, Boston (1999)
4. Boehm, A.M., Pütz, S., Altenhöfer, D., Sickmann, A., Falk, M.: Precise protein quantifi-
cation based on peptide quantification using iTRAQ. BMC Bioinformatics 8, 214 (2007)
5. Höppner, F., Klawonn, F., Kruse, R., Runkler, T.: Fuzzy Cluster Analysis. Wiley, Chich-
ester (1999)
6. Hu, J., Qian, J., Borisov, O., Pan, S., Li, Y., Liu, T., Deng, L., Wannemacher, K., Kur-
nellas, M., Patterson, C., Elkabes, S., Li, H.: Optimized proteomic analysis of a mouse
model of cerebellar dysfunction using amine-specific isobaric tags. Proteomics 6(15),
4321–4334 (2006)
7. Hundertmark, C., Fischer, R., Reinl, T., May, S., Klawonn, F., Jänsch, L.: MS-specific
noise model reveals the potential of iTRAQTM quantitative proteomics (2008) (submitted)
8. Klawonn, F., Hundertmark, C., Jänsch, L.: A maximum likelihood approach to noise es-
timation for intensity measurements in biology. In: Proc. Sixth IEEE International Con-
ference on Data Mining Workshops ICDM Workshops 2006, pp. 180–184 (2006)
9. Lin, W.-T., Hung, W.-N., Yian, Y.-H., Wu, K.-P., Han, C.-L., Chen, Y.-R., Chen, Y.-J.,
Sung, T.-Y., Hsu, W.-L.: Multi-q: a fully automated tool for multiplexed protein quanti-
tation. J. Proteome. Res. 5(9), 2328–2338 (2006)
10. Ong, S.-E., Blagoev, B., Kratchmarova, I., Kristensen, D.B., Steen, H., Pandey, A.,
Mann, M.: Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and ac-
curate approach to expression proteomics. Mol. Cell. Proteomics. 1(5), 376–386 (2002)
11. Pierce, A., Unwin, R.D., Evans, C.A., Griffiths, S., Carney, L., Zhang, L., Jaworska, E.,
Lee, C.-F., Blinco, D., Okoniewski, M.J., Miller, C.J., Bitton, D.A., Spooncer, E., Whet-
ton, A.D.: Eight-channel iTRAQ enables comparison of the activity of 6 leukaemogenic
tyrosine kinases. Mol. Cell. Proteomics (2007)
12. Ross, P.L., Huang, Y.N., Marchese, J.N., Williamson, B., Parker, K., Hattan, S., Khain-
ovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-
Jones, M., He, F., Jacobson, A., Pappin, D.J.: Multiplexed protein quantitation in
Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Pro-
teomics. 3(12), 1154–1169 (2004)
13. Walsh, C.T., Garneau-Tsodikova, S., Gatto, G.J.: Protein posttranslational modifications:
the chemistry of proteome diversifications. Angew. Chem. Int. Ed. Engl. 44(45), 7342–
7372 (2005)
A Hybrid Rule-Induction/Likelihood-Ratio
Based Approach for Predicting Protein-Protein
Interactions
Abstract. We propose a new hybrid data mining method for predicting protein-
protein interactions combining Likelihood-Ratio with rule induction algorithms. In
essence, the new method consists of using a rule induction algorithm to discover
rules representing partitions of the data, and then the discovered rules are interpreted
as “bins” which are used to compute likelihood ratios. This new method is
applied to the prediction of protein-protein interactions in the Saccharomyces cerevisiae
genome, using predictive genomic features in an integrated scheme. The results
show that the new hybrid method outperforms a pure likelihood-ratio-based
approach.
1 Introduction
Protein-protein interactions are involved in almost every cellular function, from
DNA replication and protein synthesis to regulation of metabolic pathways [1].
Proteins interact with each other by physically binding themselves or with other
molecules in the cell and form larger complexes to perform specific cellular func-
tions. Hence, the study of protein-protein interactions is of utmost importance to
understand their functions [7, 2], and detailed information about the interactions
of proteins can have potentially very useful applications, e.g., predicting disease-
related genes by looking at their interactions [28] as well as a potential use in de-
veloping new drugs that can specifically interrupt or modulate protein interactions
[41]. Also, the study of these interactions at the genomic level can help in understanding
the large-scale organization and features of the underlying network and the role
of individual proteins within the network [46].
Consequently a number of experimental techniques for determining protein-
protein interactions have been developed [39, 20, 12, 17]. Unfortunately the
Mudassar Iqbal, Alex A. Freitas, and Colin G. Johnson
Centre for Biomedical Informatics and Computing Laboratory, University of Kent,
Canterbury, U.K.
e-mail: mi26,a.a.freitas,[email protected]
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 623–637.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
features corresponding to protein pairs under consideration and their class attribute
(interacting or non-interacting), one can estimate the likelihood of interaction for a
given feature, and the overall likelihood is estimated using a naive Bayesian formulation
by assuming independence among all the features. Rhodes et al. [33] extend this
Bayesian approach for predicting protein interaction to the human genome. The general
idea of all the integrative methods is that one can combine various relatively
weak features in a setting in which the overall prediction is boosted by this integration
of data. Some interesting observations regarding this data integration
for protein interaction prediction are drawn in [22]. A detailed analysis of this data
integration using different classifiers is presented in [5].
Our work is partly inspired by the work done by [21], in which they proposed a
Bayesian method using the MIPS (Munich Information center for Protein Sequences
[24]) complexes catalog as a gold standard for positive interactions, and a list of
proteins in separate sub-cellular compartments as negative interactions, as there is
no particular data set of experimentally determined non-interactions. They integrate
multiple genomic data corresponding to protein pairs, including correlation in ex-
pression levels, functional similarity based measure, etc., as well as other experi-
mental data about protein interactions, as predictive features for these positives and
negatives. We use many of the protein pair features used in [21] and a subset of their
gold standard non-interactions to conduct a data mining experiment in order to ana-
lyze the effect of hybridizing simple naive Bayesian style likelihood based method
with some rule induction algorithms. Rule induction algorithms learn classification
rules given the predictive features as well as the class attribute of a set of exam-
ples (protein pairs in this case). Those learned rules can be used to predict unknown
protein interactions. We first analyze a simplified version of the naive Bayes clas-
sification method without using any prior information and analyze its behavior for
different possible values of sensitivity and specificity of prediction. Then we com-
bine that simplified naive Bayes formulation with another data mining algorithm,
namely a rule induction algorithm which learns IF-THEN type classification rules
from data.
In essence, we propose a new hybrid approach where we use the partitioning of
the data corresponding to the induced rules as “bins” from which likelihood ratios
are computed and used to classify the data. We present a ROC (receiver operating
characteristic) curve analysis of results obtained using different threshold levels on
the calculated likelihood values. Since these rules consist of multiple antecedents
coping with attribute interactions, the bins defined by these rules should give us a
better insight as compared to the uniform binning of attributes used in general naive
Bayesian methods. We have applied this hybrid method to a specific biological ap-
plication here, e.g., prediction of protein-protein interactions in the yeast S. Cere-
visiae using multiple genomic features, but the underlying principles of the method
are not application domain dependent, and indeed it can be applied to a wide range
of classification problems in different application domains.
1.3 Organisation
There are many approaches to building models involving rule sets for a classification
problem. One of the most common and widely used is the separate-and-conquer
approach, which we will discuss in some detail in the next section. Another
popular classification method is the divide-and-conquer technique by building deci-
sion trees [31]. In the work below we describe a rule induction algorithm that uses
aspects of both of these approaches. Therefore, we begin by reviewing the basic
concepts of these two methods.
1 PART builds rule sets using partial decision trees.
[Figure: an example decision tree with root test x1<20, inner tests x3>50 and x2<30, and T/F branches leading to leaf nodes]
A number of methods have been devised for the induction of decision trees from
data sets. The most widely used methods are those based on information gain, first
introduced by Quinlan [30, 31]. This begins by constructing putative tree “stumps”
[43], based on a number of options for the condition in the root node (how these
options are constructed is algorithm and data-type specific). The training set is then
distributed between the edges adjacent to this node based on this criterion, and a
measure of the balance of classes associated with each of these edges is calculated.
This measure is highest when an edge contains only one class (as there is no more
decision to be made) and lowest when there is an equal balance of classes (as no
information has been provided by the consideration of that condition). Based on this
measure, the condition that maximises this information gain is chosen. This is then
recursively repeated for lower levels of the tree, until one of the following conditions
is satisfied: all classes are classified correctly (i.e. there are no “impure” edges);
no more non-contradictory conditions can be created; or some algorithm-specific
criterion for the simplicity of representation is satisfied (to avoid overfitting).
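The information gain computation described above can be sketched as follows; this is a generic formulation, with `condition` standing for a candidate root-node test:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, condition):
    """Reduction in entropy obtained by splitting `examples` on a boolean
    `condition` (a putative tree-stump test for the root node)."""
    left = [y for x, y in zip(examples, labels) if condition(x)]
    right = [y for x, y in zip(examples, labels) if not condition(x)]
    n = len(labels)
    split_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - split_entropy
```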
$$O_{prior} = \frac{P(pos)}{P(neg)} = \frac{P(pos)}{1 - P(pos)} \qquad (1)$$
where $P(pos)$ and $P(neg)$ are the fractions of positives and negatives, respectively,
among all pairs of proteins in the training data. The posterior odds that a pair of
proteins interacts given the predictive features $f_1 \ldots f_n$ are:
$$O_{posterior} = \frac{P(pos \mid f_1 \ldots f_n)}{P(neg \mid f_1 \ldots f_n)} = O_{prior} \cdot L(f_1 \ldots f_n) \qquad (2)$$
$L(f_1 \ldots f_n)$ is the likelihood ratio and is defined as:
$$L(f_1 \ldots f_n) = \frac{P(f_1 \ldots f_n \mid pos)}{P(f_1 \ldots f_n \mid neg)} \qquad (3)$$
Making the naive Bayes assumption that the predictive features are independent
of each other given the class (positive or negative), the likelihood ratio can be
easily calculated as the product of the individual likelihood ratios for each feature $f_i$, as
per Eq. (4):
$$L(f_1 \ldots f_n) = \prod_{i=1}^{n} L(f_i) = \prod_{i=1}^{n} \frac{P(f_i \mid pos)}{P(f_i \mid neg)} \qquad (4)$$
total interactions and non-interactions and a reliable estimate of prior odds does not
seem to be available, we do not use the prior odds at all in this formulation, and
hence the posterior odds are the same as the likelihood ratio. Since the prior odds
are not used, we analyze the predictive accuracy obtained for different threshold
cutoffs of likelihood ratio values, instead. Hence, in this paper, we use a likelihood
based approach for the prediction of protein interactions.
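A sketch of the likelihood-ratio computation from binned feature counts is given below; the optional Laplace smoothing term `alpha` is our addition to avoid empty bins, not part of the formulation above:

```python
def feature_likelihood_ratios(pos_bins, neg_bins, alpha=1.0):
    """Per-bin likelihood ratio L(f_i) = P(f_i|pos) / P(f_i|neg) estimated
    from bin counts, with Laplace smoothing `alpha` against zero counts."""
    bins = set(pos_bins) | set(neg_bins)
    n_pos = sum(pos_bins.values())
    n_neg = sum(neg_bins.values())
    k = len(bins)
    return {b: ((pos_bins.get(b, 0) + alpha) / (n_pos + alpha * k)) /
               ((neg_bins.get(b, 0) + alpha) / (n_neg + alpha * k))
            for b in bins}

def likelihood_ratio(example_bins, ratios_per_feature):
    """Eq. (4): product of the individual likelihood ratios of an example's
    feature bins; an interaction is predicted if this exceeds a threshold."""
    l = 1.0
    for feature, b in example_bins.items():
        l *= ratios_per_feature[feature].get(b, 1.0)
    return l
```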
A popular type of data mining method consists of building predictive models in the
form of IF-THEN classification rules. More precisely, each rule has the form:
IF (condition(s) on attribute value(s)) THEN (class value)
Hence, each rule represents a relationship between the predictor attributes
(features) and the goal attribute. Rules are discovered using the training set. The
discovered rules are then used to predict the class value of examples in the test
set, unseen during training [9]. Rule induction methods are known to present the
knowledge discovered from the data in a comprehensible form to the users. Such
comprehensible rules can be very helpful for the domain experts, for example bi-
ologists in our case, who can validate the discovered rules and potentially get new
insight about the data. The discovered rules also have the potential to represent new
knowledge about the problem at hand.
A variety of approaches exist for learning accurate and comprehensible rules
from the data [43]. One line of research is to begin with building a decision tree and
then transform it into a set of rules [31]. However, in the literature the term rule in-
duction is often used to refer to an algorithm which discovers rules somewhat more
flexible than a decision tree, in the sense that the discovered or induced rules cover
data space regions that can have some overlap (unlike the leaf nodes of a decision
tree, which represent non-overlapping data space regions). Most rule induction algorithms
use the previously discussed separate-and-conquer approach, which tries
to determine the most powerful rule underlying the data by sequentially adding
conditions on the attributes to the rule, then separating out the examples covered
by the rule and repeating the procedure on the remaining examples [6].
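The separate-and-conquer strategy can be sketched generically as follows; `induce_rule` and `covers` stand for the base learner (e.g. PART's pruned-tree/largest-leaf step, described below) and the rule-matching test:

```python
def separate_and_conquer(examples, induce_rule, covers, max_rules=100):
    """Generic separate-and-conquer rule learning: repeatedly induce one rule,
    remove the examples it covers, and continue on the remainder."""
    rules, remaining = [], list(examples)
    while remaining and len(rules) < max_rules:
        rule = induce_rule(remaining)    # base learner builds one rule
        covered = [e for e in remaining if covers(rule, e)]
        if not covered:                  # stop if the rule makes no progress
            break
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules
```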
For the problem at hand, we use a method called PART [8] for the classification of
protein-protein pairs (examples or data instances) into interacting or non-interacting.
PART involves features of both decision tree building and rule induction algorithms
– both of which were reviewed above. PART is available for use in the freely avail-
able data mining package WEKA3 [43]. The basic idea of this method is that it uses
the separate-and-conquer strategy, as in the case of rule induction algorithms, in that
it builds a rule, removes the examples it covers and continues creating rules for the
remaining examples until none are left. But it differs from most rule induction al-
gorithms in the way a rule is induced. To build a single rule, first a pruned decision
3 Waikato Environment for Knowledge Analysis.
tree is built for the current set of examples. Then the leaf with the largest coverage is
made into a rule, and the tree is discarded. This process is iteratively repeated until
all training examples are covered by the induced set of rules. The details about the
PART method and its comparison to other competing methods are in [8].
value $t$, a test example (protein pair) is predicted to have an interaction (positive class)
if and only if the value of the likelihood (Eq. 3) is greater than or equal to $t$. This kind of
analysis gives us the opportunity to evaluate the classifier not just by the total number
of classification errors it makes, but by the tradeoff between two different types of
errors, i.e., false positive and false negative predictions. It plots the true positive rate
(sensitivity) vs the false positive rate (1-specificity), where each point on the curve
corresponds to a particular threshold on the likelihood value. In this way we can analyze
the effect of different thresholds on predictive accuracy instead of analyzing the effect
of a single threshold derived from prior odds.
We use the 10-fold cross-validation procedure [43] in all experiments reported
here. For each experiment, the data (positive and negative classes, along with their
features) is divided randomly into ten equal folds. Each time we use nine of the
ten folds for training and the remaining fold for testing. This process is repeated
ten times, each time using a different fold as the test set. Likelihood values estimated
during the training run are used to predict protein-protein interactions in the test
examples. Sensitivity and specificity are defined by Eqs. 5 and 6,
$$Sensitivity = \frac{TP}{TP + FN} \qquad (5)$$
[Figure: ROC curves plotting True Positive Rate (Sensitivity) against False Positive Rate (1−Specificity) for LIKE-PART and LIKE]
Fig. 2 ROC Curve for LIKE-PART and Pure Likelihood-based Approach (LIKE)
Specificity = TN / (TN + FP)    (6)
where TP, TN, FP and FN are the numbers of true positives, true negatives, false
positives and false negatives, respectively. A ROC curve for a good classifier will be
as close as possible to the upper left corner of the graph, with a large area under the
curve. Fig.2 shows the ROC curves for pure likelihood-based approach (hereafter
called LIKE) and the hybrid method (hereafter called LIKE-PART, i.e., Likelihood
based classifier using PART for finding rules/bins). The corresponding areas under
the curve are 0.8862 and 0.9325, showing a better predictive performance of the
LIKE-PART hybrid.
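The threshold sweep that produces such a ROC curve can be sketched as follows. This is an illustrative Python sketch, not the authors' code; `roc_points` is a hypothetical helper that applies Eqs. 5 and 6 at each likelihood threshold.

```python
def roc_points(likelihoods, labels, thresholds):
    """For each threshold t, predict positive iff likelihood >= t, and
    return the (false positive rate, true positive rate) point."""
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(likelihoods, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(likelihoods, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(likelihoods, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(likelihoods, labels) if s < t and y == 0)
        sensitivity = tp / (tp + fn)   # Eq. (5)
        specificity = tn / (tn + fp)   # Eq. (6)
        points.append((1 - specificity, sensitivity))
    return points
```

Sweeping many thresholds and connecting the resulting points yields a curve like Figure 2; the area under it can then be estimated by trapezoidal integration.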
We can see from Figure 2 that taking into account the multi-attribute binning, i.e. the rules produced by the base rule learner, has significantly enhanced the overall performance of the classifier, even though the features in this data are not especially well correlated, as reported in [22]. Table 1 reports the results for the likelihood cutoffs that yield the maximum predictive accuracy for each method. A statistical significance test performed on the accuracy values over the ten folds for these likelihood cutoffs gives a p-value of 0.0000017, indicating that LIKE-PART outperforms the LIKE method very significantly.
6 Conclusions
In this work, we have addressed a challenging and important bioinformatics problem, namely the prediction of protein-protein interactions, using a hybrid data mining technique that combines rule induction methods with likelihood-ratio based classifiers. We integrated different genomic features for a small data set and implemented two versions of a likelihood-ratio based classifier. Rather than using any prior odds, we used only the likelihood ratio and presented a range of results as a ROC curve over different thresholds on the likelihood value used as the minimum value for predicting positive examples. We proposed a new hybrid method that uses a known rule induction algorithm (PART) to induce rules from the training set, taking possible attribute interactions into account, and then interprets each rule as a bin for the likelihood-based classifier. Since these bins are produced with attribute interactions in mind, they avoid the unrealistic assumption of independence between attributes made by a pure likelihood-based classifier. We then compared the ROC curve of this new hybrid PART/likelihood-based method with that of the pure likelihood-based method, and observed that the hybrid achieves significantly better predictive performance.
A Hybrid Rule-Induction/Likelihood-Ratio Based Approach 635
References
1. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P.: Molecular Biology
of the Cell, 2nd edn. Garland, New York (1989)
2. Aloy, P., Russell, R.B.: Structural systems biology: modelling protein interactions. Nat.
Rev. Mol. Cell. Biol. 7(3), 188–197 (2006)
3. Bock, J.R., Gough, D.A.: Predicting protein-protein interactions from primary structure.
Bioinformatics 17(5), 455–460 (2001)
4. Bock, J.R., Gough, D.A.: Whole proteome interaction mining. Bioinformatics 19(1),
125–135 (2003)
5. Browne, F., Asuaje, F., Wang, H., Zheng, H.: An assessment of machine and statisti-
cal learning approaches to inferring networks of protein-protein interactions. Journal of
Integrative Bioinformatics 3(2) (2006)
6. Cohen, W.W.: Fast effective rule induction. In: Proceedings of the 12th International
Conference on Machine Learning, pp. 115–123. Morgan Kaufmann, San Francisco
(1995)
7. Eisenberg, D., Marcotte, E.M., Xenarios, I., Yeates, T.O.: Protein function in the post-
genomic era. Nature 405(6788), 823–826 (2000)
8. Frank, E., Witten, I.H.: Generating accurate rule sets without global optimization. In:
Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Fran-
cisco (1998)
9. Freitas, A.A.: Data Mining and Knowledge Discovery with Evolutionary Algorithms.
Springer, Heidelberg (2002)
10. Fürnkranz, J.: Separate-and-conquer rule learning. Artificial Intelligence Review 13(1),
3–54 (1999)
11. Galperin, M.Y., Koonin, E.V.: Who's your neighbor? New computational approaches for
functional genomics. Nat. Biotechnol. 18, 609–613 (2000)
12. Gavin, A.C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.M., Cruciat, C.M., Remor, M., Höfert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau,
V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.A., Copley, R.R., Edel-
mann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P.,
Seraphin, B., Kuster, B., Neubauer, G., Superti-Furga, G.: Functional organization of
the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147
(2002)
13. Ge, H., Liu, Z., Church, G.M., Vidal, M.: Correlation between transcriptome and inter-
actome mapping data from Saccharomyces cerevisiae. Nat. Genet. 29, 482–486 (2001)
14. Giordana, A., Sale, C.: Learning structured concepts using genetic algorithms. In: Slee-
man, D., Edwards, P. (eds.) Proceedings of the 9th International Workshop on Machine
Learning, pp. 169–178 (1992)
15. Goh, C., Bogan, A.A., Joachimiak, M., Walther, D., Cohen, F.E.: Co-evolution of Pro-
teins with their Interaction Partners. J. Mol. Biol. 299, 283–293 (2000)
16. Goh, C., Cohen, F.E.: Co-evolutionary Analysis Reveals Insights into Protein-Protein In-
teractions. J. Mol. Biol. 324, 177–192 (2002)
17. Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D., Moore, L., Adams, S.L., Millar, A., Tay-
lor, P., Bennett, K., Boutilier, K., Yang, L., Wolting, C., Donaldson, I., Schandorff, S.,
Shewnarane, J., Vo, M., Taggart, J., Goudreault, M., Muskat, B., Alfarano, C., Dewar,
D., Lin, Z., Michalickova, K., Willems, A.R., Sassi, H., Nielsen, P.A., Rasmussen, K.J.,
Andersen, J.R., Johansen, L.E., Hansen, L.H., Jespersen, H., Podtelejnikov, A., Nielsen,
E., Crawford, J., Poulsen, V., Sørensen, B.D., Matthiesen, J., Hendrickson, R.C., Gleeson,
F., Pawson, T., Moran, M.F., Durocher, D., Mann, M., Hogue, C.W., Figeys, D., Tyers,
M.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass
spectrometry. Nature 415, 180–183 (2002)
18. Iqbal, M., Freitas, A.A., Johnson, C.G.: Protein Interaction Inference Using Particle
Swarm Optimization Algorithm. In: Marchiori, E., Moore, J.H. (eds.) EvoBIO 2008.
LNCS, vol. 4973, pp. 61–70. Springer, Heidelberg (2008)
19. Iqbal, M., Freitas, A.A., Johnson, C.G., Vergassola, M.: Message-Passing Algorithms
for the Prediction of Protein Domain Interactions from Protein-Protein Interaction Data.
Bioinformatics 24(18), 2064–2070 (2008)
20. Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., Sakaki, Y.: A comprehensive two
hybrid analysis to explore the yeast protein interactome. PNAS 98, 4569–4574 (2001)
21. Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J., Chung, S., Emili, A., Sny-
der, M., Greenblatt, J.F., Gerstein, M.: A Bayesian Networks Approach for Predicting
Protein-Protein Interactions from Genomic Data. Science 302, 449–453 (2003)
22. Lu, L.J., Xia, Y., Paccanaro, A., Yu, H., Gerstein, M.: Assessing the limits of genomic
data integration for predicting protein networks. Genome Res. 15, 945–953 (2005)
23. Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O., Eisenberg, D.: De-
tecting protein function and protein-protein interactions from genome sequences. Sci-
ence 285, 751–753 (1999)
24. Mewes, H.W., Frishman, D., Güldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Münsterkötter, M., Rudd, S., Weil, B.: MIPS: a database for genomes and
protein sequences. Nucleic Acids Res. 30, 31–34 (2002)
25. Michalski, R.S.: On the quasi-minimal solution of the covering problem. In: Proceedings
of the 5th International Symposium on Information Processing (FCIP 1969) (Switching
Circuits), Bled, Yugoslavia, vol. A3, pp. 125–128 (1969)
26. Michalski, R.S.: AQVAL/1—Computer implementation of a variable-valued logic sys-
tem V L1 and examples of its application to pattern recognition. In: Proceedings of the
First International Conference of Pattern Recognition, pp. 3–17 (1973)
27. Michalski, R.S., Mozetič, I., Hong, J., Lavrač, N.: The multi-purpose incremental learn-
ing system AQ15 and its testing application to three medical domains. In: Proceedings
of the Fifth National Conference on Artificial Intelligence, pp. 1041–1045 (1986)
28. Oti, M., Snel, B., Huynen, M.A., Brunner, H.G.: Predicting disease genes using protein-
protein interactions. J. Med. Genet. 43, 691–698 (2006)
29. Pagallo, G., Haussler, D.: Boolean feature discovery in empirical learning. Machine
Learning 5, 71–99 (1990)
30. Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986)
31. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo
(1993)
32. Quinlan, J.R., Cameron-Jones, R.M.: Induction of logic programs: FOIL and related
systems. New Generation Computing 13(3-4), 287–312 (1995)
33. Rhodes, D.R., Tomlins, S.A., Varambally, S., Mahavisno, V., Barrette, T., Kalyana-
Sundaram, S., Ghosh, D., Pandey, A., Chinnaiyan, A.M.: Probabilistic model of the hu-
man protein-protein interaction network. Nature Biotechnology 23(8), 951–959 (2005)
34. Salwinski, L., Miller, C.S., Smith, A.J., Pettit, F.K., Bowie, J.U., Eisenberg, D.: The
Database of Interacting Proteins: 2004 update. NAR 32, D449–D451 (2004)
35. Schaffer, C.: Overfitting avoidance as bias. Machine Learning 10, 145–154 (1993)
36. Shoemaker, B.A., Panchenko, A.R.: Deciphering Protein-Protein Interactions. Part I: Ex-
perimental Techniques and Databases. PLoS Computational Biology 3(3), e42 (2007)
37. Shoemaker, B.A., Panchenko, A.R.: Deciphering Protein-Protein Interactions. Part II:
Computational Methods to Predict Protein and Domain Interaction Partners. PLoS Com-
putational Biology 3(4), e43 (2007)
38. Thatcher, J.W., Shaw, J.M., Dickinson, W.J.: Marginal fitness contributions of non-
essential genes in Yeast. PNAS 95, 253–257 (1998)
39. Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D.,
Narayan, V., Srinivasan, M., Pochart, P., Qureshi-Emili, A., Li, Y., Godwin, B., Conover,
D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S., Rothberg,
J.M.: A comprehensive analysis of protein-protein interactions in Saccharomyces cere-
visiae. Nature 403(1), 623–627 (2000)
40. Utgoff, P.E.: Shift of bias for inductive concept learning. In: Michalski, R., Carbonell,
J., Mitchell, T. (eds.) Machine Learning: An Artificial Intelligence Approach, vol. II, pp.
107–148 (1986)
41. Valencia, A., Pazos, F.: Computational methods for the prediction of protein interactions.
Current Opinion in Structural Biology 12, 368–373 (2002)
42. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P.:
Comparative assessment of large-scale data sets of protein-protein interactions. Na-
ture 417(6887), 399–403 (2002)
43. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques,
2nd edn. Morgan Kaufmann, San Francisco (2005)
44. Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D.: DIP: The
Database of Interacting Proteins. A research tool for studying cellular networks of protein
interactions. NAR 30, 303–305 (2002)
45. Yamanishi, Y., Vert, J.P., Kanehisa, M.: Protein network inference from multiple genomic
data: a supervised approach. Bioinformatics 20(suppl.1), i363–i370 (2004)
46. Yook, S.H., Oltvai, Z.N., Barabási, A.L.: Functional and topological characterization of
protein interaction networks. Proteomics 4, 928–942 (2004)
47. Yu, H., Greenbaum, D., Xin Lu, H., Zhu, X., Gerstein, M.: Genomic analysis of essen-
tiality within protein networks. Trends Genet. 20, 227–231 (2004)
Improvements in Flock-Based Collaborative
Clustering Algorithms
Abstract. Inspiration from nature has driven many creative solutions to challenging
real life problems. Many optimization methods, in particular clustering algorithms,
have been inspired by such natural phenomena as neural systems and networks, nat-
ural evolution, the immune system, and lately swarms and colonies. In this paper,
we make a brief survey of swarm intelligence clustering algorithms and focus on the
flocks of agents-based clustering and data visualization algorithm, (FClust). A few
limitations of FClust are then discussed with proposed improvements. We thus pro-
pose the FClust-annealing algorithm that decreases the number of iterations needed
to converge and improves the quality of resulting clusters. We also propose a (K-
means+FClust) hybrid algorithm which decreases the complexity of FClust from
quadratic to linear, with further improvements in the cluster quality. Experiments on
both artificial and real data illustrate the workings of FClust and the advantages of
our proposed variants.
1 Introduction
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 639–672.
640 E. Saka and O. Nasraoui
One of the many different approaches used for clustering is swarm intelligence
(SI). Swarm intelligence is an artificial intelligence paradigm mainly inspired by the dynamics of several societies in nature, such as ant colonies, bird flocks, fish schools, etc. SI is based on the social, collective and structured behav-
ior of decentralized, self-organized agents [16, 36]. Although these agents have a
very limited individual capacity, cooperatively they perform many complex tasks.
Characteristics of swarm intelligence are: 1) Collaboration: agents in the swarm
collaborate or interact with the environment and each other; 2) Collective intel-
ligence: whereas agents in the swarm are mostly unintelligent, the collaborating
system, or swarming mechanism results in an intelligent system; 3) Inspiration from
nature; and 4) Decentralized control. In this chapter, we will mainly focus on using
SI for clustering.
Given the above definition, the most popular swarm intelligence clustering
algorithms are:
1. Ant-clustering
2. Particle swarm clustering
3. Flocks of agents-based clustering
There are mainly two approaches for ant-based clustering. In the first version,
data is randomly placed on a grid. Then the ants move around the grid and form
clusters by picking up and dropping the data items while moving [20]. Later, this
version was improved in [34, 9, 10]. In the second version of the ant clustering
algorithm, ANTCLUST, ants represent data items. Initially, none of the ants are
assigned to a cluster, i.e. none of the ants have a label. During the clustering process,
in each iteration, two randomly selected ants meet each other. Then, according to
some defined behavioral rules, they may form a new cluster, one of the ants may
be assigned to an existing cluster, one of the ants may be removed from a cluster, or
clustering quality measures may be updated [11, 17, 18, 19].
Clustering with particle swarms is based on particle swarm optimization [16, 15].
In the clustering problem, each particle encodes all cluster centroids. In other words,
each particle represents a complete clustering solution [1, 22].
Lately, an approach based on flocks of agents, known as FClust [30, 29], was
used for data clustering. This approach is inspired by bird flocks. Each agent of the
flock represents a data item. Initially, agents are placed on a planar surface (here-
inafter referred to as the visualization panel). Then, in each iteration, their speed
gets updated according to the neighboring agents. In the end, similar agents start
moving together and they form clusters. This makes FClust especially useful for
data visualization. Although the experimental results given in [30] were acceptable,
we observed that the standard deviations of the number of clusters, the cluster error,
and the number of required iterations were rather high. Moreover, FClust was not
successful for each data set. Another disadvantage was the high computational cost
which can make FClust costly for many real time applications.
This chapter starts by reviewing algorithms for particle swarm clustering and ant clustering in Section 2. Section 3 reviews flocks of agents-based clustering, while pointing to their limitations. Then we will present several modifications in Section 4.
A hybrid model of PSO clustering with the K-means clustering algorithm was pre-
sented in [22], where one of the particles was initialized with the result of K-
means. Another (PSO+K-means) hybrid model was used for document clustering
in [5, 7], where the results illustrated that the hybrid PSO algorithm can generate
more compact results than K-means and PSO. A survey and a modified PSO-based clustering algorithm were presented in [2].
In the second version of the ant clustering algorithm, ANTCLUST, each ant rep-
resents a data item. Initially, none of the ants are assigned to a cluster, i.e. none
of the ants have a label. Then, during the clustering process, in each iteration, two
randomly selected ants meet each other. According to some defined behavioral
rules, they may form a new cluster, one of the ants may be assigned to an exist-
ing cluster, one of the ants may be removed from a cluster, or clustering quality
measures may be updated [17, 18, 19]. The basic idea is that agents who carry simi-
lar data items attract each other, while agents who carry dissimilar data items repel each
other, which results in the formation of groups. A general outline of the ANTCLUST algorithm is given in Algorithm 3.
1: Map ants to data items and initialize ants' quality measures. Initially ants do not have any labels, i.e. they do not belong to any cluster.
2: for iteration = 1 to max do
3:    Randomly choose two ants and apply the behavioral rules above.
4:    iteration++
5: Delete nests that do not contain enough ants.
6: Reassign ants without labels to the most similar nests.
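The meeting mechanism of the outline above can be sketched in Python. This is a deliberately simplified sketch: the published ANTCLUST behavioral rules [11, 17, 18, 19] are richer (acceptance templates are learned, nests are deleted and ants reassigned), and the function name and the single fixed acceptance `template` are assumptions of this illustration.

```python
import random

def antclust_sketch(data, sim, template, meetings=2000, seed=0):
    """Very simplified ANTCLUST-style clustering: ants (= data items) meet in
    random pairs and merge labels when their similarity exceeds a fixed
    acceptance template."""
    rng = random.Random(seed)
    label = [None] * len(data)          # cluster label per ant; None = unlabeled
    next_label = 0
    for _ in range(meetings):
        i, j = rng.randrange(len(data)), rng.randrange(len(data))
        if i == j:
            continue
        if sim(data[i], data[j]) >= template:
            if label[i] is None and label[j] is None:
                label[i] = label[j] = next_label   # create a new cluster
                next_label += 1
            elif label[i] is None:
                label[i] = label[j]                # adopt the partner's label
            elif label[j] is None:
                label[j] = label[i]
            else:
                # merge the two clusters by relabelling one of them
                old = label[j]
                label = [label[i] if l == old else l for l in label]
    return label
```

With a similarity measure such as `lambda a, b: 1 - abs(a - b)` on scalar data, well-separated groups end up sharing labels within a few thousand meetings.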
Another clustering algorithm, AntTree, exploits ants' ability to build mechanical structures and builds a tree structure [3]. In this version, each data item to be clustered represents a node of the tree, and the algorithm searches for the optimal edges.
were manually generated. The user looked at the visualization panel and selected
the clusters. The results showed that MSF performed better than K-Means. Lately,
in [30], a detailed flock clustering algorithm (FClust) was presented with a stopping criterion and an automated cluster extraction algorithm. An application of this approach
to Web usage mining can be found in [33]. In the following sections, we will de-
scribe the FClust algorithm, discuss its limitations, and suggest improvements with
real life application examples.
Data visualization using flocks of agents is suitable for any kind of data set where
one can define a similarity measure between data items. A flock consists of several
agents, with each agent mapped to and representing one data record. As mentioned
in Section 3, flocks are different from ordinary particles because they have orien-
tation. Agents in a flock are attracted to similar agents and repelled by dissimilar agents. Moreover, the distance between the agents depends on the similarity
between the data items that are mapped to those agents. Therefore, the visualiza-
tion panel visualizes the similarity relation between the data items. Normally, data
sets with at most three attributes can be visualized by a simple plot. However, when
there are more than three attributes, this becomes harder. In particular, when there
is a huge number of attributes, as in web usage data, data visualization becomes a
challenging job.
When flocks of agents are compared to other swarm intelligence algorithms, such
as ants and particle swarms, we find that flocks are more suitable for data visualiza-
tion. In the case of ants, data items are moved on a rigidly structured grid by ants and
placed on the same stack with similar items. However, distance does not necessarily
represent the similarity, as in the case of flocks. The distance between two neigh-
boring agents in a flock is inversely proportional to the similarity between them
whereas the similarity between two ants only increases the chance of data items
being neighbors, but does not define how close/far they are. The distance between
two more similar data items may be bigger than the distance between two other
items which are less similar. In the case of particle swarms, each particle does not
represent one data item, but rather represents a clustering solution itself. Therefore,
particle swarms may not be as suitable for data visualization, either.
It has been mentioned that neural networks can also be used for data visualization.
However, most neural networks have a rather static structure whereas a flock of
agents is inherently dynamic [31]. Also, neural networks are a centralized learning
mechanism, whereas agent flocks are decentralized.
Agents are initially placed on the visualization panel, a 2D space whose coordinate values range between 0 and 1. Agents may be placed randomly or some background
information can be used to place them. Then, they start moving around. As they
meet other agents in a defined neighborhood, they try to remain at an ideal distance
from each other, which is determined according to the similarity of the original data
items that agents are representing. The more the data items are similar, the smaller
the ideal distance will be. Ideal distances are computed for each agent pair once at
the beginning of the algorithm, based on the intrinsic properties or attributes of the
data items. If neighboring agents are further apart than the ideal distance, there will
be an attraction force between them and the agents will try to move closer to each
other. In contrast, if the distance is less than the ideal distance, then there will be a
rejection force, and agents will move apart from each other. Given this basic idea,
Algorithm 4. gives the procedure for Flocks of Agents Clustering (FClust).
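The ideal-distance computation and the attraction/repulsion rule just described can be sketched as follows. This is a minimal illustration, not the actual FClust velocity update of Algorithm 4: `ideal_distance` follows the form of Equation (1) (with the degenerate simth = 1 case handled as in the later Equation (5)), and `move_toward_ideal` is a hypothetical single-step displacement standing in for the full velocity dynamics.

```python
def ideal_distance(sim_ij, sim_th, d_th):
    """Ideal separation of two agents: more similar data items give a
    smaller ideal distance."""
    if sim_th == 1:
        return 0.0
    return (1 - sim_ij) / (1 - sim_th) * d_th

def move_toward_ideal(pos_i, pos_j, d_ideal, step=0.1):
    """Displace agent i: attraction when farther apart than d_ideal,
    repulsion when closer than d_ideal."""
    dx = pos_j[0] - pos_i[0]
    dy = pos_j[1] - pos_i[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist == 0:
        return pos_i
    gap = dist - d_ideal   # > 0: too far (attract); < 0: too close (repel)
    return (pos_i[0] + step * gap * dx / dist,
            pos_i[1] + step * gap * dy / dist)
```

Iterating such updates over every agent's neighborhood is what makes similar agents drift together on the panel.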
The similarity threshold, simth, in Equation (1) is computed via Equation (2). If the similarity threshold is too large, then the algorithm will fail to converge, and if it is too small, then different clusters risk being combined into one cluster.

simth = (simaverage + simmax) / 2    (2)
The most common method for stopping the algorithm is using human experts
[31, 6, 30]. An expert keeps watching the visualization panel until stable clusters
are formed. At that time, the algorithm is stopped. However, an automated method
was also presented in [30]. The visualization panel, i.e. a 2D continuous space
([0,1]x[0,1]), was divided into 20 cells. For each cell, the spatial entropy, which depends on the proportion p of agents located in that cell, was computed; if the observed minimum entropy has not improved over the last 3 × n iterations, where n is the number of data records, the algorithm is stopped. The spatial entropy is given in Equation (3).

ES = − ∑_{i=1}^{20} p(i) × ln(p(i))    (3)
The problem with this criterion is that the entropy may remain unchanged for a while if new neighbors do not meet, but after a meeting occurs, changes may start re-occurring. Thus the above stopping criterion cannot capture these delayed dynamics, and there is a risk that the algorithm will be stopped before convergence.
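The spatial entropy of Equation (3) can be computed as follows. One assumption is made explicit here: the chapter says the panel is "divided into 20 cells", which this sketch reads as a 20 × 20 grid over the unit square; the function name is hypothetical.

```python
import math

def spatial_entropy(positions, cells=20):
    """Spatial entropy of agent positions on the [0,1]x[0,1] panel:
    bucket agents into a cells x cells grid and sum -p ln p over the
    occupied cells (the idea of Equation (3))."""
    counts = {}
    for x, y in positions:
        key = (min(int(x * cells), cells - 1), min(int(y * cells), cells - 1))
        counts[key] = counts.get(key, 0) + 1
    n = len(positions)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

When all agents collapse into one cell the entropy is 0; agents spread evenly over two cells give ln 2, and so on, so a non-improving minimum entropy signals that the flocks have stabilized.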
When FClust is run, flocks of agents are visually observable. However, the clusters
are not explicitly formed and the data is not yet assigned to clusters. Similar to
the stopping criterion, one method of forming clusters is using human experts. The
person marks the clusters and assigns agents to the clusters. Since there is a one-
to-one mapping between agents and data records, the data will also end up being
clustered. In addition to this, an automated procedure was presented in [30], which
is given in Algorithm 5.. Basically, a new cluster is created for an unlabeled agent.
Then the neighboring agents of this cluster are explored, and all the agents which
are similar to at least one of the agents in the cluster are inserted into this cluster.
New agents are inserted into the cluster until no more can be added. Then
the procedure restarts by creating another new cluster, and stops when all the agents
are labeled.
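The region-growing idea of this automated procedure can be sketched as follows. This is an illustrative sketch, not Algorithm 5 itself: the actual procedure grows clusters using the agents' similarities and the distance threshold, whereas here a single hypothetical distance cutoff `d_max` on the panel stands in for that test.

```python
def extract_clusters(positions, d_max):
    """Seed a new cluster with an unlabelled agent, then repeatedly absorb
    any unlabelled agent lying within d_max of some agent already in the
    cluster; restart with a new cluster until all agents are labelled."""
    n = len(positions)
    labels = [None] * n

    def close(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 <= d_max

    cluster = 0
    for seed in range(n):
        if labels[seed] is not None:
            continue
        labels[seed] = cluster
        frontier = [seed]
        while frontier:            # grow the cluster until it stops expanding
            i = frontier.pop()
            for j in range(n):
                if labels[j] is None and close(positions[i], positions[j]):
                    labels[j] = cluster
                    frontier.append(j)
        cluster += 1
    return labels
```

Because agents and data records are in one-to-one correspondence, these agent labels immediately give a clustering of the original data.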
After cluster formation, a post processing phase is needed to cluster the original
(input) data, to validate the results, and if possible to interpret the clusters.
The FClust algorithm, given in Algorithm 4., needs to compare every agent to every other agent, first to compute the ideal distances initially, and then to update each agent's velocity based on its neighboring agents. Although some complexity reduction suggestions, such as using a neighborhood matrix, were given in [30], the worst case time complexity remains O(n²), where n is the number of data records. Similarly, the memory complexity is also O(n²) to keep the ideal distances, in addition to the O(n) memory needed for keeping agent locations, velocities and amplitudes.
Although the experimental results given in [30] were acceptable, we observed that
the standard deviations of the number of clusters, the cluster error, and the number
of required iterations were high. Moreover, FClust was not successful for each data
set. Another disadvantage was the high computational cost which makes FClust
unsuitable for many real time applications.
In addition to the above limitations, we observed that convergence strongly de-
pends on the similarity threshold. When the similarity threshold is too high, the al-
gorithm may not converge, and when the threshold is too low, the algorithm may not
differentiate between different clusters, and thus end up combining some of them.
Therefore, Equation (2) is not suitable for every dataset. Furthermore, if the data
similarity values follow a power-law distribution, then the similarity threshold given by Equation (2) will be very high, and agents will not be able to form clusters; in other words, the clustering algorithm will not converge. Another problem occurs when there are connecting agents between clusters,
meaning that instead of being well separated, a bridge of data points connects two
clusters. As a result the clusters are labeled as the same, even though they should be
labeled differently. To solve this problem to some extent, an alternative formulation
will be presented in Section 4. Moreover, the ideal distance formula in [30] requires
mostly unique data records. Otherwise, if many similarity values are 1, the ideal dis-
tance computation results in an infinite value in Equation 1 because simth = 1. An
alternative formulation, which can handle many 1-similarities is given in Section 4.1.
Improvements in Flock-Based Collaborative Clustering Algorithms 651
Another problem observed is that the ideal distance computation results in an infinite
value via Equation (1) if many similarity values are 1, because simth = 1. Equation
5 proposes an alternative formulation, which can handle many 1-similarities.
dideal(i, j) = ((1 − sim(i, j)) / (1 − simth)) × dth,  if simth ≠ 1;  0, if simth = 1    (5)
Additionally, when the number of data records is small, setting the distance thresh-
old is very hard because, if it is too low, the agents cannot meet each other, while
if it is too high, the ideal distances computed via Equations (1) and (5) will be too
high, and cluster formation will not occur. As a solution, a new parameter for ideal
distance tuning is added, as given in Equation (6). The ideal distance threshold con-
stant, dideal th , used in Equation (6), is also used, replacing dth during the cluster
formation given in Algorithm 5., line 5.
dideal(i, j) = ((1 − sim(i, j)) / (1 − simth)) × dideal th,  if simth ≠ 1;  0, if simth = 1    (6)
[Figure 1 plots the distance threshold dth (vertical axis, 0 to 0.8) against the iteration number t (horizontal axis, 0 to 120), showing the cooling schedule.]
In FClust-annealing, the distance threshold dth is decreased as the number of iterations increases. In that case, dideal is computed using Equation (6). A possible cooling schedule for dth is given in Equation (7) and shown in Figure 1.
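One such cooling schedule can be sketched as a simple exponential decay. This is a hypothetical schedule: Equation (7) is not reproduced in this excerpt and may differ; the sketch only reproduces the qualitative shape of Figure 1 (a large neighborhood early, shrinking as iterations proceed), and both the function name and the default constants are assumptions.

```python
def d_th_schedule(t, d0=0.8, decay=0.96):
    """Hypothetical exponential cooling of the distance threshold:
    d_th(t) = d0 * decay**t, starting near 0.8 and approaching 0 by
    roughly t = 120, as in Figure 1."""
    return d0 * decay ** t
```

Any monotonically decreasing schedule with this shape serves the stated purpose of annealing: a wide neighborhood while flocks form, then a narrowing one to speed convergence.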
K-means is a fast algorithm with O(n) time complexity. However, the number of
expected clusters, K, needs to be given as an input to the K-means algorithm and K-
means may not cluster the data successfully if the cluster boundaries are not hyper-
spherical. Unlike K-means, FClust has an adaptive way to extract a reasonable num-
ber of clusters without any boundary restrictions. However, the complexity cost for
FClust is higher than K-means. In this section, we propose a (K-means+FClust)
Hybrid Algorithm which aims to benefit from the advantages of both K-means and
FClust.
The K-means algorithm minimizes the sum of squared distances between each record and its cluster centroid, SSE = ∑_{i=1}^{n} ||χi − c_{π(χi)}||², where n is the number of data records, d is the number of attributes, and π(χi) denotes the cluster to which χi is assigned.
The complexity is O(n ∗ K ∗ I ∗ d) where n is the number of points, K is the
number of clusters, I is the number of iterations, and d is the number of attributes.
1: Read sessions.
2: Arbitrarily select K records as centroids out of the n data records.
3: repeat
4:    for all Data record χi do
5:       Find the closest centroid cj using Euclidean distance.
6:       Assign data record χi to the cluster γj.
7:    for all Cluster γj do
8:       Update its centroid: cj = (∑χi∈γj χi) / ||∑χi∈γj χi||
9: until stopping criterion is met.
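The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it keeps the normalized centroid update of line 8 (sum of members divided by its norm, suited to cosine-style data) and Euclidean assignment as in line 5; the function name and the simple fixed iteration count are assumptions.

```python
import random

def kmeans_normalized(data, k, iters=20, seed=0):
    """K-means with a normalized centroid update:
    c_j = (sum of members) / ||sum of members||."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(data, k)]
    for _ in range(iters):
        # assignment step: each record goes to its closest centroid
        assign = [min(range(k),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(x, centroids[j])))
                  for x in data]
        # update step: normalized sum of the assigned records
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            if not members:
                continue                      # keep an empty cluster's centroid
            s = [sum(col) for col in zip(*members)]
            norm = sum(v * v for v in s) ** 0.5
            if norm > 0:
                centroids[j] = [v / norm for v in s]
    return assign, centroids
```

In the hybrid scheme, the K centroids returned here become the agents handed to FClust.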
In the hybrid approach, each agent is mapped to the closest group of data records via
its group centroid, since each agent represents a group centroid. The time complex-
ity for K-means, lines 1 to 2 in Algorithm 7., is O(n), and the time complexity of
the FClust part, in lines 3 to 4, is O(K²). Since the number of agents K is very small compared to the number of data records n, as long as K ≤ √n (which is always the case in practice), the time and memory complexities of the (K-means+FClust) hybrid are O(n). Therefore the hybrid version reduces the time and memory complexities from quadratic to linear.
6 Experimental Results
In this section, we describe our experiments and their results. We start by describ-
ing the datasets that we used in our experiments, in Section 6.1. Then, we proceed
to explaining how post processing is applied to extract the clusters in Section 6.2.
Finally, Section 6.3 presents the experimental results observed for FClust, FClust-
annealing, and (K-means+FClust) Hybrid on different datasets. No experiments are
presented for (K-means+FClust) Hybrid-Annealing because the aim behind anneal-
ing is to initially have a bigger neighborhood and then reduce it with time to speed
up the convergence of clusters. However, in the hybrid version, since the number
of agents is already small, a bigger distance threshold, thus a wider neighborhood
size, is being used, and naturally convergence is very fast. Therefore, annealing is
not used for the hybrid experiments.
6.1 Datasets
As shown in Table 1, datasets I and II are synthetic datasets, whereas dataset WebM
consists of real Web usage sessions. Iris and Pima are also real life datasets from
the UCI machine learning repository. Datasets I and II have 2 attributes and are
thus suitable to show the clustering results visually. Dataset WebM, the Web us-
age data, consists of Web usage sessions, where each session is a bag of visited
URLs. Each session includes the URLs that were visited during that session. In
Web usage mining, each URL or item is considered as one dimension which re-
sults in a huge dimensionality. To compare the proposed improvements and the hy-
brid algorithm with the original FClust algorithm, datasets Iris and Pima are also
used. In the experiments, two different similarity measures are used, the Manhattan
based similarity for datasets I and II, and the cosine similarity for dataset WebM.
The Manhattan based similarity is used for the linearly normalized Iris and Pima
datasets.
The Manhattan-based (L1) similarity of two data records x_i and x_j is given by

sim(x_i, x_j) = 1 − (1/A) ∑_{k=1}^{A} |x_i^k − x_j^k|,    (10)
Table 1 Datasets
where A denotes the number of attributes and x_i^k denotes the kth attribute of data record x_i. When the data is linearly normalized to [0, 1], the Manhattan-based similarity is the same as the 1-norm similarity used in (10).
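Equation (10) can be sketched directly in code; the following function is an illustrative implementation (not the authors' code) and assumes the records are already linearly normalized to [0, 1]:

```python
def manhattan_similarity(x, y):
    """1-norm similarity of Eq. (10): 1 - (1/A) * sum_k |x_k - y_k|.

    Assumes both records are linearly normalized to [0, 1], so the
    result always lies in [0, 1]."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    A = len(x)
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / A

# identical records are maximally similar; opposite corners are dissimilar
print(manhattan_similarity([0.2, 0.8], [0.2, 0.8]))  # 1.0
print(manhattan_similarity([0.0, 0.0], [1.0, 1.0]))  # 0.0
```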
Given two sessions s_i and s_j, each formed of the |s_i| and |s_j| URLs visited in the respective user session, the cosine similarity is computed as follows:

sim(s_i, s_j) = |s_i ∩ s_j|^2 / (|s_i| × |s_j|)    (11)
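Reading Equation (11) as |s_i ∩ s_j|² / (|s_i| · |s_j|), with each session treated as a set of URLs, the session similarity can be sketched as follows (illustrative code, not the authors' implementation):

```python
def session_similarity(s_i, s_j):
    """Cosine-style similarity of Eq. (11) for sessions viewed as URL sets:
    |s_i ∩ s_j|**2 / (|s_i| * |s_j|)."""
    si, sj = set(s_i), set(s_j)
    if not si or not sj:
        return 0.0
    return len(si & sj) ** 2 / (len(si) * len(sj))

a = ["/", "/courses.html", "/people.html"]
b = ["/", "/courses.html", "/faculty.html", "/facts.html"]
print(session_similarity(a, b))  # 4 / 12 = 0.333...
```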
Algorithm 11. Computing the pairwise cluster error rate
1: Error ← 0
2: for all data record pairs (i, j), where i ≠ j do
3:   if i and j have the same class label, but different cluster labels then
4:     Error ← Error + 1
5:   else if i and j have different class labels, but the same cluster label then
6:     Error ← Error + 1
7: Return Error / Number of pairs
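The pairwise error computation above can be sketched as follows (illustrative Python, counting each unordered pair once; the ratio is unchanged either way):

```python
from itertools import combinations

def cluster_error(class_labels, cluster_labels):
    """Pairwise error rate: fraction of record pairs whose class
    agreement and cluster agreement disagree."""
    n = len(class_labels)
    errors = 0
    for i, j in combinations(range(n), 2):
        same_class = class_labels[i] == class_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class != same_cluster:
            errors += 1
    return errors / (n * (n - 1) // 2)

# perfect clustering up to cluster renaming -> error 0
print(cluster_error(["a", "a", "b", "b"], [1, 1, 2, 2]))  # 0.0
# one record misplaced -> 3 of 6 pairs disagree
print(cluster_error(["a", "a", "b", "b"], [1, 2, 2, 2]))  # 0.5
```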
Moreover, if the data consists of Web user sessions, then profiles are extracted as shown in Algorithm 9., line 6. In Algorithm 8., line 3, Algorithm 9., line 3, and Algorithm 10., line 5, STH denotes the session threshold, i.e. the minimum number of sessions required for a cluster to be valid. If a data record is a bag of items, as in the case of Web usage data, the value of the item count threshold used in Algorithm 9., line 6 is given in Equation (12), where ICTF denotes the Item Count Threshold Frequency, a real number with 0 ≤ ICTF ≤ 1. As a result, each set of URLs extracted in line 6 of Algorithm 9. can be considered a pattern representing a Web user profile that summarizes the sessions assigned to that cluster.
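Assuming, as one plausible reading of Equation (12), that the item count threshold equals ICTF times the number of sessions in the cluster, the profile extraction step can be sketched as (illustrative code only):

```python
def extract_profile(sessions, ictf):
    """Sketch of profile extraction (Algorithm 9., line 6): keep every URL
    whose frequency across the cluster's sessions is at least ICTF.
    `sessions` is a list of URL bags; returns {url: frequency}."""
    n = len(sessions)
    counts = {}
    for session in sessions:
        for url in set(session):          # count each URL once per session
            counts[url] = counts.get(url, 0) + 1
    return {url: c / n for url, c in counts.items() if c / n >= ictf}

cluster = [["/", "/courses.html"], ["/", "/people.html"], ["/"]]
print(extract_profile(cluster, ictf=0.5))  # {'/': 1.0}
```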
6.3 Results
In this section, we start with the 2D datasets to allow a visual evaluation. Then we proceed with the Web usage data as a challenging, high-dimensional, real
658 E. Saka and O. Nasraoui
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c) clustering result for the dataset, (d) agent clusters generated after post-processing and assigning agents to clusters.]
Fig. 2 Clustering a dataset with two clusters using FClust where dth = 0.04, simth = 0.91
life data example. Finally, we present our results for the Iris and Pima datasets and compare the clustering outputs to the data classes provided as part of the datasets. Figure 2(a) shows an example of a dataset with 2 clusters, and the resulting agents in the agent space are shown in Figure 2(b). Figure 2(d) shows the agent clusters that include more than STH data records, and Figure 2(c) shows the clustered data.
Figure 3 shows the result for a more complicated data set, given in Figure 3(a), using
the Manhattan based (i.e. L1) similarity given in Equation (10). Likewise, Figure
3(d) is the post-processed version of Figure 3(b). When we compare the results in
Figure 2(c) and Figure 3(c), we observe that, when the data clusters are not strictly
separated, the cluster formation algorithm, Algorithm 5., may suffer from a bridging
effect that results in merging two distinct clusters. Note that, since the synthetic
datasets used in our experiments already had attributes between 0 and 1, we did not
linearly normalize them in [0,1].
Figure 4 shows the results after more iterations compared to Figure 3. This shows
that, if the agents’ movement is stopped in a wrong state, different clusters may be
Improvements in Flock-Based Collaborative Clustering Algorithms 659
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c) clustering result for the dataset, (d) agent clusters generated after post-processing and assigning agents to clusters.]
Fig. 3 Clustering a dataset with three clusters using FClust at iteration 24400 where dth = 0.04, simth = 0.86
assigned to the same cluster. Therefore, the stopping criterion is crucially important for overlapping datasets.
The next results are for two weeks' worth of Web usage data from a Computer Engineering and Computer Science department's website. We have chosen this dataset because it has previously undergone extensive experiments and validation in [26, 25, 24], hence it is considered a benchmark dataset. In Figure
5(a), the algorithm did not converge because the similarity threshold computed
according to Equation (2) was too high to form good clusters. With the average
similarity and maximum similarity given in Table 1, the similarity threshold is
computed as 0.53, which is very high given that the average similarity is 0.06.
Therefore this example visually shows that the similarity threshold given in Equa-
tion (2) is not suitable for data with similarities distributed as a power law, as can
be verified in Figure 5(b) and Figure 5(c) (the log-log plot exhibiting several linear
segments).
If we compute the similarity threshold using (4), it is 0.15. The value of α was set
to 2.5 in this set of experiments. We obtain convergence as shown in Figure 6, where
ST H = 10 and ICTF = 0.10. Several examples of the extracted profiles are presented
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c) clustering result for the dataset, (d) agent clusters generated after post-processing and assigning agents to clusters.]
Fig. 4 Clustering a dataset with three clusters using FClust at iteration 30700 where dth = 0.04, simth = 0.86
[Figure: (a) agents for the Web usage data at iteration 12000, (b) cosine similarity histogram, (c) log-log plot of the similarities.]
Fig. 5 Clustering the Web usage data using FClust with the similarity threshold computed according to Equation (2): (a) no convergence because of an improper similarity threshold, (b) similarity histogram, (c) log-log plot of similarities exhibiting power-law properties
in Table 2. For example, Profile 1 represents a group of users (possibly prospective students) checking the department's main web pages. Profile 2 represents a group of student users taking the course CECS 352 (Joshi is the instructor teaching the course).
Table 2 Several examples from the profiles of a Web usage data extracted using FClust where
ST H = 10 and ICT F = 0.1
Figure 7 shows the clustering results for the dataset with 2 clusters using FClust-annealing. Comparing this result with Figure 2, we can see that annealing requires fewer iterations to converge (1410 vs. 2792 iterations).
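The chapter does not spell out the exact cooling schedule for dth; one plausible sketch (purely illustrative, with invented step count) is a geometric decay from 1 down to 0.04:

```python
def annealing_schedule(d_start=1.0, d_end=0.04, steps=100):
    """Hypothetical geometric cooling of the distance threshold d_th
    from d_start down to d_end; the schedule actually used by
    FClust-annealing is not specified here."""
    ratio = (d_end / d_start) ** (1.0 / (steps - 1))
    return [d_start * ratio ** k for k in range(steps)]

schedule = annealing_schedule()
print(round(schedule[0], 2), round(schedule[-1], 2))  # 1.0 0.04
```

A wide neighborhood early on lets distant agents influence each other, and the shrinking threshold then freezes the flocks into tight clusters.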
Figure 8 also shows that annealing not only accelerates the convergence, but also results in better quality clusters (no bridging effect). In Figure 3, even though there were 3 separate clusters in the agent space after 24,400 iterations, as seen in Figures 3(b) and 3(d), the clustering of the data domain shown in Figure 3(c) reveals that the clustering process did not converge. Figure 4 also confirms that there is a tendency to combine the 3 clusters into one unless a better cluster formation algorithm is applied. Although the classical FClust suffers from these problems, FClust-annealing clearly differentiates between these clusters. In addition to
[Figure: two scatter plots of Y coordinate vs. X coordinate on the unit square, with a legend of cluster labels; (a) clustered agents before pruning, (b) after post-processing with Algorithm 9.]
Fig. 6 Generated profiles from the Web usage data using FClust in 10250 iterations where dth = 0.04, simth = 0.15, STH = 10, ICTF = 0.10
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c), (d) agent clusters generated after post-processing and assigning agents to clusters.]
Fig. 7 Clustering a dataset with two clusters using FClust-annealing where dth started from 1 down to 0.04, simth = 0.91
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c) clustering result for the dataset, (d) agent clusters generated after post-processing and assigning agents to clusters.]
Fig. 8 Clustering a dataset with three clusters using FClust-annealing where dth started from 1 down to 0.04, simth = 0.91, and dideal_th for FClust and post-processing is 0.04
[Figure: three scatter plots of Y coordinate vs. X coordinate on the unit square, with a legend of cluster labels; (a) clustering result for all agents, (b) clustered agents before pruning, (c) agent clusters generated after post-processing and assigning agents to clusters.]
Fig. 9 Clustering the Web usage session data using FClust-annealing where dth started from 1 down to 0.04, simth = 0.15, and dideal_th for FClust and post-processing is 0.04
Compared to Figure 6, which took 10,250 iterations, better quality clusters are now formed in only 6,840 iterations. Also, 30 clusters were formed in Figure 6, whereas, with annealing, 25 clusters are formed. By checking the post-processed profiles, we observed that the decrease in the number of clusters is not a loss of information but rather a sign of better convergence (broken clusters were combined). In FClust, some clusters were
Table 3 Some samples from the Web user profiles, extracted using FClust-annealing where
ST H = 10 and ICT F = 0.1
URL Frequency    URL
Profile 1 (includes 116 sessions)
0.90 /
0.69 /cecs computer.class
0.43 /courses index.html
0.42 /courses100.html
0.41 /courses.html
0.29 /people.html
0.28 /people index.html
0.28 /faculty.html
0.20 /courses300.html
0.18 /degrees.html
0.17 /courses200.html
0.13 /general.html
0.13 /general index.html
0.13 /facts.html
0.13 /research.html
0.11 /grad people.html
Profile 2 (Includes 31 sessions)
0.90 / joshi/courses/cecs352
0.35 / joshi/courses/cecs352/slides-index.html
0.35 / joshi/courses/cecs352/handout.html
0.35 / joshi/courses/cecs352/outline.html
0.29 / joshi/courses/cecs352/text.html
0.26 / joshi/courses/cecs352/environment.html
0.13 / joshi
0.13 /
0.13 / joshi/courses/cecs438
0.13 / joshi/courses/cecs352/proj
broken and their agents could not meet each other on the agent visualization panel and therefore could not be unified.
Figure 10 shows the results of the (K-means+FClust) Hybrid algorithm for Dataset I. The disadvantage of K-means is that it requires the number of clusters as input, whereas FClust can extract the number of clusters automatically. With the hybrid approach, 8 clusters are generated with K-means, as shown in Figure 10(b), where each agent was mapped to one cluster center generated by K-means. From these, FClust generated the 2 clusters shown in Figure 10(c). Figure 10(b) represents the K-means result (i.e. before starting the iterations of FClust). Compared to Figure 2, where 2,792 steps were needed for FClust's convergence, only 122 iterations were
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c) agent clusters after applying thresholding, (d) data clusters after applying thresholding.]
Fig. 10 Stable output of the (K-means+FClust) hybrid on a 2-cluster dataset where simth = 0.88 (using Eqn. (2)), dth = 0.4, dideal_th for FClust = 0.04, and dideal_th for forming clusters = 0.08
needed for the Hybrid version. Moreover, despite speeding up the process considerably, the hybrid version does not necessarily suffer from the bridging effect observed during cluster extraction in post-processing.
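The hybrid's first stage can be sketched with a minimal K-means whose centroids then serve as the FClust agents, one agent per centroid (an illustrative toy implementation, not the authors' code):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's K-means over 2-D tuples; returns the k centroids
    that the hybrid then uses as FClust agents."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            buckets[nearest].append(p)
        # move each centroid to the mean of its bucket (keep it if empty)
        centroids = [
            tuple(sum(dim) / len(bucket) for dim in zip(*bucket))
            if bucket else centroids[c]
            for c, bucket in enumerate(buckets)
        ]
    return centroids

# two well-separated 2-D blobs inside the unit square
pts = [(0.1 + dx, 0.1 + dy) for dx in (0, 0.02) for dy in (0, 0.02)] + \
      [(0.9 + dx, 0.9 + dy) for dx in (0, 0.02) for dy in (0, 0.02)]
agents = kmeans(pts, k=2)
print(sorted(agents))
```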
Figure 11 is a collection of figures showing the results of the (K-means+FClust)
Hybrid Algorithm given in Algorithm 7. for Dataset II. When the results are com-
pared with the simple FClust Algorithm in Figures 3 and 4, it can be observed that
the hybrid algorithm is faster thanks to fewer iterations and to the modest linear
computational cost.
[Figure: four scatter plots of Y coordinate vs. X coordinate on the unit square; (c) agent clusters after applying thresholding, (d) data clusters after applying thresholding.]
Fig. 11 Stable output of the (K-means+FClust) hybrid on a 3-cluster dataset where simth = 0.73 (using Eqn. (2)), dth = 1.0, dideal_th for FClust = 0.04, and dideal_th for cluster forming = 0.08.
one run, FClust produced 4 clusters at around 1500 iterations, with a cluster error rate of approximately 0.25, computed via Algorithm 11., while at around 2500 iterations it formed 3 clusters. However, the error rate was still high (0.27). Later, the number of clusters dropped to 2, and at around 7000 iterations it found 3 clusters with an error rate of 0.08. More iterations resulted in high error rates again (around 0.22) with 2 clusters. So the system cycles between 2, 3 and 4 clusters. These are different but somewhat stable clustering options. However, we observed that results with more than 5 clusters, under the given parameters, were produced due to an insufficient number of iterations. As noted in Section 5.4, since every pairwise ideal distance is one constraint/objective to be satisfied, there are n × (n − 1)/2 objectives. Theoretically, this can lead to up to n × (n − 1)/2 Pareto solutions on the Pareto front. The actual number in practice is much smaller, however, since several of these constraints can be satisfied simultaneously.
The FClust-annealing version decreases the number of iterations drastically. It starts forming cluster centers from the first iterations, and even though the centers are created in fewer than 10 or 20 steps, they still seem to be reasonably good. For the first few iterations, the cluster centers changed rapidly since the neighborhoods were wide, yet clusters were still formed after post-processing. Finally, the clustering scheme that gave the minimum error was reported. Annealing converges to a meaningful cluster formation faster than FClust. Moreover, in just a few iterations, it can already present different possible clustering options. This process, and the changes in the number of formed clusters and their errors over the iterations for 10 different runs on the Iris dataset, are shown in Table 5. For each run, the first row is the number of clusters generated at the corresponding iteration number, which is given as the column label. Similarly, the second row shows the error computed via Algorithm 11.
In FClust-annealing and the (K-means+FClust) Hybrid, since the neighborhood
is wider, clusters are formed faster, and there will be fewer agents on the visualiza-
tion panel between agent flocks. Later (in the case of annealing), when the neigh-
borhoods become narrow, the chance of flocks meeting and affecting each other
decreases, which causes a decrease in the exploration for better clustering options.
Therefore FClust-annealing and the (K-means+FClust) Hybrid are more prone to
getting stuck in local optima. Some random moves could be added to the algorithm
to increase the opportunity for exploration. One of the problems observed with the (K-means+FClust) Hybrid on the Iris dataset was that the algorithm may separate the members of the first cluster into two groups. During clustering with the Hybrid algorithm, K was set to 8 in K-means, dideal_th was 0.04, and during post-processing it was 0.08.
To stop the Hybrid algorithm, the ideal distance error is computed for each pair of agents, and when its difference compared to the previous iteration fell below 0.009, the algorithm was stopped. Figure 12 shows the ideal distance error versus the iteration number for the Iris dataset using the (K-means+FClust) Hybrid algorithm. The irregularities observed as sudden increases in the error correspond to times when clusters reached the border of the visualization panel and continued moving, thus wrapping around toward the opposite side of the panel. That said,
Table 5 Number of clusters extracted and corresponding error at different iteration steps
for 10 different runs of clustering the Iris data using FClust-annealing where dth =0.4,
dideal th =0.04
Iteration No 1 10 50 100 200 300 400 500 1000 1500 2000 2500
Run 1 1 3 6 6 5 6 5 4 4 4 3 4
0.67 0.16 0.26 0.26 0.24 0.24 0.18 0.16 0.15 0.15 0.13 0.15
Run 2 0 3 2 2 2 2 2 2 2 2 3 2
0.15 0.23 0.23 0.23 0.23 0.23 0.22 0.23 0.24 0.22 0.22
Run 3 0 1 5 5 7 7 5 4 4 4 2 3
0.67 0.23 0.21 0.24 0.24 0.21 0.19 0.19 0.19 0.22 0.18
Run 4 0 2 2 2 2 2 2 3 3 4 4 3
0.24 0.23 0.22 0.22 0.23 0.22 0.17 0.17 0.18 0.19 0.09
Run 5 0 2 4 4 4 4 4 4 2 3 3 4
0.32 0.23 0.27 0.27 0.28 0.28 0.28 0.23 0.18 0.18 0.11
Run 6 1 0 4 4 3 3 4 3 5 5 4 4
0.67 0.28 0.2 0.17 0.18 0.27 0.22 0.21 0.22 0.18 0.19
Run 7 0 1 3 5 4 4 3 3 3 4 4 4
0.67 0.19 0.16 0.14 0.17 0.15 0.16 0.15 0.1 0.13 0.11
Run 8 0 0 5 6 4 5 4 4 4 2 2 2
0.26 0.28 0.22 0.2 0.22 0.22 0.26 0.22 0.22 0.22
Run 9 1 0 5 3 3 3 3 3 2 2 2 2
0.67 0.19 0.13 0.15 0.13 0.14 0.14 0.22 0.22 0.22 0.22
Run 10 0 1 4 4 4 6 4 3 3 2 2 3
0.67 0.27 0.22 0.24 0.22 0.13 0.16 0.17 0.22 0.23 0.1
Avg. No of Clusters 0.3 1.3 4 4.1 3.8 4.2 3.6 3.3 3.2 3.2 2.9 3.1
Average Error 0.67 0.411 0.237 0.218 0.212 0.212 0.203 0.192 0.198 0.192 0.192 0.159
[Figure: plot of the ideal distance error (y-axis, roughly 2 to 18) against the iteration number (x-axis, 0 to 45).]
Fig. 12 Evolution of the ideal distance error with iterations for clustering the Iris dataset using the (K-means+FClust) Hybrid algorithm. Big changes correspond to major changes due to agents wrapping around the agent space boundaries
Table 6 Average result of 10 different runs of clustering the Pima data set using FClust-
annealing, where dth started from 1 and decreased to 0.04
stopping because a human typically follows the bigger flocks of agents, so the agents that are spread around the visualization panel do not affect the human observer as much as they may affect the automated stopping mechanism. In the FClust-annealing algorithm, 2 clusters of the Pima dataset were formed in around 20 to 30 iterations, with a cluster error rate between 0.45 and 0.50. For the Pima dataset, smaller cluster size threshold values would yield clusters in fewer iterations. However, we kept this threshold constant at n/20 to be able to compare the proposed algorithms with the original FClust algorithm. Table 6 shows that an average of 1651 iterations of FClust-annealing were sufficient to reach a state of the visualization panel that would require an average of 3995 iterations of the original FClust algorithm. Similarly, K was 8 and dideal_th was 0.04 for the Hybrid algorithm; during post-processing, dideal_th was 0.08. We also observed that, for both the Iris and Pima datasets, the minimum cluster errors among all the algorithms were obtained in the FClust-annealing experiments (0.11 for Iris and 0.43 for Pima).
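The automated stopping rule described for the Hybrid algorithm — stop when the ideal distance error changes by less than 0.009 between successive iterations — can be sketched as follows (illustrative code; the error values are invented):

```python
def should_stop(error_history, tol=0.009):
    """Stop when the ideal distance error changes by less than `tol`
    between the last two iterations."""
    if len(error_history) < 2:
        return False
    return abs(error_history[-1] - error_history[-2]) < tol

print(should_stop([17.2, 12.5, 10.1]))          # False: still dropping fast
print(should_stop([17.2, 12.5, 10.1, 10.095]))  # True: change below 0.009
```

As the text notes, the wrap-around jumps in the error curve mean such a rule can trigger at a misleading moment, which is one reason a human observer may still outperform it.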
FClust with the Spherical K-means algorithm [8] for clustering high-dimensional data such as Web usage data and text documents.
Our experiments have illustrated how the cluster formation algorithm is susceptible to the bridging effect for overlapping clusters and is very sensitive to threshold parameter tuning. Therefore, future studies are needed to devise better automated stopping criteria and algorithms that form better clusters given a state of the visualization panel.
References
1. Abraham, A., Das, S., Roy, S.: Swarm Intelligence Algorithms for Data Clustering. In:
Soft Computing for Knowledge Discovery and Data Mining, pp. 279–313. Springer, US
(2008)
2. Abraham, A., Das, S., Roy, S.: Swarm Intelligence Algorithms for Data Clustering. In:
Soft Computing for Knowledge Discovery and Data Mining, pp. 279–313. Springer, US
(2008)
3. Azzag, H., Monmarche, N., Slimane, M., Venturini, G.: Anttree: a new model for cluster-
ing with artificial ants. In: The 2003 Congress on Evolutionary Computation CEC 2003,
vol. 4, pp. 2642–2647 (2003)
4. Couzin, I.D., Krause, J.E.N.S., James, R., Ruxton, G.D., Franks, N.R.: Collective mem-
ory and spatial sorting in animal groups. Journal of Theoretical Biology 218(1), 1–11
(2002)
5. Cui, X., Potok, T.E.: Document clustering analysis based on hybrid pso+kmeans algo-
rithm. Journal of Computer Sciences (Special Issue), 27–33 (2005)
6. Cui, X., Potok, T.E.: A distributed agent implementation of multiple species flocking
model for document partitioning clustering. In: Klusch, M., Rovatsos, M., Payne, T.R.
(eds.) CIA 2006. LNCS, vol. 4149, pp. 124–137. Springer, Heidelberg (2006)
7. Cui, X., Potok, T.E., Palathingal, P.: Document clustering using particle swarm optimiza-
tion. In: IEEE Swarm Intelligence Symposium (2005)
8. Dhillon, I.S., Modha, D.S.: Concept decompositions for large sparse text data using clus-
tering. Machine Learning 42(1-2), 143–175 (2001)
9. Handl, J., Knowles, J., Dorigo, M.: On the performance of ant-based clustering. In: Pro-
ceedings of the Third International Conference on Hybrid Intelligent Systems (2003)
10. Handl, J., Knowles, J., Dorigo, M.: Strategies for the increased robustness of ant-based
clustering. In: Di Marzo Serugendo, G., Karageorgos, A., Rana, O.F., Zambonelli, F.
(eds.) ESOA 2003. LNCS, vol. 2977, pp. 90–104. Springer, Heidelberg (2004)
11. Handl, J., Meyer, B.: Ant-based and swarm-based clustering. Swarm Intelligence 1(2),
95–113 (2007)
12. Heppner, F., Grenander, U.: A stochastic nonlinear model for coordinated bird flocks. In:
Krasner, S. (ed.) The Ubiquity of Chaos, pp. 233–238. AAAS, Washington (1990)
13. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs
(1988)
14. Jain, A.K., Murthy, M., Flynn, P.: Data clustering: A review. ACM Computing Reviews
(1999)
15. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE In-
ternational Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)
16. Kennedy, J., Eberhart, R., Shi, Y.: Swarm Intelligence. Morgan Kaufmann Publishers
Inc, San Francisco (2001)
17. Labroche, N., Monmarche, N., Venturini, G.: A new clustering algorithm based on the
chemical recognition system of ants. In: Proceedings of the 15th European Conference
on Artificial Intelligence (2002)
18. Labroche, N., Monmarche, N., Venturini, G.: Antclust: Ant clustering and web usage
mining. In: Cantú-Paz, E., et al. (eds.) GECCO 2003. LNCS, vol. 2723, pp. 25–36.
Springer, Heidelberg (2003)
19. Labroche, N., Monmarche, N., Venturini, G.: Web sessions clustering with artificial ants
colonies. In: WWW 2003, The Twelfth International World Wide Web Conference, Bu-
dapest, Hungary (2003)
20. Lumer, E.D., Faieta, B.: Diversity and adaptation in populations of clustering ants. In:
Proceedings of the third international conference on Simulation of adaptive behavior:
from animals to animats 3, pp. 501–508. MIT Press, Cambridge (1994)
21. MacQueen, J.: Some methods for classification and analysis of multivariate observations.
In: Cam, L.M.L., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on
Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California
Press (1967)
22. van der Merwe, D.W., Engelbrecht, A.P.: Data clustering using particle swarm optimiza-
tion. In: Proceedings of IEEE Congress on Evolutionary Computation 2003 (CEC 2003),
vol. 1, pp. 215–220 (2003)
23. Millonas, M.M.: Swarms, phase transition, and collective intelligence. In: Langton, C.G.
(ed.) Artificial life III. Addison Wesley, Reading (1994)
24. Nasraoui, O., Krishnapuram, R., Frigui, H.: Extracting web user profiles using relational
competitive fuzzy clustering. International Journal on Artificial Intelligence Tools 9(4),
509–526 (2000)
25. Nasraoui, O., Krishnapuram, R., Joshi, A.: Mining web access logs using a relational
clustering algorithm based on a robust estimator. In: Proc. of the Eighth International
World Wide Web Conference, Toronto, pp. 40–41 (1999)
26. Nasraoui, O., Krishnapuram, R., Joshi, A.: Relational clustering based on a new robust
estimator with application to web mining. In: Proceedings of the North American Fuzzy
Information Society, New York City, pp. 705–709 (1999)
27. Omran, M., Engelbrecht, A.P., Salman, A.: Particle swarm optimization method for
image clustering. International Journal of Pattern Recognition and Artificial Intelli-
gence 19(3), 297–322 (2005)
28. Omran, M., Salman, A., Engelbrecht, A.P.: Image classification using particle swarm
optimization. In: Conference on Simulated Evolution and Learning, vol. 1, pp. 370–374
(2002)
29. Picarougne, F., Azzag, H., Venturini, G., Guinot, C.: On data clustering with a flock of
artificial agents. In: Proceedings of the 16th IEEE International Conference on Tools
with Artificial Intelligence, ICTAI 2004 (2004)
30. Picarougne, F., Azzag, H., Venturini, G., Guinot, C.: A new approach of data clustering
using a flock of agents. Evolutionary Computation 15(3), 345–367 (2007)
31. Proctor, G., Winter, C.: Information flocking: Data visualisation in virtual worlds using
emergent behaviours. In: Heudin, J.-C. (ed.) VW 1998. LNCS, vol. 1434, pp. 168–176.
Springer, Heidelberg (1998)
32. Reynolds, C.W.: Flocks, herds, and schools: A distributed behavioral model. Computer
Graphics 21(4), 25–34 (1987)
33. Saka, E., Nasraoui, O.: Simultaneous clustering and visualization of web usage data us-
ing swarm-based intelligence. In: Proceedings of the 20th IEEE International Conference
on Tools with Artificial Intelligence, ICTAI 2008 (2008)
34. Vizine, A.L., de Castro, L.N., Hruschka, E.R., Gudwin, R.R.: Towards improving clus-
tering ants: an adaptive ant clustering algorithm. Informatica 29, 143–154 (2005)
35. Weiss, G. (ed.): Multiagent Systems: A Modern Approach To Distributed Artificial In-
telligence. The MIT Press, Cambridge (2000)
36. White, T., Pagurek, B.: Towards multi-swarm problem solving in networks. In: De-
mazeau, Y. (ed.) Proceedings of the 3rd International Conference on Multi-Agent
Systems (ICMAS 1998). IEEE Press, Paris (1998)
Combining Statistics and Case-Based Reasoning
for Medical Research
1 Introduction
Case-based Reasoning (CBR) uses previous experience represented as cases to un-
derstand and solve new problems. A case-based reasoner remembers former cases
similar to the current problem and attempts to modify solutions of former cases to
fit the current problem.
The fundamental ideas of CBR originated in the late eighties (e.g., [24]). In the early nineties, CBR emerged as a method first described by Kolodner [14]. Later on, Aamodt and Plaza presented a more formal characterisation of the CBR method. Figure 1 shows the Case-based Reasoning cycle developed by Aamodt and Plaza [1], which consists of four steps: retrieving former similar cases, adapting their solutions to the current problem, revising a proposed solution, and retaining
Rainer Schmidt
Institute for Medical Informatics and Biometry, University of Rostock, Germany
e-mail: [email protected]
Olga Vorobieva
Institute for Medical Informatics and Biometry (as above) and Sechenov
Institute of Evolutionary Physiology and Biochemistry, St. Petersburg, Russia
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 673–696.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
674 R. Schmidt and O. Vorobieva
new learned cases. There are two main subtasks in Case-based Reasoning [14, 1]: retrieval (the search for a similar case) and adaptation (the modification of the solutions of retrieved cases). For retrieval, many similarity measures and sophisticated retrieval algorithms have been developed within the CBR community. The most common ones are indexing methods [14] like tree-hash retrieval [28], which are useful for nominal parameter values; retrieval nets [15], which are useful for ordered nominal values; and nearest neighbour search [5], which is useful for metric parameter values.
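For metric parameter values, nearest-neighbour retrieval can be sketched as follows (illustrative code; the case attributes and solutions are invented for the example):

```python
def retrieve(case_base, query, k=1):
    """Nearest-neighbour retrieval: return the k stored cases whose
    attribute vectors have the smallest Euclidean distance to the query."""
    def distance(case):
        return sum((a - b) ** 2
                   for a, b in zip(case["attributes"], query)) ** 0.5
    return sorted(case_base, key=distance)[:k]

case_base = [
    {"attributes": (36.5, 120), "solution": "therapy A"},
    {"attributes": (39.1, 180), "solution": "therapy B"},
    {"attributes": (37.0, 130), "solution": "therapy A"},
]
best = retrieve(case_base, query=(36.8, 125))[0]
print(best["solution"])  # therapy A
```

In a full CBR system the retrieved solution would then go through the adaptation step before being proposed for the current case.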
The second task, adaptation, is the modification of the solutions of former similar cases to fit the current one. If there are no important differences between the current and a similar case, a simple solution transfer is sufficient. Sometimes only a few substitutions are required, but at other times adaptation is a very complicated process. So far, only rather domain-independent adaptation methods, such as compositional adaptation [29], exist.
typical and exceptional ones, and the reasoning of physicians takes them into ac-
count [10]. In medical knowledge based systems there are two types of knowledge,
objective knowledge, which can be found in textbooks, and subjective knowledge,
which is limited in space and time and changes frequently.
The problem of updating the changeable subjective knowledge can partly be
solved by incrementally incorporating new up-to-date cases [10]. Both kinds of
knowledge can be clearly separated: Objective textbook knowledge can be repre-
sented in forms of rules or functions, while subjective knowledge is contained in
cases.
ISOR offers a dialogue to guide the search for possible reasons in all components
of the data system. The exceptional cases belong to the case base. This approach
is justified by a certain mistrust of statistical models by doctors, because modelling
results are usually non-specific and “average oriented” [12], which reflects a lack of
attention to individual “imperceptible” features of specific patients.
The usual Case-Based Reasoning assumption is that a case base with complete
solutions is available [14, 1, 21]. Our approach starts with a situation where such a
case base is not available but has to be set up incrementally. The general program
flow is shown in Figure 2. The main steps are:
1. Construct a model,
2. Point out the exceptions,
3. Find causes why the exceptional cases do not fit the model, and
4. Set up a case base.
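The four steps can be sketched with a deliberately simple stand-in model, the sample mean, flagging as exceptions the cases that deviate from it (illustrative only; ISOR uses richer statistical models, and the case fields here are invented):

```python
def mean_model(values):
    """Step 1: the simplest possible statistical 'model' -- the sample mean."""
    return sum(values) / len(values)

def find_exceptions(cases, threshold):
    """Steps 2-3: flag cases whose observed value deviates from the model
    by more than `threshold`; these seed the case base (step 4)."""
    model = mean_model([c["value"] for c in cases])
    return [c for c in cases if abs(c["value"] - model) > threshold]

cases = [
    {"id": 1, "value": 10.2},
    {"id": 2, "value": 9.8},
    {"id": 3, "value": 10.1},
    {"id": 4, "value": 14.0},   # an exceptional case
]
case_base = find_exceptions(cases, threshold=2.0)
print([c["id"] for c in case_base])  # [4]
```

The interesting work then lies in step 3: explaining *why* each flagged case does not fit the model, which is what the ISOR dialogue supports.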
So, Case-Based Reasoning is combined with a model, in this specific situation
with a statistical one. The idea to combine CBR with other methods is not new.
Care-Partner, for example, resorts to a multi-modal reasoning framework for the
co-operation of CBR and rule-based reasoning (RBR) [4]. Montani [19] uses CBR to provide evidence in a hybrid system in the domain of diabetes.
way of combining hybrid rule bases with CBR is discussed by Prentzas and Hatzil-
geroudis [22]. The combination of CBR and model-based reasoning is discussed in
[27]. However, statistical methods are used within CBR mainly for retrieval and re-
tention [6, 23]. Arshadi and Jurisica [3] propose a method that combines CBR with
statistical methods such as clustering and logistic regression.
The first application of ISOR is on hemodialysis and fitness. Unfortunately, the
data set contains many missing data items, which makes the process of finding ex-
planations for exceptional cases difficult. So, we decided to attempt to first solve the
missing data problem. This is done by partly applying CBR again.
Hemodialysis means stress for a patient's organism and has significant adverse
effects. Fitness is the most readily available and relatively cheap form of support. It is
meant to improve the physiological condition of a patient and to compensate for negative
dialysis effects. One of the intended goals of this research is to convince patients of
the positive effects of fitness and to encourage them to actively participate in the
fitness program. This is important because dialysis patients usually feel sick, they
are physically weak, and they do not want any additional physical load [7].
At the University clinic in St. Petersburg, a specially developed complex of phys-
iotherapy exercises including simulators, walking, swimming and so on, is offered
to all dialysis patients. However, only some of them actively participate, whereas
some others participate but are not really active. The purpose of this fitness offer is
to improve the physical conditions of the patients and to increase the quality of their
lives. The hypothesis is that actively participating in the fitness program improves
the physical condition of dialysis patients.
For statistics, this means difficulties in applying statistical methods based on
correlation, and it limits the usage of a knowledge base developed for normal people.
Non-homogeneity of the observed data, many missing data, and many parameters for a
relatively small sample size all make the data set practically unsuitable for usual
statistical analysis.
Since the data set is incomplete, additional or substitutional information has to be
found from other available data sources. These are databases – the already existent
individual base and the sequentially created case base – and the medical expert as a
special source of information.
Subsequently, the “research time period” has to be determined. Initially, this period
was planned to be twelve months, but after a while patients tended to give up the
fitness program. This means that the longer the time period, the more data are missing.
Therefore, a compromise between time period and sample size had to be made; a
period of six months was chosen.
The next question is whether the model should be quantitative or qualitative. The
observed data are mostly quantitative measurements. The selected factors are also
quantitative in nature. On the other hand, the goal of this research is to find out
whether physical training improves or worsens the physical condition of dialysis
patients.
One patient does not have to be compared with another patient. Instead, each
patient has to be compared with his/her own situation some months ago, namely
just before the start of the fitness program. The success should not be measured
in absolute values, because the health statuses of patients are very different. Thus,
even a modest improvement for one patient may be as important as the great im-
provement of another. Therefore, we simply classify the development into two cat-
egories: “better” and “worse”. Since the usual tendency for dialysis patients is to
worsen over time, those few patients where no changes could be observed are added
to the category “better”.
Three main factors are supposed to describe the changes in the physical condition
of the patients. The changes are assessed depending on the number of improved
factors.
The final step is to define the type of model. Popular statistical programs offer a large
variety of statistical models. Some of them deal with categorical data. The easiest
model is a 2x2 frequency table. The “better/worse” concept fits this simple model
very well, so the 2x2 frequency table is accepted. The results are presented in Table
1. According to the assumption, after six months of active fitness the conditions of
the patients should be better.
Statistical analysis shows a significant dependence between the patients’ activity
and the improvement of their physical condition. Unfortunately, the popular Pearson
chi-square test is not applicable here because of the small values “2” and “3”
in Table 1. But Fisher’s exact test [13] can be used. In the three versions shown in
Table 1 a very strong significance can be observed. The smaller the value of p is, the
more significant the dependency.
Exceptions. The performed Fisher test confirms the hypothesis that patients do-
ing active fitness achieve better physical conditions than non-active ones. How-
ever, there are exceptions, namely active patients whose health conditions did not
improve.
Exceptions need to be explained. Explained exceptions build the case base. Ac-
cording to Table 1, the stronger the model, the more exceptions can be observed
and have to be explained. Every exception is associated with at least two problems.
The first one is “Why did the patient’s condition get worse?” Of course, “worse” is
meant in terms of the chosen model. Since there may be some factors that are not
included in the model but have changed positively, the second problem is “What
has improved in the patient’s condition?” To solve this problem significant factors
where the values improved have to be searched.
In the following section the set-up of a case base on the strongest model version
is explained.
done by a short dialogue (Figure 4) and ISOR’s algorithms remain intact. Artificial
cases can be treated in the same way as real cases: they can be revised, deleted,
generalised and so on.
2.2.2 Solving the Problem “Why Did Some Patients’ Conditions Become
Worse?”
A set of solutions of different origin and different nature is obtained. There are three
categories of solutions: additional factor, model failure, and wrong data.
Additional factor. The most important and most frequent solution is the influence
of an additional factor; three main factors are obviously not enough to
describe all medical cases. Unfortunately, for different patients different additional
factors are important. When ISOR has discovered an additional factor as explanation
for an exceptional case, the factor has to be confirmed by the medical expert before
it can be accepted as a solution. One of these factors is Parathyroid Hormone (PTH).
An increased PTH level can sometimes explain a worsened condition of a patient
[7]. PTH is a significant factor, but unfortunately it was measured for only some
patients.
Another additional factor as a solution is blood phosphorus level. The princi-
ple of artificial cases was used to introduce the factor phosphorus as a new so-
lution. One patient’s record contained many missing data. The retrieved solution
meant high PTH, but PTH data in the current patient’s record was missing too.
The expert proposed an increased phosphorus level as a possible solution. Since
data about phosphorus data was also missing, an artificial case was created that in-
herited all retrieval attributes of the query case, whereas the other attributes were
recorded as missing. According to the expert, high phosphorus can provide an
explanation. Therefore it is accepted as an artificial solution or a solution of an
artificial case.
Some exceptions can be explained by indirect indications, which can be consid-
ered as another sort of additional factor. One of them is a very long period of dialysis
(more than 60 months) before a patient began with the fitness program.
Model failure. We distinguish two types of model failure. One of them is deliberately
neglected data. As a compromise we only considered data collected in the chosen
six months period, whereas further data of a patient might be important. In fact,
three of the patients did not show an improvement in the considered six months,
but did so in the following six months. So, they were wrongly classified and should
really belong to the “better” category. The second type of model failure is based
on the fact that the two-category model was not precise enough. Some exceptions
could be explained by a tiny and not really significant change in one of the main
factors.
Wrong data are usually due to a technical mistake or to unverified data.
One patient, for example, was reported as actively participating in the fitness pro-
gram but really was not.
There are at least two criteria to select factors for the model. First, a factor has to
be significant, and second there must be enough patients for which this factor was
measured at least for six months. So, some principally important factors were ini-
tially not taken into account because of missing data. The list of solutions includes
these factors (Figure 4): haemoglobin and maximal power (watt) achieved during
control training. Oxygen pulse and oxygen uptake were measured in two different
situations, namely during the training and before training in a state of rest. Therefore
we have two pairs of factors: oxygen pulse in state of rest (O2PR) and during train-
ing (O2PT); maximal oxygen uptake in state of rest (MUO2R) and during training
(MUO2T). Measurements made in a state of rest are more indicative and significant
than those made during training. Unfortunately, most measurements were made dur-
ing training. Only for some patients did corresponding measurements in a state of
rest exist. Therefore O2PT and MUO2T were accepted as main factors and were
incorporated into the model. On the other hand, O2PR and MUO2R are solutions
for the current problem “What in the patient’s condition improved?”
In the case base every patient is represented by a set of cases, and every case
represents a specific problem. This means that a patient is described from different
points of view, and accordingly different keywords are used for retrieval.
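One plausible reading of this keyword-based retrieval (the data representation below is an assumption, not ISOR's actual format): each case carries a keyword set, and retrieval ranks cases by their keyword overlap with the query.

```python
def retrieve(case_base, query_keywords, top_n=3):
    """Return up to top_n cases, ranked by number of keywords shared with the query."""
    query = set(query_keywords)
    # Score every case by keyword overlap and keep only cases that match at all.
    scored = [(len(query & set(c["keywords"])), c) for c in case_base]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_n]]
```

Because each patient appears as several cases with different keyword sets, the same patient can be retrieved under different problems without any change to this mechanism.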
          Earlier   Later
Better      18       10
Worse        6       16
However, there are six exceptional cases, namely those active patients starting
early and their conditions worsened. These exceptions belong to the case base, the
explanations of them are high PTH or high phosphorus level.
2.4 Example
The following example demonstrates how ISOR attempts to find explanations for
exceptional cases. Because of data protection no real patient can be used. It is an
artificial case but nevertheless it is a typical situation.
Query patient: a 34-year-old woman started with fitness after five months of dialysis.
Two factors worsened, namely oxygen pulse and oxygen uptake, and consequently
the condition of the patient was assessed as worsened too.
Problem: Why did the patient’s condition deteriorate after six months of physical
training?
Case base: It contains not only cases but, more importantly, a list of general
solutions. For each general solution there exists a list of specific solutions based
on the cases in the case base. The list of general solutions contains
these five items (Figure 3):
1. Concentration of Parathyroid Hormone (PTH),
2. Period of dialysis is too long,
3. An additional disease,
4. A patient was not very active during the fitness program, and
5. A patient is very old.
Individual base. The patient suffers from a chronic disease, namely from asthma.
Adaptation. Since the patient started the fitness program after only five
months of dialysis, the second general solution can be excluded. The first general
solution might be possible, though the individual base does not contain any infor-
mation about PTH. Further laboratory tests showed PTH = 870, which means that
PTH is a solution.
Since an additional disease, bronchial asthma, is found in the individual base,
this solution is checked. Asthma is not contained as a solution in the case base, but
the expert concludes that asthma can be considered as a solution. Concerning the
remaining general solutions, the patient is not too old and states that she was
active in the fitness program.
Adapted case. The solution consists of a combination of two factors, namely a high
PTH concentration and an additional disease, asthma.
The goal is to find an explanation for the exceptional case “D5”. In point 7 of the
menu it is shown that all selected factors worsened (-1), and in point 8 the factor
values according to different time intervals are depicted. All data for twelve months
are missing (-9999).
The next step means creating an explanation for the selected patient “D5”. From
the case base ISOR retrieves general solutions. The first retrieved solution in this
example, the PTH factor, denotes that the increased parathyroid hormone blood
level may explain the failure. Further theoretical information (e.g. normal values)
about a selected item can be received by pressing the button “show comments”. The
PTH value of patient “D5” is missing (-9999). From menu point 10 the expert user
can select further probable solutions. In the example, an increased phosphorus level
(P) is suggested. Unfortunately, phosphorus data are missing too. However, the idea
of an increased phosphorus level as a possible solution should not be lost. So, an
artificial case should be generated.
The final step means inserting new cases into the case base. There are two kinds
of cases, query cases and artificial cases. Query cases are stored records of real
patients from the observed database. Artificial cases inherit the key attributes from
the query cases (point 7 in the menu). Other data may be declared as missing. Using
the update function, the missing data can be inserted later on. In the example of
Figure 4, the generalised solution “High P” is inherited, it may be retrieved as a
possible solution (point 9 of the menu) for future cases.
4 Missing Data
Databases with many variables have specific problems. Since it is usually very
difficult to get an overview of their content, a priori a user does not know how complete a
data set is. Are there any data missing? How many of them and where are they
located?
In the dialysis data set, many data are missing in a random fashion, without any
regularity. The main cause is that many measurements were not taken.
It can be assumed that the data set contains groups of interdependent variables
but a priori it is not known how many such groups there are, what kind of variables
are dependent, and in which way they are dependent. However, we intend to make
use of all possible forms of dependency to restore missing data, because the more
complete the observed data base is, the easier it should be to find explanations for
exceptional cases and, furthermore, the better the explanations should be. Even for
setting up the model the expert user should select those parameters as main factors
with only few missing data. So, the more data that are restored, the better the choice
for setting up the model can be.
A data analysis method is often assessed according to its tolerance to missing
data (e.g., in [18]). In principle, there are two main approaches to the missing data
problem. The first approach is a statistical restoration of missing data. Usually it is
based on non-missing data from other records.
The second approach suggests methods that accept the absence of some data. The
methods of this approach can be differently advanced, from simply excluding cases
with missing values up to rather sophisticated statistical models [17, 8].
Gediga and Düntsch [9] propose the use of CBR to restore missing data. Since
their approach does not require any external information, they call it a “non-invasive
imputation method”. Missing data are supposed to be replaced by their correspon-
dent values of the most similar retrieved cases. However, the dialysis data set con-
tains rather few patients, which means that the “most similar” case for a query case
might not be very similar at all.
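The nearest-neighbour idea behind such non-invasive imputation can be sketched as follows. This is a generic illustration, not the authors' implementation: the missing value is copied from the most similar complete record, with similarity computed only over the attributes both records share.

```python
def nn_impute(records, query, target):
    """Fill records[query][target] from the most similar record that has it.

    Similarity = mean absolute difference over shared non-missing attributes
    (smaller is more similar); None marks a missing value.
    """
    q = records[query]
    best, best_dist = None, float("inf")
    for i, r in enumerate(records):
        if i == query or r.get(target) is None:
            continue
        # Compare only on attributes present (non-missing) in both records.
        shared = [k for k in q
                  if k != target and q[k] is not None and r.get(k) is not None]
        if not shared:
            continue
        dist = sum(abs(q[k] - r[k]) for k in shared) / len(shared)
        if dist < best_dist:
            best, best_dist = r, dist
    return best[target] if best else None
```

The objection raised in the text is visible in the code: with few records, `best` may be far from the query case, yet its value is copied all the same.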
So, why don’t we just apply statistical methods? Statistical methods require
homogeneity of the sample. However, there are no reasons to expect the set of
dialysis patients to be a homogenous sample. Since the data consists of many pa-
rameters, sometimes missing values can be calculated or estimated from other pa-
rameter values. Furthermore, the number of cases in the data set is rather small,
whereas in general, statistical methods are more appropriate the larger the number
of cases.
There are three types of numerical solutions: exact, estimated, and binary. Some
examples and restoration formulas are shown in Table 3. All types of solutions are
demonstrated by examples in the next section.
When a missing value can be completely restored, it is called an exact solution.
Exact solutions are based on other parameters. A medical expert has defined them
as specific relations between parameters, using ISOR. As soon as they have been
used once, they are stored in the case base of ISOR and can be retrieved for further
cases.
Since estimated solutions are usually based on domain independent interpolation,
extrapolation, or regression methods, a medical expert is not involved. An estimated
solution is not considered as full reconstruction but just as an estimation.
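A domain-independent estimated solution might, for instance, linearly interpolate a missing measurement from the nearest known time points. This is an illustrative sketch; the estimation methods actually available in ISOR are not specified here:

```python
def interpolate_missing(series):
    """Fill None gaps by linear interpolation between the nearest known values.

    Values at the ends with no known neighbour on one side are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for i, v in enumerate(filled):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is not None and right is not None:
            frac = (i - left) / (right - left)
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled
```

Such a value is only an estimation: it assumes the parameter changes smoothly between measurements, which need not hold for a patient's laboratory data.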
4.2.2 Examples
The following three typical examples demonstrate how missing data are restored.
can of course be transformed in two other ways and so it can be applied to restore
values of PV and the weight of the patient. The formula contains specific medical
knowledge that was once given as a case solution by an expert.
In ISOR, cases are mainly used to explain further exceptional cases that do not fit
the initial model. One such secondary application is the restoration of missing data.
The solutions given by the medical expert are stored in the form of cases so that
they can be retrieved for solving further missing data cases. Such a case stored in
the case base has the following structure:
1. Name of the patient
2. Diagnosis
3. Therapy
4. Problem: missing value
5. Name of the parameter of the missing value
6. Measurement time point of the missing value
7. Formula of the solution (the description column of Table 3)
8. Reference to the internal implementation of the formula
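The eight-field case structure above could be rendered as a record type. The field names below are paraphrases of the list, not ISOR's internal schema, and the example formula is purely hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MissingValueCase:
    patient_name: str
    diagnosis: str
    therapy: str
    problem: str          # e.g. "missing value"
    parameter: str        # name of the parameter of the missing value
    time_point: str       # measurement time point of the missing value
    formula_text: str     # human-readable solution formula (Table 3 description)
    formula_impl: Callable  # reference to the internal implementation of the formula
```

Storing both the readable formula and a reference to its implementation is what lets a solution, once given by the expert, be retrieved and re-applied automatically to later missing-data cases.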
values are not available, there are three alternatives to proceed. First, to find an exact
solution formula where all required parameter values are available, second to find
an estimation formula, and third to attempt to restore the required values too. Since
for the third alternative there is the danger that this might lead to an endless loop,
this process can be manually stopped by pressing a button in the dialogue menu.
When for an estimated solution required values are missing, ISOR asks the expert.
The expert can suggest an exact or an estimated solution. Of course, such an
expert solution also has to be checked for the availability of the required data. How-
ever, the expert can even provide just a numerical solution, a value to replace the
missing data – with or without an explanation of this suggested value.
Furthermore, adaptation can be differentiated according to its domain depen-
dency. Domain dependent adaptation rules have to be provided by the expert and
they are only applicable to specific parameters. Domain independent adaptation uses
general mathematical formulae that can be applied to many parameters. Two or more
adaptation methods can be combined.
In ISOR a revision occurs. However, it is a rather simple one, not as sophisticated
as, for example, the theoretical one described by Lieber [16]. Here, it is
just an attempt to find better solutions. An exact solution is obviously better than an
estimated one. So, if a value has been restored by estimation and later on (for a later
case) the expert has provided an appropriate exact formula, this formula should be
applied to the former case too. Some estimation rules are better than others. So it
may happen that later on a more appropriate rule is incorporated in ISOR. In princi-
ple, the more new solution methods are included in ISOR, the more former already
restored values can be revised.
5 Results
At first, we undertook some experiments to assess the quality of our restoration
method, subsequently we attempted to restore the real missing data, and finally we
set up a new model for the original hypothesis that actively participating in the
fitness program improves the conditions of the patients.
Table 4 Summary of randomly deleted and restored values. Only restoration of the deleted
values was attempted, not of the really missing ones
Table 5 Closeness of the restored values. The numbers in brackets show the average
deviations in percent
To test the method, a random set of parameter values was deleted from the observed
data set. Subsequently, the method was applied in an attempt to restore the deleted
values, but not the ones actually missing.
Table 4 summarises how many deleted values could be restored. Since, for those
12 parameters that were only measured once and remain constant throughout, no
values were deleted (and none of them are really missing), they are not considered in
Table 4. More than half of the deleted values could be at least partly restored, nearly
a third of the deleted values could be completely restored, about 58% of restoration
occurred automatically. However, 39% of the deleted values could not be restored
at all. The main reasons are that for some parameters no proper method is available
and that specific additional parameter values are required that are sometimes also
missing.
Another question concerns the quality of the restoration, that is, how close
the restored values are to the real values. We have to distinguish between exact,
estimated and binary restored values. Just one of the 13 binary restored values was
wrong. However, this mainly shows the “quality” of the expert user, who proba-
bly was rather cautious and made binary assessments only when he/she felt very
sure. The deviation (percentage) between the restored values and the real ones is
shown in Table 5. Concerning the two exactly restored values with more than 5%
deviation, we consulted the expert user, who consequently altered one formula,
which had been applied for both values. For the estimated values, it is conspicu-
ous that for a few values the deviation is rather large. The probable reason is that
general estimation methods have problems in discovering underlying patterns in
certain cases. For example, with sequences such as 5, 7, 10 and so on, restoration
can be quite straightforward. On the other hand, sequences like 5, 10, 3 can prove
problematic.
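The contrast between the two sequences can be made concrete with a naive linear extrapolation from the first two points (a generic illustration, not one of ISOR's actual estimation rules):

```python
def linear_extrapolate(a, b):
    """Predict the next value assuming the step from a to b continues unchanged."""
    return b + (b - a)

# For 5, 7, 10 the linear guess for the third value is 9, close to the true 10.
# For 5, 10, 3 the linear guess is 15, far from the true value 3.
```

Any general-purpose estimator faces the same limitation: without domain knowledge it cannot tell a smooth trend from an erratic one.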
It is no surprise that more missing values could be restored (Table 6) than ran-
domly deleted ones (see Table 2). As all restoration methods rely on other parameter
values, the more parameter values you have, the better the chance of restoring miss-
ing values. In the experiment (Table 4) not just the randomly deleted values were
missing but also the real missing ones.
After this restoration we return to the original problem, namely to set up a model
for the hypothesis that actively participating in the fitness program improves the
conditions of the patients (see section 2.1). Since many missing values have been
restored, the expert user can choose other main factors to set up the model, includ-
ing ones where many data had been missing previously. In fact, the expert user now
chose a different third factor than before, namely PTH instead of WorkJ. The result-
ing strongest model is shown in Table 7.
Patient’s physical condition   Active   Non-active   Fisher Exact p
Better                           39          1        < 0.0001
Worse                            11         21
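The reported significance of the table above can be reproduced with a small self-contained implementation of Fisher's exact test (two-sided, summing the hypergeometric probabilities of all tables at least as extreme as the observed one):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def p(x):  # hypergeometric probability of a table with top-left cell x
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables with the same margins whose probability
    # does not exceed that of the observed table.
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-9))

# Counts from the table: better/worse vs. active/non-active.
p_value = fisher_exact_2x2(39, 1, 11, 21)
```

With these counts the p-value falls well below 0.0001, matching the significance reported in the table.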
The result is obviously much better than the previous model (see Table 1 in
section 2.1). However, since the missing data problem is not responsible for all
exceptional cases, for this model some (eleven) exceptional cases still have to be
explained.
6 Conclusion
In this chapter, it has been proposed to use CBR in ISOR to explain cases that do
not fit a statistical model. Here one of the simplest statistical models was presented.
However, it is relatively effective, because it demonstrates statistically significant
dependencies. In our example, relating fitness activity to health improvement for
dialysis patients, the model covers about two thirds of the patients, whereas the
other third can be explained by applying CBR.
Since binary qualitative assessments (better or worse) were chosen, very small
changes appear identical to very large ones. As a future step, it is intended to define
these concepts more precisely, especially by introducing more assessment categories.
The presented method makes use of different sources of knowledge and informa-
tion, including medical experts. This approach seems to be a very promising method
to deal with a poorly structured database, with many missing data, and with situa-
tions where cases contain different sets of attributes.
Additionally, a method to restore missing values was developed. This method
combines general domain independent techniques with expert knowledge, which is
delivered as formulae for specific situations (treated as cases) and can be used for
later similar situations too. The expert knowledge is gained within a conversational
process between the medical expert, ISOR, and the system developer. Since the time
of the expert is valuable, he/she is only consulted when absolutely necessary.
In ISOR, all main CBR steps are performed: retrieval, adaptation, and revision.
Retrieval (of usually a list of solutions) occurs with the help of keywords. Adap-
tation (just like part of the restoration process of missing data) is an interactive
process between ISOR, a medical expert, and the system developer. In contrast to
many CBR systems, in ISOR revision plays an important role. The whole knowl-
edge is contained in the case base, namely as solutions of former cases. No further
knowledge base is required.
In principle, the active incorporation of a medical expert into the decision making
process seems to be a promising idea. Already in our previous work [26], a success-
ful Case-Based Reasoning system was developed that performed a dialog with a
medical expert user to investigate therapy inefficacy.
Since CBR seems to be appropriate for medical applications (see section 2.1)
and many medical CBR systems have already been developed, it makes sense
to combine both ideas, namely to build systems that are both case-oriented and
dialog-oriented.
Smirnov, director of the Institute for Nephrology of St-Petersburg Medical University and
Natalia Korosteleva, researcher at the same Institute, for collecting and managing the data.
References
1. Aamodt, A., Plaza, E.: Case-Based Reasoning: Foundational issues, methodological
variations, and system approaches. AI Commun. 7(1), 39–59 (1994)
2. Aha, D.W., McSherry, D., Yang, Q.: Advances in conversational Case-Based Reasoning.
Knowledge Engineering Review 20, 247–254 (2005)
3. Arshadi, N., Jurisica, I.: Data Mining for Case-based Reasoning in high-dimensional bi-
ological domains. IEEE Transactions on Knowledge and Data Engineering 17(8), 1127–
1137 (2005)
4. Bichindaritz, I., Kansu, E., Sullivan, K.M.: Case-based Reasoning in Care-Partner. In:
Smyth, B., Cunningham, P. (eds.) EWCBR 1998. LNCS, vol. 1488, pp. 334–379.
Springer, Heidelberg (1998)
5. Broder, A.: Strategies for efficient incremental nearest neighbor search. Pattern Recog-
nition 23, 171–178 (1990)
6. Corchado, J.M., Corchado, E.S., Aiken, J., Fyfe, C., Fernandez, F., Gonzalez, M.: Max-
imum likelihood Hebbian learning based retrieval method for CBR systems. In: Ashley,
K.D., Bridge, D.G. (eds.) ICCBR 2003. LNCS, vol. 2689, pp. 107–121. Springer, Hei-
delberg (2003)
7. Davidson, A.M., Cameron, J.S., Grünfeld, J.-P. (eds.): Oxford Textbook of Nephrology,
vol. 3. Oxford University Press, Oxford (2005)
8. Fleiss, J.: The design and analysis of clinical experiments. John Wiley & Sons, Chich-
ester (1986)
9. Gediga, G., Düntsch, I.: Maximum Consistency of Incomplete Data via Non-Invasive
Imputation. Artificial Intelligence Review 19(1), 93–107 (2003)
10. Gierl, L.: Klassifikationsverfahren in Expertensystemen für die Medizin. Mellen Univ.
Press, Lewiston (1992)
11. Gierl, L., Bull, M., Schmidt, R.: CBR in Medicine. In: Lenz, M., Bartsch-Spörl, B.,
Burkhard, H.-D., Wess, S. (eds.) Case-Based Reasoning Technology. LNCS, vol. 1400,
pp. 273–297. Springer, Heidelberg (1998)
12. Hai, G.A.: Logic of diagnostic and decision making in clinical medicine. Politheknica
publishing, St. Petersburg (2002)
13. Kendall, M.G., Stuart, A.: The advanced theory of statistics, 4th edn. Macmillan pub-
lishing, New York (1979)
14. Kolodner, J.: Case-Based Reasoning. Morgan Kaufmann Publishers, San Mateo (1989)
15. Lenz, M., Auriol, E., Manago, M.: Diagnosis and decision support. In: Lenz, M.,
Bartsch-Spörl, B., Burkhard, H.-D., Wess, S. (eds.) Case-Based Reasoning Technology.
LNCS, vol. 1400, pp. 51–90. Springer, Heidelberg (1998)
16. Lieber, J.: Application of the revision theory to adaptation in Case-Based Reasoning:
The conservative adaptation. In: Weber, R.O., Richter, M.M. (eds.) ICCBR 2007. LNCS,
vol. 4626, pp. 239–253. Springer, Heidelberg (2007)
17. Little, R., Rubin, D.: Statistical analysis with missing data. John Wiley & Sons, Chich-
ester (1987)
18. McSherry, D.: Interactive Case-Based Reasoning in sequential diagnosis. Applied Intel-
ligence 14(1), 65–76 (2001)
19. Montani, S., Magni, P., Bellazzi, R., Larizza, C., Roudsari, A., Carson, E.R.: Integrating
model-based decision support in a multi-modal reasoning system for managing type
1 diabetic patients. Artificial Intelligence in Medicine 29(1-2), 131–151 (2003)
20. Nilsson, M., Sollenborn, N.: Advancements and trends in medical case-based reasoning:
An overview of systems and system developments. In: Proceedings Seventeenth Interna-
tional Florida Artificial Intelligence Research Society Conference, pp. 178–183. AAAI
Press, Menlo Park (2004)
21. Perner, P. (ed.): Case-Based Reasoning on Images and Signals. Springer, Berlin (2007)
22. Prentzas, J., Hatzilgeroudis, I.: Integrating Hybrid Rule-Based with Case-Based Rea-
soning. In: Craw, S., Preece, A.D. (eds.) ECCBR 2002. LNCS, vol. 2416, pp. 336–349.
Springer, Heidelberg (2002)
23. Rezvani, S., Prasad, G.: A hybrid system with multivariate data validation and Case-
based Reasoning for an efficient and realistic product formulation. In: Ashley, K.D.,
Bridge, D.G. (eds.) ICCBR 2003, vol. 2689, pp. 465–478. Springer, Heidelberg (2003)
24. Schank, R.C., Leake, D.B.: Creativity and learning in a case-based explainer. Artificial
Intelligence 40, 353–385 (1989)
25. Schmidt, R., Montani, S., Bellazzi, R., Portinale, L., Gierl, L.: Case-Based Reasoning for
Medical Knowledge-based Systems. International Journal of Medical Informatics 64(2-
3), 355–367 (2001)
26. Schmidt, R., Vorobieva, O.: Case-Based Reasoning Investigation of Therapy Inefficacy.
Knowledge-Based Systems 19(5), 333–340 (2006)
27. Shuguang, L., Qing, J., George, C.: Combining case-based and model-based reasoning:
a formal specification. In: Proc. APSEC 2000, p. 416 (2000)
28. Stottler, R.H., Henke, A.L., King, J.A.: Rapid retrieval algorithms for case-based reason-
ing. In: Proc of 11th Int Joint Conference on Artificial Intelligence, pp. 233–237. Morgan
Kaufmann, San Mateo (1989)
29. Wilke, W., Smyth, B., Cunningham, P.: Using Configuration Techniques for Adaptation.
In: Lenz, M., Bartsch-Spörl, B., Burkhard, H.-D., Wess, S. (eds.) Case-Based Reasoning
Technology. LNCS, vol. 1400, pp. 139–168. Springer, Heidelberg (1998)
Collaborative and Experience-Consistent
Schemes of System Modelling in Computational
Intelligence
Witold Pedrycz
C.L. Mumford and L.C. Jain (Eds.): Computational Intelligence, ISRL 1, pp. 697–723.
springerlink.com
c Springer-Verlag Berlin Heidelberg 2009
1 Introductory Comments
Complex phenomena such as those encountered in human-centric systems, in which the human factor is predominant, are highly multifaceted. This concerns a panoply of economic and social systems which can be looked at and comprehended from various perspectives (points of view) and levels of abstraction. These phenomena are distributed and generate significant masses of data which become available locally, with eventual restrictions on their availability on a global basis. The global economy as a system is not described by a single model; rather, its holistic view is formed by studying its behavior at more local and individually selected levels where building a model is more feasible. Afterwards, through interaction between the models and reconciliation of their findings, a general global model is sought. This distributed and collaborative way of global model building is an interesting tendency worth adopting in the current practice of system modeling. Distributivity of the systems results from the existence of locally available data. Collaborative interaction supports a coherent formation of findings and facilitates the reconciliation of differences and the reinforcement of commonalities (general findings).
There are two fundamental dimensions of the overview perspective established for the phenomena under consideration (refer to Figure 1). First, a certain perspective arises on the basis of some features (attributes) of the phenomenon. Different subsets of features could offer complementary and equally relevant views of the system under discussion. The individual subsets of features could be disjoint; they can also overlap. Each of these subsets gives rise to various models describing the system from a different individual standpoint. These models, when combined together, are helpful in forming a global and comprehensive view of the phenomenon. The second coordinate is associated with the concept of cognitive perspective [8, 9, 10]: a suitable cognitive perspective is established by choosing the most promising level of information granularity to be captured within the developed model.
The collaboration predominantly occurs at the level of information granules [14, 15] which can be represented as fuzzy sets, sets, rough sets and others. There are two interesting scenarios of substantial generality which will be discussed in detail. In the first one, whose essence is outlined in Figure 2(a), the development (reconciliation) of the information granules is realized by running any algorithm of information granulation which processes the locally available data while taking into
Fig. 1 Two fundamental dimensions of system modeling: through collections of features (at-
tributes, variables) and by admitting a variable level of granularity and establishing a suitable
cognitive perspective. Note two models (depicted as squares) are formed by considering spe-
cific subsets of features and selected information granularity
$y = f_i(x, a_i)$, $i = 1, 2, \ldots, c$, where $A_i$ are fuzzy sets defined in the multidimensional input space and $f_i$ denotes a local model endowed with some parameters ($a_i$). What if we encounter individual data sites D[1], D[2], . . . , D[P] for which such models have to be constructed? Not only do they have to be formed on the basis of locally available data D[ii], ii = 1, 2, . . . , P, but they should collaborate, exchange their findings and reconcile eventual differences. In other words, the communication involves knowledge instead of data. In a communication scheme of this nature we witness an effect of knowledge sharing. Formally, the underlying knowledge being shared between the individual sites can be represented as K[ii]. For instance, for rule-based systems, K[ii] = {A_i[ii], i = 1, 2, . . . , c}, where A_i[ii] are the information granules (fuzzy sets) formed at D[ii]. In this way, the knowledge of these fuzzy sets is communicated to the other data sites. We may have another format of K[ii], being a more comprehensive version of knowledge sharing which concerns both the information granules and the local models, that is K[ii] = {A_i[ii], f_i[ii], a_i[ii]}.
The study brings forward a number of developments which form a conceptual
and algorithmic framework of collaborative Computational Intelligence. In Section
2, we present the fundamentals of collaborative clustering where we show how in-
formation granules – fuzzy sets – emerge as a result of knowledge sharing. Then
in Sections 3 and 4 we present the algorithmic aspects of the method by showing
a general flow of computing and the pertinent computing details. Hierarchies of
clusters are introduced in Section 5. Experience-consistent fuzzy modeling brings along a new concept; it is presented in the context of rule-based fuzzy models and neural networks. Section 6 is devoted to experience-consistent fuzzy models whose development embraces experimental data and some previous experience, namely knowledge hints coming in the form of previous models. In the sequel, the concept
of experience consistency is further discussed in application to the design of radial
basis function neural networks (Section 7). Conclusions are presented in Section 8.
To focus our discussion and present the algorithmic setup in a tangible fashion, we consider information granules constructed through fuzzy clustering, Fuzzy C-Means (FCM) in particular [1, 2, 3, 4, 11, 13]. We assume that all data sites D[1], D[2], . . . , D[P] comprise data positioned in the same n-dimensional feature space R^n.
2 Collaborative Clustering
The communication of knowledge involves a structure K[ii] which embraces a col-
lection of information granules – fuzzy clusters. Considering that such clusters have
been constructed with the use of the FCM algorithm, they are fully characterized
in terms of prototypes and partition matrices. As a matter of fact, these two characterizations are equivalent in the sense that for the given data {x_1, x_2, . . . , x_N} the prototypes are expressed by means of the partition matrix, while the entries of the partition matrix are computed from the knowledge of the prototypes.
The prototypes and the partition matrices are the two possible communication vehicles between the data sites. Given that the data sites concern different data sets, the prototypes communicated from site jj are used at site ii to induce a partition matrix Ũ[ii|jj] over the local data D[ii].
Fig. 3 Data sites and communication realized through passing prototypes and the consecutive
generation of the induced partition matrices U∼ [ii|jj]
Its entries ũ_ik[ii|jj], i = 1, 2, . . . , c; k = 1, 2, . . . , N[ii], x_k ∈ D[ii], are computed in the standard FCM manner from the communicated prototypes. Refer also to Figure 3, which highlights the essence of this mechanism of collaboration by visualizing the way in which the communication links are established.
Proceeding with all other data sites, D[1], . . . , D[ii−1], D[ii+1], . . . , D[P], we end up with P−1 induced partition matrices, Ũ[ii|1], Ũ[ii|2], . . . , Ũ[ii|ii−1], Ũ[ii|ii+1], . . . , Ũ[ii|P]. The minimization of the differences between U[ii] and Ũ[ii|jj] is used to establish the collaborative activities occurring between the data sites. At the ii-th site, the clustering is guided by the augmented objective function assuming the following form
$$Q[ii] = \sum_{k=1}^{N[ii]}\sum_{i=1}^{c} u_{ik}^2[ii]\,\|x_k - v_i\|^2 \;+\; \beta \sum_{\substack{jj=1\\ jj\neq ii}}^{P}\sum_{k=1}^{N[ii]}\sum_{i=1}^{c}\bigl(u_{ik}[ii]-\tilde u_{ik}[ii|jj]\bigr)^2 d_{ik}^2 \tag{2}$$
Given a finite number of disjoint data sites with patterns defined in the same
feature space, develop a scheme of collective development and reconciliation
of a fundamental cluster structure across the sites that is based upon exchange and communication of local findings, where the communication needs
to be realized at some level of information granularity. The development of
the structures at the local level exploits the communicated findings in an active
manner through minimization of the corresponding objective function aug-
mented by the structural findings developed outside the individual data site.
We also allow for retention of key individual (specific) findings that are es-
sential (unique) for the corresponding data site.
Fig. 4 The essence of collaborative clustering in which we aim at striking a sound balance
between local findings (produced at the level of locally available data) and the findings com-
ing from other data sites (sensors) building some global characterization of data. Shown are
only communication links between data site D[ii] and all other data sites
Initial phase: Carry out clustering (FCM) for each data site, producing a collection of prototypes {v_i[ii]}, i = 1, 2, . . . , c, for each data site.
Collaboration
Iterate {successive phases of collaboration}
Communicate the results about the structure determined at each data site.
For each data site (ii)
{
Minimize performance index (2) at each data site by iteratively proceeding with
the calculations of the partition matrix and the prototypes, that is
$$u_{rs}[ii] = \frac{1}{\displaystyle\sum_{j=1}^{c}\frac{d_{rs}^2}{d_{js}^2}}\left[1-\frac{\beta\displaystyle\sum_{\substack{jj=1\\ jj\neq ii}}^{P}\sum_{j=1}^{c}\tilde u_{js}[ii|jj]}{1+\beta(P-1)}\right] + \frac{\beta\displaystyle\sum_{\substack{jj=1\\ jj\neq ii}}^{P}\tilde u_{rs}[ii|jj]}{1+\beta(P-1)} \tag{3}$$
and
$$v_{rt}[ii] = \frac{\displaystyle\sum_{k=1}^{N[ii]} u_{rk}^2[ii]\,x_{kt} + \beta\sum_{\substack{jj=1\\ jj\neq ii}}^{P}\sum_{k=1}^{N[ii]}\bigl(u_{rk}[ii]-\tilde u_{rk}[ii|jj]\bigr)^2 x_{kt}}{\displaystyle\sum_{k=1}^{N[ii]} u_{rk}^2[ii] + \beta\sum_{\substack{jj=1\\ jj\neq ii}}^{P}\sum_{k=1}^{N[ii]}\bigl(u_{rk}[ii]-\tilde u_{rk}[ii|jj]\bigr)^2} \tag{4}$$
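The alternating updates (3) and (4) transcribe directly into code. The sketch below assumes the fuzzification exponent 2 of objective (2); `update_partition` and `update_prototypes` are names introduced here, and the induced partition matrices are supplied as a list.

```python
import numpy as np

def update_partition(X, V, U_induced, beta):
    """Eq. (3): partition-matrix update at site ii.  X is N x n local
    data, V the c x n prototypes, U_induced a list of the P-1 induced
    partition matrices u~[ii|jj] (each c x N, columns summing to 1)."""
    P1 = len(U_induced)                                   # P - 1
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(-1) + 1e-12    # c x N
    ratio = (d2[:, None, :] / d2[None, :, :]).sum(axis=1)          # sum_j d_rs^2/d_js^2
    S = sum(U_induced)                                    # sum over jj != ii
    denom = 1.0 + beta * P1
    bracket = 1.0 - beta * S.sum(axis=0) / denom          # bracketed term of (3)
    return bracket / ratio + beta * S / denom

def update_prototypes(X, U, U_induced, beta):
    """Eq. (4): prototype update blending local memberships with the
    disagreement against the induced partition matrices."""
    num = (U ** 2) @ X
    den = (U ** 2).sum(axis=1, keepdims=True)
    for Ut in U_induced:
        W = (U - Ut) ** 2
        num = num + beta * (W @ X)
        den = den + beta * W.sum(axis=1, keepdims=True)
    return num / den
```

A quick sanity check of (3) is that the resulting membership columns still sum to one, which holds whenever each induced partition matrix is properly normalized.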
It is worth noting that the proximity matrix is more abstract in this form than the
original partition matrix it is based upon. It “abstracts” the clusters themselves and
this is what we really need in this construct. Given the proximity matrix, we cannot
“retrieve” the original entries of the partition matrix it was generated from.
Let us consider now the ii-th data site with its partition matrix U[ii] and the induced partition matrices Ũ[ii|jj], jj = 1, 2, . . . , ii−1, ii+1, . . . , P. We quantify the consistency between the structure revealed at the ii-th data site and those existing at the remaining sites by computing the following expression
$$W[ii] = \frac{1}{N^2[ii]/2}\sum_{\substack{jj=1\\ jj\neq ii}}^{P}\bigl\|\,\mathrm{Prox}(U[ii]) - \mathrm{Prox}(\tilde U[ii|jj])\,\bigr\| \tag{6}$$
More specifically, we consider that the distance between the corresponding prox-
imity matrices is realized in the form of the Hamming distance. In other words, we
have
$$\bigl\|\,\mathrm{Prox}(U[ii]) - \mathrm{Prox}(\tilde U[ii|jj])\,\bigr\| = \sum_{k_1=1}^{N[ii]}\sum_{k_2=1}^{N[ii]}\bigl|\,\mathrm{prox}(k_1,k_2)[ii] - \mathrm{prox}^{\sim}(k_1,k_2)[ii|jj]\,\bigr|$$
where prox(k1, k2)[ii] denotes the (k1, k2)-entry of the proximity matrix of U[ii]. Similarly, prox~(k1, k2)[ii|jj] is the corresponding (k1, k2)-entry of the proximity matrix produced by the induced partition matrix Ũ[ii|jj]. In a nutshell, rather than working at the level of comparing the individual partition matrices (which requires knowledge of the explicit correspondence between the rows of the partition matrices), we generate their corresponding proximity matrices, which allows us to carry out the comparison at this more abstract level. Next, summing up the values of W[ii] over all data sites, we arrive at the global level of consistency W of the structure discovered collectively through the collaboration.
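Because the explicit proximity formula falls outside this excerpt, the sketch below adopts the form commonly used in Pedrycz's work, prox(k1, k2) = Σ_i min(u_ik1, u_ik2), together with the entrywise (Hamming) distance of Eq. (6); treat both choices as assumptions of the illustration.

```python
import numpy as np

def proximity(U):
    """Proximity matrix of a partition matrix U (c x N):
    prox(k1, k2) = sum_i min(u_ik1, u_ik2).  The result no longer
    refers to cluster labels, so no row correspondence is needed."""
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

def consistency_W(U, U_induced):
    """W[ii] of Eq. (6): normalized sum of Hamming distances between
    the local proximity matrix and the induced ones."""
    N = U.shape[1]
    P0 = proximity(U)
    total = sum(np.abs(P0 - proximity(Ut)).sum() for Ut in U_induced)
    return total / (N * N / 2.0)
```

As the text stresses, permuting the rows (clusters) of U leaves its proximity matrix unchanged, which is precisely why the comparison can bypass the row-correspondence problem.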
The lower the value of W, the higher the consistency between the "P" structures. Likewise, the values of W reported during the successive phases of the collaboration can serve as a sound indicator of the progress and quality of the collaborative process, and as a suitable termination criterion. In particular, when tracing the successive values of W, one could stop the collaboration once no further changes in the values of W are reported. The above consistency measure is also essential when gauging the intensity of collaboration and adjusting its level through changes of β. Let us recall that this parameter shows up in the minimized objective function and expresses how much the other data sites impact the formation of the clusters at the given site. Higher values of β imply stronger collaborative linkages established between the sites. By reporting the values of W treated as a function of β, that is W = W(β), we can experimentally optimize the intensity of collaboration. One may anticipate that while for low values of β no collaboration occurs and the values of W tend to be high, large values of β might lead to competition and, subsequently, the values of W(β) may again tend to be high. Under some conditions, no convergence of the collaboration process could be reported. There might be some regions of optimal
Fig. 5 Computation of a membership function of a fuzzy set of type-2; note that in order to
maximize the performance index, we rotate the linear segment of the membership function
around the modal value of the fuzzy set. Small dark boxes denote available experimental data.
The same estimation procedure applies to the right-hand side of the fuzzy set
(b) Simultaneously, we would like to make the fuzzy set as specific as possible so that it comes with some well-defined semantics. This requirement is met by making the support of A as small as possible, that is, by minimizing |u − a|.
To accommodate the two conflicting requirements, we combine the two constraints (a) and (b) into a single scalar index which in turn becomes maximized. Two alternatives could be sought, say
$$\max_{a\neq u}\;\frac{\sum_i A(z_i)}{|u-a|} \tag{9}$$
or
$$\sum_i \bigl(1-A(z_i)\bigr)(u-a) \tag{10}$$
The linearly decreasing portion of the membership function positioned at the right-hand side of the modal value (u) is optimized in the same manner. We exclude the trivial solution a = u, in which case the fuzzy set of type-2 collapses to a type-1 fuzzy set (with numeric values of the membership function). We use this construct in the formation of granular prototypes and fuzzy sets of type-2.
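Criterion (9) can be explored numerically. The discretized grid search below and the linear left flank A(z) = (z − a)/(u − a) on [a, u] are assumptions of this sketch, not the chapter's prescribed procedure.

```python
import numpy as np

def left_spread(z, u, n_grid=400):
    """Maximize Eq. (9), sum_i A(z_i) / |u - a|, over candidate lower
    bounds a < u of a triangular membership function with modal value u.
    Larger |u - a| covers more data; smaller |u - a| is more specific."""
    z = np.asarray(z, dtype=float)
    candidates = np.linspace(z.min() - 1e-9, u - 1e-6, n_grid)
    best_val, best_a = -np.inf, None
    for a in candidates:
        A = np.clip((z - a) / (u - a), 0.0, 1.0)   # linear left flank
        val = A.sum() / abs(u - a)                 # coverage / support length
        if val > best_val:
            best_val, best_a = val, a
    return best_a
```

The same search, mirrored around u, serves for the right-hand spread mentioned in the text.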
The choice of the number of clusters at each data site is beyond this study as this
topic is well covered in the existing literature and supported by various algorithmic
means including an extensive suite of cluster validity indexes. Given this, the algo-
rithmic settings discussed so far have to be augmented. The major step would be
to present information granules at each data site at the level of granularity that has
been accepted before collaboration. There are several possible ways of doing this.
Here we consider the one which uses clustering of the prototypes. Consider the ii-th data site. Before each phase of collaboration, we cluster the prototypes of this data site, {v_i[ii]}, i = 1, 2, . . . , c[ii], together with the prototypes communicated from all remaining data sites, {v_i[jj]}, i = 1, 2, . . . , c[jj], jj = 1, 2, . . . , ii−1, ii+1, . . . , P. The number of clusters is the same as the number of clusters at this data site. The results are denoted by ṽ_i, i = 1, 2, . . . , c[ii]. These new prototypes are used in the next steps of the collaborative clustering. More specifically, the minimized objective function comes in the form
$$Q[ii] = \sum_{i,k} u_{ik}^2[ii]\,\|x_k - v_i[ii]\|^2 + \beta\sum_{i=1}^{c[ii]}\sum_{k} u_{ik}^2[ii]\,\|v_i[ii]-\tilde v_i[ii]\|^2 \tag{11}$$
Initial phase
Carry out clustering (FCM) for each data site, producing a collection of prototypes {v_i[ii]}, i = 1, 2, . . . , c[ii], for each data site.
Collaboration
Iterate {successive phases of collaboration}
Communicate the results about the structure determined at each data site.
For each data site (ii)
{
Collect all prototypes from the other sites at data site (ii) and run FCM on that collection of prototypes, selecting the same number of clusters as at that site, to generate the new prototypes ṽ[ii]. Minimize the performance index (11) at each data site by iteratively proceeding with the calculations of the partition matrix and the prototypes, that is
$$u_{rs}[ii] = \frac{1}{\displaystyle\sum_{j=1}^{c[ii]}\frac{\|x_s - v_r[ii]\|^2 + \beta\,\|v_r[ii]-\tilde v_r[ii]\|^2}{\|x_s - v_j[ii]\|^2 + \beta\,\|v_j[ii]-\tilde v_j[ii]\|^2}} \tag{12}$$
and
$$v_{rt}[ii] = \frac{\displaystyle\sum_{k=1}^{N} u_{rk}^2[ii]\,x_{kt} + \beta\sum_{k=1}^{N} u_{rk}^2[ii]\,\tilde v_{rt}[ii]}{(1+\beta)\displaystyle\sum_{k=1}^{N} u_{rk}^2[ii]} \tag{13}$$
clusters. In the case when cc = c[1] + c[2] + . . . + c[P] there is no interaction at all (each prototype retains its identity) and the values of γ_i(U)[ii] are all equal to 1, not affecting the form of the objective function and thus not changing the prototypes. The strength of the structural interaction, controlled by the number of clusters "cc", may affect the dynamics of collaboration, with the likelihood that its lower values, associated with stronger collaboration, may imply eventual instability.
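One sweep of the updates (12) and (13) can be written compactly. In this sketch the reconciled prototypes ṽ are assumed to be given, for instance produced by clustering the pooled prototypes as described above; the function name is introduced here.

```python
import numpy as np

def prototype_level_iteration(X, V, V_tilde, beta):
    """One sweep of Eqs. (12)-(13).  The augmented distance
    ||x_s - v_r||^2 + beta * ||v_r - v~_r||^2 drives the memberships;
    the prototype update is then pulled toward the reconciled
    prototypes v~ with strength beta."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) \
         + beta * ((V - V_tilde) ** 2).sum(-1)[:, None] + 1e-12    # c x N
    U = 1.0 / (d2[:, None, :] / d2[None, :, :]).sum(axis=1)        # Eq. (12)
    Um = U ** 2
    s = Um.sum(axis=1, keepdims=True)
    V_new = ((Um @ X) + beta * s * V_tilde) / ((1.0 + beta) * s)   # Eq. (13)
    return V_new, U
```

For β = 0 the sweep collapses to a plain FCM iteration, while for large β the prototypes are driven onto the reconciled ones, mirroring the collaboration-intensity discussion above.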
Given some experimental data, construct a model which is consistent with the findings (models) produced for some previously available data. Owing to existing requirements such as data privacy or data security, as well as some other technical limitations, access to these previous data is not available. Instead, we can take advantage of the knowledge coming in the form of the parameters of the existing models.
realized. For instance, it is common that the currently available data are quite limited in size (which implies limited evidence in the data set), while the previously available data sets could be substantially larger, meaning that relying on the models formed in the past could be beneficial for the development of the current model. There is also another reason for which the experience-driven component plays a pivotal role: the data set D could be quite small and affected by a high level of noise, in which case it becomes highly legitimate to seriously consider any additional experimental evidence available.
In the realization of consistency-oriented modeling, we consider the following scenario. Given is a data set D using which we intend to construct a fuzzy rule-based model. There is also a collection of data sets D_1, D_2, . . . , D_P, for each of which an individual fuzzy model has been developed. Those local models are available when seeking consistency with the fuzzy models formed for D_ii, ii = 1, 2, . . . , P. At the same time, it is worth stressing that the data sets themselves are not available to any processing and modeling realized at the level of D.
The underlying architectural details of the rule-based model considered in this
study are as follows. For each data site D and Dii , we consider the rules with local
regression models assuming the form
Data D:
$$\text{if } x \text{ is } B_i \text{ then } y = a_i^T x \tag{15}$$
where x ∈ R^{n+1} and B_i are fuzzy sets defined in the n-dimensional input space, i = 1, 2, . . . , c. The local regression model standing in the i-th rule is a linear regression function described by a certain vector of parameters a_i. More specifically, the n-dimensional vector of the original input variables is augmented by a constant input, so we have x = [x_1 x_2 . . . x_n 1]^T and a = [a_1 a_2 . . . a_n a_0]^T, where a_0 stands for a bias term that translates the original hyperplane.
The same number of rules (c) is encountered at all other data sites, D_1, D_2, . . . , D_P. The format of the rules is the same as for D; that is, for the ii-th data site D_ii we have
$$\text{if } x \text{ is } B_i[ii] \text{ then } y = a_i[ii]^T x \tag{16}$$
As before the fuzzy sets in the condition part of the i-th rule are denoted by Bi [ii]
while the parameters of the local model are denoted by ai [ii]. The index in the square
brackets refers to the specific data site, that is Dii for ai [ii].
Alluding to the format of the data at D, it comes in the form of input-output pairs (x_k, y_k), k = 1, 2, . . . , N, which are used to carry out learning in a supervised mode. The previously collected data sets, denoted by D_1, D_2, . . . , D_P, consist of N_1, N_2, . . . , N_P data points. We assume that, due to some technical and non-technical reasons, the data available at D_j cannot be shared with D. However, the communication between the data sites can be realized at a higher conceptual level, such as one involving the parameters of the fuzzy models.
For given fuzzy sets of conditions, the determination of the parameters of the linear
models is standard and well documented in the literature. Considering the form of
the rule-based system, the output of the fuzzy model is determined as a weighted
combination of the local models with the weights being the levels of activation of
the individual rules. More specifically we have
$$\hat y_k = \sum_{i=1}^{c} u_i(x_k)\,a_i^T x_k \tag{18}$$
where u_ik = u_i(x_k) is the membership degree of the k-th datum x_k in the i-th cluster, computed on the basis of the already determined prototypes in the input space. In a nutshell, Equation (18) is a convex combination of the local models, which aggregates them by taking advantage of weight factors expressing the contribution of each model based upon the activation reported in the input space.
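Equation (18) reads directly as code. In this sketch, the FCM-style activation with fuzzification exponent m and the augmented input [x, 1] follow the conventions stated above, while the function name is introduced here.

```python
import numpy as np

def fuzzy_model_output(X, V, A, m=2.0):
    """Eq. (18): y_hat_k = sum_i u_i(x_k) a_i^T x_k.  V holds the c
    prototypes in the n-dimensional input space; A is c x (n+1), each
    row a_i = [a_1 ... a_n a_0] acting on the augmented input [x, 1]."""
    Xa = np.hstack([X, np.ones((len(X), 1))])                     # bias input
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12   # N x c
    U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1/(m-1))).sum(axis=2)
    return (U * (Xa @ A.T)).sum(axis=1)
```

With a single rule the activation degenerates to 1 and the model reduces to one linear regression, a convenient degenerate case for checking an implementation.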
The essence of the consistency-driven modeling is to form the local regression models occurring in the conclusions of the rules on the basis of data D while at the same time making the model perform in a manner consistent (viz. close enough) with the rule-based models formed for the respective D_j's. The following performance index strikes a sound balance between the model formed exclusively on the basis of data D and the consistency of the model with the results produced by the models formed on the basis of the other data sites, where FM[j](x_k) denotes the output of the j-th such model for input x_k:
$$V = \sum_{(x_k,\,y_k)\in D}\bigl(FM(x_k)-y_k\bigr)^2 + \alpha\sum_{j=1}^{P}\sum_{x_k\in D}\bigl(FM(x_k)-FM[j](x_k)\bigr)^2 \tag{19}$$
The induced membership values w_i(x_k)[j] are computed in the standard manner encountered when running the FCM algorithm, that is
$$w_i(x_k)[j] = \frac{1}{\displaystyle\sum_{l=1}^{c}\left(\frac{\|x_k - v_i[j]\|}{\|x_k - v_l[j]\|}\right)^{2/(m-1)}} \tag{20}$$
The transferred parameters of the local models obtained at the j-th data site produce
the output of the model FM[j](xk ) obtained at D as a weighted sum of the form
$$FM[j](x_k) = \sum_{i=1}^{c} w_i(x_k)[j]\,a_i[j]^T x_k \tag{21}$$
where xk ∈D.
The minimization of the performance index V for some predefined value of α leads to the optimal vectors of parameters of the linear models a_i(opt), i = 1, 2, . . . , c, which reflects the process of satisfying the consistency constraints.
After some algebra, the final result comes in the form
$$a_{opt} = \frac{1}{\alpha P + 1}\,\hat X^{\#}\bigl(y + \alpha y_1 + \alpha y_2 + \dots + \alpha y_P\bigr) \tag{22}$$
where y_i is the vector of outputs of the i-th fuzzy model (formed on the basis of D_i), whose corresponding coordinate is the output obtained for the corresponding input, that is
$$y_i = \begin{bmatrix} FM[i](x_1) \\ FM[i](x_2) \\ \vdots \\ FM[i](x_N) \end{bmatrix}$$
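The closed form (22) amounts to a single pseudoinverse. In the sketch below, `X_hat` stands for the activation-weighted design matrix X̂ whose exact construction precedes this excerpt, so its assembly is treated as given; the function name is introduced here.

```python
import numpy as np

def consistent_parameters(X_hat, y, Y_prev, alpha):
    """Eq. (22): a_opt = X^# (y + alpha*(y_1 + ... + y_P)) / (alpha*P + 1),
    with X^# the pseudoinverse of the design matrix and y_j the outputs
    of the previously built fuzzy models evaluated on the inputs of D."""
    P = len(Y_prev)
    rhs = y + alpha * sum(Y_prev)
    return np.linalg.pinv(X_hat) @ rhs / (alpha * P + 1.0)
```

Setting α = 0 recovers ordinary least squares on D alone, while letting α grow shifts the fit toward reproducing the outputs of the previous models, which matches the role of α in the performance index (19).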
Fig. 7 A quantification of the global behavior of the consistency – based fuzzy model
The optimization scheme in Equation (19), along with its evaluation mechanisms governed by Equation (24), can be generalized by admitting various levels of impact each data set D_i might have in the process of achieving consistency. To do so, we introduce some positive weights w_i, i = 1, 2, . . . , P, which are afterwards used in the performance index
$$V = \sum_{(x_k,\,y_k)\in D}\bigl(FM(x_k)-y_k\bigr)^2 + \alpha\sum_{j=1}^{P} w_j \sum_{x_k\in D}\bigl(FM(x_k)-FM[j](x_k)\bigr)^2 \tag{25}$$
Lower values of wi indicate lower influence of the model formed on a basis of data
Di when constructing the model for data D. The role of such weights is particularly
apparent when dealing with data Di which are in some temporal or spatial rela-
tionships with respect to D. In these circumstances, the values of the weights are
reflective of how far (in terms of time or distance) the sources of the individual data
are from D. For instance, if D_j denotes a collection of data gathered some time ago in comparison with the currently collected data D_i, then it is intuitively clear that the weight w_j should be lower than w_i.
As an auxiliary performance index that expresses the quality of the model for which Equation (19) has been minimized, with α selected with regard to Equation (25), we consider the following expression
$$Q^{\sim} = \frac{1}{N}\sum_{(x_k,\,y_k)\in D}\bigl(FM(x_k)-y_k\bigr)^2 \tag{26}$$
The values of Q∼ considered vis-à-vis the results expressed by Equation (26) are
helpful in assessing an extent the fuzzy model optimized with regard to data D while
If x is B then Y = A0 ⊕ A1 ⊗ x1 ⊕ A2 ⊗ x2 ⊕ . . . ⊕ An ⊗ xn (27)
The symbols ⊕ and ⊗ used above underline the nonnumeric nature of the arguments in the model over which the multiplication and addition are carried out. For given numeric inputs x = [x_1, x_2, . . . , x_n]^T, the resulting output Y of this local regression model is again a triangular fuzzy number Y = &lt;w, y, z&gt;, whose parameters are computed as follows:

Modal value: y = a_0 + a_1 x_1 + a_2 x_2 + . . . + a_n x_n
Lower bound: w = a_0 + min(a_1^− x_1, a_1^+ x_1) + min(a_2^− x_2, a_2^+ x_2) + . . . + min(a_n^− x_n, a_n^+ x_n)
Upper bound: z = a_0 + max(a_1^− x_1, a_1^+ x_1) + max(a_2^− x_2, a_2^+ x_2) + . . . + max(a_n^− x_n, a_n^+ x_n)

where a_i^− and a_i^+ denote the lower and upper bounds of the triangular fuzzy number A_i. The above process of forming the fuzzy numbers of the local regression model is repeated for all rules. At the end we arrive at the rules of the form
If x is B1 then Y = A10 ⊕ A11 ⊗ x1 ⊕ A12 ⊗ x2 ⊕ . . . ⊕ A1n ⊗ xn (28)
If x is B2 then Y = A20 ⊕ A21 ⊗ x1 ⊕ A22 ⊗ x2 ⊕ . . . ⊕ A2n ⊗ xn
...
If x is Bc then Y = Ac0 ⊕ Ac1 ⊗ x1 ⊕ Ac2 ⊗ x2 ⊕ . . . ⊕ Acn ⊗ xn
Given this structure, the input vector x implies the output fuzzy set with the fol-
lowing membership function
$$Y = \sum_{i=1}^{c} w_i(x) \otimes \bigl[A_{i0} \oplus (A_{i1}\otimes x_1) \oplus (A_{i2}\otimes x_2) \oplus \dots \oplus (A_{in}\otimes x_n)\bigr] \tag{29}$$
Owing to the fact of having fuzzy sets of the parameters of the regression model
in the conclusion part of the rules, Y becomes a fuzzy number rather than a single
numeric value.
Radial basis function (RBF) neural networks consist of three layers of processing elements. The first one is made up of source nodes (sensory units). The second layer comprises a collection of high-dimensional receptive fields (radial basis functions). The output layer consists of a linear unit which linearly aggregates the activation levels of the receptive fields. Schematically, the overall structure of the network is presented in Figure 8. The activation level of the i-th receptive field R_i caused by x is governed by the expression [7]
$$R_i(x) = \frac{1}{\displaystyle\sum_{j=1}^{c}\left(\frac{\|x-v_i\|}{\|x-v_j\|}\right)^{2/(m-1)}} \tag{30}$$
where x is the input to the network while vi is the prototype (center) of the i-th
receptive field. The above expression stems from the fact that the receptive fields
are formed through fuzzy clustering, say the FCM method and Equation (30) is
reflective of the way in which such receptive fields (clusters) have been formed.
The output of the network is formed as a weighted sum of the activation levels of
the receptive fields, see Figure 8.
$$\hat y_k = w_0 + \sum_{i=1}^{c} w_i\,R_i(x_k) \tag{31}$$
Fig. 8 A schematic view of a RBF neural network (the number of receptive fields is equal
to c)
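Equations (30) and (31) can be sketched together. The receptive fields induced by FCM prototypes necessarily sum to one for every input, a property worth checking in any implementation; the function name below is introduced here.

```python
import numpy as np

def rbf_forward(X, V, w, m=2.0):
    """Eqs. (30)-(31): FCM-induced receptive fields R_i aggregated by a
    linear output neuron with bias w0.  w = [w0, w1, ..., wc]."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12   # N x c
    R = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1/(m-1))).sum(axis=2)
    return R @ w[1:] + w[0], R
```

Because the fields partition unity, equal output weights produce a constant network output regardless of the input, which gives a simple diagnostic for the forward pass.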
characterized by the prototypes of the clusters {v_i[ii]}. Furthermore, the connections of the linear neuron are optimized, resulting in the vector w[ii]. In summary, the RBF NN is fully characterized by the collection of the prototypes and the vector of the connections, which jointly could be described as knowledge acquired from the data D[ii]. To underline this fact, we use the notation K[ii] = {{v_i[ii]}, w[ii], c}. Note that communicating knowledge amounts to making K[ii] available to the user. The number of clusters (receptive fields) at D and D[ii] is assumed to be the same. However, this assumption is not critical at all, and we could envision working with networks built at different levels of detail (different numbers of clusters).
In this setting, the idea of the experience-consistent learning of the network trans-
lates into an effective usage of experience K[1] , K[2] ,. . . , K[P] to construct a
RBF NN. We further refine this general statement into a functionally meaningful
optimization problem. The underlying optimization criterion – performance index
comes in the following form
$$V = \sum_{\substack{k=1\\ (x_k,\,target_k)\in D}}^{N}\Bigl(\sum_{i=0}^{C} w_i R_i(x_k) - target_k\Bigr)^2 + \alpha\sum_{ii=1}^{P}\sum_{\substack{k=1\\ x_k\in D}}^{N}\Bigl(\sum_{i=0}^{C} w_i R_i(x_k) - \sum_{i=0}^{C} w_i[ii] R_i[ii](x_k)\Bigr)^2 \tag{32}$$
The prototypes of the receptive fields at D[ii], after being used in the context of D, give rise to the receptive fields denoted here by R_i[ii]. The data set D is composed of input-output pairs {(x_k, target_k)}, k = 1, 2, . . . , N. The objective is to minimize V by adjusting the weights w of the linear neuron.
Given the additive format of Equation (32), which consists of two main components, when minimizing V we attempt to achieve a balance between the model built only on the basis of data D (the first part of Equation (32)) and the results produced formerly by the models for data D[ii], ii = 1, 2, . . . , P (the second term of Equation (32)). The balance is established by choosing a certain positive value of α. Notably, the higher the value of α, the stronger the impact coming from the experience accumulated in the form of the previously constructed models. If α tends to zero, then the RBF NN is constructed solely on the basis of the currently available data D. Considering the nature of the second term in the performance index in Equation (32), we could say that it plays a role similar to the regularization mechanism quite often considered in the training of neural networks.
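Because V in Equation (32) is quadratic in w, setting its gradient to zero yields a regularized normal-equation form, which the sketch below solves directly; the stacked-matrix interface is an assumption of the illustration.

```python
import numpy as np

def experience_consistent_weights(R, target, R_prev, w_prev, alpha):
    """Closed-form minimizer of Eq. (32) with respect to the output
    weights.  R is the N x (c+1) activation matrix on D (first column
    of ones for w0); R_prev[jj], w_prev[jj] describe the previously
    built networks evaluated on the inputs of D.  Setting dV/dw = 0:
    (1 + alpha*P) R^T R w = R^T target + alpha * sum_jj R^T R[jj] w[jj]."""
    P = len(R_prev)
    A = (1.0 + alpha * P) * (R.T @ R)
    b = R.T @ target + alpha * sum(R.T @ (Rp @ wp)
                                   for Rp, wp in zip(R_prev, w_prev))
    return np.linalg.solve(A, b)
```

As expected from the discussion of α, the solution coincides with plain least squares on D when α = 0, and it is left unchanged when the previous models already agree with the current fit.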
As becomes clear, the result of the learning depends upon the level of impact of the experience-based component (the already designed neural networks). While the general tendency can easily be controlled by changing the value of α, choosing a suitable value is not obvious at all. The approach we take comes with the following motivation: the optimal RBF NN should perform well not only on D but also on all other data sets D[1], D[2], . . . , D[P]. In other words, we compute the quality of the constructed model on D, then transfer the knowledge K to D[1], D[2], . . . , D[P] and assess the quality of the network there. The overall performance of the experience-consistent RBF NN is quantified in the form of the following index
$$G = \frac{1}{N}\sum_{(x_k,\,target_k)\in D}\Bigl(\sum_{i=0}^{C} w_i(opt)\,R_i(x_k) - target_k\Bigr)^2 + \frac{1}{N_1}\sum_{(x_k,\,target_k)\in D[1]}\Bigl(\sum_{i=0}^{C} w_i(opt)\,R_i(x_k) - target_k\Bigr)^2 + \dots + \frac{1}{N_P}\sum_{(x_k,\,target_k)\in D[P]}\Bigl(\sum_{i=0}^{C} w_i(opt)\,R_i(x_k) - target_k\Bigr)^2 \tag{33}$$
In other words, G expressed by Equation (33) measures the global performance of the optimal neural network when all data are taken into consideration. Apparently G is a function of α, and the optimized level of consistency is that for which G attains its minimal value, namely α_opt = arg min_α G(α).
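The selection α_opt = arg min_α G(α) is naturally a one-dimensional sweep. The sketch below uses a deliberately generic, hypothetical interface: `fit` returns the weights obtained for a given α, and `global_error` evaluates G over D and all D[ii].

```python
import numpy as np

def select_alpha(alphas, fit, global_error):
    """Sweep for alpha_opt = arg min G(alpha) of Eq. (33): fit the
    weights for each candidate alpha and score them by the global
    error G; 'fit' and 'global_error' are user-supplied callables."""
    G = [global_error(fit(a)) for a in alphas]
    k = int(np.argmin(G))
    return alphas[k], G[k]
```

In practice the candidate set would span several orders of magnitude of α, consistent with the observation that both very small and very large intensities of the experience-based component can degrade G.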
The optimization scheme of Equation (32), along with its evaluation mechanism governed by Equation (33), can be generalized by admitting various levels of impact that each data set D[ii] could exhibit in the process of reaching consistency. We introduce some positive weights W_ii, ii = 1, 2, . . . , P, which are included in the performance index
$$V = \sum_{\substack{k=1\\ (x_k,\,target_k)\in D}}^{N}\Bigl(\sum_{i=0}^{C} w_i R_i(x_k) - target_k\Bigr)^2 + \alpha\sum_{ii=1}^{P} W_{ii}\sum_{\substack{k=1\\ x_k\in D}}^{N}\Bigl(\sum_{i=0}^{C} w_i R_i(x_k) - \sum_{i=0}^{C} w_i[ii] R_i[ii](x_k)\Bigr)^2 \tag{34}$$
Lower values of W_ii indicate a lower influence of the model formed on the basis of data D_ii when constructing the model for data D. The role of such weights is particularly apparent when dealing with data D_ii which are in some temporal or spatial relationship with respect to D. In these circumstances, the values of the weights reflect how far (in terms of time or distance) the sources of the individual data are from D. For instance, if D_jj denotes a collection of data gathered some time ago in comparison with the currently collected data D_ii, then it is intuitively clear that the corresponding weight W_jj should assume a lower value than W_ii.
8 Conclusions
We have stressed that the distributed and collaborative nature of systems has to be
addressed when dealing with fuzzy models (whose design methodology has been
predominantly focused on a centralized development scheme). The collaborative
construction of fuzzy rule-based models relies on fuzzy clusters, and the schemes of collaborative clustering are of genuine interest in this regard. In the study, we have elaborated on the two main avenues of the formation of the clusters, viz. collaborative clustering and a buildup of hierarchies of clusters, which offer some general directions for more detailed algorithmic pursuits.
Acknowledgements. Support from the Natural Sciences and Engineering Research Council
of Canada (NSERC) and Canada Research Chair (CRC) is gratefully acknowledged.
References
1. Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum
Press, New York (1981)
2. Hoppner, F., et al.: Fuzzy cluster analysis. J. Wiley, Chichester (1999)
3. Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice-Hall, Englewood Cliffs
(1988)
4. Jain, A., Duin, R., Mao, J.: Statistical pattern recognition: a review. IEEE Transactions
on Pattern Analysis and Machine Intelligence 22, 4–37 (2000)
5. Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification 2, 193–218 (1985)
6. Loia, V., Pedrycz, W., Senatore, S.: P-FCM: a proximity-based fuzzy clustering for user-
centered web applications. Int. J. of Approximate Reasoning 34, 121–144 (2003)
7. Pedrycz, W.: Conditional fuzzy clustering in the design of radial basis function neural
networks. IEEE Transactions on Neural Networks 9, 601–612 (1998)
8. Pedrycz, W.: Collaborative fuzzy clustering. Pattern Recognition Letters 23, 675–686
(2002)